[00:00:00] *** no reply needed *** keine Antwort erforderlich *** réponse pas indispensable *** [00:01:11] <^demon> Ryan_Lane: Still down? :( [00:01:27] ^demon: somewhat, yes [00:01:33] ^demon: do you need any specific projects up? [00:01:44] I need to stop and start each of the volumes individually [00:01:48] there's like 350 [00:01:57] <^demon> Eh, was going to do some work on gerrit-dev while watching Wheel & Jeopardy, but if you're busy doing things don't bother. [00:02:09] <^demon> (and gerrit-db) [00:02:41] which project is this? [00:03:04] <^demon> gerrit. [00:03:06] hell, I may need to kill the gluster process on all the clients too [00:03:10] we'll see [00:04:24] this is exactly why I've been avoiding this [00:11:55] ^demon: bastion's volumes are back [00:12:03] working on gerrit's [00:12:33] PROBLEM Current Load is now: WARNING on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: WARNING - load average: 4.52, 10.18, 7.79 [00:14:03] PROBLEM Current Load is now: WARNING on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: WARNING - load average: 3.17, 7.84, 7.10 [00:22:33] RECOVERY Current Load is now: OK on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: OK - load average: 0.07, 2.10, 4.64 [00:24:03] RECOVERY Current Load is now: OK on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: OK - load average: 0.04, 1.56, 4.13 [00:28:32] Ryan_Lane: hey [00:28:37] Ryan_Lane: nova/glance USNs [00:28:41] sound pretty serious [00:29:24] we don't use boot from volume [00:29:58] we also don't use a swift backend for glance [00:30:46] in the middle of a bad gluster outage [00:30:50] for the millionth time [00:38:23] RECOVERY Free ram is now: OK on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: OK: 22% free memory [00:38:33] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 6% free memory [00:39:33] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [00:46:23] PROBLEM Free ram is now: WARNING on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: Warning: 14% free memory [00:48:53] PROBLEM Free ram is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: Warning: 15% free memory [00:57:33] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory [00:58:52] PROBLEM Free ram is now: WARNING on bots-cb.pmtpa.wmflabs 10.4.0.44 output: Warning: 15% free memory [00:59:32] PROBLEM Free ram is now: WARNING on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: Warning: 17% free memory [01:03:53] PROBLEM Free ram is now: CRITICAL on bots-cb.pmtpa.wmflabs 10.4.0.44 output: Critical: 5% free memory [01:08:42] PROBLEM Free ram is now: CRITICAL on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Critical: 4% free memory [01:08:52] RECOVERY Free ram is now: OK on bots-cb.pmtpa.wmflabs 10.4.0.44 output: OK: 74% free memory [01:09:03] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 172 processes [01:13:34] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 6% free memory [01:14:32] Ryan_Lane, is there a handy way to get the fully-qualified hostname for a labs instance (e.g. foo.pmtpa.wmflabs) in puppet? 
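
The volume restarts described above ("I need to stop and start each of the volumes individually ... there's like 350") could be scripted roughly as follows. This is a minimal shell sketch, not taken from the log: it assumes a Gluster CLI new enough to support "gluster volume list", and the --mode=script flag is used to suppress the interactive confirmation prompts.

    # Hedged sketch: restart every Gluster volume on the server, as described
    # above for the ~350 labs volumes.
    for vol in $(gluster volume list); do
        echo "restarting ${vol}"
        gluster --mode=script volume stop "${vol}"
        gluster --mode=script volume start "${vol}"
    done

If the CLI in use predates "volume list", the volume names could be pulled from "gluster volume info" output instead.
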
[01:14:55] hm [01:15:25] dunno [01:15:25] on new instances it's just the fqdn [01:17:54] I'm not sure it's possible on old instances [01:17:58] I want to rename all instances [01:19:52] last time I tried to rename an instance it didn't work so well [01:21:15] there was INSTANCENAME in bashrc and you have $realm.. hmm.. combine those ? [01:22:03] well, at least the memory leak seems to be gone [01:22:03] PROBLEM Current Load is now: WARNING on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: WARNING - load average: 7.61, 7.57, 5.96 [01:22:05] in gluster [01:22:41] I need to upgrade the clients after this [01:22:42] * Ryan_Lane sighs [01:24:06] andrewbogott: this is at least the first part of it https://gerrit.wikimedia.org/r/#/c/13444/1/templates/environment/bash.bashrc [01:24:06] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 96 processes [01:24:16] Ryan_Lane: Wait, 'just the fqdn'? Is $fqdn (or something like it) a puppet var? [01:24:25] yeah [01:24:30] but, that won't work on old instances [01:24:39] because their fqdn is actually the i-xxx names [01:25:02] Yep, that doesn't worry me in this case… I just want a reasonable default. [01:25:34] PROBLEM Current Load is now: WARNING on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: WARNING - load average: 5.34, 7.32, 6.19 [01:26:03] ah ok [01:26:05] yeah, that's easiest [01:26:46] PROBLEM Free ram is now: CRITICAL on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Critical: 5% free memory [01:31:03] <^demon> Ryan_Lane: Were you able to get gerrit-dev back up? [01:31:09] it should be [01:31:13] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 6% free memory [01:31:31] yep [01:31:31] it is [01:31:38] <^demon> Ah yep, there is is. [01:31:41] <^demon> Thank you sir. [01:31:45] yw [01:32:15] I think most project should be up, but some may have missing files until all their bricks are available [01:35:49] also, gluster is going to be slower than average right now [01:35:49] since I'm pegging out its cpu [01:57:39] newsudotestproj-home ? [01:57:43] what project is that? :D [01:57:55] we should try to avoid creating single use projects [01:58:08] since it creates two gluster volumes [01:59:03] in fact, at some point we should look into how to properly delete projects, so that we can delete the gluster volumes that go with them [02:00:25] so what currently happens with them? they just stick around for eternity? [02:00:31] yep [02:00:38] they do now anyway [02:00:48] the gluster volumes are actually problematic [02:01:32] I think to properly delete a project we'd need to make sure we deleted all references to it in ldap, labsconsole, and nova [02:03:42] PROBLEM Free ram is now: CRITICAL on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Critical: 4% free memory [02:04:35] and then even make sure that same project name is never used again or project logs would be confusing? [02:04:51] we'd delete that too [02:05:05] I'm pretty sure labsconsole does [02:06:03] would it maybe make sense to keep previous project's logs for archival purposes? 
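
The INSTANCENAME/$realm idea from the hostname discussion above might look roughly like the following shell sketch. It assumes INSTANCENAME is exported by /etc/bash.bashrc and that the realm suffix for these instances is pmtpa.wmflabs; it is not the content of the gerrit change linked above.

    # Hedged sketch: build a labs instance's fully-qualified name by combining
    # INSTANCENAME with an assumed realm suffix, falling back to the real fqdn
    # on newer instances where the hostname is no longer an i-xxx style name.
    REALM_SUFFIX="pmtpa.wmflabs"
    if [ -n "${INSTANCENAME}" ]; then
        FQDN="${INSTANCENAME}.${REALM_SUFFIX}"
    else
        FQDN="$(hostname --fqdn)"
    fi
    echo "${FQDN}"
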
[02:08:02] probablty [02:08:05] *probably [02:08:16] I doubt we'll put much resources into getting deletions working, though [02:08:35] I may just have gluster volumes turned off for projects with no instances [02:08:58] that would also solve part of the issue [02:16:24] PROBLEM Free ram is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: Warning: 19% free memory [02:19:23] Ryan_Lane or anyone, is https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots for bots that run on wiki sites or bots that participate on IRC? [02:21:19] either/or [02:21:24] Ryan_Lane: I can't ssh into cvn-apache2.pmtpa.wmflabs though it is responding fine from the web. [02:21:36] http://cvn.wmflabs.org https://labsconsole.wikimedia.org/wiki/Nova_Resource:Cvn [02:21:45] Krinkle: we're having a glusterfs outage right now [02:21:50] k [02:22:51] hm [02:23:00] actually [02:23:03] it should be working [02:25:30] Krinkle: cna you try logging into it? [02:25:30] I'm tailing its log [02:25:30] Ryan_Lane: yep [02:25:30] 90 packages can be updated. [02:25:30] *** /dev/vdb will be checked for errors at next reboot *** [02:25:30] *** System restart required *** [02:35:34] RECOVERY Current Load is now: OK on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: OK - load average: 0.57, 2.57, 4.72 [02:37:04] RECOVERY Current Load is now: OK on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: OK - load average: 0.16, 1.86, 4.24 [02:39:50] Ryan_Lane: I can use git-clone to pull down data to my home directory, but I can't pass git the second argument for target directory, nor can I `mv` the directory afterwards. [02:39:59] Is this related to the problem you're dealing with? [02:40:22] :D All the errors! "Could not acquire 'labswiki:messages:en:status' lock." [02:40:59] $ mv foo/ bar/ [02:40:59] mv: accessing `bar': Invalid argument [02:41:19] Krinkle: yes. very likely [02:42:00] > mv: cannot move `foo/' to `bar/': Transport endpoint is not connected [02:43:15] Krinkle: try now [02:53:13] Ryan_Lane, there didn't seem to be any issues/questions that arose on the topic of the Opensim project, so i was wondering how you wanted to proceed? [02:53:53] JasonDC: let me get back to you after I fix things [02:54:09] sure thing, business comes first [03:19:23] whom should I ask nicely to add me to the NovaProject "bots" ? [03:21:13] PROBLEM Free ram is now: CRITICAL on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Critical: 5% free memory [03:31:13] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 6% free memory [03:31:16] Damianz or petan [03:34:42] Damianz or petan, can you add me (Spage) to the NovaProject "bots"? I'm learning puppet and my LevelUp goal from Andrew Bogott is to deploy logbot in E3's channel. Thanks! [03:36:12] PROBLEM Free ram is now: CRITICAL on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Critical: 5% free memory [04:19:36] i'm trying to go through my first wmf git commit with gerrit and running into a problem: "Exception: Could not connect to gerrit at ssh://Emw@gerrit.wikimedia.org:29418/mediawiki/extensions/PDBHandler.git" [04:20:25] i'm probably missing something basic, but i've looked around a bit and don't see what it is -- any ideas? [04:27:06] ah, https://bugzilla.wikimedia.org/show_bug.cgi?id=44398#c5 implies that going through the instructions outlined on https://labsconsole.wikimedia.org/wiki/Git-review on a labs instance doesn't work [04:27:36] so do developers use git-review from labs instances? 
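
One way to narrow down the git-review failure above ("Could not connect to gerrit at ssh://Emw@gerrit.wikimedia.org:29418/...") is to test the SSH side directly from the labs instance. A hedged sketch; it assumes the key was forwarded into the instance with ssh -A, and the username and port are taken from the error message, not verified here.

    # Check that the forwarded key is actually visible to the agent:
    ssh-add -l
    # Test the Gerrit SSH endpoint directly; "gerrit version" should print a
    # version string if authentication works:
    ssh -p 29418 Emw@gerrit.wikimedia.org gerrit version

If SSH cannot be made to work from the instance, cloning over https (as suggested later in the log) is the fallback.
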
[04:31:14] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 6% free memory [04:37:34] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [04:38:34] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 6% free memory [04:38:54] RECOVERY Free ram is now: OK on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: OK: 23% free memory [04:39:34] RECOVERY Free ram is now: OK on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: OK: 24% free memory [04:41:23] RECOVERY Free ram is now: OK on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: OK: 22% free memory [04:49:22] PROBLEM Free ram is now: WARNING on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: Warning: 14% free memory [04:50:32] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory [04:58:34] PROBLEM Free ram is now: CRITICAL on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Critical: 4% free memory [05:02:32] PROBLEM Free ram is now: WARNING on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: Warning: 16% free memory [05:06:53] PROBLEM Free ram is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: Warning: 15% free memory [05:16:13] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 6% free memory [05:36:12] PROBLEM Free ram is now: CRITICAL on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Critical: 5% free memory [06:06:23] PROBLEM Free ram is now: CRITICAL on bots-3.pmtpa.wmflabs 10.4.0.59 output: Critical: 5% free memory [06:26:06] PROBLEM Current Load is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: WARNING - load average: 5.38, 5.53, 5.16 [06:28:26] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 151 processes [06:29:36] PROBLEM Current Load is now: WARNING on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: WARNING - load average: 5.20, 6.55, 5.43 [06:31:52] RECOVERY Free ram is now: OK on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: OK: 30% free memory [06:33:52] PROBLEM Current Load is now: WARNING on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: WARNING - load average: 8.64, 7.24, 5.92 [06:38:24] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 145 processes [06:38:54] RECOVERY Current Load is now: OK on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: OK - load average: 0.32, 3.81, 4.93 [06:39:34] RECOVERY Current Load is now: OK on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: OK - load average: 0.22, 3.28, 4.81 [06:44:13] PROBLEM Disk Space is now: CRITICAL on bots-3.pmtpa.wmflabs 10.4.0.59 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:21] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [06:57:33] RECOVERY Free ram is now: OK on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: OK: 32% free memory [06:59:23] RECOVERY Free ram is now: OK on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: OK: 29% free memory [07:07:52] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 153 processes [07:17:43] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes [07:20:23] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [07:36:33] Ryan_Lane: [07:36:40] Wikinaut: ? 
[07:36:53] good morning [07:36:59] Hi, good morning from Berlin (08:36 local time) [07:37:24] hi [07:38:22] Ryan_Lane: there is still what I call a problem (bug?) in the Janrain library, or a bug in the E:OpenID w.r.t. identity selection. [07:38:39] the OpenID protocol requires for this.. [07:38:45] a special (fixed) code [07:38:57] (mom) [07:39:07] mom? [07:39:26] one moment pls [07:39:31] "http://specs.openid.net/auth/2.0/identifier_select" [07:39:58] When I plug this in, the protocol - as it is implemented now - fails. When I patch it, it works. [07:40:03] In short [07:40:08] I need more time. [07:40:15] That's the info. [07:40:19] ok [07:41:23] http://openid.net/specs/openid-authentication-2_0.html 9.1 [07:41:25] "Note: If this is set to the special value "http://specs.openid.net/auth/2.0/identifier_select" then the OP SHOULD choose an Identifier that belongs to the end user." [07:41:51] and section 10. [07:42:06] I googled for possible problems with Janrain... [07:42:49] but as said: I am not yet ready. [07:43:06] that's fine [07:43:23] You will hear or read from me in 1.5 weeks [07:43:40] because of other urgent duties [07:45:22] * Ryan_Lane nods [07:50:32] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [08:02:17] Ryan_Lane what happened [08:02:29] I did a point upgrade of gluster server [08:02:31] how long was that outage from my logs it seems to be weird [08:02:35] a *point* upgrade [08:02:52] like if it was don't only for a short periods for several times [08:02:55] * down [08:03:00] kind of, yes [08:03:11] aha [08:03:13] PROBLEM dpkg-check is now: CRITICAL on asher1.pmtpa.wmflabs 10.4.0.10 output: DPKG CRITICAL dpkg reports broken packages [08:03:31] it wasn't down from when I sent the first email till the last [08:03:37] aha [08:03:40] some projects were only down for maybe 30 minutes [08:03:44] others for hours [08:04:40] ok it's funny because wm-bot is using gluster or it writes to /data/project but nothing is missing in logs so far [08:04:43] PROBLEM dpkg-check is now: CRITICAL on bastion1.pmtpa.wmflabs 10.4.0.54 output: DPKG CRITICAL dpkg reports broken packages [08:05:17] nothing should be missing [08:05:31] it may not have been able to write for some period of time, though [08:05:47] aha, if it can't write it's ok [08:05:50] because it uses cache [08:05:53] ah [08:06:02] but problem is if it can write but it goes nowhere :P [08:06:07] thankfully the upgrade did at least seem to fix the bugs I was aiming to fix [08:06:09] like /dev/null [08:06:12] heh [08:06:30] I still need to upgrade the damn clients [08:06:34] * Ryan_Lane sighs [08:07:14] RECOVERY dpkg-check is now: OK on openstack-wiki-instance.pmtpa.wmflabs 10.4.1.49 output: All packages OK [08:07:34] PROBLEM dpkg-check is now: CRITICAL on udp-filter.pmtpa.wmflabs 10.4.0.135 output: DPKG CRITICAL dpkg reports broken packages [08:07:44] PROBLEM dpkg-check is now: CRITICAL on nginx-ffuqua-doom1-3.pmtpa.wmflabs 10.4.0.80 output: DPKG CRITICAL dpkg reports broken packages [08:11:12] PROBLEM Free ram is now: CRITICAL on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Critical: 5% free memory [08:20:34] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [08:22:24] PROBLEM dpkg-check is now: CRITICAL on outreacheval.pmtpa.wmflabs 10.4.0.91 output: DPKG CRITICAL dpkg reports broken packages [08:24:48] !logs [08:24:48] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs [08:38:44] PROBLEM Free ram is now: WARNING
on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 6% free memory [08:40:32] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [08:41:12] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 7% free memory [08:44:58] if someone won't fix the RAM problems on swift-be4 I will put it to ignore [08:48:34] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory [08:48:34] PROBLEM Free ram is now: CRITICAL on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Critical: 4% free memory [08:50:52] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [08:56:13] PROBLEM Free ram is now: CRITICAL on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Critical: 5% free memory [09:02:33] PROBLEM Free ram is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: Warning: 16% free memory [09:06:12] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 7% free memory [09:20:53] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [09:21:24] !log nagios ignoring all swift-be* instances - no one cares about them and they are spamming channel [09:21:26] Logged the message, Master [09:28:51] @seenrx spag [09:28:51] petan: Last time I saw spagewmf they were quitting the network with reason: Ping timeout: 240 seconds at 1/30/2013 8:14:41 AM (01:14:09.8099110 ago) (multiple results were found: spagewmf1, spagewmfx) [09:30:11] if UTC date is given, maybe date should be in ISO format? :) [09:34:01] what is wiki name of spagewmf [09:34:08] @labs-user Spagewmf [09:34:08] That user is not a member of any project [09:34:12] @labs-user SpageWMF [09:34:12] That user is not a member of any project [09:41:52] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 154 processes [09:45:02] RECOVERY host: bots-3.pmtpa.wmflabs is UP address: 10.4.0.59 PING OK - Packet loss = 0%, RTA = 9.28 ms [09:46:24] RECOVERY Free ram is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: OK: 93% free memory [09:49:04] RECOVERY Disk Space is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: DISK OK [09:51:43] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes [10:10:44] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 151 processes [11:11:52] hashar, yesterday I had to revert https://gerrit.wikimedia.org/r/#/c/46240/ temporarily because it wasn't deployed [11:17:57] MaxSem: ahhh [11:18:03] MaxSem: thanks for the notification [11:18:34] I would've deployed it anyway hadn't fatalmonitor been broken [11:29:46] MaxSem: I would have reverted that change just like you did [11:29:54] MaxSem: I think Chad got it +2ed [11:30:00] then Jenkins merged the change in [11:30:06] and we both forgot to deploy it [11:30:33] we could deploy it now once and forever:) [11:35:52] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 151 processes [11:45:43] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes [12:37:34] RECOVERY Free ram is now: OK on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: OK: 24% free memory [12:38:32] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [12:45:32] PROBLEM Free ram is now:
WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: Warning: 16% free memory [12:46:32] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory [13:53:52] PROBLEM Current Load is now: CRITICAL on checkwiki-web2.pmtpa.wmflabs 10.4.1.55 output: Connection refused by host [13:54:32] PROBLEM Disk Space is now: CRITICAL on checkwiki-web2.pmtpa.wmflabs 10.4.1.55 output: Connection refused by host [13:55:14] PROBLEM Free ram is now: CRITICAL on checkwiki-web2.pmtpa.wmflabs 10.4.1.55 output: Connection refused by host [13:58:54] RECOVERY Current Load is now: OK on checkwiki-web2.pmtpa.wmflabs 10.4.1.55 output: OK - load average: 0.15, 0.74, 0.54 [13:59:34] RECOVERY Disk Space is now: OK on checkwiki-web2.pmtpa.wmflabs 10.4.1.55 output: DISK OK [14:00:14] RECOVERY Free ram is now: OK on checkwiki-web2.pmtpa.wmflabs 10.4.1.55 output: OK: 89% free memory [14:03:44] RECOVERY Total processes is now: OK on checkwiki-web.pmtpa.wmflabs 10.4.0.24 output: PROCS OK: 84 processes [14:03:54] RECOVERY Current Load is now: OK on checkwiki-web.pmtpa.wmflabs 10.4.0.24 output: OK - load average: 0.10, 0.60, 0.45 [14:04:24] RECOVERY dpkg-check is now: OK on checkwiki-web.pmtpa.wmflabs 10.4.0.24 output: All packages OK [14:04:34] RECOVERY Disk Space is now: OK on checkwiki-web.pmtpa.wmflabs 10.4.0.24 output: DISK OK [14:05:12] RECOVERY Free ram is now: OK on checkwiki-web.pmtpa.wmflabs 10.4.0.24 output: OK: 87% free memory [14:25:12] PROBLEM Free ram is now: WARNING on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Warning: 19% free memory [14:25:52] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 152 processes [14:28:52] PROBLEM Current Load is now: CRITICAL on checkwiki-web3.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [14:29:32] PROBLEM Disk Space is now: CRITICAL on checkwiki-web3.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [14:30:13] PROBLEM Free ram is now: CRITICAL on checkwiki-web3.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [14:33:53] RECOVERY Current Load is now: OK on checkwiki-web3.pmtpa.wmflabs 10.4.1.74 output: OK - load average: 0.06, 0.51, 0.40 [14:34:33] RECOVERY Disk Space is now: OK on checkwiki-web3.pmtpa.wmflabs 10.4.1.74 output: DISK OK [14:35:12] RECOVERY Free ram is now: OK on checkwiki-web3.pmtpa.wmflabs 10.4.1.74 output: OK: 89% free memory [15:15:52] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes [15:17:21] !log deployment-prep removing -cache-bits-02 (been replaced a long time ago by -cache-bits-03) [15:17:23] Logged the message, Master [15:28:53] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 153 processes [15:40:13] RECOVERY Free ram is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: OK: 20% free memory [16:40:33] RECOVERY Free ram is now: OK on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: OK: 24% free memory [16:41:33] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [16:48:32] PROBLEM Free ram is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: Warning: 16% free memory [16:49:32] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory [16:56:23] PROBLEM Free ram is now: WARNING on message-remailer.pmtpa.wmflabs 10.4.0.251 output: Warning: 16% free memory [17:27:22] hello, can anybody add me to bastion project..?? 
[17:27:42] my username on wikilabs website is zeek [17:28:05] zeekzack: I can… just a minute. [17:28:49] zeekzack: Done. It'll take a few minutes for the change to take effect. [17:28:59] okk..thanks a lot..!! [17:42:23] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 7.36, 6.35, 5.51 [17:46:31] after doing things on my local system... [17:46:42] how to do things on bastion?? [17:48:07] i am following steps in Help:Access in which it tells to execute statement to execute on bastion...how to do that?? [17:48:55] zeekzack: Can you tell me what you're trying to accomplish? [17:52:17] i am following https://labsconsole.wikimedia.org/wiki/Help:Access#Using_agent_forwarding [17:52:24] in which steps described for local system have been accomplished. [17:53:23] So you've done the third step, 'ssh -A @bastion.wmflabs.org' [17:53:25] ? [17:54:14] yes.. [17:54:39] how to do "on bastion" part ? [17:54:40] And it worked? If so, you should be logged into bastion already. [17:55:15] That's what 'ssh -A @bastion.wmflabs.org' does… it logs you into bastion. [17:56:33] ok..i.e. dont need to do on bastion thing ?? [17:57:01] Well, so, again, it would be useful for me to know what it is you're trying to do. [17:57:29] Bastion is a step on the way to labs instances. So, you connect to bastion, and then from there you connect to your instance. [17:58:45] okk.. [17:58:46] i m newbie...and learnin things.. [18:04:48] when i try to clone whole core repsitory, i am getting error of permission denial [18:05:26] <^demon> pastebin? [18:07:55] http://pastebin.com/KvP6dwGf [18:23:21] <^demon> zeekzack1: That just sounds like your key isn't in your agent. As long as the key works normally (like on your localhost) and this is on a labs instance, make sure you're forwarding your key with -A when ssh'ing. [18:23:28] <^demon> Alternatively, you can clone over https. [18:28:54] PROBLEM Current Load is now: CRITICAL on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: Connection refused by host [18:29:56] PROBLEM Disk Space is now: CRITICAL on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: Connection refused by host [18:29:57] RECOVERY dpkg-check is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: All packages OK [18:30:15] PROBLEM Free ram is now: CRITICAL on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: Connection refused by host [18:32:25] RECOVERY dpkg-check is now: OK on outreacheval.pmtpa.wmflabs 10.4.0.91 output: All packages OK [18:32:35] RECOVERY dpkg-check is now: OK on udp-filter.pmtpa.wmflabs 10.4.0.135 output: All packages OK [18:32:45] RECOVERY dpkg-check is now: OK on nginx-ffuqua-doom1-3.pmtpa.wmflabs 10.4.0.80 output: All packages OK [18:33:15] RECOVERY dpkg-check is now: OK on asher1.pmtpa.wmflabs 10.4.0.10 output: All packages OK [18:33:25] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 159 processes [18:33:55] RECOVERY Current Load is now: OK on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: OK - load average: 0.14, 0.61, 0.43 [18:34:35] RECOVERY Disk Space is now: OK on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: DISK OK [18:35:25] RECOVERY Free ram is now: OK on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: OK: 96% free memory [18:36:22] PROBLEM Free ram is now: CRITICAL on message-remailer.pmtpa.wmflabs 10.4.0.251 output: Critical: 5% free memory [18:38:07] remote end hung up unexpectedly : http://pastebin.com/2bNZ5SC0 [18:49:32] memcached restart imminent. 
it'll log you out if you aren't using long-lived tokens [18:54:34] Ryan_Lane: log us out of all WMF sites? [18:54:39] or just labs [18:54:40] never mnind [18:54:43] heh [18:54:55] sorry, took a sec to see what channel I was in [18:55:00] * Ryan_Lane nods [18:55:10] I logged the action in the -operations channel [18:55:18] I downgraded memcached on virt0 [18:56:35] hmm hungry [19:06:54] PROBLEM Current Load is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: WARNING - load average: 5.65, 5.45, 5.12 [19:10:32] PROBLEM dpkg-check is now: CRITICAL on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: DPKG CRITICAL dpkg reports broken packages [19:10:43] <^demon> Oh shut up nagios. [19:15:22] RECOVERY dpkg-check is now: OK on gerrit-dev-fresh.pmtpa.wmflabs 10.4.0.24 output: All packages OK [19:15:58] * Damianz pats ^demon [19:16:24] <^demon> I was in the middle of apt-get update. [19:16:29] <^demon> Of course the dpkg check is going to fail. [19:16:33] <^demon> :) [19:18:22] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 150 processes [19:31:32] PROBLEM Free ram is now: WARNING on message-remailer.pmtpa.wmflabs 10.4.0.251 output: Warning: 8% free memory [19:43:22] PROBLEM dpkg-check is now: CRITICAL on rt-puppetdev.pmtpa.wmflabs 10.4.0.201 output: DPKG CRITICAL dpkg reports broken packages [19:48:23] RECOVERY dpkg-check is now: OK on rt-puppetdev.pmtpa.wmflabs 10.4.0.201 output: All packages OK [19:51:22] RECOVERY Free ram is now: OK on message-remailer.pmtpa.wmflabs 10.4.0.251 output: OK: 31% free memory [20:06:37] On piramido (on labs), memcached sometimes stops working. [20:06:40] Symptom: [20:06:41] Could not acquire 'testwiki:messages:en:status' lock [20:06:53] We have to restart every so often, but we'd like to track down the underlying problem. [20:18:47] beta labs is down? 
[20:19:42] seems so [20:19:53] chrismcmahon: likely related to the gluster outage yesterday [20:20:37] just had a bunch of tests fail, thanks Ryan_Lane [20:26:52] RECOVERY Current Load is now: OK on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: OK - load average: 4.98, 4.89, 4.98 [20:33:33] PROBLEM dpkg-check is now: CRITICAL on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: DPKG CRITICAL dpkg reports broken packages [20:39:32] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [20:42:25] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.79, 4.81, 4.96 [20:43:35] RECOVERY Free ram is now: OK on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: OK: 40% free memory [20:43:35] RECOVERY dpkg-check is now: OK on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: All packages OK [20:57:55] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory [21:38:38] PROBLEM Current Load is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: WARNING - load average: 9.96, 9.54, 7.11 [22:08:53] RECOVERY Current Load is now: OK on nagios-main.pmtpa.wmflabs 10.4.0.120 output: OK - load average: 1.26, 1.60, 4.07 [22:18:53] PROBLEM Current Load is now: CRITICAL on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: Connection refused by host [22:19:33] PROBLEM Disk Space is now: CRITICAL on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: Connection refused by host [22:20:12] PROBLEM Free ram is now: CRITICAL on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: Connection refused by host [22:23:52] RECOVERY Current Load is now: OK on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: OK - load average: 0.18, 0.71, 0.48 [22:24:32] RECOVERY Disk Space is now: OK on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: DISK OK [22:25:13] RECOVERY Free ram is now: OK on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: OK: 91% free memory [23:32:16] PROBLEM Current Load is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: WARNING - load average: 5.83, 7.19, 5.77 [23:38:15] PROBLEM Free ram is now: WARNING on rt-puppetdev2.pmtpa.wmflabs 10.4.0.24 output: Warning: 10% free memory [23:47:23] RECOVERY Current Load is now: OK on nagios-main.pmtpa.wmflabs 10.4.0.120 output: OK - load average: 1.65, 3.03, 4.32 [23:53:52] PROBLEM Total processes is now: CRITICAL on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: Connection refused by host [23:54:32] PROBLEM dpkg-check is now: CRITICAL on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: Connection refused by host [23:56:04] PROBLEM Current Load is now: CRITICAL on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: Connection refused by host [23:56:44] PROBLEM Disk Space is now: CRITICAL on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: Connection refused by host [23:57:24] PROBLEM Free ram is now: CRITICAL on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: Connection refused by host [23:58:54] RECOVERY Total processes is now: OK on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: PROCS OK: 83 processes [23:59:34] RECOVERY dpkg-check is now: OK on rt-puppetdev3.pmtpa.wmflabs 10.4.0.195 output: All packages OK
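
For the piramido memcached problem reported above (the recurring "Could not acquire 'testwiki:messages:en:status' lock" errors), a quick liveness check against memcached itself helps separate a dead or unreachable daemon from genuine lock contention. A hedged sketch; the host and port are assumptions (11211 is memcached's default), not details from the log.

    # Hedged sketch: query memcached's stats over its plain-text protocol.
    # A timeout here points at a dead or unreachable memcached rather than
    # lock contention inside MediaWiki.
    printf 'stats\r\nquit\r\n' | nc -w 2 localhost 11211 \
        | grep -E 'uptime|curr_connections|evictions|limit_maxbytes'
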