[00:06:22] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.71, 5.21, 5.01
[00:15:24] PROBLEM Free ram is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: Warning: 9% free memory
[00:20:25] RECOVERY Free ram is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: OK: 21% free memory
[00:37:51] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[00:39:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[00:41:23] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.02, 4.37, 4.98
[00:43:23] PROBLEM Free ram is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: Warning: 17% free memory
[00:55:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[01:03:23] RECOVERY Free ram is now: OK on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: OK: 25% free memory
[01:07:37] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[01:07:37] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 174 processes
[01:12:45] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 96 processes
[04:37:38] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[04:40:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[04:55:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[05:08:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[05:55:52] PROBLEM Total processes is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to popen() failed
[05:55:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:56:22] PROBLEM Disk Space is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:57:22] PROBLEM Current Load is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:00:53] RECOVERY Total processes is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: PROCS OK: 172 processes
[06:00:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[06:01:23] RECOVERY Disk Space is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: DISK OK
[06:02:23] RECOVERY Current Load is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: OK - load average: 0.01, 0.14, 0.24
[06:49:32] PROBLEM Free ram is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: Warning: 17% free memory
[07:04:33] RECOVERY Free ram is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: OK: 20% free memory
[08:25:12] Damianz did you restart wm-bot
[08:25:49] !sal
[08:25:49] https://labsconsole.wikimedia.org/wiki/Server_Admin_Log see it and you will know all you need
[08:38:52] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[08:40:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[08:58:33] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 16% free memory
[09:06:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[09:45:18] hey hashar
[09:45:19] <3 wm-bot
[09:45:27] I wanted to add apaches to beta
[09:45:39] because of performance
[09:45:46] users complain it's slow
[09:46:04] maybe we should use local storage instead of gluster for data
[09:46:11] isn't that how it works on prod
[09:46:31] just woke up :(
[09:46:36] that damn flu is not passing
[09:46:53] so hmm
[09:47:00] that is not apaches fault afaik :-]
[09:47:29] if you like at the code source for a page, you get a HTML comment saying <-- served by XXX in YY seconds -->
[09:47:35] that is usually low number
[09:48:45] oh
[09:49:09] something changed http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page?uncached
[09:49:18] gives me more than 500ms
[09:49:19] grr
[09:50:18] must be super gluster being slow again
[09:57:23] hashar what is that then?
[09:57:31] btw performance of labs is worse than ever
[09:57:35] !ping
[09:57:35] pong
[09:58:31] I think Ryan did some change to gluster this week
[09:59:05] can you be specific when you say "performance of labs"?
[09:59:58] oh yeah Ryan is an European this week
[09:59:59] \O/
[10:00:07] login speed is bad right now. let me check ldap issues
[10:01:47] Ryan_Lane that is what I mean :P
[10:01:49] login to instance takes like 1 minute
[10:01:49] @notify Damianz
[10:01:49] This user is now online in #wikimedia-tech so I will let you know when they show some activity (talk etc)
[10:02:04] ah
[10:02:19] it's on instances that don't yet have gluster homedirs mounted yet
[10:02:26] o.o
[10:02:28] howcome
[10:02:38] because no one accessed them
[10:02:39] projectstorage.pmtpa.wmnet:/bots-home on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
[10:02:44] and they unmounted
[10:02:46] eh
[10:02:49] why it happens
[10:02:54] because it's an automount
[10:02:54] do we need to unmount them ever
[10:03:04] can we make it perma mount
[10:03:50] not unless we want gluster outages to also cause complete instances outages too
[10:03:51] + there is a process of glusterfs eating about 2gb of ram
[10:03:53] can I kill it
[10:04:22] yes
[10:04:23] then restart autofs
[10:04:31] oh wait. no. this is ldap calls
[10:04:41] it eats 4gb of ram now
[10:04:46] it's on bots-bnr1
[10:05:00] 4560mb
[10:06:20] bleh
[10:06:30] fucking upgraded gluster package didn't remove the old one
[10:06:38] the old one had a memory leak
[10:06:44] aha
[10:06:48] so how to fix it
[10:06:51] :/
[10:06:55] I fixed it
[10:06:57] ok
[10:07:00] I'm going to fix it in puppet for the rest
[10:07:27] wow
[10:07:31] weird
[10:07:35] ?
[10:07:46] gifti's process eats 3gb of ram too but free now shows only 400mb used
[10:08:11] swap is 0
[10:08:43] but it's virtual memory so who knows
[10:08:54] I think instances were failed over to virt1000
[10:09:16] login seems to be fast again
[10:09:19] nslcd is such a giant piece of crap
[10:09:44] wait
[10:09:45] wtf now my dns stopped resolving wmflabs :/
[10:09:47] I take it back
[10:09:59] now it does
[10:10:03] ugh
[10:10:10] I forgot to restart pdns
[10:10:59] we need to not use ldap backend for pdns
[10:11:20] :|
[10:11:24] once it can't reach ldap it never tries again
[10:14:20] yeah. login is fast again
[10:14:37] how is ldap client behavior so bad now?
[10:14:50] it was much better with nssldap
[10:19:42] hm. funny enough glusterfs mounting is much faster in this version of glusterfs
[10:25:53] RECOVERY dpkg-check is now: OK on deployment-cache-bits03.pmtpa.wmflabs 10.4.0.51 output: All packages OK
[12:38:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 20% free memory
[12:41:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[12:54:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 13% free memory
[12:56:33] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 16% free memory
[13:08:19] I'm looking for someone with super powers to merge puppet stuff. paravoid, are you there?
[13:08:32] I am
[13:09:21] cool! hi!
[13:09:42] I sent you two small things
[13:10:06] 47726 and 47585
[13:35:22] !g 47726
[13:35:22] https://gerrit.wikimedia.org/r/#q,47726,n,z
[13:35:26] !g 47585
[13:35:26] https://gerrit.wikimedia.org/r/#q,47585,n,z
[13:35:42] ah reviewed already :-)
[14:04:53] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 154 processes
[14:06:32] PROBLEM Free ram is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: Warning: 16% free memory
[14:09:52] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 150 processes
[14:11:33] RECOVERY Free ram is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: OK: 20% free memory
[14:49:32] PROBLEM Free ram is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: Warning: 19% free memory
[14:53:18] hashar ur here?
[14:53:27] what do you think that is making apaches slow then
[14:53:34] gluster
[14:53:34] is it possible to debug it in mediawiki
[14:53:38] why do you think that?
[14:53:52] so if we moved the docroot to local storage it would be better?
[14:54:02] cause 90% of perf issues in labs are related to Gluster hehe
[14:54:08] the rest we can blame LDAP
[14:54:20] ldap can't slow web servers don't say that
[14:54:37] the rest of labs issues
[14:54:38] :-]
[14:54:41] these processes are already running just forking them selve
[14:54:50] no authentication at all
[14:59:19] hashar is there scap on production still
[14:59:23] or whatever
[15:01:09] @notify hashar
[15:01:10] This user is now online in #wikimedia-dev so I will let you know when they show some activity (talk etc)
[15:03:07] hashar bot tell me ur talking in -operations :D
[15:03:21] where are you
[15:03:25] u ignore me
[15:03:26] :'(
[15:03:47] meh
[15:04:30] !ping
[15:04:30] pong
[15:05:25] petan: na
[15:05:33] petan: I am busy in -ops with faidon :-]
[15:05:34] so
[15:05:41] mhm
[15:05:49] to get labs faster, the files should be in the /dev/vdb of each application server
[15:05:54] instead of /data/project
[15:05:58] ok, that's no problem
[15:06:02] it's how it's done on prod
[15:06:03] which also mean we want to use git-deploy to copy the file on the instances
[15:06:11] why we don't do that
[15:06:12] and thus we need to adapt operations/mediawiki-config to support git-deploy
[15:06:21] why we didn't adapt it
[15:06:26] which in turn means we most probably need to do it in production :-]
[15:06:34] why we didn't do it in prod
[15:06:43] the adaptation work has been done in the "new deploy" branch of mediawiki-config
[15:06:53] and in prod I have no idea
[15:06:58] it seems it is stalled for now
[15:08:28] hi hashar, have you looked at https://bugzilla.wikimedia.org/show_bug.cgi?id=42188 ? do you think we could track a prototype version of AFTv5 on beta labs, and get the db updated?
[15:09:35] nop not working on that
[15:11:07] chrismcmahon: the work around is to have the feature in master and enabled via a $wg setting
[15:11:08] hashar: is tracking a prototype branch of an extension possible?
[15:11:14] possibly
[15:11:25] but for now we are fetching mediawiki/extensions.git which only track master
[15:11:31] arerr
[15:11:39] ok, this is something that would be useful for matthiasmullie and me
[15:11:44] but for now we are fetching mediawiki/extensions.git which points to the 'master' version of each extensions.
[15:11:55] and for others in the future
[15:12:06] possibly we could have a "beta" branch there that would track 'master' by default and the 'beta' branch if it exists
[15:12:20] though I have no idea how mediawiki/extensions.git is maintained
[15:22:43] hashar matthiasmullie looks like a simple change to the "branch" var in .gitmodules should do it for beta https://github.com/wikimedia/mediawiki-extensions/blob/master/.gitmodules or am I missing something?
[15:24:22] would that not result in that branch automatically being deployed bi-weekly as well?
[15:24:52] matthiasmullie: I think we should be able to override the global settings for beta, the way we do with the *Settings.php files?
[15:28:15] the wmf branches are pointing to the master versions IIRC
[15:28:37] in wmf branches, the extensions are made submodules of mediawiki/core.git
[15:28:51] whereas beta is using a different repo : mediawiki/extension.git
[15:30:28] that repo is probably updated magically by Gerrit itself
[15:30:42] I don't think we can change the branch properly there
[15:55:20] hashar: if you have any thoughts you could add to that BZ ticket for supporting non-master versions of extensions, could you add them to the ticket?
[15:55:54] chrismcmahon: I did iirc
[16:07:56] damn
[16:39:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[16:44:32] RECOVERY Free ram is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: OK: 20% free memory
[16:45:15] ugh. my internet connection is so terrible
[17:07:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 13% free memory
[17:12:27] Change on mediawiki a page Wikimedia Labs/Stability improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=641781 edit summary: [+320] /* LDAP */
[17:28:49] Ryan_Lane, on the improvement project pages, should we just rm or strikethrough done entries?
[17:29:01] I've been rm'ing them
[17:29:05] okay
[17:29:27] Change on mediawiki a page Wikimedia Labs/Interface usability improvement project was modified, changed by Krenair link https://www.mediawiki.org/w/index.php?diff=641786 edit summary: [-92] /* Notifications */
[17:32:09] Ryan_Lane, maybe we should move https://bugzilla.wikimedia.org/show_bug.cgi?id=44182 to the SMW component if we want the SMW devs to be aware of it...
[17:32:21] ah
[17:32:21] yeah
[17:32:26] I couldn't think of where to put that
[17:33:40] Since the change would have to be in SMW code (and needs someone familiar with the SMW codebase) I think that's best
[17:52:32] PROBLEM Free ram is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: Warning: 19% free memory
[18:13:42] PROBLEM Free ram is now: WARNING on rocsteady-cleanup.pmtpa.wmflabs 10.4.0.206 output: Warning: 19% free memory
[18:17:40] PROBLEM Current Load is now: CRITICAL on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:17:40] PROBLEM Free ram is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:19:00] PROBLEM Free ram is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: Warning: 15% free memory
[18:19:00] PROBLEM dpkg-check is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:20:00] PROBLEM SSH is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CRITICAL - Socket timeout after 10 seconds
[18:20:34] PROBLEM Free ram is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: Warning: 17% free memory
[18:21:03] PROBLEM Free ram is now: WARNING on message-remailer.pmtpa.wmflabs 10.4.0.251 output: Warning: 16% free memory
[18:22:03] PROBLEM SSH is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CRITICAL - Socket timeout after 10 seconds
[18:22:23] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: WARNING - load average: 5.50, 5.74, 5.49
[18:24:33] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 18% free memory
[18:27:33] PROBLEM Current Load is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:27:43] PROBLEM Total processes is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:27:53] PROBLEM Disk Space is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:29:03] PROBLEM Free ram is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:29:33] PROBLEM SSH is now: CRITICAL on message-remailer.pmtpa.wmflabs 10.4.0.251 output: CRITICAL - Socket timeout after 10 seconds
[18:32:52] RECOVERY Disk Space is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: DISK OK
[18:33:02] PROBLEM SSH is now: CRITICAL on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: CRITICAL - Socket timeout after 10 seconds
[18:34:22] RECOVERY Free ram is now: OK on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: OK: 20% free memory
[18:35:42] PROBLEM dpkg-check is now: CRITICAL on message-remailer.pmtpa.wmflabs 10.4.0.251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:35:42] PROBLEM Free ram is now: CRITICAL on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:36:32] PROBLEM SSH is now: CRITICAL on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: CRITICAL - Socket timeout after 10 seconds
[18:38:12] PROBLEM Disk Space is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: DISK WARNING - free space: / 111 MB (5% inode=30%):
[18:38:32] PROBLEM Current Load is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: WARNING - load average: 9.90, 9.55, 6.59
[18:38:52] PROBLEM Current Load is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:38:52] PROBLEM Free ram is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: Warning: 12% free memory
[18:39:02] PROBLEM SSH is now: CRITICAL on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: CRITICAL - Socket timeout after 10 seconds
[18:39:02] PROBLEM dpkg-check is now: CRITICAL on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:39:32] PROBLEM Current Load is now: WARNING on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: WARNING - load average: 9.14, 8.98, 6.27
[18:40:02] PROBLEM dpkg-check is now: CRITICAL on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:40:33] PROBLEM dpkg-check is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:40:43] PROBLEM Free ram is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: Warning: 13% free memory
[18:41:13] PROBLEM Current Load is now: WARNING on message-remailer.pmtpa.wmflabs 10.4.0.251 output: WARNING - load average: 9.55, 9.11, 6.74
[18:41:33] PROBLEM Disk Space is now: WARNING on message-remailer.pmtpa.wmflabs 10.4.0.251 output: DISK WARNING - free space: / 94 MB (4% inode=30%):
[18:43:53] PROBLEM Current Load is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: WARNING - load average: 10.66, 10.16, 7.63
[18:44:33] PROBLEM Current Load is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: WARNING - load average: 8.49, 8.13, 6.23
[18:44:43] PROBLEM Free ram is now: CRITICAL on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:45:52] PROBLEM Disk Space is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:45:52] RECOVERY dpkg-check is now: OK on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: All packages OK
[18:46:32] RECOVERY dpkg-check is now: OK on extrev1.pmtpa.wmflabs 10.4.0.210 output: All packages OK
[18:47:32] PROBLEM Current Load is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: WARNING - load average: 10.48, 11.94, 12.94
[18:47:32] PROBLEM Free ram is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: Warning: 19% free memory
[18:47:32] RECOVERY Total processes is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: PROCS OK: 110 processes
[18:49:02] RECOVERY dpkg-check is now: OK on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: All packages OK
[18:49:32] PROBLEM Free ram is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: Warning: 19% free memory
[18:50:33] RECOVERY dpkg-check is now: OK on grail.pmtpa.wmflabs 10.4.0.239 output: All packages OK
[18:52:53] RECOVERY dpkg-check is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: All packages OK
[18:53:33] PROBLEM Disk Space is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: DISK WARNING - free space: / 106 MB (5% inode=30%):
[18:54:33] PROBLEM Free ram is now: CRITICAL on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:54:43] PROBLEM Current Load is now: CRITICAL on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:58:43] PROBLEM Disk Space is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:59:33] PROBLEM Current Load is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: WARNING - load average: 8.02, 8.21, 7.38
[19:02:34] PROBLEM Current Load is now: CRITICAL on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:03:44] PROBLEM Disk Space is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: DISK WARNING - free space: / 102 MB (5% inode=30%):
[19:11:13] PROBLEM Free ram is now: CRITICAL on message-remailer.pmtpa.wmflabs 10.4.0.251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:15:43] RECOVERY dpkg-check is now: OK on message-remailer.pmtpa.wmflabs 10.4.0.251 output: All packages OK
[19:16:03] PROBLEM Free ram is now: WARNING on message-remailer.pmtpa.wmflabs 10.4.0.251 output: Warning: 13% free memory
[19:19:53] PROBLEM Disk Space is now: WARNING on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: DISK WARNING - free space: / 113 MB (5% inode=30%):
[19:23:23] PROBLEM Disk Space is now: CRITICAL on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:23:43] PROBLEM Current Load is now: CRITICAL on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:24:03] PROBLEM dpkg-check is now: CRITICAL on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:28:34] PROBLEM Current Load is now: WARNING on changefeed-bot.pmtpa.wmflabs 10.4.0.240 output: WARNING - load average: 9.64, 10.68, 10.14
[19:39:02] RECOVERY dpkg-check is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: All packages OK
[19:40:32] RECOVERY dpkg-check is now: OK on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: All packages OK
[19:47:33] PROBLEM Current Load is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: WARNING - load average: 10.17, 10.12, 10.37
[19:49:33] PROBLEM Current Load is now: CRITICAL on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:51:43] PROBLEM Free ram is now: WARNING on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: Warning: 19% free memory
[19:53:33] PROBLEM dpkg-check is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:54:23] PROBLEM Current Load is now: WARNING on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: WARNING - load average: 8.39, 8.16, 7.93
[19:56:53] PROBLEM Free ram is now: CRITICAL on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: CHECK_NRPE: Socket timeout after 10 seconds.
[19:59:53] PROBLEM Disk Space is now: CRITICAL on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: DISK CRITICAL - free space: / 55 MB (2% inode=30%):
[20:01:42] RECOVERY Free ram is now: OK on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: OK: 20% free memory
[20:03:22] @notify Damianz
[20:03:22] You already requested this user to be watched
[20:03:52] PROBLEM Free ram is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: Warning: 12% free memory
[20:05:52] * Damianz stalks petan
[20:06:13] Damianz did u restart wm-bot yesterday?
[20:06:22] someone killed the process twice
[20:06:29] nope
[20:06:37] maybe ryan, he was fixing glusterish stuff
[20:06:38] I have exact time but no SAL log or even a reason why that should be
[20:06:52] what time (gmt)?
[20:06:54] according to logs bot had no problems :/
[20:06:58] labs time
[20:07:03] I don't know what time is on system
[20:07:07] probably GMT
[20:07:07] I wasn't working on gluster today
[20:07:10] >.<
[20:07:19] Ryan_Lane: /yes/terday
[20:07:19] hmm weird
[20:07:40] * Damianz makes no comment on timezones
[20:07:48] lol
[20:07:51] :D
[20:07:51] oh
[20:07:51] heh
[20:08:39] Any WMLabs admins on?
[20:08:41] Sooo cold, hmm
[20:08:49] hi Vacation9 are you still going to work on huggle?
[20:08:56] Vacation9 just ask
[20:09:08] PROBLEM Free ram is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[20:09:11] petan: If you give me something to do :)
[20:09:18] i-00000390.pmtpa.wmflabs : Feb 6 18:25:45 : mwdeploy : 3 incorrect password attempts ; TTY=pts/0 ; PWD=/ ; USER=apache ; COMMAND=/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --list-file=/home/wikipedia/common/wmf-config/extension-list
[20:09:21] Vacation9 yes there is a lot of stuff to do
[20:09:24] I have no idea where that message comes from ryan
[20:09:32] hashar: ...
[20:09:32] the crontabs are empty on 390 instance
[20:09:34] hashar: booo
[20:09:42] hashar: something is running sudo
[20:09:44] (which is deployment-bastion )
[20:09:48] and it's failing
[20:09:51] ahh
[20:09:57] that must be the beta auto updater thus
[20:10:31] Could an admin please create a project for me and Fox Wilson (shell fwilson)? If possible we would like something such as VoxelTools. We would run VoxelBot on here (partly because of the recent Bots cluster instability) and also some web tools we've been developing.
[20:10:53] petan: If you give me something specific I would be glad to help
[20:11:07] Vacation9 yes I were posting that to #huggle few days ago
[20:11:25] petan: Sorry, I'm not on IRC much
[20:12:00] Vacation9 it's needed to fill in all configuration variables to functions which write and read them from disk and to apply localization to all forms
[20:12:15] both is monkey work which needs almost 0 knowledge of programming :P
[20:12:44] petan: Hopefully I'll get to it a bit later :) Can't today or tonight
[20:12:50] mhm
[20:13:14] I asked addshore for that like a week ago and he didn't touch it yet, so no prob
[20:13:29] ?
[20:13:32] lol
[20:13:43] I will quote you two when people ask me why it takes so long for huggle 3 to be released :D
[20:13:43] iv been busy :/
[20:14:07] my plan is to finish my work, then finish my bot then finally make it onto huggle :P
[20:14:27] addshore nobody cares about you being busy neither about your work. Top priority of your entire life is to work on huggle
[20:14:36] xD
[20:14:38] if only
[20:14:44] :)
[20:15:21] But anyway... back to my first question?
[20:15:37] ahh
[20:16:13] PROBLEM Free ram is now: CRITICAL on message-remailer.pmtpa.wmflabs 10.4.0.251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[20:16:25] Ryan_Lane can do that
[20:16:47] or paravoid or andrewbogott
[20:16:55] ok, thanks
[20:17:01] Ryan_Lane: I found out the root cause. Now I need to fix it :-]
[20:17:02] adding another project isn't going to make things more stable
[20:17:10] +1
[20:17:19] Ryan_Lane: We know that, but we would like to add web tools as well
[20:17:29] use the bots and tools projects, then
[20:17:34] why you don't run bot on bots and web tool on web tools
[20:17:48] Web Tools is still in early early beta though
[20:18:03] your project would be in earlier than that
[20:18:06] :P
[20:18:26] petan: But we wouldn't need to worry about other people when making changes :D
[20:18:29] also there is a limited number of free ipv4 public ip's
[20:18:44] RECOVERY Free ram is now: OK on rocsteady-cleanup.pmtpa.wmflabs 10.4.0.206 output: OK: 28% free memory
[20:18:53] what kind of changes you need to make a webtool
[20:19:18] I suppose most of that changes could be useful for all webtools creators
[20:19:19] petan: Apache, I don't think web tools even has it installed yet, or at least publicly accessable
[20:19:25] hmm, I like huggles... not so sure on huggle
[20:19:43] apache is surely a part people on webtools won't blame you for
[20:20:22] I requested to be added to webtools as sysadmin so that I can help setting it up, but I was ignored, so indeed, they are not very useable yet
[20:20:55] it's true that without sysadmin it's hard to get anything working in webtools right now
[20:20:59] petan: Yes, the problem is we don't have root I don't think
[20:21:04] On Webtools
[20:21:12] nope, problem is that you can't create other instances
[20:21:19] everyone has root afaik
[20:21:42] PROBLEM Disk Space is now: CRITICAL on message-remailer.pmtpa.wmflabs 10.4.0.251 output: DISK CRITICAL - free space: / 54 MB (2% inode=30%):
[20:21:43] petan: Oh... I thought I checked and I didn't
[20:21:51] sysadmin doesn't give you root, it just give you ability to manage the project
[20:21:59] Anyway gtg
[20:22:25] so, I am curious about him working on huggle
[20:22:34] @notify Vacation9
[20:22:34] I will notify you, when I see Vacation9 around here
[20:23:52] PROBLEM Free ram is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: Warning: 12% free memory
[20:29:02] PROBLEM Free ram is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[20:29:42] PROBLEM Current Load is now: WARNING on watchlist-bot.pmtpa.wmflabs 10.4.0.229 output: WARNING - load average: 6.62, 6.74, 6.63
[20:31:42] RECOVERY Free ram is now: OK on rt-puppetdev6.pmtpa.wmflabs 10.4.0.24 output: OK: 28% free memory
[20:32:32] PROBLEM Current Load is now: WARNING on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: WARNING - load average: 9.66, 9.81, 9.76
[20:34:52] PROBLEM Disk Space is now: WARNING on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: DISK WARNING - free space: / 79 MB (4% inode=30%):
[20:37:17] @seen Vacation9
[20:37:17] chrismcmahon: Last time I saw Vacation9 they were quitting the network with reason: Quit: work :( N/A at 2/6/2013 8:22:05 PM (00:15:11.7682060 ago)
[20:37:23] RECOVERY SSH is now: OK on spellcheckself-bot.pmtpa.wmflabs 10.4.0.246 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[20:37:29] wow, didn't know we had that, thanks petan
[20:37:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[20:38:01] huh?
[20:38:33] PROBLEM Disk Space is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: DISK WARNING - free space: / 76 MB (3% inode=30%):
[20:38:38] if u mean notify I notified it on wikitech mail :P
[20:43:32] PROBLEM Disk Space is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: DISK CRITICAL - free space: / 56 MB (2% inode=30%):
[20:44:32] PROBLEM Current Load is now: CRITICAL on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: CHECK_NRPE: Socket timeout after 10 seconds.
[20:45:15] omg
[20:45:36] this is exactly what that stupid script should have never done
[20:45:50] here we go
[20:46:13] Ban ALL the users!!1!
[20:46:16] at least I found a bug
[20:46:21] in my client :D
[20:46:24] which I wrote
[20:46:41] funny is I made a check which should have prevent this
[20:46:44] it doesn't work
[20:46:48] What? Impossible. Don't you write bug-free code?
[20:47:02] no I write only bug-full code
[20:47:28] I write completely bug-free code. All the bugs are completely free.
[20:47:42] or just they are undocumented functions
[20:48:02] Heh. The only difference between a bug and a feature is documentation. :-)
[20:48:52] RECOVERY SSH is now: OK on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[20:50:35] * Damianz slaps Coren
[20:50:48] o_O
[20:50:53] That seems unwarranted. :-)
[20:50:57] petan only writes bugs
[20:50:57] ;)
[20:51:06] :D
[20:51:23] sure I don't write programs, I write bugfs
[20:51:25] :D
[20:51:30] I even have bugs in my typing
[20:51:45] I thought your typing was just due to you being europeanish
[20:52:02] yes in europe we say bugfs
[20:52:08] that give
[20:52:14] me a great idea to make a new filesystem
[20:52:18] :D
[20:52:19] bugfs
[20:52:38] it will insert random shit into your files
[20:53:03] great for testing a software that checksum its data for corruption
[20:53:30] Or you could write a filesystem that interfaces with bz so I can cat issues out as files
[20:53:33] :D
[20:53:39] heh
[20:53:52] PROBLEM Current Load is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: CHECK_NRPE: Socket timeout after 10 seconds.
[20:54:20] http://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20pmtpa&h=virt7.pmtpa.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[20:54:22] that should help some
[20:54:30] I would have thought bugfs to be a DAV-like interface into bugzilla or somesuch.
[20:54:39] omfg [20:54:40] p0rn link [20:54:43] ganglia isn't broken [20:54:44] :o [20:55:00] * petan naively clicks [20:55:15] we need to kill all tiny instances [20:55:22] BTW I installed ganglia to my server [20:55:26] one of my servers [20:55:31] Ryan_Lane: YES [20:55:42] and I can't figure out how to check free memory without buffers :/ [20:55:47] with fire [20:55:47] oh [20:55:47] I know how to fix ganglia.wmflabs.org [20:55:47] it's using incorrect ports [20:55:53] We also need to change all hostnames, maybe when you break...errr update glusterfs so everything needs reboobing again [20:56:08] reboobing? [20:56:10] :D [20:56:24] * petan reboobs [20:56:33] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory [20:56:40] Damianz: I already did glusterfs [20:56:41] my boobs are great now [20:56:45] it doesn't require a reboot [20:57:04] give it a month it will need a reboot to fix something [20:57:08] check out some instances. gluster isn't eating stupid amounts of memory [20:57:37] yeah... bots haven't crashed in like a day [20:57:37] :D [20:58:07] Ryan_Lane: It's still slow as hell though :( [20:58:10] gluster's memory leak likely wasn't helping the situation [20:58:12] I think maybe that's ldap being crap though [20:58:18] gluster? [20:58:25] es [20:58:27] *yes [20:58:43] PROBLEM Current Load is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: WARNING - load average: 8.86, 8.85, 8.80 [20:58:45] is it very slow right now? [20:59:16] on virt0: Search: 237177 Avg: 12.234 ms Max: 647 ms >100ms: 26 (0%) >1000ms: 0 (0%) [20:59:16] on virt1000: Search: 90540 Avg: 12.728 ms Max: 656 ms >100ms: 9 (0%) >1000ms: 0 (0%) [20:59:37] ldap is fast right now [20:59:47] * Ryan_Lane has a script to check ldap server speed now ;) [20:59:59] instances on virt7 were slow [21:00:05] which is what I was just tracking down [21:00:15] it still has pretty high iowait [21:00:16] not very slow...
but it's slow [21:00:24] just generally, always slow [21:00:36] I'll probably need to reboot some more instances [21:00:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 13% free memory [21:00:55] describe what you mean by slow [21:01:01] so that I can try to track it down [21:01:58] mounting is slow [21:02:10] though I've not tried today, probably faster now you fixed ldap [21:02:17] things seem relatively fast for me right now, and I'm on a terrible internet connection [21:02:22] oh [21:02:24] mounting is always slow [21:02:34] that said, it's actually faster since the gluster upgrade [21:02:35] that's not what she said [21:02:38] :D [21:03:33] we can make that less obvious by increasing the time it stays mounted [21:03:58] lasting longer is always good [21:03:59] I like bots - it never goes down [21:04:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 11% free memory [21:06:03] rebooting the instance proxy [21:07:32] no fucks were given [21:07:53] also, yay [21:07:59] * Damianz gives hashar a chocolate lolly [21:08:38] * Damianz gets popcorn to watch petan kick Ryan_Lane's ass for not !log'ing his breakage of stuff [21:08:54] :) [21:09:04] !help [21:09:04] !documentation for labs !wm-bot for bot [21:09:12] !wm-bot [21:09:12] http://meta.wikimedia.org/wiki/WM-Bot [21:09:32] @labs-info wmang-proxy [21:09:32] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 17 seconds ago [21:10:03] @labs-info i-0000058e [21:10:03] [Name i-0000058e doesn't exist but resolves to I-0000058e] I-0000058e is Nova Instance with name: mwang-proxy, host: virt7, IP: 10.4.0.243 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: testlabs, size of storage: 30 and with image ID: ubuntu-12.04-precise [21:10:10] Ryan_Lane it may be broken
[21:10:22] or you just mistyped the name :D [21:10:24] !log testlabs rebooted mwang-proxy [21:10:24] Logged the message, Master [21:10:38] heh, yeah. mistype ;) [21:10:40] btw @labs-resolve has search too [21:10:45] @labs-resolve proxy [21:10:45] I don't know this instance - aren't you are looking for: I-000004bc (instance-proxy), I-0000058e (mwang-proxy), [21:11:06] why is it called mwang proxy? [21:11:17] mike wang [21:11:26] Yeah - but why are we proxying him [21:11:26] it may not actually be the instance proxy [21:11:29] I quite liked him in person [21:11:33] well over tcp [21:11:38] he was writing something to puppetize it [21:12:02] @labs-info i-000000e9 [21:12:02] [Name i-000000e9 doesn't exist but resolves to I-000000e9] I-000000e9 is Nova Instance with name: nova-dev3, host: virt7, IP: 10.4.0.65 of type: m1.tiny, with number of CPUs: 1, RAM of this size: 512M, member of project: openstack, size of storage: 0 and with image ID: lucid-server-cloudimg-amd64.img [21:12:24] you know - I should fix nagios snmptt now I fixed the puppet side of things for most the hosts [21:12:24] andrewbogott: do we need that instance? ^^ [21:12:28] forgot about that.... f00d first [21:12:35] it's m1.tiny and I'd like to kill it :) [21:13:03] Ryan_Lane: I wish you could make them bigger without breaking them :( [21:13:10] yes, agreed [21:13:15] I'm going to work on that soon too [21:13:36] I'd like to resize people's instances for them, when I see they chose too small and are swapping [21:14:11] swappin like it's hot,swappin like it's hot [21:14:21] http://www.mediawiki.org/wiki/File:Maitre_du_feu.jpg < interesting image choice for git-gerrit [21:14:22] PROBLEM Current Load is now: WARNING on mw1-21beta-precise.pmtpa.wmflabs 10.4.0.174 output: WARNING - load average: 5.28, 5.10, 5.29 [21:14:24] oh [21:14:31] actually, petan created that one [21:14:41] petan: I'd imagine you don't need nova-dev3, right? [21:14:49] did I?
[21:14:55] nope I don't [21:14:57] cool [21:15:00] deleting [21:15:14] * Damianz imagines Ryan_Lane say that in a Dalek voice [21:15:23] Damianz: did you see the new skin we're working on? https://nova-precise2.pmtpa.wmflabs/wiki/Main_Page [21:15:54] nope - 1min [21:16:38] andrewbogott: also, do we still use nova-ldap1 or nova-ldap2? [21:17:39] you know what [21:17:41] that's like [21:17:42] fucking [21:17:44] 200% [21:17:46] better [21:18:02] though... the header is too tall - center the text with the image [21:18:06] looks padded [21:18:13] yeah, I need to adjust that [21:18:20] Ryan_Lane, hey I got an email from echo on labsconsole finally [21:18:34] Krenair: on a non-wikimedia email address? [21:18:35] "Ryan Lane deleted instance 'nova-dev3' in project [[Nova Resource:Openstack]]" :( [21:18:38] oh [21:18:45] Krenair: were you using that? [21:18:49] No [21:18:52] oh [21:18:54] good :) [21:18:54] heh [21:18:57] also... we need a really big pink unicorn on the frontpage [21:18:58] #justsaying [21:18:59] was worried for a sec [21:19:07] The point is that it doesn't format the email properly [21:19:34] * Ryan_Lane nods [21:21:05] !log nagios rebooting nagios-main [21:21:07] Logged the message, Master [21:21:15] Ryan_Lane: noooooo :( [21:21:18] :D [21:21:21] * Damianz makes ryan fix snmptt [21:21:30] oh [21:21:31] wait [21:21:36] I don't need to reboot this one [21:21:45] !log nagios scratch that. it doesn't need to reboot [21:21:46] Logged the message, Master [21:21:56] lololol [21:22:00] it actually is occasionally writing 3Mb/s [21:22:07] rather than swapping [21:22:28] hmm... not sure what it's writing rofl [21:22:33] \o/ [21:22:34] probably logs somewhere [21:22:37] yeah [21:22:48] iowait is down to something a little more reasonable [21:23:03] is labs capable of running sql queries just yet? [21:23:15] matanya: you mean replicated databases?
[21:23:18] not yet [21:23:22] too bad [21:23:30] this month, though [21:23:36] is the slated target date [21:23:38] Ryan_Lane: I think we should just say screw it and give everyone root on project - open data right? [21:23:41] :D [21:23:44] how can one get those ? [21:23:56] get what? replicated databases? [21:24:05] hm. iowait is still high on virt7 [21:24:09] a result of a needed query [21:24:20] toolserver, currently [21:24:37] you can die before you there something [21:25:42] die hmm [21:25:52] well just wait a few months and use labs [21:25:52] quicker than dying [21:26:25] I think asher should come keep us company and bring us sexy toys like databases [21:30:12] PROBLEM host: pdbhandler-2.pmtpa.wmflabs is DOWN address: 10.4.1.73 CRITICAL - Host Unreachable (10.4.1.73) [21:31:22] RECOVERY host: pdbhandler-2.pmtpa.wmflabs is UP address: 10.4.1.73 PING OK - Packet loss = 0%, RTA = 0.72 ms [21:32:02] RECOVERY SSH is now: OK on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [21:34:32] PROBLEM Free ram is now: CRITICAL on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Connection refused by host [21:48:32] RECOVERY Disk Space is now: OK on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: DISK OK [21:48:42] RECOVERY Current Load is now: OK on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: OK - load average: 0.33, 0.17, 0.06 [23:05:03] * Damianz notes andrewbogott has a weird fetish [23:05:23] Reviewing comment-only patches? Whitespace is even better. [23:05:55] Gonna send you 80 files with whitespace changes to standardise puppet manifests :D [23:06:31] But, really, I'm on the fence about using in-code comments for docs… probably would get more contributions if it used a wiki [23:06:41] But it's so nice to have things in sync...
[23:27:08] ganglia has downed [23:27:13] There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused [23:28:19] devunt: what site? just all ganglia sites? [23:28:25] yes. [23:28:35] http://ganglia.wmflabs.org/latest/ [23:29:31] ok. who runs the ganglia project in Labs? [23:29:48] Ryan_Lane: csteipp ^^ [23:29:52] per https://labsconsole.wikimedia.org/wiki/Nova_Resource:Ganglia [23:31:54] Looks like someone killed gmond or gmetad there... I'm not sure who actually runs that server [23:32:26] ganglia needs some fixing [23:32:29] it's not reporting either [23:32:40] sara smollett was last to work on it [23:32:43] thanks for the report devunt [23:32:54] it's never recovered from a reboot without manual intervention [23:42:24] I started gmetad, so it's back up... very little data though. I don't recall if ganglia stores that if the central service is down.