[00:01:05] RECOVERY Disk Space is now: OK on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: DISK OK [00:01:44] RECOVERY Free ram is now: OK on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: OK: 58% free memory [00:01:54] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [00:09:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [00:32:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [00:39:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [00:41:43] RECOVERY Free ram is now: OK on su-fe2 i-000002e6.pmtpa.wmflabs output: OK: 446% free memory [00:47:43] RECOVERY Free ram is now: OK on aggregator-test1 i-000002bf.pmtpa.wmflabs output: OK: 89% free memory [00:48:46] PROBLEM Total processes is now: CRITICAL on su-be1 i-000002e7.pmtpa.wmflabs output: PROCS CRITICAL: 227 processes [00:48:46] PROBLEM Total processes is now: CRITICAL on su-be2 i-000002e8.pmtpa.wmflabs output: PROCS CRITICAL: 235 processes [00:51:42] RECOVERY Free ram is now: OK on pdbhandler-1 i-0000030e.pmtpa.wmflabs output: OK: 796% free memory [00:52:32] PROBLEM Total processes is now: CRITICAL on aggregator-test1 i-000002bf.pmtpa.wmflabs output: PROCS CRITICAL: 241 processes [00:53:42] PROBLEM Total processes is now: CRITICAL on su-be3 i-000002e9.pmtpa.wmflabs output: PROCS CRITICAL: 235 processes [00:53:42] RECOVERY Total processes is now: OK on sultest1 i-0000032d.pmtpa.wmflabs output: PROCS OK: 76 processes [00:54:22] RECOVERY Current Load is now: OK on sultest1 i-0000032d.pmtpa.wmflabs output: OK - load average: 0.20, 0.38, 0.21 [00:55:33] RECOVERY Current Users is now: OK on sultest1 i-0000032d.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [00:56:13] RECOVERY Disk Space is 
now: OK on sultest1 i-0000032d.pmtpa.wmflabs output: DISK OK [00:56:43] RECOVERY Free ram is now: OK on sultest1 i-0000032d.pmtpa.wmflabs output: OK: 435% free memory [00:56:43] RECOVERY Free ram is now: OK on testing-singer-puppetization i-00000331.pmtpa.wmflabs output: OK: 278% free memory [01:01:42] RECOVERY Free ram is now: OK on tutopuppet i-00000336.pmtpa.wmflabs output: OK: 339% free memory [01:03:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:06:03] PROBLEM Total processes is now: WARNING on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS WARNING: 175 processes [01:06:57] RECOVERY Free ram is now: OK on extrev1 i-00000346.pmtpa.wmflabs output: OK: 653% free memory [01:06:57] RECOVERY Free ram is now: OK on rocsteady-cleanup i-00000349.pmtpa.wmflabs output: OK: 89% free memory [01:09:06] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:11:02] RECOVERY Total processes is now: OK on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS OK: 99 processes [01:11:52] RECOVERY Free ram is now: OK on wlmpuppet i-0000035c.pmtpa.wmflabs output: OK: 266% free memory [01:21:52] RECOVERY Free ram is now: OK on robh-spl i-00000369.pmtpa.wmflabs output: OK: 254% free memory [01:21:52] RECOVERY Free ram is now: OK on maps-osmrails i-00000373.pmtpa.wmflabs output: OK: 992% free memory [01:21:52] RECOVERY Free ram is now: OK on gerrit-db i-0000038b.pmtpa.wmflabs output: OK: 715% free memory [01:22:03] PROBLEM Free ram is now: CRITICAL on jesusaurus-cleanup i-0000038a.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
[01:26:52] RECOVERY Free ram is now: OK on jesusaurus-cleanup i-0000038a.pmtpa.wmflabs output: OK: 66% free memory [01:26:52] RECOVERY Free ram is now: OK on solr-ci i-00000391.pmtpa.wmflabs output: OK: 316% free memory [01:26:52] RECOVERY Free ram is now: OK on maps-osmmapnik i-0000039b.pmtpa.wmflabs output: OK: 1037% free memory [01:31:52] RECOVERY Free ram is now: OK on blamemaps-m1xsmall i-0000039e.pmtpa.wmflabs output: OK: 211% free memory [01:31:52] RECOVERY Free ram is now: OK on mars i-000003a8.pmtpa.wmflabs output: OK: 881% free memory [01:33:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:33:32] RECOVERY Current Users is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [01:39:03] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:02:42] RECOVERY Disk Space is now: OK on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK OK [02:03:05] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:09:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:10:42] PROBLEM Disk Space is now: WARNING on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK WARNING - free space: / 76 MB (5% inode=51%): [02:21:36] PROBLEM Current Users is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS WARNING - 6 users currently logged in [02:33:04] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:36:55] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 19% free memory [02:39:46] RECOVERY Free ram is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: OK: 20% free memory 
[02:40:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:40:52] PROBLEM Free ram is now: WARNING on integration-jenkins i-00000363.pmtpa.wmflabs output: Warning: 15% free memory [02:41:55] RECOVERY Free ram is now: OK on dumps-bot1 i-000003ed.pmtpa.wmflabs output: OK: 20% free memory [02:47:42] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: Warning: 19% free memory [03:03:05] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:04:53] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 19% free memory [03:10:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:33:53] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:40:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:49:54] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 9.30, 8.86, 6.30 [03:51:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 8.69, 7.85, 5.85 [03:55:01] hi Ryan_Lane [03:56:20] howdy [03:56:23] I tried to fix it [03:56:26] didn't work [03:56:36] I can only access it via remote execution [03:56:47] which means I can only do the basics [03:57:30] Ryan_Lane: there's nothing useful left there afais [03:57:37] but labsconsole doesn't work for me now [03:57:42] ah ok [03:57:42] no? [03:57:45] what issue? [03:57:57] ah [03:57:57] it says "No Nova credentials found for your account." 
but I'm logged in [03:57:57] no creds [03:58:13] log out and back in [03:58:28] I think something is leaking memory and dropping sessions from memcache [03:59:28] are we expecting every user has nova creds? [03:59:39] or maybe I have my timing screwed up between mediawiki and keystone [03:59:44] every nova use requires it [04:00:00] I mean, does every wiki user have one? [04:00:04] yeah [04:00:12] but you won't see that error unless you use a nova interface [04:00:12] RECOVERY Disk Space is now: OK on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK OK [04:00:22] if so, maybe we can auto log users out of wiki if we find 'no creds' on him [04:00:38] yeah, probably should [04:01:00] realistically it shouldn't happen [04:01:09] keystone creds shouldn't expire before the mediawiki token [04:01:34] eventually I'll just switch to keystone auth from ldap [04:01:42] but that's going to take some work [04:02:20] Ryan_Lane: I clicked delete on it but it's still listed in Special:NovaInstance? [04:02:31] yeah. takes a sec to go away [04:02:53] it makes a request to nova, nova eventually deletes it [04:03:03] it'll still show it as active until it's actually deleted [04:03:20] ok there's another issue in Special:NovaAddress [04:03:22] oh? [04:03:27] (Remove host name) doesn't work [04:03:28] (that isn't surprising :)) [04:03:35] yeah, the dns code is full of bugs [04:03:37] it says "The requested host does not exist. " [04:03:40] which one? [04:04:05] proxy.wikipedia.wmflabs.org on 208.80.153.147 [04:04:05] which project is this again? [04:04:12] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:04:17] no need to delete the hostname [04:04:23] you can add it on the new instance [04:04:31] it's a floating IP [04:04:34] the host is added to the IP [04:04:54] I know it but is an unused one wasting resource? 
[04:05:03] or prevent someone else from using it [04:05:15] do you plan on using it? [04:05:19] on a new instance? [04:05:29] if so, no need to let it go [04:06:29] if you aren't going to use it, I can clean it up manually [04:06:29] not really, or just *maybe* used in some way but allocating one seems easy [04:06:52] * Ryan_Lane nods [04:06:52] ok. lemme clean up the hostname [04:07:30] maybe you'd better just fix the removal code [04:07:41] so, I've been avoiding that [04:07:51] because we're going to completely replace the dns code [04:07:51] with a service [04:07:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4 i-000004d7.pmtpa.wmflabs output: WARNING - load average: 5.50, 5.69, 5.17 [04:08:12] PROBLEM Disk Space is now: WARNING on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK WARNING - free space: / 78 MB (5% inode=51%): [04:08:36] Ryan_Lane: that's ok [04:08:50] lots of other bugs to fix. until then I can just clean manually [04:08:54] it's not too often I need to do so [04:09:46] cleaned it [04:09:53] you should be able to deallocate now [04:11:03] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:13:34] Ryan_Lane: you cleaned everything on that ip... [04:17:52] RECOVERY Current Load is now: OK on parsoid-roundtrip4 i-000004d7.pmtpa.wmflabs output: OK - load average: 2.83, 3.39, 4.30 [04:18:50] Ryan_Lane: and see the complaint in labs-l [04:26:56] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 3.05, 3.15, 4.64 [04:27:40] wait, was I not supposed to do that? 
[04:27:52] liangent: I'm confused [04:28:48] I thought the service wasn't working et [04:28:48] *yet [04:29:54] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 3.50, 3.38, 4.98 [04:34:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:41:12] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs CRITICAL - Plugin timed out after 10 seconds [05:02:25] Ryan_Lane: only proxy.wikipedia.wmflabs.org ... the other two should be left [05:02:44] and the proxy is working [05:04:16] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:04:33] Ryan_Lane: btw I can't see its public ip with `ifconfig`? though the ip is working [05:11:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:29:44] PROBLEM Free ram is now: WARNING on aggregator2 i-000002c0.pmtpa.wmflabs output: Warning: 7% free memory [05:30:33] RECOVERY Current Users is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [05:30:47] liangent: ah [05:30:54] liangent: didn't know the others should be there [05:30:57] liangent: I'll put it back [05:31:02] the IP is NAT [05:31:08] so you won't see it [05:31:23] RECOVERY Current Load is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: OK - load average: 0.51, 0.57, 0.59 [05:31:37] Ryan_Lane: I already added them back [05:31:40] oh [05:31:41] cool [05:31:53] RECOVERY SSH is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [05:31:53] RECOVERY Disk Space is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: DISK OK [05:32:53] RECOVERY dpkg-check is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: All packages OK [05:33:33] PROBLEM 
Total processes is now: UNKNOWN on aggregator2 i-000002c0.pmtpa.wmflabs output: NRPE: Call to fork() failed [05:34:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:34:43] PROBLEM Free ram is now: UNKNOWN on aggregator2 i-000002c0.pmtpa.wmflabs output: NRPE: Call to fork() failed [05:38:32] PROBLEM Current Users is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:38:32] PROBLEM Total processes is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:39:22] PROBLEM Current Load is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:39:42] PROBLEM Free ram is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:39:52] PROBLEM SSH is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: Server answer: [05:39:52] PROBLEM Disk Space is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:40:53] PROBLEM dpkg-check is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[05:41:56] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:46:12] PROBLEM Disk Space is now: CRITICAL on sube i-000003d0.pmtpa.wmflabs output: DISK CRITICAL - free space: / 11 MB (0% inode=40%): [05:51:16] PROBLEM Disk Space is now: WARNING on sube i-000003d0.pmtpa.wmflabs output: DISK WARNING - free space: / 41 MB (3% inode=40%): [05:52:38] !htmllogs [05:52:41] experimental: http://bots.wmflabs.org/~wm-bot/html/%23wikimedia-labs [06:04:27] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:12:36] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:34:12] PROBLEM dpkg-check is now: CRITICAL on nova-ldap2 i-00000238.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages [06:34:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:38:13] RECOVERY Disk Space is now: OK on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK OK [06:39:13] RECOVERY dpkg-check is now: OK on nova-ldap2 i-00000238.pmtpa.wmflabs output: All packages OK [06:43:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:47:23] PROBLEM dpkg-check is now: CRITICAL on micro-design i-000003e8.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages [06:52:45] RECOVERY dpkg-check is now: OK on micro-design i-000003e8.pmtpa.wmflabs output: All packages OK [06:59:58] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - normal) [Bug 37080] upload, thumbnails and transcoding on beta (tracking) - https://bugzilla.wikimedia.org/show_bug.cgi?id=37080 [06:59:59] [bz] (UNCONFIRMED - created by: Damian Z, priority: Unprioritized - minor) [Bug 38792] Thumbnails are broken - 
https://bugzilla.wikimedia.org/show_bug.cgi?id=38792 [07:02:52] PROBLEM Free ram is now: WARNING on changefeed-bot i-0000041b.pmtpa.wmflabs output: Warning: 15% free memory [07:04:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:07:53] RECOVERY Free ram is now: OK on changefeed-bot i-0000041b.pmtpa.wmflabs output: OK: 20% free memory [07:13:31] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:34:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:44:06] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:51:32] RECOVERY Current Users is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [08:04:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:11:13] PROBLEM Disk Space is now: WARNING on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK WARNING - free space: / 78 MB (5% inode=51%): [08:14:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:32:52] PROBLEM Free ram is now: UNKNOWN on changefeed-bot i-0000041b.pmtpa.wmflabs output: Invalid host name i-0000041b.pmtpa.wmflabs [08:34:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:36:12] RECOVERY Disk Space is now: OK on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK OK [08:45:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:59:15] PROBLEM Disk Space is now: WARNING on conventionextension-trial 
i-000003bf.pmtpa.wmflabs output: DISK WARNING - free space: / 78 MB (5% inode=51%): [09:02:19] Change on mediawiki a page Developer access was modified, changed by Greeenjohnny link https://www.mediawiki.org/w/index.php?diff=597421 edit summary: [09:03:51] Change on mediawiki a page Developer access was modified, changed by Greeenjohnny link https://www.mediawiki.org/w/index.php?diff=597423 edit summary: /* User:Greeenjohnny */ [09:04:23] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:07:30] Change on mediawiki a page Developer access was modified, changed by Greeenjohnny link https://www.mediawiki.org/w/index.php?diff=597425 edit summary: /* User:Greeenjohnny */ [09:15:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:20:52] RECOVERY Free ram is now: OK on integration-jenkins i-00000363.pmtpa.wmflabs output: OK: 118% free memory [09:34:24] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:46:14] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:04:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:11:06] 10/25/2012 - 10:11:05 - User .docs has been renamed, moving home directory in project(s): wikidata-dev [10:13:35] Maybe I should open a puppet questions channel... [10:14:23] After pulling changes from gerrit I keep getting error messages I didn't see before. [10:15:21] puppet now apparently wants to access the private part that I don't have... 
[10:15:26] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse for environment production: No file(s) found for import of '../private/manifests/passwords.pp' at /etc/puppet/manifests/base.pp:10 on node wikidata-dev-4.pmtpa.wmflabs [10:16:08] Though I do not include base.pp anywhere... [10:16:09] 10/25/2012 - 10:16:08 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:16:46] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:17:18] Any ideas? [10:21:01] 10/25/2012 - 10:21:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:26:02] 10/25/2012 - 10:26:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:30:21] Well, actually the private files *are* there. [10:31:02] 10/25/2012 - 10:30:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:34:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:36:14] 10/25/2012 - 10:36:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:41:14] 10/25/2012 - 10:41:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:46:12] 10/25/2012 - 10:46:12 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:46:54] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:51:07] 10/25/2012 - 10:51:06 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:56:06] 10/25/2012 - 10:56:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [10:56:13] hashar Jan_Luca Hi! 
Question: What happens to ownership and/or file permissions in a puppetmaster::self when I execute "sudo GIT_SSH=/var/lib/git/ssh git pull --rebase" for the first time? Something seems to have changed since yesterday. [10:57:23] Silke_WMDE: files under /var/lib/git/operations/puppet are fetched by puppet and should belong to root:root [10:57:44] so `GIT_SSH=/var/lib/git/ssh git pull --rebase` should probably be run as root too [10:57:52] PROBLEM Free ram is now: WARNING on changefeed-bot i-0000041b.pmtpa.wmflabs output: Warning: 19% free memory [10:57:59] something like sudo 'GIT_SSH=/var/lib/git/ssh git pull --rebase' [10:58:33] ah. I did "sudo" as smeyer. Now files belong to smeyer:root [10:59:15] so maybe use sudo -s first [10:59:24] and then run git [11:01:01] 10/25/2012 - 11:01:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:01:02] Silke_WMDE: When you run "sudo -s" you get a root shell where you can run your commands as root [11:01:43] :) I know. [11:02:13] I'll try to chown the directory [11:03:01] But when using sudo -s new files should be root:root [11:04:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:06:04] 10/25/2012 - 11:06:03 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:06:38] Jan_Luca: In the documentation for puppetmaster::self, it says "sudo GIT..." so I switched explicitly to smeyer and then ran sudo. 
[11:06:48] chown root:root doesn't help [11:07:04] It still says "err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse for environment production: No file(s) found for import of '../private/manifests/passwords.pp' at /etc/puppet/manifests/base.pp:10 on node wikidata-dev-4.pmtpa.wmflabs" [11:07:08] chown -R root:root should work [11:07:47] it doesn't [11:08:16] ahhh [11:08:24] Silke_WMDE: that is a different issue I guess [11:08:33] Silke_WMDE: you need to checkout the labs/private repository as well I guess [11:08:38] did you check that there is a symlink /etc/puppet/private -> private-copy [11:08:55] hashar: puppetmaster::self should do this [11:09:02] ohh [11:09:51] the symlink exists (/etc/puppet/private -> /var/lib/git/labs/private) and there are files in the private repo [11:10:17] also that passwords.pp from the error message is there. [11:11:01] 10/25/2012 - 11:11:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:12:44] passwords.pp is now owned by root:root and has 644 permissions. 
[11:16:02] 10/25/2012 - 11:15:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:17:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:21:01] 10/25/2012 - 11:21:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:26:06] 10/25/2012 - 11:26:06 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:31:03] 10/25/2012 - 11:31:03 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:34:44] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:36:02] 10/25/2012 - 11:36:02 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:41:00] 10/25/2012 - 11:40:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:46:00] 10/25/2012 - 11:45:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:47:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:48:52] PROBLEM Current Load is now: CRITICAL on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: Connection refused by host [11:49:32] PROBLEM Current Users is now: CRITICAL on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: Connection refused by host [11:50:13] PROBLEM Disk Space is now: CRITICAL on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: Connection refused by host [11:51:04] PROBLEM Free ram is now: CRITICAL on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: Connection refused by host [11:51:08] 10/25/2012 - 11:51:08 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:52:23] PROBLEM Total processes is now: CRITICAL on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: Connection refused by host [11:52:53] PROBLEM dpkg-check is now: CRITICAL on 
wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: Connection refused by host [11:56:02] RECOVERY Free ram is now: OK on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: OK: 493% free memory [11:56:09] 10/25/2012 - 11:56:09 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [11:57:22] RECOVERY Total processes is now: OK on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: PROCS OK: 93 processes [11:57:52] RECOVERY dpkg-check is now: OK on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: All packages OK [11:58:52] RECOVERY Current Load is now: OK on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: OK - load average: 1.39, 0.55, 0.20 [11:59:32] RECOVERY Current Users is now: OK on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: USERS OK - 2 users currently logged in [12:00:13] RECOVERY Disk Space is now: OK on wikidata-dev-6 i-000004e1.pmtpa.wmflabs output: DISK OK [12:01:09] 10/25/2012 - 12:01:09 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:04:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:06:08] 10/25/2012 - 12:06:08 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:11:03] 10/25/2012 - 12:11:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:16:04] 10/25/2012 - 12:16:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:17:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:21:08] 10/25/2012 - 12:21:08 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:26:10] 10/25/2012 - 12:26:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:31:09] 10/25/2012 - 12:31:08 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:34:43] PROBLEM host: 
i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:34:52] hashar: The solution was: /var/lib/git/labs should be owned by root:puppet. :) [12:34:53] PROBLEM host: i-000004e2.pmtpa.wmflabs is DOWN address: i-000004e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004e2.pmtpa.wmflabs) [12:36:02] 10/25/2012 - 12:36:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:36:58] Silke_WMDE_: ohhh [12:37:40] Silke_WMDE_: puppet group ;) [12:37:45] Silke_WMDE_: well done [12:37:48] yep [12:38:08] Abraham_WMDE1: I just installed a Wikidata repo via puppet [12:38:30] Silke_WMDE_: awesome, thx [12:38:50] Silke_WMDE_: This seems to be a dangerous thing ;-) [12:40:02] RECOVERY host: i-000004e2.pmtpa.wmflabs is UP address: i-000004e2.pmtpa.wmflabs PING OK - Packet loss = 0%, RTA = 505.21 ms [12:41:15] 10/25/2012 - 12:41:14 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:43:53] PROBLEM Current Load is now: CRITICAL on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: Connection refused by host [12:44:33] PROBLEM Current Users is now: CRITICAL on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: Connection refused by host [12:45:13] PROBLEM Disk Space is now: CRITICAL on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: Connection refused by host [12:45:53] PROBLEM Free ram is now: CRITICAL on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: Connection refused by host [12:46:04] 10/25/2012 - 12:46:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:47:23] PROBLEM Total processes is now: CRITICAL on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: Connection refused by host [12:47:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:49:33] RECOVERY Current Users is now: OK on integration-jobbuilder 
i-000004e3.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [12:50:12] RECOVERY Disk Space is now: OK on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: DISK OK [12:50:52] RECOVERY Free ram is now: OK on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: OK: 550% free memory [12:51:03] 10/25/2012 - 12:50:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [12:52:22] RECOVERY Total processes is now: OK on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: PROCS OK: 84 processes [12:53:52] RECOVERY Current Load is now: OK on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: OK - load average: 0.17, 0.50, 0.46 [12:56:12] 10/25/2012 - 12:56:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:01:16] 10/25/2012 - 13:01:16 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:04:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:05:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [13:05:57] 10/25/2012 - 13:05:57 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:10:36] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 103 processes [13:11:05] 10/25/2012 - 13:11:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:16:07] 10/25/2012 - 13:16:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:18:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs CRITICAL - Plugin timed out after 10 seconds [13:21:07] 10/25/2012 - 13:21:07 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:26:13] 10/25/2012 - 13:26:13 - User .lib has been renamed, moving home directory in project(s): 
wikidata-dev [13:31:15] 10/25/2012 - 13:31:15 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:34:47] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:36:05] 10/25/2012 - 13:36:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:41:05] 10/25/2012 - 13:41:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:46:05] 10/25/2012 - 13:46:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:48:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:51:01] 10/25/2012 - 13:51:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [13:56:02] 10/25/2012 - 13:56:02 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:01:06] 10/25/2012 - 14:01:06 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:04:44] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:06:10] 10/25/2012 - 14:06:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:11:06] 10/25/2012 - 14:11:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:12:52] RECOVERY Free ram is now: OK on changefeed-bot i-0000041b.pmtpa.wmflabs output: OK: 20% free memory [14:16:15] 10/25/2012 - 14:16:15 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:18:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:21:09] 10/25/2012 - 14:21:09 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:25:52] PROBLEM Free ram is now: WARNING on changefeed-bot 
i-0000041b.pmtpa.wmflabs output: Warning: 19% free memory [14:26:00] 10/25/2012 - 14:26:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:26:12] PROBLEM Disk Space is now: CRITICAL on sube i-000003d0.pmtpa.wmflabs output: DISK CRITICAL - free space: / 27 MB (2% inode=40%): [14:31:03] 10/25/2012 - 14:31:03 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:31:13] PROBLEM Disk Space is now: WARNING on sube i-000003d0.pmtpa.wmflabs output: DISK WARNING - free space: / 41 MB (3% inode=40%): [14:35:12] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:36:02] 10/25/2012 - 14:36:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:41:07] 10/25/2012 - 14:41:07 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:46:01] 10/25/2012 - 14:46:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:48:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:51:12] 10/25/2012 - 14:51:09 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [14:56:04] 10/25/2012 - 14:56:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:01:05] 10/25/2012 - 15:01:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:04:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [15:06:05] 10/25/2012 - 15:06:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:06:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:11:09] 10/25/2012 - 15:11:07 - User .lib has been renamed, moving home directory in 
project(s): wikidata-dev [15:16:05] 10/25/2012 - 15:16:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:19:03] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:21:18] 10/25/2012 - 15:21:17 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:24:21] Change on 12mediawiki a page Developer access was modified, changed by Matthewrbowker link https://www.mediawiki.org/w/index.php?diff=597518 edit summary: [15:25:59] 10/25/2012 - 15:25:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:29:32] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 103 processes [15:31:12] 10/25/2012 - 15:31:12 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:36:15] 10/25/2012 - 15:36:14 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:37:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:41:06] 10/25/2012 - 15:41:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:46:11] 10/25/2012 - 15:46:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:49:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:51:05] 10/25/2012 - 15:51:03 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [15:56:10] 10/25/2012 - 15:56:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:01:11] 10/25/2012 - 16:01:11 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:06:01] 10/25/2012 - 16:06:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:08:05] PROBLEM host: 
i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [16:09:33] PROBLEM Current Users is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS WARNING - 6 users currently logged in [16:11:03] 10/25/2012 - 16:11:03 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:15:53] RECOVERY Free ram is now: OK on changefeed-bot i-0000041b.pmtpa.wmflabs output: OK: 20% free memory [16:16:03] 10/25/2012 - 16:16:02 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:19:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:21:02] 10/25/2012 - 16:21:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:23:55] ^demon: Do you know why Gerrit is down? [16:24:06] <^demon> Announced downtime. [16:24:15] <^demon> We're upgrading the server. It's almost done. [16:24:52] Ok, it seems that I overseen this mail... [16:25:57] <^demon> Jan_Luca: Back up now. [16:26:09] 10/25/2012 - 16:26:07 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:29:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 6.04, 6.31, 5.52 [16:31:00] 10/25/2012 - 16:31:00 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:31:41] <^demon> Ryan_Lane: manganese & formey are both on precise now. 
[16:33:53] PROBLEM Free ram is now: WARNING on changefeed-bot i-0000041b.pmtpa.wmflabs output: Warning: 19% free memory [16:36:19] 10/25/2012 - 16:36:18 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:37:07] ^demon: \o/ [16:38:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [16:39:32] RECOVERY Current Users is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [16:41:08] 10/25/2012 - 16:41:07 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:46:11] 10/25/2012 - 16:46:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:46:42] ^demon: Thank you [16:46:54] <^demon> you're welcome. [16:47:32] PROBLEM Current Users is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS WARNING - 6 users currently logged in [16:50:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:51:03] 10/25/2012 - 16:50:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [16:53:14] andrewbogott: Could you review my change? [16:53:34] Sure, just a moment... [16:55:18] When this is OK I want to refactor the website-classes [16:56:10] 10/25/2012 - 16:56:06 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [17:01:10] 10/25/2012 - 17:01:09 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [17:04:52] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 7.25, 3.01, 1.12 [17:06:02] 10/25/2012 - 17:06:02 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [17:06:49] I need a new labs project created [17:06:59] who do I talk to about that? 
[17:08:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:09:34] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [17:11:03] 10/25/2012 - 17:11:02 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [17:14:16] preilly: andrewbogott [17:14:40] Jan_Luca: thanks [17:14:50] andrewbogott: ping [17:14:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: WARNING - load average: 12.46, 11.73, 7.24 [17:14:59] prielly: yo! [17:15:05] andrewbogott: I need a new labs project created [17:15:07] Do you have a labs login already? [17:15:12] andrewbogott: yes [17:15:18] And, what do you want the project called? [17:15:21] username 'preilly'? [17:15:39] andrewbogott: yes preilly [17:15:49] andrewbogott: I want the project named performance [17:15:55] ok. Stay tuned... [17:16:03] 10/25/2012 - 17:16:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [17:16:52] preilly: Need a public ip? [17:17:03] andrewbogott: yes [17:17:07] 'k [17:17:17] andrewbogott: thanks for your help [17:17:52] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 10.44, 10.02, 6.78 [17:19:39] preilly: OK, you should be all set. The GUI for setting up DNS is pretty straightforward but let me know if you need guidance. 
[17:19:53] andrewbogott: I know how to do the DNS [17:19:57] andrewbogott: thanks [17:20:05] np [17:20:26] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:21:02] 10/25/2012 - 17:20:59 - Creating a project directory for performance [17:21:02] 10/25/2012 - 17:20:59 - Created a home directory for preilly in project(s): performance [17:22:46] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 8.86, 8.44, 6.26 [17:26:01] 10/25/2012 - 17:25:59 - User preilly may have been modified in LDAP or locally, updating key in project(s): performance [17:31:08] 10/25/2012 - 17:31:08 - User asher may have been modified in LDAP or locally, updating key in project(s): performance [17:31:47] andrewbogott: there's a bug with project creation [17:32:07] andrewbogott: if you add users during creation it'll fail to add your own user to the project [17:32:14] Oh yeah, I have to twiddle the member list don't I? [17:32:14] which causes security group creation to fail [17:32:29] yeah [17:32:37] also need to manually add the security rules [17:32:48] preilly -- hang on, I'll clean that up. [17:32:59] andrewbogott: okay, thanks! 
[17:33:04] until the bug is fixed, it's best to create the project, then add people [17:33:19] I tried tracking it down last week, but didn't see the problem [17:33:34] I must have changed that code at some point and broke it, but have no clue how [17:33:49] it's kind of stupid right now, and it may have memcache issues [17:34:04] it should aggregate all users and add them immediately, but it adds them one at a time [17:34:08] same with roles [17:34:32] PROBLEM host: i-000004e4.pmtpa.wmflabs is DOWN address: i-000004e4.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:34:39] so, it probably pulls the entry from memcache, or memory, and overwrites what's there with an incorrect list [17:34:51] Ryan_Lane: Actually, looks like it did add me. No security rules though. [17:35:05] Hm… no, it added me but not as sysadmin [17:35:06] yeah [17:35:09] it's stupid :) [17:35:24] no, weirder still, I'm a sysadmin but not a member...! [17:35:29] yep [17:35:33] that's the problem [17:35:52] roles seem to work properly [17:35:57] Pretty sure this is the third time I've learned my way around this bug [17:36:02] but not membership [17:36:03] 10/25/2012 - 17:36:03 - User khorn may have been modified in LDAP or locally, updating key in project(s): performance [17:38:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:38:41] What's up with: https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=performance&instanceid=d2d245d1-de74-4bb5-a637-c8f9d22837b6&region=pmtpa [17:38:57] Oct 25 17:37:57 timing nslcd[13262]: [22fbb7] error writing to client: Broken pipe [17:38:57] Oct 25 17:37:57 timing nslcd[13262]: [934699] error writing to client: Broken pipe [17:39:33] preilly: Not sure, but until a minute ago new instances were fully firewalled off from the outside world. So might be best to delete that instance and start over. 
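The membership bug discussed between 17:33 and 17:36 can be illustrated with a minimal sketch. It assumes, as is only guessed in the log, that each single-user add starts from a stale cached member list and overwrites the stored one. The `Directory` class, its method names, and the user names passed in are hypothetical and are not the actual labsconsole code.

```python
class Directory:
    """Toy stand-in for a project store with a read cache (hypothetical)."""

    def __init__(self):
        self.stored = {}  # authoritative member lists, keyed by project
        self.cache = {}   # possibly stale copies of those lists

    def cached_members(self, project):
        # Serve from the cache if present; otherwise cache what is stored.
        if project not in self.cache:
            self.cache[project] = list(self.stored.get(project, []))
        return list(self.cache[project])

    def add_one_at_a_time(self, project, users):
        # Suspected buggy pattern: every add starts from the cached list,
        # which is never refreshed after a write, so earlier adds are lost.
        for user in users:
            members = self.cached_members(project)
            members.append(user)
            self.stored[project] = members

    def add_all_at_once(self, project, users):
        # Suggested pattern: aggregate first, write once, refresh the cache.
        members = list(self.stored.get(project, []))
        members.extend(u for u in users if u not in members)
        self.stored[project] = members
        self.cache[project] = list(members)
```

With the one-at-a-time path, adding a creator followed by a second user leaves only the second user stored, which is one way a creator could end up missing from the member list. The aggregated path keeps both.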
[17:39:35] ssh should be open now [17:40:15] andrewbogott: seriously I've got to rebuild it? [17:40:22] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [17:40:29] maybe or maybe not :) Might be if you give it a few minute and/or reboot it it'll recover. [17:41:02] 10/25/2012 - 17:41:01 - Created a home directory for andrew in project(s): performance [17:41:35] andrewbogott: I just deleted it and created a new one [17:41:45] that should do it. [17:41:48] andrewbogott: let's see if it works this time around [17:42:08] Why is labs so buggy? [17:42:17] Is that a rhetorical question? [17:42:38] andrewbogott: well it just seems that anytime I try to use it there is always some sort of issue [17:42:51] well, in this case the project was missing all of its security groups [17:42:59] nothing will work in that situation [17:43:34] Alternate answers: 1) Because it's a new thing doing new things 2) because what it does is a fucking miracle and you should be impressed when it ever works at all :) [17:43:39] heh [17:43:57] 3) our other ops person was pulled to work on swift [17:44:38] hmm [17:44:46] Okay [17:45:10] 4) I'm pulled to work on toolserver migration [17:45:28] That's bullshit [17:45:56] well, it's in our roadmap. kind of makes things hard that we lost someone at the same time [17:46:06] 10/25/2012 - 17:46:05 - User andrew may have been modified in LDAP or locally, updating key in project(s): performance [17:48:53] PROBLEM Current Load is now: CRITICAL on timing i-000004e5.pmtpa.wmflabs output: Connection refused by host [17:49:33] PROBLEM Current Users is now: CRITICAL on timing i-000004e5.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[17:50:44] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:51:17] 10/25/2012 - 17:51:17 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [17:53:52] RECOVERY Current Load is now: OK on timing i-000004e5.pmtpa.wmflabs output: OK - load average: 0.04, 0.54, 0.47 [17:54:36] RECOVERY Current Users is now: OK on timing i-000004e5.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [17:55:42] yeah [17:56:01] 10/25/2012 - 17:56:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:01:05] thankfully I have ideas on how to fix the issue of needing to delete/recreate on creation failures [18:01:17] for most of them [18:01:17] 10/25/2012 - 18:01:14 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:01:42] use salt to do the initial bootstrapping, then if something fails we can make it re-try via remote execution [18:05:57] 10/25/2012 - 18:05:57 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:08:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [18:10:25] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [18:11:14] 10/25/2012 - 18:11:07 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:13:41] aude: Hi [18:14:01] Have you had a chance to look at things in the repository? 
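The idea floated at 18:01:42, retrying a failed bootstrap step instead of deleting and recreating the instance, can be sketched in general terms. This does not use any Salt API. `run_step` is a hypothetical callable standing in for whatever remote execution would trigger, and failure is assumed to surface as `RuntimeError`.

```python
def bootstrap_with_retry(run_step, attempts=3):
    """Call run_step() until it succeeds or the attempts run out.

    Returns the number of attempts used. Re-raises the last error if
    every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            run_step()
            return attempt
        except RuntimeError as err:
            # Remember the failure and try again on the next iteration.
            last_error = err
    raise last_error
```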
[18:16:08] 10/25/2012 - 18:16:08 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:21:05] 10/25/2012 - 18:21:04 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:21:18] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [18:26:00] 10/25/2012 - 18:25:59 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:31:10] 10/25/2012 - 18:31:10 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:36:04] 10/25/2012 - 18:36:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:38:52] PROBLEM Current Load is now: CRITICAL on testing-pamhome i-000004e6.pmtpa.wmflabs output: Connection refused by host [18:39:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [18:40:23] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [18:41:06] 10/25/2012 - 18:41:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:43:53] RECOVERY Current Load is now: OK on testing-pamhome i-000004e6.pmtpa.wmflabs output: OK - load average: 0.02, 0.46, 0.45 [18:46:01] 10/25/2012 - 18:46:01 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:51:04] apmon_: i've only looked and not tried them yet on a new instance [18:51:07] 10/25/2012 - 18:51:06 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:51:16] it all looks good though [18:51:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [18:52:27] If you try them on a new instance, there are a couple of things you need to do that aren't puppetized. 
[18:53:21] You need to create /tmp/home in order to create the osm user, as one can't create home directories under /home. [18:53:58] apmon_: right [18:54:04] You need to import my gpg key, for the signed packages to be valid. And I think there was one or two other things, but I forgot [18:54:19] ok [18:54:49] Things, that would likely not be the same on a "real system", so I didn't include it in the puppet scripts [18:55:08] sure [18:55:37] Oh, yes, you also actually need to push the scripts into the puppet directory and add them to the site.pp. [18:56:04] i think i'll get a new instance, as the one i have is buggy i think [18:56:05] 10/25/2012 - 18:56:05 - User .lib has been renamed, moving home directory in project(s): wikidata-dev [18:56:25] Yes, during testing, I have created a bunch of new instances tried it out and then deleted them again. [18:56:40] ok [18:56:48] Otherwise it is hard to make sure things are in a "verbatim" state [18:56:51] sure [18:57:18] i was having trouble with puppet and wasn't sure it was me or the instance (both probably) that weren't right [18:57:33] paravoid: hm. any idea where I should stick pam_mkhomedir? [18:57:35] common-session [18:57:35] ? [18:57:40] i'm trying puppet on linode and it seems to work there [18:59:20] sticking it there seems to work [18:59:29] You will also need the "misc::labsdebrepo" and "puppetmaster::self" classes in the instances puppet config [18:59:35] as long as I don't use non-interactive, it should be fine [18:59:50] I think common-account [18:59:57] but they two are very similar [19:00:04] it's session [19:00:05] not account [19:00:20] aude: What is linode? [19:00:34] session required pam_mkhomedir.so umask=0077 [19:01:19] apmon_: another VM hosting [19:01:25] Ah, OK [19:01:28] apmon_: so, technically you can create a directory under /home now [19:01:33] comes with IPs, etc. 
:) [19:01:39] though it's not recommended [19:01:49] Ryan_Lane: ok, but only technically [19:01:49] service accounts shouldn't use /home [19:02:21] they should use /var/lib/ [19:02:30] /home is only meant for interactive users [19:02:36] hmmm, ok [19:02:39] OK. I can change that [19:02:51] system and service accounts should always live in /var/lib [19:02:51] is that how wikimedia does things? [19:03:06] ok [19:03:06] that's the standard way of doing things [19:03:11] ok [19:03:14] And I can probably just put the scripts into the standard /usr/bin directories rather than in the home directory [19:03:21] apmon_: yeah [19:03:29] ideally you'd wrap all of this in a debian package :) [19:03:39] but that's quite a bit of work [19:03:40] :) [19:03:45] one thing at a time [19:03:50] yep [19:04:27] paravoid: so, I'm going to enable pam_mkhomedir and turn off the script [19:04:36] except for keys [19:04:39] that needs to stay :) [19:04:41] andrewbogott, hashar: Can you rereview my php-change please? [19:04:51] # ? [19:04:55] this is one step closer to gluster homedirs :) [19:05:07] https://gerrit.wikimedia.org/r/#/c/29975 [19:05:07] Jan_Luca: danke :-] [19:05:08] Ryan_Lane: Most of it is wrapped into debian packages. But a few extra scripts currently aren't [19:05:14] ah ok [19:05:22] * aude is happy to have an extra hour on sunday (daylight savings) to do more coding and hacking :) [19:05:26] apmon_: thanks for all the work on this btw! [19:05:31] aude: you too [19:05:37] poking at it now [19:05:44] Jan_Luca: who are you by the way? :-] Are you working for wmf on labs ? :-) [19:05:52] But I can probably see if perhaps they should move into the debian packages too. [19:05:52] Ryan_Lane: thank you for advising [19:05:58] totally welcome. 
always willing to [19:06:23] hashar: No I'm a German student that do this in his free time [19:06:54] Jan_Luca: I guess that is a nice way to acquire experience :-] [19:07:24] Yes, before labs I didn't know what puppet is :-) [19:07:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [19:07:40] + gerrit + git + us ;-] [19:07:47] :-D [19:07:49] anyway, welcome aboard ! [19:08:14] Jan_Luca: you'll know what salt is eventually too ;) [19:09:14] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [19:09:14] With this little changes like the php-change I try to learn puppet so I can get the centralauth-project using puppet [19:09:22] * Ryan_Lane nods [19:09:36] Ryan_Lane: Do mean salt for passwords? [19:09:46] saltstack [19:10:09] Jan_Luca: there is a few more packages you want to add in. I have listed them in the diff comment https://gerrit.wikimedia.org/r/#/c/29975/1/manifests/php.pp,unified [19:10:18] it's for remote execution and asyncronous event based configuration management [19:10:24] https://gerrit.wikimedia.org/r/#/c/8732/ <-- salt [19:10:42] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [19:10:48] * aude wonders when that will be approved and merged [19:10:53] aude: I'm working on it [19:11:03] though I may actually delete that repo [19:11:09] I think I'm going to do all of it inside of the puppet repo [19:11:10] ah, ok [19:11:18] * aude nods [19:11:25] as a module, of course [19:11:49] sure [19:12:05] Jan_Luca: I have commented on https://gerrit.wikimedia.org/r/#/c/29975/ [19:12:36] Jan_Luca: I am not part of the operations team so will not have any final word ;-]  But I might be able to give you some basic guidances. 
[19:12:40] I like that the puppet repo is the third most contributed to repo :) [19:12:54] too many merge commits :-] [19:13:07] *by number of commiters [19:13:25] I think I did more commit this year in puppet than in mediawiki [19:13:36] what's the 2nd most? [19:13:36] mediawiki extensions [19:13:39] hashar: With the module it is easy to add the others [19:14:07] ok, all of them combined [19:14:10] yep [19:14:15] cheating [19:14:22] heh [19:15:19] and you do post commit review :-] [19:15:26] and don't stage on labs ;-D [19:15:32] so end up having tons of commit for just one change hehe [19:15:41] I actually do stage on labs [19:15:56] <^demon> But testing on production is more fun :) [19:15:59] hashar: I was counting by contributor number and not number of commits ;) [19:16:11] ;-D [19:16:30] I have looked at the ohloh listing last weekend, https://www.ohloh.net/p/wikimedia-puppet/contributors [19:17:14] and thought that it could use a bit more volunteers in it :-D [19:17:27] I am wondering how people install their software in labs, they probably just apt-get install :/ [19:17:36] * aude is counted twice on ohloh [19:17:46] once for merges i think, once for commits [19:17:54] Aude != aude [19:17:55] aude if you get an account, I think you can claim usernames [19:18:03] i see [19:18:08] and yeah Gerrit generates wrong real names :(((((( [19:18:22] My git client has "Antoine Musso" but git knows me as "Hashar" [19:18:37] ryan said it was not trivial to fix since that need the LDAP schema to be rewritten (iirc) [19:18:55] well, it's gerrit that needs to be fixed [19:18:56] not ldap [19:19:34] shouldn't be case sensitive [19:19:35] hashar: Running apt-get install takes about a minute. Doing it through puppet takes months, if you have to learn it first and figure out how to push things through review... 
[19:19:38] Damianz: I pulled in your change [19:19:49] Damianz: for labsconsole [19:19:51] it now reports storage properly [19:19:55] * aude nods :/ [19:20:23] apmon_: I tend to agree ;) [19:20:35] ugh [19:20:37] is memcache dead again? [19:20:59] wow [19:21:06] memcache segfaulted [19:21:28] Ryan_Lane: well the accountFullName parameter in Gerrit points to LDAP field "cn" [19:21:37] Ryan_Lane: and the LDAP schema does not seem to have any real name in it :/ [19:21:52] Unless you want to provision 100s of identical hosts, puppet doesn't seem worth it, at least not to me as an inexperienced user. [19:21:52] how is that wrong? [19:21:54] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [19:22:08] apmon_: well, if you need to go from lucid to precise, it does [19:22:23] because you can bring up an instance, test the new one, then delete the old one [19:22:38] and puppet will handle it all for you [19:22:41] it also documents how things are set up [19:22:53] sure once we learn it but quite a learning curve [19:23:01] which makes it easier to work collaboratively [19:23:10] yeah [19:23:15] it's a learning curve for sure [19:23:24] * aude likes reading through the stuff in operations/puppet though [19:23:30] always learn something [19:23:33] I need to figure out why memcache is dying [19:24:13] In big projects like wikipedia, I am sure it does make sense. But for individual projects in labs, it might not. [19:24:36] apmon_: it totally does in labs [19:24:41] ahh [19:24:57] Gerrit uses cn as the account name, when it should most probably be the surname aka sn: [19:25:04] instance state = error :( [19:25:05] hashar: no. it should use CN [19:25:12] <^demon> No, we should keep using CN [19:25:12] why shouldn't it use cn? [19:25:15] aude: oh? [19:25:27] shall i try again or are things broken? [19:25:27] so we can login with our nickname [19:25:29] ? 
[19:25:34] https://labsconsole.wikimedia.org/wiki/Special:NovaInstance [19:25:42] things shouldn't be broken. you may have hit a race condition [19:25:42] maps-osmmapnik2 [19:25:46] ok. sec [19:25:51] * aude shall try again [19:25:52] aude: this is in project maps? [19:25:58] yes [19:25:58] don't delete just yet [19:26:02] ok [19:26:13] ^demon: anyway for us to change our cn so ? :-] [19:26:28] {u'message': u'DetachedInstanceError', u'code': 500, u'created': u'2012-10-25T19:03:54Z'} [19:26:28] -_- [19:26:32] it's very annoying to hit that error [19:26:44] <^demon> hashar: What has to happen is we have to update 2 tables in gerrit after changing CN [19:26:44] this likely has something to do with a specific IP address [19:26:46] uh, okay [19:26:51] <^demon> Otherwise it gets treated as a new user. [19:26:52] should i try again? [19:26:58] yes, please [19:27:00] ip = 10.4.1.24 [19:27:05] this started happening after I expanded the network [19:27:23] so there's likely something screwed up in the database [19:27:28] ^demon: I am still unsure why we use the real name (cn) as a username [19:27:34] hashar: why wouldn't we? [19:27:47] <^demon> It's the same thing you use in labs. [19:27:51] exactly [19:27:53] would make more sense to me if we logged in using our surname [19:28:15] why? [19:28:15] just like my account name is "hashar" on the wiki [19:28:15] surname is your last name [19:28:15] but my ~~~~ shows my real name "Antoine Musso" [19:28:16] it's hard enough having *two* unique usernames [19:28:19] ohhhh [19:28:21] surname .. [19:28:23] bah [19:28:32] it would be nearly impossible to have to have three [19:28:45] I was thinking about "nickname" hehe [19:28:55] if you want your git name to be your real name, then your user name should be your real name [19:29:01] hashar: Some of your packages I don't find in Ubuntu repo: php5-redis, and php5-parsekit and php5-wmerrors [19:29:01] that's why we have that mentioned [19:29:07] Are they WM-packages, too? 
[19:29:10] Jan_Luca: those are on the wikimedia repository. [19:29:31] Jan_Luca: are you using lucid or precise? [19:29:38] (please don't say oneiric :) ) [19:29:56] Ryan_Lane: so I am mostly complaining about being forced to use our real name as a username [19:30:03] I need to mark those as (unsupported) [19:30:04] hashar: you aren't forced to [19:30:07] Ryan_Lane: I use precise as reference [19:30:19] hashar: only if you want your git name to be your real name [19:30:20] Jan_Luca: ok [19:30:28] those should be available in precise [19:30:39] though we're just now upgrading our app servers to precise [19:30:51] so it's possible they haven't been added et [19:30:51] *yet [19:31:26] Ryan_Lane: so yeah, if I want my real name to be properly credited in git, I am forced to use my real name as a username. Makes sense ? ;) [19:31:28] aude: seems your new one is error state too [19:31:38] I use for the Ubuntu packages this list: http://packages.ubuntu.com/precise/allpackages?format=txt.gz [19:31:45] same IP address [19:31:46] wtf [19:31:53] Ryan_Lane: which in turns means I would need to use my real name to log on labs console and in gerrit. True. [19:32:08] hashar: I don't see the problem there [19:32:54] dealing with authentication between multiple applications is incredibly hard. we have to standardize in some way [19:32:54] Ryan_Lane: I prefer using "hashar" to connect, I can probably survive not being properly credited in git :-] [19:33:25] hashar: Is there a list of WM-packages? [19:33:25] Jan_Luca: do you have a shell account on labs ? 
[19:33:35] hashar: yes, jan [19:33:41] Jan_Luca: our repo is at http://apt.wikimedia.org [19:34:02] Jan_Luca: so you could use the apt utilities in a precise instance to find out [19:34:34] or browse http://apt.wikimedia.org/ I am not sure where the list of packages is [19:34:52] PROBLEM host: i-000004e8.pmtpa.wmflabs is DOWN address: i-000004e8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004e8.pmtpa.wmflabs) [19:35:22] Jan_Luca: maybe http://apt.wikimedia.org/wikimedia/dists/precise-wikimedia/universe/binary-amd64/Packages.gz [19:35:39] aude: this looks like a bug [19:35:45] aude: can you delete those two instances? [19:35:52] I'm going to manually update the database [19:36:20] hashar: Yes the Packages file contains the package infos [19:37:46] Ryan_Lane: ok [19:38:10] it's funny. it has it associated with an instance [19:38:17] but not marked as reserved or allocated [19:38:22] hmm [19:38:24] and it has no virtual interface associated [19:38:32] maybe i should pick a different name [19:38:39] now it has no instance_id associated [19:38:47] nah, the instance name is fine [19:38:49] ok [19:39:17] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [19:39:49] Ryan_Lane: any regarding memcached server going down, it has been happening from time to time for the last few days. Nagios would tell but I can't find virt0 in nagios :/ [19:39:56] Ryan_Lane: also you have lost the virtualization cluster in ganglia [19:40:06] hashar: I have the virt cluster back [19:40:11] oh nice [19:40:42] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [19:40:50] but no virt0 :-] [19:40:53] in nagios [19:40:55] yeah. 
[19:41:05] nor in ganglia apparently [19:41:07] because it's not able to talk to the aggregators [19:41:39] what puzzle me is that I am pretty sure to have seen a nagios notification on irc about virt0:memcached being down [19:44:11] jan_Luca: Doesn't there still need to be a php::core class somewhere? Or is it there and I overlooked? [19:45:19] andrewbogott: there is a class php in init.pp but hashar suggest to add some more classes for other php-modules [19:45:46] so I would suggest to wait untill I upload this change [19:46:03] Oh yeah, there it is [19:52:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [19:53:57] PROBLEM Current Load is now: CRITICAL on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: Connection refused by host [19:53:57] Ryan_Lane: remind me again what the difference is between m1 instance type and s1? [19:54:32] PROBLEM Current Users is now: CRITICAL on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: Connection refused by host [19:55:13] PROBLEM Disk Space is now: CRITICAL on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: Connection refused by host [19:56:02] PROBLEM Free ram is now: CRITICAL on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: Connection refused by host [19:57:22] PROBLEM Total processes is now: CRITICAL on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: Connection refused by host [19:57:26] hashar, andrewbogott: I uploaded a new patch set with all the classes that hashar suggested: https://gerrit.wikimedia.org/r/#/c/29975/3 [19:57:52] PROBLEM dpkg-check is now: CRITICAL on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: Connection refused by host [19:58:52] RECOVERY Current Load is now: OK on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: OK - load average: 1.22, 1.14, 0.65 [19:59:32] RECOVERY Current Users is now: OK on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [19:59:33] Jan_Luca: nice :-] [19:59:53] 
Jan_Luca: I guess Faidon will take a look at it tomorrow [20:00:09] With the solution as module it is only copy-and-paste ;-) [20:00:12] RECOVERY Disk Space is now: OK on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: DISK OK [20:00:47] Jan_Luca: yeah, we could even have a list of the extensions and write a script that would generate all the classes :-D [20:01:01] Jan_Luca: though ops will probably dislikes having a shell script to generate classes :-] [20:01:02] RECOVERY Free ram is now: OK on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: OK: 1116% free memory [20:01:44] hashar: The problem is that sometimes it is php-... and the most time php5-... [20:02:22] RECOVERY Total processes is now: OK on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: PROCS OK: 83 processes [20:02:44] PROBLEM Total processes is now: CRITICAL on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS CRITICAL: 289 processes [20:02:52] RECOVERY dpkg-check is now: OK on maps-osmmapnik4 i-000004e9.pmtpa.wmflabs output: All packages OK [20:03:33] andrewbogott: The seperation in many files is needed by the autoloader [20:03:51] ok [20:04:04] Jan_Luca: seems fine to me. ops will have the final word though [20:04:54] andrewbogott: so yeah the autoloader when given foo::bar will look for a module named 'foo' and under its manifests directory for a 'bar.pp' [20:05:06] that's reasonable. 
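Editorial note: the autoloader rule described above (a class foo::bar lives in module 'foo', in 'bar.pp' under its manifests directory) can be sketched as a small path-mapping function. This is only an illustration, not code from the channel: the function name manifest_path and the "modules" root directory are assumptions, and the init.pp and nested-directory cases follow the standard Puppet module layout.

```python
from pathlib import PurePosixPath


def manifest_path(class_name: str, module_root: str = "modules") -> PurePosixPath:
    """Return the manifest file where the Puppet autoloader expects a class.

    module_root is a hypothetical base directory chosen for illustration.
    """
    module, *rest = class_name.split("::")
    if not module or any(not part for part in rest):
        raise ValueError(f"malformed class name: {class_name!r}")
    if not rest:
        # A class named after the module itself lives in init.pp.
        return PurePosixPath(module_root, module, "manifests", "init.pp")
    # Intermediate namespace segments become subdirectories of manifests/.
    *dirs, leaf = rest
    return PurePosixPath(module_root, module, "manifests", *dirs, leaf + ".pp")


if __name__ == "__main__":
    for name in ("php", "foo::bar", "foo::bar::baz"):
        print(name, "->", manifest_path(name))
```

This is why one file per class is needed: each class name resolves to exactly one file path, so putting several classes in a single manifest would leave the autoloader unable to find them.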
[20:05:51] andrewbogott: There is a nice overview: http://docs.puppetlabs.com/puppet/3/reference/modules_fundamentals.html [20:06:07] in the section "Module Layout" [20:09:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [20:10:43] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [20:22:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [20:32:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [20:39:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [20:40:52] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [20:52:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [21:07:22] aude: s1 has more local storage [21:07:26] Ryan_Lane: ok [21:07:38] it was for when we didn't have project storage [21:07:43] I should see if I can delete them [21:07:54] not sure it matters [21:08:09] project storage is good [21:08:16] * aude accidentially picked the first choice [21:08:38] but s1 also has project storage? 
[21:08:54] yes [21:08:54] ok [21:08:54] but it has less ram and less cpu [21:08:59] oh, ok [21:09:11] i could delete and recreate yet again [21:09:25] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:09:25] heh [21:09:25] yeah [21:09:28] I need to get resize working [21:09:41] it's possible in this version of nova [21:09:46] ok [21:09:48] it needs to be configured and added to the interface, though [21:09:58] that would be nice :) [21:11:12] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [21:13:40] I'm not too sure how resize would work from larger to smaller for storage, though :) [21:16:38] aude: I just noticed that the most recent mod_tile package I created has a bug in it. So I'll need to recreate them [21:20:53] apmon_: :/ [21:22:34] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [21:37:57] Change on mediawiki a page Developer access was modified, changed by Mgrover(WMF) link https://www.mediawiki.org/w/index.php?diff=597590 edit summary: [21:39:24] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:40:56] PROBLEM host: i-000004d7.pmtpa.wmflabs is DOWN address: i-000004d7.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004d7.pmtpa.wmflabs) [21:44:36] Change on mediawiki a page Developer access was modified, changed by Mgrover(WMF) link https://www.mediawiki.org/w/index.php?diff=597591 edit summary: /* User:Mgrover(WMF) */ [21:45:23] I have a vm that is refusing to boot: parsoid-roundtrip4, group visualeditor [21:45:42] RECOVERY Disk Space is now: OK on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK OK [21:45:44] in shutdown state, reboot fails. Also no console output. 
[21:53:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [21:53:43] PROBLEM Disk Space is now: WARNING on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK WARNING - free space: / 74 MB (5% inode=51%): [22:01:11] * gwicke goes ahead and nukes that vm [22:02:33] PROBLEM Total processes is now: CRITICAL on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS CRITICAL: 289 processes [22:07:25] hmm- I created a new VM (eight cores, yay!), but that now shows status 'error', again without console output [22:07:40] should I try to reboot it? [22:09:23] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [22:10:52] Failed to reboot instance. This is parsoid-roundtrip4-8core in the visualeditor project. [22:11:08] vanilla instance, just created and never logged in. [22:11:20] Ryan_Lane: ^^ [22:11:46] it was in error status on build? [22:11:51] if that's the case a reboot won't work [22:11:55] only a delete/recreate will [22:12:07] seems we're starting to run into nova issues again [22:12:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [22:12:53] yep [22:12:53] Bleh cross wind tonight is awful [22:12:53] same issue that aude was having [22:12:57] :( [22:13:09] * aude always tries to create instances at the wrong time  [22:13:15] so it seems [22:13:16] I wonder if updated packages are availabl [22:13:25] aude: gwicke is having the same issue [22:13:27] this is a nova bug [22:13:31] * aude nods [22:13:38] scheduler crapness? 
[22:13:41] no [22:13:42] PROBLEM host: i-000004ea.pmtpa.wmflabs is DOWN address: i-000004ea.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004ea.pmtpa.wmflabs) [22:13:54] RECOVERY Free ram is now: OK on changefeed-bot i-0000041b.pmtpa.wmflabs output: OK: 20% free memory [22:13:57] this is a code bug that screws up the database [22:14:10] not permanently, thankfully [22:14:19] gwicke: delete/recreate will hopefully fix it [22:14:27] I'm going to check if there's updated packages available [22:14:32] * gwicke also tends to do stuff at wrong moment [22:14:43] everytime a bug makes it into stable a kitten gets very, very worried [22:14:55] Ryan_Lane: I just tried to fix a reboot issue with the recreate, but that failed too.. [22:14:59] will try once more [22:15:26] sems only keystone has patches, so I can upgrade any other service :) [22:16:01] if it's a reboot issue you should let me know [22:16:03] that's fixable [22:16:14] crossing fingers.. [22:16:31] Ryan_Lane: there wasn't much config on that instance, it is just a round-trip worker [22:16:39] ok [22:17:07] 'building' sounds promising [22:17:39] and back to 'error' [22:17:50] parsoid-roundtrip4-8core in visualeditor [22:17:53] same name? [22:18:03] yes [22:18:12] should I append random chars? [22:18:22] no [22:18:25] it's not the name [22:18:35] gimme a sec [22:20:15] huh [22:20:16] weird [22:20:25] it wasn't even assigned an ip [22:20:51] the web UI shows an IP (10.4.0.39) [22:20:56] it's annoying. I know this was fixed and released into the stable branch [22:21:28] seriously? 
nova list doesn't [22:23:04] it doesn't show one in the interface [22:23:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [22:26:53] PROBLEM Free ram is now: WARNING on changefeed-bot i-0000041b.pmtpa.wmflabs output: Warning: 19% free memory [22:28:53] PROBLEM host: i-000004eb.pmtpa.wmflabs is DOWN address: i-000004eb.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004eb.pmtpa.wmflabs) [22:39:26] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [22:42:38] arrrrggghhhh [22:42:54] we're like two update releases behind because our precise mirror is broken [22:44:57] mirror@brewster:~$ crontab -l [22:44:57] /var/spool/cron/crontabs/mirror: Permission denied [22:45:18] tasty [22:47:32] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 136 processes [22:51:06] I was really wondering what the hell the deal was [22:51:18] well, maybe I won't need these keystone patches anymore [22:51:30] and maybe these bugs will go away :) [22:53:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [22:56:23] ok, I'll try to do another delete - recreate [22:59:32] PROBLEM host: i-000004eb.pmtpa.wmflabs is DOWN address: i-000004eb.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004eb.pmtpa.wmflabs) [23:00:37] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=597599 edit summary: /* User:Greeenjohnny */ done [23:00:37] no difference [23:02:53] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=597602 edit summary: /* User:Matthewrbowker */ [23:03:42] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link 
https://www.mediawiki.org/w/index.php?diff=597604 edit summary: /* User:Mgrover(WMF) */ [23:04:53] Ryan_Lane: is this a systematic bug, or something that might be fixed with a restart somewhere in the infrastructure? [23:05:09] this is something that will be fixed by upgrading openstack [23:05:13] we're a couple point releases behind [23:05:15] argh [23:05:18] because our apt mirror has been broken [23:05:31] any chance to get that instance up before then? [23:05:31] this is an upgrade that is just upgrading the packages [23:05:39] I'm working on upgrading right now [23:05:43] it's a simple one [23:05:47] ah, ok [23:05:55] so there is hope ;) [23:05:56] it's the same major release [23:06:32] this is what we are producing btw: http://parsoid.wmflabs.org:8001/stats [23:06:44] awesome [23:07:45] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=597607 edit summary: /* Requests */ done. [23:08:42] PROBLEM host: i-000004ec.pmtpa.wmflabs is DOWN address: i-000004ec.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004ec.pmtpa.wmflabs) [23:09:21] well, even though labs has been a little buggy for you guys, this would have cost thousands of dollars to test on ec2 :) [23:09:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [23:09:42] 8 core instances there are kind of expensive [23:09:58] Ryan_Lane: maybe- or $300 at Hetzner ;) [23:10:05] Hetzner? [23:10:19] http://www.hetzner.de/ [23:10:59] they are all single core [23:11:08] normally you get a physical server [23:11:15] and run openvz on that [23:11:22] ah [23:12:15] thousands? 
meh I can get 24core boxes for a matter of hundreds :D [23:12:35] ovh is also worth checking in the budget hosting space [23:14:41] for AWS high-cpu systems it's .66 /hr [23:14:51] ~$500 per month [23:15:31] ovh have four cores plus hyperthreading for 60€/month [23:15:39] I dunno why you'd pay that when you have have high-cpu redundant cluster for less than the price [23:16:01] hetzner for 50 I think [23:16:07] http://www.ovh.co.uk/private_cloud/#00048_00064_0000250_0_0 [23:16:31] haha! [23:16:40] vSphere is stupid expensive [23:16:47] We use it at work for our vmware clusters [23:16:53] hard to compare hosted vs virtual [23:16:55] Going to replace it with openstack [23:17:04] also hard to compare virtual vs dedicated hardware [23:17:10] <3 dedicated [23:17:24] dedicated hardware takes most companies time to set up, then you have to do all the virtual machine work yourself :) [23:17:38] you also usually need to have a contract [23:17:38] setting up openvz takes something like 10 minutes [23:18:08] could be sped up with a script [23:18:10] My argument is most providers can sort hardware in <24hours and if you need more than 24hours then your capacity planning sucks [23:18:35] * Ryan_Lane shrugs [23:18:46] I'm just comparing apples to apples :) [23:19:16] * Damianz throws an apple at Ryan [23:19:17] this stupid mirror update is taking ages [23:19:52] hosted totally doesn't count [23:20:02] because you are on a server with probably 100 other people [23:20:17] Ryan_Lane: not if you get a dedicated [23:20:23] right [23:20:31] hetzner setup takes 1-2 hours in my experience [23:20:52] anyway ;) [23:22:31] Reminds me, I need to check the capacity on our 140gb nodes [23:22:38] * Damianz finds vSphere [23:24:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [23:24:08] gwicke: so, for the time being, delete/recreate generally works [23:24:17] though it seems like the bug is occuring more frequently 
[23:24:27] I did track down the bug, and it's fixed in the upgrade [23:24:34] Ryan_Lane: I tried three times so far without success [23:24:37] ok [23:24:50] I guess you'll need to wait till our mirror finishes updating [23:25:19] am retrying once more in the meantime [23:26:18] wohoo! 'active' ;) [23:26:20] \o/ [23:29:35] in an unrelated note- it does not seem to be possible to connect to public labs ips from another labs instance [23:29:42] yep [23:29:50] because it's NAT'd [23:30:05] routing > nating #justsaying [23:30:09] Damianz: I agree [23:30:11] oh, I suspected missing nat in that case [23:30:25] well, if it's on the same device NAT won't work properly [23:30:35] so, it will likely work if they are on different hosts [23:30:37] but not if they are on the same one [23:31:09] k, makes sense [23:32:10] is there normally a delay until it becomes possible to log into new instances? [23:32:37] my public key is refused by the new instance [23:32:57] would NAT be unnecessary with IPv6? [23:33:15] gwicke: one puppet finished running [23:33:31] Ryan_Lane: ah, ok- will be patient then [23:33:54] PROBLEM Current Load is now: CRITICAL on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: Connection refused by host [23:34:09] We should totally nat ipv6 addresses just for shits and gigs [23:34:24] heh [23:34:24] we should enable ipv6 [23:34:29] I think I may be able to do it [23:34:32] PROBLEM Current Users is now: CRITICAL on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: Connection refused by host [23:34:33] since expanding the network [23:34:49] Would be nice [23:35:10] yeah [23:35:10] would be very nice [23:35:10] Kinda want a decent inter-region network stuff though [23:35:13] PROBLEM Disk Space is now: CRITICAL on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: Connection refused by host [23:35:17] that's not easy at all [23:36:03] PROBLEM Free ram is now: CRITICAL on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: Connection 
refused by host [23:36:05] I'd like to tackle easier problems first :) [23:36:10] nope since they are designed to be seperate [23:36:19] Though hey we have a layer 2 link, lets make cool shit [23:36:33] like, it would be awesome if you could be informed when an instance fully finished building [23:36:56] we need echo in mediawiki for that :) [23:37:26] Ryan_Lane: instance just finished building ;) [23:37:26] see, it works already [23:37:26] hahaha [23:37:26] well, I have pretty soonish plans to make that happen [23:37:31] have salt bootstrap the system [23:37:34] it runs puppet [23:37:44] when puppet finished running, it sends an event [23:37:55] the event gets picked up on the master, who informs irc and mediawiki [23:38:11] the mediawiki part is harder [23:38:11] I kinda seriously want to re-do the whole ui and make it event based but meh, time, effort, mediawiki [23:38:13] irc is easy [23:38:57] RECOVERY Current Load is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: OK - load average: 0.26, 0.67, 0.48 [23:39:12] I really need to make snapshots of pre-puppetized systems and use those rather than bare images [23:39:29] it would speed up the create process by a lot [23:39:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [23:39:33] RECOVERY Current Users is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: USERS OK - 1 users currently logged in [23:39:42] see, all this kind of shit I'd do if I had time [23:40:05] Well you have 2 weeks free soon [23:40:05] =D [23:40:12] hahaha [23:40:12] RECOVERY Disk Space is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: DISK OK [23:40:17] if I had free-time at work ;) [23:40:58] It's the things that start with 'when I have spare time I'll sort that' and ends in 'yeah, when am I going to have free time' heh [23:41:02] RECOVERY Free ram is now: OK on parsoid-roundtrip4-8core 
i-000004ed.pmtpa.wmflabs output: OK: 1979% free memory [23:41:19] and this is why things stay buggy :( [23:41:28] * gwicke enlists the new vm for rt testing chores [23:41:36] \o/ [23:41:47] gwicke: you aren't using puppetmaster::self for all of those, are you? [23:42:28] no, it is just a single apt-get and an adjustment to a config for the number of workers to spawn [23:42:28] ah. cool [23:42:45] I went through and ensured puppet ran on almost every single instance yesterday [23:42:47] using supervisord [23:42:53] it was a total pain in the ass with puppetmaster::self :) [23:43:29] I am not clued yet about puppet, and this is still very manageable [23:43:32] * Ryan_Lane nods [23:43:45] well, there's not much need [23:44:00] until we're ready to move it to production, however you want to handle it is fine [23:44:33] parsoid is already puppetized, this is just about the rt test clients [23:44:38] ah [23:44:45] mapreduce on wikipedia page if you will [23:44:45] definitely no reason to do it for that [23:45:00] * Ryan_Lane nods [23:54:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [23:56:54] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: WARNING - load average: 11.31, 10.69, 7.24 [23:58:26] * gwicke happily notices the nova-compute-lxc package in debian