[00:03:58] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[00:11:51] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[00:34:05] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[00:41:55] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[01:04:21] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[01:08:15] can anyone think of a labs host that has an HTTPS server installed already?
[01:09:49] Emw: GChriss: i chatted with Ryan_Lane1 earlier today but forgot to ask him
[01:12:41] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[01:19:50] jeremyb: many thanks. It's not especially urgent, but I would enjoy trying my hand at migrating over the codebase (and knowing it is in a "safe" location)
[01:20:50] ohhhhhh
[01:20:56] you want a gerrit project then?
[01:21:02] i was thinking labs project
[01:21:06] both?
[01:21:08] gerrit is a git host
[01:21:23] right -- I've read through almost all of gerrit documentation
[01:21:57] sad that i have the group id memorized, 119 ;)
[01:22:00] https://gerrit.wikimedia.org/r/#/admin/groups/119,info
[01:23:23] as far as I can tell metavidwiki isn't in the gerrit trunk/extensions branch (or anywhere else); I've made a number of changes since the initial metavidwiki checkout 3 yrs ago. so, if I understand correctly, now would be a good time to copy metavidwiki from svn into gerrit, run a 'diff' locally through the source tree, and submit patches through gerrit
[01:23:27] robla's /away but also looks fairly unidle
[01:23:54] uhhh, i guess
[01:24:13] independently of that I'm looking forward to figuring out puppet + setting up a dev-only version of openmeetings.org
[01:24:15] you might want to do one massive code review unless you already have it broken into individual patches
[01:24:16] in labs
[01:25:11] I don't have individual patches ready, an en masse code review would be fine by me, and there are a ton of changes
[01:25:45] but the changes I have hacked are important enough to identify/commit
[01:25:47] note you don't need any review for a labs deployment
[01:26:01] right, just a project :)
[01:26:34] the sister projects proposal is looking better: http://meta.wikimedia.org/wiki/OpenMeetings.org
[01:27:20] you could also have it on gerrit without review at all and then over time you can move stuff from the huge diff into a separate gerrit branch that does have code review. and things can be removed from the mega patch over time as they make their way into individual changes
[01:28:31] do you mean 'code review' in a security sense?
[01:30:02] also in a this doesn't break the site or cause dataloss or do something unintended sense
[01:31:05] Thehelpfulone: ping
[01:31:34] oh, pharos is on the committee too
[01:31:39] i guess Thehelpfulone is asleep
[01:34:25] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[01:35:45] * robla looks up...
[01:35:48] hey!
[01:36:00] I'm here for a sec...what's up?
[01:36:02] gerrit project creation and labs project creation
[01:36:14] i think labs is more of a priority
[01:36:23] gerrit can wait for demon to do a repo conversion
[01:37:20] I'm not the guy to go to for labs project creation
[01:37:29] :(
[01:37:52] who're the people to bug about that?
[01:37:55] Anyone know what could be causing this error? http://integration.wmflabs.org/mw/
[01:38:01] I can't get mw to run on sqlite there
[01:38:03] * jeremyb was just pulling up https://labsconsole.wikimedia.org/wiki/Special:ListUsers/sysop ;)
[01:38:14] permissions are fine (tried 777 even)
[01:38:19] libraries are installed
[01:38:31] (Can't contact the database server: Cannot return last error, no db connection)
[01:38:35] jeremyb: yup, that looks like the list
[01:38:53] * robla notes that he's not on it
[01:38:59] robla: WikiSysop looks sketchy
[01:39:10] "root"
[01:39:52] robla: sorry, i asked you because you were on the gerrit list and then realized we want a repo conversion
[01:40:12] Krinkle: is the database server pingable from that machine?
[01:40:19] no prob....I didn't mean to sound cranky about being pinged
[01:40:24] Emw: sqlite..
[01:40:32] Emw: in the same directory
[01:40:33] robla: nah ;)
[01:40:56] Emw: from maintenance/eval.php things like wfMsg() and wfGetDB() work fine
[01:40:57] Emw: it's not a network thing nor a server thing
[01:41:18] (well server as in box. but not service)
[01:41:25] ah, ok
[01:42:45] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[02:04:36] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[02:12:46] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[02:24:47] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[02:32:57] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.70, 5.67, 5.18
[02:34:37] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[02:34:57] RECOVERY Current Users is now: OK on bastion-restricted1 i-0000019b output: USERS OK - 5 users currently logged in
[02:39:26] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 212 processes
[02:40:10] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 25% free memory
[02:44:30] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 185 processes
[02:44:35] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[02:57:49] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[02:58:37] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[02:58:37] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:01:13] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK
[03:01:13] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:02:22] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output
[03:03:28] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 94% free memory
[03:03:28] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 75 processes
[03:06:08] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 10% free memory
[03:06:08] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[03:08:35] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:10:21] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 MB (5% inode=57%):
[03:12:00] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:12:11] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:14:00] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 6.20, 5.22, 4.10
[03:15:26] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[03:16:48] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output
[03:17:58] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output
[03:23:55] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 1.95, 2.83, 3.51
[03:28:55] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.90, 1.52, 2.73
[03:34:55] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[03:36:36] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory
[03:36:55] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[03:39:55] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 15% free memory
[03:46:28] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[03:50:20] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 15% free memory
[03:57:01] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[03:59:52] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 2% free memory
[04:02:09] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:05:40] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:06:14] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK
[04:06:15] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:06:31] PROBLEM dpkg-check is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:07:18] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[04:11:20] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.81, 6.59, 6.33
[04:11:20] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK
[04:16:00] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory
[04:17:22] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[04:20:22] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory
[04:25:58] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:37:56] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[04:40:34] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory
[04:45:23] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory
[04:47:53] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[04:52:15] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 187 processes
[05:01:08] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:08] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:08] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:08] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:08] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:13] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:05:53] RECOVERY Current Users is now: OK on mwreview i-000002ae output: USERS OK - 0 users currently logged in
[05:05:53] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK
[05:05:53] RECOVERY dpkg-check is now: OK on mwreview i-000002ae output: All packages OK
[05:05:53] RECOVERY Current Load is now: OK on mwreview i-000002ae output: OK - load average: 0.37, 2.13, 1.88
[05:05:53] RECOVERY Total Processes is now: OK on mwreview i-000002ae output: PROCS OK: 108 processes
[05:05:58] RECOVERY Free ram is now: OK on mwreview i-000002ae output: OK: 68% free memory
[05:08:43] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[05:18:54] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[05:38:56] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[05:45:34] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[05:48:57] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[05:51:00] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 MB (5% inode=57%):
[06:09:05] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[06:19:01] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[06:21:11] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output
[06:27:23] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:31:31] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 8.16, 7.75, 6.92
[06:37:03] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:42:02] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[06:42:19] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:43:35] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 18% free memory
[06:46:19] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:46:30] PROBLEM SSH is now: CRITICAL on bots-sql2 i-000000af output: CRITICAL - Socket timeout after 10 seconds
[06:48:29] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output
[06:48:29] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:35] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:35] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[06:52:51] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:56] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:56] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:56] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:56] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:09] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:19] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:06:12] PROBLEM Current Load is now: WARNING on pybal-precise i-00000289 output: WARNING - load average: 6.37, 6.67, 5.36
[07:06:28] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:07:56] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:01] PROBLEM dpkg-check is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:01] PROBLEM Current Users is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:02] PROBLEM Free ram is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:02] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:02] PROBLEM Disk Space is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:03] RECOVERY SSH is now: OK on bots-sql2 i-000000af output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:11:31] PROBLEM HTTP is now: CRITICAL on integration-apache1 i-000002eb output: Connection refused
[07:11:31] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 8% free memory
[07:11:31] RECOVERY Current Users is now: OK on integration-apache1 i-000002eb output: USERS OK - 0 users currently logged in
[07:11:36] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 4.04, 5.49, 6.97
[07:11:36] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in
[07:11:36] RECOVERY Disk Space is now: OK on incubator-bot2 i-00000252 output: DISK OK
[07:11:36] RECOVERY Free ram is now: OK on incubator-bot2 i-00000252 output: OK: 36% free memory
[07:11:41] PROBLEM Current Load is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:12:40] PROBLEM dpkg-check is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:34] PROBLEM Total Processes is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:57] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:57] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:57] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:58] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:58] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:14:19] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK
[07:14:57] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 7.31, 7.43, 6.52
[07:15:13] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[07:15:13] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 7.26, 6.14, 5.21
[07:15:13] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK
[07:15:13] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 91 processes
[07:15:18] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 93% free memory
[07:15:18] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in
[07:15:43] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:15:43] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:16:38] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 70 MB (5% inode=57%):
[07:16:48] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 6.54, 9.90, 16.69
[07:16:48] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output
[07:16:48] RECOVERY Current Load is now: OK on integration-apache1 i-000002eb output: OK - load average: 1.31, 3.83, 4.72
[07:16:53] PROBLEM Free ram is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:18:29] PROBLEM Current Load is now: WARNING on configtest-main i-000002dd output: WARNING - load average: 8.75, 8.65, 6.74
[07:18:29] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output
[07:18:29] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output
[07:18:29] PROBLEM Current Load is now: WARNING on grail i-000002c6 output: WARNING - load average: 3.55, 5.05, 5.98
[07:18:29] RECOVERY Total Processes is now: OK on grail i-000002c6 output: PROCS OK: 105 processes
[07:18:34] RECOVERY HTTP is now: OK on integration-apache1 i-000002eb output: HTTP OK: HTTP/1.1 200 OK - 1104 bytes in 2.449 second response time
[07:18:34] PROBLEM Total Processes is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:18] PROBLEM Current Load is now: WARNING on gluster-4 i-000002e4 output: WARNING - load average: 5.25, 6.30, 6.13
[07:19:18] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:19] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:19] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:19] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:19] PROBLEM dpkg-check is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:19] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:41] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:42] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:19:42] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:23:39] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 8.31, 6.53, 5.76
[07:23:39] PROBLEM Free ram is now: UNKNOWN on testforx i-000002f3 output: NRPE: Unable to read output
[07:23:39] PROBLEM Current Load is now: WARNING on maps-tilemill1 i-00000294 output: WARNING - load average: 6.12, 7.76, 6.59
[07:23:44] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[07:23:44] PROBLEM Current Load is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:23:54] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:24:05] PROBLEM Current Load is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:43] PROBLEM host: psm-precise is DOWN address: i-000002f2 PING CRITICAL - Packet loss = 100%
[07:26:43] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 19% free memory
[07:26:43] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 5.68, 7.22, 7.10
[07:26:43] RECOVERY Total Processes is now: OK on bots-cb i-0000009e output: PROCS OK: 113 processes
[07:26:48] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 132 processes
[07:26:53] PROBLEM Current Load is now: WARNING on rds i-00000207 output: WARNING - load average: 4.63, 6.31, 6.06
[07:26:53] RECOVERY dpkg-check is now: OK on bots-cb i-0000009e output: All packages OK
[07:27:03] PROBLEM Current Load is now: WARNING on reportcard2 i-000001ea output: WARNING - load average: 4.38, 5.42, 5.91
[07:27:03] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 92 processes
[07:27:08] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK
[07:27:08] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in
[07:27:15] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:27:20] PROBLEM Current Load is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:28:37] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 197 processes
[07:29:27] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:29:27] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:29:27] PROBLEM Current Load is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:29:37] PROBLEM Current Load is now: CRITICAL on hugglewiki i-000000aa output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:29:47] RECOVERY host: psm-precise is UP address: i-000002f2 PING OK - Packet loss = 0%, RTA = 0.54 ms
[07:31:54] lo
[07:33:09] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 141 processes
[07:33:32] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:38] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 105.48, 50.95, 25.70
[07:34:38] RECOVERY Current Load is now: OK on gluster-4 i-000002e4 output: OK - load average: 0.08, 1.19, 3.33
[07:35:03] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 5.35, 5.31, 4.98
[07:35:03] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK
[07:35:03] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in
[07:35:03] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 90 processes
[07:35:08] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 94% free memory
[07:35:08] PROBLEM Current Load is now: WARNING on mwreview i-000002ae output: WARNING - load average: 5.74, 7.69, 7.47
[07:35:08] PROBLEM Current Load is now: WARNING on hugglewiki i-000000aa output: WARNING - load average: 2.35, 5.34, 6.36
[07:35:09] PROBLEM Current Load is now: WARNING on pybal-precise i-00000289 output: WARNING - load average: 6.60, 6.23, 6.29
[07:35:09] PROBLEM Current Load is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:35:09] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:35:14] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:35:24] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:36:04] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:36:53] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 3.65, 5.22, 5.60
[07:36:53] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 5.03, 5.01, 5.74
[07:36:58] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:38:31] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:38:36] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:38:36] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:38:37] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:38:47] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:39:36] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:39:36] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:39:41] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:39:41] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:39:41] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:39:41] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:13] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 1.90, 19.82, 19.32
[07:40:14] PROBLEM Free ram is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:14] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:14] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:33] RECOVERY Current Load is now: OK on hugglewiki i-000000aa output: OK - load average: 0.19, 1.79, 4.43
[07:40:49] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:54] PROBLEM Current Load is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:54] PROBLEM Free ram is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:54] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:54] PROBLEM Disk Space is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:41:58] RECOVERY Current Load is now: OK on maps-tilemill1 i-00000294 output: OK - load average: 4.53, 4.83, 4.89
[07:42:09] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 88% free memory
[07:42:09] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK
[07:42:57] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 1.47, 3.94, 4.73
[07:44:07] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in
[07:44:07] RECOVERY Total Processes is now: OK on mwreview i-000002ae output: PROCS OK: 108 processes
[07:44:12] RECOVERY Current Users is now: OK on mwreview i-000002ae output: USERS OK - 0 users currently logged in
[07:44:12] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK
[07:44:12] RECOVERY Free ram is now: OK on mwreview i-000002ae output: OK: 68% free memory
[07:44:12] RECOVERY dpkg-check is now: OK on mwreview i-000002ae output: All packages OK
[07:45:12] PROBLEM Current Load is now: WARNING on configtest-main i-000002dd output: WARNING - load average: 0.80, 4.61, 6.19
[07:45:18] RECOVERY dpkg-check is now: OK on ganglia-test2 i-00000250 output: All packages OK
[07:45:18] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:45:56] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 91 processes
[07:46:01] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 89% free memory
[07:46:01] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK
[07:46:01] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK
[07:46:30] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[07:55:03] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[08:01:33] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 0.56, 1.25, 6.42
[08:11:27] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 3.57, 1.69, 3.96
[08:16:51] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[08:20:25] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output
[08:20:25] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 8% free memory
[08:21:35] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 1.99, 1.63, 2.84
[08:22:05] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[08:24:05] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output
[08:26:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[08:42:14] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours
[08:47:14] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211)
[08:56:36] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[08:58:36] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 16% free memory
[09:01:08] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [09:04:14] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [09:04:14] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [09:04:14] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [09:04:14] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [09:04:19] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [09:11:05] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.36, 6.20, 6.13 [09:15:56] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [09:17:08] !deployment-prep setting up a bit cache using varnish puppet classes. 
[09:17:08] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod [09:18:36] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [09:21:03] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 17% free memory [09:21:03] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 188 processes [09:26:57] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [09:27:58] OHHHH [09:28:01] varnish started \O/ [09:49:40] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [09:59:54] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:06:33] 06/28/2012 - 10:06:32 - Creating a project directory for pdbhandler [10:06:33] 06/28/2012 - 10:06:32 - Created a home directory for dzahn in project(s): pdbhandler [10:07:32] 06/28/2012 - 10:07:31 - User dzahn may have been modified in LDAP or locally, updating key in project(s): pdbhandler [10:08:54] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: Critical: 2% free memory [10:12:33] 06/28/2012 - 10:12:33 - Created a home directory for emw in project(s): pdbhandler [10:13:33] 06/28/2012 - 10:13:33 - User emw may have been modified in LDAP or locally, updating key in project(s): pdbhandler [10:13:34] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory [10:20:34] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [10:30:57] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:38:48] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:40:58] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 6% free memory [10:43:38] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [10:51:40] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [10:59:31] !log deployment-prep set group write on deployment-nfs-memc:/mnt/export/apache/common-local would let us rewrite the wikiversions.cdb file [10:59:32] Logged the message, Master [11:01:20] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [11:03:53] !log deployment-prep migrated all wiki from php-trunk to php-master by editing wikiversions.dat. Refreshed wikiversions.cdb and renamed ExtensionMessages-trunk.php to ExtensionMessages-master.php [11:03:55] Logged the message, Master [11:21:50] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [11:26:10] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.37, 4.67, 4.99 [11:31:53] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [11:34:13] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 4.46, 4.92, 5.01 [11:39:23] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.91, 4.82, 4.93 [11:51:54] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [12:01:54] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [12:16:20] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: Critical: 5% free memory [12:22:00] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [12:31:31] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 205 processes [12:31:36] RECOVERY Free ram is 
now: OK on ganglia-test2 i-00000250 output: OK: 86% free memory [12:32:00] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [12:36:24] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 191 processes [12:51:04] !log pdbhandler this is a new project dedicated to '"a MediaWiki extension to enable interactive 3D models of large molecules, like proteins and DNA', owner: Emw [12:51:05] Logged the message, Master [12:52:04] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [13:06:24] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:22:04] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [13:27:14] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.28, 6.35, 5.99 [13:33:34] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [13:36:48] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:41:44] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 70 MB (5% inode=57%): [13:52:43] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [14:01:22] RECOVERY Puppet freshness is now: OK on deployment-imagescaler01 i-0000025a output: puppet ran at Thu Jun 28 14:00:54 UTC 2012 [14:07:03] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [14:10:29] !log deployment-prep running puppet on apache{30,31}. The /etc/sudoers conflict has been merged in :) [14:10:31] Logged the message, Master [14:13:00] notice: /Stage[main]/Base::Vimconfig/File[/etc/alternatives/editor]/target: target changed '/bin/nano' to '/usr/bin/vim' [14:13:02] ... 
[14:15:17] RECOVERY Puppet freshness is now: OK on deployment-apache30 i-000002d3 output: puppet ran at Thu Jun 28 14:15:12 UTC 2012 [14:16:37] RECOVERY Puppet freshness is now: OK on deployment-jobrunner05 i-0000028c output: puppet ran at Thu Jun 28 14:16:23 UTC 2012 [14:20:33] 06/28/2012 - 14:20:32 - User laner may have been modified in LDAP or locally, updating key in project(s): deployment-prep [14:23:23] PROBLEM dpkg-check is now: CRITICAL on deployment-apache31 i-000002d4 output: DPKG CRITICAL dpkg reports broken packages [14:23:38] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [14:23:55] mhmm [14:24:05] looks like I broke nrpe [14:24:21] PROBLEM Total Processes is now: CRITICAL on deployment-apache30 i-000002d3 output: Connection refused by host [14:24:26] PROBLEM Free ram is now: CRITICAL on deployment-apache30 i-000002d3 output: Connection refused by host [14:25:09] you did? [14:25:40] paravoid: did the cabling problem ever get fixed? [14:25:55] not as far as I know [14:26:02] :( [14:26:25] yeah... [14:28:19] ATTENTION!! rebooting maganese (aka gerrit) for dist-upgrade [14:29:18] RECOVERY Total Processes is now: OK on deployment-apache30 i-000002d3 output: PROCS OK: 127 processes [14:29:23] RECOVERY Free ram is now: OK on deployment-apache30 i-000002d3 output: OK: 92% free memory [14:32:47] Jeff_Green: cool! RT-1802 ? [14:34:19] mutante: formey? [14:34:33] eh, yeah, it says gerrit [14:34:43] did gerrit move after this ticket was created? the hostname is manganese [14:35:17] gerrit used to be on formey [14:35:21] well, formey is also up and a Wikimedia Wikimedia gerrit (git) server (gerrit). 
[14:35:25] but that box was not powerful enough and doing too many things [14:35:31] it is still running though [14:35:36] and also says SVN [14:35:36] so ryan moved gerrit to the new manganese [14:35:43] svn should be on formey still [14:35:47] PROBLEM Disk Space is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:35:47] PROBLEM Total Processes is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:35:59] svn is still on formey, yes [14:36:06] arar disk space [14:36:11] gerrit is too [14:36:16] PROBLEM Current Load is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:36:16] PROBLEM Current Users is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:36:21] we have a replica gerrit server on formey [14:36:40] hmm, so formey needs a kernel upgrade / reboot too [14:37:04] go for it, if you'd like [14:37:08] I can also do it, if you'd like [14:37:22] well, if there is no announcement to be made.. i can just do it now [14:37:42] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [14:37:46] it fits in with the other reboot [14:37:51] mutante: yep [14:37:59] doing it [14:38:03] i just warned in several IRC channels as you probably saw [14:39:16] 2.6.32-41-server and rebooting [14:40:14] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:40:44] PROBLEM dpkg-check is now: CRITICAL on deployment-bastion i-000002bd output: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:41:29] done [14:41:52] RECOVERY Puppet freshness is now: OK on deployment-bastion i-000002bd output: puppet ran at Thu Jun 28 14:41:33 UTC 2012 [14:45:37] RECOVERY dpkg-check is now: OK on deployment-bastion i-000002bd output: All packages OK [14:46:18] RECOVERY Current Load is now: OK on deployment-apache31 i-000002d4 output: OK - load average: 4.67, 4.12, 3.09 [14:46:18] RECOVERY Current Users is now: OK on deployment-apache31 i-000002d4 output: USERS OK - 2 users currently logged in [14:50:07] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [14:50:07] RECOVERY Disk Space is now: OK on deployment-apache31 i-000002d4 output: DISK OK [14:50:07] RECOVERY Total Processes is now: OK on deployment-apache31 i-000002d4 output: PROCS OK: 123 processes [14:50:36] !log integration created geoip-on-labs Lucid instance to find out if geoip puppet class apply cleanly on labs [14:50:38] Logged the message, Master [14:53:34] RECOVERY Puppet freshness is now: OK on deployment-apache31 i-000002d4 output: puppet ran at Thu Jun 28 14:53:33 UTC 2012 [14:55:39] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [14:57:04] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [14:57:04] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:59:23] RECOVERY dpkg-check is now: OK on deployment-apache31 i-000002d4 output: All packages OK [14:59:30] paravoid: I'm using hp cloud for someone and I got this notice "We have scheduled maintenance for July 5, 2012 10:00 am CDT - 10:00 pm CDT which will cause the instance listed to be inaccessible during the noted window." 
[14:59:45] paravoid: maybe we can just schedule 12 hour maintenance windows any time we want [15:01:45] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [15:01:45] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output [15:02:01] LOL! [15:02:04] 12 hours?! [15:02:28] at GRNET we couldn't have maint windows more than 2h in general and over 1h per service [15:02:43] and that was a non-profit [15:02:46] wtf [15:03:31] yes [15:03:44] PROBLEM Current Load is now: CRITICAL on geoip-on-labs i-000002f4 output: Connection refused by host [15:04:01] this was my reply: ".... A 12 hour maintenance period? There's no way you guys can live-migrate the instance to another host or something?" [15:04:17] they have no internal ds [15:04:19] *dns [15:04:24] PROBLEM Current Users is now: CRITICAL on geoip-on-labs i-000002f4 output: Connection refused by host [15:04:26] they have no friendly instance names [15:05:00] what is it based on? [15:05:02] openstack? [15:05:04] PROBLEM Disk Space is now: CRITICAL on geoip-on-labs i-000002f4 output: Connection refused by host [15:05:11] yep [15:05:22] I know there's DNS support in openstack ;) [15:05:28] I *know* it. heh [15:05:44] PROBLEM Free ram is now: CRITICAL on geoip-on-labs i-000002f4 output: Connection refused by host [15:05:45] they use some custom dashboard that isn't nearly as good as horizon [15:06:22] that service is driving me insane [15:06:53] I think they are still in beta, though, so maybe I should cut them some slack :D [15:06:54] PROBLEM Total Processes is now: CRITICAL on geoip-on-labs i-000002f4 output: Connection refused by host [15:07:34] PROBLEM dpkg-check is now: CRITICAL on geoip-on-labs i-000002f4 output: Connection refused by host [15:08:57] hashar: you're reading ops@, right? 
[15:09:04] PROBLEM Disk Space is now: CRITICAL on deployment-apache31 i-000002d4 output: Connection refused by host [15:09:04] PROBLEM Total Processes is now: CRITICAL on deployment-apache31 i-000002d4 output: Connection refused by host [15:09:14] hashar: see my vac mail [15:09:16] PROBLEM Current Load is now: CRITICAL on deployment-apache31 i-000002d4 output: Connection refused by host [15:09:17] PROBLEM Current Users is now: CRITICAL on deployment-apache31 i-000002d4 output: Connection refused by host [15:10:52] crap [15:10:59] I'm going to need to set up virt6-8 then [15:11:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:11:09] well, they're basically done anyway [15:11:44] RECOVERY Free ram is now: OK on deployment-apache31 i-000002d4 output: OK: 91% free memory [15:11:54] RECOVERY Total Processes is now: OK on geoip-on-labs i-000002f4 output: PROCS OK: 77 processes [15:12:01] paravoid: have fun out there. let me know how the cloud track goes. [15:12:34] RECOVERY dpkg-check is now: OK on geoip-on-labs i-000002f4 output: All packages OK [15:12:44] well, I'd like to see it [15:12:56] so, if we need to move it in the next two weeks, I understand [15:13:10] but could you please leave e.g. virt8? [15:13:20] paravoid: I do get ops mails . 
Definitely enjoy the debian stuff :-] It is great to know you are going there [15:13:24] so I can get to do one and see how it goes [15:13:44] RECOVERY Current Load is now: OK on geoip-on-labs i-000002f4 output: OK - load average: 0.13, 1.20, 1.27 [15:13:50] hm [15:13:56] well [15:14:01] we need to move 150 vms [15:14:10] I guess two boxes can take 75 each [15:14:24] RECOVERY Current Users is now: OK on geoip-on-labs i-000002f4 output: USERS OK - 1 users currently logged in [15:14:38] it'll be interesting to deal with the network node [15:14:48] I'd like to see how migrations work too [15:15:00] but I guess I can do one later, between virt6 and virt7 :) [15:15:04] RECOVERY Disk Space is now: OK on geoip-on-labs i-000002f4 output: DISK OK [15:15:10] heh [15:15:16] anyway, a real shame that it had to come to that :/ [15:15:25] based on the docs, I have a feeling it's cold migration [15:15:40] well, we waited on hardware forever [15:15:45] yeah :/ [15:15:50] PROBLEM Free ram is now: UNKNOWN on geoip-on-labs i-000002f4 output: NRPE: Unable to read output [15:16:40] and now we're back to waiting on hardware [15:16:56] :/ [15:22:04] paravoid: forget about me during your two weeks trip. If I am blocked I can just ask Daniel or use puppetmaster::self [15:22:18] paravoid: thanks for all you did so far and enjoy the trip! [15:22:32] (trying to be nice with my limited english) [15:26:43] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [15:32:29] thanks :) [15:42:16] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:42:16] let's crash beta once more [15:47:20] !log deployment-prep updating mediawiki-config to latest master [15:47:21] Logged the message, Master [15:51:56] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:54:24] I am doomed [15:54:29] got 404 now :) [15:56:45] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [15:57:55] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [16:06:44] success [16:07:01] !log deployment-prep apache conf uses site.conf :-(((( need to puppetize that one day [16:07:03] Logged the message, Master [16:07:05] I am off [16:07:08] see you tomorrow [16:08:09] bye :) [16:13:17] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:25:41] Ryan_Lane: btw, what did you do with gluster? [16:25:48] nothing [16:25:48] you said no downtime needed after all? [16:25:52] yeah [16:25:55] I'm going to re-schedule it [16:26:37] I need to double check, though [16:26:49] because who knows, maybe they made the incompatible change after our beta [16:28:28] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [16:32:55] Looks like someone broke the tollserver homepage lol... [16:32:59] tool* [16:34:45] And Daniel pings out :P [16:34:49] It's maintenance day today [16:34:58] Ahh, makes sense [16:35:07] Looks like they are updating their mediawiki [16:35:31] Your ability to read error messages still amazes me [16:35:31] :P [16:35:44] Thanks Damianz... 
thanks [16:36:07] Always welcome [16:36:08] :DS [16:43:17] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:58:35] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [17:07:39] hi all, I'm looking to set up a new labs instance for running global development analytics [17:07:57] i'm under the impression that I have sys admin status [17:08:05] and so should be able to create a new instance [17:08:25] but I don't see any interface for this on my "manage instances" tab [17:08:51] which, according to https://labsconsole.wikimedia.org/wiki/Help:Instances, is the place to start [17:08:58] any help would be appreciated [17:13:09] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [17:13:27] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:21:08] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 MB (5% inode=57%): [17:25:08] erosen, the link is quite obvious... [17:25:33] i would have expected that [17:25:41] so either, I'm having trouble finding it [17:25:44] let me look [17:25:46] or it's maybe not showing up [17:25:58] assuming it is the first case [17:26:04] where exactly on the page is it [17:26:07] go to https://labsconsole.wikimedia.org/wiki/Special:NovaInstance [17:26:15] click the project and press submit [17:26:28] then in the project header, you have links toggle and Add instance [17:26:59] the direct link would be https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=create&project=analytics [17:27:18] (assuming it is called analytics) [17:27:41] 06/28/2012 - 17:27:41 - Creating a home directory for mfarag at /export/keys/mfarag [17:27:42] aha [17:28:02] it appears I am not a sysadmin after all [17:28:13] i get the error: "You must be a member of the sysadmin role to perform this action." [17:28:13] you got an error? 
[17:28:27] seems so [17:28:41] 06/28/2012 - 17:28:40 - Updating keys for mfarag at /export/keys/mfarag [17:28:52] but when I go to manage global roles [17:29:03] i see my username as a member of the sysadmin role [17:29:41] 06/28/2012 - 17:29:40 - Creating a home directory for bachsau at /export/keys/bachsau [17:30:02] do you happen to know who I should talk to about this? [17:30:17] assuming they're not on this channel already [17:30:38] 06/28/2012 - 17:30:38 - Updating keys for bachsau at /export/keys/bachsau [17:30:38] 06/28/2012 - 17:30:38 - Updating keys for ckepper at /export/keys/ckepper [17:30:54] what's your username? [17:31:02] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [17:31:07] talk with Ryan_Lane or mutante [17:31:17] erosen [17:31:30] thanks [17:31:37] everyone is in the global role [17:31:48] it's a weirdness with how roles work in openstack in this version [17:31:53] that'll change in the next release [17:32:06] that said, if you aren't in the sysadmin role in a project, you'll need a sysadmin to do so [17:32:11] in that project [17:32:20] i see [17:32:45] i was hoping to have a separate project for global development [17:32:59] which I realize means I need to go down the project creation route [17:33:22] is that something i can help myself to [17:39:48] erosen: can i help? [17:40:24] perhaps [17:40:33] maybe we can take this to wikimedia-analytics [17:43:32] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:53:24] erosen: I thought global dev already had a project [17:54:32] there is a Globaleducation project, but nothing else [17:54:37] and nothing I've heard of [17:54:51] what would you be doing in the globaldev project? 
[17:54:57] I don't think I need to create a project just yet [17:55:11] well, if you need to, just let me know :) [17:55:11] the idea is to host a special purpose report card-like site [17:55:17] heh [17:55:23] don't we already have some of those? :) [17:55:32] hmm [17:55:36] it's always possible [17:55:41] which are you thinking of? [17:55:49] analytics has reportcard.wmflabs.org [17:56:36] yeah, the idea is to make a sister site [17:56:49] or at least a sandbox for global development-specific metrics [18:03:29] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [18:10:20] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 21% free memory [18:11:20] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 2.97, 4.51, 3.12 [18:13:57] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:16:17] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [18:16:17] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 1.20, 2.70, 2.72 [18:18:17] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [18:18:29] http://ee-prototype.wmflabs.org/ is running at a snail's pace [18:22:36] do you have apc enabled? [18:22:38] memcache? [18:22:49] also, please don't use the labs logo in the project :) [18:23:01] legal will dislike it quite a bi [18:23:02] *bit [18:23:30] are jobs disabled in the settings, and set to run via a cron? 
[18:24:15] heh, I'll investigate all of those things after our deploy is finished [18:24:22] no time to right now [18:24:40] labs itself is likely a little slow right now [18:24:54] people have again gone on an instance launching spree [18:34:01] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [18:36:39] 06/28/2012 - 18:36:38 - Creating a home directory for adminxor at /export/keys/adminxor [18:37:40] 06/28/2012 - 18:37:39 - Updating keys for adminxor at /export/keys/adminxor [18:43:24] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [18:44:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:56:40] 06/28/2012 - 18:56:40 - Updating keys for mfarag at /export/keys/mfarag [18:56:52] Hi all! I'm new to this channel. A few days back I was in #wikimedia-tech and I expressed my interest to contribute to Wikimedia Foundation as a volunteer/freelance sysadmin. I was told that OPS/monitoring team would need a lot of help and was directed to this channel. Eventually, I got my login ID setup for wikimedia-labs. I was wondering if someone could help me out how to go from here. [18:56:59] Many Thanks. [18:57:33] Any good at baking? [18:57:40] I feel like chocolate chip cookies [18:58:23] adminxor: welcome! well, you already have an account the bot just told us before you joined. so that was a step [18:58:40] Also basically whatever you want to work on you pretty much can, most stuff used in production can be contributed to via gerrit (and poking people), labs provides resources for testing/tweaking/playing/running stuff. 
[18:58:57] * Damianz is sure multiple people around here want help with stuff [18:59:03] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.90, 4.36, 5.00 [18:59:16] adminxor: next step is probably the "Git" page on labsconsole and set up once, try cloning from repo [19:00:41] 06/28/2012 - 19:00:41 - Updating keys for mfarag at /export/keys/mfarag [19:01:22] mutante: Thank you [19:01:27] adminxor: then check out if there is a specific project you are interested in and already exists, if yes, ask project admins to become a member, if not, suggest a project [19:01:35] Damianz: Thank you too [19:01:52] Okay [19:02:09] ohai [19:04:02] Well, I was actually kinda lost trying to figure out on what to do. But you surely helped me. Let me browse through the projects. [19:04:07] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [19:04:39] 06/28/2012 - 19:04:39 - Updating keys for mfarag at /export/keys/mfarag [19:05:39] adminxor: eh yea, admittedly it may be hard to understand what all projects are about from just the names, but there are also extended wiki pages and logs for them that may tell you more. or feel free to ask. i need to run though for the moment. [19:06:28] adminxor: and note the "project filter" to filter which you are seeing .. bbl [19:07:12] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.89, 5.50, 5.28 [19:08:20] great! 
[19:14:11] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:34:08] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [19:37:28] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.89, 4.52, 4.93 [19:44:16] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:55:08] PROBLEM host: labs-build1 is DOWN address: i-0000006b CRITICAL - Host Unreachable (i-0000006b) [19:59:28] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 MB (5% inode=57%): [20:01:02] RECOVERY host: labs-build1 is UP address: i-0000006b PING OK - Packet loss = 0%, RTA = 11.05 ms [20:04:24] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [20:04:44] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:05:29] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [20:07:57] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 14.49, 29.84, 15.91 [20:09:11] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [20:09:11] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [20:11:18] PROBLEM Current Load is now: WARNING on wikistats-01 i-00000042 output: WARNING - load average: 5.62, 7.83, 6.53 [20:12:58] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 0.86, 11.03, 11.55 [20:14:59] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [20:16:02] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [20:26:06] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 20% free memory [20:27:57] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 2.61, 1.60, 4.89 [20:30:27] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.08, 5.73, 5.20 [20:39:11] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [20:39:41] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:45:01] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [20:49:00] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [21:01:16] RECOVERY Current Load is now: OK on wikistats-01 i-00000042 output: OK - load average: 0.72, 2.97, 5.00 [21:09:50] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [21:15:15] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:19:27] PROBLEM Current Load is now: WARNING on wikistats-01 i-00000042 output: WARNING - load average: 3.70, 5.52, 5.29 [21:23:47] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:37] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [21:31:14] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.52, 4.47, 4.93 [21:34:42] 06/28/2012 - 21:34:41 - Creating a home directory for kareneddy at /export/keys/kareneddy [21:35:40] 06/28/2012 - 21:35:40 - Updating keys for kareneddy at /export/keys/kareneddy [21:40:09] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [21:45:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:48:55] PROBLEM Current Users is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:48:55] PROBLEM Current Load is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:48:55] PROBLEM dpkg-check is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:49:58] PROBLEM Free ram is now: UNKNOWN on incubator-bot1 i-00000251 output: NRPE: Call to fork() failed [21:51:04] PROBLEM dpkg-check is now: UNKNOWN on incubator-bot1 i-00000251 output: NRPE: Call to fork() failed [21:52:09] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:53:34] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:53:34] PROBLEM Current Users is now: UNKNOWN on incubator-bot1 i-00000251 output: NRPE: Call to fork() failed [21:53:44] RECOVERY Current Users is now: OK on psm-precise i-000002f2 output: USERS OK - 0 users currently logged in [21:53:44] RECOVERY Current Load is now: OK on psm-precise i-000002f2 output: OK - load average: 1.31, 4.16, 2.69 [21:53:44] RECOVERY dpkg-check is now: OK on psm-precise i-000002f2 output: All packages OK [21:54:30] PROBLEM Disk Space is now: UNKNOWN on incubator-bot1 i-00000251 output: NRPE: Call to fork() failed [21:56:04] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:58:34] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 152 processes [21:58:34] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:59:24] PROBLEM Disk Space is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:59:24] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[22:00:14] PROBLEM Puppet freshness is now: CRITICAL on gerrit i-000000ff output: Puppet has not run in last 20 hours [22:00:24] PROBLEM SSH is now: CRITICAL on incubator-bot1 i-00000251 output: Server answer: [22:02:04] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 140 processes [22:03:34] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 0.78, 1.05, 0.87 [22:03:34] RECOVERY Current Users is now: OK on incubator-bot1 i-00000251 output: USERS OK - 0 users currently logged in [22:04:24] RECOVERY Disk Space is now: OK on incubator-bot1 i-00000251 output: DISK OK [22:04:24] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 36% free memory [22:05:34] RECOVERY SSH is now: OK on incubator-bot1 i-00000251 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:06:04] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK [22:10:44] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [22:15:44] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:25:49] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: DISK CRITICAL - free space: / 37 MB (2% inode=57%): [22:30:44] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 MB (5% inode=57%): [22:37:24] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [22:40:44] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [22:45:44] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:58:54] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[23:03:29] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 156 processes [23:11:09] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [23:16:09] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:29:43] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [23:29:50] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [23:34:41] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [23:41:11] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [23:46:11] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:49:37] how hard would it be to get nice mysql statistics from a labs instance? [23:49:49] e.g. common queries or slow queries, stuff like that.