[00:11:49] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 3.25, 3.87, 4.85 [00:14:09] PROBLEM Puppet freshness is now: CRITICAL on precise-test i-00000231 output: Puppet has not run in last 20 hours [00:18:39] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [00:20:39] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [00:20:39] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [00:20:39] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [00:46:29] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.10, 5.24, 5.09 [00:48:39] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [00:50:39] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [00:50:39] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [00:50:39] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [00:51:29] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.83, 4.86, 4.97 [01:18:39] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [01:20:39] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [01:20:39] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [01:20:39] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [01:48:39] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [01:50:39] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [01:50:39] 
PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [01:50:39] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [02:18:39] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [02:20:39] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [02:20:39] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [02:20:39] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [02:48:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [02:50:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [02:50:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [02:50:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [02:53:20] 06/04/2012 - 02:53:20 - Updating keys for laner at /export/home/deployment-prep/laner [02:56:22] RECOVERY Puppet freshness is now: OK on pybal-precise i-00000289 output: puppet ran at Mon Jun 4 02:56:12 UTC 2012 [03:18:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [03:20:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [03:20:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [03:20:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [03:37:32] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory [03:40:42] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 12% free memory [03:43:02] @add #wm-bot [03:45:42] PROBLEM Free ram is 
now: WARNING on utils-abogott i-00000131 output: Warning: 15% free memory [03:48:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [03:50:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [03:50:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [03:50:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [03:52:12] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory [03:57:32] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 3% free memory [04:00:42] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory [04:00:42] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [04:02:32] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 96% free memory [04:10:42] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:10:42] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [04:12:12] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory [04:17:12] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory [04:18:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [04:20:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [04:20:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [04:20:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [04:48:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [04:50:42] PROBLEM host: ganglia-test2 
is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [04:50:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [04:50:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [05:18:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [05:20:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [05:20:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [05:20:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [05:26:52] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [05:34:52] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 71 MB (5% inode=58%): [05:42:12] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache23 i-00000270 output: Puppet has not run in last 20 hours [05:48:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [05:50:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [05:50:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [05:50:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [06:18:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [06:20:42] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [06:20:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [06:20:42] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [06:35:52] PROBLEM Disk Space is now: CRITICAL on ganglia-test4 
i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:35:52] PROBLEM Current Users is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:24] PROBLEM Total Processes is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:49] PROBLEM Current Load is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:49] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:49] PROBLEM Current Load is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:49] PROBLEM Total Processes is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:14] PROBLEM Disk Space is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:14] PROBLEM Free ram is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:14] PROBLEM Current Users is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:14] PROBLEM dpkg-check is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:51] PROBLEM dpkg-check is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:51] PROBLEM Current Users is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:39:51] PROBLEM Disk Space is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:01] PROBLEM Current Load is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:01] PROBLEM Free ram is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:40:01] PROBLEM Total Processes is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:44] RECOVERY Disk Space is now: OK on ganglia-test4 i-000002a2 output: DISK OK [06:40:44] RECOVERY Current Users is now: OK on ganglia-test4 i-000002a2 output: USERS OK - 0 users currently logged in [06:41:09] PROBLEM Free ram is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:41:15] PROBLEM dpkg-check is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:41:35] RECOVERY Total Processes is now: OK on maps-test2 i-00000253 output: PROCS OK: 90 processes [06:42:05] PROBLEM Total Processes is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:42:35] RECOVERY Current Load is now: OK on maps-test2 i-00000253 output: OK - load average: 0.94, 4.48, 3.02 [06:42:36] RECOVERY Current Users is now: OK on maps-test2 i-00000253 output: USERS OK - 0 users currently logged in [06:42:36] RECOVERY Current Load is now: OK on ganglia-test4 i-000002a2 output: OK - load average: 2.31, 5.42, 3.31 [06:42:36] RECOVERY Total Processes is now: OK on ganglia-test4 i-000002a2 output: PROCS OK: 193 processes [06:43:18] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [06:43:46] RECOVERY Current Users is now: OK on e3 i-00000291 output: USERS OK - 0 users currently logged in [06:43:46] RECOVERY Disk Space is now: OK on maps-test2 i-00000253 output: DISK OK [06:43:46] RECOVERY Free ram is now: OK on maps-test2 i-00000253 output: OK: 94% free memory [06:43:46] RECOVERY dpkg-check is now: OK on maps-test2 i-00000253 output: All packages OK [06:44:39] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:44:39] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:39] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:39] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:39] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.40, 6.50, 5.50 [06:46:14] PROBLEM Total Processes is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:46:39] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:46:55] PROBLEM Free ram is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:46:56] PROBLEM Free ram is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:46:56] PROBLEM Current Load is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:14] PROBLEM dpkg-check is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:14] PROBLEM Disk Space is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:28] PROBLEM Total Processes is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:34] PROBLEM Disk Space is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:34] PROBLEM Disk Space is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:34] PROBLEM Current Load is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:49:34] PROBLEM Current Users is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:34] PROBLEM Free ram is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:34] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:41] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [06:50:18] PROBLEM Total Processes is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:38] RECOVERY Total Processes is now: OK on zeromq1 i-000002b7 output: PROCS OK: 87 processes [06:50:58] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [06:50:58] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [06:50:58] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [06:50:58] RECOVERY Free ram is now: OK on e3 i-00000291 output: OK: 94% free memory [06:50:58] RECOVERY dpkg-check is now: OK on e3 i-00000291 output: All packages OK [06:51:19] PROBLEM Current Load is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:51:19] PROBLEM Total Processes is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:33] RECOVERY Total Processes is now: OK on e3 i-00000291 output: PROCS OK: 91 processes [06:53:41] RECOVERY Disk Space is now: OK on e3 i-00000291 output: DISK OK [06:53:54] PROBLEM dpkg-check is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:33] PROBLEM Current Load is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:54:33] PROBLEM Current Users is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:33] PROBLEM Current Users is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:33] PROBLEM Disk Space is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:33] PROBLEM Disk Space is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:33] PROBLEM dpkg-check is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:34] PROBLEM Total Processes is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:40] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: (Service Check Timed Out) [06:55:00] PROBLEM Free ram is now: CRITICAL on ganglia-test4 i-000002a2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:56:10] PROBLEM Current Load is now: CRITICAL on mwreview-test6 i-000002b9 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:56:10] PROBLEM Current Load is now: CRITICAL on deployment-bastion i-000002bd output: CHECK_NRPE: Socket timeout after 10 seconds. [06:56:10] PROBLEM Current Users is now: CRITICAL on deployment-bastion i-000002bd output: CHECK_NRPE: Socket timeout after 10 seconds. [06:56:52] RECOVERY Disk Space is now: OK on pediapress-ocg2 i-00000234 output: DISK OK [06:57:21] PROBLEM SSH is now: CRITICAL on mobile-wlm i-000002bc output: No route to host [06:57:33] PROBLEM Disk Space is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:57:33] PROBLEM Current Load is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:57:33] PROBLEM Current Users is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:57:33] PROBLEM Total Processes is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:57:41] PROBLEM Free ram is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:58:29] PROBLEM Disk Space is now: WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 3589 MB (20% inode=77%): [06:58:29] RECOVERY dpkg-check is now: OK on ganglia-test4 i-000002a2 output: All packages OK [06:58:48] RECOVERY Disk Space is now: OK on fr-wiki-db-precise i-0000023e output: DISK OK [06:58:49] RECOVERY Current Load is now: OK on fr-wiki-db-precise i-0000023e output: OK - load average: 6.06, 6.74, 4.67 [06:58:49] RECOVERY Current Users is now: OK on fr-wiki-db-precise i-0000023e output: USERS OK - 0 users currently logged in [06:58:49] RECOVERY Free ram is now: OK on fr-wiki-db-precise i-0000023e output: OK: 83% free memory [06:58:49] RECOVERY dpkg-check is now: OK on fr-wiki-db-precise i-0000023e output: All packages OK [06:59:42] RECOVERY Free ram is now: OK on ganglia-test4 i-000002a2 output: OK: 79% free memory [06:59:52] RECOVERY Disk Space is now: OK on incubator-bot0 i-00000296 output: DISK OK [06:59:52] RECOVERY Current Load is now: OK on incubator-bot0 i-00000296 output: OK - load average: 2.41, 3.65, 4.07 [06:59:52] RECOVERY Current Users is now: OK on incubator-bot0 i-00000296 output: USERS OK - 0 users currently logged in [06:59:52] RECOVERY Free ram is now: OK on incubator-bot0 i-00000296 output: OK: 87% free memory [06:59:52] RECOVERY Total Processes is now: OK on incubator-bot0 i-00000296 output: PROCS OK: 102 processes [06:59:57] RECOVERY dpkg-check is now: OK on incubator-bot0 i-00000296 output: All packages OK [07:00:08] PROBLEM Total Processes is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:01:01] RECOVERY Current Load is now: OK on deployment-bastion i-000002bd output: OK - load average: 1.33, 3.11, 2.10 [07:01:01] RECOVERY Current Users is now: OK on deployment-bastion i-000002bd output: USERS OK - 0 users currently logged in [07:01:01] RECOVERY Current Load is now: OK on mwreview-test6 i-000002b9 output: OK - load average: 0.11, 1.27, 0.94 [07:01:14] PROBLEM dpkg-check is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [07:01:14] PROBLEM Current Load is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:01:14] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:01:14] PROBLEM Disk Space is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [07:01:14] PROBLEM Current Users is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:16] PROBLEM Disk Space is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:16] PROBLEM Free ram is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:16] PROBLEM dpkg-check is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:27] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:47] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:47] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:02:57] RECOVERY Total Processes is now: OK on ve-nodejs i-00000245 output: PROCS OK: 87 processes [07:03:35] PROBLEM Current Load is now: WARNING on deployment-jobrunner05 i-0000028c output: WARNING - load average: 6.89, 7.24, 7.46 [07:04:25] RECOVERY Current Users is now: OK on build-precise1 i-00000273 output: USERS OK - 0 users currently logged in [07:04:25] RECOVERY Current Load is now: OK on build-precise1 i-00000273 output: OK - load average: 5.62, 5.89, 4.65 [07:04:25] RECOVERY Disk Space is now: OK on build-precise1 i-00000273 output: DISK OK [07:04:25] RECOVERY Total Processes is now: OK on build-precise1 i-00000273 output: PROCS OK: 92 processes [07:04:31] RECOVERY dpkg-check is now: OK on build-precise1 i-00000273 output: All packages OK [07:04:45] RECOVERY Disk Space is now: OK on deployment-jobrunner05 i-0000028c output: DISK OK [07:04:45] RECOVERY Current Users is now: OK on deployment-jobrunner05 i-0000028c output: USERS OK - 0 users currently logged in [07:04:45] RECOVERY dpkg-check is now: OK on deployment-jobrunner05 i-0000028c output: All packages OK [07:04:45] RECOVERY Free ram is now: OK on deployment-jobrunner05 i-0000028c output: OK: 87% free memory [07:06:02] RECOVERY dpkg-check is now: OK on en-wiki-db-precise i-0000023c output: All packages OK [07:06:11] RECOVERY Disk Space is now: OK on en-wiki-db-precise i-0000023c output: DISK OK [07:06:11] RECOVERY Current Users is now: OK on en-wiki-db-precise i-0000023c output: USERS OK - 0 users currently logged in [07:06:31] RECOVERY Disk Space is now: OK on zeromq1 i-000002b7 output: DISK OK [07:06:31] RECOVERY dpkg-check is now: OK on zeromq1 i-000002b7 output: All packages OK [07:06:41] PROBLEM Current Load is now: WARNING on fr-wiki-db-precise i-0000023e output: WARNING - load average: 6.76, 6.40, 5.24 [07:06:51] RECOVERY SSH is now: OK on mobile-wlm i-000002bc output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [07:06:51] RECOVERY Free ram is now: OK on build-precise1 i-00000273 
output: OK: 91% free memory [07:06:51] RECOVERY Disk Space is now: OK on tw-next i-0000027e output: DISK OK [07:06:51] RECOVERY Current Users is now: OK on tw-next i-0000027e output: USERS OK - 0 users currently logged in [07:06:51] RECOVERY Current Load is now: OK on tw-next i-0000027e output: OK - load average: 2.44, 4.49, 3.41 [07:06:52] RECOVERY Total Processes is now: OK on tw-next i-0000027e output: PROCS OK: 76 processes [07:06:58] RECOVERY Free ram is now: OK on tw-next i-0000027e output: OK: 84% free memory [07:07:21] RECOVERY dpkg-check is now: OK on pediapress-ocg2 i-00000234 output: All packages OK [07:09:51] RECOVERY Total Processes is now: OK on fr-wiki-db-precise i-0000023e output: PROCS OK: 80 processes [07:11:51] RECOVERY Current Load is now: OK on zeromq1 i-000002b7 output: OK - load average: 0.17, 2.43, 3.78 [07:11:51] RECOVERY Free ram is now: OK on zeromq1 i-000002b7 output: OK: 88% free memory [07:12:31] RECOVERY Current Users is now: OK on pediapress-ocg2 i-00000234 output: USERS OK - 0 users currently logged in [07:12:31] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 91% free memory [07:13:01] PROBLEM Current Load is now: WARNING on mobile-wlm i-000002bc output: WARNING - load average: 0.03, 2.79, 5.25 [07:18:01] RECOVERY Current Load is now: OK on mobile-wlm i-000002bc output: OK - load average: 0.00, 1.02, 3.80 [07:20:51] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [07:21:01] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [07:21:01] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [07:21:01] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [07:23:31] RECOVERY Current Load is now: OK on deployment-jobrunner05 i-0000028c output: OK - load average: 3.26, 3.61, 4.79 [07:24:51] RECOVERY Current Load is now: OK on 
bots-sql2 i-000000af output: OK - load average: 2.38, 3.53, 4.82 [07:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [07:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [07:51:05] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [07:51:05] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [08:06:54] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 4.66, 5.10, 5.03 [08:11:54] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 3.77, 4.36, 4.74 [08:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [08:21:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [08:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [08:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [08:32:54] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK [08:40:54] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 76 MB (5% inode=52%): [08:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [08:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [08:51:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [08:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [09:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [09:21:04] PROBLEM host: aggregator-test3 is DOWN address: 
i-00000293 CRITICAL - Host Unreachable (i-00000293) [09:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [09:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [09:42:54] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.07, 5.99, 5.41 [09:47:14] PROBLEM Puppet freshness is now: CRITICAL on blamemaps-m1small i-000002a1 output: Puppet has not run in last 20 hours [09:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [09:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [09:51:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [09:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [10:15:14] PROBLEM Puppet freshness is now: CRITICAL on precise-test i-00000231 output: Puppet has not run in last 20 hours [10:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [10:21:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [10:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [10:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [10:27:54] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.34, 4.19, 4.97 [10:40:54] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.50, 5.74, 5.24 [10:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [10:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [10:51:04] PROBLEM host: 
aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [10:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [11:07:54] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 6% free memory [11:17:54] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 2% free memory [11:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [11:21:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [11:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [11:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [11:22:54] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 6% free memory [11:30:54] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.07, 4.37, 4.82 [11:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [11:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [11:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [11:51:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [12:02:34] hey Ryan_Lane [12:02:38] howdy [12:02:54] are you still in Germany? :D [12:02:59] yep [12:03:01] I never saw you so early here [12:03:02] till the 29th [12:03:03] ah right [12:03:07] wow [12:03:09] why? [12:03:14] why not? :) [12:03:25] you have vacation or is there some huge wikimedia thing? 
[12:03:38] just working from here [12:03:41] for the hell of it [12:03:41] ah ok [12:03:52] I thought you are like merging toolserver and labs :D [12:03:56] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.80, 5.75, 5.33 [12:04:04] or something like that [12:05:01] well, labs will be an additional environment to toolserver [12:05:12] and likely toolserver users will eventually switch [12:11:36] btw Ryan_Lane is there any update on ipv6 on production? [12:11:51] ask in -operations [12:11:55] Erik sent a message that you are going to enable it soon, but I have no idea if anything happened or not [12:12:02] k [12:12:09] I haven't been working on it much [12:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [12:21:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [12:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [12:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [12:21:36] Ryan_Lane: why are these instances down? [12:21:40] did you suspend them? [12:21:50] because I don't think it's possible to shutdown -h on labs [12:21:53] no. ganglia is down because it has bad mount options [12:22:00] aha [12:22:16] the others likely have the same ones [12:22:19] so there boxes are running but somewhere in a boot process waiting for someone [12:22:30] these [12:22:49] I thought they are "down" like powered off [12:23:51] well, someone needs to mount the disks on the virtual host and fix them manually [12:23:59] I think sara will do that next time she is on [12:24:40] ok [12:32:59] is bots also down then Ryan_Lane? 
[12:33:10] my bot hasn't edited since 1 June
[12:33:54] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.64, 4.62, 4.97
[12:34:43] it should be up
[12:35:14] hmm that's odd I can't SSH
[12:35:21] no supported authentication methods available..
[12:35:34] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK
[12:43:34] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 70 MB (5% inode=57%):
[12:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c)
[12:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293)
[12:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250)
[12:51:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d)
[12:55:44] petan|wk: http://labs.wikimedia.beta.wmflabs.org/ should redirect to deployment.wikimedia...
[12:56:10] Thehelpfulone: I know
[12:56:15] hashar did something with that
[12:56:25] Thehelpfulone: your bot is down because cluster was down
[12:57:01] ok, how can I get it to restart by itself, or is it a manual task?
[12:58:52] Ryan_Lane: can you add labsconsole into the interwiki map for wikitech.wikimedia.org or does that go off the general one on Meta?
[13:00:39] umm
[13:00:41] I dunno
[13:00:57] Thehelpfulone: just ssh then start it
[13:01:04] why does it need to be in the interwikimap?
[13:03:07] it's in the normal interwiki map already
[13:03:19] I just can't [[labsconsole:foo|]] it on wikitech
[13:03:26] wikitech is being merged, though
[13:03:48] labsconsole and wikitech will eventually just be wikitech
[13:04:12] can you allow importing from wikitech to labsconsole?
[13:04:14] Thehelpfulone: you already have access there?
[13:04:18] yes
[13:04:21] ah
[13:04:34] I created a couple of templates that you can use to tag, derived from the ones on meta
[13:05:39] Ryan_Lane: http://wikitech.wikimedia.org/edit/Wikitech:Project_to_transfer_content_to_Labs?redlink=1 can you start on something like that if you have an idea from an ops point of view as to what will be moved and what will stay (use http://meta.wikimedia.org/wiki/Meta:MetaProject_to_transfer_content_to_MediaWiki.org for some inspiration if you like)
[13:07:42] * Ryan_Lane nods
[13:07:52] may take a little bit for us to get to that
[13:08:00] wikitech migration wasn't a really high priority
[13:15:54] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 5.93, 6.14, 5.32
[13:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c)
[13:21:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293)
[13:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250)
[13:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d)
[13:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c)
[13:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293)
[13:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250)
[13:51:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d)
[14:10:54] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 4.18, 4.57, 4.93
[14:21:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c)
[14:21:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293)
[14:21:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250)
[14:21:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d)
[14:44:45] chrismcmahon: hey
[14:44:49] omtsh is that guy
[14:44:58] he is now in this chan
[14:45:05] I've been in this chan for ages
[14:45:06] Why?
[14:45:30] chrismcmahon wanted to ask regarding some block you made on some guy from QA...
[14:45:36] huh?
[14:46:04] Just unblock it if it's a legit use
[14:46:06] * user
[14:46:10] I've only blocked spammers
[14:46:11] hi omtsh, there is a candidate for the QA Engineer position at WMF, Alister Scott, who was experimenting with some browser automation on beta labs cluster, you blocked his account for editing with nonsense: http://en.wikipedia.beta.wmflabs.org/wiki/Special:RecentChanges
[14:46:19] Ahhh, yes
[14:46:43] btw chrismcmahon if you tell me your account name I give you some global rights, ok?
[14:46:51] unblocked
[14:46:53] I've spoken with him, he understands the situation, I'd like to get that account unblocked.
[14:47:04] Yeah, I just unblocked it, my aplogies
[14:47:07] omtsh: I'm Cmcmahon everywhere
[14:47:09] * apologies
[14:47:13] chrismcmahon: in fact, he should feel free to do any similar tests, that's what the site is for
[14:47:15] omtsh: not a problem, thanks
[14:48:01] petan|wk: yes, that's something we'll have to figure out, distinguishing legitimate users who might be posting nonsense for a purpose
[14:49:02] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 6.07, 5.92, 5.38
[14:49:06] chrismcmahon: ok, I gave you steward and some other bits, so you should be able to unblock anyone in future
[14:49:17] thanks petan|wk
[14:49:19] yw
[14:50:36] petan|wk: what's the search for all projects?
[14:50:43] on labs I mean, how do I see the full list
[14:51:04] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c)
[14:51:04] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293)
[14:51:04] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250)
[14:51:04] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d)
[14:51:28] Thehelpfulone: there was a link in bz
[14:52:40] https://bugzilla.wikimedia.org/show_bug.cgi?id=37298
[14:54:20] !projectlist is https://labsconsole.wikimedia.org/w/index.php?title=Special%3AAsk&q=[[Resource+Type%3A%3Aproject]]&po=%3F%0D%0A%3FMember%0D%0A%3FDescription&sort_num=&order_num=ASC&eq=yes&p[format]=broadtable&p[limit]=500&p[sort]=&p[order]=&p[offset]=&p[headers]=show&p[mainlabel]=&p[link]=all&p[searchlabel]=%E2%80%A6+further+results&p[intro]=&p[outro]=&p[default]=&p[class]=sortable+wikitable+smwtable&eq=yes
[14:54:20] Key was added
[14:54:26] :)
[14:54:30] short url would be nice
[14:56:09] ok
[14:56:19] !pl is https://labsconsole.wikimedia.org/w/index.php?title=Special:Ask&q=[[Resource+Type%3A%3Aproject]]&p=format%3Dbroadtable%2Fheaders%3Dshow%2Flink%3Dall%2Fsearchlabel%3D%E2%80%A6-20further-20results%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FMember%0A%3FDescription%0A&limit=500&eq=no
[14:56:19]