[00:04:08] PROBLEM Current Load is now: CRITICAL on nova-gsoc1 i-000001de output: CRITICAL - load average: 38.49, 21.44, 8.82 [00:04:28] PROBLEM Current Load is now: WARNING on nova-production1 i-0000007b output: WARNING - load average: 19.87, 13.33, 5.81 [00:08:39] PROBLEM Current Load is now: WARNING on nova-essex-test i-000001f9 output: WARNING - load average: 23.53, 18.63, 8.83 [00:08:40] PROBLEM Current Load is now: WARNING on nova-ldap1 i-000000df output: WARNING - load average: 28.27, 24.24, 12.44 [00:11:58] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 12.86, 15.89, 10.11 [00:14:07] PROBLEM Current Load is now: WARNING on nova-gsoc1 i-000001de output: WARNING - load average: 16.59, 23.43, 17.59 [00:17:10] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [00:18:32] PROBLEM Current Load is now: WARNING on search-test i-000000cb output: WARNING - load average: 9.26, 8.85, 6.16 [00:19:53] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 11.05, 9.34, 6.40 [00:20:44] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:20:56] PROBLEM Current Load is now: WARNING on wikistats-01 i-00000042 output: WARNING - load average: 8.23, 8.00, 6.20 [00:22:27] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:27:04] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output
[00:32:58] PROBLEM Current Load is now: WARNING on dev-solr i-00000152 output: WARNING - load average: 5.39, 5.57, 5.09
[00:48:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[00:54:55] PROBLEM Current Load is now: CRITICAL on nova-ldap1 i-000000df output: CRITICAL - load average: 28.57, 25.12, 19.82
[00:55:16] PROBLEM Current Load is now: CRITICAL on nova-gsoc1 i-000001de output: CRITICAL - load average: 35.84, 29.09, 21.93
[01:02:30] in https://labsconsole.wikimedia.org/wiki/Help:InstanceConfigMediawiki, step 3 mentions "Set the 'labs_mediawiki_hostname' to the fully qualified hostname of your instance." my instance's name is 'pdbhandler-dev'. is that its "fully qualified hostname", or is the fully qualified hostname of the form "pdbhandler-dev.pmtpa.wmflabs"?
[01:04:56] go with the latter
[01:05:20] maybe add a .org?
[01:06:02] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 5.23, 3.92, 3.28
[01:07:11] no .org
[01:09:52] sounds reasonable. here goes.
[01:10:42] PROBLEM Current Load is now: WARNING on nova-ldap1 i-000000df output: WARNING - load average: 15.08, 17.01, 18.96
[01:16:12] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 1.89, 2.50, 2.84
[01:19:36] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[01:23:09] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:06] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [01:35:53] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 3.65, 3.77, 3.17 [01:37:03] PROBLEM Current Load is now: CRITICAL on nova-production1 i-0000007b output: CRITICAL - load average: 29.47, 26.88, 21.89 [01:49:40] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [01:56:47] 07/01/2012 - 01:56:47 - Deleting home directory for johnduhart in project(s): testlabs [02:02:32] PROBLEM Current Load is now: WARNING on nova-production1 i-0000007b output: WARNING - load average: 16.23, 16.44, 19.03 [02:13:09] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [02:17:51] PROBLEM Current Users is now: UNKNOWN on aggregator2 i-000002c0 output: Invalid host name i-000002c0 [02:17:51] PROBLEM Disk Space is now: UNKNOWN on aggregator2 i-000002c0 output: Invalid host name i-000002c0 [02:20:11] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [02:23:02] PROBLEM Current Users is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake. [02:23:02] PROBLEM Disk Space is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake. [02:26:09] PROBLEM dpkg-check is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake. [02:26:18] PROBLEM Current Load is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake. [02:26:18] PROBLEM Total Processes is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake. [02:26:23] PROBLEM Free ram is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[02:28:08] PROBLEM Current Load is now: CRITICAL on nova-production1 i-0000007b output: CRITICAL - load average: 31.03, 25.09, 20.49 [02:31:08] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds. [02:36:03] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 11.79, 14.53, 16.15 [02:40:43] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [02:41:34] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: DISK CRITICAL - free space: / 32 MB (2% inode=57%): [02:45:41] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [02:46:59] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 66 MB (5% inode=57%): [02:50:28] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:03:36] 07/01/2012 - 03:03:36 - User laner may have been modified in LDAP or locally, updating key in project(s): deployment-prep [03:06:44] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:08] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:08] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:14:47] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 15% free memory [03:15:02] PROBLEM Current Load is now: WARNING on translation-memory-2 i-000002d9 output: WARNING - load average: 4.11, 4.93, 5.29 [03:17:42] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 12.93, 15.20, 12.75 [03:20:52] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:20:57] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [03:20:57] PROBLEM Current Users is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:21:39] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:22:47] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CRITICAL - load average: 33.26, 22.37, 18.42 [03:23:03] RECOVERY Current Load is now: OK on translation-memory-2 i-000002d9 output: OK - load average: 4.12, 4.27, 4.77 [03:23:43] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 3.55, 4.40, 5.08 [03:23:58] PROBLEM Current Load is now: WARNING on psm-precise i-000002f2 output: WARNING - load average: 13.81, 13.83, 10.14 [03:25:24] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 9% free memory [03:27:24] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [03:29:13] PROBLEM dpkg-check is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:29:56] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 3.98, 4.00, 4.64 [03:32:21] RECOVERY Current Users is now: OK on psm-precise i-000002f2 output: USERS OK - 0 users currently logged in [03:32:21] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 154 processes [03:33:57] RECOVERY dpkg-check is now: OK on psm-precise i-000002f2 output: All packages OK [03:37:09] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [03:53:44] PROBLEM Current Load is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:54:51] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:55:32] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:55:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:59:00] PROBLEM Current Load is now: WARNING on psm-precise i-000002f2 output: WARNING - load average: 3.86, 5.78, 6.76 [03:59:06] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 12% free memory [04:09:39] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 15% free memory [04:10:39] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory [04:14:19] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [04:19:37] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 17% free memory [04:24:21] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [04:25:59] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [04:32:48] PROBLEM dpkg-check is now: CRITICAL on pdbhandler-dev i-000002f7 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:36:46] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory [04:39:29] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 3.02, 5.90, 5.14 [04:39:49] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:43:14] PROBLEM Current Load is now: WARNING on nova-ldap1 i-000000df output: WARNING - load average: 17.41, 17.89, 18.27 [04:46:39] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [04:50:43] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 2.86, 4.03, 4.54 [04:51:18] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [04:51:18] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [04:51:18] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:51:29] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [04:57:39] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [05:00:02] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 8.66, 13.39, 10.58 [05:00:11] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 2% free memory [05:00:21] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 79% free memory [05:00:31] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:15] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [05:03:15] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [05:04:30] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:09:38] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [05:10:12] PROBLEM Free ram is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused or timed out [05:11:08] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 9.31, 10.54, 11.28 [05:11:08] PROBLEM Current Load is now: WARNING on translation-memory-2 i-000002d9 output: WARNING - load average: 4.17, 4.79, 5.82 [05:11:36] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [05:11:46] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 160 processes [05:13:21] PROBLEM Current Load is now: WARNING on configtest-main i-000002dd output: WARNING - load average: 1.88, 4.49, 5.17 [05:13:39] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:55] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 77 MB (5% inode=57%): [05:14:37] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:14] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:24] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:04] PROBLEM Current Load is now: CRITICAL on nova-ldap1 i-000000df output: CRITICAL - load average: 20.24, 21.73, 20.50 [05:18:28] PROBLEM Current Load is now: WARNING on hugglewiki i-000000aa output: WARNING - load average: 5.72, 6.69, 6.33 [05:18:57] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [05:18:57] PROBLEM Total Processes is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:19:02] PROBLEM Free ram is now: UNKNOWN on signwriting-ase i-000002f5 output: NRPE: Unable to read output [05:20:25] PROBLEM Current Load is now: WARNING on mwreview i-000002ae output: WARNING - load average: 3.56, 5.79, 6.02 [05:20:25] RECOVERY Current Load is now: OK on configtest-main i-000002dd output: OK - load average: 5.05, 3.55, 4.24 [05:22:58] PROBLEM Current Load is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [05:23:55] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 5.39, 7.21, 6.85 [05:23:55] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 89 processes [05:24:00] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in [05:29:32] PROBLEM Current Load is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:31:00] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:32:09] PROBLEM Current Load is now: WARNING on nova-ldap1 i-000000df output: WARNING - load average: 17.77, 17.48, 18.54 [05:32:22] PROBLEM Total Processes is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:32:27] PROBLEM Current Load is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:32:27] PROBLEM Disk Space is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:32:28] PROBLEM Free ram is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:32:28] PROBLEM Current Users is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:34:54] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [05:43:32] PROBLEM SSH is now: CRITICAL on bots-sql2 i-000000af output: CRITICAL - Socket timeout after 10 seconds [05:43:32] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [05:43:48] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 12.08, 12.79, 16.17 [05:44:58] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [05:45:04] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [05:45:04] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [05:48:38] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 47.81, 20.63, 18.16 [05:49:21] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 14% free memory [05:50:32] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [05:52:45] PROBLEM Current Load is now: WARNING on pybal-precise i-00000289 output: WARNING - load average: 6.10, 7.03, 6.48 [05:55:22] RECOVERY Current Load is now: OK on hugglewiki i-000000aa output: OK - load average: 1.22, 2.57, 4.86 [06:03:31] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 5.32, 6.36, 5.91 [06:03:48] PROBLEM Current Load is now: WARNING on wikidata-dev-3 i-00000225 output: WARNING - load average: 7.64, 7.58, 6.86 [06:03:52] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:53] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:03:53] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:53] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:53] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:06:27] RECOVERY Disk Space is now: OK on deployment-apache31 i-000002d4 output: DISK OK [06:06:27] RECOVERY Free ram is now: OK on deployment-apache31 i-000002d4 output: OK: 90% free memory [06:07:05] RECOVERY Total Processes is now: OK on deployment-apache31 i-000002d4 output: PROCS OK: 124 processes [06:07:11] PROBLEM Current Load is now: WARNING on deployment-apache31 i-000002d4 output: WARNING - load average: 1.96, 4.68, 5.96 [06:07:11] RECOVERY Current Users is now: OK on deployment-apache31 i-000002d4 output: USERS OK - 0 users currently logged in [06:07:11] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [06:08:51] RECOVERY Total Processes is now: OK on bots-sql2 i-000000af output: PROCS OK: 93 processes [06:08:59] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 93% free memory [06:10:09] RECOVERY SSH is now: OK on bots-sql2 i-000000af output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [06:10:09] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af output: All packages OK [06:11:21] PROBLEM Total Processes is now: WARNING on incubator-bot2 i-00000252 output: PROCS WARNING: 152 processes [06:11:51] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:11:51] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:11:51] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:11:51] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:11:51] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:12:27] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:12:31] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:12:33] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:12:38] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:13:42] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 1.26, 1.99, 3.78 [06:13:42] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 1.26, 3.64, 4.99 [06:13:42] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 92% free memory [06:13:43] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [06:13:43] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [06:13:43] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 87 processes [06:17:35] RECOVERY Current Load is now: OK on wikidata-dev-3 i-00000225 output: OK - load average: 3.17, 3.49, 4.74 [06:17:36] PROBLEM Current Load is now: WARNING on pdbhandler-dev i-000002f7 output: WARNING - load average: 6.30, 5.84, 5.81 [06:18:20] RECOVERY Current Load is now: OK on deployment-apache31 i-000002d4 output: OK - load average: 1.83, 2.12, 3.56 [06:19:05] PROBLEM Free ram is now: UNKNOWN on pdbhandler-dev i-000002f7 output: NRPE: Unable to read output [06:21:06] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [06:21:06] RECOVERY Disk Space is now: OK on rds 
i-00000207 output: DISK OK [06:21:06] RECOVERY Current Load is now: OK on rds i-00000207 output: OK - load average: 1.35, 3.81, 4.65 [06:21:06] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 94% free memory [06:21:06] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 77 processes [06:21:11] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [06:21:12] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 87% free memory [06:21:12] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 100 processes [06:21:17] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK [06:22:37] PROBLEM Current Load is now: WARNING on reportcard2 i-000001ea output: WARNING - load average: 4.43, 5.54, 6.27 [06:24:34] RECOVERY Current Load is now: OK on pybal-precise i-00000289 output: OK - load average: 1.83, 3.71, 4.73 [06:27:46] PROBLEM Free ram is now: CRITICAL on pdbhandler-dev i-000002f7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:37] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 5.28, 4.88, 5.02 [06:36:57] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:17] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [06:37:27] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:37:42] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:38:13] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: (Service Check Timed Out) [06:39:34] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:59:25] !log deployment-prep rebooting all boxes [06:59:29] Logged the message, Master [07:32:44] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:44] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:52] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:52] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:52] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:52] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:42:42] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [07:43:17] PROBLEM Disk Space is now: WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 3260 MB (18% inode=74%): [07:43:27] PROBLEM Current Load is now: WARNING on reportcard2 i-000001ea output: WARNING - load average: 7.00, 7.33, 9.22 [07:52:42] PROBLEM Current Load is now: CRITICAL on pdbhandler-dev i-000002f7 output: Connection refused or timed out [07:52:47] PROBLEM Current Load is now: CRITICAL on nova-ldap1 i-000000df output: CRITICAL - load average: 17.64, 20.15, 21.06 [07:59:01] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 209 processes [07:59:20] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [08:00:12] PROBLEM Free ram is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused or timed out [08:01:56] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 5.21, 7.16, 7.58 [08:01:56] RECOVERY 
Current Users is now: OK on incubator-bot1 i-00000251 output: USERS OK - 0 users currently logged in [08:01:56] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK [08:01:56] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:08:08] PROBLEM Total Processes is now: WARNING on incubator-bot2 i-00000252 output: PROCS WARNING: 164 processes [08:16:21] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:20:49] PROBLEM Current Load is now: WARNING on gerrit i-000000ff output: WARNING - load average: 3.77, 6.87, 8.01 [08:20:49] PROBLEM Current Load is now: WARNING on hugglewiki i-000000aa output: WARNING - load average: 2.63, 5.21, 7.86 [08:20:54] PROBLEM Current Load is now: CRITICAL on localpuppet2 i-0000029b output: CHECK_NRPE: Socket timeout after 10 seconds. [08:26:29] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 6.76, 5.62, 5.79 [08:28:17] PROBLEM Current Load is now: WARNING on wikidata-dev-3 i-00000225 output: WARNING - load average: 3.63, 4.35, 6.52 [08:28:17] PROBLEM Current Load is now: WARNING on labs-nfs1 i-0000005d output: WARNING - load average: 2.70, 2.91, 5.14 [08:28:17] PROBLEM Current Load is now: WARNING on worker1 i-00000208 output: WARNING - load average: 7.00, 7.14, 7.97 [08:30:52] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:30:52] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:30:53] PROBLEM Disk Space is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:30:53] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:30:55] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:02] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:02] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:02] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:07] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:07] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:07] PROBLEM Current Load is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [08:40:15] PROBLEM Current Load is now: CRITICAL on nova-essex-test i-000001f9 output: CRITICAL - load average: 31.22, 20.48, 16.39 [08:44:26] RECOVERY Current Load is now: OK on localpuppet2 i-0000029b output: OK - load average: 3.08, 3.63, 4.62 [08:44:26] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 7.10, 7.43, 7.98 [08:44:26] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 4.68, 5.25, 6.06 [08:44:26] RECOVERY Current Load is now: OK on precise-test i-00000231 output: OK - load average: 1.20, 2.66, 4.81 [08:44:26] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in [08:44:26] RECOVERY Disk Space is now: OK on precise-test i-00000231 output: DISK OK [08:44:26] RECOVERY Free ram is now: OK on precise-test i-00000231 output: OK: 80% free memory [08:44:27] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 95 processes [08:44:34] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages 
OK [08:44:45] PROBLEM Current Load is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:45] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:45] PROBLEM Free ram is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:45] PROBLEM Disk Space is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:45] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:50] PROBLEM Current Users is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:50] PROBLEM Disk Space is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:50] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:57] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:45:27] PROBLEM Current Load is now: WARNING on migration1 i-00000261 output: WARNING - load average: 7.57, 6.56, 6.90 [08:46:10] PROBLEM Current Load is now: WARNING on bots-apache1 i-000000b0 output: WARNING - load average: 3.01, 3.75, 5.34 [08:46:10] PROBLEM Current Load is now: WARNING on maps-tilemill1 i-00000294 output: WARNING - load average: 4.23, 5.59, 7.69 [08:46:15] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:20] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:48:31] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [08:48:31] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:48:31] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [08:48:31] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [08:48:50] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:10] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:11] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:11] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:11] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:30] PROBLEM Total Processes is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [08:57:16] RECOVERY Disk Space is now: OK on bots-sql2 i-000000af output: DISK OK [08:57:16] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK [08:57:16] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in [08:57:16] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 88% free memory [08:57:16] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 90 processes [08:58:00] RECOVERY Current Load is now: OK on labs-nfs1 i-0000005d output: OK - load average: 2.69, 3.16, 4.78 [08:58:01] RECOVERY Current Users is now: OK on mobile-testing i-00000271 output: USERS OK - 0 users currently logged in [08:58:01] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 139 processes [08:58:06] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK [08:58:33] PROBLEM Current Load is now: CRITICAL on bots-apache1 i-000000b0 output: CHECK_NRPE: Socket 
timeout after 10 seconds. [08:58:33] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [09:03:31] PROBLEM Current Load is now: WARNING on nova-essex-test i-000001f9 output: WARNING - load average: 10.00, 14.89, 15.31 [09:04:23] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 193 processes [09:04:52] RECOVERY Total Processes is now: OK on configtest-main i-000002dd output: PROCS OK: 109 processes [09:06:02] RECOVERY Current Load is now: OK on gerrit i-000000ff output: OK - load average: 3.06, 3.25, 4.20 [09:06:03] RECOVERY Current Load is now: OK on grail i-000002c6 output: OK - load average: 4.43, 4.09, 4.07 [09:06:03] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 2.53, 4.47, 5.91 [09:06:20] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.03, 0.61, 2.99 [09:06:25] RECOVERY Current Load is now: OK on bots-apache1 i-000000b0 output: OK - load average: 1.25, 2.13, 3.24 [09:06:25] PROBLEM Current Load is now: WARNING on maps-tilemill1 i-00000294 output: WARNING - load average: 0.55, 3.93, 6.06 [09:07:22] RECOVERY Current Users is now: OK on mwreview i-000002ae output: USERS OK - 0 users currently logged in [09:07:22] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK [09:07:22] RECOVERY Free ram is now: OK on mwreview i-000002ae output: OK: 65% free memory [09:07:22] RECOVERY Total Processes is now: OK on mwreview i-000002ae output: PROCS OK: 121 processes [09:07:27] RECOVERY dpkg-check is now: OK on mwreview i-000002ae output: All packages OK [09:10:52] RECOVERY Current Load is now: OK on ganglia-test2 i-00000250 output: OK - load average: 1.72, 2.43, 4.59 [09:16:12] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 4.05, 4.07, 4.91 [09:18:02] PROBLEM host: deployment-apache31 is DOWN address: i-000002d4 CRITICAL - 
Host Unreachable (i-000002d4) [09:18:52] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [09:23:12] RECOVERY host: deployment-apache31 is UP address: i-000002d4 PING OK - Packet loss = 0%, RTA = 0.47 ms [09:29:24] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 4.25, 2.92, 3.95 [09:49:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [09:49:28] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 3.10, 2.59, 2.96 [09:58:33] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 1.64, 3.06, 3.07 [09:59:46] !log integration rebooted psm-precise and integration-apache1 due to leap second bug [09:59:48] Logged the message, Master [10:00:35] !log deployment-prep rebooted apache31 due to leap second bug. Stopped mysql on apache30 which was using 100%CPU [10:00:37] Logged the message, Master [10:20:19] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:21:21] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [10:25:53] PROBLEM Current Load is now: WARNING on nova-ldap1 i-000000df output: WARNING - load average: 18.62, 19.32, 19.37 [10:28:25] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 8.82, 8.94, 8.65 [10:28:25] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 8% free memory [10:28:25] PROBLEM Free ram is now: UNKNOWN on signwriting-ase i-000002f5 output: NRPE: Unable to read output [10:30:06] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 12% free memory [10:33:21] PROBLEM Free ram is now: UNKNOWN on pdbhandler-dev i-000002f7 output: NRPE: Unable to read output [10:33:21] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 70 MB (5% 
inode=57%): [10:42:02] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [10:42:02] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 201 processes [10:44:33] PROBLEM Total Processes is now: WARNING on incubator-bot2 i-00000252 output: PROCS WARNING: 155 processes [10:47:13] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 193 processes [10:50:51] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:59:26] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 3.00, 2.75, 2.99 [11:08:08] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [11:21:12] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [11:43:54] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:48:44] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [11:50:39] PROBLEM Total Processes is now: WARNING on bots-3 i-000000e5 output: PROCS WARNING: 160 processes [11:51:20] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [11:54:21] Bots-3 is slow .. 
not sure what is going on there [11:55:39] RECOVERY Total Processes is now: OK on bots-3 i-000000e5 output: PROCS OK: 145 processes [12:03:39] PROBLEM Total Processes is now: WARNING on bots-3 i-000000e5 output: PROCS WARNING: 155 processes [12:13:57] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 3.45, 4.01, 4.93 [12:21:29] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [12:21:59] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 5.08, 5.02, 5.09 [12:29:29] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [12:44:40] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [12:49:28] RECOVERY dpkg-check is now: OK on fr-wiki-db-precise i-0000023e output: All packages OK [12:51:30] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:17:15] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 3.46, 4.48, 5.00 [13:21:45] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:29:59] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [13:38:08] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 70 MB (5% inode=57%): [13:49:43] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 201 processes [13:52:43] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:54:43] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 193 processes [14:04:24] PROBLEM Total Processes is now: WARNING on bots-3 i-000000e5 output: PROCS WARNING: 151 processes [14:22:54] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable 
(i-000002f0) [14:53:07] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:01:25] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 1% free memory [15:04:25] RECOVERY Total Processes is now: OK on bots-3 i-000000e5 output: PROCS OK: 112 processes [15:06:55] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 51% free memory [15:07:21] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [15:07:21] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [15:12:15] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 8.43, 7.29, 6.28 [15:12:15] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 9% free memory [15:23:10] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:39:05] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [15:47:27] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 70 MB (5% inode=57%): [15:54:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:02:27] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 5.47, 5.45, 5.51 [16:24:10] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:54:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:13:52] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.47, 4.07, 4.86 [17:24:33] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:33:23] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:38:13] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [17:43:12] PROBLEM Current Load is now: WARNING on nova-production1 i-0000007b output: WARNING - load average: 17.45, 16.66, 17.65 [17:43:27] !log deployment-prep manually rebooted most servers [17:43:29] Logged the message, Master [17:43:57] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 12.40, 18.03, 17.57 [17:46:52] PROBLEM Current Load is now: CRITICAL on nova-essex-test i-000001f9 output: CRITICAL - load average: 30.91, 19.63, 15.78 [17:46:52] PROBLEM host: deployment-nfs-memc is DOWN address: i-000000d7 CRITICAL - Host Unreachable (i-000000d7) [17:46:52] PROBLEM host: deployment-sql is DOWN address: i-000000d0 CRITICAL - Host Unreachable (i-000000d0) [17:46:52] PROBLEM host: deployment-dbdump is DOWN address: i-000000d2 CRITICAL - Host Unreachable (i-000000d2) [17:49:22] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [17:50:42] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [17:51:32] PROBLEM Current Load is now: WARNING on nova-essex-test i-000001f9 output: WARNING - load average: 13.45, 19.11, 16.98 [17:51:42] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 4.51, 5.94, 5.35 [17:51:52] PROBLEM dpkg-check is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused by host [17:51:52] PROBLEM Total Processes is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused by host [17:51:52] PROBLEM Current Users is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused by host [17:52:52] PROBLEM SSH is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused [17:52:52] PROBLEM Current Load is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused 
by host [17:53:42] PROBLEM host: deployment-imagescaler01 is DOWN address: i-0000025a CRITICAL - Host Unreachable (i-0000025a) [17:54:42] PROBLEM Disk Space is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused by host [17:54:42] PROBLEM Free ram is now: CRITICAL on deployment-transcoding i-00000105 output: Connection refused by host [17:54:42] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:58:12] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [17:59:06] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 4.17, 4.27, 4.79 [18:04:54] PROBLEM Current Load is now: WARNING on deployment-apache30 i-000002d3 output: WARNING - load average: 17.71, 14.86, 9.12 [18:04:54] RECOVERY host: deployment-cache-upload is UP address: i-00000263 PING OK - Packet loss = 0%, RTA = 0.67 ms [18:06:34] PROBLEM Current Load is now: WARNING on deployment-apache31 i-000002d4 output: WARNING - load average: 20.78, 17.04, 10.26 [18:07:24] RECOVERY host: deployment-sql is UP address: i-000000d0 PING OK - Packet loss = 0%, RTA = 0.74 ms [18:07:25] RECOVERY host: deployment-dbdump is UP address: i-000000d2 PING OK - Packet loss = 0%, RTA = 0.73 ms [18:07:25] RECOVERY host: deployment-nfs-memc is UP address: i-000000d7 PING OK - Packet loss = 0%, RTA = 4.66 ms [18:11:24] RECOVERY host: deployment-feed is UP address: i-00000118 PING OK - Packet loss = 0%, RTA = 4.41 ms [18:11:44] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 4.22, 4.69, 4.94 [18:13:54] RECOVERY host: deployment-imagescaler01 is UP address: i-0000025a PING OK - Packet loss = 0%, RTA = 0.68 ms [18:19:27] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: DISK CRITICAL - free space: / 39 MB (2% inode=57%): [18:24:26] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 
MB (5% inode=57%): [18:25:36] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:26:39] RECOVERY Current Load is now: OK on deployment-apache31 i-000002d4 output: OK - load average: 0.42, 0.88, 4.02 [18:30:27] RECOVERY Current Load is now: OK on deployment-apache30 i-000002d3 output: OK - load average: 0.21, 0.65, 4.22 [18:54:13] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CRITICAL - load average: 30.20, 19.74, 17.56 [18:56:03] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 4.87, 5.58, 5.21 [18:56:13] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:59:13] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 11.86, 16.54, 17.05 [19:09:17] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CRITICAL - load average: 30.63, 21.15, 17.89 [19:26:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:46:07] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 4.73, 4.73, 4.93 [19:56:24] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:56:45] PROBLEM Current Load is now: CRITICAL on nova-essex-test i-000001f9 output: CRITICAL - load average: 31.22, 21.37, 16.71 [20:00:34] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: Critical: 5% free memory [20:01:45] PROBLEM Current Load is now: WARNING on nova-essex-test i-000001f9 output: WARNING - load average: 10.70, 15.99, 15.89 [20:05:37] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 9% free memory [20:06:12] PROBLEM Total Processes is now: WARNING on bots-3 i-000000e5 output: PROCS WARNING: 155 processes [20:26:47] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [20:36:27] PROBLEM 
Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 5.70, 5.71, 5.28 [20:49:58] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 15.44, 19.81, 18.24 [20:57:10] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 4.93, 4.89, 4.99 [20:57:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:07:34] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 201 processes [21:12:30] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 193 processes [21:27:26] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:57:28] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:27:40] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:28:20] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 5.06, 5.18, 5.06 [22:45:35] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CRITICAL - load average: 32.38, 22.93, 18.47 [22:50:35] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 11.85, 16.43, 16.97 [22:57:50] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:01:32] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: Critical: 5% free memory [23:03:42] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 4.54, 4.68, 4.97 [23:28:03] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:28:53] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 5.06, 5.29, 5.21 [23:53:58] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: 
CHECK_NRPE: Socket timeout after 10 seconds. [23:58:17] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:58:37] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output