[00:00:05] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [00:07:16] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 203 processes [00:12:18] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 189 processes [00:22:05] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [00:23:05] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [00:23:10] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:27:45] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [00:30:05] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [00:53:07] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [00:58:03] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. 
[01:00:08] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [01:02:53] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 191 processes [01:23:46] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [01:24:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-transcoding i-00000105 output: Puppet has not run in last 20 hours [01:26:16] PROBLEM Puppet freshness is now: CRITICAL on gerrit i-000000ff output: Puppet has not run in last 20 hours [01:30:46] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [01:53:50] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [02:00:50] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [02:01:20] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [02:13:58] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [02:18:48] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [02:25:07] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [02:31:07] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [02:39:42] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:41:52] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 3.52, 4.45, 2.37 [02:46:52] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.24, 1.82, 1.80 [02:48:52] 07/05/2012 - 02:48:52 - User laner may have been modified in LDAP or locally, updating key in project(s): deployment-prep [02:55:19] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:01:08] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [03:02:20] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory [03:19:22] PROBLEM Puppet freshness is now: CRITICAL on maps-test2 i-00000253 output: Puppet has not run in last 20 hours [03:28:11] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:31:11] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [03:32:21] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory [03:37:21] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 28% free memory [03:38:41] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [03:47:49] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 17% free memory [03:52:15] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory [03:52:15] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 16% free memory [03:59:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [04:01:48] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [04:08:08] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory
[04:12:29] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 3% free memory [04:12:59] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [04:17:29] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [04:17:29] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory [04:18:55] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [04:19:24] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [04:24:09] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:18] PROBLEM Puppet freshness is now: CRITICAL on su-fe1 i-000002e5 output: Puppet has not run in last 20 hours [04:26:40] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 65 MB (4% inode=57%): [04:27:20] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:28:44] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [04:29:20] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [04:31:54] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [04:59:20] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [05:02:00] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [05:08:20] PROBLEM Free ram is now: CRITICAL on su-be1 i-000002e7 output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:13:10] PROBLEM Free ram is now: UNKNOWN on su-be1 i-000002e7 output: NRPE: Unable to read output [05:29:30] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [05:32:00] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [05:51:55] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [05:54:14] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: PROCS CRITICAL: 201 processes [05:59:14] PROBLEM Total Processes is now: WARNING on incubator-bot1 i-00000251 output: PROCS WARNING: 199 processes [05:59:54] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [06:02:04] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [06:14:14] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: PROCS CRITICAL: 201 processes [06:16:54] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [06:29:57] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [06:31:36] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [06:34:30] PROBLEM Current Load is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:34:30] PROBLEM Disk Space is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:34:30] PROBLEM Current Users is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:34:55] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [06:35:09] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:35:15] PROBLEM Disk Space is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:35:15] PROBLEM Total Processes is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:35:22] PROBLEM Current Users is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:36:08] PROBLEM Current Users is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:36:08] PROBLEM Current Load is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:36:08] PROBLEM Free ram is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:36:08] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:47] PROBLEM Current Users is now: CRITICAL on su-fe2 i-000002e6 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:47] PROBLEM Total Processes is now: CRITICAL on su-fe2 i-000002e6 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:42:15] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: PROCS CRITICAL: 207 processes [06:42:27] RECOVERY Disk Space is now: OK on fr-wiki-db-precise i-0000023e output: DISK OK [06:42:27] RECOVERY Total Processes is now: OK on fr-wiki-db-precise i-0000023e output: PROCS OK: 98 processes [06:42:40] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:43:40] RECOVERY Current Users is now: OK on fr-wiki-db-precise i-0000023e output: USERS OK - 0 users currently logged in [06:43:40] RECOVERY Current Load is now: OK on fr-wiki-db-precise i-0000023e output: OK - load average: 2.22, 3.65, 2.61 [06:43:40] RECOVERY Free ram is now: OK on fr-wiki-db-precise i-0000023e output: OK: 64% free memory [06:43:40] RECOVERY dpkg-check is now: OK on fr-wiki-db-precise i-0000023e output: All packages OK [06:43:46] PROBLEM Free ram is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:43:47] PROBLEM dpkg-check is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:43:47] PROBLEM Total Processes is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Disk Space is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Current Users is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Free ram is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Disk Space is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Current Load is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Current Users is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:29] PROBLEM Total Processes is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:37] PROBLEM dpkg-check is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:37] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:44:37] PROBLEM Total Processes is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:42] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:42] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:01] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:28] RECOVERY Current Users is now: OK on su-fe2 i-000002e6 output: USERS OK - 0 users currently logged in [06:54:28] RECOVERY Total Processes is now: OK on su-fe2 i-000002e6 output: PROCS OK: 86 processes [06:54:36] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 194 processes [06:55:53] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [06:56:38] RECOVERY Current Users is now: OK on build-precise1 i-00000273 output: USERS OK - 1 users currently logged in [06:56:38] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 84% free memory [06:56:38] RECOVERY Total Processes is now: OK on pediapress-ocg2 i-00000234 output: PROCS OK: 86 processes [06:56:45] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 9.14, 9.72, 8.51 [06:56:46] RECOVERY Disk Space is now: OK on nova-precise1 i-00000236 output: DISK OK [06:56:46] RECOVERY Current Users is now: OK on nova-precise1 i-00000236 output: USERS OK - 0 users currently logged in [06:56:46] RECOVERY Current Load is now: OK on nova-precise1 i-00000236 output: OK - load average: 0.77, 3.29, 3.51 [06:56:46] RECOVERY Free ram is now: OK on nova-precise1 i-00000236 output: OK: 82% free memory [06:56:46] RECOVERY Total Processes is now: OK on nova-precise1 i-00000236 output: PROCS OK: 118 processes
[06:56:51] RECOVERY Disk Space is now: OK on etherpad-lite i-000002de output: DISK OK [06:56:51] RECOVERY Current Users is now: OK on etherpad-lite i-000002de output: USERS OK - 0 users currently logged in [06:56:51] RECOVERY Total Processes is now: OK on etherpad-lite i-000002de output: PROCS OK: 121 processes [06:57:15] RECOVERY dpkg-check is now: OK on etherpad-lite i-000002de output: All packages OK [06:57:15] RECOVERY Current Load is now: OK on etherpad-lite i-000002de output: OK - load average: 5.88, 5.45, 4.70 [06:57:48] RECOVERY Total Processes is now: OK on dumps-2 i-000002d8 output: PROCS OK: 126 processes [06:57:56] RECOVERY dpkg-check is now: OK on dumps-2 i-000002d8 output: All packages OK [06:57:56] RECOVERY Free ram is now: OK on dumps-2 i-000002d8 output: OK: 87% free memory [07:01:28] RECOVERY Current Load is now: OK on deployment-apache30 i-000002d3 output: OK - load average: 0.26, 1.47, 3.02 [07:01:28] RECOVERY Current Users is now: OK on deployment-apache30 i-000002d3 output: USERS OK - 0 users currently logged in [07:01:28] RECOVERY Disk Space is now: OK on deployment-apache30 i-000002d3 output: DISK OK [07:02:06] PROBLEM Current Load is now: WARNING on build-precise1 i-00000273 output: WARNING - load average: 6.50, 5.56, 5.03 [07:02:16] PROBLEM dpkg-check is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:17] PROBLEM Disk Space is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:17] PROBLEM Total Processes is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:22] PROBLEM Free ram is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:22] PROBLEM dpkg-check is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:14] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [07:03:14] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [07:05:01] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [07:05:01] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [07:05:01] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 5.76, 8.45, 8.61 [07:05:01] PROBLEM Total Processes is now: WARNING on nagios 127.0.0.1 output: PROCS WARNING: 361 processes [07:06:16] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:06:31] PROBLEM Current Users is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:06:52] PROBLEM Free ram is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:06:52] PROBLEM dpkg-check is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:07:31] RECOVERY dpkg-check is now: OK on redis1 i-000002b6 output: All packages OK [07:07:31] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [07:07:31] PROBLEM Current Load is now: WARNING on aggregator-test1 i-000002bf output: WARNING - load average: 1.10, 2.74, 6.20 [07:07:31] PROBLEM Current Load is now: WARNING on wikidata-dev-2 i-00000259 output: WARNING - load average: 2.42, 3.17, 5.24 [07:07:42] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 10.17, 13.35, 10.10 [07:07:55] RECOVERY Current Load is now: OK on build-precise1 i-00000273 output: OK - load average: 1.57, 3.65, 4.44 [07:07:55] RECOVERY Disk Space is now: OK on build-precise1 i-00000273 output: DISK OK [07:07:55] RECOVERY Free ram is now: OK on build-precise1 i-00000273 output: OK: 87% free memory [07:07:55] RECOVERY Total Processes is now: OK on build-precise1 i-00000273 output: PROCS OK: 94 processes [07:08:14] RECOVERY dpkg-check is now: OK on build-precise1 i-00000273 output: All packages OK [07:08:51] PROBLEM Total Processes is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:56] PROBLEM Free ram is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:56] PROBLEM Current Load is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:56] PROBLEM Disk Space is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:56] PROBLEM Current Users is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:56] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:57] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:08:57] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:33] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [07:09:44] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:49] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:49] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [07:10:05] RECOVERY Total Processes is now: OK on nagios 127.0.0.1 output: PROCS OK: 107 processes [07:10:05] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 15% free memory [07:10:05] PROBLEM Free ram is now: UNKNOWN on su-fe2 i-000002e6 output: NRPE: Unable to read output [07:10:10] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 0.91, 6.52, 7.96 [07:10:15] PROBLEM Current Load is now: WARNING on incubator-bot0 i-00000296 output: WARNING - load average: 1.62, 4.55, 5.38 [07:10:25] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [07:10:25] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:12:31] RECOVERY Current Load is now: OK on wikidata-dev-2 i-00000259 output: OK - load average: 0.50, 1.96, 4.19 [07:12:31] RECOVERY Total Processes is now: OK on mobile-wlm i-000002bc output: PROCS OK: 103 processes [07:12:36] RECOVERY Current Load is now: OK on mobile-wlm i-000002bc output: OK - load average: 0.36, 3.15, 3.76 [07:12:37] RECOVERY Free ram is now: OK on mobile-wlm i-000002bc output: OK: 74% free memory [07:12:37] RECOVERY Disk Space is now: OK on mobile-wlm i-000002bc output: DISK OK [07:12:37] RECOVERY Current Users is now: OK on mobile-wlm i-000002bc output: USERS OK - 0 users currently logged in [07:12:37] RECOVERY Current Load is now: OK on mwreview i-000002ae output: OK - load average: 0.98, 3.70, 3.87 [07:12:37] RECOVERY Free ram is now: OK on mwreview i-000002ae output: OK: 69% free memory [07:12:37] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK [07:14:01] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 10% free memory [07:14:11] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.50, 8.55, 8.71 [07:14:51] RECOVERY Current Load is now: OK on incubator-bot0 i-00000296 output: OK - load average: 1.01, 2.32, 4.18 [07:14:51] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in [07:14:51] RECOVERY Free ram is now: OK on incubator-bot2 i-00000252 output: OK: 40% free memory [07:14:51] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK [07:14:51] RECOVERY Current Users is now: OK on mwreview i-000002ae output: USERS OK - 0 users currently logged in [07:14:52] RECOVERY Total Processes is now: OK on mwreview i-000002ae output: PROCS OK: 106 processes [07:17:31] RECOVERY Current Load is now: OK on aggregator-test1 i-000002bf output: OK - load average: 0.48, 1.74, 4.32
[07:19:52] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.31, 1.20, 4.36 [07:22:31] RECOVERY Current Load is now: OK on integration-apache1 i-000002eb output: OK - load average: 0.04, 1.00, 4.30 [07:29:51] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.28, 0.59, 3.15 [07:34:51] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 1.28, 0.78, 2.49 [07:35:01] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [07:36:31] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:05:34] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [08:05:43] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [08:07:23] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:09:54] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 66 MB (5% inode=57%): [08:37:28] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [08:38:18] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [08:38:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:49:29] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [09:09:22] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [09:09:23] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [09:19:53] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds.
[09:24:42] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [09:30:33] PROBLEM Current Load is now: WARNING on wikidata-dev-2 i-00000259 output: WARNING - load average: 7.73, 7.48, 5.91 [09:40:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [09:40:05] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [09:40:45] RECOVERY Current Load is now: OK on wikidata-dev-2 i-00000259 output: OK - load average: 0.42, 2.52, 4.27 [09:49:31] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [09:54:21] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [10:10:05] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:10:05] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [10:40:09] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:40:09] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [11:10:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [11:10:18] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [11:20:12] !log deployment-prep added a bunch of spammers in /home/wikipedia/common/wmf-config/mwblocker.log which would block them [11:20:15] Logged the message, Master [11:24:33] PROBLEM Puppet freshness is now: CRITICAL on deployment-transcoding i-00000105 output: Puppet has not run in last 20 hours [11:24:33] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds.
[11:26:33] PROBLEM Puppet freshness is now: CRITICAL on gerrit i-000000ff output: Puppet has not run in last 20 hours [11:29:23] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [11:40:23] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [11:40:23] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [12:01:33] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [12:10:23] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [12:10:23] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [12:15:30] !log deployment-prep Did some documentation work on [[Deployment/Overview]] [12:15:31] Logged the message, Master [12:23:32] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [12:40:31] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [12:40:31] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:10:45] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [13:10:45] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:19:36] PROBLEM Puppet freshness is now: CRITICAL on maps-test2 i-00000253 output: Puppet has not run in last 20 hours [13:40:45] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [13:40:45] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [14:10:50] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [14:10:50] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[14:14:40] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [14:24:33] PROBLEM Puppet freshness is now: CRITICAL on su-fe1 i-000002e5 output: Puppet has not run in last 20 hours [14:25:03] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [14:40:53] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [14:40:53] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [14:56:08] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [14:56:08] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [15:00:45] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 8.02, 8.27, 8.27 [15:00:45] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 8% free memory [15:10:55] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:10:56] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [15:14:05] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [15:41:03] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [15:41:03] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:10:52] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [16:14:12] PROBLEM host: signwriting-ase2 is DOWN address: i-000002fd CRITICAL - Host Unreachable (i-000002fd) [16:14:12] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:18:52] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 65 MB (5% inode=57%):
[16:23:48] PROBLEM Total Processes is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused by host [16:24:30] PROBLEM dpkg-check is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused by host [16:25:49] PROBLEM Current Load is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused by host [16:26:28] PROBLEM Current Users is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused by host [16:26:58] PROBLEM Disk Space is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused by host [16:27:38] PROBLEM Free ram is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused by host [16:29:28] RECOVERY dpkg-check is now: OK on signwriting-ase3 i-000002fe output: All packages OK [16:30:48] RECOVERY Current Load is now: OK on signwriting-ase3 i-000002fe output: OK - load average: 0.65, 1.60, 1.14 [16:31:28] RECOVERY Current Users is now: OK on signwriting-ase3 i-000002fe output: USERS OK - 0 users currently logged in [16:31:58] RECOVERY Disk Space is now: OK on signwriting-ase3 i-000002fe output: DISK OK [16:32:38] PROBLEM Free ram is now: UNKNOWN on signwriting-ase3 i-000002fe output: NRPE: Unable to read output [16:33:48] RECOVERY Total Processes is now: OK on signwriting-ase3 i-000002fe output: PROCS OK: 77 processes [16:35:49] PROBLEM Current Users is now: CRITICAL on bastion-restricted1 i-0000019b output: USERS CRITICAL - 11 users currently logged in [16:45:03] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:16:03] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:44:08] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [17:46:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:53:52] 07/05/2012 - 17:53:51 - User yaron may have been modified in LDAP or locally, updating key in project(s): bastion
[17:54:10] 07/05/2012 - 17:54:10 - Updating keys for yaron at /export/keys/yaron [17:56:22] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [17:56:22] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [18:01:14] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 8.22, 8.52, 8.49 [18:01:14] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 9% free memory [18:02:18] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 11.06, 13.40, 6.88 [18:07:08] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [18:11:30] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [18:11:31] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [18:12:05] RECOVERY Current Load is now: OK on integration-apache1 i-000002eb output: OK - load average: 0.73, 3.56, 4.76 [18:15:04] PROBLEM Free ram is now: CRITICAL on signwriting-ase4 i-000002ff output: Connection refused by host [18:16:24] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:16:44] PROBLEM Total Processes is now: CRITICAL on signwriting-ase4 i-000002ff output: Connection refused by host [18:17:04] PROBLEM dpkg-check is now: CRITICAL on signwriting-ase4 i-000002ff output: Connection refused by host [18:18:14] PROBLEM Current Load is now: CRITICAL on signwriting-ase4 i-000002ff output: Connection refused by host [18:18:44] PROBLEM Current Users is now: CRITICAL on signwriting-ase4 i-000002ff output: Connection refused by host [18:19:24] PROBLEM Disk Space is now: CRITICAL on signwriting-ase4 i-000002ff output: Connection refused by host [18:21:14] PROBLEM Current Users is now: WARNING on bastion-restricted1 i-0000019b output: USERS WARNING - 10
users currently logged in [18:22:14] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [18:46:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:56:16] RECOVERY Current Load is now: OK on signwriting-ase4 i-000002ff output: OK - load average: 2.83, 2.41, 2.20 [18:56:56] RECOVERY Total Processes is now: OK on signwriting-ase4 i-000002ff output: PROCS OK: 86 processes [18:57:02] RECOVERY dpkg-check is now: OK on signwriting-ase4 i-000002ff output: All packages OK [18:58:46] RECOVERY Current Users is now: OK on signwriting-ase4 i-000002ff output: USERS OK - 1 users currently logged in [18:59:45] RECOVERY Disk Space is now: OK on signwriting-ase4 i-000002ff output: DISK OK [19:00:14] PROBLEM Free ram is now: UNKNOWN on signwriting-ase4 i-000002ff output: NRPE: Unable to read output [19:01:14] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 8.60, 8.61, 8.69 [19:01:14] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 9% free memory [19:12:15] I'm having a problem creating a single-node mediawiki server instance. I've added the puppet class of role::mediawiki-install::labs, but the MediaWiki source has never been installed in /srv/mediawiki [19:20:34] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:33:00] slevinski: did you force run puppet? [19:33:05] using puppetd -tv? [19:34:04] I tried that but with the sudo command as per the instructions. My user didn't have sudo rights. I'll try it without the sudo [19:35:49] info: Creating a new SSL key for i-000002fe.pmtpa.wmflabs [19:35:51] err: Could not request certificate: getaddrinfo: Name or service not known [19:35:59] Exiting; failed to retrieve certificate and waitforcert is disabled [19:40:21] ah [19:40:31] you need to do it as root [19:40:37] this is for signwriting, right? 
[19:40:42] yes [19:40:42] lemme fix your sudo rights [19:41:11] by default it should allow everyone. not sure why it doesn't create that properly on project creation [19:41:58] ok, you can sudo now [19:42:41] thanks, it's running [19:43:40] yw [19:44:35] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 19% free memory [19:44:55] PROBLEM host: testing-virt6 is DOWN address: i-00000301 CRITICAL - Host Unreachable (i-00000301) [19:48:04] Something in puppet failed. [19:48:44] Role::Mediawiki-install::Labs/Git::Clone[mediawiki]/Exec[git_clone_mediawiki]/returns) change from notrun to 0 failed: Command exceeded timeout at /etc/puppet/manifests/generic-definitions.pp:750 [19:51:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:58:22] <^demon|away> Doesn't surprise me. Cloning MediaWiki core is gonna take awhile. Trying to do that via puppet is kinda silly. [19:59:05] Another error: E: Couldn't find package mysql-server-false [19:59:35] <^demon|away> mysql-server-false? Sounds like someone wrote the manifest wrong. [20:00:32] I thought mysql-server-false sounded a bit weird. [20:01:19] <^demon|away> If you're using puppetmaster::self, you can edit the manifests directly in /var/git, so you could fix them and once they work then submit them to gerrit. 
[20:02:07] This failed on instance i-000002fe, but is still running on i-000002ff [20:04:21] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 19% free memory [20:05:41] RECOVERY host: testing-virt6 is UP address: i-00000302 PING OK - Packet loss = 0%, RTA = 1.09 ms [20:06:31] RECOVERY host: nginx-dev2 is UP address: i-000002f0 PING OK - Packet loss = 0%, RTA = 1.17 ms [20:08:52] PROBLEM Current Load is now: CRITICAL on testing-virt6 i-00000302 output: Connection refused by host [20:09:27] PROBLEM Current Users is now: CRITICAL on testing-virt6 i-00000302 output: Connection refused by host [20:10:01] PROBLEM dpkg-check is now: CRITICAL on signwriting-ase4 i-000002ff output: DPKG CRITICAL dpkg reports broken packages [20:10:01] PROBLEM Disk Space is now: CRITICAL on testing-virt6 i-00000302 output: Connection refused by host [20:10:42] PROBLEM Free ram is now: CRITICAL on testing-virt6 i-00000302 output: Connection refused by host [20:13:51] RECOVERY Current Load is now: OK on testing-virt6 i-00000302 output: OK - load average: 0.17, 0.83, 0.50 [20:14:22] PROBLEM Current Load is now: CRITICAL on nginx-dev2 i-000002f0 output: Connection refused by host [20:14:22] PROBLEM Current Users is now: CRITICAL on nginx-dev2 i-000002f0 output: Connection refused by host [20:14:22] PROBLEM dpkg-check is now: CRITICAL on nginx-dev2 i-000002f0 output: Connection refused by host [20:14:22] PROBLEM Free ram is now: CRITICAL on nginx-dev2 i-000002f0 output: Connection refused by host [20:14:22] PROBLEM Total Processes is now: CRITICAL on nginx-dev2 i-000002f0 output: Connection refused by host [20:14:23] PROBLEM Disk Space is now: CRITICAL on nginx-dev2 i-000002f0 output: Connection refused by host [20:14:23] RECOVERY Current Users is now: OK on testing-virt6 i-00000302 output: USERS OK - 0 users currently logged in [20:15:01] RECOVERY dpkg-check is now: OK on signwriting-ase4 i-000002ff output: All packages OK [20:15:01] RECOVERY Disk Space is now: OK on 
testing-virt6 i-00000302 output: DISK OK [20:15:41] PROBLEM Free ram is now: UNKNOWN on testing-virt6 i-00000302 output: NRPE: Unable to read output [20:16:21] PROBLEM Current Users is now: CRITICAL on bastion-restricted1 i-0000019b output: USERS CRITICAL - 13 users currently logged in [20:18:17] PROBLEM host: pybal-precise is DOWN address: i-00000289 CRITICAL - Host Unreachable (i-00000289) [20:23:03] failed on i-000002ff as well. [20:23:05] Role::Mediawiki-install::Labs/Git::Clone[mediawiki]/Exec[git_clone_mediawiki]/returns) change from notrun to 0 failed: Command exceeded timeout at /etc/puppet/manifests/generic-definitions.pp:750 [20:25:10] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:28:56] PROBLEM Free ram is now: CRITICAL on signwriting-ase3 i-000002fe output: Connection refused or timed out [20:34:01] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [20:34:41] PROBLEM Current Users is now: WARNING on bastion-restricted1 i-0000019b output: USERS WARNING - 9 users currently logged in [20:34:42] RECOVERY Disk Space is now: OK on patchtest i-000000f1 output: DISK OK [20:34:42] RECOVERY HTTP is now: OK on demo-deployment1 i-00000276 output: HTTP OK: HTTP/1.1 200 OK - 911 bytes in 1.401 second response time [20:34:42] RECOVERY dpkg-check is now: OK on patchtest i-000000f1 output: All packages OK [20:34:42] RECOVERY Current Users is now: OK on patchtest i-000000f1 output: USERS OK - 0 users currently logged in [20:34:42] RECOVERY Total Processes is now: OK on patchtest i-000000f1 output: PROCS OK: 89 processes [20:34:47] RECOVERY Current Load is now: OK on patchtest i-000000f1 output: OK - load average: 0.00, 0.10, 0.07 [20:34:47] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:34:47] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [20:34:47] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [20:34:47] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [20:34:48] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:34:53] RECOVERY host: pybal-precise is UP address: i-00000289 PING OK - Packet loss = 0%, RTA = 0.64 ms [20:35:16] PROBLEM Free ram is now: UNKNOWN on signwriting-ase3 i-000002fe output: NRPE: Unable to read output [20:37:06] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 1.61, 6.56, 5.14 [20:37:06] PROBLEM Current Load is now: WARNING on wikistats-01 i-00000042 output: WARNING - load average: 1.80, 14.95, 12.36 [20:39:26] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 11% free memory [20:39:26] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 0.11, 2.31, 2.49 [20:39:26] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in [20:39:27] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [20:39:27] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 93% free memory [20:42:06] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.48, 2.63, 3.80 [20:42:36] PROBLEM Disk Space is now: CRITICAL on patchtest i-000000f1 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:42:37] PROBLEM HTTP is now: CRITICAL on demo-deployment1 i-00000276 output: CRITICAL - Socket timeout after 10 seconds [20:42:37] PROBLEM Current Users is now: CRITICAL on patchtest i-000000f1 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:42:37] PROBLEM Current Load is now: CRITICAL on patchtest i-000000f1 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:42:37] PROBLEM dpkg-check is now: CRITICAL on patchtest i-000000f1 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:42:37] PROBLEM Total Processes is now: CRITICAL on patchtest i-000000f1 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:47:06] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.22, 1.13, 2.82 [20:52:36] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b) [20:57:06] RECOVERY Current Load is now: OK on wikistats-01 i-00000042 output: OK - load average: 0.26, 1.07, 4.01 [21:01:59] !log nginx deleted nginx-dev2. It launched on a virt host that wasn't finished being configured and as such, was unusable [21:02:00] Logged the message, Master [21:06:04] Is there any way to extend the timeout so puppet can complete for Role::Mediawiki-install::Labs? [21:06:14] it times out? [21:06:33] It fetches 100 MB from the git repo, of course it times out [21:06:39] ah [21:06:40] right [21:06:47] slevinski: just keep re-running it [21:07:05] it'll eventually finish [21:07:19] <^demon|away> Having mediawiki-install::labs do a fresh clone from gerrit.wm.o seems to me like the Wrong Way to do it [21:07:20] pulling from the gerrit server is likely what is causing the timeout [21:07:29] ^demon|away: how else should it do it? [21:07:30] OK, I ran it twice. I'll keep running it and see what happens. [21:07:34] * Ryan_Lane nods [21:07:45] PROBLEM Free ram is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:07:45] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:16] <^demon|away> Ryan_Lane: Granted, but is it going to re-clone each time puppet is run? 
[21:08:21] PROBLEM dpkg-check is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:21] no [21:08:21] PROBLEM Current Users is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:21] PROBLEM Total Processes is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:26] <^demon|away> Or is git::clone smart enough to notice when it's already there? [21:08:40] git clone would fail, if run twice in the same spot [21:09:03] <^demon|away> *sigh* [21:09:28] RECOVERY HTTP is now: OK on deployment-apache30 i-000002d3 output: HTTP OK: HTTP/1.1 200 OK - 27256 bytes in 0.012 second response time [21:09:29] RECOVERY HTTP is now: OK on grail i-000002c6 output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.965 second response time [21:09:32] <^demon|away> Replication could help this--if we could encourage people to do their clones/pulls from the r/o mirror, and only use gerrit.wm.o for write operations :) [21:09:38] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:09:45] well, this is puppet [21:09:48] we can do that [21:09:59] wtf [21:10:03] Ryan_Lane: Are you doing a VM migration now? [21:10:04] <^demon|away> Right. I'll add it to my list of things to look at with replication. [21:10:08] RoanKattouw: yes [21:10:09] why? [21:10:23] Hmm maybe that's why my VM is down [21:10:29] which vm? [21:10:41] it shouldn't be down [21:10:45] Or wait, it seems to be up, just slow [21:10:48] yeah [21:10:53] nm, must be a local issue with the HTTP server [21:10:55] that sounds about right [21:11:02] everything is slow right now [21:11:03] Although SSHing in is slow as molasses [21:11:07] PROBLEM Current Load is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. 
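Editorial sketch (not part of the channel log): the exchange above raises two separate concerns, a clone that exceeds the command timeout and a `git clone` that fails when run a second time in the same spot. The Python below is one way to make such a step safe to re-run. It is an illustration only, not how the `git::clone` Puppet define is implemented; the function name `ensure_clone`, the `.partial` suffix, the 1800-second default, and the example URL are all assumptions made for this sketch.

```python
import os
import shutil
import subprocess


def ensure_clone(url, dest, timeout_s=1800, run=subprocess.run):
    """Clone url into dest unless dest already holds a repository.

    Returns "skipped" when dest/.git exists, otherwise "cloned".
    `run` is injectable so the logic can be exercised without a network.
    """
    if os.path.isdir(os.path.join(dest, ".git")):
        # A second `git clone` into an existing checkout would fail, so skip it.
        return "skipped"

    # Clone into a scratch directory first, so that a clone killed by the
    # timeout never leaves a half-finished checkout at dest.
    partial = dest + ".partial"
    shutil.rmtree(partial, ignore_errors=True)  # clear leftovers from a failed run

    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises subprocess.TimeoutExpired if the clone runs too long.
    run(["git", "clone", url, partial], check=True, timeout=timeout_s)

    os.rename(partial, dest)
    return "cloned"


# Hypothetical usage (the URL is a placeholder, not the real repository):
#   ensure_clone("https://example.org/mediawiki/core.git", "/srv/mediawiki")
```

Within Puppet itself, the `exec` resource type has a `timeout` attribute (300 seconds by default) and a `creates` attribute that skips the command once a given path exists; whether the manifest discussed here sets either one is not shown in this log.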
[21:11:07] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:11:21] I'm bringing up the new virt hosts that don't use gluster right now [21:12:15] migration works like crap [21:12:43] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 17% free memory [21:12:43] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 190 processes [21:13:13] PROBLEM Disk Space is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:13:14] PROBLEM Current Load is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:13:14] PROBLEM Current Users is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:13:14] PROBLEM Free ram is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:13:14] PROBLEM Total Processes is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:13:18] PROBLEM dpkg-check is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:13:49] PROBLEM Total Processes is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:04] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:12] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:12] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:12] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:14:12] PROBLEM Disk Space is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:12] PROBLEM Current Load is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:12] PROBLEM Current Users is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:13] PROBLEM Current Load is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:13] PROBLEM Disk Space is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:14] PROBLEM Free ram is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:16] ewww [21:14:34] PROBLEM Current Users is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:35] PROBLEM Current Load is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:35] PROBLEM Disk Space is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:35] PROBLEM Total Processes is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:44] PROBLEM Free ram is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:44] PROBLEM dpkg-check is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:14:44] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:16:48] PROBLEM host: signwriting-ase4 is DOWN address: i-000002ff CRITICAL - Host Unreachable (i-000002ff) [21:17:08] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [21:17:38] PROBLEM dpkg-check is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:17:38] PROBLEM HTTP is now: CRITICAL on deployment-apache30 i-000002d3 output: CRITICAL - Socket timeout after 10 seconds [21:17:38] PROBLEM HTTP is now: CRITICAL on grail i-000002c6 output: CRITICAL - Socket timeout after 10 seconds [21:17:38] PROBLEM Free ram is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:18:25] RECOVERY dpkg-check is now: OK on build-precise1 i-00000273 output: All packages OK [21:18:25] RECOVERY Current Users is now: OK on build-precise1 i-00000273 output: USERS OK - 0 users currently logged in [21:18:25] RECOVERY Total Processes is now: OK on build-precise1 i-00000273 output: PROCS OK: 85 processes [21:18:53] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. [21:19:08] PROBLEM Current Load is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [21:19:08] PROBLEM dpkg-check is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [21:19:09] PROBLEM Current Users is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [21:19:09] PROBLEM Total Processes is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [21:19:17] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:19:36] RECOVERY Disk Space is now: OK on pediapress-ocg2 i-00000234 output: DISK OK [21:19:36] RECOVERY Total Processes is now: OK on pediapress-ocg2 i-00000234 output: PROCS OK: 92 processes [21:19:45] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 84% free memory [21:19:45] RECOVERY Current Users is now: OK on pediapress-ocg2 i-00000234 output: USERS OK - 0 users currently logged in [21:19:45] RECOVERY Current Load is now: OK on build-precise1 i-00000273 output: OK - load average: 1.15, 3.63, 3.36 [21:19:45] RECOVERY Disk Space is now: OK on build-precise1 i-00000273 output: DISK OK [21:19:45] RECOVERY Free ram is now: OK on build-precise1 i-00000273 output: OK: 87% free memory [21:19:55] RECOVERY Current Users is now: OK on en-wiki-db-precise i-0000023c output: USERS OK - 0 users currently logged in [21:19:55] RECOVERY Current Load is now: OK on en-wiki-db-precise i-0000023c output: OK - load average: 0.97, 3.37, 2.99 [21:19:55] RECOVERY Disk Space is now: OK on en-wiki-db-precise i-0000023c output: DISK OK [21:19:55] RECOVERY Total Processes is now: OK on en-wiki-db-precise i-0000023c output: PROCS OK: 85 processes [21:20:01] RECOVERY Free ram is now: OK on en-wiki-db-precise i-0000023c output: OK: 78% free memory [21:20:01] RECOVERY dpkg-check is now: OK on en-wiki-db-precise i-0000023c output: All packages OK [21:20:01] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 187 processes [21:20:10] PROBLEM Free ram is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. [21:21:15] RECOVERY Current Load is now: OK on pediapress-ocg2 i-00000234 output: OK - load average: 5.22, 6.65, 4.76 [21:21:15] RECOVERY dpkg-check is now: OK on pediapress-ocg2 i-00000234 output: All packages OK [21:22:36] PROBLEM Disk Space is now: CRITICAL on mobile-wlm i-000002bc output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:22:36] PROBLEM Disk Space is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:36] PROBLEM Current Users is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:36] PROBLEM Free ram is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:36] PROBLEM Total Processes is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:43] PROBLEM dpkg-check is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:43] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:52] PROBLEM Current Load is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:52] PROBLEM Disk Space is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [21:22:52] PROBLEM Total Processes is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [21:23:01] PROBLEM Free ram is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [21:23:01] PROBLEM Current Users is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds. [21:23:01] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:23:01] PROBLEM Current Load is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:23:10] RECOVERY Free ram is now: OK on dumps-2 i-000002d8 output: OK: 87% free memory [21:23:10] RECOVERY dpkg-check is now: OK on dumps-2 i-000002d8 output: All packages OK [21:23:25] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [21:23:44] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [21:24:30] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 8.03, 7.17, 5.51 [21:24:38] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b) [21:24:39] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:24:39] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:24:39] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:24:39] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:24:53] PROBLEM Puppet freshness is now: CRITICAL on deployment-transcoding i-00000105 output: Puppet has not run in last 20 hours [21:25:57] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:26:37] PROBLEM Puppet freshness is now: CRITICAL on gerrit i-000000ff output: Puppet has not run in last 20 hours [21:27:05] PROBLEM Current Load is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:06] PROBLEM Disk Space is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:06] PROBLEM Current Users is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:27:06] PROBLEM Total Processes is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:12] PROBLEM dpkg-check is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:12] PROBLEM SSH is now: CRITICAL on deployment-sql i-000000d0 output: CRITICAL - Socket timeout after 10 seconds [21:27:12] RECOVERY host: signwriting-ase4 is UP address: i-000002ff PING OK - Packet loss = 0%, RTA = 30.19 ms [21:27:12] PROBLEM Current Load is now: WARNING on deployment-jobrunner05 i-0000028c output: WARNING - load average: 11.49, 9.44, 5.23 [21:27:32] PROBLEM dpkg-check is now: CRITICAL on deployment-sql i-000000d0 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:42] RECOVERY Total Processes is now: OK on dumps-2 i-000002d8 output: PROCS OK: 101 processes [21:28:52] RECOVERY Current Load is now: OK on mobile-wlm i-000002bc output: OK - load average: 4.43, 6.04, 4.35 [21:28:52] RECOVERY Current Users is now: OK on mobile-wlm i-000002bc output: USERS OK - 0 users currently logged in [21:28:53] RECOVERY Total Processes is now: OK on mobile-wlm i-000002bc output: PROCS OK: 110 processes [21:29:00] RECOVERY dpkg-check is now: OK on mobile-wlm i-000002bc output: All packages OK [21:29:00] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. [21:29:45] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:50] PROBLEM Current Load is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:50] PROBLEM Current Users is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:50] PROBLEM Current Load is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:31:50] PROBLEM Disk Space is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:50] PROBLEM Total Processes is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:58] PROBLEM Free ram is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:58] PROBLEM dpkg-check is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:08] RECOVERY SSH is now: OK on deployment-sql i-000000d0 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [21:32:09] RECOVERY Disk Space is now: OK on ve-nodejs i-00000245 output: DISK OK [21:32:09] RECOVERY Current Users is now: OK on ve-nodejs i-00000245 output: USERS OK - 1 users currently logged in [21:32:09] RECOVERY Free ram is now: OK on ve-nodejs i-00000245 output: OK: 76% free memory [21:32:09] RECOVERY Total Processes is now: OK on ve-nodejs i-00000245 output: PROCS OK: 100 processes [21:32:17] PROBLEM Current Load is now: WARNING on ve-nodejs i-00000245 output: WARNING - load average: 9.53, 8.91, 5.98 [21:32:18] PROBLEM Current Load is now: WARNING on aggregator1 i-0000010c output: WARNING - load average: 5.87, 7.94, 5.38 [21:32:18] RECOVERY dpkg-check is now: OK on ve-nodejs i-00000245 output: All packages OK [21:32:18] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner05 i-0000028c output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:18] PROBLEM Total Processes is now: CRITICAL on signwriting-ase4 i-000002ff output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:18] PROBLEM dpkg-check is now: CRITICAL on signwriting-ase4 i-000002ff output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:18] PROBLEM Current Load is now: CRITICAL on signwriting-ase4 i-000002ff output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:18] PROBLEM Current Users is now: CRITICAL on signwriting-ase4 i-000002ff output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:32:19] PROBLEM Free ram is now: CRITICAL on signwriting-ase4 i-000002ff output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:19] PROBLEM Total Processes is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:29] RECOVERY dpkg-check is now: OK on deployment-sql i-000000d0 output: All packages OK [21:32:38] PROBLEM Current Load is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:39] PROBLEM Current Users is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:32:39] PROBLEM Disk Space is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:33:03] spam spam spam [21:34:01] RECOVERY Disk Space is now: OK on incubator-bot0 i-00000296 output: DISK OK [21:34:01] RECOVERY Current Load is now: OK on incubator-bot0 i-00000296 output: OK - load average: 1.11, 3.13, 4.20 [21:34:01] RECOVERY Current Users is now: OK on incubator-bot0 i-00000296 output: USERS OK - 0 users currently logged in [21:34:01] RECOVERY Free ram is now: OK on incubator-bot0 i-00000296 output: OK: 85% free memory [21:34:01] RECOVERY Total Processes is now: OK on incubator-bot0 i-00000296 output: PROCS OK: 86 processes [21:35:10] RECOVERY dpkg-check is now: OK on incubator-bot0 i-00000296 output: All packages OK [21:35:10] RECOVERY Disk Space is now: OK on dumps-2 i-000002d8 output: DISK OK [21:35:10] PROBLEM Current Load is now: WARNING on dumps-2 i-000002d8 output: WARNING - load average: 0.69, 4.07, 5.02 [21:35:10] RECOVERY Current Users is now: OK on dumps-2 i-000002d8 output: USERS OK - 0 users currently logged in [21:36:08] PROBLEM host: en-wiki-db-precise is DOWN address: i-0000023c CRITICAL - Host Unreachable (i-0000023c) [21:36:37] RECOVERY Current Users is now: OK on e3 i-00000291 output: USERS OK - 0 users currently logged in [21:36:37] PROBLEM Current Load is now: WARNING on e3 i-00000291 output: WARNING - load 
average: 6.25, 8.08, 6.06 [21:36:37] RECOVERY Disk Space is now: OK on e3 i-00000291 output: DISK OK [21:36:37] RECOVERY Free ram is now: OK on e3 i-00000291 output: OK: 89% free memory [21:36:37] RECOVERY Total Processes is now: OK on e3 i-00000291 output: PROCS OK: 106 processes [21:36:45] RECOVERY dpkg-check is now: OK on e3 i-00000291 output: All packages OK [21:37:42] PROBLEM Current Load is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:38] PROBLEM Current Users is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:38] PROBLEM Current Load is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:38] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:38] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:49] PROBLEM Free ram is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:49] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:39:41] PROBLEM host: fr-wiki-db-precise is DOWN address: i-0000023e CRITICAL - Host Unreachable (i-0000023e) [21:39:51] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 3.73, 4.22, 4.99 [21:39:51] RECOVERY Current Load is now: OK on dumps-2 i-000002d8 output: OK - load average: 0.47, 1.77, 3.75 [21:39:51] PROBLEM Current Load is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:39:51] PROBLEM Current Users is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:39:51] PROBLEM Disk Space is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:39:51] PROBLEM Free ram is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:39:51] PROBLEM Total Processes is now: CRITICAL on deployment-apache30 i-000002d3 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:40:04] RECOVERY Free ram is now: OK on mobile-wlm i-000002bc output: OK: 74% free memory [21:41:09] PROBLEM Disk Space is now: CRITICAL on signwriting-ase4 i-000002ff output: CHECK_NRPE: Socket timeout after 10 seconds. [21:41:09] PROBLEM Current Users is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:41:09] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:41:29] PROBLEM Current Load is now: WARNING on aggregator-test1 i-000002bf output: WARNING - load average: 4.10, 5.57, 6.10 [21:41:39] PROBLEM Disk Space is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:41:39] PROBLEM Current Load is now: CRITICAL on e3 i-00000291 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:42:16] RECOVERY Total Processes is now: OK on deployment-jobrunner05 i-0000028c output: PROCS OK: 108 processes [21:42:30] RECOVERY Disk Space is now: OK on mobile-wlm i-000002bc output: DISK OK [21:42:30] PROBLEM SSH is now: CRITICAL on signwriting-ase4 i-000002ff output: CRITICAL - Socket timeout after 10 seconds [21:42:30] PROBLEM Total Processes is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:43:06] PROBLEM Current Load is now: UNKNOWN on wikistats-history-01 i-000002e2 output: Invalid host name i-000002e2 [21:43:14] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:43:14] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:43:15] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:43:15] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:43:29] RECOVERY Current Users is now: OK on deployment-jobrunner05 i-0000028c output: USERS OK - 0 users currently logged in [21:43:30] RECOVERY Current Load is now: OK on pediapress-ocg1 i-00000233 output: OK - load average: 1.30, 4.82, 4.83 [21:43:30] RECOVERY Disk Space is now: OK on pediapress-ocg1 i-00000233 output: DISK OK [21:43:30] RECOVERY Free ram is now: OK on pediapress-ocg1 i-00000233 output: OK: 87% free memory [21:43:30] RECOVERY Current Users is now: OK on pediapress-ocg1 i-00000233 output: USERS OK - 0 users currently logged in [21:43:30] RECOVERY dpkg-check is now: OK on pediapress-ocg1 i-00000233 output: All packages OK [21:43:30] RECOVERY Total Processes is now: OK on pediapress-ocg1 i-00000233 output: PROCS OK: 91 processes [21:44:09] RECOVERY Disk Space is now: OK on deployment-jobrunner05 i-0000028c output: DISK OK [21:44:09] PROBLEM Free ram is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:44:19] PROBLEM Puppet freshness is now: CRITICAL on labs-nfs1 i-0000005d output: Puppet has not run in last 20 hours [21:44:39] RECOVERY Free ram is now: OK on deployment-jobrunner05 i-0000028c output: OK: 90% free memory [21:45:06] PROBLEM dpkg-check is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:11] RECOVERY host: nova-production1 is UP address: i-0000007b PING OK - Packet loss = 0%, RTA = 0.97 ms [21:45:11] PROBLEM Current Users is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:45:15] RECOVERY dpkg-check is now: OK on deployment-jobrunner05 i-0000028c output: All packages OK [21:45:56] RECOVERY Current Users is now: OK on wikistats-history-01 i-000002e2 output: USERS OK - 0 users currently logged in [21:45:56] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [21:46:35] RECOVERY Disk Space is now: OK on wikistats-history-01 i-000002e2 output: DISK OK [21:47:05] RECOVERY Current Users is now: OK on etherpad-lite i-000002de output: USERS OK - 0 users currently logged in [21:47:05] PROBLEM Current Load is now: WARNING on etherpad-lite i-000002de output: WARNING - load average: 1.53, 5.01, 5.07 [21:47:05] RECOVERY Disk Space is now: OK on etherpad-lite i-000002de output: DISK OK [21:47:05] RECOVERY Total Processes is now: OK on etherpad-lite i-000002de output: PROCS OK: 116 processes [21:47:10] PROBLEM Current Load is now: WARNING on signwriting-ase4 i-000002ff output: WARNING - load average: 6.40, 8.59, 7.76 [21:47:11] RECOVERY Current Users is now: OK on signwriting-ase4 i-000002ff output: USERS OK - 0 users currently logged in [21:47:11] PROBLEM Free ram is now: UNKNOWN on signwriting-ase4 i-000002ff output: NRPE: Unable to read output [21:47:11] RECOVERY dpkg-check is now: OK on etherpad-lite i-000002de output: All packages OK [21:47:11] RECOVERY Total Processes is now: OK on zeromq1 i-000002b7 output: PROCS OK: 89 processes [21:47:16] RECOVERY Total Processes is now: OK on signwriting-ase4 i-000002ff output: PROCS OK: 100 processes [21:47:21] RECOVERY dpkg-check is now: OK on signwriting-ase4 i-000002ff output: All packages OK [21:47:21] RECOVERY Total Processes is now: OK on wikistats-history-01 i-000002e2 output: PROCS OK: 87 processes [21:47:45] RECOVERY Current Load is now: OK on zeromq1 i-000002b7 output: OK - load average: 0.77, 4.71, 4.29 [21:47:45] RECOVERY Current Users is now: OK on zeromq1 i-000002b7 output: USERS OK - 0 users currently logged in [21:47:45] RECOVERY 
Disk Space is now: OK on zeromq1 i-000002b7 output: DISK OK [21:47:55] RECOVERY Current Load is now: OK on wikistats-history-01 i-000002e2 output: OK - load average: 0.56, 4.64, 4.75 [21:48:15] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [21:49:05] RECOVERY Free ram is now: OK on zeromq1 i-000002b7 output: OK: 81% free memory [21:49:35] RECOVERY Current Load is now: OK on deployment-apache30 i-000002d3 output: OK - load average: 0.41, 3.51, 4.23 [21:49:36] RECOVERY Disk Space is now: OK on deployment-apache30 i-000002d3 output: DISK OK [21:49:36] RECOVERY Current Users is now: OK on deployment-apache30 i-000002d3 output: USERS OK - 0 users currently logged in [21:49:36] RECOVERY Free ram is now: OK on deployment-apache30 i-000002d3 output: OK: 92% free memory [21:49:36] RECOVERY dpkg-check is now: OK on zeromq1 i-000002b7 output: All packages OK [21:49:36] RECOVERY Total Processes is now: OK on deployment-apache30 i-000002d3 output: PROCS OK: 119 processes [21:52:05] RECOVERY Current Load is now: OK on aggregator1 i-0000010c output: OK - load average: 0.31, 2.05, 4.02 [21:52:06] RECOVERY Current Load is now: OK on etherpad-lite i-000002de output: OK - load average: 0.43, 2.20, 3.84 [21:52:06] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 1.18, 5.20, 5.27 [21:54:45] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b) [21:56:35] RECOVERY Current Load is now: OK on aggregator-test1 i-000002bf output: OK - load average: 0.64, 1.49, 3.81 [21:57:05] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 2.42, 3.66, 4.61 [22:02:15] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [22:08:15] PROBLEM Puppet freshness is now: CRITICAL on maps-test3 i-0000028f output: Puppet has not run in last 20 hours [22:11:15] PROBLEM host: nginx-dev2 is 
DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:17:45] PROBLEM Free ram is now: WARNING on bots-cb i-0000009e output: Warning: 14% free memory [22:23:12] whoever just created an instance, it failed because you hit a new node that isn't working properly yet [22:23:29] I'm going to delete it [22:25:46] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b) [22:27:35] PROBLEM Free ram is now: CRITICAL on bots-cb i-0000009e output: Critical: 5% free memory [22:35:08] PROBLEM host: pdbhandler-1 is DOWN address: i-00000307 CRITICAL - Host Unreachable (i-00000307) [22:37:38] RECOVERY Free ram is now: OK on bots-cb i-0000009e output: OK: 41% free memory [22:37:58] PROBLEM dpkg-check is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds. [22:40:15] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 4.67, 18.27, 12.47 [22:41:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:42:45] RECOVERY dpkg-check is now: OK on bots-cb i-0000009e output: All packages OK [22:43:45] PROBLEM dpkg-check is now: CRITICAL on testing-virt7 i-00000308 output: Connection refused by host [22:46:55] PROBLEM Free ram is now: UNKNOWN on testing-virt7 i-00000308 output: NRPE: Unable to read output [22:48:45] RECOVERY dpkg-check is now: OK on testing-virt7 i-00000308 output: All packages OK [22:53:44] PROBLEM Current Load is now: CRITICAL on testing-virt8 i-00000309 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:54:24] PROBLEM Current Users is now: CRITICAL on testing-virt8 i-00000309 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:55:05] PROBLEM Disk Space is now: CRITICAL on testing-virt8 i-00000309 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[22:55:45] PROBLEM Free ram is now: CRITICAL on testing-virt8 i-00000309 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:55:54] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b) [22:56:54] PROBLEM Total Processes is now: CRITICAL on testing-virt8 i-00000309 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:57:34] PROBLEM dpkg-check is now: CRITICAL on testing-virt8 i-00000309 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:03:46] PROBLEM Current Load is now: CRITICAL on pdbhandler-dev i-0000030a output: Connection refused by host [23:04:25] PROBLEM Current Users is now: CRITICAL on pdbhandler-dev i-0000030a output: Connection refused by host [23:04:45] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 181 processes [23:05:05] PROBLEM Disk Space is now: CRITICAL on pdbhandler-dev i-0000030a output: Connection refused by host [23:05:45] PROBLEM Free ram is now: CRITICAL on pdbhandler-dev i-0000030a output: Connection refused by host [23:08:45] RECOVERY Current Load is now: OK on pdbhandler-dev i-0000030a output: OK - load average: 0.08, 0.52, 0.36 [23:09:31] RECOVERY Current Users is now: OK on pdbhandler-dev i-0000030a output: USERS OK - 0 users currently logged in [23:10:05] RECOVERY Disk Space is now: OK on pdbhandler-dev i-0000030a output: DISK OK [23:10:45] PROBLEM Free ram is now: UNKNOWN on pdbhandler-dev i-0000030a output: NRPE: Unable to read output [23:11:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 check_ping: Invalid hostname/address - i-000002f0 [23:19:06] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 7% free memory [23:20:15] PROBLEM Puppet freshness is now: CRITICAL on maps-test2 i-00000253 output: Puppet has not run in last 20 hours [23:25:55] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b) [23:41:26] PROBLEM 
host: nginx-dev2 is DOWN address: i-000002f0 check_ping: Invalid hostname/address - i-000002f0 [23:56:14] PROBLEM host: nova-production1 is DOWN address: i-0000007b CRITICAL - Host Unreachable (i-0000007b)