[00:00:35] PROBLEM HTTP is now: WARNING on deployment-apache21 i-0000026d output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.004 second response time
[00:29:46] RECOVERY Free ram is now: OK on ganglia-test2 i-00000250 output: OK: 85% free memory
[02:42:21] 05/29/2012 - 02:42:21 - Updating keys for laner at /export/home/deployment-prep/laner
[02:48:19] 05/29/2012 - 02:48:19 - Updating keys for laner at /export/home/deployment-prep/laner
[02:49:19] 05/29/2012 - 02:49:19 - Updating keys for laner at /export/home/deployment-prep/laner
[03:05:37] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 12.03, 10.71, 6.12
[03:06:20] 05/29/2012 - 03:06:19 - Updating keys for laner at /export/home/deployment-prep/laner
[03:08:24] 05/29/2012 - 03:08:24 - Updating keys for laner at /export/home/deployment-prep/laner
[03:09:00] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 3.31, 10.45, 7.31
[03:19:38] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.37, 2.08, 4.29
[03:25:09] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.90, 2.12, 3.96
[03:26:04] New patchset: Andrew Bogott; "Yet another stab at correct puppet syntax." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9213
[03:26:19] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9213
[03:27:00] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9213
[03:27:02] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9213
[03:30:13] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.54, 1.04, 2.96
[03:34:58] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 17% free memory
[03:44:20] 05/29/2012 - 03:44:20 - Updating keys for laner at /export/home/deployment-prep/laner
[03:45:57] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 13% free memory
[03:50:01] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:51:18] PROBLEM Disk Space is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:54:27] PROBLEM dpkg-check is now: CRITICAL on mwreview-test2 i-00000298 output: Connection refused by host
[03:55:29] RECOVERY Current Users is now: OK on maps-tilemill1 i-00000294 output: USERS OK - 0 users currently logged in
[03:56:05] PROBLEM Current Load is now: CRITICAL on mwreview-test2 i-00000298 output: Connection refused by host
[03:56:14] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 4% free memory
[03:56:19] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:56:19] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:56:36] RECOVERY Disk Space is now: OK on maps-tilemill1 i-00000294 output: DISK OK
[03:56:49] PROBLEM Current Users is now: CRITICAL on mwreview-test2 i-00000298 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:56:54] PROBLEM Disk Space is now: CRITICAL on mwreview-test2 i-00000298 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:57:28] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:57:45] PROBLEM Free ram is now: CRITICAL on mwreview-test2 i-00000298 output: Connection refused by host
[03:58:40] PROBLEM Total Processes is now: CRITICAL on mwreview-test2 i-00000298 output: Connection refused by host
[04:00:30] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.16, 1.92, 1.59
[04:00:30] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in
[04:00:30] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[04:01:00] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory
[04:01:30] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 3% free memory
[04:01:50] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 0.28, 1.66, 1.49
[04:03:06] Eh, anyone knows how can I properly change my email address?
[04:04:20] PROBLEM dpkg-check is now: UNKNOWN on mwreview-test2 i-00000298 output: Invalid host name i-00000298
[04:04:50] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 13% free memory
[04:06:34] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:09:11] PROBLEM host: mwreview-test2 is DOWN address: i-00000298 check_ping: Invalid hostname/address - i-00000298
[04:17:24] PROBLEM Total Processes is now: CRITICAL on mwreview-test3 i-00000299 output: Connection refused by host
[04:17:34] PROBLEM dpkg-check is now: CRITICAL on mwreview-test3 i-00000299 output: Connection refused by host
[04:20:19] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:20:29] PROBLEM Current Load is now: CRITICAL on mwreview-test3 i-00000299 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:21:07] PROBLEM Disk Space is now: CRITICAL on mwreview-test3 i-00000299 output: Connection refused or timed out
[04:21:07] PROBLEM Current Users is now: CRITICAL on mwreview-test3 i-00000299 output: Connection refused or timed out
[04:21:17] PROBLEM Free ram is now: CRITICAL on mwreview-test3 i-00000299 output: Connection refused by host
[04:26:20] PROBLEM Disk Space is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:26:25] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 94% free memory
[04:26:25] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory
[04:27:58] PROBLEM Total Processes is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:28:05] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:28:05] PROBLEM Free ram is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:28:05] PROBLEM dpkg-check is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:31:39] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:31:49] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:31:49] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:09] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:09] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:14] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:19] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:19] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:59] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:59] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:59] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:34:04] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:34:04] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:34:04] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:35:11] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours
[04:36:56] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in
[04:36:56] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK
[04:36:56] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:37:26] RECOVERY Current Load is now: OK on maps-tilemill1 i-00000294 output: OK - load average: 0.37, 3.05, 3.54
[04:37:27] RECOVERY Total Processes is now: OK on maps-tilemill1 i-00000294 output: PROCS OK: 108 processes
[04:37:32] RECOVERY Free ram is now: OK on maps-tilemill1 i-00000294 output: OK: 86% free memory
[04:37:32] RECOVERY dpkg-check is now: OK on maps-tilemill1 i-00000294 output: All packages OK
[04:37:49] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 88 processes
[04:37:54] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK
[04:37:54] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 91% free memory
[04:38:44] RECOVERY Current Users is now: OK on incubator-bot1 i-00000251 output: USERS OK - 0 users currently logged in
[04:38:44] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 125 processes
[04:38:49] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK
[04:38:49] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK
[04:38:49] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 92% free memory
[04:38:49] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 81 processes
[05:03:43] PROBLEM Current Load is now: CRITICAL on mwreview-foo i-0000029a output: Connection refused by host
[05:04:47] PROBLEM Current Users is now: CRITICAL on mwreview-foo i-0000029a output: Connection refused by host
[05:05:28] PROBLEM Disk Space is now: CRITICAL on mwreview-foo i-0000029a output: Connection refused by host
[05:05:57] PROBLEM Free ram is now: CRITICAL on mwreview-foo i-0000029a output: Connection refused by host
[05:06:57] PROBLEM Total Processes is now: CRITICAL on mwreview-foo i-0000029a output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:07:27] PROBLEM dpkg-check is now: CRITICAL on mwreview-foo i-0000029a output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:13:57] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 6.16, 4.39, 4.91
[06:21:57] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.84, 5.52, 5.27
[06:33:09] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 5.23, 6.75, 3.91
[06:36:53] PROBLEM Disk Space is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:14] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:14] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:14] PROBLEM Total Processes is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:19] PROBLEM Free ram is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:19] PROBLEM dpkg-check is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:19] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:21] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:22] PROBLEM Current Users is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:22] PROBLEM Disk Space is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:22] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:22] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:22] PROBLEM Disk Space is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:22] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:23] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:23] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:24] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:28] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:28] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:46:10] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 6.07, 10.76, 8.66
[06:47:08] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:20] PROBLEM Disk Space is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:20] PROBLEM Free ram is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:20] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:25] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:06] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:06] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:16] PROBLEM Current Load is now: WARNING on aggregator-test3 i-00000293 output: WARNING - load average: 3.93, 5.80, 6.06
[06:49:42] RECOVERY Disk Space is now: OK on incubator-bot1 i-00000251 output: DISK OK
[06:50:10] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 11.72, 11.86, 7.82
[06:50:10] RECOVERY Current Users is now: OK on incubator-bot1 i-00000251 output: USERS OK - 0 users currently logged in
[06:50:11] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 69% free memory
[06:50:11] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 124 processes
[06:50:45] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:45] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:45] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:45] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:45] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:50] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:50] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:50] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:07] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:32] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:17] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:22] PROBLEM Disk Space is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:22] PROBLEM Current Users is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:22] PROBLEM Free ram is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:22] PROBLEM Current Load is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:22] PROBLEM dpkg-check is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:27] PROBLEM Total Processes is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:52] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 12.50, 11.00, 6.52
[06:54:17] PROBLEM dpkg-check is now: CRITICAL on deployment-thumbproxy i-0000026b output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:23] PROBLEM Current Load is now: WARNING on labs-nfs1 i-0000005d output: WARNING - load average: 11.60, 10.58, 7.76
[06:55:23] PROBLEM Current Load is now: WARNING on dumps-1 i-00000170 output: WARNING - load average: 8.13, 10.05, 6.05
[06:55:23] PROBLEM Current Load is now: WARNING on bots-apache1 i-000000b0 output: WARNING - load average: 6.92, 6.35, 5.86
[06:55:23] PROBLEM Total Processes is now: WARNING on nagios 127.0.0.1 output: PROCS WARNING: 265 processes
[06:55:33] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:48] PROBLEM Free ram is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:48] PROBLEM Total Processes is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:54] PROBLEM SSH is now: CRITICAL on reportcard2 i-000001ea output: CRITICAL - Socket timeout after 10 seconds
[06:55:54] PROBLEM Free ram is now: CRITICAL on labs-lvs1 i-00000057 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:54] PROBLEM dpkg-check is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:54] PROBLEM dpkg-check is now: CRITICAL on labs-lvs1 i-00000057 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:54] PROBLEM Current Load is now: CRITICAL on test3 i-00000093 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:54] PROBLEM Disk Space is now: CRITICAL on test3 i-00000093 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:55] PROBLEM Current Users is now: CRITICAL on test3 i-00000093 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:55] PROBLEM Total Processes is now: CRITICAL on test3 i-00000093 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:58] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:59] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:59] PROBLEM Current Users is now: CRITICAL on labs-lvs1 i-00000057 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:59] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:59] PROBLEM Disk Space is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:59] PROBLEM Current Users is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:59] PROBLEM Total Processes is now: CRITICAL on bots-sql1 i-000000b5 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:50] PROBLEM Total Processes is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:55] PROBLEM Current Load is now: CRITICAL on labs-lvs1 i-00000057 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:55] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:55] PROBLEM SSH is now: CRITICAL on bots-sql2 i-000000af output: CRITICAL - Socket timeout after 10 seconds
[07:01:05] PROBLEM SSH is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - Socket timeout after 10 seconds
[07:01:05] PROBLEM Total Processes is now: CRITICAL on labs-lvs1 i-00000057 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:01:10] PROBLEM dpkg-check is now: CRITICAL on test3 i-00000093 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:01:10] PROBLEM Total Processes is now: CRITICAL on shop-analytics-main i-000001e6 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:01:15] PROBLEM HTTP is now: CRITICAL on resourceloader2-apache i-000001d7 output: CRITICAL - Socket timeout after 10 seconds
[07:01:15] PROBLEM SSH is now: CRITICAL on rds i-00000207 output: CRITICAL - Socket timeout after 10 seconds
[07:01:15] PROBLEM Disk Space is now: CRITICAL on labs-lvs1 i-00000057 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:04:53] PROBLEM Total Processes is now: CRITICAL on deployment-apache23 i-00000270 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:05:13] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:05:48] RECOVERY Current Load is now: OK on aggregator-test3 i-00000293 output: OK - load average: 4.80, 3.29, 4.02
[07:05:48] PROBLEM Current Load is now: WARNING on reportcard2 i-000001ea output: WARNING - load average: 16.64, 15.92, 11.42
[07:05:49] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 93 processes
[07:06:52] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK
[07:09:15] RECOVERY Disk Space is now: OK on maps-tilemill1 i-00000294 output: DISK OK
[07:09:16] PROBLEM Current Load is now: WARNING on maps-tilemill1 i-00000294 output: WARNING - load average: 6.03, 8.71, 9.85
[07:09:16] RECOVERY Current Users is now: OK on maps-tilemill1 i-00000294 output: USERS OK - 0 users currently logged in
[07:09:16] RECOVERY Total Processes is now: OK on maps-tilemill1 i-00000294 output: PROCS OK: 110 processes
[07:09:21] RECOVERY Free ram is now: OK on maps-tilemill1 i-00000294 output: OK: 86% free memory
[07:09:21] RECOVERY dpkg-check is now: OK on maps-tilemill1 i-00000294 output: All packages OK
[07:09:21] RECOVERY Free ram is now: OK on labs-lvs1 i-00000057 output: OK: 90% free memory
[07:09:21] RECOVERY Total Processes is now: OK on nagios 127.0.0.1 output: PROCS OK: 228 processes
[07:09:21] RECOVERY SSH is now: OK on reportcard2 i-000001ea output: SSH OK - OpenSSH_5.8p1 Debian-7ubuntu1 (protocol 2.0)
[07:09:21] RECOVERY dpkg-check is now: OK on labs-lvs1 i-00000057 output: All packages OK
[07:09:21] RECOVERY Current Load is now: OK on test3 i-00000093 output: OK - load average: 0.82, 1.74, 2.03
[07:09:22] RECOVERY Disk Space is now: OK on test3 i-00000093 output: DISK OK
[07:09:22] RECOVERY Current Users is now: OK on test3 i-00000093 output: USERS OK - 0 users currently logged in
[07:09:23] RECOVERY Total Processes is now: OK on bots-cb i-0000009e output: PROCS OK: 196 processes
[07:09:26] RECOVERY Free ram is now: OK on bots-cb i-0000009e output: OK: 76% free memory
[07:09:26] RECOVERY dpkg-check is now: OK on bots-cb i-0000009e output: All packages OK
[07:09:26] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in
[07:09:26] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK
[07:09:26] PROBLEM Current Load is now: WARNING on rds i-00000207 output: WARNING - load average: 4.57, 7.29, 8.26
[07:09:26] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 102 processes
[07:09:31] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 92% free memory
[07:09:41] RECOVERY Total Processes is now: OK on test3 i-00000093 output: PROCS OK: 75 processes
[07:10:59] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[07:10:59] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 53% free memory
[07:10:59] RECOVERY Current Users is now: OK on bots-cb i-0000009e output: USERS OK - 0 users currently logged in
[07:10:59] RECOVERY Disk Space is now: OK on bots-cb i-0000009e output: DISK OK
[07:10:59] RECOVERY Total Processes is now: OK on bots-sql1 i-000000b5 output: PROCS OK: 84 processes
[07:11:19] RECOVERY Total Processes is now: OK on shop-analytics-main i-000001e6 output: PROCS OK: 91 processes
[07:11:24] RECOVERY HTTP is now: OK on resourceloader2-apache i-000001d7 output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.016 second response time
[07:11:25] RECOVERY dpkg-check is now: OK on test3 i-00000093 output: All packages OK
[07:11:25] RECOVERY SSH is now: OK on rds i-00000207 output: SSH OK - OpenSSH_5.8p1 Debian-7ubuntu1 (protocol 2.0)
[07:11:25] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:25] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:25] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:25] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:26] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:27] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:27] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:30] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:30] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:30] PROBLEM Disk Space is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:30] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:35] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:13:42] RECOVERY Total Processes is now: OK on deployment-apache23 i-00000270 output: PROCS OK: 137 processes
[07:13:59] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 9.05, 10.26, 18.23
[07:14:04] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK
[07:17:47] PROBLEM Current Load is now: WARNING on swift-be2 i-000001c8 output: WARNING - load average: 6.20, 7.21, 7.70
[07:17:47] PROBLEM Current Load is now: WARNING on deployment-apache23 i-00000270 output: WARNING - load average: 5.17, 5.78, 6.48
[07:17:47] PROBLEM Current Load is now: WARNING on swift-be4 i-000001ca output: WARNING - load average: 6.37, 7.42, 7.34
[07:17:57] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in
[07:17:57] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK
[07:17:57] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:18:02] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:18:15] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:21:59] PROBLEM Current Load is now: CRITICAL on dumps-1 i-00000170 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:22:41] RECOVERY Current Users is now: OK on labs-lvs1 i-00000057 output: USERS OK - 0 users currently logged in
[07:22:41] RECOVERY Current Load is now: OK on labs-lvs1 i-00000057 output: OK - load average: 0.48, 1.98, 4.00
[07:22:41] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 5.86, 7.14, 8.05
[07:22:41] RECOVERY dpkg-check is now: OK on mobile-testing i-00000271 output: All packages OK
[07:22:41] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in
[07:22:41] RECOVERY Free ram is now: OK on precise-test i-00000231 output: OK: 83% free memory
[07:22:42] RECOVERY Disk Space is now: OK on precise-test i-00000231 output: DISK OK
[07:22:42] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 88 processes
[07:22:55] RECOVERY Total Processes is now: OK on labs-lvs1 i-00000057 output: PROCS OK: 79 processes
[07:23:15] RECOVERY Disk Space is now: OK on labs-lvs1 i-00000057 output: DISK OK
[07:23:15] RECOVERY SSH is now: OK on bots-cb i-0000009e output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:23:15] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages OK
[07:23:15] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 69% free memory
[07:23:15] PROBLEM Current Load is now: WARNING on migration1 i-00000261 output: WARNING - load average: 0.54, 3.37, 6.42
[07:23:15] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 238 processes
[07:23:40] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK
[07:24:09] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 79% free memory
[07:24:10] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 86 processes
[07:24:55] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 123 processes
[07:25:27] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 7.35, 7.69, 7.71
[07:25:27] RECOVERY dpkg-check is now: OK on deployment-thumbproxy i-0000026b output: All packages OK
[07:25:27] PROBLEM Current Load is now: WARNING on pybal-precise i-00000289 output: WARNING - load average: 3.19, 4.28, 6.06
[07:25:27] RECOVERY Disk Space is now: OK on pybal-precise i-00000289 output: DISK OK
[07:25:27] RECOVERY Free ram is now: OK on pybal-precise i-00000289 output: OK: 80% free memory
[07:25:27] RECOVERY Total Processes is now: OK on pybal-precise i-00000289 output: PROCS OK: 92 processes
[07:25:37] RECOVERY Current Users is now: OK on pybal-precise i-00000289 output: USERS OK - 0 users currently logged in
[07:27:06] RECOVERY Disk Space is now: OK on mobile-testing i-00000271 output: DISK OK
[07:27:07] RECOVERY Current Users is now: OK on mobile-testing i-00000271 output: USERS OK - 0 users currently logged in
[07:27:07] RECOVERY Current Load is now: OK on deployment-apache23 i-00000270 output: OK - load average: 0.96, 1.56, 3.87
[07:27:07] RECOVERY Current Load is now: OK on swift-be4 i-000001ca output: OK - load average: 2.71, 2.61, 4.77
[07:27:07] RECOVERY Current Load is now: OK on dumps-1 i-00000170 output: OK - load average: 1.91, 3.16, 4.89
[07:29:31] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 5.46, 6.88, 6.15
[07:29:41] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.28, 1.39, 4.71
[07:29:47] PROBLEM dpkg-check is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:29:54] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:30:26] RECOVERY dpkg-check is now: OK on pybal-precise i-00000289 output: All packages OK [07:31:44] RECOVERY Current Load is now: OK on swift-be2 i-000001c8 output: OK - load average: 0.35, 2.31, 4.96 [07:32:39] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK [07:32:39] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 76% free memory [07:32:39] RECOVERY Total Processes is now: OK on bots-sql2 i-000000af output: PROCS OK: 93 processes [07:32:44] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af output: All packages OK [07:32:45] RECOVERY SSH is now: OK on bots-sql2 i-000000af output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [07:33:24] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 1.53, 5.23, 6.67 [07:35:15] RECOVERY Current Load is now: OK on pybal-precise i-00000289 output: OK - load average: 0.11, 1.78, 3.90 [07:37:36] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 0.08, 1.56, 3.81 [07:37:36] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.10, 2.25, 4.24 [07:37:36] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [07:37:36] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [07:37:36] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 91% free memory [07:37:36] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 90 processes [07:37:41] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 85% free memory [07:37:42] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [07:37:42] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK [07:41:42] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 0.48, 5.96, 16.72 [07:43:22] RECOVERY 
Current Load is now: OK on ganglia-test2 i-00000250 output: OK - load average: 0.34, 1.25, 3.80 [07:46:52] RECOVERY Current Load is now: OK on bots-apache1 i-000000b0 output: OK - load average: 3.17, 3.28, 4.49 [07:52:38] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 5.80, 2.25, 3.58 [07:57:36] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 2.27, 1.71, 2.93 [08:06:57] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 1.28, 1.11, 4.01 [08:14:56] !log deployment-prep hashar: raising mysql max_connect_errors (bug 37173) [08:16:07] !log deployment-prep hashar: restarting mysql [08:33:01] !log deployment-prep hashar: raised max_connect_errors to 10000 and running FLUSH HOSTS; for bug 37173 [08:37:09] 05/29/2012 - 08:37:09 - Updating keys for ashley at /export/home/hackathon/ashley [08:37:16] 05/29/2012 - 08:37:16 - Updating keys for ashley at /export/home/bastion/ashley [09:07:42] !log deployment-prep hashar: Banned all Amazon Elastic Cloud IP ranges at squid level. Crawlers / spiders there are hammering the beta cluster. [09:10:39] Probably a good idea to block tor etc :P [09:12:08] yup Tor will probably be the next [09:12:19] could do it if you find out a list of Tor output nodes [09:12:28] coffee break [09:18:46] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory [09:23:46] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 21% free memory [09:53:21] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [10:41:25] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 19% free memory [10:57:25] mutante: can you arrange to get your first name showing up on your hackaton badge ? 
[10:57:41] mutante: cause I still cant remember if it is Daniel or David ;-D [11:02:57] hashar: only 48 hours until the first hackatonian code shall be written [11:05:56] yeah [11:06:02] still have to write a plan of action [12:16:11] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 11.93, 10.52, 5.71 [12:21:44] !log bots restarting labs-morebots on bots-2 [12:21:45] bots is not a valid project. [12:21:46] Logged the message, Master [12:21:54] !log bot restarting labs-morebots on bots-2 [12:21:54] bot is not a valid project. [12:21:54] bot is not a valid project. [12:22:00] eh... [12:22:07] lol wut [12:22:13] I restarted it first before [12:22:28] * Hydriz pushes this for mutante to fix [12:23:37] eh, didnt you just ask me to do that? [12:23:40] ok [12:24:03] lol I was asking for the logging :P [12:24:08] but nevermind [12:24:17] it doesn't seem working? [12:24:25] !log foo [12:24:25] Message missing. Nothing logged. [12:24:32] !log foo foo [12:24:32] foo is not a valid project. 
[12:24:38] !log wikistats foo [12:24:39] Logged the message, Master [12:24:48] looks normal to me [12:24:55] hmm, seems like that was a conflict earlier then :P [12:25:02] yeah, i guess so [12:33:56] PROBLEM Current Load is now: CRITICAL on localpuppet2 i-0000029b output: Connection refused by host [12:34:36] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.38, 5.74, 5.74 [12:34:46] PROBLEM Current Users is now: CRITICAL on localpuppet2 i-0000029b output: Connection refused by host [12:35:21] PROBLEM Disk Space is now: CRITICAL on localpuppet2 i-0000029b output: Connection refused by host [12:35:46] PROBLEM Free ram is now: CRITICAL on localpuppet2 i-0000029b output: Connection refused by host [12:37:16] PROBLEM Total Processes is now: CRITICAL on localpuppet2 i-0000029b output: Connection refused by host [12:37:46] PROBLEM dpkg-check is now: CRITICAL on localpuppet2 i-0000029b output: Connection refused by host [12:37:57] RECOVERY Current Users is now: OK on mwreview-foo i-0000029a output: USERS OK - 1 users currently logged in [12:37:57] RECOVERY Disk Space is now: OK on mwreview-foo i-0000029a output: DISK OK [12:37:57] RECOVERY Free ram is now: OK on mwreview-foo i-0000029a output: OK: 92% free memory [12:38:46] RECOVERY Total Processes is now: OK on mwreview-foo i-0000029a output: PROCS OK: 84 processes [12:38:51] RECOVERY dpkg-check is now: OK on mwreview-foo i-0000029a output: All packages OK [12:41:09] RECOVERY Current Load is now: OK on mwreview-foo i-0000029a output: OK - load average: 1.60, 1.23, 0.68 [12:41:57] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [12:48:52] RECOVERY Current Load is now: OK on localpuppet2 i-0000029b output: OK - load average: 1.30, 1.02, 1.07 [12:50:20] RECOVERY Current Users is now: OK on localpuppet2 i-0000029b output: USERS OK - 1 users currently logged in [12:50:40] RECOVERY Free ram is now: OK on localpuppet2 i-0000029b output: OK: 87% 
free memory [12:51:43] RECOVERY Disk Space is now: OK on localpuppet2 i-0000029b output: DISK OK [12:52:13] RECOVERY Total Processes is now: OK on localpuppet2 i-0000029b output: PROCS OK: 84 processes [12:52:43] RECOVERY dpkg-check is now: OK on localpuppet2 i-0000029b output: All packages OK [13:06:43] !log bots petrb: patching wm-bot [13:06:44] Logged the message, Master [13:07:32] thank you, its flooding irc.wikimedia.org [13:07:35] :) [13:15:11] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.34, 4.19, 4.96 [13:20:30] Hydriz: what [13:20:37] who is flooding irc [13:21:11] wm-bot on irc.wm.o [13:21:13] how [13:21:25] join / part [13:21:26] maybe [13:21:32] because I was restarting my test [13:21:35] like 40 times [13:21:36] its quitting and joining and quitting and joining... [13:21:36] yeah [13:21:41] that's ok [13:22:01] but at least I am using my own bot for some of the channels :) [13:22:13] you can remove the incubator.wikimedia entry actually [13:22:22] why... 
[13:22:27] it doesn't matter to bot [13:22:35] I will remove channels only when projects are closed [13:22:55] maybe some other channel is using that feed [13:23:14] okie [13:26:19] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 3.50, 3.70, 4.99 [13:35:44] PROBLEM dpkg-check is now: CRITICAL on localpuppet2 i-0000029b output: DPKG CRITICAL dpkg reports broken packages [13:43:47] PROBLEM Puppet freshness is now: CRITICAL on nova-precise1 i-00000236 output: Puppet has not run in last 20 hours [13:48:04] 05/29/2012 - 13:48:04 - Updating keys for mah at /export/home/wikistats/mah [13:48:17] 05/29/2012 - 13:48:16 - Updating keys for mah at /export/home/bastion/mah [13:48:21] PROBLEM Puppet freshness is now: CRITICAL on nova-essex-test i-000001f9 output: Puppet has not run in last 20 hours [13:48:21] 05/29/2012 - 13:48:21 - Updating keys for mah at /export/home/deployment-prep/mah [13:53:35] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.10, 5.20, 5.06 [13:54:03] @infobot-share-on [13:54:03] Channel was configured to share the db [13:54:15] @infobot-share-trust+ #wikimedia-operations [13:54:15] You inserted channel #wikimedia-operations to shared db [13:54:18] hm [13:57:04] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [13:58:24] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours [14:08:14] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.96, 4.13, 4.69 [14:21:23] !log deployment prep killing puppet on jobrunner boxes [14:21:24] deployment is not a valid project. [14:23:15] !log depoyment-prep killing stalled apt-get / puppet-agent runs on jobrunner02 and 03 [14:23:16] depoyment-prep is not a valid project. 
[14:25:21] 05/29/2012 - 14:25:20 - Updating keys for laner at /export/home/deployment-prep/laner [14:25:23] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory [14:30:23] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 21% free memory [14:31:03] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [14:35:23] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours [14:38:01] @help [14:38:01] Type @commands for list of commands. This bot is running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 1.3.6 source code licensed under GPL and located at https://github.com/benapetr/wikimedia-bot [14:46:54] !log bastion creating new project mediawiki-custom-de [14:46:55] Logged the message, Master [14:47:49] 05/29/2012 - 14:47:49 - Creating a project directory for mediawiki-custom-de [14:47:59] !log mediawiki-custom-de adding member and admin Kai_Nissen_(WMDE) [14:47:59] mediawiki-custom-de is not a valid project. [14:48:05] meh [14:48:50] Failed to add Kai_Nissen_(WMDE) to mediawiki-custom-de. [14:49:01] maybe it's related to mediawiki-custom-de is not a valid [14:49:03] :D [14:49:19] bot reads ldap and if it's not in ldap yet, it doesn't exist [14:50:00] Successfully added Kai Nissen (WMDE) to mediawiki-custom-de. [14:50:11] ah, no, looks related to underscore vs. space :) [14:50:19] user gave it to me with the underscores [14:50:41] !log mediawiki-custom-de adding member and admin Kai Nissen (WMDE) [14:50:41] mediawiki-custom-de is not a valid project. 
[14:50:48] 05/29/2012 - 14:50:48 - Creating a home directory for knissen at /export/home/mediawiki-custom-de/knissen [14:51:48] 05/29/2012 - 14:51:48 - Updating keys for knissen at /export/home/mediawiki-custom-de/knissen [14:54:20] 05/29/2012 - 14:54:20 - Updating keys for laner at /export/home/deployment-prep/laner [14:54:22] New patchset: Faidon; "puppetmaster::self: hack around a git/SSH issue" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9252 [14:54:39] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9252 [14:54:39] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9252 [14:55:06] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9252 [14:55:34] !log mediawiki-custom-de adding member and admin Kai Nissen (WMDE) [14:55:35] Logged the message, Master [14:55:38] woot [14:55:42] mutante: it works [14:55:55] some kind of ldap lag [14:56:07] or maybe the bot is lagging heh [15:07:18] New review: Dzahn; "yea, reasonable" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6471 [15:07:20] Change merged: Dzahn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6471 [15:13:00] New patchset: Hashar; "ignore /private not private" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9253 [15:13:15] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9253 [15:15:06] New review: Dzahn; "ack, / not ." 
[operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9253 [15:15:08] Change merged: Dzahn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9253 [15:20:50] New patchset: Faidon; "Fix templatedir for puppetmaster::self" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9254 [15:21:05] New patchset: Faidon; "puppetmaster::self: add a few requires" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9255 [15:21:20] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9254 [15:21:21] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9255 [15:22:29] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9255 [15:22:35] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9254 [15:22:38] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9255 [15:22:39] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9254 [15:25:47] 05/29/2012 - 15:25:47 - Creating a home directory for otto at /export/home/mwreview/otto [15:26:49] 05/29/2012 - 15:26:48 - Updating keys for otto at /export/home/mwreview/otto [15:29:40] AARGH FINALLY [15:29:50] puppetmaster::self works [15:29:53] gah [15:37:16] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 15% free memory [16:08:00] New patchset: Andrew Bogott; "Move mysql datadir to /mnt/mysql, and try to actually move the data there." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9260 [16:08:14] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9260 [16:20:06] New patchset: Andrew Bogott; "Move mysql datadir to /mnt/mysql, and try to actually move the data there." 
[operations/puppet] (test) - https://gerrit.wikimedia.org/r/9260 [16:20:23] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9260 [16:23:24] New patchset: Andrew Bogott; "Move mysql datadir to /mnt/mysql, and try to actually move the data there." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9260 [16:23:39] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9260 [16:23:49] 05/29/2012 - 16:23:49 - Updating keys for knissen at /export/home/mediawiki-custom-de/knissen [16:24:22] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9260 [16:24:24] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9260 [16:32:27] PROBLEM dpkg-check is now: CRITICAL on mwreview-foo i-0000029c output: Connection refused by host [16:32:31] PROBLEM Current Load is now: CRITICAL on mwreview-foo i-0000029c output: Connection refused by host [16:32:31] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.78, 7.41, 6.39 [16:33:40] PROBLEM Disk Space is now: CRITICAL on mwreview-foo i-0000029c output: Connection refused by host [16:33:40] PROBLEM Current Users is now: CRITICAL on mwreview-foo i-0000029c output: Connection refused by host [16:34:02] PROBLEM Free ram is now: CRITICAL on mwreview-foo i-0000029c output: Connection refused by host [16:34:02] PROBLEM Total Processes is now: CRITICAL on mwreview-foo i-0000029c output: Connection refused by host [16:55:56] New patchset: Andrew Bogott; "Qualify/the/path/to/test." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9263 [16:56:12] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9263 [16:58:28] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9263 [16:58:30] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9263 [17:12:30] RECOVERY dpkg-check is now: OK on mwreview-foo i-0000029c output: All packages OK [17:12:50] RECOVERY Current Load is now: OK on mwreview-foo i-0000029c output: OK - load average: 1.28, 1.21, 0.99 [17:13:32] New patchset: Sara; "Don't bother applying ganglia::collector role to hooft.esams or streber, as neither of these actually run gmetad." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9265 [17:13:48] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9265 [17:13:50] RECOVERY Current Users is now: OK on mwreview-foo i-0000029c output: USERS OK - 1 users currently logged in [17:13:50] RECOVERY Disk Space is now: OK on mwreview-foo i-0000029c output: DISK OK [17:14:10] RECOVERY Free ram is now: OK on mwreview-foo i-0000029c output: OK: 90% free memory [17:14:24] RECOVERY Total Processes is now: OK on mwreview-foo i-0000029c output: PROCS OK: 94 processes [17:16:56] PROBLEM HTTP is now: CRITICAL on s1tiny i-00000277 output: CRITICAL - Socket timeout after 10 seconds [17:17:55] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9265 [17:17:58] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9265 [17:23:17] New patchset: Andrew Bogott; "Try to tidy up the order of db dir creation so this works the first time." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9266 [17:23:33] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9266 [17:36:55] PROBLEM Total Processes is now: CRITICAL on mwreview-bar i-0000029d output: Connection refused by host [17:37:29] PROBLEM dpkg-check is now: CRITICAL on mwreview-bar i-0000029d output: Connection refused by host [17:42:49] PROBLEM Current Load is now: CRITICAL on mwreview-bar i-0000029d output: Connection refused by host [17:43:29] PROBLEM Current Users is now: CRITICAL on mwreview-bar i-0000029d output: Connection refused by host [17:44:09] PROBLEM Disk Space is now: CRITICAL on mwreview-bar i-0000029d output: Connection refused by host [17:45:41] PROBLEM Free ram is now: CRITICAL on mwreview-bar i-0000029d output: Connection refused by host [17:46:41] PROBLEM host: m1small is DOWN address: i-0000029e check_ping: Invalid hostname/address - i-0000029e [17:53:44] PROBLEM dpkg-check is now: CRITICAL on s1tiny i-0000029f output: Connection refused by host [17:54:34] PROBLEM Current Load is now: CRITICAL on s1tiny i-0000029f output: Connection refused by host [17:54:44] 05/29/2012 - 17:54:44 - Updating keys for mshavlovsky at /export/home/blamemaps/mshavlovsky [17:55:11] PROBLEM Current Users is now: CRITICAL on s1tiny i-0000029f output: Connection refused by host [17:55:15] 05/29/2012 - 17:55:15 - Updating keys for mshavlovsky at /export/home/bastion/mshavlovsky [17:55:44] PROBLEM Disk Space is now: CRITICAL on s1tiny i-0000029f output: Connection refused by host [17:56:19] PROBLEM Free ram is now: CRITICAL on s1tiny i-0000029f output: Connection refused by host [17:56:46] 05/29/2012 - 17:56:45 - Updating keys for mshavlovsky at /export/home/blamemaps/mshavlovsky [17:57:04] PROBLEM HTTP is now: CRITICAL on s1tiny i-0000029f output: CRITICAL - Socket timeout after 10 seconds [17:57:17] 05/29/2012 - 17:57:16 - Updating keys for mshavlovsky at /export/home/bastion/mshavlovsky [17:58:14] PROBLEM Total Processes is now: CRITICAL on s1tiny i-0000029f output: Connection refused by host 
[18:21:55] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner05 i-0000028c output: DPKG CRITICAL dpkg reports broken packages [18:36:55] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory [18:53:29] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9266 [18:53:32] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9266 [18:56:56] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 22% free memory [18:58:25] RECOVERY Total Processes is now: OK on mwreview-bar i-0000029d output: PROCS OK: 86 processes [18:58:45] RECOVERY dpkg-check is now: OK on mwreview-bar i-0000029d output: All packages OK [18:59:00] RECOVERY Current Load is now: OK on mwreview-bar i-0000029d output: OK - load average: 1.21, 0.66, 0.33 [18:59:00] RECOVERY Current Users is now: OK on mwreview-bar i-0000029d output: USERS OK - 1 users currently logged in [18:59:25] RECOVERY Disk Space is now: OK on mwreview-bar i-0000029d output: DISK OK [19:00:46] RECOVERY Free ram is now: OK on mwreview-bar i-0000029d output: OK: 86% free memory [19:04:56] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory [19:32:02] RECOVERY dpkg-check is now: OK on deployment-jobrunner05 i-0000028c output: All packages OK [19:53:27] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [19:54:18] I need some help - I created a Labs user using the labsconsole Special:Createuser page, and I realize that I entered an email address with a typo. Can I change this via modify-LDAP-user or some other way? [19:54:49] mutante: petan ssmollett - maybe one of you can help me? [19:55:15] hi sumanah [19:55:18] hi paravoid! [19:55:20] I have like 5' but let me try [19:55:20] <^demon|busy> sumanah: modify-ldap-user is mildly borked, and I don't think we can change e-mails from it anyway. 
[19:55:46] <^demon|busy> quickest solution would be to do it on the database. [19:56:14] database? [19:56:16] isn't it on ldap? [19:56:35] <^demon|busy> Actually, yeah. Ignore me. [19:56:47] * ^demon|busy returns his peanuts and leaves the gallery. [19:56:53] sumanah: give me the username or wrong email and the right email [19:57:15] paravoid: ok, will pm [20:03:29] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [20:05:25] done, fixed via modify-ldap-user. [20:07:39] @infobot-link blah [20:07:39] Permission denied [20:11:21] @trustadd .*@wikimedia/Krinkle admin [20:11:21] Successfuly added .*@wikimedia/Krinkle [20:17:40] New patchset: Andrew Bogott; "Rearrange dependencies a bit." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9330 [20:17:56] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9330 [20:18:02] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9330 [20:18:04] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9330 [20:26:15] !log bots jeremyb: [i-000000a9] added hostmasks for krinkle and jeremyb to admin config. killed the mono proc and then killed the sleep. (didn't kill the restart.sh) [20:26:18] Logged the message, Master [20:26:40] petan|wk: if you are still around [20:26:50] petan|wk: I will update beta to latest master tomorrow [20:27:07] I will do the exts too [20:27:19] yes [20:27:30] ok [20:28:54] https://bugzilla.wikimedia.org/show_bug.cgi?id=37199 ;) [20:30:08] then I will get a week of vacations [20:30:11] and write docs !! ;-] [20:31:00] ok [20:31:19] I go to hackaton this week, that's my vacation :D [20:31:52] hah [20:32:12] I hope people are going to do other stuff there than just coding... 
like being in a pub XD [20:32:54] I will finally see a living Ryan Lane :D [20:33:14] Hey all :) I'm in need of a public test wiki to have a branch of my extension live on, is that something tbest done under the Lbas umbrella or some other way? [20:33:51] PROBLEM Total Processes is now: CRITICAL on mwreview-qux i-000002a0 output: Connection refused by host [20:34:03] *best **labs [20:34:31] PROBLEM dpkg-check is now: CRITICAL on mwreview-qux i-000002a0 output: Connection refused by host [20:35:37] paravoid: you rocks :-] [20:35:51] PROBLEM Current Load is now: CRITICAL on mwreview-qux i-000002a0 output: Connection refused by host [20:36:21] PROBLEM Current Users is now: CRITICAL on mwreview-qux i-000002a0 output: Connection refused by host [20:37:11] PROBLEM Disk Space is now: CRITICAL on mwreview-qux i-000002a0 output: Connection refused by host [20:37:41] PROBLEM Free ram is now: CRITICAL on mwreview-qux i-000002a0 output: Connection refused by host [20:39:48] !log deployment-prep manually running puppet on jobrunner-05 [20:39:50] Logged the message, Master [20:41:10] !log deployment-prep rebooting jobrunner05 following some package installs made earlier by puppet [20:41:11] Logged the message, Master [20:41:19] 05/29/2012 - 20:41:19 - Updating keys for laner at /export/home/deployment-prep/laner [20:43:49] PROBLEM Current Load is now: CRITICAL on blamemaps-m1small i-000002a1 output: Connection refused by host [20:44:29] PROBLEM Current Users is now: CRITICAL on blamemaps-m1small i-000002a1 output: Connection refused by host [20:45:09] PROBLEM Disk Space is now: CRITICAL on blamemaps-m1small i-000002a1 output: Connection refused by host [20:45:49] PROBLEM Free ram is now: CRITICAL on blamemaps-m1small i-000002a1 output: Connection refused by host [20:45:49] RECOVERY Current Load is now: OK on mwreview-qux i-000002a0 output: OK - load average: 1.94, 1.64, 1.31 [20:46:19] RECOVERY Current Users is now: OK on mwreview-qux i-000002a0 output: USERS OK - 1 users 
currently logged in [20:46:29] PROBLEM HTTP is now: CRITICAL on blamemaps-m1small i-000002a1 output: CRITICAL - Socket timeout after 10 seconds [20:47:09] RECOVERY Disk Space is now: OK on mwreview-qux i-000002a0 output: DISK OK [20:47:19] Jarry1250: the public part of that request is the issue.. [20:47:39] PROBLEM host: s1tiny is DOWN address: i-0000029f check_ping: Invalid hostname/address - i-0000029f [20:47:39] PROBLEM Total Processes is now: CRITICAL on blamemaps-m1small i-000002a1 output: Connection refused by host [20:48:19] PROBLEM dpkg-check is now: CRITICAL on blamemaps-m1small i-000002a1 output: Connection refused by host [20:48:25] Reedy: Well, only the login screen need be public. Basically I want to get people to try it out at Berlin. [20:49:06] Doesn't make any difference [20:49:17] Public means proxying and/or public ip address [20:49:40] You might be able to bribe Ryan [20:50:16] Yes. Is it possible to allow people a network connection to my laptop whilst I'm there? [20:50:32] To my localhost, I mean, over the network. [20:51:29] Jarry1250: anyone with git can do that [20:51:35] !proxy [20:51:40] !socks-proxy [20:51:40] ssh @bastion.wmflabs.org -D ; # [20:51:47] !putty [20:51:47] official site: http://www.chiark.greenend.org.uk/~sgtatham/putty/ | how to tunnel - http://oldsite.precedence.co.uk/nc/putty.html [20:52:21] 05/29/2012 - 20:52:21 - Updating keys for laner at /export/home/deployment-prep/laner [20:53:14] Jarry1250: as long as the internal network isn't restrictive, and you check the apache bindings and your firewall, it should be fine [20:53:28] http://192.168.100.XXX/wiki/ [20:53:49] Reedy: kk, I'll have to give it a go. [20:53:50] RECOVERY Total Processes is now: OK on mwreview-qux i-000002a0 output: PROCS OK: 84 processes [20:53:56] Not Found [20:53:56] [20:53:56] The requested URL /wiki/ was not found on this server. [20:53:56] [20:53:56] Apache/2.2.0 (Fedora) Server at 192.168.100.xxx Port 80 [20:54:03] How did that work? 
[20:54:03] * Reedy beats vvv [20:54:22] Jarry1250: should be enough people around to help you do that if you get stuck ;) [20:54:40] RECOVERY dpkg-check is now: OK on mwreview-qux i-000002a0 output: All packages OK [20:54:51] vvv: .xxx? NSFW mate :P [20:55:00] Reedy: Haha, yup :) [20:55:26] Name: 192.168.100.xxx [20:55:26] Address: 99.192.180.243 [20:55:33] So, this is a legit domain name [20:55:42] <^demon|busy> .xxx is such a joke. [20:57:40] RECOVERY Free ram is now: OK on mwreview-qux i-000002a0 output: OK: 92% free memory [20:59:46] New patchset: Andrew Bogott; "Revert "Rearrange dependencies a bit."" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9337 [21:00:04] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9337 [21:00:04] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9337 [21:06:15] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 13% free memory [21:17:07] !log deployment-prep Fixed Amazon Elastic Cloud ban. Properly fixing {{bug|37173}} hopefully [21:17:08] Logged the message, Master [21:24:58] PROBLEM Current Users is now: CRITICAL on ganglia-test4 i-000002a2 output: Connection refused by host [21:25:18] PROBLEM Disk Space is now: CRITICAL on ganglia-test4 i-000002a2 output: Connection refused by host [21:25:53] PROBLEM Free ram is now: CRITICAL on ganglia-test4 i-000002a2 output: Connection refused by host [21:26:18] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 22% free memory [21:27:03] PROBLEM Total Processes is now: CRITICAL on ganglia-test4 i-000002a2 output: Connection refused by host [21:27:09] doesn't everyone with a gerrit account appear on https://labsconsole.wikimedia.org/wiki/Special:RecentChanges ? 
[21:28:04] PROBLEM dpkg-check is now: CRITICAL on ganglia-test4 i-000002a2 output: Connection refused by host [21:28:44] PROBLEM Current Load is now: CRITICAL on ganglia-test4 i-000002a2 output: Connection refused by host [21:38:38] Hi [21:39:55] hi guys [21:40:04] does anyone here know why /var/run is symlinked to /run on labs instances? [21:41:21] because it's that way upstream? [21:41:47] hm, i see [21:41:52] changed in precise (or whatever)? [21:42:14] ok, so it isn't a labs specific thing [21:42:19] just a newer ubuntu thing [21:42:27] that was a move done by several distros [21:42:31] makes apparmor stuff pretty annoying [21:42:39] so that /run could be created at startup as a tmpfs [21:42:48] aye, makes sense [21:42:51] ok thank youuu! [21:43:01] see http://lists.fedoraproject.org/pipermail/devel/2011-March/150031.html [21:43:01] http://wiki.debian.org/ReleaseGoals/RunDirectory [21:44:53] * jeremyb is back, was on phone [21:46:55] New patchset: Andrew Bogott; "Move socket and pid_file for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9342 [21:47:10] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9342 [21:48:12] New review: Ottomata; "(no comment)" [operations/puppet] (test) C: 1; - https://gerrit.wikimedia.org/r/9342 [21:51:15] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 14% free memory [22:07:27] RECOVERY Total Processes is now: OK on ganglia-test4 i-000002a2 output: PROCS OK: 87 processes [22:07:57] RECOVERY dpkg-check is now: OK on ganglia-test4 i-000002a2 output: All packages OK [22:08:47] RECOVERY Current Load is now: OK on ganglia-test4 i-000002a2 output: OK - load average: 0.11, 0.24, 0.47 [22:10:17] RECOVERY Current Users is now: OK on ganglia-test4 i-000002a2 output: USERS OK - 1 users currently logged in [22:10:17] RECOVERY Disk Space is now: OK on ganglia-test4 i-000002a2 output: DISK OK [22:11:07] RECOVERY Free ram is now: OK on ganglia-test4 i-000002a2 output: OK: 92% free memory [22:15:57] PROBLEM dpkg-check is now: CRITICAL on ganglia-test4 i-000002a2 output: DPKG CRITICAL dpkg reports broken packages [22:18:07] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache21 i-0000026d output: Puppet has not run in last 20 hours [22:26:41] New patchset: Jeremyb; "Move socket and pid_file for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9342 [22:26:53] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (test); V: -1 - https://gerrit.wikimedia.org/r/9342 [22:29:23] New patchset: Jeremyb; "Move socket and pid_file for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9342 [22:30:00] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9342 [22:31:06] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9342 [22:31:08] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9342 [22:42:26] !credentials [22:42:26] when you see No Nova credentials found for your account just relog to wiki and should be ok [22:49:13] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 3.65, 11.43, 6.33 [22:53:46] PROBLEM Current Load is now: CRITICAL on mwreview-baz i-000002a3 output: Connection refused by host [22:54:06] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.47, 4.44, 4.68 [22:54:36] PROBLEM Current Users is now: CRITICAL on mwreview-baz i-000002a3 output: Connection refused by host [22:55:06] PROBLEM Disk Space is now: CRITICAL on mwreview-baz i-000002a3 output: Connection refused by host [22:55:46] PROBLEM Free ram is now: CRITICAL on mwreview-baz i-000002a3 output: Connection refused by host [22:56:56] PROBLEM Total Processes is now: CRITICAL on mwreview-baz i-000002a3 output: Connection refused by host [22:57:36] PROBLEM dpkg-check is now: CRITICAL on mwreview-baz i-000002a3 output: Connection refused by host [23:18:46] RECOVERY Current Load is now: OK on mwreview-baz i-000002a3 output: OK - load average: 1.70, 1.70, 1.43 [23:19:37] RECOVERY Current Users is now: OK on mwreview-baz i-000002a3 output: USERS OK - 1 users currently logged in [23:20:45] RECOVERY Free ram is now: OK on mwreview-baz i-000002a3 output: OK: 85% free memory [23:21:55] RECOVERY Total Processes is now: OK on mwreview-baz i-000002a3 output: PROCS OK: 94 processes [23:22:40] RECOVERY dpkg-check is now: OK on mwreview-baz i-000002a3 output: All packages OK [23:25:25] RECOVERY Disk Space is now: OK on mwreview-baz i-000002a3 output: DISK OK [23:35:16] 05/29/2012 - 23:35:16 - Updating keys for 
mshavlovsky at /export/home/bastion/mshavlovsky [23:35:26] PROBLEM host: dontblameme is DOWN address: i-000002a4 check_ping: Invalid hostname/address - i-000002a4 [23:36:16] 05/29/2012 - 23:36:16 - Updating keys for mshavlovsky at /export/home/bastion/mshavlovsky [23:36:46] 05/29/2012 - 23:36:46 - Updating keys for mshavlovsky at /export/home/blamemaps/mshavlovsky [23:44:16] PROBLEM Puppet freshness is now: CRITICAL on nova-precise1 i-00000236 output: Puppet has not run in last 20 hours [23:49:16] PROBLEM Puppet freshness is now: CRITICAL on nova-essex-test i-000001f9 output: Puppet has not run in last 20 hours [23:59:15] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours