[00:37:50] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:50] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:03] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:03] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:03] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:24] PROBLEM Disk Space is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:24] PROBLEM Current Users is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:43:14] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [00:43:14] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 86% free memory [00:43:14] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 110 processes [00:46:43] RECOVERY Disk Space is now: OK on incubator-bot2 i-00000252 output: DISK OK [00:46:43] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in [00:47:39] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 0.17, 2.51, 2.53 [00:47:39] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in [01:41:11] PROBLEM Puppet freshness is now: CRITICAL on blamemaps-m1small i-000002a1 output: Puppet has not run in last 20 hours [01:43:41] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 19% free memory [01:49:46] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[01:49:46] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [01:49:46] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [01:49:46] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [01:50:06] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [01:50:16] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [01:54:31] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK [02:39:50] RECOVERY Free ram is now: OK on ganglia-test2 i-00000250 output: OK: 22% free memory [02:50:39] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:19] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:19] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:19] PROBLEM Current Users is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [02:55:17] PROBLEM Current Load is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:01:57] PROBLEM Free ram is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:01:57] PROBLEM Total Processes is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:02:02] PROBLEM dpkg-check is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:02:02] PROBLEM Disk Space is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:02:02] PROBLEM Disk Space is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:02:02] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:02:07] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:02:07] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:09:21] 06/01/2012 - 03:09:21 - Updating keys for laner at /export/home/deployment-prep/laner [03:18:43] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: (Service Check Timed Out) [03:18:43] RECOVERY Puppet freshness is now: OK on dumps-2 i-00000257 output: puppet ran at Fri Jun 1 02:59:59 UTC 2012 [03:19:27] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 6.42, 6.01, 8.57 [03:19:28] PROBLEM Current Load is now: WARNING on bots-apache1 i-000000b0 output: WARNING - load average: 9.21, 8.74, 8.99 [03:33:24] RECOVERY Current Users is now: OK on build1 i-000002b3 output: USERS OK - 0 users currently logged in [03:34:23] RECOVERY Disk Space is now: OK on ganglia-test5 i-000002a7 output: DISK OK [03:34:23] RECOVERY Total Processes is now: OK on ganglia-test5 i-000002a7 output: PROCS OK: 198 processes [03:34:28] RECOVERY dpkg-check is now: OK on ganglia-test5 i-000002a7 output: All packages OK [03:34:34] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:30] PROBLEM Disk Space is now: WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 3568 MB (20% inode=77%): [03:40:40] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:40] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:40:40] PROBLEM Disk Space is now: CRITICAL on ganglia-test6 i-000002af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:42:41] RECOVERY Total Processes is now: OK on build1 i-000002b3 output: PROCS OK: 99 processes [03:42:46] RECOVERY Disk Space is now: OK on build1 i-000002b3 output: DISK OK [03:42:46] RECOVERY Free ram is now: OK on build1 i-000002b3 output: OK: 91% free memory [03:43:01] RECOVERY dpkg-check is now: OK on build1 i-000002b3 output: All packages OK [03:43:16] RECOVERY Current Load is now: OK on build1 i-000002b3 output: OK - load average: 0.11, 0.15, 0.22 [03:43:26] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in [03:43:26] RECOVERY Free ram is now: OK on precise-test i-00000231 output: OK: 85% free memory [03:43:26] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:45:10] PROBLEM Current Users is now: CRITICAL on ganglia-test6 i-000002af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:47:41] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:50:15] PROBLEM dpkg-check is now: CRITICAL on ganglia-test6 i-000002af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:50:15] PROBLEM Total Processes is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:50:20] PROBLEM Current Users is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:50:20] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:50:20] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:55:29] * Hydriz facepalms at gluster's I/O handling [03:57:46] RECOVERY Disk Space is now: OK on precise-test i-00000231 output: DISK OK [03:57:46] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 84 processes [03:57:51] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 0.94, 2.23, 6.44 [03:57:51] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages OK [03:58:11] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:58:42] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:58:56] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CRITICAL - load average: 77.69, 72.64, 53.83 [03:59:01] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:59:06] PROBLEM Disk Space is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:59:06] PROBLEM SSH is now: CRITICAL on bots-sql2 i-000000af output: CRITICAL - Socket timeout after 10 seconds [03:59:06] PROBLEM Current Load is now: CRITICAL on ganglia-test6 i-000002af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:59:06] PROBLEM Total Processes is now: CRITICAL on ganglia-test6 i-000002af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:59:11] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:59:16] PROBLEM Free ram is now: CRITICAL on ganglia-test6 i-000002af output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:04:44] PROBLEM Current Load is now: WARNING on labs-nfs1 i-0000005d output: WARNING - load average: 4.89, 8.72, 10.77 [04:05:50] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:12:26] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:15:40] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:16:03] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds. [04:16:03] PROBLEM Total Processes is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds. [04:16:08] PROBLEM Current Users is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds. [04:21:33] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 252 processes [04:21:38] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 129 processes [04:21:43] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 11.73, 14.83, 14.27 [04:22:42] RECOVERY Puppet freshness is now: OK on bots-cb i-0000009e output: puppet ran at Fri Jun 1 04:18:02 UTC 2012 [04:23:01] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 89 processes [04:23:37] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 92% free memory [04:23:58] PROBLEM Current Load is now: CRITICAL on labs-nfs1 i-0000005d output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:12] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:12] PROBLEM Current Users is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:24:17] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:17] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:17] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:17] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:34] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:34] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:34] PROBLEM dpkg-check is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:34] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:34] PROBLEM Current Users is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:34] PROBLEM Current Load is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:35] PROBLEM Disk Space is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:35] PROBLEM Current Load is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:24:49] PROBLEM host: worker1 is DOWN address: i-00000208 PING CRITICAL - Packet loss = 100% [04:29:19] RECOVERY Current Users is now: OK on maps-tilemill1 i-00000294 output: USERS OK - 0 users currently logged in [04:29:19] RECOVERY Total Processes is now: OK on bots-cb i-0000009e output: PROCS OK: 106 processes [04:29:24] RECOVERY Current Users is now: OK on bots-cb i-0000009e output: USERS OK - 1 users currently logged in [04:29:24] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 16% free memory [04:29:34] PROBLEM Current Load is now: WARNING on migration1 i-00000261 output: WARNING - load average: 8.66, 6.81, 6.43 [04:29:34] PROBLEM dpkg-check is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:29:34] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:00] PROBLEM Current Load is now: WARNING on rds i-00000207 output: WARNING - load average: 5.00, 6.37, 9.58 [04:30:00] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK [04:30:00] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [04:30:10] PROBLEM Free ram is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:11] PROBLEM Total Processes is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:15] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:16] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:21] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:26] PROBLEM dpkg-check is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:31:19] PROBLEM Current Load is now: WARNING on wep i-000000c2 output: WARNING - load average: 0.83, 1.88, 5.94 [04:33:06] RECOVERY host: worker1 is UP address: i-00000208 PING OK - Packet loss = 0%, RTA = 0.48 ms [04:33:11] PROBLEM Current Load is now: WARNING on labs-nfs1 i-0000005d output: WARNING - load average: 7.32, 6.13, 5.12 [04:33:16] PROBLEM Current Load is now: WARNING on worker1 i-00000208 output: WARNING - load average: 5.59, 5.51, 6.83 [04:33:26] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [04:33:31] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 3.64, 3.18, 14.11 [04:33:31] RECOVERY dpkg-check is now: OK on pybal-precise i-00000289 output: All packages OK [04:33:31] RECOVERY Free ram is now: OK on pybal-precise i-00000289 output: OK: 85% free memory [04:33:31] RECOVERY Total Processes is now: OK on pybal-precise i-00000289 output: PROCS OK: 90 processes [04:33:36] PROBLEM Disk Space is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:33:36] RECOVERY Puppet freshness is now: OK on aggregator-test3 i-00000293 output: puppet ran at Fri Jun 1 04:33:24 UTC 2012 [04:35:09] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:34] RECOVERY Current Load is now: OK on incubator-bot2 i-00000252 output: OK - load average: 1.33, 2.02, 4.93 [04:36:04] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:36:04] PROBLEM Disk Space is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:36:04] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:36:09] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:36:19] RECOVERY Current Load is now: OK on wep i-000000c2 output: OK - load average: 2.73, 2.20, 4.94 [04:36:29] PROBLEM Current Load is now: WARNING on aggregator-test3 i-00000293 output: WARNING - load average: 2.57, 3.80, 15.53 [04:36:29] PROBLEM Current Load is now: WARNING on swift-be2 i-000001c8 output: WARNING - load average: 4.00, 4.32, 5.87 [04:36:29] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 74% free memory [04:36:39] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 2.70, 3.06, 11.27 [04:36:39] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:51] PROBLEM Free ram is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:52] PROBLEM Total Processes is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:38:11] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 6.30, 7.29, 6.34 [04:38:11] RECOVERY Free ram is now: OK on ganglia-test5 i-000002a7 output: OK: 46% free memory [04:38:11] RECOVERY Current Users is now: OK on ganglia-test5 i-000002a7 output: USERS OK - 0 users currently logged in [04:38:11] PROBLEM Current Load is now: WARNING on ganglia-test5 i-000002a7 output: WARNING - load average: 15.19, 13.45, 12.89 [04:38:33] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:38:34] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:38:34] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:38:34] PROBLEM SSH is now: CRITICAL on ipv6test1 i-00000282 output: CRITICAL - Socket timeout after 10 seconds [04:38:34] PROBLEM Total Processes is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:38:34] PROBLEM dpkg-check is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:45:43] PROBLEM Current Load is now: WARNING on reportcard2 i-000001ea output: WARNING - load average: 5.31, 6.67, 8.17 [04:45:48] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK [04:45:49] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [04:45:49] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 80 processes [04:47:50] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:48:54] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:53:18] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 6.01, 8.41, 10.75 [04:53:18] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in [04:53:18] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [04:53:18] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 108 processes [04:53:23] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 86% free memory [04:56:30] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:56:35] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 85% free memory [04:56:45] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK [04:56:46] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 11.21, 14.90, 12.84 [04:56:50] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 13% free memory [04:56:56] PROBLEM Current Load is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:38:43] 06/01/2012 - 06:38:43 - Updating keys for tstarling at /export/home/scribunto/tstarling [06:44:41] 06/01/2012 - 06:39:02 - Updating keys for tstarling at /export/home/testlabs/tstarling [06:45:28] 06/01/2012 - 06:45:28 - Updating keys for tstarling at /export/home/deployment-prep/tstarling [06:45:29] 06/01/2012 - 06:45:28 - Updating keys for tstarling at /export/home/deployment-prep/tstarling [06:45:29] 06/01/2012 - 06:45:28 - Updating keys for tstarling at /export/home/deployment-prep/tstarling [06:45:36] 06/01/2012 - 06:45:36 - Updating keys for tstarling at /export/home/wikisource-dev/tstarling [06:45:36] 06/01/2012 - 06:45:36 - Updating keys for tstarling at /export/home/wikisource-dev/tstarling [06:45:37] 06/01/2012 - 06:45:36 - Updating keys for tstarling at /export/home/wikisource-dev/tstarling [09:27:46] petan: was checkuser disabled on beta? 
[09:28:01] petan: also, we need to change the logos [09:29:02] 06/01/2012 - 09:29:02 - Updating keys for asher at /export/home/testlabs/asher [09:29:05] 06/01/2012 - 09:29:05 - Updating keys for asher at /export/home/gluster/asher [09:29:11] 06/01/2012 - 09:29:11 - Updating keys for asher at /export/home/mobile/asher [09:29:18] 06/01/2012 - 09:29:18 - Updating keys for asher at /export/home/deployment-prep/asher [09:30:02] 06/01/2012 - 09:30:02 - Updating keys for asher at /export/home/testlabs/asher [09:30:05] 06/01/2012 - 09:30:05 - Updating keys for asher at /export/home/gluster/asher [09:30:12] 06/01/2012 - 09:30:12 - Updating keys for asher at /export/home/mobile/asher [09:30:17] 06/01/2012 - 09:30:17 - Updating keys for asher at /export/home/deployment-prep/asher [09:30:22] Reedy: can you help me with something on beta? [09:30:33] I need checkuser disabled [09:37:47] paravoid: labs is basically broken right now :( [09:38:44] hmm, isn't checkuser already disabled? [09:38:46] ah I see [09:38:46] it's the damn nfs server again [09:38:48] it was readded back recently by accident [09:38:52] Hydriz: ah [09:38:58] I thought that was the case [09:40:01] In that case then Oversight needs to be disabled as well [09:40:16] the stewards want that kept enabled [09:40:37] Did you get it sorted? [09:40:46] no. labs is having issues right now, though [09:41:07] but its quite embarrassing to introduce Labs at its current state to Berlin hackers... [09:41:23] yeah, I wanted to get rid of gluster before coming [09:41:41] and replace it with a normal kind of nfs share? :P [09:43:22] maybe ceph [09:43:26] for the short term, no shared storage [09:51:17] !log testlabs rebooting labs-nfs1 [09:51:19] Logged the message, Master [09:51:41] Write speed: 1 letter per minute :P [09:52:08] well, load makes the bot slow ;) [09:52:59] I am just saving my progress first [09:53:11] saving your progress? 
[09:53:11] before I can allow you to take down any share [09:53:24] yeah, quitting everything [09:53:31] (slowly) [09:53:42] actually, until we fix the load issues, can you disable the dump stuff? [09:53:49] meaning, for the next few weeks [09:54:00] I just stopped [09:54:06] and yeah, its possible [09:54:09] thanks [09:54:14] I just need to be able to access the wiki list [09:54:18] * Ryan_Lane nods [09:54:34] so I know which wiki I stopped at, and continue on another server [09:55:33] hmm, what does the command "logout" do, other than logging out? [09:55:39] hm. this instance is taking forever to boot [09:56:44] maybe I should pause all of the instances [09:56:47] until it comes up [09:59:15] I'm suspending them all [10:03:16] be my guest :) [10:03:18] I just hope to see Labs work, and not have too awesome features that can slow things down [10:25:20] Hydriz: well, I meant all instances [11:08:32] about 1/2 of the instances have been rebooted [11:08:39] nfs and the bastions are up [11:08:44] load is stable [11:08:53] rebooting in the order of "I like it" [11:08:55] ? [11:09:04] :P [11:09:12] rebooting in no specific order [11:09:19] though I did bring up nfs and the bastions first [11:09:32] bastion-restricted in fact [11:09:45] I did bring that one up first, yes [11:09:45] I had to tunnel via bots-apache1 to reboot the wmib bot and morebots [11:10:39] * Ryan_Lane nods [11:10:49] lol analytics is up, finally... [12:29:19] 06/01/2012 - 12:29:19 - Updating keys for laner at /export/home/deployment-prep/laner [13:06:22] hey all [13:06:42] hello hello [13:06:44] who else is about [13:07:54] petan|hackaton: Hiya, are you at the venue? 
[13:08:04] no, just in hostel atm [13:08:14] in fact I don't even know how to get there, need to find out [13:08:39] so far I didn't meet anyone except for LCawte who is somewhere here as well [13:08:50] petan|hackaton: It's at Gleisdreieck U-Bahn (U2 line), just down the street [13:08:56] ok [13:09:08] I guess it start in 2 hours [13:09:11] 06/01/2012 - 13:09:10 - Updating keys for diederik at /export/home/hadoop/diederik [13:09:16] 06/01/2012 - 13:09:16 - Updating keys for diederik at /export/home/reportcard/diederik [13:09:22] 06/01/2012 - 13:09:22 - Updating keys for diederik at /export/home/statsgrokse/diederik [13:09:25] 06/01/2012 - 13:09:24 - Updating keys for diederik at /export/home/mobile-stats/diederik [13:09:26] 06/01/2012 - 13:09:25 - Updating keys for diederik at /export/home/nginx/diederik [13:09:34] Yeah, but we had the Wikidata summit here today and yesterday, so there are a bunch of people here already [13:09:42] oh [13:09:45] 06/01/2012 - 13:09:45 - Updating keys for diederik at /export/home/packaging/diederik [13:09:46] 06/01/2012 - 13:09:46 - Updating keys for diederik at /export/home/analytics/diederik [13:09:52] anyone I know... [13:09:56] Ryan? :D [13:10:26] ok, I will try to get there soon then [13:10:36] (At Gleisdreieck, take the Luckenwalder Straße exit, the venue is on the right-hand side of Luckenwalder [13:11:16] There's a gate with a car barrier and a security office, wave to the security guard and he should buzz you in. We're halfway down the courtyard on the left-hand side [13:12:17] ok [13:12:27] I don't think Ryan's here right now but he should be here by 6 [13:13:00] btw are all people in same hostel? [13:13:44] No [13:13:54] Most WMF staff are in a hotel across the street from the venue [13:14:05] Some staff are at a hotel elsewhere in the city because the first hotel was full I guess [13:14:27] in this room is supposed to be 6 people, so I guess other 5 hackers are to come, unless I am in a room with someone...
foreign :D [13:14:30] Most other people are at the hostel I guess [13:27:36] so, labs working properly? [13:27:46] other than 4 instances I think [13:27:50] great [13:28:00] Ryan_Lane: did you just arrive to hotel that you check? :P [13:28:25] perhaps not ganglia though [13:28:28] I'm at a beer garden, doing a sprint on redis/zeromq for mediawiki [13:28:37] !ganglia [13:28:45] why there is no such a key [13:28:53] heh [13:28:58] because it's new [13:29:01] !ping [13:29:01] pong [13:29:20] Ryan_Lane: beer garden? I want to be there as well :D [13:29:37] is there a beer garden in the venue [13:29:41] you should have come to the hackathon~ [13:29:42] ! [13:29:45] Ryan_Lane: I am [13:29:50] just in hostel so far [13:29:51] oh [13:29:54] I see [13:32:45] sounds like there's a party in here [13:33:08] yep, ganglia is still down [13:33:32] oh [13:33:37] it has bad mount options [13:33:42] I'll fix it a little later [13:35:25] the french are coming! [13:37:32] sacre bleh! [13:44:06] 06/01/2012 - 13:44:06 - Updating keys for preilly at /export/home/mobile-sms/preilly [13:44:15] 06/01/2012 - 13:44:15 - Updating keys for akhanna at /export/home/bastion/akhanna [13:44:16] 06/01/2012 - 13:44:15 - Updating keys for diederik at /export/home/bastion/diederik [13:44:16] 06/01/2012 - 13:44:16 - Updating keys for asher at /export/home/bastion/asher [13:44:16] 06/01/2012 - 13:44:16 - Updating keys for tstarling at /export/home/bastion/tstarling [13:44:43] script was broken [13:47:03] Ryan_Lane: you should talk to ScriptStack about that [13:47:07] :D [13:47:48] 06/01/2012 - 13:47:48 - Creating a project directory for queue [13:47:48] 06/01/2012 - 13:47:48 - Creating a home directory for asher at /export/home/queue/asher [13:47:48] 06/01/2012 - 13:47:48 - Creating a home directory for laner at /export/home/queue/laner [13:48:48] 06/01/2012 - 13:48:48 - Updating keys for asher at /export/home/queue/asher [13:48:48] 06/01/2012 - 13:48:48 - Updating keys for laner at 
/export/home/queue/laner [13:50:07] why there is no nagios [13:50:12] the bot? [13:50:15] yeh [13:50:18] gonna fix that [13:50:18] ircecho probably needs to be started on it [13:54:44] !log nagios petrb: fixed nagios [13:54:46] Logged the message, Master [13:55:12] PROBLEM Disk Space is now: CRITICAL on deployment-deb i-000002b5 output: Connection refused by host [13:55:36] here we go [13:55:37] :D [13:55:52] PROBLEM Free ram is now: CRITICAL on deployment-deb i-000002b5 output: Connection refused by host [13:55:54] lovely bot [13:57:02] PROBLEM Total Processes is now: CRITICAL on deployment-deb i-000002b5 output: Connection refused by host [13:57:42] PROBLEM dpkg-check is now: CRITICAL on deployment-deb i-000002b5 output: Connection refused by host [13:58:41] it's simple [13:58:47] purposely :) [13:59:42] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [13:59:42] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [14:00:54] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [14:03:49] PROBLEM Current Users is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:03:49] PROBLEM Current Load is now: CRITICAL on queue-wiki1 i-000002b8 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:03:49] PROBLEM Disk Space is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:04:24] PROBLEM Disk Space is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:04:24] PROBLEM Free ram is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:04:24] PROBLEM Current Users is now: CRITICAL on queue-wiki1 i-000002b8 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[14:05:04] PROBLEM Free ram is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:05:04] PROBLEM Disk Space is now: CRITICAL on queue-wiki1 i-000002b8 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:05:44] PROBLEM Free ram is now: CRITICAL on queue-wiki1 i-000002b8 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:05:44] PROBLEM Total Processes is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:06:14] PROBLEM dpkg-check is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:06:14] PROBLEM Total Processes is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:06:45] !ping [14:06:45] pong [14:06:50] @whoami [14:06:50] You are unknown to me :) [14:06:53] damn you [14:06:54] PROBLEM dpkg-check is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:06:54] PROBLEM Total Processes is now: CRITICAL on queue-wiki1 i-000002b8 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:07:34] PROBLEM dpkg-check is now: CRITICAL on queue-wiki1 i-000002b8 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:07:34] PROBLEM Current Load is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:07:44] RECOVERY dpkg-check is now: OK on deployment-deb i-000002b5 output: All packages OK [14:08:14] PROBLEM Current Load is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:08:14] PROBLEM Current Users is now: CRITICAL on zeromq1 i-000002b7 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[14:08:54] RECOVERY Current Load is now: OK on deployment-deb i-000002b5 output: OK - load average: 0.10, 0.75, 0.87 [14:09:34] RECOVERY Current Users is now: OK on deployment-deb i-000002b5 output: USERS OK - 1 users currently logged in [14:10:14] RECOVERY Disk Space is now: OK on deployment-deb i-000002b5 output: DISK OK [14:10:44] RECOVERY Free ram is now: OK on deployment-deb i-000002b5 output: OK: 87% free memory [14:12:04] RECOVERY Total Processes is now: OK on deployment-deb i-000002b5 output: PROCS OK: 84 processes [14:24:14] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [14:29:44] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [14:29:44] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [14:30:54] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [14:32:34] RECOVERY dpkg-check is now: OK on queue-wiki1 i-000002b8 output: All packages OK [14:33:44] RECOVERY Current Load is now: OK on queue-wiki1 i-000002b8 output: OK - load average: 0.04, 0.03, 0.06 [14:34:24] RECOVERY Current Users is now: OK on queue-wiki1 i-000002b8 output: USERS OK - 1 users currently logged in [14:35:04] RECOVERY Disk Space is now: OK on queue-wiki1 i-000002b8 output: DISK OK [14:35:44] RECOVERY Free ram is now: OK on queue-wiki1 i-000002b8 output: OK: 90% free memory [14:36:54] RECOVERY Total Processes is now: OK on queue-wiki1 i-000002b8 output: PROCS OK: 85 processes [14:54:14] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [14:59:44] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [14:59:44] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [15:00:54] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable 
(i-0000010c) [15:13:41] petan|hackaton: where are you located atm? [15:14:05] Danny_B|backup: third table from projector [15:14:17] gray tshirt? [15:14:23] yes [15:14:32] I just turned [15:14:40] i just waved to you ;-))) [15:16:59] lol [15:17:06] Is it the hackathon this weekend? [15:18:37] yep [15:18:44] 1st June to 3rd June [15:19:03] and from reports, people are streaming in right now [15:19:56] It's scary [15:24:14] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [15:29:44] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [15:29:44] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [15:30:54] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [15:41:24] Hydriz: lol [15:41:29] streaming? [15:41:38] "from reports" :P [15:41:42] I am not at the hackathon [15:41:44] where is Ryan? [15:41:47] he isn't here XD [15:42:32] He was last heard from at the "beer garden" approximately 1 hour ago :P [15:43:16] 06/01/2012 - 15:43:16 - Updating keys for johlig at /export/home/bastion/johlig [15:43:44] now he's not even here [15:43:47] :D [15:46:29] he's hiding [15:54:14] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [15:59:44] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [15:59:44] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [16:00:54] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [16:02:49] Platonides: is that wikipedia upload tool in git? [16:04:21] wm-bot died [16:04:25] 4 hours ago [16:13:41] petan|hackaton: you definitely want to meet guillom and victor to speak about your involvement as a volunteer :-] [16:24:14] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host 
Unreachable (i-00000250) [16:29:44] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [16:29:44] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [16:30:54] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [16:32:59] Danny_B|backup: I don't think Ryan is gonna be around today :P [16:37:01] New patchset: Andrew Bogott; "Set up apache url-rewrite for labsmw." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9644 [16:37:13] petan|hackaton: Ryan should show up for the friday barbecue [16:37:17] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9644 [16:37:20] hopefully going to make us some cocktails ;-D [16:37:28] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9644 [16:37:31] yay [16:37:45] paravoid: are you able to create a new project? [16:37:59] !log deployment-prep Created deployment-deb instance to build packages :D [16:38:00] Logged the message, Master [16:38:15] hashar: why is it logged 4 hours after you created it XD [16:38:45] hashar: are you going to work on that now [16:38:59] ;) [16:39:04] cause I know it works now!!! [16:46:14] PROBLEM host: mwreview-test5 is DOWN address: i-000002b4 check_ping: Invalid hostname/address - i-000002b4 [16:46:20] Danny_B|backup: hey if you are around let me know, we can set up that project [16:50:24] hey Ryan_Lane [16:50:34] everybody is waiting for you [16:53:37] Quote from today: [16:53:55] Erik: "So if the people working on labs could stand up, Ryan Lane and Faidon .... is Ryan here?" Sam: "No, he's drinking" [16:54:06] xD [16:54:20] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [16:54:28] petan|hackaton: waiting for me for what? 
[16:54:32] hahaha [16:54:54] I was doing a sprint [16:55:02] drinking sprint? [16:55:02] :P [16:55:14] no, redis/zeromq [16:55:21] Zeromq is fucking awesome [16:55:53] Ryan_Lane: waiting for you to make the cocktails [16:56:10] heh [16:56:10] gonna be hard here [17:00:25] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [17:00:55] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [17:00:55] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [17:11:07] Ryan_Lane: If you have a moment... mysql on instance 'mwreview' didn't come back up after the reboot last night, and I could use a hand understanding why. [17:11:14] (If you're busy with hackathon stuff that's fine...) [17:11:28] didn't/won't. [17:14:58] andrewbogott: you might need to run 'mysql repair table' type commands -- tables can be left in a corrupted state on server crashes [17:15:30] GChriss: Ok, I'll give that a try. [17:15:50] Notably, I had two instances configured with the same puppet class, and both failed to come back up and show the same error. [17:16:15] what error do you see? [17:16:51] '120601 17:10:14 InnoDB: Operating system error number 13 in a file operation.' [17:17:29] andrewbogott: domas says that's permission denied [17:17:40] Yep. [17:17:50] But I wonder why permissions worked pre-crash but not post... [17:17:58] dunno [17:18:51] ok. heading off [17:19:21] /etc/selinux/config -> SELINUX=permissive? [17:20:43] GChriss: I don't have an /etc/selinux dir. [17:21:38] hey folks [17:22:01] can somebody set up a labs account for 10gible [17:23:24] drdee: The fastest route is for you/him/her to make an entry on this page: http://www.mediawiki.org/wiki/Project:Labsconsole_accounts [17:23:32] And include a note that they want a labs account, and why. [17:23:47] thanks! [17:24:28] drdee: np. 
Sumana may be busy with the hackathon, so if nothing happens for a day or so let me know and I'll take care of it. [17:24:45] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [17:24:55] it's for the hackathon [17:25:42] oh, ok... does 10gible have an svn account already? [17:26:09] *shrug* well, in any case, fill out that page and I can make the account right away. [17:30:25] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [17:30:55] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [17:30:55] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [17:33:11] drdee: Well, something is broken and I can't create an account. I'll try a couple of other things... [17:34:39] ssmollett: Are you around? Can you give https://labsconsole.wikimedia.org/wiki/Special:CreateAccount a try and see if it's broken for you as well? [17:37:26] drdee: and/or I may have created the account and am getting spurious email. Can you send 10gible to labsconsole and have them try to do a password reset? [17:38:33] petan|hackaton: back from tour [17:39:28] andrewbogott: sure [17:44:30] Danny_B|backup: ok let me know if you wanted to do that [17:44:53] I mean setting up that devel wiki [17:48:44] andrewbogott: will do. any particular account i should create, or just a test account? [17:49:06] Try '10gible', the last entry here: http://www.mediawiki.org/wiki/Developer_access#10gible [17:49:31] ssmollett: It may be that I'm getting an error because the account exists already, and the message is just deeply inappropriate... [17:52:40] "There was either an authentication database error or you are not allowed to update your external account." [17:54:36] Yeah, same thing that happens for me. [17:54:40] No idea what that means. 
[17:54:45] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [17:54:49] andrewbogott: better than "Error" [17:54:55] that was there in the past [17:55:07] ldaplist doesn't show the account, so i'd be surprised if it worked. [17:56:52] do you want me to try adding it from formey? [17:58:04] one thing petan|hackaton that I remember - labs RC feed, has it been set up yet? [17:58:42] petan|hackaton: It's worse than 'error' if it's a lie. [17:58:58] ssmollett: If you know how to do it from formey, then, yes please. [17:59:50] Thehelpfulone: will be working on that soon [18:00:19] and it's always very slow petan|hackaton, granted we don't have as many servers as the rest of the projects, but anything you can do about that? [18:00:25] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [18:00:34] yes, there is a lot we can do about that [18:00:39] I remember someone mentioning about adding more squids or something but I'm not sure if that's still valid [18:00:45] yes that is [18:00:55] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [18:00:55] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [18:00:55] however the labs were down today because of hardware issues [18:01:04] these can affect the performance as well [18:01:17] there are many similar small outages in labs almost every day [18:02:38] done. 10gible should be able to reset the password from https://labsconsole.wikimedia.org/wiki/Special:PasswordReset . [18:03:44] Boo my yogurt is getting warm [18:03:45] PROBLEM dpkg-check is now: CRITICAL on testswarm-funny i-000002ba output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:03:55] ssmollett: thanks. drdee! 
^^^ [18:05:35] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused [18:05:40] in case you (or someone else) want to do it yourself next time: add-ldap-user 10gible test ; add-labs-user --wikiname="Tanmay Shah" --mail="tanspark@gmail.com" 10gible . [18:07:15] PROBLEM SSH is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused [18:07:45] PROBLEM Total Processes is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host [18:07:50] PROBLEM dpkg-check is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host [18:08:03] ssmollett: noted, thanks. [18:08:25] PROBLEM Current Load is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host [18:08:55] PROBLEM host: testswarm-funny is DOWN address: i-000002ba check_ping: Invalid hostname/address - i-000002ba [18:08:55] PROBLEM Current Users is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host [18:09:13] drdee: I have to run, but let me know if you need anything else for 10gible (e.g. bastion access) and I'll be back in an hour or so. 
[18:10:33] PROBLEM Free ram is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host [18:10:43] PROBLEM Disk Space is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host [18:26:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [18:30:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [18:31:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [18:31:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [18:56:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [19:00:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [19:01:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [19:01:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [19:26:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [19:30:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [19:31:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [19:31:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [19:56:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [20:00:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [20:01:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [20:01:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [20:11:21] New patchset: 
Andrew Bogott; "Add a trailing slash to $datadir because that seems to help." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9701 [20:11:36] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9701 [20:12:58] New review: Ottomata; "(no comment)" [operations/puppet] (test) C: 1; - https://gerrit.wikimedia.org/r/9701 [20:13:57] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9701 [20:14:13] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9701 [20:14:14] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9644 [20:21:37] drdee/drdee2: Does 10gible have what they need? [20:26:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [20:30:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [20:31:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [20:31:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [20:37:19] 06/01/2012 - 20:37:19 - Updating keys for andrew at /export/home/globaleducation/andrew [20:56:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [21:00:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [21:01:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [21:01:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [21:23:56] New patchset: Andrew Bogott; "Allow git-clone to clone to a different dir than the repo name." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9709 [21:24:12] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9709 [21:25:59] New patchset: Andrew Bogott; "Allow git-clone to clone to a different dir than the repo name." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9709 [21:26:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [21:26:14] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9709 [21:26:37] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9709 [21:26:40] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9709 [21:30:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [21:31:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [21:31:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [21:42:13] PROBLEM Puppet freshness is now: CRITICAL on blamemaps-m1small i-000002a1 output: Puppet has not run in last 20 hours [21:52:10] New patchset: Andrew Bogott; "Catch rewrite up with the rename from 'core' to 'wiki'." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9712 [21:52:26] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/9712 [21:54:12] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9712 [21:54:14] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9712 [21:56:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [22:00:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [22:01:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [22:01:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [22:07:13] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 78 MB (5% inode=52%): [22:09:13] PROBLEM Puppet freshness is now: CRITICAL on mobile-testing i-00000271 output: Puppet has not run in last 20 hours [22:11:13] PROBLEM Puppet freshness is now: CRITICAL on precise-test i-00000231 output: Puppet has not run in last 20 hours [22:20:13] PROBLEM Puppet freshness is now: CRITICAL on bots-3 i-000000e5 output: Puppet has not run in last 20 hours [22:22:13] PROBLEM Puppet freshness is now: CRITICAL on ganglia-test5 i-000002a7 output: Puppet has not run in last 20 hours [22:22:13] PROBLEM Puppet freshness is now: CRITICAL on mwreview i-000002ae output: Puppet has not run in last 20 hours [22:23:13] PROBLEM Puppet freshness is now: CRITICAL on incubator-bot2 i-00000252 output: Puppet has not run in last 20 hours [22:24:13] PROBLEM Puppet freshness is now: CRITICAL on ganglia-test6 i-000002af output: Puppet has not run in last 20 hours [22:25:13] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours [22:26:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [22:30:33] PROBLEM 
host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [22:31:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [22:31:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [22:32:13] PROBLEM Puppet freshness is now: CRITICAL on maps-tilemill1 i-00000294 output: Puppet has not run in last 20 hours [22:36:19] Change on 12mediawiki a page OAuth/User stories was modified, changed by 216.38.130.162 link https://www.mediawiki.org/w/index.php?diff=545586 edit summary: Updated for extra wmf user stories and comments [22:49:13] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours [22:56:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [23:00:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [23:01:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [23:01:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [23:01:21] Change on 12mediawiki a page OAuth/User stories was modified, changed by 216.38.130.162 link https://www.mediawiki.org/w/index.php?diff=545596 edit summary: [23:03:40] petan, petan|wk also one more labs thing - I would like sending emails to work please (email confirmation etc) [23:26:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 CRITICAL - Host Unreachable (i-00000250) [23:30:33] PROBLEM host: aggregator-test is DOWN address: i-0000024d CRITICAL - Host Unreachable (i-0000024d) [23:31:03] PROBLEM host: aggregator1 is DOWN address: i-0000010c CRITICAL - Host Unreachable (i-0000010c) [23:31:03] PROBLEM host: aggregator-test3 is DOWN address: i-00000293 CRITICAL - Host Unreachable (i-00000293) [23:56:03] PROBLEM host: ganglia-test2 is DOWN address: i-00000250 
CRITICAL - Host Unreachable (i-00000250)