[00:10:34] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK
[00:49:59] PROBLEM Puppet freshness is now: CRITICAL on deployment-syslog i-00000269 output: Puppet has not run in last 20 hours
[01:22:16] * jeremyb spies a GChriss
[01:24:30] * jeremyb has caught up on GChriss's question
[01:25:37] * jeremyb makes GChriss beep some more
[01:25:42] GChriss
[01:25:43] GChriss
[01:25:44] GChriss
[01:31:10] oh
[01:31:15] hi
[01:31:19] * GChriss unmutes
[01:31:27] hah
[01:31:31] * jeremyb waves
[01:31:36] how've you been!
[01:31:46] (that was a question)
[01:32:10] so, one big question is... how much storage do you need? you have them all stored locally on the openmeetings server now?
[01:32:20] i guess you could just use commons for storage
[01:32:53] for anyone just joining us, this is in reference to http://lists.wikimedia.org/pipermail/labs-l/2012-May/000207.html
[01:33:07] well and backscroll
[01:33:39] and will openmeetings continue to be hosted where it is now for some period of time while devel on labs is running too?
[01:33:40] I'm not really sure labs should be used for something partially production...
[01:33:48] Reedy: see backscroll
[01:33:57] 17 18:26:22 < GChriss> ok, will do. In comparison to the "Labs to host wikilovesmonuments.org?" thread, a OpenMeetings migration to Labs would not be for production use (or, if so, only lightly). there's a bunch of code cleanup/rewrites that need to happen, hopefully in a WMF-friendly environment
[01:34:43] GChriss: and pretty good! how about you?
[01:35:33] to answer the storage question, it's a non-issue for non-production use. I'm currently at 34GB of archived videos, but that could be easily trimmed and re-added when appropriate. or even design a "cache from the Internet Archive on an on-demand type basis"
[01:36:08] 34GB isn't that much to put on commons
[01:36:15] as long as you don't do it all in one go ;)
[01:36:55] the Internet Archive is a better place than Commons for publishing meetings, but OpenMeetings.org accepts videos in a generalized sense: http://openmeetings.org/wiki/OMwiki:Submissions
[01:37:39] also, "hopefully in a WMF-friendly environment" --> "hopefully placed into production by the WMF"
[01:37:41] in time
[01:37:58] You'd be best starting that discussion as soon as possible
[01:38:06] started
[01:38:25] GChriss: is it feasible to use the internet archive as a sort of foreign file repo like commons?
[01:38:52] 'wget' is your friend
[01:39:05] He's meaning in a more dynamic sense
[01:39:06] erm?
[01:39:15] as in a FS system?
[01:39:17] if it's not here, is it at IA?
[01:39:20] no
[01:39:34] https://www.mediawiki.org/wiki/InstantCommons
[01:40:18] that by default needs a remote MW instance
[01:40:25] but you could do other stuff
[01:40:31] ah. It's a neat feature,
[01:40:32] but
[01:40:46] not practicable in this case?
[01:41:19] reuse of the IA has wider applications
[01:41:44] the vast majority of footage wouldn't find use outside of OpenMeetings.org, with the exception of remix'd or cut-and-copy-into-Commons clips
[01:42:32] there's development work going into BitTorrent-enabled video players
[01:43:02] the IA videos also have trackerless .torrent files, HTTP-from-IA seeding, and Miro-friendly RSS
[01:44:57] that's how the following 50min meeting received 2k downloads, as a featured Miro feed: http://archive.org/details/ovc_opengov_20june2009
[01:45:58] anyway, storage I'm not worried about. Getting a Labs project up and running would be fantastic
[01:46:24] i think the biggest issue would be getting you your own public IP
[01:46:30] there aren't many spare
[01:47:21] virtual IPs would work, no?
[01:47:56] do you want this to be something anyone can test? or something that you need to tunnel over ssh in order to test? (and also need to have a labs account)
[01:48:24] Public IPs wouldn't be a huge issue
[01:48:36] I should talk to Ryan and spend some time hacking on proxy services
[01:49:53] anyway, sleep!
[01:50:21] from a code-alteration standpoint, yes, changes would be over SSH access. for the public-facing side, as long as 'http://openmeetings.org' is the primary way to access the site I'm all for that (rewrite rules if necessary?)
[01:51:12] yeah, i'm saying http (web browsing) over ssh tunnel
[01:51:37] how about http over http? :)
[01:51:46] http is over tcp actually
[01:51:56] well, yes. I don't need UDP
[01:52:08] you could just go straight for SPDY
[01:52:21] We can do just http but you need a public IP as the proxy concept doesn't exist as of yet.
[01:53:04] bye!
[01:53:10] * Damianz waves
[01:53:19] how tight are IP addresses?
[01:53:28] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 70 MB (5% inode=40%):
[01:53:36] Depends how many we have free, I think Lesslie added some more the other day
[01:54:08] ok
[02:43:18] 05/18/2012 - 02:43:18 - Updating keys for laner at /export/home/deployment-prep/laner
[02:46:19] 05/18/2012 - 02:46:19 - Updating keys for laner at /export/home/deployment-prep/laner
[02:53:43] RECOVERY Puppet freshness is now: OK on deployment-apache22 i-0000026f output: puppet ran at Fri May 18 02:53:33 UTC 2012
[02:54:20] 05/18/2012 - 02:54:19 - Updating keys for laner at /export/home/deployment-prep/laner
[03:03:51] RECOVERY dpkg-check is now: OK on s1tiny i-00000277 output: All packages OK
[03:03:51] RECOVERY Current Load is now: OK on s1tiny i-00000277 output: OK - load average: 0.73, 1.77, 1.48
[03:04:33] RECOVERY Current Users is now: OK on s1tiny i-00000277 output: USERS OK - 0 users currently logged in
[03:04:42] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds
[03:04:42] PROBLEM HTTP is now: CRITICAL on deployment-web5 i-00000213 output: CRITICAL - Socket timeout after 10 seconds
[03:06:19] 05/18/2012 - 03:06:18 - Updating keys for laner at /export/home/deployment-prep/laner
[03:08:11] RECOVERY Free ram is now: OK on s1tiny i-00000277 output: OK: 56% free memory
[03:08:11] RECOVERY Disk Space is now: OK on s1tiny i-00000277 output: DISK OK
[03:08:11] RECOVERY Total Processes is now: OK on s1tiny i-00000277 output: PROCS OK: 86 processes
[03:09:16] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: CRITICAL - Socket timeout after 10 seconds
[03:09:16] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds
[03:09:54] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.013 second response time
[03:09:54] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.011 second response time
[03:11:34] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 3.57, 10.29, 5.69
[03:16:09] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds
[03:16:09] PROBLEM HTTP is now: CRITICAL on deployment-web5 i-00000213 output: CRITICAL - Socket timeout after 10 seconds
[03:17:25] RECOVERY Current Load is now: OK on ganglia-test2 i-00000250 output: OK - load average: 0.45, 3.55, 4.06
[03:20:54] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 11.39, 7.69, 4.83
[03:23:06] PROBLEM Current Load is now: WARNING on deployment-nfs-memc i-000000d7 output: WARNING - load average: 7.31, 7.92, 5.91
[03:23:06] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 1.70, 11.54, 8.46
[03:23:21] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:23:21] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:23:58] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.021 second response time
[03:24:11] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.009 second response time
[03:28:08] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK
[03:28:08] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.13, 2.17, 2.02
[03:30:58] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.79, 2.56, 3.71
[03:30:58] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[03:33:08] RECOVERY Current Load is now: OK on deployment-nfs-memc i-000000d7 output: OK - load average: 0.01, 1.66, 3.72
[03:33:08] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.29, 1.89, 4.60
[03:35:58] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.22, 1.20, 2.80
[03:45:25] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory
[04:03:01] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory
[04:03:11] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 13% free memory
[04:05:40] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 3% free memory
[04:10:30] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:18:00] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 4% free memory
[04:18:34] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:18:43] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory
[04:22:58] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[04:33:11] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:38:36] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:48:38] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory
[05:05:21] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: CRITICAL - Socket timeout after 10 seconds
[05:05:22] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds
[05:10:11] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.014 second response time
[05:10:11] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.022 second response time
[06:41:28] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 13.68, 30.52, 18.86
[06:41:34] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: Connection refused or timed out
[06:41:43] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: Connection refused or timed out
[06:41:43] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271 output: Connection refused or timed out
[06:41:43] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: Connection refused or timed out
[06:44:02] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 4.18, 9.76, 8.39
[07:04:33] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 90% free memory
[07:04:33] RECOVERY dpkg-check is now: OK on mobile-testing i-00000271 output: All packages OK
[07:04:33] RECOVERY Current Users is now: OK on mobile-testing i-00000271 output: USERS OK - 0 users currently logged in
[07:04:53] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:04:53] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:04:53] PROBLEM Total Processes is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:16:34] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 6.81, 8.20, 7.38
[07:16:44] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: CRITICAL - Socket timeout after 10 seconds
[07:16:44] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds
[07:17:13] PROBLEM Current Users is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:17:14] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:20:13] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 5.76, 10.81, 17.69
[07:20:13] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af output: All packages OK
[07:20:13] RECOVERY Total Processes is now: OK on bots-sql2 i-000000af output: PROCS OK: 82 processes
[07:20:18] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK
[07:20:42] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.010 second response time
[07:20:42] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.014 second response time
[07:23:28] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 84 processes
[07:23:46] RECOVERY Current Users is now: OK on bots-sql2 i-000000af output: USERS OK - 0 users currently logged in
[07:25:32] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 57.14, 22.38, 19.05
[07:25:32] PROBLEM Current Load is now: WARNING on deployment-imagescaler01 i-0000025a output: WARNING - load average: 0.45, 5.77, 7.68
[07:25:32] PROBLEM Current Load is now: WARNING on migration1 i-00000261 output: WARNING - load average: 7.04, 6.48, 6.06
[07:25:32] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 4.05, 5.10, 6.44
[07:25:44] PROBLEM Current Load is now: WARNING on worker1 i-00000208 output: WARNING - load average: 4.97, 5.63, 6.57
[07:25:44] PROBLEM Current Load is now: WARNING on deployment-nfs-memc i-000000d7 output: WARNING - load average: 9.89, 8.87, 7.16
[07:25:44] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 4.96, 13.86, 19.17
[07:25:44] PROBLEM Current Load is now: WARNING on deployment-apache23 i-00000270 output: WARNING - load average: 3.58, 4.86, 5.15
[07:25:44] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 0.58, 2.30, 5.50
[07:25:44] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:01] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: CRITICAL - Socket timeout after 10 seconds
[07:26:01] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds
[07:26:01] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:01] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:01] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:01] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:15] PROBLEM Current Load is now: WARNING on rds i-00000207 output: WARNING - load average: 4.18, 6.81, 7.09
[07:26:15] PROBLEM Current Load is now: WARNING on hugglewiki i-000000aa output: WARNING - load average: 2.74, 3.71, 5.78
[07:26:15] PROBLEM Current Load is now: WARNING on swift-be3 i-000001c9 output: WARNING - load average: 1.35, 4.04, 5.40
[07:26:15] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 8.07, 6.34, 6.39
[07:26:32] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:32] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:32] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:27:34] RECOVERY Current Load is now: OK on precise-test i-00000231 output: OK - load average: 0.16, 2.05, 4.50
[07:28:21] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 1.60, 4.63, 9.17
[07:28:41] PROBLEM Current Load is now: WARNING on bastion1 i-000000ba output: WARNING - load average: 1.27, 4.37, 5.03
[07:29:31] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 3.16, 7.77, 8.27
[07:30:01] RECOVERY Current Load is now: OK on deployment-apache23 i-00000270 output: OK - load average: 1.32, 2.75, 4.16
[07:30:01] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 0.58, 1.12, 4.09
[07:30:01] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK
[07:30:01] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in
[07:30:01] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 82 processes
[07:30:06] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:31:07] RECOVERY Current Load is now: OK on hugglewiki i-000000aa output: OK - load average: 0.87, 1.67, 4.29
[07:31:07] RECOVERY Current Load is now: OK on swift-be3 i-000001c9 output: OK - load average: 0.05, 1.55, 3.95
[07:31:53] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:31:53] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:10] RECOVERY Current Load is now: OK on bastion1 i-000000ba output: OK - load average: 0.39, 1.65, 3.61
[07:34:10] PROBLEM Current Load is now: CRITICAL on aggregator-test2 i-0000024e output: CRITICAL - load average: 4.89, 24.67, 24.21
[07:34:33] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:33] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:33] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:33] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:53] RECOVERY Current Load is now: OK on deployment-imagescaler01 i-0000025a output: OK - load average: 0.25, 1.88, 4.76
[07:34:53] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 0.86, 3.27, 5.32
[07:36:07] PROBLEM Current Load is now: WARNING on rds i-00000207 output: WARNING - load average: 4.17, 4.69, 5.70
[07:36:07] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 4.04, 4.84, 5.45
[07:36:07] RECOVERY Current Load is now: OK on reportcard2 i-000001ea output: OK - load average: 2.64, 4.58, 4.90
[07:36:07] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 85% free memory
[07:36:07] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK
[07:39:16] PROBLEM Current Load is now: WARNING on aggregator-test2 i-0000024e output: WARNING - load average: 1.25, 9.66, 17.76
[07:39:16] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK
[07:39:16] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 94% free memory
[07:39:16] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in
[07:39:16] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 84 processes
[07:39:59] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 2.58, 3.06, 4.44
[07:39:59] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.13, 2.32, 4.41
[07:39:59] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 1.39, 2.20, 4.27
[07:40:10] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:10] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.26, 2.14, 4.35
[07:43:10] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 0.66, 1.02, 3.94
[07:50:13] RECOVERY Current Load is now: OK on deployment-nfs-memc i-000000d7 output: OK - load average: 0.17, 0.90, 3.69
[07:50:13] RECOVERY Current Load is now: OK on ganglia-test2 i-00000250 output: OK - load average: 1.06, 0.93, 4.65
[08:03:05] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 3.41, 2.26, 3.79
[08:04:12] RECOVERY Current Load is now: OK on aggregator-test2 i-0000024e output: OK - load average: 2.59, 1.14, 4.07
[08:18:08] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.25, 1.34, 2.66
[08:20:25] hello
[08:36:55] good morning!
[08:45:28] I am still updating the labs MediaWiki configuration ;-]
[09:02:10] paravoid: did you get any information about the disk slowness on labs?
[09:03:58] nope
[09:04:16] briefly talked with Ryan yesterday, didn't come up with anything
[09:04:23] other than "we have to replace gluster asap" :-)
[09:04:34] yeah heard about that already ;-D
[09:06:20] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.018 second response time
[09:06:20] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.017 second response time
[09:06:20] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.012 second response time
[09:06:20] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.017 second response time
[09:32:19] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours
[09:44:21] 05/18/2012 - 09:44:21 - Creating a home directory for jamesur at /export/home/planet/jamesur
[09:44:59] !log planet added jalexander
[09:45:04] !log deployment-prep hashar: getting rid of 1.17 and 1.18 checkouts
[09:45:21] 05/18/2012 - 09:45:21 - Updating keys for jamesur at /export/home/planet/jamesur
[09:45:24] Jamesofur: ok, the project is called "planet"
[09:46:04] there is one instance currently, called "venus", but you could create a new one if you want
[09:46:07] awesome, have we done much so far on it?
[09:46:26] it currently uses the class "misc::planet-venus" and "generic::locales::international"
[09:46:40] but the webserver appears down (at one point it was reachable) did not check yet why
[09:48:02] Jamesofur: class misc::planet-venus , installs a system role, sets up planet languages, installs package..system user, index.html, configs... etc
[09:48:36] even crons... i started it some time back and then it got kind of stalled and i havent looked for a while
[09:48:45] very nice, using the current setup at the moment or a new one?
[09:49:05] yes, well, trying
[09:49:14] but that was exactly the question.. does it work
[09:49:22] are the old planet configs compatible with planet-venus
[09:49:57] afair the basics were but something wasnt.. and i forgot what exactly right now :p
[09:50:18] heheh
[09:50:27] looking in mail...
[09:50:54] ahhahhaa
[09:51:01] oh yeah
[09:51:01] I deleted a file by mistake :-D
[09:51:04] \O/
[09:51:20] LOL that would usually hurt things :P
[09:52:22] '/^([^.]+)\.([^.]+)\./' <--- lovely regex ;-
[09:52:23] :)
[09:52:33] there is no such thing
[09:52:43] regexp ftw
[09:56:05] Jamesofur: planet-venus = Ubuntu distro package, btw. we love just using distro where possible
[09:56:23] oh totally, just makes it easier
[09:57:03] Jamesofur: https://gerrit.wikimedia.org/r/#/c/2451/ the thing that wasnt compatible were some templates for design of the planet pages Brion wrote...
[09:57:24] so what we need is new templates for the planet-venus as next step
[09:57:35] the main configs (feed URLs) arent a problem
[09:58:12] ahh interesting, yeah I think I'll play around with it as I have time this weekend and early next week. Will be cool to see some options
[09:58:20] cool:)
[10:00:13] ok, bed time I need to put down the laptop or I won't get sleep before a weekend of little sleep with makerfaire
[10:04:23] poor man's syslogger: sudo tcpdump -A -n -v -s0 udp port 514 | grep PHP
[10:04:24] ;)
[10:04:35] given as a free hint
[10:05:14] nice
[10:05:31] now I am going to try to update a database !
[10:09:46] HOLY S*** IT WORKS!!!!!!!!!!!
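For reference, the regex quoted at 09:52 captures the first two dot-separated labels of a string and requires a dot after each. The log does not say what it is matched against; the sketch below assumes a hostname such as the beta commons one mentioned at 14:26, and `first_two_labels` is a hypothetical helper name, not something from the deployment code.

```python
import re

# Same pattern as quoted in the log, without the PHP-style / delimiters.
PATTERN = re.compile(r"^([^.]+)\.([^.]+)\.")


def first_two_labels(host):
    """Return the first two dot-separated labels, or None if the pattern does not match."""
    match = PATTERN.match(host)
    return match.groups() if match else None


# Assumed input: a hostname with at least three labels.
print(first_two_labels("commons.wikimedia.beta.wmflabs.org"))
# A name with fewer than two dots does not match at all.
print(first_two_labels("localhost"))
```

Because the pattern ends in a literal dot, a two-label name such as `a.b` is rejected as well; only names with at least three labels match.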
[10:09:48] yeahhhh
[10:10:05] "little step for script, huge step for me" and that kind of stuff
[10:10:16] "I am a cookie" -- JFK would work too
[10:10:44] !log deployment-prep updated enwiki database using 'mwscript update.php enwiki --quick'
[10:11:10] :-)
[10:11:10] !log deployment-prep MW Multiversion has been set up and updated to recognize wmflabs
[10:11:24] +1 for the bot being broken :-)))
[10:11:36] we should get the bot out of labs, they keep being broken hehe
[10:12:33] !log deployment-prep Running 'foreachwiki update.php --quick'
[10:13:38] took me the week to reach that point
[10:14:33] lol hashar
[10:14:52] @ "little step ..."
[10:15:29] !bot
[10:15:29] http://meta.wikimedia.org/wiki/WM-Bot
[10:15:50] @search down
[10:15:50] No results found! :|
[10:16:30] !bot del
[10:16:30] Successfully removed bot
[10:16:58] !bot is http://meta.wikimedia.org/wiki/WM-Bot | troubleshooting bots -> https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting
[10:16:58] Key was added!
[10:21:23] !log bots restarting labs-morebots on bots-2
[10:21:25] Logged the message, Master
[10:22:53] hashar: ^ you can re-log now
[10:23:00] danke
[10:23:07] de rien
[10:23:10] !log deployment-prep updated enwiki database using 'mwscript update.php enwiki --quick'
[10:23:11] Logged the message, Master
[10:23:18] !log deployment-prep Running 'foreachwiki update.php --quick'
[10:23:19] Logged the message, Master
[10:23:26] mutante: ah, cool, was wondering how to fix that
[10:23:47] !log deployment-prep Seems like mwmultiversion is back in function again :-]
[10:23:49] Logged the message, Master
[10:23:55] paravoid: :) added that docs page there after trying to find it like 3 times:)
[10:24:25] pnmtojpeg: command not found
[10:24:27] !!!
[10:24:43] anyone have any idea what pnmtojpeg can be ?
[10:25:00] converting pnm to jpeg? :p
[10:25:37] http://filext.com/file-extension/pnm
[10:25:47] PBM Portable Any Map Graphic
[10:25:52] pnmtojpeg converts the named PBM, PGM, or PPM image file, or the standard input if no file is named, to a JFIF file on the standard output.
[10:26:00] sounds like yet another hacky software
[10:26:20] why hacky?
[10:27:10] looks like "xnview" (image viewer , free software, multi-platform) is using that
[10:27:49] oh we have the package on image scaling apaches
[10:27:55] which are not set up on labs yet ;-D
[10:28:18] what do you mean?
[10:28:26] we do have an image scaler apache on labs
[10:28:29] not sure if mediawiki runs there
[10:28:48] hashar: there is imagescaler.pp in prod.
[10:28:48] all requests are landing on the frontend squid which then load balance them on the 4 application apaches
[10:29:20] did you just want imagescaler::packages added to labs?
[10:29:28] in SF I tried setting up a frontend squid + nginx proxy + imagescaler apaches but failed :-]
[10:30:25] paravoid: where you referring to deployment-imagescaler01 ?
[10:30:40] s/where/were/
[10:30:48] yes I am
[10:31:04] mutante: we already have that in labs :-)
[10:31:10] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK
[10:31:20] the problem is to send thumbs traffic there now ;-)
[10:31:30] ah.ok, good
[10:31:54] uhm.. i dont know about the sending traffic part
[10:33:43] * hashar listens to http://youtu.be/oNxMzTdrjvE
[10:39:04] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 70 MB (5% inode=40%):
[10:42:06] oh yeah, my own log :p
[10:42:19] !log bots restarted labs-morebots on bots-2
[10:42:20] Logged the message, Master
[10:42:53] !log planet added member/sysadmin jalexander
[10:42:54] Logged the message, Master
[10:42:54] !log deployment-prep hashar: update.php script ran on all wikis
[10:42:56] Logged the message, Master
[10:44:11] looking at -feed
[10:44:28] i am not even sure I remember what that instance is for
[10:46:33] !log deployment-prep On -feed, ran apt-get clean
[10:46:35] Logged the message, Master
[10:49:04] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK
[10:50:15] PROBLEM Puppet freshness is now: CRITICAL on deployment-syslog i-00000269 output: Puppet has not run in last 20 hours
[10:51:41] the super auto cleaner is at https://gerrit.wikimedia.org/r/#/c/7075/
[10:51:59] will post to ops-l
[10:52:45] hmm after all most ops want to ignore that change
[10:52:52] * hashar hides
[10:55:02] !log deployment-prep ran apt-get clean on -syslog
[10:55:03] Logged the message, Master
[10:59:49] New patchset: Hashar; "nfs::home::wikipedia instead of /home/wikipedia" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7940
[11:00:05] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7940
[11:01:20] New patchset: Hashar; "nfs::home::wikipedia instead of /home/wikipedia" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7940
[11:01:25] hashar: that's not valid puppet code
[11:01:35] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7940
[11:01:57] you had it like that in the first place and I fixed it yesterday btw
[11:02:06] hooo
[11:02:22] you depend on resources, not classes
[11:02:36] there are resources for classes (Class["nfs::home::wikipedia"]) but it's not what you want here
[11:03:28] can you comment on the change ?
[11:03:40] I am out for a bit of time to get lunch
[11:03:59] will have a look at resource versus class when I am back
[11:04:21] New review: Faidon; "This is not valid puppet code." [operations/puppet] (test); V: 0 C: -2; - https://gerrit.wikimedia.org/r/7940
[11:04:41] just did
[11:05:39] hashar: http://en.wiktionary.org/wiki/savoir-vivre (strange translation?)
[11:05:48] enjoy lunch, bbl as well
[11:30:38] mutante: I have no idea how to translate savoir-vivre :-D
[11:34:23] New patchset: Hashar; "add up missing nfs::home::wikipedia" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7940
[11:34:38] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7940
[11:34:53] paravoid: I guess I understood your remark ;-D https://gerrit.wikimedia.org/r/#/c/7940/
[11:35:35] ?
[11:35:54] about "depending on resources, not classes"
[11:36:12] I needed to include the class which provides a "file {}" statement
[11:36:20] then I can require that file
[11:36:38] right
[11:39:09] meh, Ubuntu installer annoys me with jumping to "install base system" right after i finished writing partition tables
[11:39:37] i just want to write them and go back to RAID setup first...dont jump to conclusions
[11:39:53] paravoid: can you please merge https://gerrit.wikimedia.org/r/#/c/7940/ ? :-D
[11:40:06] so I can fix the deployment-syslog instance
[11:40:54] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7940
[11:41:01] done
[11:41:40] έλεος
[11:41:46] oh?
[11:42:14] which, according to the French Wiktionary, seems to be 'thanks' in Greek [11:42:16] hopefully [11:42:17] ;) [11:42:45] err, no [11:43:10] where did you find that? [11:43:14] Ας μιλήσουμε τώρα όλα τα ελληνικά :) :) apergos. Google μεταφράζει [11:43:23] έλεος = mercy != merci [11:43:39] mercy in English that is [11:44:04] ohh [11:44:20] where in Wiktionary did you read that? [11:44:30] ευχαριστώ [11:44:33] that's the one [11:44:37] http://fr.wiktionary.org/wiki/merci [11:44:47] there are several alternatives for the meaning of 'merci' [11:44:49] yay, Wiktionary editing :) [11:44:53] I picked the wrong one [11:45:05] ah [11:45:30] I guess I know why I never use Wiktionary haha [11:45:32] too confusing [11:45:44] ask me :) I can teach you basic templates [11:46:41] paravoid: looks like you pressed the wrong button on https://gerrit.wikimedia.org/r/#/c/7940/ [11:46:45] paravoid: not merged yet ;) [11:46:57] remember: the vulgarities category is just as valid as any other :) [11:49:30] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7940 [11:49:42] paravoid: merci!
[11:51:42] !log deployment-prep puppet running again on -syslog \o/ [11:51:44] Logged the message, Master [11:52:02] this way we get it again in ganglia [12:01:42] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds [12:01:42] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds [12:06:35] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.016 second response time [12:06:35] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.019 second response time [12:06:35] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.008 second response time [12:06:35] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.012 second response time [13:08:30] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 4.03, 12.37, 6.91 [13:18:33] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 1.20, 2.30, 3.92 [13:31:16] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [14:26:01] hi hashar, I see some more progress on beta commons, I got a reasonable error on upload this morning after getting a file in yesterday. What I'm seeing now is that video is queued for TMH but TMH doesn't seem to be working the queue: http://commons.wikimedia.beta.wmflabs.org/wiki/File:NASA_footages_UFOs.ogv. I'm hoping that's just a matter of starting the right process. [14:26:14] hashar: and thanks! 
[14:26:19] hey ;-D [14:26:32] I did a good sprint today [14:26:40] the configuration of labs is starting to be identical to the one in production [14:26:48] though for TMH, I did not look closely [14:26:56] looks like the video is playing now!! [14:27:09] I have no idea how the queue is being processed :( [14:28:33] hashar: I think TMH working the incoming video queue was one of the few things set up correctly before you started rebuilding anything, but I don't know how to restart that process. Petr might know. [14:31:36] !log deployment-prep adding puppet class applicationserver::jobrunner [14:31:44] Logged the message, Master [14:38:59] !log deployment-prep Creating deployment-jobrunner01 to be used as … a jobrunner!!!! [14:39:19] chrismcmahon: yeah maybe some shell script running in the background [14:39:32] chrismcmahon: it probably worked, but I would not call that properly set up :-]]] [14:39:37] per production standards I mean hehe [14:43:39] hashar: yes, but since TMH is not in production right now...
[14:44:04] I will set up that job production like :-D [14:44:19] it is unlikely I manage to do that by the end of today though [14:44:33] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [14:44:46] excellent :) [14:45:07] I basically spend all the week trying to recover from jetlag [14:45:13] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [14:45:21] took me like 5 days to actually wake up at 7am [14:45:32] and upgrading labs configuration ;))) [14:45:42] it seems we have now something that looks a like production [14:45:54] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [14:46:33] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [14:47:15] !log deployment-prep updating MediaWiki [14:47:17] Logged the message, Master [14:47:43] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [14:48:23] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [14:48:59] !log deployment-prep /home/wikipedia/common/php-trunk now tracks mediawiki/core.git , branch master. So a simple 'git pull' will update it! [14:49:00] Logged the message, Master [14:51:50] nice, thanks again [14:55:17] !log deployment-prep updating all extensions [14:55:19] Logged the message, Master [14:56:38] ahhhh [14:56:45] THERE ARE LOCAL HACKS ON LABS!!!!!!!!!!!!!!!!!!! [14:56:47] oh noo [14:58:55] chrismcmahon: looks like your IRC client is not really stable :-( [15:00:12] !log deployment-prep hashar: rebooting jobrunner01 [15:00:13] Logged the message, Master [15:01:32] hashar: this is true, I don't know if it's my wireless driver, my IRC client, or my internet connection, but something about this setup makes pidgin unhappy from time to time. 
[15:01:52] looks like some timeout [15:01:58] which makes your client change host [15:05:04] !log deployment-prep Running: foreachwiki update.php --quiet --quick [15:05:05] Logged the message, Master [15:22:10] !log deployment-prep rewound MobileFrontend to before 9db8dc94b1b83999931fca3d0edf5e22ab1effb3 ( https://gerrit.wikimedia.org/r/#/c/7795/ ) [15:22:12] Logged the message, Master [15:24:45] !log deployment-prep Well MaxSem fixed MobileFrontend :-D [15:24:46] Logged the message, Master [15:26:01] mutante: could you possibly run puppet on deployment-jobrunner01.pmtpa.wmflabs please? [15:26:05] I can't sudo as root there [15:26:19] that is a new instance, I guess puppet has not run yet to install the sudoers file [15:26:21] hashar: do you have sysadmin rights in the project? [15:26:24] if so, change the sudo policy [15:26:35] good morning :) [15:26:40] I can sudo on other instances [15:26:42] let me check [15:26:50] poor mutante :) [15:27:34] admin, user:hashar, host:ALL, commands:ALL [15:27:46] Ryan_Lane: yeah I should be able to sudo whatever I want :) [15:28:00] you can't? [15:28:06] what's it telling you? [15:28:24] hashar is not in the sudoers file. This incident will be reported. [15:28:29] will the CIA come to my house? :( [15:29:41] this is an instance in the deployment-prep project? [15:29:43] which instance? [15:29:48] it must be misconfigured [15:30:06] deployment-jobrunner01 / i-00000278 [15:30:46] maybe I should just delete it and recreate it [15:31:42] no. please don't [15:32:06] * hashar steps off the delete button [15:34:03] hm [15:34:06] this is missing config [15:34:22] is puppet broken on it?
[15:35:05] seems puppet is broken on it [16:06:35] RECOVERY Current Users is now: OK on deployment-jobrunner01 i-00000278 output: USERS OK - 1 users currently logged in [16:06:35] RECOVERY Disk Space is now: OK on deployment-jobrunner01 i-00000278 output: DISK OK [16:06:35] RECOVERY Free ram is now: OK on deployment-jobrunner01 i-00000278 output: OK: 93% free memory [16:06:36] hashar: ok. it should work now [16:06:44] dpkg had broken somehow [16:06:47] trying [16:07:01] great! [16:07:04] thanks Ryan [16:07:06] yw [16:07:34] then I can spend the weekend figuring out how our job runner infrastructure works and apply that to the new fresh instance ;-D [16:07:46] RECOVERY Total Processes is now: OK on deployment-jobrunner01 i-00000278 output: PROCS OK: 106 processes [16:08:26] RECOVERY dpkg-check is now: OK on deployment-jobrunner01 i-00000278 output: All packages OK [16:09:46] RECOVERY Current Load is now: OK on deployment-jobrunner01 i-00000278 output: OK - load average: 0.13, 0.40, 0.25 [16:24:18] 05/18/2012 - 16:24:18 - Updating keys for laner at /export/home/deployment-prep/laner [16:25:20] 05/18/2012 - 16:25:20 - Updating keys for laner at /export/home/deployment-prep/laner [16:26:19] 05/18/2012 - 16:26:18 - Updating keys for laner at /export/home/deployment-prep/laner [16:27:38] !log deployment-prep Removed apache:: puppet class, uses the application:: ones instead [16:27:40] Logged the message, Master [16:28:20] 05/18/2012 - 16:28:20 - Updating keys for laner at /export/home/deployment-prep/laner [16:30:23] 05/18/2012 - 16:30:23 - Updating keys for laner at /export/home/deployment-prep/laner [16:33:19] 05/18/2012 - 16:33:19 - Updating keys for laner at /export/home/deployment-prep/laner [16:33:38] wtf [16:36:20] 05/18/2012 - 16:36:20 - Updating keys for laner at /export/home/deployment-prep/laner [16:37:23] !log deployment-prep started mediawiki job runner on -jobrunner01 [16:37:24] Logged the message, Master [16:42:27] !log deployment-prep added fake 'aawiki' 
entry to wikiversions.data [16:42:29] Logged the message, Master [16:52:46] !log deployment-prep added 'aawiki' to all.dblist and made a symbolic link to it named wmflabs.dblist [16:52:47] Logged the message, Master [16:56:21] will have to create that wiki [16:57:51] see you tomorrow [16:58:41] looking at the transcoding node right now, it needs some changes to be in line with the changed setup [16:59:20] after adding the /apache symlink it finds the config again, but when trying to run /apache/common/php-trunk/maintenance/runJobs.php I get [16:59:24] No MWMultiVersion instance initialized! MWScript.php wrapper not used? [16:59:41] any ideas? [17:02:33] hi j^ no ideas but let me know what you find out? [17:04:48] ok, looking at http://wikitech.wikimedia.org/view/Heterogeneous_deployment#Run_a_maintenance_script_on_a_wiki it has to run /usr/bin/php /apache/common/multiversion/MWScript.php runJobs.php [17:08:16] now I need to figure out how to run a script in an extensions folder... [17:19:50] full link [17:21:04] where are the logs now?
[17:27:35] https://labsconsole.wikimedia.org/wiki/Deployment/HetDeploy lists a folder called mwmultiversion but that does not exist [17:29:11] i'm not sure if hashar has finished fully migrating everything [17:30:46] RECOVERY Free ram is now: OK on mobile-feeds i-000000c1 output: OK: 93% free memory [17:32:36] RECOVERY Current Load is now: OK on mobile-feeds i-000000c1 output: OK - load average: 0.10, 0.16, 0.07 [17:32:36] RECOVERY Current Users is now: OK on mobile-feeds i-000000c1 output: USERS OK - 1 users currently logged in [17:32:36] RECOVERY Total Processes is now: OK on mobile-feeds i-000000c1 output: PROCS OK: 119 processes [17:32:41] RECOVERY SSH is now: OK on mobile-feeds i-000000c1 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:32:41] RECOVERY dpkg-check is now: OK on mobile-feeds i-000000c1 output: All packages OK [17:39:35] !log deployment-prep j: add /apache symlin on deployment-transcoding [17:39:38] Logged the message, Master [17:40:09] will wait for full migration before doing more in that case. thumbnails are still extracted on the -web instances or what is doing that now? [17:41:58] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 19% free memory [17:51:58] RECOVERY Free ram is now: OK on ganglia-test2 i-00000250 output: OK: 20% free memory [17:58:29] RECOVERY Puppet freshness is now: OK on mobile-feeds i-000000c1 output: puppet ran at Fri May 18 17:58:19 UTC 2012 [17:59:59] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 19% free memory [18:02:17] Ryan_Lane: I keep getting a no nova credentials message [18:02:26] log out then back in [18:02:28] logging in and logging out solves it, but it's annoying [18:04:43] jeremyb: the mailman web ui seems to be down, any idea what's up? 
[18:15:15] 05/18/2012 - 18:15:15 - Updating keys for marktraceur at /export/home/bastion/marktraceur [18:15:32] 05/18/2012 - 18:15:32 - Updating keys for marktraceur at /export/home/orgcharts/marktraceur [19:33:13] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours [20:57:15] 05/18/2012 - 20:57:15 - Updating keys for jeroendedauw at /export/home/bastion/jeroendedauw [20:57:18] 05/18/2012 - 20:57:18 - Updating keys for jeroendedauw at /export/home/globaleducation/jeroendedauw [21:01:10] New patchset: Bhartshorne; "cleaning incoming URLs a little bit to increase hit ratio" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7985 [21:01:24] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7985 [21:03:19] 05/18/2012 - 21:03:19 - Creating a home directory for reedy at /export/home/globaleducation/reedy [21:03:39] There we go [21:04:18] 05/18/2012 - 21:04:18 - Updating keys for reedy at /export/home/globaleducation/reedy [21:35:43] New review: Aaron Schulz; "(no comment)" [operations/puppet] (test); V: 0 C: -1; - https://gerrit.wikimedia.org/r/7985 [22:04:10] petan, petan|wk you around? [22:04:11] http://simple.wikipedia.beta.wmflabs.org/wiki/Special:RecentChanges [22:04:15] Forbidden [22:04:17] You don't have permission to access /wiki/Special:RecentChanges on this server. [22:04:19] maybe Ryan_Lane? 
^ [22:04:39] no idea [22:04:59] it's improperly setup, I'd imagine [23:03:54] PROBLEM HTTP is now: CRITICAL on deployment-apache20 i-0000026c output: CRITICAL - Socket timeout after 10 seconds [23:03:54] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: CRITICAL - Socket timeout after 10 seconds [23:03:54] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds [23:04:04] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds [23:04:04] PROBLEM HTTP is now: CRITICAL on deployment-web5 i-00000213 output: CRITICAL - Socket timeout after 10 seconds [23:08:36] PROBLEM HTTP is now: WARNING on deployment-apache20 i-0000026c output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.018 second response time [23:08:40] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.010 second response time [23:08:40] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.018 second response time [23:08:40] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.017 second response time [23:08:40] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.017 second response time [23:12:53] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 34.56, 35.76, 18.33 [23:17:53] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 1.58, 13.65, 13.52 [23:32:08] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [23:37:52] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 1.62, 1.03, 4.23