[00:08:06] PROBLEM SSH is now: CRITICAL on mobile-testing i-00000271 output: CRITICAL - Socket timeout after 10 seconds [00:20:56] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [00:35:26] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [01:05:57] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [01:50:57] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [02:35:57] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [02:38:13] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: PROCS CRITICAL: 203 processes [02:43:03] RECOVERY Total Processes is now: OK on deployment-thumbproxy i-0000026b output: PROCS OK: 150 processes [02:43:18] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 197 processes [02:52:38] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:00:58] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [03:02:38] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 0.44, 0.29, 0.24 [03:20:58] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 20% free memory [03:23:53] PROBLEM Disk Space is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:28:48] RECOVERY Disk Space is now: OK on mobile-testing i-00000271 output: DISK OK [03:31:02] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [03:36:02] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory [03:39:12] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 12% free memory [03:50:58] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [03:54:08] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [03:55:58] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [03:59:08] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory [04:00:58] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory [04:04:08] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:09:08] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 13% free memory [04:15:58] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory [04:20:58] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 96% free memory [04:24:08] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory [04:29:08] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory [04:31:44] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 6% free memory [04:36:40] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory [04:41:10] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [04:54:10] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 23% free memory [04:57:00] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: 
Socket timeout after 10 seconds. [05:11:10] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [05:33:00] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 37% free memory [05:37:10] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [05:47:10] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 20% free memory [06:09:38] 06/25/2012 - 06:09:38 - Updating keys for amgine at /export/keys/amgine [06:11:04] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [06:36:10] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [06:38:20] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: PROCS CRITICAL: 206 processes [06:43:25] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 200 processes [06:44:00] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 53% free memory [06:45:10] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:50:00] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 84% free memory [07:13:20] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: PROCS CRITICAL: 203 processes [07:38:04] good morning [07:46:03] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [08:21:05] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [08:43:35] PROBLEM SSH is now: CRITICAL on mobile-testing i-00000271 output: CRITICAL - Socket timeout after 10 seconds [08:48:25] RECOVERY SSH is now: OK on mobile-testing i-00000271 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [08:53:25] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 194 processes [09:21:15] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [10:03:05] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [10:06:15] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [10:23:06] PROBLEM Disk Space is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [10:23:36] PROBLEM Total Processes is now: CRITICAL on aggregator-test1 i-000002bf output: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:27:56] RECOVERY Disk Space is now: OK on mobile-testing i-00000271 output: DISK OK [10:28:26] PROBLEM Total Processes is now: WARNING on aggregator-test1 i-000002bf output: PROCS WARNING: 193 processes [10:36:26] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [10:38:26] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 18% free memory [10:44:26] PROBLEM Current Users is now: WARNING on bastion-restricted1 i-0000019b output: USERS WARNING - 7 users currently logged in [10:48:26] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 20% free memory [10:51:26] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [11:41:26] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [12:17:06] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [12:39:35] 06/25/2012 - 12:39:32 - User beetstra may have been modified in LDAP or locally, updating key in project(s): bots,bastion [12:39:39] 06/25/2012 - 12:39:38 - Updating keys for beetstra at /export/keys/beetstra [12:40:16] OK, so that did not help .. ? [12:41:05] Beetstra: ? [12:41:13] you can't get into something? [12:41:21] I get a 'Permission denied (publickey)." when trying to access bots-3 .. [12:41:28] I can get into bastion, and into bots-2 [12:41:42] gimme a sec [12:41:55] bots-1 fine as well [12:42:00] K [12:56:26] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [13:04:24] Lol RAM flood [13:05:18] petan: are you around ? 
[13:05:28] petan: we need a proper way to override wgRC2UDPAddress [13:05:46] the InitialiseSettings.php file is shared between production and labs, so we can't just change the IP there :-D [13:07:06] I am probably going to rewrite the wmfLabsOverrideSettings to allow overriding an existing production conf [13:07:07] hashar: isn't that the reason for realm? [13:07:21] for realm? [13:07:34] InitialiseSettings.php is per se the production file [13:07:42] for labs, we have InitialiseSettings-wmflabs.php [13:07:48] where we are supposed to override production settings [13:07:59] looks like the method just merges them instead of just overriding [13:08:14] I should really write a test suite on that [13:08:16] for that [13:09:03] away a few sec, coffee badly needed [13:09:04] hashar: it's not in there [13:09:11] check the override [13:09:19] I changed both files just to make sure [13:09:28] it doesn't work and I wanted to know what's wrong [13:09:36] that's why I changed it in IS [13:09:59] hashar: there is already another override file [13:10:03] check it [13:12:27] ohh [13:12:46] petan: I will sort it out :) [13:13:45] petan: I will also drop the local repository on dbdump [13:13:54] to use the remote operations/mediawiki-config [13:14:33] anyone will still be able to make changes locally but will have to submit them to gerrit so they get merged in prod ultimately [13:31:27] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [13:32:40] hashar: ok [13:33:02] hashar: I hope all changes to labs files will be merged immediately [13:33:14] I don't want to wait weeks for ops just to merge a config change of beta [13:34:07] RECOVERY HTTP is now: OK on demo-deployment1 i-00000276 output: HTTP OK: HTTP/1.1 200 OK - 911 bytes in 0.322 second response time [13:36:44] you could make your change locally [13:36:47] then git push :) [13:37:10] also mediawiki-config is merged by platform engineering (aka me, sam, chad) [13:37:26] and since they 
are usually simple enough and well safeguarded, I guess we will apply them fastly [13:41:27] RECOVERY HTTP is now: OK on demo-web1 i-00000255 output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.046 second response time [13:42:17] PROBLEM HTTP is now: CRITICAL on demo-deployment1 i-00000276 output: CRITICAL - Socket timeout after 10 seconds [13:49:37] PROBLEM HTTP is now: CRITICAL on demo-web1 i-00000255 output: CRITICAL - Socket timeout after 10 seconds [14:14:07] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:26:27] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [14:38:07] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:41:27] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK [14:42:57] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 0.13, 0.27, 0.23 [14:49:27] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 71 MB (5% inode=52%): [14:59:27] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK [15:00:00] !log deployment-prep updating Ubuntu on deployment-transcoding [15:00:01] Logged the message, Master [15:11:27] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [15:14:17] !log deployment-prep Deleted InitialiseSettingsDeploy.php (no longer used). 
Replaced by InitialiseSettings-wmflabs.php [15:14:19] Logged the message, Master [15:43:57] PROBLEM Free ram is now: CRITICAL on gluster-3 i-000002e1 output: Connection refused by host [15:46:07] PROBLEM Disk Space is now: CRITICAL on gluster-3 i-000002e1 output: Connection refused by host [15:46:07] PROBLEM SSH is now: CRITICAL on gluster-3 i-000002e1 output: Connection refused [15:46:27] PROBLEM Total Processes is now: CRITICAL on gluster-3 i-000002e1 output: Connection refused by host [15:46:32] PROBLEM dpkg-check is now: CRITICAL on gluster-3 i-000002e1 output: Connection refused by host [15:51:07] RECOVERY Disk Space is now: OK on gluster-3 i-000002e1 output: DISK OK [15:51:07] RECOVERY SSH is now: OK on gluster-3 i-000002e1 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:51:27] RECOVERY Total Processes is now: OK on gluster-3 i-000002e1 output: PROCS OK: 101 processes [15:51:32] RECOVERY dpkg-check is now: OK on gluster-3 i-000002e1 output: All packages OK [15:53:57] PROBLEM Free ram is now: UNKNOWN on gluster-3 i-000002e1 output: NRPE: Unable to read output [16:01:27] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [16:37:07] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [16:42:07] RECOVERY dpkg-check is now: OK on mobile-testing i-00000271 output: All packages OK [17:01:27] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [17:17:07] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:21:57] RECOVERY Current Users is now: OK on mobile-testing i-00000271 output: USERS OK - 0 users currently logged in [18:01:27] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [18:19:27] PROBLEM dpkg-check is now: CRITICAL on build-precise1 i-00000273 output: DPKG CRITICAL dpkg reports broken packages [18:19:27] RECOVERY Current Users is now: OK on bastion-restricted1 i-0000019b output: USERS OK - 2 users currently logged in [18:24:27] RECOVERY dpkg-check is now: OK on build-precise1 i-00000273 output: All packages OK [18:38:17] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:43:17] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 84% free memory [18:56:27] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [19:17:58] I've a problem when I want to log into bastion : [19:17:59] $ ssh tpt@bastion.wmflabs.org [19:18:03] If you are having access problems, please see: https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [19:18:05] Connection closed by 208.80.153.207 [19:18:07] I've added by ssh public key in labsconsole, and I've no problem to use gerrit with the same key. [19:19:55] you're most likely not in the bastion project [19:20:20] yup, looks like it [19:20:38] Successfully added tpt to bastion. [19:20:44] Gotta wait for your keys to update and stuff [19:28:06] well, also need to purge the nscd cache now too [19:28:08] gimme a sec [19:29:46] Reedy: It's working ! Thanks a lot. 
[19:29:55] bot needed to be restarted too [19:31:27] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [19:32:27] PROBLEM Current Users is now: WARNING on bastion-restricted1 i-0000019b output: USERS WARNING - 7 users currently logged in [20:09:17] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:37:27] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [20:39:51] Reedy: when trying to get into gerrit from another machine, add another key to my gerrit account, right? [20:40:07] Or use the same key [20:41:06] Well, since adding a second key is phail, I'll try the copy as soon as I get back home. [20:51:27] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [21:19:47] can the editor engagement dev group start using http://simple.wikipedia.beta.wmflabs.org/ for our software testing? [21:20:25] petan: ^ [21:22:56] chrismcmahon: you around? [21:25:29] Ryan_Lane: When matthias tries to log into the bastion labs server, he gets "Connection closed by 208.80.153.207". Any idea? He's doing ssh -A mlitn@bastion.wmflabs.org [21:25:41] did he try before he was added to the project? [21:25:51] not sure, probably not [21:26:06] I don't think he knew about it until I asked him to try [21:26:08] he did :) [21:26:11] oh [21:26:20] :P [21:26:21] tell him to try now [21:26:24] wait [21:26:24] ok... [21:26:27] he still isn't in the project [21:26:32] he needs to be in the bastion project [21:26:33] I added him [21:26:35] ah [21:26:37] my bad [21:27:04] I'll see if I can add him to that project as well [21:27:11] you can [21:27:47] yep [21:27:52] ok. 
gimme a sec [21:28:06] he should be good to go [21:28:23] may be a little bit before his home directory is created [21:28:31] 06/25/2012 - 21:28:31 - Created a home directory for mlitn in project(s): bastion [21:28:37] there we go [21:28:38] :) [21:29:33] 06/25/2012 - 21:29:33 - User mlitn may have been modified in LDAP or locally, updating key in project(s): bastion [21:30:40] he logged in successfully! [21:31:49] Ryan Lane: thanks for the help (and thanks for consistently being the most responsive person in Ops) :) [21:32:38] hi kaldari [21:32:49] howdy! [21:33:11] paravoid: heh. you forgot virt8 ;) [21:33:20] forgot what? [21:33:20] I just launched a vm there [21:33:22] did you ever hear anything back on getting a beta instance set up with a mirrored DB? [21:33:28] I need to delete it. gimme a sec [21:33:49] forgot to do what? [21:33:58] disable its nova-compute service [21:34:06] chrismcmahon: no one responded to my email :( [21:34:13] 1997 nova-manage service disable virt7 nova-compute [21:34:13] 1998 nova-manage service disable virt8 nova-compute [21:34:13] 1999 history |grep disable [21:34:23] dunno [21:34:24] nova-compute virt8 pmtpa enabled :-) 2012-06-25 21:32:58 [21:34:42] hashar: ^^ can we use beta labs for a test env for Editor Engagement? [21:34:53] I just disabled it [21:34:54] PROBLEM host: gluster-4 is DOWN address: i-000002e3 CRITICAL - Host Unreachable (i-000002e3) [21:35:30] that may sound silly, but how did you see that? [21:35:36] chrismcmahon: Ryan Kaldari poked me about that in some mail :( [21:35:38] nova-manage service list [21:35:38] lost it though [21:35:47] kaldari: I *think* beta labs is fit for use for an EE test env, but hashar would need to be involved in setting it up. 
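A side note on the nova-compute exchange above: the `nova-manage service list` row pasted into the channel can be checked mechanically. The following is only a minimal sketch in Python. The function name `enabled_compute_hosts` is hypothetical, the column order (binary, host, zone, status, state, date, time) is assumed from the single pasted row, and the virt7 row in the sample is illustrative rather than copied from the log.

```python
from typing import List

# Sample listing. Only the virt8 row is quoted from the log;
# the virt7 row is an illustrative assumption.
SAMPLE = """\
nova-compute virt7 pmtpa disabled :-) 2012-06-25 21:30:00
nova-compute virt8 pmtpa enabled :-) 2012-06-25 21:32:58
"""


def enabled_compute_hosts(listing: str) -> List[str]:
    """Return hosts whose nova-compute service is still marked enabled."""
    hosts = []
    for row in listing.splitlines():
        fields = row.split()
        # Assumed columns: binary, host, zone, status, state, date, time.
        if len(fields) < 4:
            continue
        binary, host, _zone, status = fields[:4]
        if binary == "nova-compute" and status == "enabled":
            hosts.append(host)
    return hosts


print(enabled_compute_hosts(SAMPLE))
```

Run against real output, a check like this would have flagged virt8 before a VM was scheduled there.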
[21:35:51] :P [21:35:56] chrismcmahon: I guess the easiest is to create a wiki for that team [21:36:16] chrismcmahon: I did write him some documentation, need to review it and publish on mediawiki.org [21:36:36] kaldari: sorry for the lack of feedback over the last days / week :( [21:36:46] ah, nova-manage service list [21:36:56] kaldari: I got notes about how to connect to the beta, make a change there, update the extension / master etc [21:37:07] kaldari: need to clean them up though [21:37:29] hashar: no prob, just had to follow up on it today per our sprint planning [21:37:40] sure [21:37:44] do poke me :D [21:37:55] Ryan_Lane: the real question is [21:37:57] did the VM work? :) [21:38:03] hashar: we have http://en.wikipedia.beta.wmflabs.org/ created, but it doesn't have any db mirroring set up [21:38:04] no [21:38:08] networking didn't work [21:38:18] wonder why [21:38:21] it also didn't create the vm [21:38:22] ;) [21:38:27] og [21:38:28] oh [21:38:28] wait [21:38:30] kaldari: that one is just an import :( it is not in sync with the live database. 
[21:38:30] ignore me [21:38:32] it probably did [21:38:42] I was looking on the gluster filesystem for it [21:38:46] haha [21:38:54] kaldari: I am not sure we want to replicate enwiki to the beta :-D Going to kill the small sql server there [21:38:57] that's obviously not going to work :) [21:39:06] well, it would have been good to see why it didn't work [21:39:20] yes [21:39:27] ah [21:39:28] I know why [21:39:28] since I'm otherwise done [21:39:52] hashar: yeah, I think replicating simple.wiki would be more realistic and would work for our purposes [21:39:58] on virt8: ii python-greenlet 0.3.1-1ubuntu1~lucid0 Lightweight in-process c [21:40:05] wait [21:40:08] wrong package [21:40:27] 0.9.15-0ubuntu2~lucid4 0 [21:40:28] 500 http://ppa.launchpad.net/openstack-release/2011.3/ubuntu/ lucid/main Packages [21:40:31] *** 0.9.13-0ubuntu1~lucid0 0 [21:40:33] 1001 http://apt.wikimedia.org/wikimedia/ lucid-wikimedia/main Packages [21:40:35] on virt5: hi python-eventlet 0.9.15-0ubuntu2~lucid4 [21:40:37] eventlet [21:40:45] on virt8: ii python-eventlet 0.9.13-0ubuntu1~lucid [21:40:47] kaldari: do you really need edits from a live wiki? [21:40:55] we need to pin that package [21:41:05] that package or the PPA? [21:41:06] hashar: no, but it would be nice if it's possible [21:41:15] just the package [21:41:25] the reason we have an issue is because our repo is higher [21:41:32] and that version of eventlet is for swift [21:41:36] it looks like the other beta sites are doing some kind of semi-live mirroring [21:41:41] and maplebed uploaded all of the swift stuff into our repo [21:41:42] kaldari: so I guess we should first set up a basic editor engagement wiki on labs. Which unfortunately will have no content and probably few edits besides yours [21:42:02] kaldari: then in parallel find out how we could have edits from a live wiki. [21:42:09] openstack::common is really nova::common, right? [21:42:12] Ryan_Lane: maplebed: newer eventlet is breaking swift? 
if so, we need to fix swift [21:42:17] (or eventlet) [21:42:21] hashar: well, we already have 2 different editor engagement wikis on labs [21:42:26] ohh [21:42:32] no. I started off poorly [21:42:38] kaldari: are they part of 'beta' or something you have set up? [21:42:39] and named everything openstack::blah [21:42:46] hashar: but we need one with realistic article and user data [21:42:54] which isn't necessarily wrong, if I had used openstack::nova::compute-service [21:43:00] right [21:43:14] RECOVERY host: gluster-4 is UP address: i-000002e4 PING OK - Packet loss = 0%, RTA = 725.75 ms [21:43:16] hashar: why not put the EE project on http://en.wikipedia.beta.wmflabs.org/ and set that up to be useful? [21:43:17] hashar: http://en.wikipedia.beta.wmflabs.org/ and http://ee-prototype.wmflabs.org/ [21:43:21] after dealing with gluster I'm going to fix the puppet manifests for openstack [21:43:24] kaldari: I could duplicate the simplewiki database we have on labs and use that to create the EE one [21:43:44] they were the first large sets of puppet I wrote [21:43:47] I'd like to work on cleaning up some things in puppet in general at some point [21:43:50] i.e. modules [21:43:51] hashar: that sounds like it would work [21:43:55] chrismcmahon: I think we want en.wikipedia.beta.wmflabs.org to be close to the production wiki [21:43:55] paravoid: agreed [21:44:04] PROBLEM Total Processes is now: CRITICAL on gluster-4 i-000002e4 output: Connection refused by host [21:44:05] I'm waiting for this round of changes to finish though [21:44:11] * Ryan_Lane nods [21:44:12] i.e. Ciscos, gluster [21:44:16] chrismcmahon: i.e. no experimental extensions there but only ones which are more or less scheduled for a live deployment [21:44:16] yeah [21:44:33] chrismcmahon: need to talk to Rob about it [21:44:34] PROBLEM dpkg-check is now: CRITICAL on gluster-4 i-000002e4 output: Connection refused by host [21:44:34] are you doing the gluster upgrade now? 
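A side note on the eventlet version mismatch discussed earlier (0.9.15 on virt5, 0.9.13 on virt8): apt prefers the version with the highest pin priority, and a priority above 1000 wins even when that means installing an older version. The Python sketch below models only that rule, using the two versions and priorities quoted in the policy output, which appears to be for python-eventlet. The helper names `pick_candidate` and `repin`, and the priority 1002, are assumptions made for illustration. Tie-breaking between equal priorities is not modelled.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, str]  # (version, pin priority, origin)

# Versions and priorities as quoted in the policy output in the log.
CANDIDATES: List[Candidate] = [
    ("0.9.15-0ubuntu2~lucid4", 500, "ppa.launchpad.net/openstack-release/2011.3"),
    ("0.9.13-0ubuntu1~lucid0", 1001, "apt.wikimedia.org/wikimedia"),
]


def pick_candidate(candidates: List[Candidate]) -> Candidate:
    """Pick the entry with the highest pin priority (ties not modelled)."""
    return max(candidates, key=lambda c: c[1])


def repin(candidates: List[Candidate], origin_substring: str, priority: int) -> List[Candidate]:
    """Return a copy where entries from the matching origin get a new priority."""
    return [
        (version, priority if origin_substring in origin else prio, origin)
        for version, prio, origin in candidates
    ]


# With the priorities from the log, the older 0.9.13 build is chosen.
print(pick_candidate(CANDIDATES)[0])

# Hypothetical package-level pin: 1002 is an arbitrary illustrative value
# that outranks the 1001 of the other repository.
pinned = repin(CANDIDATES, "ppa.launchpad.net", 1002)
print(pick_candidate(pinned)[0])
```

This matches the "just the package" remark: raising the priority for one package's preferred origin changes the candidate without touching how the rest of the repository is ranked.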
[21:44:39] no [21:44:52] when is/was it? [21:45:08] chrismcmahon: basically beta is supposed to be a staging area for production. Any experimentation / feature development should be done on a dedicated wiki (which can itself be hosted on 'beta') [21:45:08] hashar: what would be a realistic time-table for getting that created, and would you be the person doing it? [21:45:35] kaldari: I would be the one, need to figure out how to copy the db and make sure the sql server still have enough space to handle the copy [21:45:55] hashar: kaldari this has always confused me a bit. I think it is OK for en.wikipedia.beta.wmflabs.org to reflect the future state of enwiki, not necessarily the present state. I could be wrong. [21:46:00] hashar: this isn't really for experimentation, this is mostly for QA testing [21:46:20] is that going to be deployed on production 'soon' ? [21:46:33] chrismcmahon: yeah that is a grey area :/ [21:46:34] all our extensions are deployed [21:46:53] well, except for one that isn't turned on [21:47:12] paravoid: was thinking about tomorrow [21:47:19] The E3 team is the one doing the experiments [21:47:20] unfortunately, the mysql issue we have isn't fixed [21:47:24] I'm going to try it using precise [21:47:27] to see if it's a fuse bug [21:47:47] hashar: the E2 team just works on approved extensions [21:47:59] for the most part [21:48:53] so are E3 extensions deployed on production too? [21:49:04] RECOVERY Total Processes is now: OK on gluster-4 i-000002e4 output: PROCS OK: 84 processes [21:49:20] hashar: I don't know, I'm E2 and only speaking for E2 [21:49:27] ohhh [21:49:34] RECOVERY dpkg-check is now: OK on gluster-4 i-000002e4 output: All packages OK [21:49:42] hashar: kaldari yes, E2 is a stable long-running high-value project, and worth putting on beta labs [21:49:44] sorry, I am not really following all the editor engagement project. 
I just know it is pretty cool [21:49:49] definitely [21:49:55] hashar: heh, NP [21:50:05] so I guess we can have them set up on en.wikipedia.beta.wmflabs.org [21:50:15] if they are not yet http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version [21:50:22] hashar: yeah, that probably makes sense [21:50:27] paravoid: though the upgrade bug I hit last time is gone [21:50:33] so, that's a plus [21:50:38] and we'll be on a stable version of gluster [21:50:44] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output [21:50:49] kaldari: could you file a bug report https://bugzilla.wikimedia.org/enter_bug.cgi?product=Wikimedia%20Labs against deployment-prep listing the extensions you want to have set up on the beta enwiki ? [21:50:52] one bug fixed, how many introduced? :) [21:50:56] that's good! [21:50:57] kaldari: will do that tomorrow [21:51:12] well, so far I don't see any new bugs, but who knows till we try [21:51:15] hashar: I'd rather just get ssh access and set it all up myself if that's OK [21:51:44] kaldari: it is just a matter of making a configuration change in operations/mediawiki-config to add the extension in an if ( $cluster == 'wmflabs' ) {} block, then checking out the extension on beta, running the db update and refreshing the l10n message cache [21:51:53] kaldari: ohh or that [21:52:02] kaldari: but then I need to finish up my documentation :-] [21:52:20] I really hope this is fixed in precise [21:52:30] (and finish up fixing the mediawiki config on labs which is not running out of operations/mediawiki-config yet ) [21:53:01] hashar: I notice the en.beta only has a few hundred articles though, is that going to be expanded at some point? 
[21:55:19] hashar: BTW, I pasted in the wrong URL a while back, I meant we have http://ee.wikipedia.beta.wmflabs.org and http://ee-prototype.wmflabs.org [21:55:30] kaldari: bug tracking me publishing the beta doc is https://bugzilla.wikimedia.org/show_bug.cgi?id=37943 [21:57:00] kaldari: we can probably add more articles. I have no idea how the first articles got imported there. Maybe through Special:Import [21:57:04] would have to ask [21:57:23] kaldari: ee.wikipedia.beta.wmflabs.org is supposed to be the wiki for the 'eʋegbe', # Éwé language [21:57:37] oh! [21:57:44] :) [21:57:45] yeah that is confusing [21:57:46] hehe [21:58:33] kaldari: it might be useful to add hashar to the ee mail list [21:58:35] chrismcmahon: the DjVu bug you assigned to me, I am probably going to step out of it :( [21:58:39] chrismcmahon: https://bugzilla.wikimedia.org/show_bug.cgi?id=37764 [21:58:46] hashar: OK [21:58:54] well hmm [21:59:01] I got already a few thousands mails per week :)D [21:59:09] or not :) [21:59:23] so I would prefer having the editor team to be able to deploy change [21:59:28] same stuff is ongoing for the mobile team [21:59:42] and I can't remember who else asked me for doc [22:01:07] hashar: if you can get me ssh access on en.beta, I can help set up all the configs to mirror en.wiki. I've basically done that for my local instance. [22:01:33] let me check if I can add someone :)D [22:02:18] kaldari: what is your account name on labs ? [22:02:28] kaldari [22:02:34] that is creative :-] [22:02:46] :) [22:03:08] I have added you to the deployment-prep project [22:03:14] welcome aboard! [22:03:32] 06/25/2012 - 22:03:32 - Created a home directory for kaldari in project(s): deployment-prep [22:03:50] how do you mirror enwiki anyway? Is there a way to fetch articles from enwiki through the mediawiki API ? 
[22:04:07] yeah, I have an importing script that I wrote [22:04:15] but there's also Special:Import [22:04:28] neither is a great solution though [22:04:31] 06/25/2012 - 22:04:31 - User kaldari may have been modified in LDAP or locally, updating key in project(s): deployment-prep [22:05:01] BRB.... [22:06:25] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory [22:35:39] 06/25/2012 - 22:35:39 - Updating keys for amgine at /export/keys/amgine [22:44:55] sleeping for real already [22:44:57] see you later [22:47:00] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 23.79, 35.87, 18.62 [22:51:30] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf output: Critical: 5% free memory [22:52:00] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 0.54, 13.37, 13.58 [23:12:00] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.15, 0.50, 3.95 [23:17:32] 06/25/2012 - 23:17:32 - Creating a project directory for swiftupgrade [23:17:32] 06/25/2012 - 23:17:32 - Created a home directory for ben in project(s): swiftupgrade [23:18:32] 06/25/2012 - 23:18:32 - User ben may have been modified in LDAP or locally, updating key in project(s): swiftupgrade [23:23:52] PROBLEM Current Load is now: CRITICAL on su-fe1 i-000002e5 output: Connection refused by host [23:24:52] PROBLEM Current Users is now: CRITICAL on su-fe1 i-000002e5 output: Connection refused by host [23:24:52] PROBLEM Current Load is now: CRITICAL on su-be1 i-000002e7 output: Connection refused by host [23:25:02] PROBLEM Disk Space is now: CRITICAL on su-fe1 i-000002e5 output: Connection refused by host [23:25:02] PROBLEM Current Users is now: CRITICAL on su-be1 i-000002e7 output: Connection refused by host [23:25:52] PROBLEM Free ram is now: CRITICAL on su-fe1 i-000002e5 output: Connection refused by host [23:25:52] PROBLEM Disk Space is now: CRITICAL on 
su-be1 i-000002e7 output: Connection refused by host [23:26:32] PROBLEM Free ram is now: CRITICAL on su-be1 i-000002e7 output: Connection refused by host [23:27:02] PROBLEM Total Processes is now: CRITICAL on su-fe1 i-000002e5 output: Connection refused by host [23:27:32] PROBLEM Total Processes is now: CRITICAL on su-be1 i-000002e7 output: Connection refused by host [23:27:37] PROBLEM dpkg-check is now: CRITICAL on su-fe1 i-000002e5 output: Connection refused by host [23:28:13] PROBLEM dpkg-check is now: CRITICAL on su-be1 i-000002e7 output: Connection refused by host [23:33:51] PROBLEM Disk Space is now: CRITICAL on su-fe2 i-000002e6 output: Connection refused by host [23:33:51] PROBLEM Current Users is now: CRITICAL on su-be3 i-000002e9 output: Connection refused by host [23:33:51] PROBLEM Current Load is now: CRITICAL on su-be2 i-000002e8 output: Connection refused by host [23:34:24] PROBLEM Free ram is now: CRITICAL on su-fe2 i-000002e6 output: Connection refused by host [23:34:24] PROBLEM Current Users is now: CRITICAL on su-be2 i-000002e8 output: Connection refused by host [23:34:24] PROBLEM Current Load is now: CRITICAL on su-aux1 i-000002ea output: Connection refused by host [23:34:24] PROBLEM Disk Space is now: CRITICAL on su-be3 i-000002e9 output: Connection refused by host [23:35:22] PROBLEM Current Users is now: CRITICAL on su-aux1 i-000002ea output: Connection refused by host [23:35:22] PROBLEM Free ram is now: CRITICAL on su-be3 i-000002e9 output: Connection refused by host [23:35:22] PROBLEM Disk Space is now: CRITICAL on su-be2 i-000002e8 output: Connection refused by host [23:36:02] PROBLEM Free ram is now: CRITICAL on su-be2 i-000002e8 output: Connection refused by host [23:36:22] PROBLEM Disk Space is now: CRITICAL on su-aux1 i-000002ea output: Connection refused by host [23:36:22] PROBLEM Total Processes is now: CRITICAL on su-fe2 i-000002e6 output: Connection refused by host [23:36:22] PROBLEM dpkg-check is now: CRITICAL on su-fe2 i-000002e6 
output: Connection refused by host [23:36:22] PROBLEM Total Processes is now: CRITICAL on su-be3 i-000002e9 output: Connection refused by host [23:36:43] PROBLEM Free ram is now: CRITICAL on su-aux1 i-000002ea output: Connection refused by host [23:37:03] PROBLEM dpkg-check is now: CRITICAL on su-be3 i-000002e9 output: Connection refused by host [23:37:03] PROBLEM Total Processes is now: CRITICAL on su-be2 i-000002e8 output: Connection refused by host [23:37:44] PROBLEM Current Load is now: CRITICAL on su-fe2 i-000002e6 output: Connection refused by host [23:37:44] PROBLEM Total Processes is now: CRITICAL on su-aux1 i-000002ea output: Connection refused by host [23:37:44] PROBLEM dpkg-check is now: CRITICAL on su-be2 i-000002e8 output: Connection refused by host [23:38:43] PROBLEM Current Users is now: CRITICAL on su-fe2 i-000002e6 output: Connection refused by host [23:38:43] PROBLEM Current Load is now: CRITICAL on su-be3 i-000002e9 output: Connection refused by host [23:38:43] PROBLEM dpkg-check is now: CRITICAL on su-aux1 i-000002ea output: Connection refused by host [23:41:31] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf output: Warning: 6% free memory
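A closing note on the volume of monitoring traffic in this log: a few hosts (aggregator-test1 and mobile-testing in particular) account for much of it. The sketch below, in Python, tallies notifications per host and check so that flapping services stand out. It assumes the `[HH:MM:SS] PROBLEM|RECOVERY <check> is now: <STATE> on <host> <instance>` layout seen throughout the log. The function name `count_notifications` is hypothetical, and host up/down lines (which use a different layout) are deliberately skipped.

```python
import re
from collections import Counter

# Matches service notifications only. Check names may not contain
# brackets or colons, so "host: ... is DOWN" lines never match.
NOTIFICATION = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<kind>PROBLEM|RECOVERY) "
    r"(?P<check>[^\[\]:]+?) is now: (?P<state>\w+) "
    r"on (?P<host>\S+) (?P<instance>i-[0-9a-f]+)"
)


def count_notifications(log_text: str) -> Counter:
    """Count notifications per (host, check) pair."""
    counts = Counter()
    for match in NOTIFICATION.finditer(log_text):
        counts[(match.group("host"), match.group("check"))] += 1
    return counts


# Three notifications copied from the start of the log.
SAMPLE = (
    "[00:20:56] PROBLEM Free ram is now: CRITICAL on aggregator-test1 i-000002bf "
    "output: Critical: 5% free memory "
    "[01:05:57] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf "
    "output: Warning: 6% free memory "
    "[03:00:58] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af "
    "output: Warning: 19% free memory"
)

for (host, check), n in count_notifications(SAMPLE).most_common():
    print(host, check, n)
```

A pair that keeps alternating between WARNING and CRITICAL, as the aggregator-test1 memory check does here, is a likely candidate for a threshold change or flap detection rather than repeated notification.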