[00:50:15] hey Ryan_Lane for some reason i can no longer access my new instance ?
[00:50:23] which instance?
[00:50:36] test10
[00:51:09] in which project?
[00:51:30] leslie
[00:52:16] ugh
[00:52:20] seems ldap isn't working on it
[00:52:32] re-running puppet
[00:53:06] it is natty (i needed a newer instance to try and backport a package)
[00:53:09] ah
[00:53:27] you don't need to have natty to backport packages from it
[00:53:30] or precise
[00:53:40] you can add their source repos to your apt list
[00:53:45] and comment out the others
[00:53:51] to use apt-get source
[00:53:52] ah ok
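A sketch of the apt-get source workflow Ryan describes, assuming a lucid instance backporting from precise; the mirror URL and package name are illustrative:

  # /etc/apt/sources.list -- add a deb-src entry for the newer release and
  # comment out the other deb-src lines, so apt-get source resolves the
  # precise version rather than whatever else is configured
  deb-src http://archive.ubuntu.com/ubuntu precise main universe

  sudo apt-get update
  apt-get source somepackage      # fetch and unpack the precise source package
  cd somepackage-*
  dpkg-buildpackage -us -uc       # rebuild it against the lucid toolchain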
[00:54:11] cool, i'll just use another instance then :)
[00:54:12] though
[00:54:14] kill that one
[00:54:15] that said
[00:54:17] I fixed it
[00:54:19] hehe
[00:54:22] I restarted nslcd
[00:54:27] damn, you fix it and then i'm about to kill it
[00:54:38] puppet doesn't work correctly on natty
[00:54:45] gotcha
[00:54:45] which means broken nslcd ;)
[00:54:59] nslcd and nscd are now required to run on ldap clients
[00:55:42] *yawn*
[00:57:32] I didn't know you guys were switching over to swift for storing the media, that's actually rather cool :D
[01:01:07] we are already using it
[01:01:08] for thumbs
[01:01:13] soon for all media
[01:02:26] Ah, so that's what the 43ish million objects are on the ganglia graphs :P
[01:03:36] yep
[01:04:41] Ryan_Lane, do you know anything about the permissions that gluster expects for a brick?
[01:04:57] what do you mean?
[01:05:01] gluster is telling me a dir is already in use when it definitely isn't; I presume it's a permission problem.
[01:05:07] hm
[01:05:11] which directory?
[01:05:23] what does lsof say?
[01:05:35] I think gluster runs as root...
[01:05:40] Gluster says: Brick: driver-dev:/exp1/andrewtest already in use\n
[01:05:45] lsof says ""
[01:05:54] is any other volume using it?
[01:06:44] * andrewbogott tries to remember how to query gluster about existing volumes
[01:08:41] Hm... gluster is using a parent dir. That's probably the issue.
[01:09:41] Ryan_Lane: Oh, as you're here - I don't suppose your LDAP mw auth plugin supports autologin if someone's htaccess creds pass the ldap auth check, so I don't have to log in to a wiki when already authenticated against ldap :D Also you get a grape for saving me the hassle of figuring out mw auth plugins.
[01:09:50] Sweet! Check it out:
[01:10:06] $ sudo gluster volume delete volume-00000002
[01:10:06] Damianz: yes, it can do that
[01:10:06] Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
[01:10:06] Volume volume-00000002 has been started.Volume needs to be stopped before deletion.
[01:10:06] andrew@driver-dev:/$ sudo gluster volume stop volume-00000002
[01:10:07] Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
[01:10:07] Volume volume-00000002 does not exist
[01:10:19] ahahhahahaha
[01:10:26] well, *that* is dangerous
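For reference, the order gluster's own prompts above ask for is stop first, then delete -- which is what makes the paste alarming: the out-of-order delete appears to have removed the volume anyway, even while claiming it needed to be stopped. A sketch with Andrew's volume name:

  gluster volume stop volume-00000002     # answer y; data becomes inaccessible
  gluster volume delete volume-00000002   # only meant to work on a stopped volume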
[01:10:35] is there a way to allocate a brand new ip ? the allocate ip button doesn't let me allocate :(
[01:10:47] We haz ips?
[01:10:51] LeslieCarr: you need to up your project's quota
[01:10:59] ok
[01:11:00] also, you need to add a new ip into the list
[01:11:06] I think we have docs!!
[01:11:26] https://labsconsole.wikimedia.org/wiki/Help:Nova-manage
[01:11:37] \o/
[01:11:45] yay
[01:11:49] Ryan_Lane: Next question.... how!? I set up wgLDAPAutoAuthUsername and it appears to not do so much and the docs started ranting about windows ad env so I stopped reading.
[01:12:12] do i do this on virt1 ?
[01:12:18] or formey ?
[01:12:38] virt0
[01:12:43] virt0 is the controller
[01:12:47] formey is svn/gerrit
[01:12:52] virt1 is now a compute node :)
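A sketch of what "up the quota and add a new ip into the list" translates to at the nova-manage CLI, run as root on the controller (virt0). The subcommand names existed in nova of this era, but the exact argument syntax varied between releases, so the flags, address range, and quota value below are placeholders; the Help:Nova-manage page linked above is authoritative:

  nova-manage floating create --ip_range=192.0.2.0/28   # add new addresses to the floating pool
  nova-manage project quota leslie floating_ips 2       # raise the project's floating-ip quota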
[01:13:12] Damianz: based on the server username
[01:13:32] here's an example: http://www.mediawiki.org/wiki/Extension:LDAP_Authentication/Kerberos_Configuration_Examples
[01:13:54] you still need to configure the extension at least a little
[01:13:59] it needs to contact the ldap server
[01:14:12] hm. I think
[01:14:31] I'm pretty sure it needs to check for existence of the user
[01:14:50] I can log in normally as it's configured now (ignoring I just commented out the group req because it didn't do nice things), just no auto login thing happening.
[01:19:16] the ldap extension isn't the easiest in the world to configure
[01:19:23] I haven't tested the auto login stuff in a while
[01:19:28] but i've had reports it's working properly
[01:20:44] Hmm I'll dig around the code a bit and see if I can figure it out... and yeah the whole array of servers thing had me totally confused for a min. It doesn't help that my ldap server doesn't allow anon binding and has some weird groups/filters for stuff.
[01:21:36] it's normal to not allow anon binding
[01:21:40] that's what the proxy agent is for
[01:21:56] different bases are allowed for groups and people
[01:22:04] but only one base. not multiple
[01:33:00] Ack
[01:33:35] Yeah if you set wgLDAPAutoAuthDomain to something then it works, otherwise it just throws an array key not there warning, which it probably shouldn't, but anyway.
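Reconstructing what Damianz ended up with, a minimal LocalSettings.php (PHP) sketch of the auto-auth settings discussed above; the domain label, hostname, DNs, and REMOTE_USER handling are illustrative, loosely following the Kerberos examples page Ryan linked:

  $wgLDAPDomainNames = array( 'example' );
  $wgLDAPServerNames = array( 'example' => 'ldap.example.org' );
  // no anonymous binds: search via a proxy agent instead
  $wgLDAPProxyAgent         = array( 'example' => 'cn=proxyagent,ou=profile,dc=example,dc=org' );
  $wgLDAPProxyAgentPassword = array( 'example' => 'secret' );
  // auto-login: trust the username the web server (htaccess/Kerberos) already verified
  $wgLDAPAutoAuthDomain   = 'example';  // leaving this unset is what threw the warning
  $wgLDAPAutoAuthUsername = isset( $_SERVER['REMOTE_USER'] )
      ? preg_replace( '/@.*$/', '', $_SERVER['REMOTE_USER'] )  // strip any Kerberos realm
      : '';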
[01:47:45] * Damianz sends Ryan_Lane|away some email spam via the Zarro boogs master
[02:41:20] RECOVERY Total Processes is now: OK on fundraising-civicrm fundraising-civicrm output: PROCS OK: 81 processes
[02:41:30] PROBLEM Free ram is now: WARNING on puppet-lucid puppet-lucid output: Warning: 12% free memory
[02:41:30] RECOVERY Current Load is now: OK on fundraising-civicrm fundraising-civicrm output: OK - load average: 0.03, 0.09, 0.08
[02:41:40] RECOVERY Current Users is now: OK on fundraising-civicrm fundraising-civicrm output: USERS OK - 0 users currently logged in
[02:41:40] RECOVERY Disk Space is now: OK on fundraising-civicrm fundraising-civicrm output: DISK OK
[02:41:50] RECOVERY Free ram is now: OK on fundraising-civicrm fundraising-civicrm output: OK: 88% free memory
[02:42:50] RECOVERY dpkg-check is now: OK on fundraising-civicrm fundraising-civicrm output: All packages OK
[02:51:30] PROBLEM Free ram is now: CRITICAL on puppet-lucid puppet-lucid output: Critical: 3% free memory
[03:26:33] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory
[03:34:56] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[04:05:16] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[04:35:16] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[05:05:16] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[05:35:16] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[06:05:16] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[06:35:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[06:50:26] PROBLEM Free ram is now: WARNING on puppet-lucid puppet-lucid output: Warning: 19% free memory
[07:00:26] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory
[07:05:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[07:35:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[08:05:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[08:35:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[09:05:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[09:35:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[10:05:26] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[10:30:13] PROBLEM dpkg-check is now: CRITICAL on deployment-web4 deployment-web4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:30:43] PROBLEM Current Load is now: CRITICAL on deployment-webs1 deployment-webs1 output: CRITICAL - load average: 108.58, 58.69, 23.81
[10:30:43] PROBLEM Free ram is now: CRITICAL on deployment-webs1 deployment-webs1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:31:13] PROBLEM Free ram is now: CRITICAL on deployment-web3 deployment-web3 output: Critical: 1% free memory
[10:31:53] PROBLEM Current Load is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:31:53] PROBLEM Current Load is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:31:53] PROBLEM Current Users is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:32:13] PROBLEM SSH is now: CRITICAL on deployment-web4 deployment-web4 output: CRITICAL - Socket timeout after 10 seconds
[10:32:13] PROBLEM Current Load is now: CRITICAL on deployment-web4 deployment-web4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:33:13] PROBLEM Current Users is now: CRITICAL on deployment-web4 deployment-web4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:33:13] PROBLEM SSH is now: CRITICAL on deployment-web2 deployment-web2 output: CRITICAL - Socket timeout after 10 seconds
[10:34:33] PROBLEM Disk Space is now: CRITICAL on deployment-web4 deployment-web4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:33] PROBLEM Total Processes is now: CRITICAL on deployment-web4 deployment-web4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:43] PROBLEM Disk Space is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:43] PROBLEM Disk Space is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:43] PROBLEM Free ram is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:43] PROBLEM Total Processes is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:48] PROBLEM SSH is now: CRITICAL on deployment-web deployment-web output: CRITICAL - Socket timeout after 10 seconds
[10:34:48] PROBLEM Free ram is now: CRITICAL on deployment-web4 deployment-web4 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:48] PROBLEM Free ram is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:48] PROBLEM Total Processes is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:53] PROBLEM dpkg-check is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:34:53] PROBLEM dpkg-check is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:35:23] PROBLEM Current Load is now: CRITICAL on deployment-web3 deployment-web3 output: CRITICAL - load average: 43.28, 64.90, 33.31
[10:36:03] RECOVERY Free ram is now: OK on deployment-web3 deployment-web3 output: OK: 64% free memory
[10:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[10:37:53] PROBLEM Disk Space is now: CRITICAL on deployment-webs1 deployment-webs1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:37:53] PROBLEM SSH is now: CRITICAL on deployment-webs1 deployment-webs1 output: CRITICAL - Socket timeout after 10 seconds
[10:37:53] PROBLEM Current Users is now: CRITICAL on deployment-webs1 deployment-webs1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:37:53] PROBLEM Total Processes is now: CRITICAL on deployment-webs1 deployment-webs1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:38:13] PROBLEM dpkg-check is now: CRITICAL on deployment-webs1 deployment-webs1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:45:23] PROBLEM Current Load is now: WARNING on deployment-web3 deployment-web3 output: WARNING - load average: 0.25, 8.81, 17.49
[10:47:13] PROBLEM host: deployment-web2 is DOWN address: deployment-web2 CRITICAL - Host Unreachable (deployment-web2)
[10:49:23] RECOVERY Total Processes is now: OK on deployment-web4 deployment-web4 output: PROCS OK: 99 processes
[10:49:33] RECOVERY Free ram is now: OK on deployment-web4 deployment-web4 output: OK: 90% free memory
[10:50:03] RECOVERY dpkg-check is now: OK on deployment-web4 deployment-web4 output: All packages OK
[10:52:03] RECOVERY SSH is now: OK on deployment-web4 deployment-web4 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[10:52:03] RECOVERY Current Load is now: OK on deployment-web4 deployment-web4 output: OK - load average: 0.23, 0.22, 0.09
[10:52:23] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web)
[10:52:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[10:53:03] RECOVERY Current Users is now: OK on deployment-web4 deployment-web4 output: USERS OK - 0 users currently logged in
[10:54:23] RECOVERY Disk Space is now: OK on deployment-web4 deployment-web4 output: DISK OK
[10:56:43] RECOVERY Current Load is now: OK on deployment-web2 deployment-web2 output: OK - load average: 0.49, 0.17, 0.06
[10:56:43] RECOVERY Current Users is now: OK on deployment-web2 deployment-web2 output: USERS OK - 0 users currently logged in
[10:56:43] RECOVERY Current Load is now: OK on deployment-web deployment-web output: OK - load average: 0.28, 0.16, 0.06
[10:56:53] RECOVERY host: deployment-web2 is UP address: deployment-web2 PING OK - Packet loss = 0%, RTA = 0.68 ms
[10:56:53] RECOVERY host: deployment-web is UP address: deployment-web PING OK - Packet loss = 0%, RTA = 1.49 ms
[10:58:03] RECOVERY SSH is now: OK on deployment-web2 deployment-web2 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[10:59:33] RECOVERY Free ram is now: OK on deployment-web2 deployment-web2 output: OK: 62% free memory
[10:59:33] RECOVERY SSH is now: OK on deployment-web deployment-web output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[10:59:33] RECOVERY Total Processes is now: OK on deployment-web2 deployment-web2 output: PROCS OK: 119 processes
[10:59:38] RECOVERY Disk Space is now: OK on deployment-web2 deployment-web2 output: DISK OK
[10:59:38] RECOVERY dpkg-check is now: OK on deployment-web2 deployment-web2 output: All packages OK
[10:59:38] RECOVERY Free ram is now: OK on deployment-web deployment-web output: OK: 80% free memory
[10:59:38] RECOVERY Disk Space is now: OK on deployment-web deployment-web output: DISK OK
[10:59:38] RECOVERY Current Users is now: OK on deployment-web deployment-web output: USERS OK - 0 users currently logged in
[10:59:39] RECOVERY Total Processes is now: OK on deployment-web deployment-web output: PROCS OK: 118 processes
[10:59:43] RECOVERY dpkg-check is now: OK on deployment-web deployment-web output: All packages OK
[11:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[11:22:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[11:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[11:52:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[12:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[12:22:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[12:29:53] Just lost gadgets at http://hi.wikipedia.beta.wmflabs.org again!
[12:31:30] petan: Was it u again :P ?
[12:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[12:47:23] Just lost gadgets at http://hi.wikipedia.beta.wmflabs.org again!
[12:47:43] * Sid-G said that 17 minutes ago too!
[12:47:49] MaxSem|away: Reedy ^^^
[12:51:03] but Sid-G we are doing the rollout to Hindi Wikipedia *today* so everyone is pretty focused on that
[12:51:09] (the deployment to all Wikipedias)
[12:51:35] ohhhkay
[12:51:48] Sid-G: you've been following that, right?
[12:52:10] sumanah: I know that it's scheduled for 1 march
[12:52:13] Sid-G: https://www.mediawiki.org/wiki/MediaWiki_1.19/Roadmap#Deployment_schedule
[12:52:22] sumanah: read that
[12:52:26] ok.
[12:52:38] sumanah: Has 1 march begun anywhere yet?
[12:52:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[12:52:53] Sid-G: it's almost 1 march in Sydney, Australia.
[12:53:03] wow
[12:53:20] it's 11:53 pm there.
[12:53:35] but more importantly, the rollout is actually Feb 29
[12:53:43] huh?
[12:54:00] https://www.mediawiki.org/w/index.php?title=MediaWiki_1.19%2FRoadmap&diff=504929&oldid=504610
[12:54:02] * Sid-G goes to read the schedule again
[12:54:08] it was always scheduled for Wednesday.
[12:54:12] The date was a little off.
[12:54:19] oh
[12:54:26] must've been 1 march when i read
[12:54:49] you can see why I suggested you look at it. :-)
[12:55:11] wow, that means i should fix the untranslated messages right away
[12:55:49] https://wikitech.wikimedia.org/view/Software_deployments#Week_of_February_27 is also generally accurate in case you want something to keep up on in the future
[12:55:53] yes, please do, Sid-G
[12:56:49] sumanah: just to let u know, chrome's telling me wikitech.wikimedia.org is "The site's security certificate is not trusted!"
[12:57:04] Yes.
[12:57:18] Sid-G: https://bugzilla.wikimedia.org/show_bug.cgi?id=27291 There is ongoing argument about that.
[12:57:23] :-/
[13:00:09] sumanah: So when would be a good time to update twinkle sometime in the next week?
[13:00:56] I don't know.
[13:01:14] Sid-G: I have about 0 expertise on that.
[13:01:14] ...
[13:02:16] well, pick a date! Neither do I have any! I'm just worried that if I mess it up and someone messes a MW update up, I won't be able to tell the difference if it was me who messed up
[13:02:32] Neither do I have any! --> any expertise
[13:03:09] Sid-G: then perhaps (wild idea) you should ask someone who *does* have expertise, like Reedy, MaxSem|away, or people in #wikimedia-operations
[13:03:58] Any ideas? Reedy, MaxSem|away
[13:04:24] the ops channel (#wikimedia-operations) will be much better to ask this in, as it has the sysadmins in it
[13:05:40] * Sid-G asked over there
[13:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[13:22:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[13:32:50] so I tried to create wikistream-1 last night, I guess I did it wrong?
[13:33:00] n/win 2
[13:33:07] whoops :)
[13:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[13:37:09] mhm, yes - gadgets are broken, and null edit doesn't help
[13:44:07] MaxSem: So when will it be fixed? tomorrow? day after tomorrow?
[13:52:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[13:58:02] MaxSem: tried a proper edit and revert?
[14:01:48] That fixed it
[14:01:51] Again
[14:03:06] I wonder if that purge hook might be useful too
[14:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[14:09:29] crap
[14:10:04] I was going to investigate what was stored in memcached, but now I'll have to wait for it to get broken again
[14:20:40] petan: Are you there?
[14:22:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[14:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[14:52:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[15:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[15:22:43] PROBLEM host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[15:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[15:46:03] ACKNOWLEDGEMENT host: deployment-webs1 is DOWN address: deployment-webs1 CRITICAL - Host Unreachable (deployment-webs1)
[16:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[16:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[17:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[17:17:43] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1)
[17:31:15] 02/29/2012 - 17:31:14 - Creating a home directory for cmcmahon at /export/home/deployment-prep/cmcmahon
[17:32:15] 02/29/2012 - 17:32:15 - Updating keys for cmcmahon
[17:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[17:48:03] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1)
[18:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[18:08:37] er, anyone know what I can do about wikistream-1?
[18:08:56] should i create a ticket for help w/ it or something?
[18:10:44] Hmm
[18:10:53] It's in the pending state so we've probably run out of room on the nodes
[18:11:11] I'll ack it in nagios, just wander around until Ryan installs some new beefy servers
[18:11:16] !nagios
[18:11:17] http://nagios.wmflabs.org/nagios3
[18:11:36] Damianz: oh so there aren't any resources to create the new instance?
[18:12:21] Damianz: it's the first one I created so I wasn't sure I did everything right. I intentionally picked the smallest size (m1.tiny)
[18:12:50] How long ago did you make it?
[18:13:12] Hmm a few hours ago from the look of it
[18:13:22] The nodes are a little full at the moment pending new hardware
[18:13:26] 2012-02-29T03:28:02Z
[18:13:41] I'll reply to the thread on the mailing list about it and see if Ryan can do some magic with the scheduler config to squeeze some more on
[18:13:54] 15 hrs ago?
[18:14:12] oh there is a mailing list too? i'm not on there
[18:14:32] is that internal, or open to other people who work in the labs area?
[18:14:51] open
[18:14:57] there's a link on the labsconsole main page
[18:16:08] or saying that
[18:16:11] Someone seems to have removed the link
[18:16:38] i was going to say, i must be blind :)
[18:16:56] https://lists.wikimedia.org/mailman/listinfo/labs-l
[18:17:05] cheers
[18:17:16] i'll add it to the main page, if you aren't doing it already
[18:17:19] It seems to have got replaced with a pink unicorn, I prefer the unicorn :P
[18:17:26] heheh
[18:18:03] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1)
[18:21:15] here comes the god
[18:23:14] LeslieCarr doesn't look very god like
[18:23:32] ?
[18:23:55] [18:21:15] here comes the god
[18:24:02] hah
[18:24:09] <^demon> Maybe he was referring to brion :p
[18:24:17] sure i was!
[18:24:26] <^demon> LeslieCarr: Not that you're not godlike :)
[18:24:40] but i'm not an oracle though
[18:24:56] oy
[18:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[18:43:53] RECOVERY host: reportcard1 is UP address: reportcard1 PING OK - Packet loss = 0%, RTA = 0.74 ms
[19:06:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[19:36:23] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[19:41:12] 02/29/2012 - 19:41:12 - Updating keys for cmcmahon
[19:41:16] 02/29/2012 - 19:41:16 - Updating keys for cmcmahon
[19:41:23] 02/29/2012 - 19:41:23 - Updating keys for cmcmahon
[19:48:59] Ryan_Lane: Have you tested 'gluster volume set auth.allow ' to verify that it does something?
[19:49:32] hm. I guess I didn't try to gluster mount from a non-allowed node
[19:49:38] but it adds to an allow list
[19:50:16] is it not doing anything for you?
[19:50:31] by default the allow list is *
[19:50:43] I see the change in volume-info. But I seem able to mount from another system no matter what I try...
[19:50:50] hm.
[19:50:53] e.g. set auth.reject to "*"
[19:50:57] that's bad if that's true
[19:51:11] Most likely I'm doing something wrong and/or misunderstanding how this is meant to work.
[19:51:12] I hope.
[19:51:28] oh
[19:51:31] use IP addresses
[19:51:44] Yes, I tried explicitly rejecting the system I'm testing from.
[19:51:48] Still mounts, no problems.
[19:51:50] hm
[19:51:52] that's bad
[19:51:59] I need to test that on the virt cluster
[19:52:02] Do you have a minute to check my work? In case I'm doing something silly?
[19:52:07] sure
[19:52:10] which instance?
[19:53:52] Oh, by 'check my work' I mean -- see if you can produce the same behavior untainted by my work :)
[19:54:06] But, just a second, it may be that the volume is mounting but can't be accessed when it's rejected. Testing that now.
[19:54:53] Hi Ryan
[19:54:57] Damianz: howdy
[19:58:37] Ryan_Lane: False alarm. I was confused because mount succeeds on a rejected volume -- but I can't actually access it.
[19:58:44] So, reasonable behavior, mostly.
[19:58:45] oh. ok. good :)
[19:58:47] yeah
[19:58:50] it should reject the mount
[19:59:14] I'm ls'ing the rejected volume and ls is just hanging. So that's something to watch for :)
[20:00:34] heh
[20:00:35] indeed
[20:00:56] yeah, hangs for me too
[20:01:00] Can you ctrl-c?
[20:01:07] nope
[20:01:16] Punishment Mode Enabled!
[20:01:33] maybe if it's mounted intr?
[20:02:03] nope :D
[20:02:29] hahaha. indeed
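The knobs under test in the exchange above, as actual commands; the volume name and address are illustrative, and per Ryan's point the values must be IP addresses, not hostnames:

  gluster volume set volume-00000002 auth.reject '*'        # reject all clients
  gluster volume set volume-00000002 auth.allow 10.4.0.53   # or allow specific IPs
  gluster volume info volume-00000002                       # the options show up under Options Reconfigured

As found above, a rejected client may still "mount" the volume successfully; it's the first access (an ls, say) that fails, and it fails by hanging uninterruptibly rather than returning an error.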
[20:07:29] edsu: In the words of Ryan_Lane "We'll just need to delete and recreate the instance."
[20:15:10] ok :)
[20:20:43] <^demon> Ryan_Lane: What time did you plan on moving gerrit?
[20:20:55] Damianz: i guess that's something i can do?
[20:21:04] Should be
[20:23:25] Hopefully you have a delete button next to it on the manage instance page, otherwise poke Ryan_Lane to do it or find someone that does.
[20:33:06] so i guess lucid is the preferred instance type?
[20:42:59] edsu: yep
[20:43:05] ^demon: I don't know, why?
[20:43:42] <^demon> Well if you were like "oh in about half an hour" I was going to stop working on gerrit stuff.
[20:43:58] when I do it I'm going to shut off gerrit
[20:44:06] so you'll be forced to ;)
[20:44:28] <^demon> I didn't know if I'd get booted from formey
[20:44:32] oh
[20:44:33] that
[20:44:34] hm
[20:44:35] true
[20:44:47] want me to let you know like 30 mins before I do it?
[20:44:52] <^demon> Yeah that'd be good.
[20:44:54] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[20:44:54] <^demon> Thanks
[20:51:48] Ryan_Lane: are you doing labs maintenance ?
[20:51:53] no, why?
[20:53:36] LeslieCarr: ?
[20:53:59] got a cookies aren't enabled error when trying to log in
[20:54:03] logged out then logged in again
[20:54:06] hm. weird
[20:54:12] (same losing login thing happening)
[20:54:21] yeah, I'm also having that issue
[20:54:32] though it did log me in
[20:54:39] so i dunno what's up with that
[20:54:40] yeah, I'm not sure what's up with that
[20:54:56] sessions are being stored in memcache
[20:54:59] Ryan has magic issues that randomly appear :P
[20:55:15] this happens to me on the cluster, too
[20:56:23] the unicorns make them appear
[20:56:33] well, annoying but i'd classify it as a minor issue :)
[20:56:41] Leave the poor unicorns alone!
[20:56:49] no evictions in memcache
[20:57:08] LeslieCarr: are you selecting the "remember me" checkbox?
[20:57:16] :p yes
[20:57:36] something is wrong on virt4, too
[20:58:25] Bah, I really need to re-stitch my top :(
[20:59:23] ah
[20:59:29] I think I see the problem in wikistream
[20:59:36] there's a bad security group rule
[20:59:45] -180tcp
[20:59:45] • 0.0.0.0/0
[20:59:47] err
[21:00:03] from −1 to 80 on tcp for 0.0.0.0/0
[21:00:08] -1 isn't valid for tcp
[21:00:18] it's a fake entry for icmp
[21:00:52] !log wikistream removing from −1 to 80 security group rule, adding a web group, and adding ports 80 and 443 to it
[21:01:26] I *really* need to finish those openstackmanager changes
[21:01:36] this is getting slow enough to be unbearable
[21:06:53] edsu: ok. I think I fixed your issue. you'll need to delete/recreate the instance again
[21:07:13] lol
[21:07:15] edsu: add the web group and keep the default group when creating your instance
[21:07:27] nova should disallow obviously incorrect ports
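A sketch of the fix from the !log entry above, assuming the euca2ools front-end of the era (the Labs console does the same through its web UI); the ports and CIDR are the ones Ryan names, while the group description is made up:

  euca-add-group -d 'instances serving web traffic' web
  euca-authorize -P tcp -p 80 -s 0.0.0.0/0 web
  euca-authorize -P tcp -p 443 -s 0.0.0.0/0 web
  # icmp is the one place -1 is legitimate: it means all types/codes
  euca-authorize -P icmp -t -1:-1 default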
[21:12:49] Ryan_Lane: got it
[21:14:10] Ryan_Lane: oh, did you add the web group for me?
[21:14:27] yep
[21:14:32] thanks!
[21:15:09] yw
[21:15:14] PROBLEM host: wikistream-1 is DOWN address: wikistream-1 CRITICAL - Host Unreachable (wikistream-1)
[21:21:09] PROBLEM dpkg-check is now: CRITICAL on test1 test1 output: DPKG CRITICAL dpkg reports broken packages
[21:31:09] RECOVERY dpkg-check is now: OK on test1 test1 output: All packages OK
[21:39:13] Ryan_Lane: should `ssh -A edsu@bastion.wmflabs.org` work on bastion1 ; I'm getting an ssh: connect to host bastion.wmflabs.org port 22: Connection timed out
[21:39:28] following the instructions at https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[21:39:41] oh that's for my local machine
[21:39:44] my bad
[21:39:57] on bastion you can't access bastion
[21:40:05] but you can access the local address (bastion1)
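The pattern from the Access page, as a local ~/.ssh/config sketch; the username and instance name are the ones in the log, and from the bastion itself you would just ssh to the instance name directly:

  Host bastion.wmflabs.org
      User edsu
      ForwardAgent yes    # the -A from the log
  Host wikistream-1
      User edsu
      # hop through the bastion; older clients can use: ProxyCommand ssh bastion.wmflabs.org nc %h %p
      ProxyCommand ssh -W %h:%p edsu@bastion.wmflabs.org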
[21:41:39] RECOVERY host: wikistream-1 is UP address: wikistream-1 PING OK - Packet loss = 0%, RTA = 2.48 ms
[21:44:39] PROBLEM Disk Space is now: CRITICAL on wikistream-1 wikistream-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:44:39] PROBLEM Current Load is now: CRITICAL on wikistream-1 wikistream-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:45:19] PROBLEM Free ram is now: CRITICAL on wikistream-1 wikistream-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:46:49] PROBLEM Total Processes is now: CRITICAL on wikistream-1 wikistream-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:47:29] PROBLEM dpkg-check is now: CRITICAL on wikistream-1 wikistream-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:48:48] uhoh, that doesn't look good
[21:48:59] PROBLEM Current Users is now: CRITICAL on wikistream-1 wikistream-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:49:01] Give it 15min or so for puppet to run again
[21:57:40] edsu: if you sudo to root, then run: puppetd -tv
[21:57:44] those errors will go away
[21:57:53] it's a bug in our puppet configuration
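Spelled out, the fix Ryan gives for the CHECK_NRPE SSL-handshake criticals above, run on the affected instance:

  sudo -i        # become root
  puppetd -tv    # one-shot, verbose puppet run; per the log, the nagios checks recover once it finishes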
[22:04:09] PROBLEM Current Load is now: CRITICAL on backport backport output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:04:49] PROBLEM Current Users is now: CRITICAL on backport backport output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:05:29] PROBLEM Disk Space is now: CRITICAL on backport backport output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:06:19] PROBLEM Free ram is now: CRITICAL on backport backport output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:07:35] <^demon> Ryan_Lane: You asked for name suggestions?
[22:07:39] PROBLEM Total Processes is now: CRITICAL on backport backport output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:07:41] <^demon> unicorn.wikimedia.org, of course.
[22:07:48] hahaha
[22:08:00] :D
[22:08:06] pink.unicorn.wikimedia.org
[22:08:29] PROBLEM dpkg-check is now: CRITICAL on backport backport output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:25:48] wait.. gerrit uses an ssh server written in python?
[22:26:25] no
[22:26:29] written in java
[22:26:33] I have python hooks
[22:26:44] that ssh back to the gerrit server to add comments and such
[22:27:23] Ah
[22:27:29] Java is more disturbing
[22:27:34] :D
[22:27:48] * Damianz thinks to never log in to gerrit again
[22:28:16] Ryan_Lane: Totally should implement krb and make gerrit work with it :D
[22:28:49] well, it could
[22:28:52] webserver auth
[22:28:59] that sends the username to gerrit
[22:29:06] but I don't think we want web auth to be krb
[22:29:13] SAML or something like it would be better
[22:29:46] I was thinking more ssh but SSO web wise would be sweet.
[22:30:14] I'd really love some oauth wikipedianess that allows 3rd party stuff to validate logins securely.
[22:32:13] Damianz, we were talking about browserid a few hours ago in #mediawiki
[22:33:23] oooh