[00:00:44] ah [00:00:46] here we go [00:00:49] it works now [00:00:53] I hate glusterfs [00:01:36] it seems the gluster processes were down for that mount on two of the servers [00:02:01] lol [00:03:15] thankfully it was only that one volume [00:03:18] I checked the rest [00:03:24] gluster volume status | grep 'Brick' | grep 'N' [00:04:14] ori-l: works now? [00:04:55] ori-l: and btw, I think kubo is fine, it was just gluster being stupid [00:07:17] bleh. I forgot the -c option to qemu-img to compress the roundtrip instance's disk [00:14:02] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [00:29:33] PROBLEM Free ram is now: CRITICAL on dumps-bot3.pmtpa.wmflabs 10.4.0.118 output: Critical: 5% free memory [00:31:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.92, 5.15, 5.06 [00:36:53] PROBLEM Free ram is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: Warning: 15% free memory [00:41:23] 12/29/2012 - 00:41:23 - Creating a home directory for smccandlish at /export/keys/smccandlish [00:41:34] can we not kill that bot yet [00:43:36] heh [00:44:03] oh good, SMcCandlish is making a Labs account! [00:44:03] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [00:44:13] can probably kill that bot, yeah [00:44:17] we need to run it on the new host [00:44:25] there's still a bot that creates keys [00:47:11] 12/29/2012 - 00:47:11 - Updating keys for smccandlish at /export/keys/smccandlish [00:47:56] keys are harder though [00:48:02] stupid opensshd [00:51:52] PROBLEM Free ram is now: CRITICAL on bots-3.pmtpa.wmflabs 10.4.0.59 output: Critical: 3% free memory [00:52:34] you know what you really should do? [00:52:44] make labsconsole talk to salt, make salt dump the key [00:52:48] insant and KILL BOTTAGE [00:53:23] Damianz: that would be ideal. yes [00:54:05] we need salt-api [00:54:57] I need to figure out talking to salt from djano in a non-blocking way heh [00:56:53] RECOVERY Free ram is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: OK: 114% free memory [00:57:36] if you had salt-api then doing stuff as the user integrated into osm just got easy heh [00:58:00] could totally write a php client though ;P [00:58:00] yes [00:58:02] that's the idea [00:58:15] yeah, could write a php client, but I'd prefer not to [00:58:24] I'd like to have keystone integration with salt-api [00:59:12] Personally I'm just going to pipe everything though a rest api I control so I'm not too fussed about salt auth... anything automagic can be minion delegated access [00:59:37] what do you mean? [00:59:48] you're going to write your own rest api for it? [01:00:21] grrrrr…. Ryan_Lane, do you mind logging into nova-precise2 and telling me what I've screwed up with my paths/aliases/etc? I predict it will take you about 45 seconds to diagnose. [01:00:31] sure :) [01:02:17] Well I'm going to expose an api for the asset management thing I've half written, so that will just 'proxy' to salt in the same way their api works.... 
it's all abstracted ish so if you do like power off it will figure out of it's a vmware guest, ec2 instance etc, tied together with enc glue, pxe installs, ad dns configs and soon salt magic [01:02:20] I think this is a matter of missing something php related [01:02:51] andrewbogott: libapache2-mod-php5 [01:02:58] Ryan_Lane, I thought that too, but… if I comment out wgarticlepath then… https://nova-precise2.pmtpa.wmflabs/w/index.php?title=Main_Page [01:02:59] it works [01:03:05] Which convinced me that php was functioning [01:03:30] hm [01:03:32] interesting [01:03:41] I'd still try installing that package [01:03:46] and see if that fixes it [01:03:49] yep, ok [01:05:17] Ryan_Lane: Yeah, that seems to help. [01:05:23] heh [01:05:28] annoying, right [01:05:28] ? [01:05:31] But, crap, I still feel like my evidence for php being fine is convincing [01:06:23] Once I get OSM working properly will the sidebar magically appear, or do I need to do something specific for that? [01:07:22] nope. you need to modify the sidebar config [01:07:34] which is in mediawiki's MediaWiki namespace [01:07:52] https://labsconsole.wikimedia.org/wiki/MediaWiki:Sidebar [01:08:15] ok -- that explains why I couldn't find it in the config. [01:08:16] thanks [01:09:35] also... [01:09:35] https://labsconsole.wikimedia.org/wiki/MediaWiki:Sidebar/Group:sysadmin [01:09:43] https://labsconsole.wikimedia.org/wiki/MediaWiki:Sidebar/Group:cloudadmin [01:09:53] https://labsconsole.wikimedia.org/wiki/MediaWiki:Sidebar/Group:netadmin [01:09:53] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 179 processes [01:10:02] https://labsconsole.wikimedia.org/wiki/MediaWiki:Sidebar/Group:user [01:14:52] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 99 processes [01:15:32] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [01:16:33] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.66, 4.81, 4.94 [01:24:34] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.93, 5.17, 5.09 [01:36:38] 12/29/2012 - 01:36:37 - Updating keys for saper at /export/keys/saper [01:45:33] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [02:15:34] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [02:37:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.89, 4.95, 5.01 [02:48:02] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [02:56:40] 12/29/2012 - 02:56:40 - Updating keys for gifti at /export/keys/gifti [03:02:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.85, 4.92, 4.99 [03:18:02] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [03:32:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 5.34, 5.21, 5.15 [03:36:42] PROBLEM Free ram is now: UNKNOWN on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: NRPE: Call to fork() failed [03:41:22] PROBLEM Current Users is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 
output: CHECK_NRPE: Error - Could not complete SSL handshake. [03:41:42] PROBLEM Free ram is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [03:42:02] PROBLEM Disk Space is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [03:42:02] PROBLEM dpkg-check is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [03:43:23] PROBLEM Total processes is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [03:44:32] PROBLEM SSH is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Server answer: [03:44:42] PROBLEM Current Load is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [03:46:43] RECOVERY Free ram is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: OK: 33% free memory [03:47:02] RECOVERY Disk Space is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: DISK OK [03:47:03] RECOVERY dpkg-check is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: All packages OK [03:48:02] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [03:48:22] RECOVERY Total processes is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: PROCS OK: 120 processes [03:49:32] RECOVERY SSH is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [03:49:42] RECOVERY Current Load is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: OK - load average: 0.16, 1.14, 1.20 [03:51:22] RECOVERY Current Users is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: USERS OK - 0 users currently logged in [03:57:03] RECOVERY Free ram is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: OK: 411% free memory [03:57:03] RECOVERY dpkg-check is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: All packages OK [03:57:03] RECOVERY dpkg-check is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: All packages OK [03:57:03] RECOVERY Free ram is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: OK: 479% free memory [03:57:43] RECOVERY Total processes is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: PROCS OK: 85 processes [03:58:14] RECOVERY Current Users is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: USERS OK - 0 users currently logged in [03:58:14] RECOVERY Current Users is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: USERS OK - 0 users currently logged in [03:58:33] RECOVERY Current Load is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: OK - load average: 0.00, 0.01, 0.00 [03:58:54] RECOVERY Total processes is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: PROCS OK: 83 processes [03:59:03] RECOVERY Disk Space is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: DISK OK [03:59:04] RECOVERY Disk Space is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: DISK OK [03:59:13] RECOVERY Current Load is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: OK - load average: 0.00, 0.01, 0.00 [04:05:02] PROBLEM dpkg-check is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:05:03] PROBLEM dpkg-check is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds. 
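A note on the brick check quoted at 00:03 above: `gluster volume status` prints an Online column per brick, so grepping for 'N' is a quick way to spot dead brick processes. A fuller version of that triage might look like the sketch below; the volume name is a placeholder, and the `start ... force` / self-heal steps are assumptions about how this cluster is managed (and about the gluster version), not something stated in the log.

    # list bricks whose Online column is N (the check quoted at 00:03)
    gluster volume status | grep 'Brick' | grep 'N'
    # inspect the affected volume in detail (volume name is a placeholder)
    gluster volume status some-volume detail
    # respawn only the dead brick processes, leaving healthy ones alone
    gluster volume start some-volume force
    # then let self-heal catch the restarted bricks up (gluster >= 3.3)
    gluster volume heal some-volume info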
[04:05:03] PROBLEM Free ram is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:05:13] PROBLEM Free ram is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:05:53] PROBLEM Total processes is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:06:23] PROBLEM Current Users is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:06:23] PROBLEM Current Users is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:06:43] PROBLEM Current Load is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:07:03] PROBLEM Disk Space is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:07:04] PROBLEM Total processes is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:07:13] PROBLEM Disk Space is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:07:23] PROBLEM Current Load is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:18:02] PROBLEM host: ve-roundtrip2.pmtpa.wmflabs is DOWN address: 10.4.0.162 CRITICAL - Host Unreachable (10.4.0.162) [04:20:32] RECOVERY host: ve-roundtrip2.pmtpa.wmflabs is UP address: 10.4.0.162 PING OK - Packet loss = 0%, RTA = 0.67 ms [04:26:28] 12/29/2012 - 04:26:27 - Updating keys for smccandlish at /export/keys/smccandlish [04:31:11] 12/29/2012 - 04:31:10 - Updating keys for smccandlish at /export/keys/smccandlish [04:35:55] 12/29/2012 - 04:35:54 - Updating keys for smccandlish at /export/keys/smccandlish [04:41:31] 12/29/2012 - 04:41:31 - Updating keys for smccandlish at /export/keys/smccandlish [04:47:08] 12/29/2012 - 04:47:07 - Updating keys for smccandlish at /export/keys/smccandlish [04:47:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.58, 4.80, 4.94 [04:51:43] 12/29/2012 - 04:51:43 - Updating keys for smccandlish at /export/keys/smccandlish [04:55:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 5.45, 5.27, 5.10 [04:56:47] 12/29/2012 - 04:56:46 - Updating keys for smccandlish at /export/keys/smccandlish [05:05:33] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.71, 4.88, 4.99 [05:09:48] Anyone home? [05:11:49] Gerrit won't let me log in. [05:12:17] I did logout of labsconsole and back in. [05:13:34] Hello, hello? [05:15:46] Any Gerrit admins around? [05:21:54] 12/29/2012 - 05:21:53 - Updating keys for smccandlish at /export/keys/smccandlish [05:22:50] Thanks. [05:24:46] It still always says "Incorrect username or password." I've tried smccandlish, Smccandlish, SMcCandlish and my shell ID, mech, and they all fail. [05:26:14] 12/29/2012 - 05:26:13 - Updating keys for smccandlish at /export/keys/smccandlish [05:32:45] Still doesn't work. I'll try again tomorrow. [05:34:00] I can't log into Gerrit, either. [05:36:34] https://dl.dropbox.com/u/11458013/Screenshot%20from%202012-12-29%2000%3A31%3A02.png [05:37:02] Looks like a different error than SMcCandlish was having. 
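The CHECK_NRPE noise above ("Could not complete SSL handshake", "Socket timeout", "Call to fork() failed") almost always means the nrpe agent on the instance is dead, blocking, or starved for resources rather than a real service failure. A minimal triage sketch, using an instance IP taken from the log; paths and service name are the stock Ubuntu ones and should be treated as assumptions about this setup:

    # from the monitoring host: can we reach the agent at all?
    /usr/lib/nagios/plugins/check_nrpe -H 10.4.0.69
    # on the instance itself:
    service nagios-nrpe-server status        # is the daemon even running?
    grep allowed_hosts /etc/nagios/nrpe.cfg  # must include the monitoring host's IP
    free -m; uptime                          # fork() failures often mean the box is out of memory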
[06:21:44] So I can log into kubo now, but not Gerrit. Zero-sum accessibility? :) [06:22:13] well, I fixed kubo for you earlier ;) [06:22:25] I think you're going to need ^demon for gerrit [06:22:52] franny: try to log in for me? [06:22:56] I'm tailing the logs [06:23:45] Just tried again, Ryan_Lane. [06:23:57] and the world imploded [06:25:13] -_- [06:25:19] it's trying to look you up by fran [06:25:22] for some reason [06:25:33] That's my shell account name. [06:25:45] Caused by: java.sql.BatchUpdateException: Duplicate entry 'username:fran' for key 'PRIMARY' [06:25:48] hooray [06:25:58] But "fran" gives me "incorrect username or password." [06:25:59] did you initially log in with an email address? [06:26:09] Not that I'm aware. [06:26:15] did you ever log in with fran? [06:26:26] I think I might have. [06:26:31] that's the problem [06:26:41] Why would it not work anymore? [06:26:43] you were renamed after you had logged in with fran [06:26:51] By whom? [06:26:53] gerrit doesn't handle that well [06:27:04] no clue [06:27:17] you know what [06:27:30] you could write a web interface for renaming and use salt to do remote app changes [06:27:41] could. yes. [06:27:42] * Damianz finds salty uses for all Ryan's problems :P [06:27:52] this again assumes salt-api [06:28:05] franny: I think you're going to need ^demon to solve this problem [06:28:11] shame I dislike the code :P [06:28:18] though you know what they say; fuck it, ship it [06:28:36] which code? salt-api? [06:28:44] All code :P [06:28:48] it's considered alpha right now, help them write it ;) [06:28:49] ah [06:28:49] heh [06:29:02] franny: you seem to have two accounts [06:29:02] I don't like the way they've done modules... but I might [06:29:06] in labsconsole [06:29:09] Would be interesting to make ESSO work with it [06:29:12] Fran and https://labsconsole.wikimedia.org/w/index.php?title=User:Fran_McCrory&action=edit&redlink=1 [06:29:14] 'Aborted' < The most helpful error ever seen from an app [06:29:49] so, yeah, indeed you were renamed at some point [06:29:54] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 156 processes [06:30:01] did you have a svn account? [06:30:08] I remember someone saying about svn not working [06:30:12] it's possible this happened when the real name bug was on labsconsole [06:30:24] Huh. [06:30:36] (if you set your real name in labsconsole, it would rename your account) [06:30:46] I fixed that at some point last week [06:30:51] ROFL [06:31:02] That was probably it. [06:31:05] how does it even have ad access to do that [06:31:05] likely [06:31:06] -.- [06:31:23] Well, "Fran McCrory" is consistent with my username on everything else Wikimedia. 
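For context, the "Duplicate entry 'username:fran'" stack trace at 06:25 is Gerrit tripping over a stale entry in its external-IDs table after the account rename. A look-but-don't-touch sketch of how an admin could confirm that is below; the database name, credentials, and schema are assumptions based on the Gerrit 2.x-era ReviewDB layout, and any actual cleanup is ^demon's call, as noted in the log.

    # which Gerrit accounts still claim the old and new usernames?
    # ('reviewdb' is an assumed database name; credentials omitted)
    mysql reviewdb -e "
      SELECT account_id, external_id, email_address
      FROM   account_external_ids
      WHERE  external_id IN ('username:fran', 'username:franny', 'gerrit:fran');"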
[06:31:42] Damianz: because real name is hardcoded in the ldap plugin to be cn [06:31:52] and I fixed mediawiki core for updating preferences [06:31:53] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 151 processes [06:31:57] we use cn for the username [06:32:00] so, yeah, fail [06:32:05] wtf [06:32:11] so many attrubites to choose from [06:32:29] guess it's not sAMAccountName at least [06:32:32] to be fair, the ldap extension is really old and preference syncing is one of the earlier features ;) [06:33:10] I don't like to make backwards incompatible changes because it makes support a nightmare [06:33:13] so I never changed this [06:33:23] I'll be making that backwards incompatible change soon :) [06:33:45] (it'll be configurable, so you can set it to any attribute you want) [06:33:56] if $env == 'labs' # sekrit code [06:46:52] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 146 processes [06:54:52] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes [07:14:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 8.85, 6.26, 5.45 [07:54:33] RECOVERY Free ram is now: OK on dumps-bot3.pmtpa.wmflabs 10.4.0.118 output: OK: 32% free memory [08:30:22] PROBLEM Free ram is now: CRITICAL on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: Critical: 5% free memory [12:07:12] PROBLEM Total processes is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: PROCS WARNING: 151 processes [12:09:52] PROBLEM Free ram is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: Warning: 8% free memory [12:14:53] PROBLEM Free ram is now: CRITICAL on bots-3.pmtpa.wmflabs 10.4.0.59 output: Critical: 3% free memory [12:17:12] RECOVERY Total processes is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: PROCS OK: 150 processes [12:29:03] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [12:59:52] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [13:29:53] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [13:56:02] * Beetstra looks sadly at bots-3 [13:56:20] What has changed there? Why do my bots suddenly bring that box down so fast ... ? 
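A quick illustration of the cn problem described at 06:31: the wiki username and the "real name" preference are both backed by the same LDAP attribute, so syncing one rewrites the other. The sketch below only reads from the directory; the server, base DN and entry are placeholders, not the real labs LDAP layout.

    # what the extension sees: cn doubles as wiki username and real name
    ldapsearch -x -H ldap://ldap.example.org \
      -b "ou=people,dc=example,dc=org" "(cn=Fran McCrory)" cn sn uid mail
    # the fix described above is making the real-name attribute configurable
    # (e.g. displayName) instead of hardcoding it to cn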
[13:58:48] afaik it was rebooted [13:59:53] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [14:03:02] giftpflanze - I see complaints of low memory above [14:03:23] well, i can't tell you why :) [14:03:58] * Beetstra tickles petan with a USB3-cable [14:29:53] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [15:00:13] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [15:30:12] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [15:35:23] RECOVERY Free ram is now: OK on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: OK: 32% free memory [16:00:12] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [16:30:12] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [17:00:13] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [17:30:32] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [17:49:42] PROBLEM Free ram is now: WARNING on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Warning: 19% free memory [18:00:32] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [18:30:33] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [19:00:43] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [19:30:53] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [20:02:12] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [20:32:12] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [21:02:13] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [21:32:14] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [21:51:41] 12/29/2012 - 21:51:41 - Updating keys for mwang at /export/keys/mwang [22:02:22] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [22:21:29] @labs-info bots-3 [22:21:29] [Name bots-3 doesn't exist but resolves to I-000000e5] I-000000e5 is Nova Instance with name: bots-3, host: virt8, IP: 10.4.0.59 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img [22:21:42] @labs-info bots-nr1 [22:21:42] [Name bots-nr1 doesn't exist but resolves to I-0000049e] I-0000049e is Nova Instance with name: bots-nr1, host: virt5, IP: 10.4.1.2 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: ubuntu-12.04-precise [22:22:10] @labs-project-users bots [22:22:10] Following users are in this project (displaying 19 of 40 total): Addshore, Alejrb, Andrew Bogott, Aude, Beetstra, DamianZaremba, DeltaQuad, Dzahn, Fastily, Hashar, Hydriz, Hyperon, Jasonspriggs, Jeremyb, Johnduhart, Kaldari, Krinkle, Ryan Lane, Lcarr, [22:22:34] * jeremyb spies a european [22:22:38] :o [22:24:11] @labs-info bots-2 [22:24:11] [Name bots-2 doesn't exist but resolves to I-0000009c] I-0000009c is Nova Instance with 
name: bots-2, host: virt6, IP: 10.4.0.42 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img [22:24:13] @labs-info bots-4 [22:24:14] [Name bots-4 doesn't exist but resolves to I-000000e8] I-000000e8 is Nova Instance with name: bots-4, host: virt6, IP: 10.4.0.64 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img [22:24:25] interesting [22:24:33] talking to a bot? [22:24:37] no [22:24:41] checking images [22:24:45] we use precise on nr1 [22:24:49] didn't even notice [22:24:59] images of a bot? ew! [22:25:05] of ubuntu [22:25:06] :P [22:25:39] you even named it???? ok ok, i will stop [22:28:33] petan: where is the user directory stuff configured on bots-apache01? [22:29:12] Damianz: also, you set up cluebot incorrectly on bots-apache01 ;) [22:29:13] /etc/apache2/mods-enabled/ [22:29:22] I think there [22:29:25] not sure [22:29:42] don't really care :P [22:29:48] someone broke it, I hacky fixed it [22:29:50] Damianz: you added cluebot into sites-enabled. it should be in sites-available with a link to enabled ;) [22:30:02] * petan slaps Damianz for being hacky [22:30:06] if people didn't borke things it wouldn't be a mess [22:30:25] heh [22:30:29] it would be no fun [22:30:43] no ops would be needed [22:30:49] to fix it [22:30:49] than they wouldn't be people [22:31:22] ops should make structured, tested changes in a documented workflow [22:31:25] otherwise it's mayhem [22:32:09] reminds me I wanted to make a documentation for wm-bot [22:32:23] PROBLEM host: bots-3.pmtpa.wmflabs is DOWN address: 10.4.0.59 CRITICAL - Host Unreachable (10.4.0.59) [22:32:38] wut [22:32:42] bots-3 down? [22:32:54] weren't bots there [22:33:50] petan: /data/project/petrb/logs doesn't exist [22:33:58] re: https://bugzilla.wikimedia.org/show_bug.cgi?id=42578 [22:34:16] but it should be /data/project/public_html or not? [22:34:23] ah [22:34:24] right [22:34:24] sorry [22:35:02] OOM freaking python [22:35:03] that directory isn't owned by ou [22:35:04] *you [22:35:13] um, who owns it? [22:35:17] no one [22:35:17] root? [22:35:19] 1002 [22:35:21] aha [22:35:23] weird [22:35:28] that's why it's being denied [22:35:29] why is that problem for apache? [22:35:33] oh [22:35:36] ok [22:35:48] for userdir the owner must match [22:36:12] I think it's possible to disable that [22:36:15] !logs bots bots-3 died OOM because of some python [22:36:26] !log bots bots-3 died OOM because of some python [22:36:27] but it's likely best to not change that [22:36:28] Logged the message, Master [22:36:39] ok, cool [22:36:49] !logs alias htmllogs [22:36:50] Created new alias for this key [22:36:53] !logs [22:36:53] experimental: http://bots.wmflabs.org/~wm-bot/html/%23wikimedia-labs [22:36:53] can you also fix the directory permissions to not be 777? [22:37:05] which one [22:37:10] public_html wasn't [22:37:13] db and logs [22:37:15] when it was on nfs [22:37:19] and wm-bot [22:37:22] ah [22:37:38] that one I likely can do that but problem is that there is no wm-bot user in ldap [22:37:44] maybe that was 1002 user... [22:37:48] ah [22:37:54] wm-bot is running using own unix user [22:37:57] petan: add it as a user via labsconsole [22:38:05] how? 
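The vhost nit at 22:29 ("it should be in sites-available with a link to enabled") is the standard Debian Apache layout, where a2ensite manages the symlink. A sketch of that plus the userdir ownership fix discussed afterwards; the cluebot file name and the petrb paths come from the log, but the exact commands are illustrative rather than a record of what was run:

    # move the vhost to sites-available and let a2ensite create the symlink
    sudo mv /etc/apache2/sites-enabled/cluebot /etc/apache2/sites-available/cluebot
    sudo a2ensite cluebot
    sudo apache2ctl configtest && sudo service apache2 reload

    # userdir was refusing the directory because it was owned by uid 1002
    ls -ld /data/project/petrb/public_html
    sudo chown -R petrb /data/project/petrb/public_html   # once the matching account exists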
[22:38:09] create an account [22:38:14] it's open registration [22:38:16] oh [22:38:22] ok [22:38:32] this is one of the benefits of open registration, you can make system accounts now :) [22:38:33] but can I change default home? [22:38:38] like in passwd [22:38:45] not that it would matter [22:38:46] hm [22:38:52] I could [22:38:53] but I don't like services to have /home [22:39:05] right [22:39:38] ok anyway keep the rights 777 until I do that otherwise logging won't work [22:39:43] * Ryan_Lane nods [22:40:09] btw, the user doesn't need to be added to any projects or anything like that [22:40:11] just needs to exist [22:40:24] aha ok [22:40:37] project membership is only needed to ssh in [22:40:54] it's to allow cross-project service accounts [22:41:28] ok [22:42:42] another place puppet sucks in our usage and making things re-usable is impossible [22:42:57] Damianz: what do you mean? [22:43:14] puppet service accounts in ldap rather than declaring local uids [22:43:22] we could do that too [22:43:32] as long as they are marked as system users [22:43:39] so that the uid range doesn't conflict [22:45:17] Creating directory '/home/wmib'. [22:45:18] Unable to create and initialize directory '/home/wmib'. [22:45:27] it needs to be member of project :P [22:45:52] Failed to add Wmib to bots. This needs the "loginviashell" right. [22:46:02] RECOVERY host: bots-3.pmtpa.wmflabs is UP address: 10.4.0.59 PING OK - Packet loss = 0%, RTA = 0.69 ms [22:48:29] that's because /home/ is fake [22:49:52] RECOVERY Free ram is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: OK: 874% free memory [22:52:36] Damianz I know but how could I create a home for a user on project storage? [22:52:41] that's what bot is doing or not? [22:53:18] does it need a home? you could just reference everything to the username in ldap and it will work cross instance [22:53:44] of course it doesn't need a home [22:54:07] but user without proper $HOME might not work ok [22:54:17] petan: it does? [22:54:22] even system users in system have some home specified [22:54:31] /home isn't fake [22:54:35] -.- [22:54:43] the dir isn't created unless you're a project member [22:54:50] can't be a member as it doesn't have shell rights [22:54:54] pam_mkhomedir makes it [22:54:58] yes [22:55:01] it's just a directory [22:55:09] Unable to create and initialize directory '/home/wmib'. [22:55:10] ah [22:55:10] I see [22:55:12] yeah but you can't ssh in for pam to make it [22:55:13] I know wy [22:55:14] *why [22:55:16] so bleh [22:55:28] labs-nfs1:/export/home/bots on /home type nfs (rw,addr=10.4.0.13) [22:55:45] the instance needs to be rebooted [22:55:52] the nfs mounts are read-only [22:55:57] um [22:56:01] ok let's try another one [22:56:10] we switched to glusterfs [22:56:24] is it ok for me to reboot bots-apache01? 
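Context for the "Unable to create and initialize directory '/home/wmib'" error above: two things were in play. pam_mkhomedir only creates home directories at login time (and a service account without shell rights never logs in), and /home on this instance was still the old labs-nfs1 export, which had gone read-only, hence the reboot onto gluster below. A small sketch of the checks implied here, to be run as root; the test filename is arbitrary:

    grep pam_mkhomedir /etc/pam.d/common-session     # confirm what actually creates homes
    mount | grep ' /home '                           # labs-nfs1 vs. glusterfs, as quoted at 22:55
    touch /home/.rw-test && rm /home/.rw-test        # fails with a read-only error if the export is ro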
[22:56:25] ur right [22:56:35] yes it's ok for ME [22:56:38] :D [22:56:41] dunno if it's ok for others [22:56:43] :P [22:56:49] well, fuck em :) [22:56:53] hehe [22:57:06] * Ryan_Lane really hopes it doesn't have some problem [22:57:16] it would be lame if I rebooted it and it didn't come back up :D [22:57:41] yes it would be but it shouldn't be big problem, apache has low priority on bots :P [22:57:47] heh [22:57:47] application servers are important here [22:57:50] * Ryan_Lane nods [22:58:13] @labs-info apache01 [22:58:13] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 7 seconds ago [22:58:19] @labs-info bots-apache01 [22:58:20] [Name bots-apache01 doesn't exist but resolves to I-000004fc] I-000004fc is Nova Instance with name: bots-apache01, host: virt6, IP: 10.4.0.141 of type: m1.medium, with number of CPUs: 2, RAM of this size: 4096M, member of project: bots, size of storage: 50 and with image ID: ubuntu-12.04-precise [22:58:47] can I tell you how useful that bot is? [22:59:11] depends :D [22:59:20] I'm serious. it's very useful [22:59:26] heh I hope [22:59:29] It's only useful because your interface is shit [22:59:35] heh [22:59:37] lol [22:59:52] bleh. it isn't rebooting [23:00:01] it's stuck [23:00:01] reason why I invented this thing was to be able to do [23:00:04] @labs-info bob [23:00:04] [Name bob doesn't exist but resolves to I-0000012d] I-0000012d is Nova Instance with name: bob, host: virt6, IP: 10.4.0.90 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: pediapress, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img [23:00:05] gonna need to kill it [23:00:10] I really wanted to know what is bob [23:00:14] so I made this tool [23:00:31] You know if this was in puppet we could just re-install the instance easier than rebooting it [23:00:32] PROBLEM SSH is now: CRITICAL on bots-apache01.pmtpa.wmflabs 10.4.0.141 output: CRITICAL - Socket timeout after 10 seconds [23:00:49] Damianz nope rebooting will be always easier :D [23:00:52] lol [23:00:55] yeah, rebooting is easier ;) [23:01:00] I think it was hung on a filesystem [23:01:07] if rebooting is easier something sucks [23:01:12] if I knew 10 years ago that there will be day when reinstalling system is faster than rebooting it... hehe [23:01:12] it would have eventually rebooted [23:01:32] I always had to reinstall my windows 95 after few months [23:01:39] virt6 is a pain in my ass [23:01:46] Personally I don't even bother updating servers, just make a new one and throw it into prod after automated burn in [23:01:55] * Damianz hands Ryan the lube [23:01:59] wait a moment, that wasn't 10 years [23:02:02] that was more :D [23:02:21] Running periodic task ComputeManager.update_available_resource <— that [23:02:29] that hangs for ages [23:03:13] I can't wait till they move periodic tasks out of the daemons [23:03:21] .... [23:03:31] why are blocking tasks in any thread near important stuff [23:03:33] fucking shard [23:03:40] yay unix model [23:03:44] yes. it's stupid [23:04:03] thankfully I think they are fixing this in grizzly [23:04:31] obviously running 100 instances per node is too much [23:05:23] RECOVERY SSH is now: OK on bots-apache01.pmtpa.wmflabs 10.4.0.141 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [23:05:54] hooray. 
home directories are on gluster on apache01 now [23:06:07] I can't wait till I can kill that nfs instance [23:06:25] that would make it fast :) [23:06:50] I don't think there's many hosts still mounted on it [23:07:02] you can verify it somehow [23:07:05] I think [23:07:12] showmount -a [23:07:13] just look at who's mounted [23:07:48] 10.4.0.41 [23:07:50] 10.4.0.42 [23:07:54] 10.4.0.44 [23:07:55] 10.4.0.48 [23:07:57] all bots [23:08:01] 10.4.0.59 [23:08:25] I expected bots to keep them mounted [23:08:30] a bunch of deployment prep [23:08:31] @labs-resolve 10.4.0.42 [23:08:31] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 56 seconds ago [23:08:36] meh [23:08:44] I didn't implement teh IP resolving [23:08:59] I think it's lying to me, though [23:09:10] mount or bot [23:09:14] mount [23:09:36] @labs-resolve bots [23:09:36] I don't know this instance - aren't you are looking for: I-0000009c (bots-2), I-0000009e (bots-cb), I-000000a9 (bots-1), I-000000af (bots-sql2), I-000000b4 (bots-sql3), I-000000b5 (bots-sql1), I-000000e5 (bots-3), I-000000e8 (bots-4), I-0000015e (bots-labs), I-00000190 (bots-dev), [23:09:54] and this thing doesn't show IP's :/ [23:09:59] damn it [23:10:30] I just checked one of the instances it listed [23:10:38] and it's not mounting anything from labs-nfs1 [23:13:53] !log hugglewa rebooted all instances [23:13:54] Logged the message, Master [23:17:44] someone need to invent a tool to bypass sudo password request, like you enter password once to execute sudo on multiple machines... etc, so that I could create a script to run a command in multiple terminals [23:18:09] I'm planning on using passwordless sudo [23:18:21] well, but that's also not the best solution [23:18:26] it's less secure [23:18:41] I need to fix all current sudo policies to remove ALL and use the project group for the users [23:18:45] it would be cool if sudo could share the information across the network [23:19:01] have you seen pam_url? [23:19:14] like you would login on one machine and it would not ask you for password on all other machines for certain time [23:19:24] it would be possible using pam_url [23:19:27] hmm [23:20:14] passwordless is easier, though :) [23:20:56] yeah [23:21:14] if I was to do it with pam_url, though, I'd use the system's puppet certificate for authentication to the service [23:21:23] then the web server can know which host you are coming from [23:21:46] if you are coming from a host in the same project, grant access if the user has authenticated in the last x amount of time [23:21:57] yup that's it [23:22:06] not very hard to implement [23:22:43] * Damianz hugs his krb [23:22:59] and you could use two-factor auth with it ;) [23:23:13] krb is too much of a pain in the ass to deal with [23:23:23] I love my new router :D I spent last two days flashing my custom built linux into it :D :D [23:23:30] :D [23:23:31] nice [23:23:31] so much fun with such a simple thing [23:24:23] finally I can have traffic reports for every mac address on network. I found out my sister is biggest leecher :P [23:25:08] hahaha [23:25:54] ah, sweet, I can remount home on almost all of these instances [23:32:22] <3 my cisco router that netflow gives me cool stuff off [23:32:33] PROBLEM Free ram is now: WARNING on dumps-bot3.pmtpa.wmflabs 10.4.0.118 output: Warning: 19% free memory [23:33:08] 'Password can only contain alpha-numeric characters: "(" not allowed.' 
you gotta be fucking kidding me [23:34:11] hahahaha [23:35:01] 'ERROR: "Table Prefix" must not be empty.' < WHAT IF I DON'T WANT A PREFIX [23:35:05] -.- [23:35:28] I hate shitty programs [23:36:03] Though I actually wrote 60lines of ruby yesterday, that caused a whole other level of rage [23:52:10] * Damianz wtf [23:53:24] Damianz: ? [23:53:37] -tech like 2min ago [23:53:55] hahaha [23:55:26] hm. you know, I could manage ssh known_hosts files via a mount [23:55:34] like I do with keys [23:55:36] and have salt manage them [23:55:44] that would be slightly insane [23:55:49] why? [23:55:59] much rather just use ssh fp records then it's verifiable externally easily also [23:56:02] it's better than every user having a wrong one [23:56:13] ssh fp? [23:56:16] dns records? [23:56:33] you have to configure the clients for that [23:56:38] so it actually wouldn't work [23:56:45] who doesn't have that turned on by default?! [23:56:57] it's not on by default [23:57:09] Is on my boxen :D [23:57:12] heh [23:57:28] only problem would be... [23:57:33] pmtpa.wmflabs also isn't a real dns zone [23:57:35] if I want to ssh to outside labs, like github [23:57:41] need to write to file [23:57:47] ah [23:57:47] true [23:57:48] though if you can split into global and user that's ok [23:57:49] good poin [23:57:53] *point
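To make that last exchange concrete: SSHFP is a DNS record type carrying host-key fingerprints, OpenSSH only honours it when the client opts in with VerifyHostKeyDNS (hence "you have to configure the clients"), and it wants a real, ideally DNSSEC-signed zone, which pmtpa.wmflabs isn't. The "global and user" split at the end maps onto ssh's two known-hosts files. A sketch, with a host name taken from the log purely as an example:

    # publish: generate SSHFP records from a host key (paste into the zone)
    ssh-keygen -r bots-apache01.pmtpa.wmflabs -f /etc/ssh/ssh_host_rsa_key.pub
    # consume: the client has to ask for DNS verification explicitly
    ssh -o VerifyHostKeyDNS=yes bots-apache01.pmtpa.wmflabs
    # the managed-file alternative: salt/puppet writes the global file,
    # users keep their own for outside hosts like github (ssh_config keywords):
    #   GlobalKnownHostsFile /etc/ssh/ssh_known_hosts
    #   UserKnownHostsFile   ~/.ssh/known_hosts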