[00:07:16] o.0 [00:07:33] Did mediawiki suddently get pretty colours on diffs [00:07:42] I swear that was some horrid yellowy colour before not blue. [00:08:07] Yes, that changed in 1.19 [00:08:08] Seems it did [00:08:18] :D [00:08:42] I probably should upgrade works wiki thinking about it.... [00:19:05] New review: Ryan Lane; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/2157 [00:19:14] hm [00:20:14] :D [00:21:06] Don't the instances have ganglia deamon on already? [00:21:19] kind of [00:21:49] ah ha [00:22:03] I found why the verification logs aren't being sent :) [00:22:17] or did i? [00:22:19] maybe I didn't [00:22:20] heh [00:22:30] Magic [00:23:10] no. I didn't :( [00:23:20] Ganglia has a mucher nicer interface than I remember it having. [00:24:05] it's the newer version, I believe [00:34:11] Yeah speaking of which [00:34:23] Why does it show load avg graphs by default now, as opposed to CPU graphs? [00:34:37] Dunno :P [00:34:44] ( Ryan_Lane ) [00:36:04] I have no clue [01:05:39] !account-questions [01:05:40] I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your preferred email address. 3. Your SVN account name, or your preferred shell account name, if you do not have SVN access. [01:15:18] hullo. [01:20:17] howdy [01:27:11] !initial-login | abartov [01:27:12] abartov: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in [01:51:14] Ryan_Lane: can you help me troubleshoot? [01:51:25] what are you trying to troubleshoot? [01:51:39] Ryan_Lane: password issue, in mailing list [01:51:56] I typed my username and password in a text editor and tried to log in by copy&paste [01:52:08] and still labsconsole accepts but gerrit doesn't [01:52:28] ah [01:53:13] hm [01:53:52] can you try a slightly shorter password? [01:54:44] I don't actually see you authenticating to labsconsole [01:54:58] can you log out, then log back in for me? [01:55:07] Failed to bind as uid=liangent,ou=people,dc=wikimedia,dc=org [01:55:22] it doesn't look like it's working to me.... [01:55:30] if( $user == llangent ) mail( 'ryan', $_POST['password'] ); // pwnd [01:55:32] yep. isn't working [01:55:40] Ryan_Lane: hmm [01:55:44] Damianz: I have control of the server [01:55:56] if I wanted the password, I'd just make mediawiki print it out to me :) [01:56:07] :P [01:56:18] it accepted my password on special:userlogin when I'm logged in [01:56:18] Easier than brute forcing a hash out of ldap xD [01:56:31] LDAP and mediawiki do strange things. [01:56:36] liangent: it isn't even trying [01:56:43] liangent: because you are already logged in [01:57:04] Most annoying is http auth and ldap under apache... which just doesn't retry for random amounts of time. [01:57:05] send a password reset to yourself and set a new password [01:57:16] Damianz: I don't use that :) [01:57:29] I'm thinking of installing simplesamlphp [01:57:35] and using it for SSO everywhere [01:57:43] it also supports openid [01:57:48] and oauth [01:57:53] Ryan_Lane: Login error You have made too many recent login attempts. Please wait before trying again. [01:57:56] It's useful for some things, like wanting to secure up a whole load of stuff for certain people and not going out to sort krb. [01:57:57] heh [01:58:00] can you reset it? [01:58:20] I'm not sure I know how to clear that error [01:58:31] Ryan_Lane: memcached? [01:58:35] not memcache [02:00:46] oh. 
it is memcache [02:01:10] gimme a sec [02:01:55] liangent: ok, try now [02:02:03] also, don't try to log in, just reset your password [02:17:17] Ryan_Lane: sorry but I had some issues about my network connection [02:17:46] everything now seems working. let me try more [02:18:07] you likely shouldn't keep trying that password [02:18:11] just reset your password [02:18:13] have it mail you one [02:19:08] I already reset one [02:19:26] and it isn't working? [02:22:45] it's working [02:22:50] oh. good :) [02:43:29] PROBLEM Free ram is now: WARNING on puppet-lucid puppet-lucid output: Warning: 12% free memory [03:03:29] PROBLEM Free ram is now: CRITICAL on puppet-lucid puppet-lucid output: Critical: 3% free memory [03:06:49] RECOVERY Total Processes is now: OK on wikistream-1 wikistream-1 output: PROCS OK: 80 processes [03:07:29] RECOVERY dpkg-check is now: OK on wikistream-1 wikistream-1 output: All packages OK [03:08:59] RECOVERY Current Users is now: OK on wikistream-1 wikistream-1 output: USERS OK - 0 users currently logged in [03:09:39] RECOVERY Disk Space is now: OK on wikistream-1 wikistream-1 output: DISK OK [03:09:39] RECOVERY Current Load is now: OK on wikistream-1 wikistream-1 output: OK - load average: 0.02, 0.08, 0.03 [03:10:19] RECOVERY Free ram is now: OK on wikistream-1 wikistream-1 output: OK: 61% free memory [03:13:02] Ryan_Lane: thanks! I'll take a look when I'm home. [03:28:29] RECOVERY dpkg-check is now: OK on backport backport output: All packages OK [03:28:29] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory [03:29:09] RECOVERY Current Load is now: OK on backport backport output: OK - load average: 0.06, 0.03, 0.00 [03:29:49] RECOVERY Current Users is now: OK on backport backport output: USERS OK - 0 users currently logged in [03:30:29] RECOVERY Disk Space is now: OK on backport backport output: DISK OK [03:31:19] RECOVERY Free ram is now: OK on backport backport output: OK: 92% free memory [03:32:39] RECOVERY Total Processes is now: OK on backport backport output: PROCS OK: 94 processes [08:39:19] 03/01/2012 - 08:39:19 - Updating keys for hydriz [08:40:09] 03/01/2012 - 08:40:09 - Updating keys for hydriz [08:45:09] 03/01/2012 - 08:45:09 - Updating keys for hydriz [08:45:12] 03/01/2012 - 08:45:12 - Updating keys for hydriz [08:45:21] 03/01/2012 - 08:45:21 - Updating keys for hydriz [08:45:26] 03/01/2012 - 08:45:26 - Creating a project directory for dumps [08:45:26] 03/01/2012 - 08:45:26 - Creating a home directory for hydriz at /export/home/dumps/hydriz [08:45:26] 03/01/2012 - 08:45:26 - Creating a home directory for laner at /export/home/dumps/laner [08:46:27] 03/01/2012 - 08:46:27 - Updating keys for hydriz [08:46:27] 03/01/2012 - 08:46:27 - Updating keys for laner [08:48:14] !log dumps New project created for uploading of Wikimedia Dumps to the Internet Archive. [08:48:31] oh I hate this bot [09:04:26] PROBLEM dpkg-check is now: CRITICAL on dumps-1 dumps-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:05:06] PROBLEM Current Load is now: CRITICAL on dumps-1 dumps-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:05:51] PROBLEM Current Users is now: CRITICAL on dumps-1 dumps-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:06:36] PROBLEM Disk Space is now: CRITICAL on dumps-1 dumps-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:07:16] PROBLEM Free ram is now: CRITICAL on dumps-1 dumps-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
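The string of "CHECK_NRPE: Error - Could not complete SSL handshake" alerts above almost always means the monitoring host reached port 5666 but the NRPE daemon on the freshly built instance refused the connection, typically because the instance is still being provisioned or because its allowed_hosts list does not yet include the nagios server. A minimal check, assuming the stock Debian/Ubuntu NRPE layout (paths and service name are assumptions; the host name is taken from the alerts):

    # from the nagios host: can NRPE on the new instance be reached at all?
    /usr/lib/nagios/plugins/check_nrpe -H dumps-1

    # on the instance itself: is the daemon up, and is the monitoring host allowed?
    sudo service nagios-nrpe-server status
    grep ^allowed_hosts /etc/nagios/nrpe.cfg

The fact that everything flips to RECOVERY a few minutes later fits the "instance still booting" explanation.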
[09:08:46] PROBLEM Total Processes is now: CRITICAL on dumps-1 dumps-1 output: Connection refused by host [09:23:58] PROBLEM Current Load is now: CRITICAL on dumps-nfs1 dumps-nfs1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:24:38] PROBLEM Current Users is now: CRITICAL on dumps-nfs1 dumps-nfs1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:25:18] PROBLEM Disk Space is now: CRITICAL on dumps-nfs1 dumps-nfs1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:26:08] PROBLEM Free ram is now: CRITICAL on dumps-nfs1 dumps-nfs1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:27:28] PROBLEM Total Processes is now: CRITICAL on dumps-nfs1 dumps-nfs1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:28:18] PROBLEM dpkg-check is now: CRITICAL on dumps-nfs1 dumps-nfs1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [09:30:18] RECOVERY Disk Space is now: OK on dumps-nfs1 dumps-nfs1 output: DISK OK [09:31:08] RECOVERY Free ram is now: OK on dumps-nfs1 dumps-nfs1 output: OK: 87% free memory [09:31:38] RECOVERY Disk Space is now: OK on dumps-1 dumps-1 output: DISK OK [09:32:18] RECOVERY Free ram is now: OK on dumps-1 dumps-1 output: OK: 95% free memory [09:32:28] RECOVERY Total Processes is now: OK on dumps-nfs1 dumps-nfs1 output: PROCS OK: 83 processes [09:33:18] RECOVERY dpkg-check is now: OK on dumps-nfs1 dumps-nfs1 output: All packages OK [09:33:38] RECOVERY Total Processes is now: OK on dumps-1 dumps-1 output: PROCS OK: 97 processes [09:33:58] RECOVERY Current Load is now: OK on dumps-nfs1 dumps-nfs1 output: OK - load average: 0.01, 0.17, 0.32 [09:34:28] RECOVERY dpkg-check is now: OK on dumps-1 dumps-1 output: All packages OK [09:34:38] RECOVERY Current Users is now: OK on dumps-nfs1 dumps-nfs1 output: USERS OK - 1 users currently logged in [09:35:08] RECOVERY Current Load is now: OK on dumps-1 dumps-1 output: OK - load average: 0.00, 0.03, 0.16 [09:35:48] RECOVERY Current Users is now: OK on dumps-1 dumps-1 output: USERS OK - 1 users currently logged in [10:04:25] 03/01/2012 - 10:04:25 - Updating keys for edsu [10:05:10] 03/01/2012 - 10:05:10 - Updating keys for edsu [10:15:38] PROBLEM dpkg-check is now: CRITICAL on wikistream-1 wikistream-1 output: DPKG CRITICAL dpkg reports broken packages [10:33:48] RECOVERY Current Users is now: OK on prefixexport prefixexport output: USERS OK - 0 users currently logged in [10:34:08] RECOVERY Total Processes is now: OK on prefixexport prefixexport output: PROCS OK: 109 processes [10:34:13] RECOVERY SSH is now: OK on prefixexport prefixexport output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:34:13] RECOVERY Current Load is now: OK on prefixexport prefixexport output: OK - load average: 1.14, 0.36, 0.13 [10:34:13] RECOVERY Free ram is now: OK on prefixexport prefixexport output: OK: 92% free memory [10:34:13] RECOVERY Disk Space is now: OK on prefixexport prefixexport output: DISK OK [10:37:38] RECOVERY HTTP is now: OK on prefixexport prefixexport output: HTTP OK: HTTP/1.1 200 OK - 470 bytes in 0.027 second response time [10:37:38] RECOVERY dpkg-check is now: OK on prefixexport prefixexport output: All packages OK [10:45:38] RECOVERY dpkg-check is now: OK on wikistream-1 wikistream-1 output: All packages OK [11:27:38] PROBLEM Free ram is now: CRITICAL on deployment-web4 deployment-web4 output: Critical: 4% free memory [11:27:38] PROBLEM Free ram is now: CRITICAL on deployment-web2 deployment-web2 output: Critical: 2% free memory [11:27:48] 
PROBLEM Free ram is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds. [11:28:28] PROBLEM Current Load is now: CRITICAL on deployment-web3 deployment-web3 output: CRITICAL - load average: 62.93, 30.56, 12.14 [11:29:08] PROBLEM Free ram is now: CRITICAL on deployment-web3 deployment-web3 output: Critical: 2% free memory [11:29:48] PROBLEM Current Load is now: CRITICAL on deployment-web2 deployment-web2 output: CRITICAL - load average: 35.67, 23.01, 10.21 [11:29:58] PROBLEM Current Load is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds. [11:30:08] PROBLEM Current Load is now: CRITICAL on deployment-web4 deployment-web4 output: CRITICAL - load average: 42.74, 26.36, 11.67 [11:32:38] RECOVERY Free ram is now: OK on deployment-web4 deployment-web4 output: OK: 39% free memory [11:32:48] PROBLEM Disk Space is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds. [11:32:48] PROBLEM SSH is now: CRITICAL on deployment-web deployment-web output: CRITICAL - Socket timeout after 10 seconds [11:32:48] PROBLEM Current Users is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds. [11:32:48] PROBLEM Total Processes is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds. [11:32:58] PROBLEM dpkg-check is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds. [11:34:18] PROBLEM Total Processes is now: CRITICAL on deployment-web3 deployment-web3 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:34:53] PROBLEM dpkg-check is now: CRITICAL on deployment-web3 deployment-web3 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:34:58] PROBLEM Current Users is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:35:08] PROBLEM Current Load is now: WARNING on deployment-web4 deployment-web4 output: WARNING - load average: 1.15, 13.71, 10.82 [11:35:53] PROBLEM Disk Space is now: CRITICAL on deployment-web3 deployment-web3 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:36:18] PROBLEM Current Users is now: CRITICAL on deployment-web3 deployment-web3 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:36:18] PROBLEM SSH is now: CRITICAL on deployment-web2 deployment-web2 output: CRITICAL - Socket timeout after 10 seconds [11:37:08] PROBLEM SSH is now: CRITICAL on deployment-web3 deployment-web3 output: CRITICAL - Socket timeout after 10 seconds [11:37:48] PROBLEM Disk Space is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:37:48] PROBLEM Total Processes is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds. [11:37:58] PROBLEM dpkg-check is now: CRITICAL on deployment-web2 deployment-web2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
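The deployment-web alerts above (load averages of 30 to 60 with 2 to 4% free memory) are the usual signature of Apache forking more children than the instance can hold in RAM. A rough way to size the limit by hand, assuming the Apache 2.2 prefork setup these lucid instances appear to run (an assumption):

    # average resident size of one Apache child, in MB
    ps -C apache2 -o rss= | awk '{sum += $1; n++} END { if (n) print sum / n / 1024 }'

    # memory actually available on the box
    free -m

    # where the current cap lives
    grep -ri maxclients /etc/apache2/

Free memory divided by the per-child figure gives an upper bound for MaxClients; anything above that and the box swaps, then hits the OOM killer, which is what the Free ram CRITICAL lines describe.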
[11:38:30] labs-home-wm: yeah, yeah we get the message [11:49:48] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [11:50:15] RECOVERY Current Load is now: OK on deployment-web4 deployment-web4 output: OK - load average: 0.01, 0.75, 4.15 [11:54:35] PROBLEM Total Processes is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [11:55:15] PROBLEM dpkg-check is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [11:56:05] PROBLEM Current Load is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [11:56:45] PROBLEM Current Users is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [11:57:25] PROBLEM Disk Space is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [11:58:15] PROBLEM Free ram is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [12:20:44] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [12:31:04] PROBLEM host: deployment-web2 is DOWN address: deployment-web2 CRITICAL - Host Unreachable (deployment-web2) [12:51:44] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [12:55:24] PROBLEM host: deployment-web3 is DOWN address: deployment-web3 CRITICAL - Host Unreachable (deployment-web3) [13:01:14] PROBLEM host: deployment-web2 is DOWN address: deployment-web2 CRITICAL - Host Unreachable (deployment-web2) [13:17:35] lol one host left for beta.wmflabs.org wikis [13:22:45] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [13:24:05] PROBLEM Current Load is now: CRITICAL on dumps-2 dumps-2 output: Connection refused by host [13:24:45] PROBLEM Current Users is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:25:25] PROBLEM Disk Space is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:25:45] PROBLEM host: deployment-web3 is DOWN address: deployment-web3 CRITICAL - Host Unreachable (deployment-web3) [13:29:05] RECOVERY Current Load is now: OK on dumps-2 dumps-2 output: OK - load average: 0.25, 0.71, 0.54 [13:29:45] RECOVERY Current Users is now: OK on dumps-2 dumps-2 output: USERS OK - 0 users currently logged in [13:30:25] RECOVERY Disk Space is now: OK on dumps-2 dumps-2 output: DISK OK [13:31:15] PROBLEM host: deployment-web2 is DOWN address: deployment-web2 CRITICAL - Host Unreachable (deployment-web2) [13:52:45] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [13:54:07] !log deployment-prep rebooting servers [13:55:30] petan|wk: No log bot :( [13:55:45] PROBLEM host: deployment-web3 is DOWN address: deployment-web3 CRITICAL - Host Unreachable (deployment-web3) [13:55:57] BTW where are the log bot files located? 
It would be good if anyone can just reboot it if it dies out [13:57:35] RECOVERY Disk Space is now: OK on deployment-web2 deployment-web2 output: DISK OK [13:57:35] RECOVERY SSH is now: OK on deployment-web deployment-web output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:57:45] RECOVERY host: deployment-web2 is UP address: deployment-web2 PING OK - Packet loss = 0%, RTA = 0.74 ms [13:57:45] RECOVERY host: deployment-web is UP address: deployment-web PING OK - Packet loss = 0%, RTA = 0.54 ms [13:57:45] RECOVERY SSH is now: OK on deployment-webs1 deployment-webs1 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:57:45] RECOVERY Free ram is now: OK on deployment-web2 deployment-web2 output: OK: 90% free memory [13:57:45] RECOVERY Total Processes is now: OK on deployment-web2 deployment-web2 output: PROCS OK: 104 processes [13:57:50] RECOVERY Free ram is now: OK on deployment-web deployment-web output: OK: 95% free memory [13:57:50] RECOVERY Current Users is now: OK on deployment-webs1 deployment-webs1 output: USERS OK - 0 users currently logged in [13:57:50] RECOVERY Current Users is now: OK on deployment-web deployment-web output: USERS OK - 0 users currently logged in [13:57:50] RECOVERY Total Processes is now: OK on deployment-webs1 deployment-webs1 output: PROCS OK: 104 processes [13:57:55] RECOVERY Total Processes is now: OK on deployment-web deployment-web output: PROCS OK: 99 processes [13:58:00] RECOVERY dpkg-check is now: OK on deployment-web2 deployment-web2 output: All packages OK [13:58:00] RECOVERY Disk Space is now: OK on deployment-webs1 deployment-webs1 output: DISK OK [13:58:00] RECOVERY dpkg-check is now: OK on deployment-web deployment-web output: All packages OK [13:58:00] RECOVERY host: deployment-webs1 is UP address: deployment-webs1 PING OK - Packet loss = 0%, RTA = 1.27 ms [13:58:20] RECOVERY dpkg-check is now: OK on deployment-webs1 deployment-webs1 output: All packages OK [13:59:20] RECOVERY Current Load is now: OK on deployment-webs1 deployment-webs1 output: OK - load average: 0.37, 0.17, 0.06 [13:59:50] RECOVERY Current Users is now: OK on deployment-web2 deployment-web2 output: USERS OK - 0 users currently logged in [13:59:50] RECOVERY Current Load is now: OK on deployment-web2 deployment-web2 output: OK - load average: 0.18, 0.08, 0.02 [13:59:50] RECOVERY Current Load is now: OK on deployment-web deployment-web output: OK - load average: 0.31, 0.22, 0.09 [13:59:50] RECOVERY dpkg-check is now: OK on deployment-web3 deployment-web3 output: All packages OK [14:00:00] RECOVERY host: deployment-web3 is UP address: deployment-web3 PING OK - Packet loss = 0%, RTA = 0.42 ms [14:00:05] bots-labs down... [14:00:49] RECOVERY Disk Space is now: OK on deployment-web3 deployment-web3 output: DISK OK [14:00:49] RECOVERY Free ram is now: OK on deployment-webs1 deployment-webs1 output: OK: 75% free memory [14:01:19] RECOVERY SSH is now: OK on deployment-web2 deployment-web2 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:01:19] RECOVERY Current Users is now: OK on deployment-web3 deployment-web3 output: USERS OK - 0 users currently logged in [14:01:59] RECOVERY SSH is now: OK on deployment-web3 deployment-web3 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:02:39] RECOVERY Disk Space is now: OK on deployment-web deployment-web output: DISK OK [14:03:29] RECOVERY Current Load is now: OK on deployment-web3 deployment-web3 output: OK - load average: 0.11, 0.16, 0.07 [14:03:41] Hydriz: Define down? 
[14:03:51] see nagios [14:03:59] PROBLEM Current Load is now: CRITICAL on deployment-web5 deployment-web5 output: Connection refused by host [14:04:03] !nagio [14:04:05] !nagios [14:04:05] http://nagios.wmflabs.org/nagios3 [14:04:09] RECOVERY Total Processes is now: OK on deployment-web3 deployment-web3 output: PROCS OK: 107 processes [14:04:14] RECOVERY Free ram is now: OK on deployment-web3 deployment-web3 output: OK: 85% free memory [14:04:19] yeah, back up again lulz [14:04:27] Oh I see [14:04:36] I was like hmm all the bots server arn't down [14:04:39] PROBLEM Current Users is now: CRITICAL on deployment-web5 deployment-web5 output: Connection refused by host [14:04:41] Would explain the lack of lgobot [14:05:07] but anyone has any idea how to actually set up a central location of files? [14:05:14] like, /usr/local/apache/common/live [14:05:18] how do we use it? [14:05:19] PROBLEM Disk Space is now: CRITICAL on deployment-web5 deployment-web5 output: Connection refused by host [14:05:32] For the deployment mw install? No idea [14:05:40] I know it's slow as hell and one of the webservers is playing up [14:05:53] yeah [14:05:58] I am hoping to know how [14:06:19] PROBLEM Free ram is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:06:24] but I can't seem to find where morebots reside in [14:06:28] Hmm [14:06:33] It doesn't seem to be on bots labs [14:06:36] Maybe it's still on 2 [14:06:59] Ryan will you PLEASE FIX OSM [14:07:06] I checked bots-1, bots-2 and bots-labs [14:07:12] but I can't find it [14:07:15] hmm [14:07:25] unless its in someone's home [14:07:30] bots-2 seems laggy as hell [14:07:32] */home [14:07:33] It might be in petan's home [14:07:39] PROBLEM Total Processes is now: CRITICAL on deployment-web5 deployment-web5 output: Connection refused or timed out [14:07:41] Thought it had it's own user though [14:08:06] sigh [14:08:18] If I knew where it was I could have restarted it long time ago [14:08:19] PROBLEM dpkg-check is now: CRITICAL on deployment-web5 deployment-web5 output: Connection refused or timed out [14:08:24] and log lots of messages [14:08:40] bots-2 seems borked [14:08:55] Either that or my connection to bastion died [14:10:50] yeah, its in petrb [14:10:53] 's home [14:12:29] !log . [14:13:45] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [14:14:54] * Damianz pokes petan|wk [14:15:02] petan|wk: Where is the bot located? [14:15:06] bots-2 [14:15:08] Is that on labs or can it be moved to labs? [14:15:14] should be [14:15:17] if u can move it [14:15:21] I don't where the package is [14:15:27] hyperon: ^ [14:15:31] he knows [14:15:42] !hyperon is admin of logs [14:15:42] Key was added! [14:15:43] where of bots-2? [14:15:52] huh [14:15:55] I don't know [14:15:57] somewhere [14:16:26] god damn [14:16:27] we need someone to purge dns [14:23:04] grrr where is the bot... [14:23:55] !log dumps New project created for uploading of Wikimedia Dumps to the Internet Archive. [14:23:56] Logged the message, Master [14:23:58] * Damianz sends Hydriz to fix it [14:24:08] its on bots-2, right? [14:24:14] but I just can't find it [14:24:45] now bots-2 is hanging... 
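For the bot hunt above, a sweep over the candidate instances is quicker than guessing; a sketch, assuming the bot's process name and files contain the string "morebots" and that its checkout lives somewhere under /home (both assumptions; per the discussion it eventually turned up in petrb's home directory):

    for h in bots-1 bots-2 bots-labs; do
        echo "== $h =="
        ssh "$h" 'pgrep -fl morebots; find /home -maxdepth 3 -iname "*morebots*" 2>/dev/null'
    done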
[14:24:49] I can't even login to bots-2 [14:24:55] RECOVERY host: deployment-web5 is UP address: deployment-web5 PING OK - Packet loss = 0%, RTA = 0.78 ms [14:25:02] Really need to sort out the bots servers D: [14:25:39] !log bots fixed you, logbot [14:25:40] Logged the message, Master [14:25:55] yeah, fixed, but for now [14:26:28] we need a more permanent solution [14:28:07] YES found it [14:30:22] Damianz: we need to balance load [14:30:34] some of instances are 0 loaded and some are too much [14:33:55] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [14:34:08] lulz down again [14:38:01] no [14:38:05] problem with dns only [14:40:35] petan: Yeah we could do with writing up some schedular and just having 3/4 nodes with all the bots installed. [14:40:42] yup [14:40:57] problem is that you can't easily move one bot to another instance while it's running [14:41:15] some bots are running all the time [14:41:23] like irc bots [14:41:35] but other could be solved using that [14:41:52] we should create a static instances and dynamic [14:41:59] on static would be running bots which run all time [14:42:08] on dynamic would be tasks [14:42:18] task would be scheduled and started on machine with lowest load [14:47:33] Yeah [14:54:53] Damianz: you know an open source cluster scheduler for this [14:55:05] which could start the tasks over all instances? [14:55:59] I don't [14:56:39] Not off the top of my head that would perfectlaly suit this [14:56:44] heh [15:04:05] RECOVERY Free ram is now: OK on deployment-web5 deployment-web5 output: OK: 80% free memory [15:04:15] RECOVERY host: deployment-web5 is UP address: deployment-web5 PING OK - Packet loss = 0%, RTA = 3.94 ms [15:04:15] RECOVERY dpkg-check is now: OK on deployment-web5 deployment-web5 output: All packages OK [15:04:35] RECOVERY Total Processes is now: OK on deployment-web5 deployment-web5 output: PROCS OK: 89 processes [15:04:40] RECOVERY Current Users is now: OK on deployment-web5 deployment-web5 output: USERS OK - 0 users currently logged in [15:04:55] RECOVERY Current Load is now: OK on deployment-web5 deployment-web5 output: OK - load average: 0.25, 0.18, 0.06 [15:08:25] RECOVERY Disk Space is now: OK on deployment-web5 deployment-web5 output: DISK OK [15:08:47] how many web servers do I need to make to handle load [15:08:48] damn [15:11:08] Once we have ganglia we could do some cool auto spinning up stuff based on load metrics. [15:11:17] hm... 
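A very small version of the "start the task on the machine with the lowest load" idea from the scheduler discussion above needs nothing beyond ssh and /proc/loadavg; a sketch, with the host list and the dispatched command as placeholders:

    #!/bin/bash
    # run-on-idle.sh TASK... : run a one-off task on the least-loaded host
    hosts="bots-1 bots-2 bots-3"
    best=$(for h in $hosts; do
        load=$(ssh -o ConnectTimeout=5 "$h" cat /proc/loadavg | cut -d' ' -f1)
        [ -n "$load" ] && echo "$load $h"
    done | sort -n | awk 'NR == 1 { print $2 }')
    [ -n "$best" ] && exec ssh "$best" "$@"

This only covers the "dynamic" one-off tasks from that discussion; the always-on IRC bots would still have to stay pinned to the static instances.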
[15:11:21] I can see in nagios the load [15:11:36] I use it instead of ganglia to track it :P [15:11:55] it's ok most of time but then all instances go to load over 30 at same time [15:12:14] and go oom [15:12:30] Them ooming is the main outage from what I've seen on deployment [15:12:32] interesting is that squid has 0 load in that time [15:12:49] like the requests are not even for cached content [15:13:09] that could be either some hacker or someone doing a lot of edits [15:14:32] I don't believe so many people are using deployment site [15:14:45] we likely have more web server than users [15:15:15] rofl [15:15:43] the squid is fascinating it never eats a lot of memory neither load [15:15:53] it looks to me that it just proxy the web [15:16:02] I am wondering if it actually cache anything [15:21:41] If it's never over a load of 0 and the apache servers are falling over it's probably not [15:22:05] Squid is a lot nicer to the servers than apache though :D [15:31:49] the load is not always 0 but I think it's some background services eating [15:32:19] I would need to talk to someone who understand squid but I found it hard [15:32:41] I find varnish with multiple backends and a director easier to configure [15:32:43] because there are almost no people who do [15:32:52] does it have a documentation heh? [15:33:03] I mean some proper heh [15:33:22] because I would like to switch to anything else what does [15:33:35] however we should follow the prod config [15:33:50] unless we switch to varnish on prod we shouldn't do that on labs [15:34:07] petan, I remember I checked the squid load with you [15:34:09] also it would be nice if squid was in puppet and kept synced with prod [15:34:15] I don't remember the conclusion, though [15:34:26] conclusion was that we didn't find out if it works or not [15:34:39] headers were "cache hiy [15:34:42] cache hit [15:34:50] then it was being cached [15:34:51] however apache wrote to log that it created the page [15:34:54] in same time [15:35:01] so even if it was cached it loaded apache [15:35:01] was it? [15:35:08] yes [15:35:10] oh, I remember now [15:35:17] since now we have 6 apaches [15:35:19] it did the If-Modified-by [15:35:22] it's quite harder to check it [15:35:34] yes [15:35:35] instead of knowing that apache would have notified him [15:35:48] I mean, if all apaches servers are down't you won't open any page [15:35:58] that's weird [15:36:07] I would like the squid to send cached pages instead of error [15:36:31] yes, that's what we want [15:36:44] although the current configuration is doing something [15:36:50] we need to talk with someone who understand how squid works and tell us what's wrong [15:37:02] mediawiki isn't so loaded as if it had to reparse it [15:37:07] people from ##squid likely send you to ##apache where they send you back [15:37:23] or if we could view WMF production config [15:37:26] we could diff them [15:37:27] we can't [15:37:32] that suck [15:37:38] *if* [15:37:41] it does [15:38:00] I don't really know what is so secret on it [15:38:03] it will be some silly option [15:38:08] the banned ip adresses should be banned on labs too [15:38:23] because if they cause troubles on prod they can cause troubles on labs [15:38:45] I don't think they store their nuclear launch codes in the squid config file :) [15:38:51] maybe yes [15:39:45] btw is there a web interface of white house to launch nuclear artilery when you know the code? 
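Back on the caching question above ("I am wondering if it actually cache anything"), the squid's own response headers give the answer; a sketch using the beta hostname that appears later in this log, with the page itself as a placeholder and assuming an anonymous request made through the squid:

    # fetch the same URL twice through the squid and compare the cache headers
    url=http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
    for i in 1 2; do
        curl -s -D - -o /dev/null "$url" | grep -Ei '^(x-cache|age|cache-control):'
    done

"X-Cache: HIT" (or a non-zero Age) on the second response means squid answered without touching Apache; a MISS every time matches the "apache wrote to log that it created the page" observation above.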
[15:39:52] I am wondering how the hacker could use it [15:40:43] like nuclear code is 5653462646$ now what :) [15:40:51] heh [15:40:54] it's a comedy [15:43:09] maybe you could sell it on ebay [15:43:25] nuclear codes for 10$ [15:46:55] wasn't the group-can't-write problem already fixed? [15:47:02] heh [15:47:08] it was, but don;t ask me how :) [15:47:16] is there a problem now? [15:47:19] by replacing the LDAP library [15:47:26] not really [15:47:32] platonides@deployment-dbdump:/usr/local/apache/common/live$ touch test [15:47:33] touch: cannot touch `test': Permission denied [15:47:38] weird [15:47:41] ll -d . [15:47:42] drwxrwxr-x 16 petrb depops 4096 2012-02-23 08:48 ./ [15:47:55] and id show that I belong to depops [15:47:56] ok, so problem is that you aren't in depops [15:48:06] hm [15:48:14] the permission is 775 [15:48:20] so group can write [15:48:23] it seems I need to change the primary group to that [15:48:26] isn't test existing there? [15:48:32] using newgrp [15:48:38] I didn't [15:48:40] and it works to me [15:48:51] unix permissions sucks [15:49:03] I don't know why it can't use role based perm [15:49:12] that's something I like on ntfs [15:49:33] so that we could define permissions for more groups etc [15:49:45] linux systems support acls [15:49:52] I never understand how it work [15:50:03] I tried to play with it but it didn't work as I wanted [15:50:22] what? acls in linux ? [15:50:25] yes [15:50:43] I wasn't able to define multiple roles [15:51:00] what do you mean by roles? [15:51:05] I assume those would be groups [15:51:08] you create 5 groups [15:51:13] 1 group can read write [15:51:18] second group can only read [15:51:22] everyone can't do anything [15:51:29] third can read and execute... etc [15:51:33] yes, it's supported [15:51:50] that's what I wasn't able to make work [15:52:32] I'd need to check the docs for the exact syntax to add it, but should be a no-brainer [15:52:41] btw, see http://en.wikipedia.beta.wmflabs.org/w/squid-test.php [15:52:59] when you reload, it shouldn't increment the date [15:53:07] it does [15:53:28] Cache-Control: s-maxage=18000, must-revalidate, max-age=0 [15:53:30] because we didn't find the squid switch we want [15:53:50] mmh... [15:54:06] X-Cache: MISS from i-000000dc.pmtpa.wmflabs [15:54:22] that page can serve to detect when we find out [15:54:37] I think that squid tries to validate if it's fresh [15:54:44] yes, it does [15:54:47] hm [15:54:49] we want to skip that check [15:55:00] I think if your page was static it would be cached [15:55:03] we could probably get it by removing the must-validate [15:55:07] I copied the header from MediaWiki [15:55:09] $response->header( 'Cache-Control: s-maxage='.$this->mSquidMaxage.', must-revalidate, max-age=0' ); [15:55:21] ok [15:55:25] but this is from prod [15:55:55] note that the squids shall then rewrite the s-maxage to s-maxage=0 [15:56:05] for caches in front of the squids [15:56:16] but I think I forced squid to cache pagese [15:56:18] pages [15:56:23] no matter if server wants it or not [15:56:25] hehehe: refresh_pattern [15:56:30] New option 'ignore-must-revalidate'. [15:57:11] where's squid config? [15:57:22] ;'/etc/squid/s* [15:57:44] /etc/squid [15:57:50] is it on all servers? 
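Returning to the ACL exchange a little further up: the "exact syntax" left open there is setfacl/getfacl from the acl package, and it handles the multi-group layout described (one group read-write, another read-only, a third read-execute, nothing for others), provided the filesystem is mounted with the acl option. A sketch with placeholder group and file names:

    setfacl -m g:editors:rw-,g:readers:r--,g:runners:r-x,o::--- somefile
    getfacl somefile    # show the resulting entries

For a shared tree like /usr/local/apache/common/live you would add -R and a matching default ACL (setfacl -R -d -m ...) so files created later inherit the same entries.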
[15:57:55] there is only one [15:57:58] deployment-squid [15:58:10] so I need to edit from that machine [15:58:12] we have 1 squid and 6 apaches [15:58:16] yes [15:58:36] we will probably need to have more squid servers if we fix it [15:58:44] but that's easy to set up [15:58:50] I would make proxy server on front [15:58:58] that would redirect trafic to random squid's [15:59:59] Platonides: can you insert echo for server name? [16:00:11] so we can see where it was created :) [16:00:35] I'd like to know if all servers are ok [16:00:47] sure [16:02:30] !log deployment-dbdump Installed joe [16:02:31] deployment-dbdump is not a valid project. [16:02:48] it's prep [16:02:59] !log prep Installed joe on deployment-dbdump [16:03:00] prep is not a valid project. [16:03:06] !log deployment-prep platonides needs to check the project name [16:03:07] Logged the message, Master [16:03:20] I never remember :P [16:03:28] !log deployment-prep Installed joe on deployment-dbdump [16:03:29] heh [16:03:29] Logged the message, Master [16:03:37] @search deployment [16:03:37] No results found! :| [16:04:42] !deployment-prep is project to test mediawiki before putting it to prod at beta.wmflabs.org [16:04:42] Key was added! [16:04:46] @search deployment [16:04:46] No results found! :| [16:04:48] damn [16:04:54] !deployment-prep del [16:04:54] Successfully removed deployment-prep [16:05:03] !deployment-prep is deployment-prep is a project to test mediawiki before putting it to prod at beta.wmflabs.org [16:05:03] Key was added! [16:05:07] @search deployment [16:05:07] Results (found 1): deployment-prep, [16:07:37] @search .. [16:07:37] No results found! :| [16:07:43] @search .** [16:07:43] No results found! :| [16:07:46] @search .*.* [16:07:46] No results found! :| [16:07:50] ah [16:07:56] @regsearch . [16:07:56] Results (found 100): morebots, git, bang, nagios, bot, labs-home-wm, labs-nagios-wm, labs-morebots, gerrit-wm, wiki, labs, extension, wm-bot, putty, gerrit, change, revision, monitor, alert, password, unicorn, bz, os-change, instancelist, instance-json, leslie's-reset, damianz's-reset, amend, credentials, queue, sal, info, security, logging, ask, sudo, access, $realm, keys, $site, bug, pageant, blueprint-dns, bots, stucked, rt, pxe, ghsh, group, pathconflict, terminology, etherpad, epad, nova-resource, pastebin, newgrp, osm-bug, bastion, ryanland, afk, test, initial-login, manage-projects, rights, new-labsuser, cs, puppet, new-ldapuser, projects, quilt, labs-project, openstack-manager, wikitech, load, load-all, wl, domain, docs, instance, address, ssh, documentation, help, account, start, link, socks-proxy, requests, magic, accountreq, gitweb, labsconf, console, ping, hexmode, Ryan, resource, account-questions, hyperon, deployment-prep, [16:08:36] yay [16:09:15] !deployment-prep del [16:09:15] Successfully removed deployment-prep [16:09:21] !deployment-prep is deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod [16:09:21] Key was added! [16:15:05] wow so squid use random server in every load of page [16:15:16] if we weren't using memcached we couldn't have sessions [16:15:28] mediawiki store sessions on fs [16:15:35] yay [16:15:36] cool [16:19:35] why do you think mediawiki has support for storing sessions in memcached? 
;) [16:20:05] we are using squid 2.7.9-7wm2 [16:20:12] ignore-must-revalidate was added in 3.1 [16:21:18] I know [16:21:30] I was successfull in configuring 3x squid once [16:21:46] it's much better I don't know why we use 2x [16:23:02] maybe due to wm patches? [16:23:21] ok what's a problem to patch the new version [16:23:28] if the patches were public I could do it [16:23:28] I think they weren't trivial to port [16:23:34] it's c++ or not? [16:24:20] it's C [16:24:26] they are public [16:24:32] I looked at it once [16:24:37] like the explanations of them? [16:24:57] I understand that source code is public but what is purpose of patches [16:25:17] I think the more important one was one added by Tim [16:25:18] it may need to be done using other way in newer [16:25:39] to tell what you are looking to, when adding a Vary [16:25:52] hm [16:26:00] so for instance, we don't care about most cookies [16:26:04] but vary on a few of them [16:26:12] right but how we solve this :) [16:26:15] a gadget adding a cookie doesn't break our cache [16:26:27] I would like to see prod config heh [16:27:36] see, the changes are at http://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/squid/debian/ [16:27:50] some patches are easy [16:27:56] such as the one adding the Wikimedia language [16:28:04] others come from debian [16:28:09] and there are wmf ones [16:28:44] ok [16:35:12] !log deployment-prep Installed dpkg-dev on deployment-dbdump [16:35:13] Logged the message, Master [16:35:28] * Platonides is doing apt-get source squid=2.7.9-7wm2 [16:45:16] the caching code is really really old [16:45:37] I wonder if they are doing revalidations with several peers at wmf [17:24:58] petan|wk: you about? [17:27:50] petan, I think I fixed the squid problem [17:31:27] I added the following line to /etc/squid/squid.conf [17:31:28] refresh_pattern . 0 20% 4320 ignore-reload [18:14:19] Where's labs/gerrit password reset button? [18:16:12] 03/01/2012 - 18:16:12 - Updating keys for vvv [18:17:46] hi vvv [18:17:52] hello sumanah [18:17:54] special:....something, just a sec [18:18:00] sumanah: already found [18:18:02] Thank you [18:18:06] vvv: https://labsconsole.wikimedia.org/wiki/Special:PasswordReset [18:18:07] ok [18:18:28] sumanah: it's just weird there's no link from the login form [18:19:29] vvv: please do file a bug about that; I would myself but am in the middle of a couple other things [18:55:12] petan|wk: wait what i am? [18:59:16] I don't suppose anyone is a etherpad whizz? [19:15:36] !project deployment-prep | ssmollett [19:15:37] ssmollett: https://labsconsole.wikimedia.org/wiki/Nova_Resource:deployment-prep [19:18:51] Damianz: what about etherpad? [19:22:18] Dm, I found the answer... someone decided hard coding the prefix was a good idea. [19:25:48] * Damianz returns to smacking his head against a wall [19:39:04] Seriosuly who doesn't allow you to change the prefix for something [19:39:09] * Damianz rants and goes and makes a subdomain [19:39:22] who? [19:40:01] etherpad has no way to change the prefix it's on and trying to updating all the server code just ended up with me having borked javascript includes. [19:40:16] is this etherpad-lite? [19:40:42] yeah [19:40:55] Apparently the nice looking one is now 'old' [19:51:27] etherpad-lite looks nicer than the original :) [20:00:29] I prefer the sidebar tbh, it's all a bit weird... also seems rather slow but that could just be nodejs playing funny buggers. 
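Back on the squid thread: after adding a line such as the refresh_pattern ... ignore-reload mentioned above, squid has to re-read its configuration before anything changes; a sketch, assuming the stock squid 2.7 packaging on the deployment-squid instance:

    sudo squid -k parse          # syntax-check /etc/squid/squid.conf first
    sudo squid -k reconfigure    # apply the change without restarting or losing the cache

The /w/squid-test.php page mentioned earlier is then the quickest check: on reload the date it prints should stop changing and X-Cache should start reporting HIT instead of MISS.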
[20:01:01] the data model is kind of silly [20:01:09] it's one table with two columns [20:01:24] Yeah I noticed [20:56:57] !initial-login [20:56:57] https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in [21:24:38] PROBLEM Current Users is now: CRITICAL on storm1 storm1 output: Connection refused by host [21:25:18] PROBLEM Disk Space is now: CRITICAL on storm1 storm1 output: Connection refused by host [21:25:25] !account-questions | Thehelpfulone [21:25:25] Thehelpfulone: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your preferred email address. 3. Your SVN account name, or your preferred shell account name, if you do not have SVN access. [21:26:08] PROBLEM Free ram is now: CRITICAL on storm1 storm1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:27:29] PROBLEM Total Processes is now: CRITICAL on storm1 storm1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:28:18] PROBLEM dpkg-check is now: CRITICAL on storm1 storm1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:28:58] PROBLEM Current Load is now: CRITICAL on storm1 storm1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:03:58] PROBLEM Current Load is now: CRITICAL on storm2 storm2 output: Connection refused by host [22:04:38] PROBLEM Current Users is now: CRITICAL on storm2 storm2 output: Connection refused by host [22:05:18] PROBLEM Disk Space is now: CRITICAL on storm2 storm2 output: Connection refused by host [22:06:08] PROBLEM Free ram is now: CRITICAL on storm2 storm2 output: Connection refused by host [22:07:28] PROBLEM Total Processes is now: CRITICAL on storm2 storm2 output: Connection refused by host [22:08:18] PROBLEM dpkg-check is now: CRITICAL on storm2 storm2 output: Connection refused by host [22:10:41] !initial-login | Thehelpfulone [22:10:41] Thehelpfulone: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in [22:10:46] you can ignore the part about password reset [22:10:52] since I sent you an account by email [22:18:54] Ryan_Lane: It appears my newly created instance had some problems. https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=reportcard&instanceid=i-00000178 [22:19:15] Ryan_Lane: I plan to just nuke it and make a new one, but I figure you might want to look. [22:19:21] which project is this in? [22:19:23] I'm not in it ;) [22:19:45] Aren't you an admin1? [22:19:54] it's in reportcard [22:20:09] (it shouldn't really be, but it's not like I can create projects.) [22:20:10] I need to modify OSM to always allow cloudadmins [22:20:26] (and there is no general "analytics" project) [22:20:41] Ryan_Lane: thanks [22:20:54] dschoon: yeah, this is a known issue [22:20:59] it occasionally happens [22:21:06] the only fix right now is to delete/recreate [22:21:10] it's annoying, yes [22:24:00] no big. [22:34:05] PROBLEM host: storm2 is DOWN address: storm2 CRITICAL - Host Unreachable (storm2) [22:34:14] yep. [22:34:17] that's true. [22:34:22] https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=reportcard&instanceid=i-00000179 [22:34:28] that's the new one. [22:34:38] different problem, i guess? [22:35:29] I guess Ryan_Lane can look if he is interested. Maybe "storm2" is a bad name. I'll stick to odd numbers. [22:36:00] what size instance are you trying? 
[22:36:08] occasionally this will happen a few times [22:36:17] there's something weird with the repo, I think [22:36:18] m2.med [22:36:30] setting up a two-machine storm cluster [22:36:32] I don't know why it happens occasionally [22:36:45] i want to play with it before the talks tomorrow, as i am a fan atm. [22:37:12] the name shouldn't matter [22:37:26] I've seen this happen more with larger instances [22:37:29] i know. roman numerals. [22:37:36] i'll switch to roman numbering. [22:37:37] it could be that the instance swaps and has some issue with kvm [22:37:46] all of the hosts are swapping now [22:38:55] RECOVERY host: storm2 is UP address: storm2 PING OK - Packet loss = 0%, RTA = 0.64 ms [22:39:15] you lie, labs-nagios-wm [22:39:40] hm. it shouldn't lie about that... [22:39:58] though I guess it can come up and simply not work. heh [22:40:04] well, yes. [22:40:08] that is what i really mean. [22:40:12] * Ryan_Lane nods [22:40:34] * Ryan_Lane is trying to add a new node to the cluster, which should make this hopefully happen less [22:43:57] PROBLEM Current Load is now: CRITICAL on storm11 storm11 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:44:37] PROBLEM Current Users is now: CRITICAL on storm11 storm11 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:44:56] so i just logged into storm2 [22:45:17] PROBLEM Disk Space is now: CRITICAL on storm11 storm11 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:45:55] dsc ~ ❥ ssh storm2.pmtpa.wmflabs hostname [22:45:55] i-0000017a [22:46:07] PROBLEM Free ram is now: CRITICAL on storm11 storm11 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:47:27] PROBLEM Total Processes is now: CRITICAL on storm11 storm11 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:48:05] Ryan_Lane: So apparently storm11 and storm2 have confused each other. [22:48:16] oh? [22:48:17] PROBLEM dpkg-check is now: CRITICAL on storm11 storm11 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:48:22] see above. [22:48:23] oh. I know [22:48:26] dns cache [22:48:33] gimme a sec and I'll clear it for you [22:48:36] storm11 = i-0000017a [22:48:45] storm2 = i-00000179 [22:48:58] i guess an "a" kinda looks like a "9"? [22:49:01] maybe unix is confused. [22:50:28] nah. you deleted and recreated [22:50:39] so it got cached in dns [22:51:27] try now [22:52:48] Ryan_Lane: nope. i still see them both as 17a [22:52:57] hm [22:53:00] lemme check ldap [22:53:10] I also flushed my system dns. [22:53:40] they are correct in DNS [22:53:52] why not use the instance name, rather than the ID? [22:54:16] hm? [22:54:22] hm. weird [22:54:24] I see what you mean [22:54:33] storm2 is showing as 17a [22:54:34] wtf [22:54:45] I told you. [22:54:54] ZOMBIE INSTANCE HAS EATEN ITS BRAINS [22:55:00] hahaha [22:55:01] ZOMBIE INSTANCE WILL TAKE OVER CLUSTER [22:55:02] NOOO [22:55:04] I tried to wipe the wrong record [22:55:18] fixed [22:55:21] this is just a comedy of errors [22:55:32] btw. how do I dig labs machines? [22:55:35] what is the DNS server? 
[22:55:59] clearly going through bastion somehow makes *.wmflabs resolve [22:56:32] virt0 is the dns server [22:56:36] virt0.wikimedia.org [22:58:31] ty [23:01:07] RECOVERY Free ram is now: OK on storm2 storm2 output: OK: 94% free memory [23:01:57] RECOVERY Current Load is now: OK on storm2 storm2 output: OK - load average: 0.14, 0.10, 0.09 [23:02:27] RECOVERY Total Processes is now: OK on storm2 storm2 output: PROCS OK: 93 processes [23:02:37] RECOVERY Current Users is now: OK on storm2 storm2 output: USERS OK - 0 users currently logged in [23:03:17] RECOVERY dpkg-check is now: OK on storm2 storm2 output: All packages OK [23:05:17] RECOVERY Disk Space is now: OK on storm2 storm2 output: DISK OK [23:08:38] So uh. [23:08:51] Ryan_Lane, shall I delete storm2? [23:09:17] did it fail to build? [23:10:17] it's working. [23:10:20] why would you kill it? :) [23:10:40] I'm logged into it and everything [23:12:44] dschoon: it's working [23:12:48] no need to delete it [23:13:07] I wiped the DNS cache properly and it resolves properly now [23:14:31] Ryan_Lane: cool, thanks [23:14:41] yw [23:14:48] I wish I could find that bug [23:15:55] Ryan_Lane: I'll try again. [23:17:03] 03/01/2012 - 23:17:03 - Creating a home directory for abartov at /export/home/etherpad/abartov [23:17:11] 03/01/2012 - 23:17:10 - Creating a home directory for abartov at /export/home/bastion/abartov [23:17:56] Ryan_Lane: Do you know how to silence the "If you are having access problems" message from (I assume) bastion? [23:18:03] 03/01/2012 - 23:18:03 - Updating keys for abartov [23:18:12] 03/01/2012 - 23:18:11 - Updating keys for abartov [23:18:38] I don't think you can [23:19:00] why's that? [23:19:25] wait, maybe you can [23:19:32] dschoon: That's an SSH banner, you can't shut it up [23:19:55] it may be possible to ignore it on the client side [23:20:07] Ryan_Lane: Ah yes, you are correct [23:20:15] dschoon: You using PuTTY on Windows? [23:20:26] No. [23:20:29] Urm [23:20:44] On PuTTY there's an option to display it or not [23:20:52] methecooldude: I care because if I execute commands remotely like: [23:21:06] dschoon: -oLogLevel=error [23:21:48] ssh storm@storm1.pmtpa.wmflabs nimbus $argv [23:21:54] Or: ssh -q user@server [23:22:03] yeah -q works too [23:22:08] oh, cool [23:22:18] you can set loglevel in your config [23:22:26] or -q (that's -oLogLevel=quiet) [23:22:33] Can `LogLevel=error` go in .ssh/config? [23:22:37] yep [23:22:39] nice. [23:25:59] sticking it in the Ryan_Lane methecooldude `Host bastion1.eqiad.wmflabs` def silences them all [23:26:16] * Ryan_Lane nods [23:26:31] * methecooldude nods also [23:41:51] so Ryan_Lane, how do I bring up a box to work on, and specifically, how do I get that Etherpad dump on that box? [23:42:02] we'll have to provide the dump [23:42:12] to create instances, see.... [23:42:14] !instances [23:42:14] https://labsconsole.wikimedia.org/wiki/Help:Instances [23:42:18] !security [23:42:18] https://labsconsole.wikimedia.org/wiki/Security_Groups [23:42:27] !security del [23:42:27] Successfully removed security [23:42:41] !security is https://labsconsole.wikimedia.org/wiki/Help:Security_Groups [23:42:41] Key was added! [23:55:54] Ryan_Lane: cool, thanks. 
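On the banner-silencing thread above: the same effect as -q or -oLogLevel=error can be made permanent in the client config, so it also applies to remote commands like the ssh ... nimbus call quoted there; a minimal ~/.ssh/config stanza, with the host patterns only as examples:

    # ~/.ssh/config
    Host bastion1.eqiad.wmflabs *.pmtpa.wmflabs
        LogLevel ERROR

ERROR (or QUIET) hides the banner and the usual connection chatter while still letting real failures through.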
[23:56:36] yw [23:56:46] there are occasional problems with the build process [23:57:04] likely due to the hosts currently being slightly overloaded [23:57:19] you can see if a build is successful or not by looking in the console log [23:57:37] if you see a ruby or puppet package fail to install, then the instance build failed [23:57:44] if puppet runs then it is successful [23:59:24] abartov: most things are documented automatically via recent changes, but actions you do manually on instances are not [23:59:30] you should log them via !log [23:59:51] !logging [23:59:51] To log a message, use the following format: !log
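To make the console-log check just described mechanical, the console output shown on labsconsole can be saved and grepped for that failure signature; a rough sketch, with the file name as a placeholder:

    # console.log = the instance's console output copied from labsconsole
    grep -iE 'ruby|puppet' console.log | grep -iE 'fail|error|unable'

No matches, plus a visible puppet run near the end of the output, is the "build succeeded" case described above.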