[00:26:04] PROBLEM Free ram is now: WARNING on mobile-enwp mobile-enwp output: Warning: 8% free memory
[00:31:04] RECOVERY Free ram is now: OK on mobile-enwp mobile-enwp output: OK: 25% free memory
[00:49:04] PROBLEM Free ram is now: WARNING on mobile-enwp mobile-enwp output: Warning: 19% free memory
[02:47:46] PROBLEM Current Load is now: WARNING on bots-cb bots-cb output: WARNING - load average: 7.58, 24.45, 14.89
[03:03:06] RECOVERY Total Processes is now: OK on php5builds php5builds output: PROCS OK: 95 processes
[03:03:56] RECOVERY dpkg-check is now: OK on php5builds php5builds output: All packages OK
[03:04:36] RECOVERY Current Load is now: OK on php5builds php5builds output: OK - load average: 0.05, 0.06, 0.02
[03:05:16] RECOVERY Current Users is now: OK on php5builds php5builds output: USERS OK - 0 users currently logged in
[03:05:56] RECOVERY Disk Space is now: OK on php5builds php5builds output: DISK OK
[03:06:46] RECOVERY Free ram is now: OK on php5builds php5builds output: OK: 92% free memory
[03:07:46] RECOVERY Current Load is now: OK on bots-cb bots-cb output: OK - load average: 0.66, 1.07, 4.50
[04:21:17] !account-questions | JRWR
[04:21:17] JRWR: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your preferred email address. 3. Your SVN account name, or your preferred shell account name, if you do not have SVN access.
[04:21:32] ok
[04:22:57] 1) ForrestFuqua 2) jrwr00@gmail.com 3) ForrestFuqua
[04:23:07] PROBLEM Current Load is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:23:07] PROBLEM dpkg-check is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:23:07] PROBLEM Free ram is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:23:32] OH NOES
[04:23:37] * JRWR grabs the firehose
[04:23:38] ?
[04:23:40] heh
[04:24:03] well, all of the nagios alerts in here are for labs instances
[04:24:06] which aren't production
[04:24:08] I know
[04:24:25] it's all community maintained :)
[04:24:56] PROBLEM Current Users is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:56] PROBLEM Disk Space is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:56] PROBLEM Total Processes is now: CRITICAL on deployment-web deployment-web output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:25:09] though I do wonder why that instance is dead
[04:25:14] anyways...
[04:25:44] I'm going to make your shell account name: forrestfuqua
[04:25:47] all lowercased
[04:25:50] ok
[04:25:58] want a privatekey for anything?
[04:26:05] I should say public key
[04:26:09] nope
[04:26:17] ever use ec2?
[04:26:33] Some... I never liked the setup, but I have used them
[04:27:07] ok.
just sent a password to your email
[04:27:42] please see the terms of use: http://www.mediawiki.org/wiki/Wikimedia_Labs/Terms_of_use
[04:27:56] PROBLEM Free ram is now: WARNING on deployment-web deployment-web output: Warning: 8% free memory
[04:28:06] RECOVERY dpkg-check is now: OK on deployment-web deployment-web output: All packages OK
[04:28:33] I've added you into the nginx project
[04:28:46] Great
[04:28:47] labs-home-wm: poke poke
[04:28:54] oh, right
[04:29:00] !initial-login | JRWR
[04:29:01] JRWR: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in
[04:29:07] ignore the password portion of that
[04:29:19] you'll need to add your key to labs and gerrit, though
[04:29:46] RECOVERY Disk Space is now: OK on deployment-web deployment-web output: DISK OK
[04:29:46] RECOVERY Current Users is now: OK on deployment-web deployment-web output: USERS OK - 0 users currently logged in
[04:29:46] RECOVERY Total Processes is now: OK on deployment-web deployment-web output: PROCS OK: 93 processes
[04:29:51] every project has storage on instances within that project at /data/project
[04:29:58] you can create instances in the nginx project
[04:30:00] see the docs
[04:30:02] !instances
[04:30:02] https://labsconsole.wikimedia.org/wiki/Help:Instances
[04:30:06] !security
[04:30:06] https://labsconsole.wikimedia.org/wiki/Help:Security_Groups
[04:30:28] there are other project members in the nginx project
[04:30:47] make sure not to delete or change their instances, unless you discuss it with them
[04:31:04] on your projects, you should log your changes, and document what you are doing
[04:31:08] !logging
[04:31:08] To log a message, use the following format: !log <project> <message>
[04:31:25] you can document your changes on the project page
[04:31:28] !project nginx
[04:31:28] https://labsconsole.wikimedia.org/wiki/Nova_Resource:nginx
[04:32:56] PROBLEM Current Load is now: WARNING on deployment-web deployment-web output: WARNING - load average: 0.81, 11.87, 12.36
[04:32:56] RECOVERY Free ram is now: OK on deployment-web deployment-web output: OK: 70% free memory
[04:33:53] heh
[04:34:06] !log deployment-prep rebooting deployment-web, it OOM'd
[04:34:07] Logged the message, Master
[04:34:12] wait
[04:34:21] !log nginx Starting NGINX PROJECT OF DOOM
[04:34:22] !log deployment-prep make that deployment-web5
[04:34:23] Logged the message, Master
[04:34:23] Logged the message, Master
[04:35:21] !log deployment-prep also deployment-web
[04:35:22] Logged the message, Master
[04:36:03] !log deployment-prep also deployment-web3
[04:36:04] Logged the message, Master
[04:37:17] 03/11/2012 - 04:37:16 - Creating a home directory for forrestfuqua at /export/home/nginx/forrestfuqua
[04:37:56] RECOVERY Current Load is now: OK on deployment-web deployment-web output: OK - load average: 0.20, 0.14, 0.05
[04:38:12] 03/11/2012 - 04:38:12 - Creating a home directory for forrestfuqua at /export/home/bastion/forrestfuqua
[04:38:17] 03/11/2012 - 04:38:17 - Updating keys for forrestfuqua
[04:39:12] 03/11/2012 - 04:39:12 - Updating keys for forrestfuqua
[04:41:14] 03/11/2012 - 04:41:14 - Updating keys for forrestfuqua
[04:41:19] 03/11/2012 - 04:41:19 - Updating keys for forrestfuqua
[04:42:19] odd...
[04:44:54] I will be exploring all night to get started! This is going to be fun
[04:49:18] Ryan_Lane: So I can just make a new instance and get started?
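[Editor's note: the !logging help above shows the convention the bots expect — project name first, then free-text. A hypothetical entry (the message content is invented for illustration; real entries appear throughout this log) would look like:

    !log nginx Created instance nginx-test1 for proxy benchmarking
    Logged the message, Master

The morebots reply "Logged the message, Master" confirms the entry was recorded on the project's server-admin log page.]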
[04:58:13] 03/11/2012 - 04:58:12 - Updating keys for forrestfuqua
[04:58:18] 03/11/2012 - 04:58:18 - Updating keys for forrestfuqua
[05:01:17] !log nginx started Project Doom, Made Wrong Instance, Reconfiguring
[05:01:18] Logged the message, Master
[05:01:42] !log nginx Project Doom is to compare nginx with varnish and other reverse proxy type caches
[05:01:43] Logged the message, Master
[05:03:49] Hello Krenair
[05:03:53] Hello Krinkle*
[05:08:08] PROBLEM host: nginx-ffuqua-doom1 is DOWN address: nginx-ffuqua-doom1 CRITICAL - Host Unreachable (nginx-ffuqua-doom1)
[05:09:38] well duh... it's been deleted
[05:13:56] PROBLEM Current Load is now: CRITICAL on nginx-ffuqua-doom1-1 nginx-ffuqua-doom1-1 output: Connection refused by host
[05:14:36] PROBLEM Current Users is now: CRITICAL on nginx-ffuqua-doom1-1 nginx-ffuqua-doom1-1 output: Connection refused by host
[05:15:16] PROBLEM Disk Space is now: CRITICAL on nginx-ffuqua-doom1-1 nginx-ffuqua-doom1-1 output: Connection refused by host
[05:16:06] PROBLEM Free ram is now: CRITICAL on nginx-ffuqua-doom1-1 nginx-ffuqua-doom1-1 output: Connection refused by host
[05:17:26] PROBLEM Total Processes is now: CRITICAL on nginx-ffuqua-doom1-1 nginx-ffuqua-doom1-1 output: Connection refused by host
[05:18:16] PROBLEM dpkg-check is now: CRITICAL on nginx-ffuqua-doom1-1 nginx-ffuqua-doom1-1 output: Connection refused by host
[05:18:48] well...
[05:18:51] that build failed
[05:22:56] !log nginx nginx-ffuqua-doom1-1 server build failure, rebuilding
[05:22:58] Logged the message, Master
[05:43:27] man
[05:43:37] It's still having issues making an instance
[05:43:56] PROBLEM Current Load is now: CRITICAL on nginx-ffuqua-doom1-2 nginx-ffuqua-doom1-2 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:44:36] PROBLEM Current Users is now: CRITICAL on nginx-ffuqua-doom1-2 nginx-ffuqua-doom1-2 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:45:16] PROBLEM Disk Space is now: CRITICAL on nginx-ffuqua-doom1-2 nginx-ffuqua-doom1-2 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:46:06] PROBLEM Free ram is now: CRITICAL on nginx-ffuqua-doom1-2 nginx-ffuqua-doom1-2 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:47:26] PROBLEM Total Processes is now: CRITICAL on nginx-ffuqua-doom1-2 nginx-ffuqua-doom1-2 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:48:16] PROBLEM dpkg-check is now: CRITICAL on nginx-ffuqua-doom1-2 nginx-ffuqua-doom1-2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[05:50:43] !log nginx rebuilding nginx-doom due to more install bugs...
[05:50:44] Logged the message, Master
[06:02:59] !log nginx Current Plans: make a full mediawiki install from trunk and install all plugins that I can get running on the install, make it a full setup, php-fpm, apache, nginx, varnish, across several servers if needed and test to see if nginx or varnish+apache works better
[06:03:01] Logged the message, Master
[06:04:00] PROBLEM Current Load is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:04:40] PROBLEM Current Users is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:05:20] PROBLEM Disk Space is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: CHECK_NRPE: Error - Could not complete SSL handshake.
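[Editor's note: the "Connection refused by host" and "Could not complete SSL handshake" lines above are NRPE transport failures — the monitoring host cannot reach or negotiate with the agent on the half-built instance — not actual threshold breaches. One way to reproduce a single check by hand from the Nagios host, assuming the stock plugin path on Ubuntu:

    /usr/lib/nagios/plugins/check_nrpe -H nginx-ffuqua-doom1-2 -c check_load
    # a healthy instance answers with something like:
    # OK - load average: 0.05, 0.06, 0.02

If that command itself refuses the connection or fails the SSL handshake, the instance's NRPE daemon never came up, which matches the failed builds being discussed.]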
[06:06:10] PROBLEM Free ram is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:07:30] PROBLEM Total Processes is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:08:20] PROBLEM dpkg-check is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:14:26] !log nginx server is stable, now compiling php 5.4.0
[06:14:27] Logged the message, Master
[06:20:50] PROBLEM Current Load is now: WARNING on bots-cb bots-cb output: WARNING - load average: 9.30, 15.55, 7.84
[06:21:10] RECOVERY Free ram is now: OK on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: OK: 93% free memory
[06:22:30] RECOVERY Total Processes is now: OK on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: PROCS OK: 99 processes
[06:23:20] RECOVERY dpkg-check is now: OK on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: All packages OK
[06:24:00] RECOVERY Current Load is now: OK on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: OK - load average: 1.24, 0.77, 0.40
[06:24:40] RECOVERY Current Users is now: OK on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: USERS OK - 1 users currently logged in
[06:30:22] PROBLEM dpkg-check is now: CRITICAL on bots-cb bots-cb output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:30:22] RECOVERY Disk Space is now: OK on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: DISK OK
[06:31:12] PROBLEM dpkg-check is now: CRITICAL on nginx-ffuqua-doom1-3 nginx-ffuqua-doom1-3 output: DPKG CRITICAL dpkg reports broken packages
[06:31:32] PROBLEM Current Load is now: CRITICAL on bots-cb bots-cb output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:36:23] PROBLEM Current Load is now: WARNING on bots-cb bots-cb output: WARNING - load average: 11.74, 24.46, 16.35
[06:46:21] !log nginx running first test compile of php5.4.0 with the following options: --enable-fpm --with-fpm-user=www-data --with-fpm-group=www-data --enable-bcmath --enable-exif --with-gd --enable-ftp --with-mhash --with-mcrypt --with-mysql --with-pdo-mysql --enable-embedded-mysqli --with-curl --with-pcre-regex
[06:46:22] Logged the message, Master
[06:52:32] RECOVERY dpkg-check is now: OK on bots-cb bots-cb output: All packages OK
[06:53:21] PROBLEM Free ram is now: CRITICAL on mobile-enwp mobile-enwp output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:03] PROBLEM Disk Space is now: CRITICAL on nova-daas-1 nova-daas-1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:03] PROBLEM Current Users is now: CRITICAL on nova-daas-1 nova-daas-1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:03] PROBLEM Free ram is now: CRITICAL on nova-daas-1 nova-daas-1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:03] PROBLEM dpkg-check is now: CRITICAL on nova-daas-1 nova-daas-1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:23] PROBLEM Free ram is now: CRITICAL on orgcharts-dev orgcharts-dev output: CHECK_NRPE: Socket timeout after 10 seconds.
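[Editor's note: as a shell session, the PHP 5.4.0 build logged above would look roughly like this. The configure flags are the ones from the !log entry; the mirror URL and make parallelism are assumptions, not taken from the log:

    wget http://www.php.net/distributions/php-5.4.0.tar.bz2   # mirror URL assumed
    tar xjf php-5.4.0.tar.bz2
    cd php-5.4.0
    ./configure --enable-fpm --with-fpm-user=www-data --with-fpm-group=www-data \
        --enable-bcmath --enable-exif --with-gd --enable-ftp --with-mhash --with-mcrypt \
        --with-mysql --with-pdo-mysql --enable-embedded-mysqli --with-curl --with-pcre-regex
    make -j2 && sudo make install

--enable-fpm builds the FastCGI process manager that the nginx comparison later in this log relies on.]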
[06:55:58] PROBLEM Current Load is now: WARNING on mobile-enwp mobile-enwp output: WARNING - load average: 14.14, 12.71, 8.27
[06:55:58] PROBLEM Current Load is now: WARNING on bots-sql3 bots-sql3 output: WARNING - load average: 14.73, 13.15, 9.04
[06:57:14] RECOVERY Disk Space is now: OK on aggregator1 aggregator1 output: DISK OK
[06:58:11] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 5.20, 6.14, 4.24
[06:58:11] PROBLEM Free ram is now: WARNING on mobile-enwp mobile-enwp output: Warning: 9% free memory
[07:01:27] PROBLEM Free ram is now: WARNING on bots-cb bots-cb output: Warning: 12% free memory
[07:03:06] PROBLEM Free ram is now: CRITICAL on mobile-enwp mobile-enwp output: Critical: 2% free memory
[07:03:30] well this isn't good
[07:03:50] PROBLEM Disk Space is now: CRITICAL on mobile-enwp mobile-enwp output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:50] PROBLEM Current Load is now: CRITICAL on nova-daas-1 nova-daas-1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:50] PROBLEM dpkg-check is now: CRITICAL on mobile-enwp mobile-enwp output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:50] PROBLEM SSH is now: CRITICAL on mobile-enwp mobile-enwp output: CRITICAL - Socket timeout after 10 seconds
[07:03:50] PROBLEM Total Processes is now: CRITICAL on nova-daas-1 nova-daas-1 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:04:47] RECOVERY Disk Space is now: OK on nova-daas-1 nova-daas-1 output: DISK OK
[07:04:48] RECOVERY Free ram is now: OK on nova-daas-1 nova-daas-1 output: OK: 71% free memory
[07:05:03] RECOVERY Current Users is now: OK on nova-daas-1 nova-daas-1 output: USERS OK - 0 users currently logged in
[07:05:03] RECOVERY dpkg-check is now: OK on nova-daas-1 nova-daas-1 output: All packages OK
[07:05:03] RECOVERY Free ram is now: OK on orgcharts-dev orgcharts-dev output: OK: 87% free memory
[07:06:45] PROBLEM Current Load is now: CRITICAL on mobile-enwp mobile-enwp output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:05] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.72, 2.82, 3.64
[07:08:35] RECOVERY Current Load is now: OK on nova-daas-1 nova-daas-1 output: OK - load average: 0.26, 3.00, 3.57
[07:08:35] RECOVERY Total Processes is now: OK on nova-daas-1 nova-daas-1 output: PROCS OK: 98 processes
[07:11:17] PROBLEM Current Load is now: WARNING on mobile-enwp mobile-enwp output: WARNING - load average: 7.53, 13.93, 12.95
[07:13:27] RECOVERY Disk Space is now: OK on mobile-enwp mobile-enwp output: DISK OK
[07:13:27] RECOVERY SSH is now: OK on mobile-enwp mobile-enwp output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:13:27] RECOVERY dpkg-check is now: OK on mobile-enwp mobile-enwp output: All packages OK
[07:15:55] !log nginx Don't do work on servers at 00:00 PST... Gotta love midnight backups
[07:15:56] Logged the message, Master
[07:16:17] PROBLEM Free ram is now: CRITICAL on bots-cb bots-cb output: Critical: 3% free memory
[07:18:07] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.11, 0.88, 2.27
[07:21:27] RECOVERY Free ram is now: OK on bots-cb bots-cb output: NRPE: Unable to read output
[07:46:17] RECOVERY Current Load is now: OK on bots-sql3 bots-sql3 output: OK - load average: 1.73, 2.29, 4.57
[08:38:39] Hello
[08:38:56] Do I need a private key to access an instance?
[08:41:38] hi Hydriz
[08:41:59] Hello
[08:44:21] Hydriz: Do I need a private key to access an instance?
[08:44:47] you need a private key on your own computer
[08:44:52] and upload the public key to labs
[08:46:37] but amazingly it doesn't seem to work with connecting from bastion
[08:47:31] Oh I just realised I am a global sysop :)
[08:47:50] now to make vandals cry
[08:48:02] And where can I type the key? I can login to bastion.
[08:48:11] but not to the instance
[08:48:24] yeah
[08:48:31] exactly what I am experiencing
[08:48:42] ?
[08:49:47] you can do: ssh <username>@bastion.wmflabs.org
[08:49:58] then in bastion, you try ssh <instance>.pmtpa.wmflabs
[08:50:13] and you fail with public key being denied
[08:50:15] or something
[08:50:47] but if you have set up the ssh proxycommand, then you don't get this issue
[08:53:55] and where is the proxycommand? on pc or wmflabs
[08:54:57] are you using ubuntu/linux?
[08:59:44] I'm a Linux user with Putty, Hydriz.
[08:59:50] ah
[08:59:56] Putty is going to be different
[09:00:09] which I am trying to figure out how to do with
[09:00:11] ah
[09:00:31] and that's the problem I encountered there
[09:00:38] How can I specify the private key dir in the ssh command?
[09:00:39] do use the original linux terminal
[09:00:53] and put the config as stated in Help:Access in your ~/.ssh/config file
[09:01:34] ok
[09:01:48] Hydriz: where can I write the private key dir?
[09:02:09] I don't understand you
[09:02:09] :(
[09:02:26] write private key dir?
[09:02:52] *enter
[09:03:16] hmm
[09:03:29] you mean in the Explorer program?
[09:03:32] or in terminal
[09:03:38] terminal
[09:04:05] Can I type "ssh -privatekeydir /media/stick..."?
[09:04:31] wait
[09:04:39] do you mean entering into the .ssh directory?
[09:05:36] no, I want to enter the path to the private key in the ssh command
[09:05:50] oh
[09:06:09] I think it's the -i option
[09:06:13] ah
[09:09:57] PROBLEM Disk Space is now: WARNING on aggregator1 aggregator1 output: DISK WARNING - free space: / 515 MB (5% inode=94%):
[09:12:10] The key is in a .ppk file.
[09:12:42] heh
[09:12:49] that's a Putty standard file
[09:12:53] argh
[09:13:03] so...
[09:13:16] do regenerate a keypair in the terminal itself
[09:13:32] ?
[09:13:39] A new keypair
[09:13:40] ?
[09:14:11] ssh-keygen
[09:14:24] then upload the public key to labsconsole
[09:14:34] But I want to use the existing key ;)
[09:14:35] then add the config stated in Help:Access in the config file
[09:14:39] zzz
[09:14:50] then... good luck :P
[09:15:00] Cos I am not taught how to use that
[09:15:12] unless you wait for Ryan to come awake
[09:15:12] k
[09:34:57] PROBLEM Disk Space is now: CRITICAL on aggregator1 aggregator1 output: DISK CRITICAL - free space: / 258 MB (2% inode=94%):
[11:08:49] !log incubator Deleted instance incubator-testing
[11:08:50] Logged the message, Master
[12:24:49] PROBLEM Disk Space is now: CRITICAL on incubator-web incubator-web output: Connection refused by host
[12:29:34] RECOVERY Disk Space is now: OK on incubator-web incubator-web output: DISK OK
[14:06:52] I have a problem: http://imgur.com/fDmfZ
[14:12:40] zzz
[14:12:48] That's what I said a few hours ago
[14:13:10] Do I need a private key on bastion?
[14:13:32] I don't know what the issue is
[14:14:29] I can connect to bastion, but I can't connect from bastion to the instance.
[14:14:41] (now I'm on Windows)
[14:16:59] yes
[14:17:07] and that's also what I am trying to say
[14:17:20] no matter what kind of terminal you use
[14:17:22] this issue exists
[14:17:31] (Putty/Ubuntu Terminal)
[14:18:19] How can I access?
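[Editor's note: the ProxyCommand setup referred to above is documented at Help:Access; a minimal sketch of the ~/.ssh/config it describes follows. The username is the one created earlier in this log, the key path is the ssh-keygen default, and the exact options may differ from the documented version:

    Host bastion.wmflabs.org
        User forrestfuqua
        IdentityFile ~/.ssh/id_rsa

    Host *.pmtpa.wmflabs
        User forrestfuqua
        IdentityFile ~/.ssh/id_rsa
        ProxyCommand ssh -a -W %h:%p bastion.wmflabs.org

With this in place, "ssh deployment-web.pmtpa.wmflabs" tunnels through bastion automatically and the private key never has to be copied onto bastion — which is why the direct bastion-then-ssh attempt above fails with "public key denied". The -i option mentioned later does the same key selection on the command line, e.g. ssh -i /media/stick/id_rsa forrestfuqua@bastion.wmflabs.org.]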
[14:18:26] :(
[14:18:32] * Hydriz shrugs
[14:18:47] hmm
[14:19:27] 03/11/2012 - 14:19:26 - Updating keys for addshore
[14:20:12] 03/11/2012 - 14:20:12 - Updating keys for addshore
[14:20:13] 03/11/2012 - 14:20:13 - Updating keys for addshore
[14:20:40] petan|wk: there is documentation :)
[14:27:45] hi
[14:27:55] !access | IWorld addshore
[14:27:55] IWorld addshore: https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[14:29:16] I just started kvm on my desktop; if I am gone, my kernel is down
[14:29:39] for some reason kvm doesn't like my kernel
[14:31:18] :|
[14:36:14] Where ubuntu write the .pub file?
[14:36:17] *writes
[14:45:13] 03/11/2012 - 14:45:13 - Updating keys for iworld
[14:45:27] 03/11/2012 - 14:45:27 - Updating keys for iworld
[14:48:38] IWorld .ssh
[14:48:43] $HOME/.ssh
[14:50:52] Where are only "authorized_keys known_hosts known_hosts.old"
[14:55:33] IWorld type
[14:55:36] ssh-keygen
[14:55:42] that will create a key there
[14:55:58] now I must enter a filename
[14:56:35] ok?
[14:57:42] @ petan
[14:59:33] great, I have a .pub file. Now I can upload that :-)
[15:01:58] yes
[15:03:12] 03/11/2012 - 15:03:12 - Updating keys for iworld
[15:03:14] okay
[15:03:27] 03/11/2012 - 15:03:27 - Updating keys for iworld
[15:05:08] THANKS!!!
[15:05:21] it runs!
[15:06:10] good
[16:00:39] Ryan_Lane (or someone else): Could you create a labs account for me? I'm already in svn.
[17:02:19] !account
[17:02:19] in order to get access to labs, please type !account-questions and ask Ryan, or someone who is in charge of creating accounts on labs
[18:16:08] !log nginx Now preparing Nginx 1.1.16
[18:16:09] Logged the message, Master
[18:22:03] !log nginx Now compiling with the following options ./configure --with-http_ssl_module --with-http_stub_status_module --with-ipv6 --with-http_geoip_module --with-file-aio --with-http_gzip_static_module --with-http_gzip_static_module --with-http_gzip_static_module
[18:22:04] Logged the message, Master
[18:27:30] !log nginx Now preparing memcached 1.4.13
[18:27:31] Logged the message, Master
[18:36:38] !log nginx memcached is running with the following options: memcached -vvv -d -r -m 2048 -L -t 2
[18:36:39] Logged the message, Master
[18:44:53] !log nginx selected Mediawiki 1.19.1Beta1 as test subject
[18:44:54] Logged the message, Master
[19:57:13] 03/11/2012 - 19:57:13 - Creating a home directory for iworld at /export/home/huggle/iworld
[19:58:13] 03/11/2012 - 19:58:12 - Updating keys for iworld
[20:05:42] !log huggle new member IWorld
[20:05:43] Logged the message, Master
[20:21:12] =]
[20:22:16] * Damianz yawns
[20:23:06] or Damianz addshore
[20:23:15] :O
[20:23:16] Damianz: can you add addshore to the bots project?
[20:23:34] I coul
[20:23:35] d
[20:24:27] !log bots Added addshore as a member.
[20:24:29] Logged the message, Master
[20:24:40] !account-questions | multichill
[20:24:40] multichill: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your preferred email address. 3. Your SVN account name, or your preferred shell account name, if you do not have SVN access.
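[Editor's note: the keypair workflow petan walks IWorld through above, as a shell sketch. The filename prompt is the one IWorld hits; the username and key type are illustrative assumptions:

    ssh-keygen -t rsa        # press Enter at the filename prompt to accept ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub    # this is the public half — paste it into labsconsole
    ssh -i ~/.ssh/id_rsa iworld@bastion.wmflabs.org   # -i selects a specific private key

Only the .pub file ever leaves the machine; the matching private key stays in ~/.ssh, which is why the bot's "Updating keys for iworld" lines follow shortly after the upload.]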
[20:24:41] * addshore thanks
[20:24:46] (not really me)
[20:24:51] heh
[20:25:09] 03/11/2012 - 20:25:09 - Creating a home directory for addshore at /export/home/bots/addshore
[20:25:11] * jeremyb runs away
[20:25:20] jeremyb: Dude, that's all in svn
[20:25:38] multichill: it's the standard response when your question is asked
[20:26:08] multichill: anyway, tell ^demon or sumanah or Ryan_Lane ;)
[20:26:10] 03/11/2012 - 20:26:09 - Updating keys for addshore
[20:26:57] These spicy beans are rather spicy
[20:46:22] !log hugglewa iworld: adding stable folder
[20:46:24] Logged the message, Master
[20:49:42] So we have a bot that uses a bot to log messages? :)
[20:49:53] yes
[20:50:25] I typed "log adding stable folder" in shell ;)
[20:52:41] !log hugglewa iworld: creating /var/www/index2.php
[20:52:42] Logged the message, Master
[20:57:20] multichill: I do need the answers to those questions
[20:57:30] multichill: don't make me go scrounge in svn :)
[20:57:43] Ryan_Lane: I have a little problem ;)
[20:57:53] Just multichill and maarten@mdammers.nl
[20:57:53] nice to see you
[21:01:40] Ryan_Lane: That should be enough right? ^^
[21:05:24] sure
[21:05:42] unless you want your git name to be something else (it can be different from your ssh account name)
[21:06:01] Ryan_Lane: if I have a labs account, do I have commit access on wm svn?
[21:06:13] JeroenDeDauw: we have that in production too :D
[21:06:17] IWorld: no
[21:06:23] you have to apply for svn access
[21:06:31] you have git access, though
[21:06:43] Gerrit looks really messy review wise now
[21:06:53] and this is why the world of review before merge is awesome :)
[21:07:38] Where is the /userinfo folder on Git?
[21:07:51] we don't have one
[21:07:55] and why I'm not "Ready for Git?"
[21:08:01] we're going to pull that info from LDAP
[21:08:15] I have a gerrit account
[21:08:18] yeah
[21:08:28] ignore that
[21:08:30] ok
[21:08:41] !initial-login | multichill
[21:08:41] multichill: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in
[21:08:51] If the svn moved to Git, do I need commit access?
[21:08:56] multichill: did you need full labs access, or just gerrit access?
[21:09:05] IWorld: we are moving to git in a couple weeks
[21:09:07] Labs too
[21:09:10] ok
[21:09:35] Ryan_Lane: do I have full access in Git?
[21:09:46] IWorld: you have the same access as everyone else
[21:09:53] everyone can do a push into git
[21:10:09] it must be reviewed before being merged, though
[21:10:13] 03/11/2012 - 21:10:13 - Creating a home directory for multichill at /export/home/bastion/multichill
[21:10:15] ah
[21:10:29] cool
[21:10:32] gated trunk allows us to give a gerrit account to anyone we want
[21:10:36] we couldn't do that with svn
[21:10:51] ah
[21:11:03] cool
[21:11:08] multichill: did you need access to any projects?
[21:11:12] 03/11/2012 - 21:11:12 - Updating keys for multichill
[21:11:33] Ryan_Lane: Not for now. Did you copy my svn ssh key?
[21:11:45] it all uses the same system
[21:11:52] gerrit does not, though
[21:11:54] we have an open bug for that
[21:12:07] The current svn access is bizarre
[21:12:26] Ryan_Lane: are the SSH keys synchronized in future?
[21:12:38] Not gerrit
[21:12:45] the keys are sync'd from svn to git, but not the opposite way
[21:12:50] this is by design
[21:13:30] ok
[21:14:23] Do you review all changes?
[21:14:31] I don't
[21:14:40] it depends on the gerrit project
[21:14:42] The admins
[21:15:00] core committers review changes. I review changes for the puppet repo.
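[Editor's note: the "push into git, reviewed before merge" flow described above is Gerrit's gated trunk. A sketch, using the test/mediawiki repo that appears later in this log and Gerrit's usual SSH port; the username is illustrative:

    git clone ssh://forrestfuqua@gerrit.wikimedia.org:29418/test/mediawiki.git
    cd mediawiki
    # edit, commit as usual, then push to the review queue rather than the branch:
    git push origin HEAD:refs/for/master

Pushing to refs/for/master creates a pending change for reviewers instead of updating master directly — which is what lets a Gerrit account be handed to anyone, unlike direct-commit svn.]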
[21:15:08] ah
[21:15:09] mediawiki core committers review for mediawiki
[21:15:19] extension maintainers will review for their projects, etc
[21:15:22] ok
[21:15:38] Git has many benefits
[21:15:50] I especially love that extension maintainers can review their changes :)
[21:16:02] :)
[21:16:02] I've had people break my extensions a number of times
[21:16:10] ah
[21:16:24] then end-users pull from trunk (I tell people trunk is stable, because I keep trunk stable)
[21:16:25] Extension-wise are we using repos per extension or some weird branch thing?
[21:16:32] repo per extension, I believe
[21:16:37] :D
[21:16:47] Ryan_Lane: who can create a repo?
[21:16:54] on Git
[21:16:59] I'm not totally sure :D
[21:17:17] I'd like most people to be able to
[21:17:36] I think we still need to work that out
[21:17:39] it'll take some effort
[21:17:53] Ryan_Lane: Btw do you use bz for your extensions?
[21:18:05] only the ones I actually maintain
[21:18:14] and LDAP using bugzilla is a really new thing
[21:18:22] Eh
[21:18:31] LdapAuthentication extension
[21:18:33] I'll wait until extensions hit gerrit then re-submit my patch in bz
[21:18:41] I used to take bug reports via liquid threads
[21:18:41] :P
[21:18:42] Ryan_Lane: is there an admin group on Gerrit?
[21:18:48] IWorld: yes
[21:19:01] Who has that?
[21:19:11] no clue
[21:19:29] :)
[21:19:39] hahhahahaha "DON'T USE THIS - CHAD CAN'T SPELL"
[21:19:46] description on a repo :D
[21:20:01] lol
[21:20:11] Thanks Ryan. Will have a look at it later this week.
[21:20:11] is there a test repo on gerrit?
[21:20:13] Btw can we delete repos in gerrit yet?
[21:20:17] @ Ryan_Lane
[21:20:22] Damianz: nope
[21:20:32] IWorld: I don't think so
[21:20:34] that's going to get fun.
[21:20:45] Damianz: well, we *can* delete projects
[21:20:55] it just isn't built into the interface
[21:21:24] Yeah I was listening to the FOSDEM talk about gerrit - it's rather interesting.
[21:21:30] you have to invoke some incantation of sql commands, and delete the repo on the file-system
[21:21:31] Ryan_Lane: https://gerrit.wikimedia.org/r/#admin,project,test/mediawiki,info
[21:21:52] IWorld: yeah, it was for testing the import of mediawiki
[21:21:59] ok
[21:22:10] Damianz: gerrit is good software with a poor interface
[21:22:17] Can I use tortoisegit?
[21:22:55] I like the idea of git review - not sure I like the gerrit way of doing things but it makes sense for the use here.
[21:23:23] I like Git and Git review. :-)
[21:25:24] Good night (GMT+1)
[21:30:48] I actually like the gerrit way of thinga
[21:30:49] *thins
[21:30:51] *things
[21:30:58] I also enjoy my poor typing skills
[21:31:03] :D
[21:31:40] hi
[21:31:46] someone need me
[21:31:46] :D
[21:31:55] dunno
[21:32:06] petan: the deployment-web boxes OOM often, though :D
[21:32:08] if not I go continue enjoy weekend
[21:32:15] I like it for some things but reviewing every change for others is just a whole lot of work. Quite like tagging stable and unit testing on push.
[21:32:18] ah
[21:32:23] Ryan_Lane: I suspect puppet
[21:32:23] enjoy your weekend ;)
[21:32:30] nah, it isn't puppet
[21:32:34] need more cache
[21:32:40] more squid, more memcache, etc
[21:32:43] because there is one instance which isn't even running apache and it's getting oom
[21:32:50] webs1
[21:32:54] it's not in squid
[21:33:05] Are you sure squid is doing stuff?
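[Editor's note: a rough sketch of the "incantation of sql commands" plus filesystem deletion alluded to above. Everything here is an assumption for illustration — table names, paths, and steps vary by Gerrit version and installation, and this is not presented as the procedure used at Wikimedia:

    # remove the project's review records from Gerrit's database (names assumed)
    mysql reviewdb -e "DELETE FROM changes WHERE dest_project_name = 'test/mediawiki';"
    # then delete the bare repository on disk and restart Gerrit to flush its caches
    rm -rf /var/lib/gerrit/git/test/mediawiki.git

The point of the exchange stands either way: in this era of Gerrit there was no supported delete button, only manual surgery.]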
[21:33:08] well, we can look into it later :)
[21:33:15] Damianz: surely it's not
[21:33:24] Damianz: if u can configure it, do it
[21:33:34] I am waiting months to see production squid
[21:33:41] heh
[21:33:42] sorry :)
[21:33:47] I'm not 100% sure how production squids are configured but they are probably in puppet somewhere.
[21:33:52] it's full of secret data, Damianz
[21:34:07] private bank accounts of wikimedia employees and such
[21:34:10] is in config of squid
[21:34:27] :D
[21:34:31] they need to remove it
[21:34:43] o.0
[21:34:52] Damianz: squid config is nowhere in puppet
[21:34:53] it's secret
[21:35:00] neither on noc.wikimedia
[21:35:07] it's the only config which is hidden from public
[21:35:18] That's kind of insane but w/e.
[21:35:24] Also wikitech really needs an ssl :P
[21:35:31] why
[21:35:31] :D
[21:35:54] it's surely the best place to hack in
[21:36:03] what is the benefit of hacking wikitech :D
[21:36:08] and I'm doing a comparison of squid, nginx, varnish
[21:36:24] if u can document it, I will be happy to use something else
[21:36:25] I've been hunting, and was about to ask where I could find their configs
[21:36:28] I like varnish more than squid.
[21:36:34] and I like nginx
[21:36:43] why have layers, when one can do it all
[21:37:27] damn why do people need to register to download freeware :(
[21:37:29] Distribution of traffic, redundancy etc
[21:37:41] using php-fpm pools on the backend, and nginx doing the cache and load balancing
[21:37:42] I found a torrent to download freeware just because I am lazy :P
[21:37:45] lol
[21:37:49] that is lazy
[21:38:03] PROBLEM Free ram is now: WARNING on deployment-web2 deployment-web2 output: Warning: 19% free memory
[21:38:11] I don't want to register there
[21:38:18] there it goes again..
[21:38:29] Reminds me I need to look at cgroups with php-fpm.
[21:38:43] no problem I just create 400 instances and that will be... Ryan_Lane can I? :P
[21:38:52] you should
[21:38:55] :D
[21:38:56] I think it would solve the oom
[21:39:26] but maybe cluster would be oom
[21:39:35] oh Ryan_Lane, the poor instances are slow as hell today and last night, at midnight they were at a turtle's pace
[21:39:37] I mean the host cluster
[21:39:50] hmm
[21:39:57] btw Ryan_Lane you have any experience with kvm and windows :P
[21:40:04] i do
[21:40:08] because I try to make windows xp run on my kvm
[21:40:15] but graphics doesn't really work
[21:40:19] heh
[21:40:22] windows 7 crash my kernel
[21:40:24] that sucks
[21:40:36] when I boot win 7 my linux crash
[21:40:43] windows xp boot
[21:40:48] but graphic needs a driver
[21:40:50] on phone. back in a bit
[21:40:52] ok
[21:41:11] Is deployment web puppetized atm?
[21:41:15] we want
[21:41:17] to do it
[21:41:25] Damianz: you can start :D
[21:42:26] If all the mw code is on gluster storage then once the nodes are puppetized deploying out new stuff is easy.
[21:43:03] RECOVERY Free ram is now: OK on deployment-web2 deployment-web2 output: OK: 21% free memory
[21:43:41] where is gluster
[21:43:44] Nom, don't have access to that project :P
[21:43:45] .//data?
[21:43:49] Yeah
[21:43:50] Damianz: fixing
[21:43:52] See Ryan's email.
[21:44:32] fixed
[21:44:55] you have access now mh
[21:45:17] 03/11/2012 - 21:45:16 - Creating a home directory for damian at /export/home/deployment-prep/damian
[21:46:16] 03/11/2012 - 21:46:16 - Updating keys for damian
[21:47:14] :D
[21:48:52] sql on deployment will not work for 2h
[21:54:26] petan: do you know the difference between m1 and s1 on the instances?
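[Editor's note: a minimal sketch of the "php-fpm pools behind nginx" layout JRWR describes above. Upstream addresses, paths, and file names are hypothetical, not taken from the project:

    # /etc/nginx/conf.d/mediawiki.conf (hypothetical)
    upstream php_pool {
        server 10.4.0.11:9000;   # php-fpm instances
        server 10.4.0.12:9000;
    }
    server {
        listen 80;
        root /srv/mediawiki;
        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass php_pool;   # nginx load-balances across the pool
        }
    }

This is the "one layer does it all" argument: nginx terminates HTTP, balances across the FPM pool, and (with a proxy/fastcgi cache configured) can also serve the role squid or varnish plays in the layered setup.]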
[22:00:55] The squid instances seem maxed out of fds anyway
[22:08:20] did you guys start moving away from nfs instances yet?
[22:08:47] it's at /data/project
[22:09:21] I'm going to move the home directories to it soon as well
[22:14:20] !log deployment-prep Increased nofile on deployment-squid and added max_filedesc option to squid config. Also installed squidclient.
[22:14:22] Logged the message, Master
[22:15:14] Ryan_Lane: For multiple squids I assume you were thinking LVS with direct routing where the LVS box holds the public IP and routes to the internal IPs of the squid instances?
[22:15:33] yes, but it's actually pretty difficult to set up lvs in labs
[22:16:00] What's the ipv6 status of labs Ryan_Lane ?
[22:16:16] there's a bit of work I need to do for that
[22:16:32] Ryan_Lane: I'd imagine so - kinda funny that we really route public ips currently anyway.
[22:16:42] what do you mean?
[22:17:13] Aren't public IPs routed to private IPs on instances?
[22:17:27] they are NAT'd
[22:17:31] Ah
[22:17:40] Would be a waste if Wikipedia misses http://www.worldipv6launch.org/participants/?q=1
[22:17:46] Hmm seems there is a local interface setup for lvs on this squid box.
[22:17:54] yeah
[22:18:03] multichill: I think we are planning for that
[22:18:06] in production
[22:18:27] ip addr list
[22:18:32] we need to upgrade LVS, add IPv6 support to pybal, and enable IPv6 for varnish
[22:18:50] and proxy IPv6 for squid through the nginx ssl cluster
[22:19:01] we are also using the ssl cluster as an IPv6 proxy
[22:19:16] Damianz: yeah, the puppet config does that
[22:19:28] Ah
[22:19:33] Damianz: but, routing makes it difficult
[22:20:00] also, libvirt is configured to disallow ip, mac and arp spoofing
[22:20:09] for obvious reasons ;)
[22:20:10] We could use nat but that would defeat the point of using routing as the squid boxes go from being hardly touched to raped.
[22:20:30] that's the point
[22:20:42] 95+% of our traffic is handled by squid/varnish
[22:21:06] 98% for media, about 95% for text
[22:21:12] 98% or so for bits
[22:21:35] you guys should set up varnish for bit
[22:21:37] *bits
[22:21:41] it's actually puppetized
[22:21:43] fully
[22:21:47] same for mobile
[22:21:56] though mobile needs squid support too
[22:21:57] I wonder if we make it. Would be nice to have test access somewhere soon
[22:22:15] yeah, it's unlikely we'll be able to test it in labs first
[22:22:28] way too much upfront work for that
[22:22:32] Production caching does weird things sometimes
[22:22:37] Damianz: how so?
[22:23:21] As in if you reply to a talk page it can take a couple of refreshes before the cache clears and catches up which gets confuzzling.
[22:23:49] only if purging is broken
[22:24:23] and if that's the case you need to let us know
[22:25:00] It doesn't happen often but sometimes it takes quite a while :D
[22:25:11] if that's the case, then purging has somehow broken
[22:25:29] mediawiki sends a multicast udp message that purges everything at the same time
[22:25:44] we have a multicast to unicast proxy for esams
[22:26:01] Interesting
[22:26:03] so, it may be 300ms or so slower
[22:26:19] in esams we have a unicast to multicast relay
[22:26:29] Btw is ganglia in use in labs now - aside from the nodes in the production cluster?
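[Editor's note: circling back to the LVS direct-routing idea discussed above, a minimal ipvsadm sketch. All addresses are hypothetical; in DR mode each squid real server must also hold the VIP on a non-ARPing local interface, which matches the "local interface setup for lvs" Damianz noticed on the squid instance:

    # on the LVS director: define the virtual service, add squid real servers
    ipvsadm -A -t 203.0.113.10:80 -s wlc
    ipvsadm -a -t 203.0.113.10:80 -r 10.4.0.21:80 -g   # -g = direct routing
    ipvsadm -a -t 203.0.113.10:80 -r 10.4.0.22:80 -g
    # on each squid box: hold the VIP without answering ARP for it
    ip addr add 203.0.113.10/32 dev lo
    sysctl -w net.ipv4.conf.all.arp_ignore=1 net.ipv4.conf.all.arp_announce=2

The squid boxes then reply to clients directly, bypassing the director on the return path — exactly why it clashes with labs' NAT'd addressing and the libvirt anti-spoofing rules Ryan mentions.]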
[22:26:52] not yet
[22:26:55] sara is working on it
[22:27:05] we're setting it up per-project
[22:27:23] where every instance talks to the ganglia server, and the ganglia server runs a gmond server per project
[22:27:27] Install per project or an aggregator with a 'cluster' for each project?
[22:27:34] Oh cool, ignore last line :P
[22:27:47] so, it can't easily be puppetized
[22:28:04] instead, we are dynamically generating the gmond and gmetad configs on the ganglia server
[22:28:16] and gmond on the instances is handled via puppet and a custom facter script
[22:29:07] it would be more ideal to have an aggregator in each project, but there's no way of easily configuring that
[22:33:22] hm. I wonder if I could just modify the ldap autofs mount to switch from NFS to gluster for home directories
[22:33:44] That might break a lot of stuff
[22:33:49] current mounts would stay NFS until they timed out, and would switch gradually to gluster
[22:34:01] I'd have to rsync the directories constantly
[22:34:20] You could bind mount the gluster stuff onto nfs but that might cause weird fh issues.
[22:34:30] well, autofs times out mounts
[22:34:40] so, if you don't access it for a little while, it unmounts
[22:34:45] Hmm
[22:34:51] then it would re-mount as glusterfs, rather than nfs
[22:34:59] so people shouldn't notice
[22:35:10] as long as their home directory files were still the same
[22:35:37] some instances would need to be rebooted for this to switch properly
[22:35:49] since people run crons and leave themselves logged in
[22:35:59] I could run a script to see which instances still have NFS running
[22:36:07] Shame it couldn't have been switches when you murdred everything last week.
[22:36:19] yeah, I didn't have the gluster storage up yet
[22:36:24] My english is awful tonight =/
[22:36:26] I'm going to have to do that occasionally
[22:36:35] because live-migration is broken
[22:36:44] apparently it'll still kind of suck in essex
[22:36:46] I hope they fix that in essex which is due out in a few weeks :D
[22:36:52] heh
[22:36:57] already discussed that with vish
[22:37:02] at a meetup
[22:37:25] I don't get why the stop the instances as libvirt supports live, live migration...
[22:37:29] s/the/they/
[22:37:41] they don't stop them
[22:37:43] they suspend them
[22:37:51] I also don't get why they get suspended, though
[22:38:03] Well I meant stop in the sense that it goes offline :P
[22:38:56] yeah
[22:38:58] it's annoying
[22:39:10] this is one case where esxi has a major advantage
[22:39:21] live migration in an esx cluster just works
[22:39:29] until it doesn't :D
[22:39:40] I rarely had a failed migration in esx
[22:39:50] esx can also do that crazy thing where it runs 2 vms side by side replicating the ram in case one node fails... or it goofs up and takes your whole cluster down :)
[22:39:57] yeah
[22:40:06] it's a neat feature
[22:40:12] it's crazy expensive
[22:40:44] Tbh if you're replicating servers like that for redundancy you're probably doing it work, the idea is nice though.
[22:40:53] s/work/wrong/ gah
[22:41:15] yeah, we never plan on having servers like that
[22:41:29] if it can't horizontally scale we don't want it
[22:42:05] I wish MySQL cluster didn't suck donkey balls - that was a really nice idea that just screwed me over too many times when the storage nodes decided to all die at once.
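[Editor's note: a sketch of the autofs change Ryan floats above. The LDAP automount entry is what decides the filesystem type, so flipping it migrates each home directory as its idle mount times out and remounts. The DN, server names, and volume names here are all hypothetical:

    # current NFS-backed map entry (illustrative LDIF)
    dn: cn=forrestfuqua,ou=auto.home,dc=wikimedia,dc=org
    automountInformation: -fstype=nfs labs-nfs1:/export/home/nginx/forrestfuqua

    # switched to gluster: picked up on the next remount after autofs's timeout
    automountInformation: -fstype=glusterfs labs-gluster1:/home-nginx/forrestfuqua

The catch discussed above follows directly: the data must be rsync'd continuously so both backends hold the same files during the gradual switch, and instances with long-lived logins or crons never hit the idle timeout, hence the reboots.]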
[22:42:59] if a master dies, we go into read-only and failover manually
[22:43:07] it's safer, and doesn't cause a ton of downtime
[22:43:32] Hmm, do you then just elect a master from one of the slaves and switch the master host on all the other slaves?
[22:43:37] maybe if that caused us major financial issues we'd come up with a more automated solution, but we don't need to deal with it
[22:43:56] no, we just pick one
[22:46:08] need food. back in a while
[22:46:27] I'm still undecided on MySQL replication - drbd-based with forced flushing out to the disk vs binlog has major advantages and disadvantages depending on what you want. It's like choosing do I want downtime or random data inconsistencies =/ Also filtering binlog-wise what you send to slaves is... useless for most cases.
[22:55:09] hmm, (Can't contact the database server: Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (deployment-sql))
[22:55:11] on deployment-prep
[22:56:18] petan is breaking sql for the next few hours
[22:56:30] heh, ok
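[Editor's note: a sketch of the manual failover Ryan describes above. Hostnames and binlog coordinates are hypothetical, and this is illustrative of the general MySQL promote-a-slave procedure, not the exact Wikimedia runbook:

    # hold the tier read-only while failing over (on the chosen new master, db2)
    mysql -h db2 -e "SET GLOBAL read_only = 1;"
    # once db2 has applied everything it received, stop and clear its slave state
    mysql -h db2 -e "STOP SLAVE; RESET SLAVE;"
    # repoint each remaining slave at the new master (file/position hypothetical)
    mysql -h db3 -e "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='db2', \
        MASTER_LOG_FILE='db2-bin.000001', MASTER_LOG_POS=107; START SLAVE;"
    # finally open the new master for writes
    mysql -h db2 -e "SET GLOBAL read_only = 0;"

The brief read-only window is the trade-off named in the conversation: a little planned unavailability in exchange for never promoting a slave that could diverge from the rest of the tier.]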