[00:05:32] PROBLEM Current Load is now: CRITICAL on deployment-web deployment-web output: CRITICAL - load average: 78.03, 38.12, 15.49 [00:05:32] PROBLEM Current Load is now: CRITICAL on deployment-web2 deployment-web2 output: CRITICAL - load average: 35.49, 23.30, 10.07 [00:05:42] PROBLEM Free ram is now: CRITICAL on deployment-web2 deployment-web2 output: Critical: 5% free memory [00:06:32] PROBLEM Free ram is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:06:42] PROBLEM Current Load is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:07:02] PROBLEM Current Load is now: WARNING on deployment-web4 deployment-web4 output: WARNING - load average: 19.45, 21.16, 10.48 [00:07:32] PROBLEM Current Load is now: CRITICAL on deployment-web3 deployment-web3 output: CRITICAL - load average: 107.49, 62.69, 27.35 [00:09:02] PROBLEM Free ram is now: CRITICAL on deployment-web3 deployment-web3 output: Critical: 3% free memory [00:10:32] PROBLEM Current Load is now: WARNING on deployment-web deployment-web output: WARNING - load average: 1.70, 21.21, 15.45 [00:10:32] PROBLEM Current Load is now: WARNING on deployment-web2 deployment-web2 output: WARNING - load average: 0.72, 10.81, 8.59 [00:10:42] RECOVERY Free ram is now: OK on deployment-web2 deployment-web2 output: OK: 46% free memory [00:11:32] PROBLEM dpkg-check is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:11:42] PROBLEM SSH is now: CRITICAL on deployment-web5 deployment-web5 output: CRITICAL - Socket timeout after 10 seconds [00:12:12] PROBLEM Current Users is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:12:12] PROBLEM Disk Space is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:12:12] PROBLEM Total Processes is now: CRITICAL on deployment-web5 deployment-web5 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:12:22] PROBLEM Free ram is now: CRITICAL on deployment-web deployment-web output: Critical: 3% free memory [00:12:40] petan you pinged me? :) [00:14:02] RECOVERY Free ram is now: OK on deployment-web3 deployment-web3 output: OK: 71% free memory [00:15:35] addshore: he pinged you about access to hugglewm [00:15:45] I added him, so no worries [00:15:49] :) [00:16:04] Can I ping you about access to the bots wmlabs project? :) [00:16:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [00:16:32] petan: ^^ ? [00:16:33] :) [00:16:41] I usually try to avoid giving access to projects [00:16:51] ahhh :P [00:16:58] as it means people are missing from the loop [00:17:26] It would be good if people asked for access requests on the discussion page of the project [00:17:28] !project bots [00:17:28] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [00:17:32] PROBLEM Current Load is now: WARNING on deployment-web3 deployment-web3 output: WARNING - load average: 0.05, 11.61, 17.95 [00:18:03] !project-access is To request access to a project, use a project's discussion page; see !project-discuss [00:18:03] Key was added! [00:18:03] willdo :) [00:18:45] !project-discuss https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [00:18:52] !project-discuss is https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [00:18:52] Key was added! 
[00:18:57] !project-discuss bots [00:18:57] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [00:19:04] o.O [00:19:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [00:20:03] !project-discuss [00:20:03] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [00:20:05] !project-discuss foo [00:20:05] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [00:20:06] !project-discuss is https://labsconsole.wikimedia.org/wiki/Nova_Resource_Talk:$1 [00:20:06] Key exist! [00:20:12] !project-discuss del [00:20:12] Successfully removed project-discuss [00:20:12] !project-discuss del [00:20:14] Unable to find the specified key in db [00:20:15] !project-discuss is https://labsconsole.wikimedia.org/wiki/Nova_Resource_Talk:$1 [00:20:15] Key was added! [00:20:17] :) [00:20:25] !project-discuss foo [00:20:25] https://labsconsole.wikimedia.org/wiki/Nova_Resource_Talk:foo [00:20:32] RECOVERY Current Load is now: OK on deployment-web2 deployment-web2 output: OK - load average: 0.00, 1.47, 4.51 [00:20:32] !project-access del [00:20:33] Successfully removed project-access [00:20:38] !project-access is To request access to a project, use a project's discussion page; see !project-discuss [00:20:38] Key was added! [00:21:22] adium's auto-"correction" and formatting makes me want to: (╯°□°)╯︵ ┻━┻ [00:22:02] RECOVERY Current Load is now: OK on deployment-web4 deployment-web4 output: OK - load average: 0.04, 1.13, 4.04 [00:22:22] PROBLEM Free ram is now: WARNING on deployment-web deployment-web output: Warning: 6% free memory [00:23:17] * werdna waves [00:23:19] how goes Ryan_Lane [00:23:26] poorly [00:23:32] I broke my unicorn's horn [00:23:41] :o [00:23:48] oh right, you're talking about the _base directory on labs [00:23:52] * werdna was confused [00:24:32] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [00:26:42] werdna: how's things in aussie-ville? 
[00:27:02] they're going [00:27:04] back at uni [00:27:14] I totally need to make a mediawiki extension called aussie-ville. It'll be like farmville, but you'll throw prawns on the barbie, and chase kangaroos [00:27:22] PROBLEM Free ram is now: CRITICAL on deployment-web deployment-web output: Critical: 5% free memory [00:27:30] so, as usual, trying to juggle two jobs, class, appropriate amounts of time at the pub, sleep, et cetera [00:27:35] heh [00:27:36] indeed [00:27:50] fun times [00:30:32] RECOVERY Current Load is now: OK on deployment-web deployment-web output: OK - load average: 0.00, 0.39, 4.25 [00:37:32] RECOVERY Current Load is now: OK on deployment-web3 deployment-web3 output: OK - load average: 0.09, 0.30, 4.99 [00:46:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [00:49:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [00:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [01:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [01:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [01:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [01:43:23] heh. 
this gluster code is a bitch and a half to write [01:43:42] I'm running all commands via ssh [01:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [01:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [01:52:25] PROBLEM Free ram is now: WARNING on deployment-web deployment-web output: Warning: 6% free memory [01:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [01:56:20] hey Ryan_Lane, you there still? [01:56:23] yep [01:56:24] I can't log into deployment-dbdump [01:56:27] Permission denied (publickey). [01:56:30] yes my agent is forwarded [01:57:01] you *sure*? :) [01:57:18] because I was able to su to you, and you have an authorized_keys there [01:57:32] andrew-macbook:wmf-deployment andrew$ ssh -A bastion.wmflabs.org ssh deployment-dbdump [01:57:36] Permission denied (publickey). [01:57:52] do you have a key in your agent? [01:57:54] ssh-add -l [01:58:18] is it your labs key? [01:58:36] hmm, the wrong key is in my agent [01:58:39] ;) [01:58:40] 2048 6b:fe:97:06:09:4f:e9:33:14:73:7a:af:24:d5:0a:21 /Users/andrew/.ssh/id_rsa-cca (RSA) [01:58:51] now to figure out how to fix that :) [01:59:02] open another window and make a new agent [01:59:06] then add the correct key to it [01:59:25] or this, right? andrew-macbook:wmf-deployment andrew$ ssh-add ~/.ssh/id_rsa-wikimedia [01:59:34] works now :) [02:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [02:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [02:20:59] addshore: hi. any issues with your new project and access? feel free to bug me about it now [02:21:40] well, instance creation is broken for now [02:21:45] :P [02:22:32] ah ok. yeah. i saw that mail.. 
re the base image [02:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [02:29:05] Ryan_Lane: ugh, now I can't ssh to deployment-sql, and my key is definitely in my ssh agent :) [02:29:53] ok. try now [02:30:07] nscd was acting up [02:30:10] <3 [02:30:20] you rock [02:31:17] deployment-prep is all really really slow, but that's my job to figure out :) [02:31:28] likely needs more squid/varnish [02:31:39] also, we need real hardware for databases [02:31:45] shouldn't, even Special:BlankPage is obscenely slow [02:32:02] I can make MediaWiki run pretty damn fast on a single small server [02:32:13] it's taking like minutes to load Special:BlankPage [02:32:29] I'm betting memc is down [02:33:20] nope, I'll see what comes up in profiling [02:34:28] weird, it looks like the connection's being held open or something. All the HTML is through, including the profiling info, my browser is just insisting that it's still loading [02:41:51] some image holding it up :/ [02:41:55] PROBLEM Free ram is now: WARNING on puppet-lucid puppet-lucid output: Warning: 12% free memory [02:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [02:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [02:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [02:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [02:57:00] !log deployment-prep rebooting a few hosts, there is something seriously wrong with fetching resources at the moment [02:57:01] Logged the message, junior [03:04:55] PROBLEM host: deployment-nfs-memc is DOWN address: deployment-nfs-memc CRITICAL - Host Unreachable (deployment-nfs-memc) [03:05:40] PROBLEM dpkg-check is now: CRITICAL on deployment-web deployment-web output: Connection 
refused by host [03:05:40] PROBLEM Current Users is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused by host [03:05:40] PROBLEM Total Processes is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused by host [03:05:45] PROBLEM SSH is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused [03:05:49] * werdna whistles [03:06:05] PROBLEM Total Processes is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused by host [03:06:15] PROBLEM dpkg-check is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused by host [03:06:40] PROBLEM Disk Space is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused by host [03:06:40] PROBLEM Current Users is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused by host [03:06:45] PROBLEM host: deployment-squid is DOWN address: deployment-squid CRITICAL - Host Unreachable (deployment-squid) [03:06:55] PROBLEM Free ram is now: CRITICAL on puppet-lucid puppet-lucid output: Critical: 3% free memory [03:07:05] PROBLEM Free ram is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused by host [03:07:25] PROBLEM Current Users is now: CRITICAL on deployment-web deployment-web output: Connection refused by host [03:07:35] PROBLEM Total Processes is now: CRITICAL on deployment-web deployment-web output: Connection refused by host [03:07:55] PROBLEM Disk Space is now: CRITICAL on deployment-web deployment-web output: Connection refused by host [03:07:55] PROBLEM SSH is now: CRITICAL on deployment-web deployment-web output: Connection refused [03:07:55] PROBLEM SSH is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused [03:07:55] PROBLEM SSH is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused [03:08:05] PROBLEM dpkg-check is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused by host [03:08:35] 
PROBLEM Current Users is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused by host [03:08:35] PROBLEM Current Load is now: CRITICAL on deployment-web deployment-web output: Connection refused by host [03:08:35] PROBLEM dpkg-check is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused by host [03:08:35] PROBLEM Current Load is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused by host [03:08:40] PROBLEM Free ram is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused by host [03:09:15] PROBLEM Disk Space is now: CRITICAL on deployment-web2 deployment-web2 output: Connection refused by host [03:09:55] PROBLEM Current Load is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused by host [03:10:25] PROBLEM Free ram is now: CRITICAL on deployment-web deployment-web output: Connection refused by host [03:10:35] PROBLEM Current Load is now: CRITICAL on deployment-web3 deployment-web3 output: Connection refused by host [03:10:35] PROBLEM Free ram is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused by host [03:10:35] PROBLEM Disk Space is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused by host [03:10:35] PROBLEM Total Processes is now: CRITICAL on deployment-web4 deployment-web4 output: Connection refused by host [03:11:10] !log deployment-prep facepalm: apparently all reboots are failing, so this will be down until Ryan brings it all back up tomorrow [03:11:11] Logged the message, junior [03:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [03:17:40] werdna: arr.heh. i was about to warn you of rebooting.. 
but got distracted by a query :p [03:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [03:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [03:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [03:26:55] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory [03:35:35] PROBLEM host: deployment-nfs-memc is DOWN address: deployment-nfs-memc CRITICAL - Host Unreachable (deployment-nfs-memc) [03:37:35] PROBLEM host: deployment-squid is DOWN address: deployment-squid CRITICAL - Host Unreachable (deployment-squid) [03:42:45] RECOVERY host: deployment-squid is UP address: deployment-squid PING OK - Packet loss = 0%, RTA = 0.45 ms [03:44:05] RECOVERY host: deployment-nfs-memc is UP address: deployment-nfs-memc PING OK - Packet loss = 0%, RTA = 3.36 ms [03:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [03:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [03:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [03:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [04:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [04:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [04:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [04:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [04:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) 
[04:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [04:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [04:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [05:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [05:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [05:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [05:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [05:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [05:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [05:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [05:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [06:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [06:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [06:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [06:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [06:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [06:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [06:54:35] PROBLEM host: canonical-bridge is DOWN address: 
canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [06:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [07:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [07:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [07:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [07:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [07:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [07:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [07:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [07:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [08:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [08:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [08:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [08:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [08:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [08:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [08:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [08:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable 
(deployment-web5) [09:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [09:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [09:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [09:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [09:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [09:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [09:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [09:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [09:56:35] !account-questions | IWorld [09:56:35] IWorld : I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your preferred email address. 3. Your SVN account name, or your preferred shell account name, if you do not have SVN access. [09:57:00] ah [10:00:24] Where I can create the account? [10:00:51] just answer the questions and someone do that [10:01:09] someone would be mutante heh [10:01:10] ah [10:02:20] 1. IWorld 2. vgweb@hotmail.de 3. IWorld. If someone need my real name, ask me via query. [10:02:37] 3 needs to be lower case [10:02:39] but ok [10:02:42] iworld [10:02:46] mutante: can you create it pls [10:02:58] he will be working on huggle wa [10:05:27] IWorld: "Real name is optional. If you choose to provide it, this will be used for giving you attribution for your work. 
" so i am not asking you :) [10:05:40] ah [10:06:32] :) [10:06:36] petan|wk: you know what, the bot could also output the "created new account" line? hm? [10:06:49] indeed [10:06:53] created new account User:IWorld (via petan, will work on hugglewa, password sent by e-mail) [10:07:01] IWorld: check your mail [10:07:05] ok [10:07:21] I think it could be implemented to script you use to create it [10:07:31] if it's in svn [10:08:01] should all members of hugglewa also bet sys and netadmin though? [10:08:07] be [10:08:16] I can sort it out [10:08:23] !log hugglewa inserted IWorld to project [10:08:24] Logged the message, Master [10:08:37] heh, you were faster [10:08:40] ah [10:08:44] he's already there [10:08:50] Successfully added IWorld to hugglewa. [10:08:53] heh [10:09:35] !access [10:09:35] https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [10:09:44] this can wait [10:09:45] !access | IWorld [10:09:45] IWorld : https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [10:09:50] labs are down a bit [10:09:54] yea, please explain he current issue :) [10:09:58] the [10:10:06] I already told him :) [10:10:09] kk [10:10:16] once it's fixed we can create few instances there [10:10:31] Thank you! [10:11:21] your welcome. you can already create an upload an ssh key though [10:11:32] ok [10:11:36] https://labsconsole.wikimedia.org/wiki/Special:NovaKey [10:12:05] Do I need Linux for accessing? [10:12:17] no. are you on windows? 
[10:12:23] yes [10:12:24] then get putty [10:12:33] ok [10:12:36] !putty [10:12:36] how to tunnel - http://oldsite.precedence.co.uk/nc/putty.html [10:12:36] the full install package, including puttygen and pageant [10:12:55] use puttygen to create your key, use pageant as an agent to load it [10:13:01] use putty to connect:) [10:13:07] !bastion [10:13:07] http://en.wikipedia.org/wiki/Bastion_host; lab's specific bastion host is: bastion.wmflabs.org; see !access [10:13:10] I know [10:13:33] you can search the bot to get answers for anything you want [10:13:37] @seach bot [10:13:43] @search bot [10:13:44] Results (found 7): morebots, bot, labs-morebots, keys, bots, cs, help, [10:13:47] etc [10:14:29] petan|wk: can i add to an existing definition? [10:14:36] you need to del [10:14:41] then create it again [10:14:47] !key del [10:14:48] Unable to find the specified key in db [10:14:56] !putty del [10:14:57] Successfully removed putty [10:15:17] !putty official site: http://www.chiark.greenend.org.uk/~sgtatham/putty/ | how to tunnel - http://oldsite.precedence.co.uk/nc/putty.html [10:15:24] !putty is [10:15:25] It would be cool to give me also a text of key [10:15:37] !putty is official site: http://www.chiark.greenend.org.uk/~sgtatham/putty/ | how to tunnel - http://oldsite.precedence.co.uk/nc/putty.html [10:15:37] Key was added! [10:15:38] !putty is official site: http://www.chiark.greenend.org.uk/~sgtatham/putty/ | how to tunnel - http://oldsite.precedence.co.uk/nc/putty.html [10:15:38] Key exist! [10:15:40] that [10:15:41] :) [10:15:41] ;) [10:15:55] !:) is /me laughs [10:15:55] Key was added! [10:15:58] !:) [10:15:58] /me laughs [10:16:01] arr;) [10:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [10:16:49] Can I use "SSH-2 RSA" [10:16:51] ? [10:16:53] yep [10:17:03] anything what ubuntu knows :) [10:17:12] ah [10:17:32] are 4096 bits in key ok? 
[10:17:38] yes [10:17:42] ok [10:17:47] upload .pub [10:17:51] you need to copy/paste from the "Public key for pasting box" [10:17:53] ah [10:17:57] instead of saving [10:18:21] well, you can also save it, but that is a different format [10:18:30] ah [10:18:56] DO save your private key though.. preferably in a safe place .. maybe a USB flash drive or something if you like [10:19:19] okay [10:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [10:20:09] and add a passphrase to your key [10:20:18] ok [10:20:20] then start pageant.exe, it becomes a little tray icon [10:20:28] ok [10:20:30] select your private key from pageant and load it [10:20:47] that way you dont have to keep typing your passphrase more than once per session [10:21:16] after you see it loaded in the tray icon. just start putty and connect.. you dont need to worry about loading the key in putty itself [10:21:32] ah [10:21:49] in putty enable the setting though to allow "Agent forwarding" [10:22:04] that will allow you to connect to the bastion host and from there to another instance [10:22:08] forwarding the key [10:22:42] ah [10:23:00] Do I need the key fingerprint? [10:23:03] you can change that, enter a username and hostname, and save that whole thing as a "session" in putty.. so next time it should just be a double-click [10:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [10:24:44] Can I upload the file with "---- BEGIN SSH2 PUBLIC KEY ----"? [10:25:49] IWorld: no, copy/paste the one that starts with just ssh-rsa [10:25:57] ok [10:25:58] see that in puttygen on that box? 
[10:26:00] ok [10:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [10:26:27] 03/06/2012 - 10:26:27 - Creating a home directory for iworld at /export/home/hugglewa/iworld [10:27:25] 03/06/2012 - 10:27:25 - Updating keys for iworld [10:28:21] And now? :) [10:30:16] load the private key, connect to bastion.wmflabs.org [10:30:25] confirm that works.. then thats it for today i think:) [10:30:42] ah [10:31:06] Do I connect via pageant.exe? [10:32:06] no, thats just where you load the key [10:32:10] connect via putty.exe [10:32:38] provide putty.exe with hostname (bastion..) and your username..then hit connect [10:35:10] I have an error. :-( [10:35:52] wait [10:36:12] 03/06/2012 - 10:36:12 - Creating a home directory for iworld at /export/home/bastion/iworld [10:36:12] http://www.uploadscreenshot.com/image/806762/151621 [10:36:26] !log bastion new member IWorld [10:36:27] Logged the message, Master [10:36:35] you weren't in bastion [10:36:46] ah [10:36:55] try it [10:37:11] 03/06/2012 - 10:37:11 - Updating keys for iworld [10:37:40] okay :) [10:37:51] It runs! [10:40:06] what are the infos like "usage of"? [10:40:22] @search ganglia [10:40:22] Results (found 2): load, load-all, [10:40:24] !load [10:40:25] http://ganglia.wikimedia.org/2.2.0/graph_all_periods.php?h=virt2.pmtpa.wmnet&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1327006829&g=load_report&z=large&c=Virtualization%20cluster%20pmtpa [10:40:40] what you mean [10:40:58] if it's info in bastion you can ignore [10:41:06] ok [10:41:44] When we can create hugglewa.wmflabs.org? [10:42:07] wait :) [10:42:12] labs are broken [10:42:16] ok [10:42:48] Can I ask on evening (UTC) again? [10:43:15] yes [10:44:30] Do I have svn access? 
[10:44:39] no [10:44:43] http://svn.wikimedia.org/users.php [10:44:54] iworld N [10:44:59] I think you need to apply for it [10:45:05] ok [10:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [10:46:36] I think only people who are blue in list are actually devs [10:46:48] other people just have account but that's all [10:46:48] blue list? [10:46:56] there is link [10:47:04] some people link to mediawiki site [10:47:34] ah [10:47:47] That's all for now. :) [10:48:06] Thank you for creating my account! [10:48:44] Bye! [10:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [10:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [10:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [11:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [11:19:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [11:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [11:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [11:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [11:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [11:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [11:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [12:16:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [12:19:25] PROBLEM host: reportcard1 is DOWN address: 
reportcard1 CRITICAL - Host Unreachable (reportcard1) [12:24:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [12:26:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [12:41:03] Beta cluster down, I am so sad :( [12:43:43] yes [12:44:55] The one way to fix the beta cluster is out of action :( [12:45:05] I see some resource request issue from SAL [12:45:23] that's not the problem heh [12:45:42] then this is the problem [12:45:55] if that is not the problem with the cluster, it wasn't logged... [12:46:03] last log line [12:46:09] that's it [12:46:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [12:46:33] but it doesn't tell us whats wrong [12:46:37] it does [12:46:49] werdna's line [12:46:49] unless being unable to reboot is the problem [12:46:52] yes [12:46:53] it is [12:47:04] reboot = shutdown [12:47:08] but why do you want to reboot? [12:47:10] he did reboot on all vm's [12:47:13] I don't [12:47:16] werdna did [12:48:22] I do believe it should be fixed in about 8hours [12:49:25] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [12:49:57] heh [12:50:01] hope so [12:50:35] that's tommorow for me [12:51:01] hmm [12:51:08] Is it possible to delete instances? [12:51:13] yes [12:51:18] oh good [12:51:25] I shall delete one or two [12:52:54] !log dumps Deleted instances dumps-8 & dumps-nfs2. [12:52:55] Logged the message, Master [12:53:05] Deleted or just rebooted? 
:P [12:53:21] delete [12:53:37] Ryan asked me to delete them [12:54:13] we are out of ram :o [12:54:19] so we need to remove some [12:54:27] yeah [12:54:29] !stats [12:54:33] !load [12:54:33] http://ganglia.wikimedia.org/2.2.0/graph_all_periods.php?h=virt2.pmtpa.wmnet&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1327006829&g=load_report&z=large&c=Virtualization%20cluster%20pmtpa [12:54:35] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [12:54:47] cool [12:54:51] there are graphs [12:56:01] !log dumps Deleted instance dumps-5 too, only working with servers with multiple of 4 [12:56:02] Logged the message, Master [12:56:25] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [12:56:52] oh yes [12:57:04] how do we actually use the bots-sql servers? [12:57:12] we don't even have a proper sql account [12:57:18] (at least for me) [12:57:19] you create an account there and use it [12:57:25] simple [12:57:28] re [12:57:34] meh [12:57:41] okay... [12:57:50] * Hydriz wonders how to create an account [12:57:55] I make it [12:58:04] zzz [12:58:20] petan|wk: are the bugs fixed? [12:58:25] PROBLEM host: dumps-nfs2 is DOWN address: dumps-nfs2 CRITICAL - Host Unreachable (dumps-nfs2) [12:58:34] :O dumps-nfs2 is down!!! [12:59:11] wtf the changes in instances are not synced to nagios [12:59:13] sigh [12:59:55] PROBLEM host: dumps-8 is DOWN address: dumps-8 check_ping: Invalid hostname/address - dumps-8 [13:03:16] IWorld: ? [13:03:18] not ye [13:03:26] oh [13:03:34] Hydriz: it's bots-sql2 [13:03:35] petan|wk: --> #huggle [13:03:46] 2... [13:03:53] I am logged in to 3 [13:03:54] :P [13:04:04] whats my username? [13:04:08] Hydriz? 
[13:04:57] ah [13:04:59] no caps [13:05:38] zzz can't change password [13:05:48] yes [13:05:50] hydriz [13:05:53] on bots-2 [13:05:54] * Hydriz bangs the table [13:06:00] bots-sql2 [13:06:08] you don't need to ssh to sql server [13:06:17] true that [13:06:19] just do mysql from any application server [13:06:25] but just accessing to see the dbs [13:06:34] use php my admin for this [13:06:36] * Hydriz prays he can actually remember the password [13:06:40] petan|wk: do we need an instance for HuggleWA? [13:06:41] change it [13:06:45] yes [13:06:59] Who can create that? [13:07:06] me [13:07:10] yes, how to change it if I don't have access to the mysql db [13:07:16] Hydriz: you do [13:07:21] just login to bots-2 [13:07:34] ssh to any application server [13:07:45] then type mysql -h bots-sql2 -p -u hydriz [13:07:56] typ your pw [13:07:57] eh yes... [13:08:49] FAIL bots-4 doesn't have mysql :P [13:09:17] bots-2 is just too overloaded [13:09:26] bots-2 is a POS [13:09:30] * Hydriz just hopes to dominate the use of bots-4 in the future [13:11:25] ok, tell me whats POS [13:11:50] wtf I can't even use mysql in bots-2 [13:12:05] what [13:12:12] the sql servers are separate vm's [13:12:19] you can access sql from any [13:12:25] from bots-4 too [13:12:32] *facepalm* [13:12:49] I think I offended the instances or something [13:13:03] The program 'mysql' can be found in the following packages: [13:13:04] * mysql-client-core-5.1 [13:13:04] * mysql-client-5.0 [13:13:04] * mysql-cluster-client-5.1 [13:13:04] Ask your administrator to install one of them [13:16:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [13:16:52] try now [13:17:44] which server? [13:18:05] ok, you meant bots-4? [13:19:05] ERROR 1142 (42000): UPDATE command denied to user 'hydriz'@'i-000000e8.pmtpa.wmflabs' for table 'user' [13:19:08] petan|wk: sorry! 
[13:19:10] I fail at life [13:19:46] petan|wk: it was obscenely slow [13:19:51] and I thought rebooting might help [13:19:55] turned out, it didn't :p [13:20:34] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [13:21:22] :D [13:21:23] it's ok [13:21:47] petan|wk: It was freezing loading wiki-en.png [13:21:49] I wonder why [13:21:49] don't reboot all boxes at once [13:21:50] * werdna shrugs [13:21:55] :D [13:21:57] well in theory it should have worked fine [13:21:58] it's never a good idea [13:22:02] but yeah [13:22:08] should have read my email [13:22:12] ah, ok [13:22:20] which one [13:22:29] I probably didn't receive it [13:22:35] oh [13:22:42] the one about "don't reboot labs servers" [13:22:43] you mean your box :D [13:22:47] I thought you sent me one [13:22:49] ah well, it'll be sorted in 5 hours [13:23:00] at least we have het deploy on those boxes now p [13:23:07] hehe [13:23:14] <^demon> werdna: I'm not even on labs-l and I knew that...Ryan forwarded it to engineering. [13:23:21] true [13:23:23] yeah I just failed to read it properly [13:23:27] I'd read the email in the morning [13:23:29] everyone know [13:23:30] it was just out of my mind [13:23:34] even my cat [13:23:46] :P [13:23:47] and when it all failed to come back up I was like "shit, I remember reading something about this" [13:23:51] how could you not know [13:23:56] <^demon> petan|wk: You were about to reboot, and your cat said "no way?" [13:23:57] <^demon> Awesome [13:24:09] no, it slashed me in face [13:24:16] <^demon> Same effect :) [13:24:17] yeah, sorry :( [13:24:41] no problem [13:24:47] when I did that I gave up for the day and went to the pub [13:24:48] it's not production :) [13:24:51] I was like "ahhh" [13:24:54] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [13:24:55] that's enough for today [13:24:57] :D [13:24:59] haha [13:26:19] <^demon> Is it friday yet? 
[13:26:24] <^demon> *sigh* [13:26:26] here it's wednesday [13:26:29] as of half an hour ago [13:26:34] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [13:26:34] I want it to be friday [13:26:46] <^demon> Well, I want it to be saturday morning :) [13:26:46] potentially related to me organising a date for friday [13:26:49] <^demon> But I'll settle for friday [13:27:37] indeed [13:28:01] <^demon> Spring break. Long overdue vacation. [13:28:19] where are you going [13:28:38] did I tell you I have my DC flights ^demon [13:28:41] <^demon> Cruising in the virgin islands/barbados/puerto rico. [13:28:45] you should join me in NYC on the 8th [13:28:46] whoa [13:28:48] how exciting [13:28:54] sounds like a lot of vodka [13:30:10] <^demon> I'll probably be in DC starting the weekend before WM2012. Doubt I'll be able to make it up to NYC. [13:31:37] bah [13:31:43] heh [13:31:46] we'll have to hang out during wikimania [13:31:49] are you coming petan|wk [13:31:52] I hope [13:32:06] I want to get drunked in DC [13:32:09] :D [13:32:39] although the point of conference is probably something else [13:32:55] noep [13:32:58] that would be incorrect [13:33:04] :) [13:33:13] I need to check out what you drink in US [13:33:45] beer, mostly. not a lot of spirits [13:34:00] hm... I heard many things about your beer [13:34:10] * werdna is not american really, to clarify [13:34:10] :D [13:34:13] ah [13:34:14] <^demon> Um, we drink lots of spirits in the US. [13:34:16] <^demon> And beer :) [13:34:21] there's crap beer and there's good beer [13:34:21] <^demon> We're just generally drunks. [13:34:26] like here [13:34:30] <^demon> Crap beer = american beers [13:34:36] <^demon> Good beer = imported beers [13:34:42] no, there's some amazing microbrews in ca [13:34:43] petan|wk: huh? I'm a sysadmin of hugglewa? 
[13:34:47] I heard that our imported beer in US taste like american beer :D [13:34:50] <^demon> werdna: Few and far between :\ [13:34:57] IWorld: yes I gave you that [13:35:00] cool [13:35:05] ^demon: I need to show you tornados and pi bar in sf [13:35:30] <^demon> werdna: If you get a chance, you should come down to Richmond. We've got this place called Capitol Ale House. They've got a 6 page beer menu that changes seasonally. [13:35:36] ^demon: you aren't from sf? [13:35:36] petan|wk: what can sysadmin do? [13:35:43] IWorld: break stuff [13:35:48] <^demon> petan|wk: No, Virginia. Just south of DC. [13:35:49] reboot instances, etc [13:36:00] ah [13:36:04] ^demon: I'm in [13:36:14] I am wondering if there are some people who actually work for wmf and live in sf :D [13:36:31] because all people I know are like from somewhere else [13:36:33] <^demon> I see people in the office when I go out to SF, so I assume so :) [13:36:45] do they know what is wikipedia though? [13:36:55] I believe there are some people in office [13:36:56] petan|wk: what are NetAdmins? [13:36:58] question is who they are [13:37:10] IWorld: that can break even more [13:37:56] ah :) [13:38:26] petan|wk: where are you based, btw? [13:38:47] central europe [13:39:03] german is best place for me regarding meetups :) [13:39:11] so, austria? [13:39:17] no, czech republic [13:39:21] it's close [13:39:27] borders with austria [13:39:29] ah, I know czech republic [13:39:38] lot of beers [13:39:38] though I've only been to prague there [13:39:47] just a few :p [13:39:54] yes, I am mostly in prague as well :D [13:40:06] prague is a beautiful city [13:40:27] rest of the country... who cares [13:40:29] :D [13:40:46] I would love to visit some of the rest of cz [13:41:00] I don't think there is much to see [13:41:07] pilzn! 
[13:41:12] yes :) [13:41:51] they make the beer there then it's being drinked by prague people [13:42:24] in fact, we work in pragues and rest of country supply the beer [13:42:26] :D [13:42:36] easy [13:42:41] sounds like a good arrangement [13:47:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [13:50:34] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [13:55:54] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [13:56:34] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [14:14:46] @ petan|wk: --> #huggle [14:17:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [14:20:34] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [14:25:54] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [14:26:34] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [14:47:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [14:50:34] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [14:53:22] !log deployment-prep root: disabling bot for a while [14:53:23] Logged the message, Master [14:55:54] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [14:56:34] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [15:09:33] !log [15:09:33] Message missing. Nothing logged. [15:09:37] !log fggsgaa [15:09:37] Message missing. Nothing logged. 
[15:11:55] oh [15:12:02] nvm [15:17:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [15:20:20] !log bots petrb: ggsgsh [15:20:21] Logged the message, Master [15:20:26] yay [15:20:35] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [15:20:42] here we go [15:23:57] !log deployment-prep petrb: test [15:23:59] Logged the message, Master [15:25:55] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [15:26:35] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [15:29:22] Damianz: hey [15:29:39] how do I change a content of /etc/bash.bashrc on whole labs [15:31:18] Puppet? [15:31:21] I know [15:31:22] but how [15:31:33] I don't know how does it work [15:32:08] I created a system to log to sal [15:32:16] you just type log message in shell [15:32:28] I want to import it to all instances we have [15:32:40] That could get rather annoying [15:32:45] why [15:32:54] it's much easier to use [15:32:58] actually I like it :D [15:33:01] When it starts spitting out private data that happens to contain the word log [15:33:13] huh? [15:33:33] I don't get it [15:33:50] it's a shell command called log [15:34:04] you type log the message is here [15:34:06] example [15:35:02] !log bots petrb: this is a test :o [15:35:03] Logged the message, Master [15:35:16] I typed "log this is a test :o" [15:35:18] in shell [15:35:41] what's a problem with that [15:36:02] So what happens if I have a bash function or executable called log already or I type something starting with log and it ends up in here when it's not suppose to? [15:36:27] what if you accidentaly type sudo rm -rf /... ? [15:36:54] it's your mistake in that case, you need to be carefull when you use shell [15:37:30] how in the world you typed "log private data" as a command to shell by accident? 
[15:37:37] you could just mistake it with wall [15:37:39] But you're suggesting making a change to someones environment which could potentially leak information [15:37:42] which has similar efect [15:37:57] wall is limited and exists already [15:38:04] ok we can make it opt-in [15:38:10] class in console [15:38:16] if you check it you will have it [15:38:28] it's extremely useful [15:38:41] so I want to use it on bots and I already installed it to deployment-prep [15:38:41] More prefrable or as homedirs are shared betwean instances just make people stick a line in .bashrc [15:39:09] that's too complicated [15:39:20] I would rather have an option in console [15:39:27] so that creator of instance can install it or not [15:40:02] btw Reedy said they had this in prod so I don't think it's insecure [15:40:14] they also log to SAL using shell cmd [15:46:06] beta.wmflabs.org surver down? [15:46:32] yes [15:46:40] well servers [15:47:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [15:49:34] Damianz: what if I insert "do you want to log this? y/n" [15:49:44] then it wouldn't leak if it was called as mistaky [15:50:00] ? [15:50:11] what you think of that [15:50:35] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [15:50:52] Probably configuration, for the beta cluster having it globally makes sense, for others it doesn't really. 
[15:50:59] hm [15:51:06] I think it should be in puppet though [15:51:22] some people might want to use it [15:51:50] because I noticed that it's not just a beta cluster where people don't log, it's even on bots [15:51:59] it's everywhere [15:52:12] it's too annoying to move to another window [15:52:17] when you are in shell [15:55:55] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [15:56:35] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [16:17:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [16:19:55] petan|wk: I might be getting a bit confused with how translatewiki does it [16:19:58] certainly one of htem does it [16:20:35] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [16:25:29] I don't believe it should matter [16:25:43] programs which randomly calls non existent command are broken [16:25:55] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [16:26:07] I dont'see any problem in it [16:26:35] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [16:47:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [16:50:35] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [16:55:55] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [16:56:35] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [17:02:20] re [17:12:05] PROBLEM Free ram is now: WARNING on bots-3 bots-3 output: Warning: 19% free memory [17:17:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [17:20:35] PROBLEM 
host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [17:25:55] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [17:26:35] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [17:47:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [17:50:35] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [17:55:55] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [17:56:35] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [17:58:35] ok. I'm getting started [18:05:30] PROBLEM Current Load is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:05] PROBLEM Disk Space is now: CRITICAL on dumps-4 dumps-4 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:05] PROBLEM Current Users is now: CRITICAL on dumps-4 dumps-4 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:05] PROBLEM Free ram is now: CRITICAL on dumps-4 dumps-4 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:40] PROBLEM Disk Space is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:40] PROBLEM Current Users is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:40] PROBLEM Free ram is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:40] PROBLEM Current Load is now: CRITICAL on swarm-specialpage swarm-specialpage output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:40] PROBLEM Disk Space is now: CRITICAL on testblog testblog output: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:06:41] PROBLEM Current Load is now: CRITICAL on testblog testblog output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:41] PROBLEM dpkg-check is now: CRITICAL on dumps-1 dumps-1 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:41] PROBLEM dpkg-check is now: CRITICAL on dumps-2 dumps-2 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:42] PROBLEM dpkg-check is now: CRITICAL on swarm-specialpage swarm-specialpage output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:43] PROBLEM dpkg-check is now: CRITICAL on testblog testblog output: CHECK_NRPE: Socket timeout after 10 seconds. [18:06:43] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: (Service Check Timed Out) [22:09:06] !log deployment-prep petrb: updating svn [22:09:07] Logged the message, Master [22:09:30] Ryan_Lane: we can log to sal from shell now [22:09:39] * Ryan_Lane nods [22:09:39] just it's needed to put it to puppet [22:09:47] some of the instances are still haging issues [22:09:51] I know [22:09:55] can you name the bot something else? :) [22:10:01] yes :) [22:10:03] heh [22:10:24] also, squid fails to start on deployment-squid on reboot [22:10:29] I fixed it [22:10:37] using /mnt now? [22:10:47] it was mounted to /var/spool before [22:10:51] * Ryan_Lane nods [22:11:38] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [22:12:30] o.o [22:13:23] ok [22:13:42] cool [22:14:01] it's listening on bots-labs tcp [22:14:12] you just need to send it a message from instance you work on [22:14:18] ah. cool [22:14:24] I made a script for that but it's only on deployment-prep [22:14:30] it'll figure out the project automatically? [22:14:34] it needs to know what is name of project [22:14:44] it needs to be in environment, it's not now [22:14:47] !realm [22:14:47] $realm is a variable used in puppet to determine which cluster a system is in. See also $site. 
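Editor's note: the shell-to-SAL logging helper described above (a command typed on an instance, forwarded over TCP to a bot on bots-labs, which then posts a `!log <project> <user>: <message>` line) could look roughly like the sketch below. Only the host name `bots-labs`, the use of TCP, the need for the project name in the environment, and the "y/n" confirmation idea come from the discussion. The port number, the `INSTANCEPROJECT` variable name, and the wire format (a single `!log ...` line) are assumptions made for illustration, not the actual script.

```python
import getpass
import os
import socket

# Assumptions, not taken from the log: the port number, the environment
# variable name, and the idea that the bot accepts a plain "!log ..." line.
BOT_HOST = "bots-labs"           # host named in the discussion
BOT_PORT = 9999                  # hypothetical placeholder port
PROJECT_ENV = "INSTANCEPROJECT"  # hypothetical environment variable


def build_log_line(project, user, message):
    """Format a line such as '!log deployment-prep petrb: updating svn'."""
    message = " ".join(message.split())  # collapse stray whitespace
    if not project or not message:
        raise ValueError("project and message are both required")
    return f"!log {project} {user}: {message}"


def send_log(line, host=BOT_HOST, port=BOT_PORT, timeout=5.0):
    """Send one newline-terminated log line to the bot over TCP."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8") + b"\n")


def main(argv):
    """Build the line from the environment and ask before sending it."""
    project = os.environ.get(PROJECT_ENV, "")
    line = build_log_line(project, getpass.getuser(), " ".join(argv))
    # Confirmation step suggested in the channel, so that an accidental
    # invocation does not leak anything into the public log.
    answer = input(f"Do you want to log this? {line!r} y/n ")
    if answer.strip().lower() == "y":
        send_log(line)
        return True
    return False
```

Calling `main(["updating", "svn"])` on an instance where the project variable is set would prompt once and, on "y", send the formatted line. Keeping the formatting in its own function means the message can be checked without any network access.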
[22:14:51] hm [22:15:08] RECOVERY host: bots-4 is UP address: bots-4 PING OK - Packet loss = 0%, RTA = 0.62 ms [22:15:47] !instanceproject is $instanceproject is a variable used in puppet to determine which project an instance is in. [22:15:47] Key was added! [22:16:08] http://pastebin.mozilla.org/1505613 [22:16:09] !puppet-variables is see: !realm, !site, !instanceproject [22:16:09] Key was added! [22:16:18] I need a variable in shell [22:16:21] not in puppet [22:16:25] hm [22:16:28] PROBLEM host: bob is DOWN address: bob CRITICAL - Host Unreachable (bob) [22:16:31] but I could use puppet too [22:16:37] I need to make those facter variables [22:16:38] PROBLEM host: bots-sql3 is DOWN address: bots-sql3 CRITICAL - Host Unreachable (bots-sql3) [22:16:39] it would need to change the content of script [22:16:46] then you could use `facter instanceproject` [22:17:38] PROBLEM host: hugglewiki is DOWN address: hugglewiki CRITICAL - Host Unreachable (hugglewiki) [22:17:38] PROBLEM host: deployment-backup is DOWN address: deployment-backup CRITICAL - Host Unreachable (deployment-backup) [22:17:38] PROBLEM host: incubator-nfs is DOWN address: incubator-nfs CRITICAL - Host Unreachable (incubator-nfs) [22:17:38] PROBLEM host: ganglia-master is DOWN address: ganglia-master CRITICAL - Host Unreachable (ganglia-master) [22:17:38] PROBLEM host: incubator-dep is DOWN address: incubator-dep CRITICAL - Host Unreachable (incubator-dep) [22:17:39] PROBLEM host: mobile-enwp is DOWN address: mobile-enwp CRITICAL - Host Unreachable (mobile-enwp) [22:17:39] PROBLEM host: memcache-puppet is DOWN address: memcache-puppet CRITICAL - Host Unreachable (memcache-puppet) [22:18:28] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [22:18:28] PROBLEM host: p-b is DOWN address: p-b CRITICAL - Host Unreachable (p-b) [22:18:28] PROBLEM host: secondinstance is DOWN address: secondinstance CRITICAL - Host Unreachable (secondinstance) [22:18:28] PROBLEM host: test3 is DOWN 
address: test3 CRITICAL - Host Unreachable (test3) [22:18:28] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [22:18:58] PROBLEM host: wep is DOWN address: wep CRITICAL - Host Unreachable (wep) [22:21:26] !log deployment-prep petrb: some instances will need to reboot, however site seems to be ok now [22:21:27] Logged the message, Master [22:22:01] I'm fixing the ones that need reboots [22:22:06] ok [22:22:06] RECOVERY host: bots-sql3 is UP address: bots-sql3 PING OK - Packet loss = 0%, RTA = 0.67 ms [22:22:06] it's thanks to broken live migrations [22:22:54] ahhhhh what's the deal with wmflabs being down - I can't do my stuffz now :/ [22:23:04] which instance of yours is down? [22:23:25] are you on the labs-l list? [22:23:33] if not, you should subscrib [22:23:41] there is a site notice I made :D [22:23:49] everyone should know that it's down [22:23:53] that was also really helpful too :) [22:24:43] JeroenDeDauw: ?? [22:25:32] Ryan_Lane: I can't reach wmflabs.org [22:25:44] because it isn't a real domain [22:25:56] RECOVERY host: deployment-backup is UP address: deployment-backup PING OK - Packet loss = 0%, RTA = 3.27 ms [22:25:56] are you trying to go to labsconsole.wikimedia.org? [22:26:08] or beta.wmflabs.org? [22:26:17] or some other subdomain? [22:26:26] RECOVERY host: p-b is UP address: p-b PING OK - Packet loss = 0%, RTA = 1.00 ms [22:27:16] Ryan_Lane: education.wmflabs [22:27:45] which instance is its IP associated with? 
[22:28:43] * Ryan_Lane sighs [22:28:48] JeroenDeDauw: ^^ [22:28:56] Ryan_Lane: https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000000c2 [22:29:40] that instance looks up to me [22:29:46] RECOVERY host: hugglewiki is UP address: hugglewiki PING OK - Packet loss = 0%, RTA = 1.99 ms [22:30:00] it isn't, though [22:30:04] JeroenDeDauw: reboot it [22:30:29] * JeroenDeDauw clicks reboot [22:30:59] windows factor just increased by some points [22:31:15] seriously, subscribe to labs-l [22:32:27] your instance was one of the ones that was failing to boot. I've now fixed it [22:32:32] it'll come up soon [22:32:56] it's up [22:33:44] RECOVERY host: wep is UP address: wep PING OK - Packet loss = 0%, RTA = 0.57 ms [22:34:54] RECOVERY host: mobile-enwp is UP address: mobile-enwp PING OK - Packet loss = 0%, RTA = 3.67 ms [22:36:34] RECOVERY host: bob is UP address: bob PING OK - Packet loss = 0%, RTA = 0.82 ms [22:37:44] RECOVERY host: memcache-puppet is UP address: memcache-puppet PING OK - Packet loss = 0%, RTA = 1.07 ms [22:39:12] Wait, we have a host called bob? [22:39:14] Oh, labs [22:39:16] heh [22:40:24] RECOVERY host: incubator-dep is UP address: incubator-dep PING OK - Packet loss = 0%, RTA = 0.95 ms [22:42:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [22:42:44] RECOVERY host: incubator-nfs is UP address: incubator-nfs PING OK - Packet loss = 0%, RTA = 3.51 ms [22:43:34] RECOVERY host: pad2 is UP address: pad2 PING OK - Packet loss = 0%, RTA = 1.31 ms [22:43:54] RECOVERY host: secondinstance is UP address: secondinstance PING OK - Packet loss = 0%, RTA = 0.84 ms [22:45:54] RECOVERY host: test3 is UP address: test3 PING OK - Packet loss = 0%, RTA = 3.51 ms [22:47:44] PROBLEM host: ganglia-master is DOWN address: ganglia-master CRITICAL - Host Unreachable (ganglia-master) [22:48:34] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [22:50:00] hm. 
ganglia-master seems broken [22:51:36] it's a lucid image. not sure why it won't boot completely [22:59:04] RECOVERY host: testing-puppet is UP address: testing-puppet PING OK - Packet loss = 0%, RTA = 1.01 ms [23:01:06] and again driver-dev still won't come up [23:01:21] it must be due to a bad image being used [23:01:47] oh well. everything is back to normal, now [23:03:22] instances are creating properly [23:05:06] ok. back to project storage :) [23:05:35] * Damianz wonders if Ryan_Lane broke life [23:05:49] everything should be working now [23:05:56] actually, things should be working better than before [23:06:13] I applied security updates, and fixed all the instances that were broken thanks to broken live migrations [23:06:16] Did we sacrifice petan to sexi? [23:06:47] Awesome the bots came back up ok :D [23:06:58] Saw they went down around 6ish [23:08:04] * Damianz gives Ryan_Lane a cookie then takes it off him remembering that he broke it in the first place :) [23:08:14] heh [23:12:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [23:13:04] Ryan_Lane: The CVN is considering to switch their main stuff to Labs which means we'd basically be putting 3 things on it. 1) a wiki, 2) some php tools (a wikitable generator and a basic JSON API) with a mysql db 3) about a dozen IRC bots that is the heart of #cvn-commons, #cvn-meta and all. [23:13:18] that's fine [23:13:38] Would it be possible to have the wiki and the tools accessible from countervandalism.net ? [23:13:55] PROBLEM Current Load is now: CRITICAL on testing-creation testing-creation output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:14:01] if you guys own the domain you can point it to the public IP [23:14:06] right [23:14:29] Did you ever get started with the nginx proxy setup? [23:14:33] nope [23:14:35] PROBLEM Current Users is now: CRITICAL on testing-creation testing-creation output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[23:14:35] ok, well one step at a time :) But good to know it's possible. [23:14:39] Thanks [23:15:15] PROBLEM Disk Space is now: CRITICAL on testing-creation testing-creation output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:16:05] PROBLEM Free ram is now: CRITICAL on testing-creation testing-creation output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:17:25] PROBLEM Total Processes is now: CRITICAL on testing-creation testing-creation output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:17:45] PROBLEM host: ganglia-master is DOWN address: ganglia-master CRITICAL - Host Unreachable (ganglia-master) [23:18:55] RECOVERY Current Load is now: OK on testing-creation testing-creation output: OK - load average: 0.41, 0.28, 0.23 [23:19:05] andrewbogott: seems I can't recover driver-dev [23:19:19] andrewbogott: I think it's due to kernel upgrades in the image [23:19:30] Ryan_Lane: That's ok, you already salvaged everything that I needed, as best I can tell. [23:19:32] maybe I should try a newer oneiric image for this [23:19:35] RECOVERY Current Users is now: OK on testing-creation testing-creation output: USERS OK - 1 users currently logged in [23:19:38] Although it's unsettling. [23:20:15] RECOVERY Disk Space is now: OK on testing-creation testing-creation output: DISK OK [23:21:05] RECOVERY Free ram is now: OK on testing-creation testing-creation output: OK: 92% free memory [23:21:06] I'm pretty sure the image being used isn't ok [23:21:12] I'm going to add a new one to glance [23:22:25] RECOVERY Total Processes is now: OK on testing-creation testing-creation output: PROCS OK: 85 processes [23:22:53] * Damianz stretches out over the channel [23:26:18] andrewbogott: actually, it's weird [23:26:32] a number of the other instances with the same image work fine [23:26:59] yeah, I don't think I did anything in driver-dev that I didn't do with driver-dev-jumbo, other than the exact moment when I rebooted it. 
[23:27:39] Maybe nova was just in a bad mood. [23:27:44] I bet somehow unattended upgrades installed a new kernel [23:27:51] I'm going to try it explicitly on another node [23:31:06] I swear the etherpad database was designed by someone who had no understanding of databases. [23:31:14] yeah. very likely [23:33:55] PROBLEM Current Load is now: CRITICAL on test-oneiric test-oneiric output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:34:35] PROBLEM Current Users is now: CRITICAL on test-oneiric test-oneiric output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:34:48] so, going to dist-upgrade that instance and see if it boots [23:35:15] PROBLEM Disk Space is now: CRITICAL on test-oneiric test-oneiric output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:36:05] PROBLEM Free ram is now: CRITICAL on test-oneiric test-oneiric output: Connection refused by host [23:37:25] PROBLEM Total Processes is now: CRITICAL on test-oneiric test-oneiric output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:37:54] well, *that* isn't the issue [23:37:58] that works perfectly fine [23:38:01] wtf. [23:38:15] PROBLEM dpkg-check is now: CRITICAL on test-oneiric test-oneiric output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[23:38:23] ^ Perfactly [23:38:58] the nrpe problem is separate from this ;) [23:39:22] force running puppet fixes the issue [23:40:15] RECOVERY Disk Space is now: OK on test-oneiric test-oneiric output: DISK OK [23:41:05] RECOVERY Free ram is now: OK on test-oneiric test-oneiric output: OK: 94% free memory [23:42:25] RECOVERY Total Processes is now: OK on test-oneiric test-oneiric output: PROCS OK: 76 processes [23:42:35] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [23:43:55] RECOVERY Current Load is now: OK on test-oneiric test-oneiric output: OK - load average: 0.01, 0.04, 0.03 [23:44:35] RECOVERY Current Users is now: OK on test-oneiric test-oneiric output: USERS OK - 0 users currently logged in