[00:18:24] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [00:48:24] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [01:18:24] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [01:48:24] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [02:18:24] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [02:41:00] RECOVERY Disk Space is now: OK on swift-be1 swift-be1 output: DISK OK [02:41:00] RECOVERY Free ram is now: OK on swift-be1 swift-be1 output: OK: 87% free memory [02:42:50] RECOVERY Total Processes is now: OK on swift-be1 swift-be1 output: PROCS OK: 81 processes [02:43:30] RECOVERY dpkg-check is now: OK on swift-be1 swift-be1 output: All packages OK [02:43:30] RECOVERY Current Load is now: OK on swift-be1 swift-be1 output: OK - load average: 0.01, 0.04, 0.00 [02:44:50] RECOVERY Current Users is now: OK on swift-be1 swift-be1 output: USERS OK - 0 users currently logged in [02:48:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [03:06:00] RECOVERY Disk Space is now: OK on swift-be2 swift-be2 output: DISK OK [03:06:00] RECOVERY Free ram is now: OK on swift-be2 swift-be2 output: OK: 87% free memory [03:07:50] RECOVERY Total Processes is now: OK on swift-be2 swift-be2 output: PROCS OK: 81 processes [03:08:30] RECOVERY dpkg-check is now: OK on swift-be2 swift-be2 output: All packages OK [03:08:30] RECOVERY Current Load is now: OK on swift-be2 swift-be2 output: OK - load average: 0.02, 0.06, 0.03 [03:09:50] RECOVERY Current Users is now: OK on swift-be2 swift-be2 output: USERS OK - 0 users currently logged in [03:18:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [03:34:50] RECOVERY Free ram is now: OK on swift-be4 swift-be4 output: OK: 88% free 
memory [03:34:50] RECOVERY Current Load is now: OK on swift-be4 swift-be4 output: OK - load average: 0.30, 0.29, 0.11 [03:34:50] RECOVERY Current Users is now: OK on swift-be4 swift-be4 output: USERS OK - 0 users currently logged in [03:36:00] RECOVERY Total Processes is now: OK on swift-be4 swift-be4 output: PROCS OK: 81 processes [03:36:05] RECOVERY Disk Space is now: OK on swift-be4 swift-be4 output: DISK OK [03:37:50] RECOVERY dpkg-check is now: OK on swift-be4 swift-be4 output: All packages OK [03:48:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [04:18:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [04:48:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [05:12:56] "The topic for #wikimedia-labs is: Status: down. Having glusterfs issues." is it still down? [05:18:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [05:47:29] heh [05:47:31] whoops [05:47:35] my client doesn't show the topic [05:48:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [05:51:30] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [06:00:18] Ryan_Lane: but I'm still unable to delete security groups [06:00:31] oh? what error are you getting? [06:00:38] and from what group in which project? [06:00:48] the labs downtime was unrelated to labsconsole [06:01:06] Ryan_Lane: "Failed to delete security group. " [06:01:14] deleting "web" from "category-sorting" [06:01:28] is it being used on an instance? [06:01:48] I haven't create an instance yet [06:01:57] ok. 
let me look [06:02:31] 03/22/2012 - 06:02:31 - Creating a home directory for laner at /export/home/category-sorting/laner [06:03:25] liangent: try again for me [06:03:31] I'm watching the log [06:03:31] 03/22/2012 - 06:03:31 - Updating keys for laner [06:04:40] Ryan_Lane: got it? [06:05:00] * Ryan_Lane grumbles [06:05:04] gimme a min [06:05:19] stupid library I'm using for the api changed slightly over time [06:05:28] and I keep tracking down little issues here and there [06:07:43] liangent: try now [06:10:54] Ryan_Lane: seems fixed. thanks [06:10:58] yw [06:18:30] PROBLEM host: essex-4 is DOWN address: essex-4 check_ping: Invalid hostname/address - essex-4 [06:22:30] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [06:44:18] PROBLEM Current Load is now: CRITICAL on catsort-pub catsort-pub output: Connection refused by host [06:44:58] PROBLEM Current Users is now: CRITICAL on catsort-pub catsort-pub output: Connection refused by host [06:45:38] PROBLEM Disk Space is now: CRITICAL on catsort-pub catsort-pub output: Connection refused by host [06:46:28] PROBLEM Free ram is now: CRITICAL on catsort-pub catsort-pub output: Connection refused by host [06:47:48] PROBLEM Total Processes is now: CRITICAL on catsort-pub catsort-pub output: Connection refused by host [06:48:38] PROBLEM dpkg-check is now: CRITICAL on catsort-pub catsort-pub output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[06:52:49] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [07:00:58] RECOVERY Disk Space is now: OK on aggregator1 aggregator1 output: DISK OK [07:23:28] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [07:50:18] PROBLEM host: deployment-web4 is DOWN address: deployment-web4 CRITICAL - Host Unreachable (deployment-web4) [07:53:28] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [08:20:49] PROBLEM host: deployment-web4 is DOWN address: deployment-web4 CRITICAL - Host Unreachable (deployment-web4) [08:23:29] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [08:49:34] Ryan_Lane: Top of the morning [08:49:41] hello [08:49:50] * Ryan_Lane is going to sleep really, really soon [08:49:55] ok [08:50:15] I can't access my labs [08:50:28] is that a known issues [08:50:49] PROBLEM host: deployment-web4 is DOWN address: deployment-web4 CRITICAL - Host Unreachable (deployment-web4) [08:50:58] I see lots of problems [08:51:39] which instances? [08:51:45] in which projects? [08:51:57] sec [08:52:14] are you subscribed to labs-l? [08:52:17] if not, please do [08:53:29] PROBLEM host: deployment-web is DOWN address: deployment-web CRITICAL - Host Unreachable (deployment-web) [08:54:56] inst dumpster01 in search [08:55:10] is that the only one? [08:55:34] and in deployment [08:55:45] which one in deployment? [08:55:46] deployment-mw-search [08:55:54] deployment-mwsearch [08:55:55] ok. gimme a sec [08:56:04] thanks [08:57:14] hm. dumpster01 doesn't seem to want to boot [08:58:08] ok. 
let me check the other [08:58:16] I'll need to debug dumpster01 later [08:59:39] I don't see an instance named deployment-mwsearch [09:00:42] there's no instance named that [09:00:44] OrenDsk: ^^ [09:01:00] ah [09:01:02] wmsearch [09:01:47] OrenDsk: that instance is up [09:01:48] sec [09:01:54] deployment-wmsearch [09:02:07] dumpster01 has some other issue. I'll need to debug it tomorrow [09:02:11] ok [09:02:20] perhaps I'll set up dumpster again [09:02:27] for get about it [09:02:54] I got new code to put in it anyhow [09:03:06] thanks [09:03:13] yw [09:17:49] RECOVERY SSH is now: OK on deployment-web deployment-web output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [09:17:49] RECOVERY Current Users is now: OK on deployment-web deployment-web output: USERS OK - 0 users currently logged in [09:17:59] RECOVERY host: deployment-web4 is UP address: deployment-web4 PING OK - Packet loss = 0%, RTA = 0.72 ms [09:17:59] RECOVERY host: deployment-web is UP address: deployment-web PING OK - Packet loss = 0%, RTA = 2.57 ms [09:18:29] RECOVERY Free ram is now: OK on deployment-web deployment-web output: OK: 91% free memory [09:19:59] RECOVERY Current Load is now: OK on deployment-web deployment-web output: OK - load average: 0.15, 0.09, 0.03 [09:19:59] RECOVERY Total Processes is now: OK on deployment-web deployment-web output: PROCS OK: 107 processes [09:21:09] RECOVERY Disk Space is now: OK on deployment-web deployment-web output: DISK OK [09:21:09] RECOVERY dpkg-check is now: OK on deployment-web deployment-web output: All packages OK [09:36:08] Ryan_Lane: are you sleeping ;-) [09:37:14] jeremyb: boo [09:37:21] jeremyb: ping [10:13:11] Ryan_Lane: hi [10:20:13] Damianz: here? [10:20:24] I need a help with puppet a bit [10:21:16] Not that good with puppet tbh. [10:22:04] I can't even checkout the labs repo [10:22:17] I need to modify nrpe [10:22:26] but what I get is some other file [10:22:29] probably from prod [10:22:40] how do you read the repo? 
[10:22:53] or how do I switch to labs [10:24:58] git checkout test && git rebase master? [10:25:04] PROBLEM Puppet is now: CRITICAL on deployment-web2 deployment-web2 output: NRPE: Command check_puppet not defined [10:28:12] trying [10:28:17] why isn't it in the docs? [10:29:16] petanb@server:~/prod/blah/puppet$ git rebase master [10:29:16] fatal: Needed a single revision [10:29:16] invalid upstream master [10:29:41] hmm maybe doesn't need rebasing [10:29:50] ok but I see files which aren't labs there [10:29:58] configs are surely from prod [10:30:19] try git rebase test? [10:30:24] I need to open templates/nrpe_local.cfg.erb [10:30:44] We kinda abuse branches because they aren't branches anymore =/ [10:30:59] :D [10:31:00] that's it [10:31:02] thanks [10:34:12] New patchset: Petrb; "inseted a new check for puppet freshness" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3376 [10:34:23] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3376 [10:39:42] New patchset: Petrb; "inseted a new check for puppet freshness" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3376 [10:39:54] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3376 [10:41:33] New patchset: Petrb; "inseted a new check for puppet freshness" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3376 [10:41:45] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3376 [10:43:26] New patchset: Petrb; "inseted a new check for puppet freshness" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3376 [10:43:38] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3376 [10:44:40] ok, can someone merge it :D [10:44:49] or review it [10:48:49] This is rather than having it snmptrap?
[10:49:02] that's not going to work on labs [10:49:07] because of firewall [10:49:21] there is an snmp version for prod [10:49:31] which has hardcoded ip of production [10:49:35] New review: DamianZaremba; "(no comment)" [operations/puppet] (test) C: 1; - https://gerrit.wikimedia.org/r/3376 [10:50:04] PROBLEM Puppet is now: WARNING on deployment-web2 deployment-web2 output: NRPE: Unable to read output [10:50:25] wondering if nagios can do sudo [10:50:48] I wanted to use 4755 chmod but it doesn't work because of the way we mount stuff [10:51:22] Not sure, I get confused between work nagios and labs nagios and my nagios :P [10:51:33] heh [10:51:48] I would rather make puppet file readable by all [10:51:53] no reason to make it only for root [10:51:59] especially when all users have root [11:05:04] PROBLEM Puppet is now: CRITICAL on deployment-web2 deployment-web2 output: NRPE: Command check_puppet not defined [11:36:07] https://labsconsole.wikimedia.org/wiki/Help:Access#Using_agent_forwarding - it seems I can connect to bastion with a simple "ssh bastion.wmflabs.org" [11:36:59] You should be able to. [11:37:15] You won't be able to connect to any of the instances though -- well unless you have a key on bastion that's allowed. [11:37:28] Damianz: but why does the example have so many lines? [11:37:34] eval `ssh-agent`; ssh-add; ssh -A @bastion.wmflabs.org [11:37:41] Yeah [11:37:52] ssh-agent and ssh-add set up and add your key to an ssh agent locally [11:37:56] -A forwards the agent [11:38:04] So you can proxy through bastion without having any keys on it. [11:38:07] Nice for security. [11:38:41] You could also drop proxy stuff into ~/.ssh/config and ssh 'directly' to the instance hostname and have your ssh client do all the cool proxying stuff through bastion for you. [11:41:36] "ssh: Could not resolve hostname bastion2.wmflabs.org: Name or service not known" [11:41:52] bastion2.wmflabs.org is given in the example config file [11:56:13] <^demon> Ryan_Lane: You around?
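The suggestion above about putting the proxy setup into the ssh client config can be sketched as follows. This is a minimal illustration, not the configuration from the labsconsole help page: the user name `exampleuser` and the host pattern `*.pmtpa.wmflabs` are assumptions chosen for the example, and `ssh -W` requires OpenSSH 5.4 or later. With a ProxyCommand the key never has to be on the bastion, and agent forwarding is not needed for this path.

```python
def bastion_proxy_stanza(host_pattern: str, bastion: str, user: str) -> str:
    """Render an OpenSSH client config stanza that reaches hosts matching
    host_pattern by tunnelling through the bastion host.

    All argument values used below are illustrative assumptions.
    """
    lines = [
        f"Host {host_pattern}",
        f"    User {user}",
        # -W forwards stdin/stdout to the target host and port via the bastion;
        # -a keeps agent forwarding off for the hop to the bastion.
        f"    ProxyCommand ssh -a -W %h:%p {user}@{bastion}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a stanza that could be appended to ~/.ssh/config by hand.
    print(bastion_proxy_stanza("*.pmtpa.wmflabs", "bastion.wmflabs.org", "exampleuser"))
```

The host pattern must not match the bastion's own name, otherwise the client would try to proxy the bastion connection through itself.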
[11:58:30] !accountreq | IWorld [11:58:30] IWorld: in case you want to have an account on labs, please contact someone who is in charge of doing that: Ryan.Lane, m.utante or ssmolle.tt [11:58:45] mh [12:00:27] what can I do if I want to have apache running on my instance [12:01:00] !accountreq is look at this page: https://www.mediawiki.org/wiki/Project:Labsconsole_accounts [12:01:01] Key exist! [12:01:05] oh [12:01:09] !accountreq del [12:01:10] Successfully removed accountreq [12:01:13] !accountreq is look at this page: https://www.mediawiki.org/wiki/Project:Labsconsole_accounts [12:01:13] Key was added! [12:01:19] !accountreq | IWorld [12:01:19] IWorld: look at this page: https://www.mediawiki.org/wiki/Project:Labsconsole_accounts [12:01:22] ok [12:03:34] seems selecting items from Special:NovaInstance&action=configure doesn't work [12:05:19] hmm [12:26:31] RECOVERY Free ram is now: OK on catsort-pub catsort-pub output: OK: 92% free memory [12:27:51] RECOVERY Total Processes is now: OK on catsort-pub catsort-pub output: PROCS OK: 80 processes [12:28:41] RECOVERY dpkg-check is now: OK on catsort-pub catsort-pub output: All packages OK [12:29:41] RECOVERY Current Load is now: OK on catsort-pub catsort-pub output: OK - load average: 0.02, 0.07, 0.03 [12:30:01] RECOVERY Current Users is now: OK on catsort-pub catsort-pub output: USERS OK - 0 users currently logged in [12:30:41] RECOVERY Disk Space is now: OK on catsort-pub catsort-pub output: DISK OK [13:03:05] 03/22/2012 - 13:03:05 - Updating keys for maxsem [13:03:11] 03/22/2012 - 13:03:10 - Updating keys for maxsem [13:03:13] 03/22/2012 - 13:03:12 - Updating keys for maxsem [13:03:16] 03/22/2012 - 13:03:16 - Updating keys for maxsem [13:22:43] how can I share/copy from a dir in one lab to another [14:00:12] 03/22/2012 - 14:00:11 - Updating keys for maxsem [14:00:15] 03/22/2012 - 14:00:15 - Updating keys for maxsem [14:01:05] 03/22/2012 - 14:01:05 - Updating keys for maxsem [14:01:10] 03/22/2012 - 14:01:09 - 
Updating keys for maxsem [14:01:11] 03/22/2012 - 14:01:11 - Updating keys for maxsem [14:01:14] 03/22/2012 - 14:01:14 - Updating keys for maxsem [14:02:05] 03/22/2012 - 14:02:05 - Updating keys for maxsem [14:02:11] 03/22/2012 - 14:02:10 - Updating keys for maxsem [14:02:13] 03/22/2012 - 14:02:12 - Updating keys for maxsem [14:02:16] 03/22/2012 - 14:02:16 - Updating keys for maxsem [14:22:34] OrenOf: hi [14:50:31] hii [15:56:36] PROBLEM Puppet check is now: CRITICAL on deployment-web2 deployment-web2 output: NRPE: Command check_puppet not defined [16:24:26] is it possible to assign a domain name to an instance ? [16:24:46] or rather - how it it done [16:25:11] You need a public ip, ask Ryan. [16:25:19] Some projects like bots have apache servers for that purpose [16:25:35] Eventually we'll have a proxy so you can just get it 'routed' to an instance. [16:28:33] OrenOf: rather than copying between instances, use the shared storage [16:28:39] OrenOf: /data/project [16:28:47] OrenOf: you really need to subscribe to labs-l [16:29:22] <^demon> Ah yay Ryan_Lane is back :) [16:29:27] ish [16:29:31] I'm about to head off to the office [16:29:48] <^demon> Ah, we can talk later then :) [16:29:52] Ryan_Lane: good morning [16:29:56] quick someone gaffatape him to a chair! [16:30:01] good morning [16:30:12] Ryan_Lane: labs-l should be renamed to notification-of-ryan-breakage :D [16:30:15] I will subscribe to this list as well [16:30:22] heh [16:30:27] well, last time I didn't break it ;) [16:30:34] gluster broke itself [16:30:48] It's just that I already get about 4 lists and can't realy follow so much stuff [16:31:02] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 0 C: 0; - https://gerrit.wikimedia.org/r/3376 [16:31:32] ^demon: when people request features in labsconsole, you should tell them to open a bug for the openstackmanager extension ;) [16:31:45] OrenOf: labs-l is fairly low volume [16:32:02] <^demon> I suppose I could do that. 
[16:32:04] it's mostly announcements of things that are available now, or of outages [16:32:13] You should subscribe to the kernel driver list :P That's like ARGHAGHAGH trying to keep up. [16:32:18] hahaha [16:33:25] I subscribe to labs-l in digest mode. It's not too high-traffic. [16:34:01] <^demon> Ryan_Lane: So yeah what I wanna talk about later is these hook scripts a guy wrote to "automatically update the parent repo when a submodule changes" [16:34:19] <^demon> Since gerrit doesn't support that automagically yet and I need a way to keep the extensions.git repo in sync with the various HEADs [16:34:27] update the parent repo? [16:34:37] in which way does it do so? [16:34:54] Ryan_Lane: can you/I assign my instances externally visible domain names [16:35:07] at least the solr [16:35:08] OrenOf: you need a public IP? [16:35:11] <^demon> On change-merged, it looks in the config file to see if there's any configured repos using it as a submodule. If so, it updates that "parent" repo to point to the new HEAD. [16:35:16] PROBLEM Disk Space is now: WARNING on wikistream-1 wikistream-1 output: DISK WARNING - free space: / 78 MB (5% inode=47%): [16:35:22] OrenOf: what will it be used for? [16:35:51] to remote debug using eclipse [16:35:54] ^demon: this needs to happen on the server side? [16:35:55] ah [16:35:56] <^demon> Ryan_Lane: So in our case, mediawiki/extensions.git will be updated when you push to mediawiki/extensions/LdapAuthentication.git [16:35:58] <^demon> Yep [16:36:18] OrenOf: ok. which project is this? [16:36:28] ^demon: ah ok [16:36:40] ^demon: push it into gerrit and I'll review it ;) [16:36:49] what language is it? [16:36:52] <^demon> python [16:36:58] cool [16:37:06] <^demon> It's pretty much abandoned upstream so I'll just fork it into the repo with the other hooks [16:37:16] instance is dev-solr and project is search I think [16:38:00] if you could do the same with search-test it would be great as well [16:38:12] Ryan_Lane: is SVN moved to Git?
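The change-merged behaviour ^demon describes above (look up which parent repos use the merged project as a submodule, then point them at the new HEAD) could look roughly like the sketch below. It is not the upstream hook script, which is not shown in this log; the `PARENTS` mapping, the checkout directory, and the function names are all hypothetical, and pushing the resulting parent commit is left out.

```python
import subprocess

# Hypothetical config: merged project -> [(parent checkout dir, submodule path in parent)].
PARENTS = {
    "mediawiki/extensions/LdapAuthentication": [
        ("/srv/checkouts/mediawiki-extensions", "LdapAuthentication"),
    ],
}


def update_commands(project, commit, parents=PARENTS):
    """Return the git commands that move each parent's submodule pointer
    to `commit`. Projects that are not configured yield an empty list."""
    commands = []
    for parent_dir, sub_path in parents.get(project, []):
        sub_dir = f"{parent_dir}/{sub_path}"
        commands += [
            ["git", "-C", sub_dir, "fetch", "origin"],
            ["git", "-C", sub_dir, "checkout", "--detach", commit],
            # Staging the submodule path records the new commit in the parent.
            ["git", "-C", parent_dir, "add", sub_path],
            ["git", "-C", parent_dir, "commit", "-m",
             f"Update {sub_path} to {commit}"],
        ]
    return commands


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping the command construction separate from execution makes the lookup logic easy to check without touching a real repository; a hook would call `run(update_commands(project, commit))` with the values Gerrit passes it.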
[16:38:27] ugh, you need two public IPs for this? [16:38:41] IWorld: you're asking in the wrong place again ;) [16:38:51] atgh [16:38:54] *argh [16:39:03] OrenOf: there's no way you can do this via port forwarding? [16:39:16] give me one for now [16:39:20] for solr [16:39:25] we're incredibly low on IP addresses [16:39:47] if both instances are in the same project, you can move the IP address between them [16:39:55] I was thinking can you give a domain name [16:40:07] you can create that yourself [16:40:17] ok so give me just one [16:40:30] and tell me how to set the domain name [16:40:49] so, all of this you do via "Manage addresses" [16:40:59] that special page takes a while to load [16:41:09] in your project, select "Allocate IP address" [16:41:33] then, when the address is allocated, you can add a hostname to it (use the wmflabs domain when doing so) [16:41:41] and you can "Associate" the IP address with an instance [16:42:07] which makes the network traffic for that IP address get routed to that instance [16:42:21] if you want to move the IP to another instance, you can "reassociate" it [16:43:58] can I associate 2+ domains to one ip address [16:44:03] yes [16:44:09] add a second hostname [16:44:12] to the same IP [16:44:14] that's excellent [16:44:30] you can only associate an IP address with one instance at a time, though [16:44:39] but you can move it at will [16:45:19] I'll check if I really need an IP address - perhaps the domain name would be enough [16:45:31] you need the IP [16:45:32] does that make sense [16:45:42] the hostname is tied to the IP [16:45:57] you can't have one without the other [16:46:19] couldn't you sethings up like a virtual hosting do [16:46:28] many domains on one IP [16:47:05] sethings = set things [16:47:22] we have plans for it: http://www.mediawiki.org/wiki/Wikimedia_Labs/Reverse_proxy_for_web_services [16:47:43] that only really works for web services, though [16:48:07] technically if we use a tcp load balancer it could work for
more than that [16:48:31] hm. not really, though [16:48:51] a web reverse proxy works because the web server can do L7 balancing [16:49:21] anyway, I'm off to the office [16:49:26] * Ryan_Lane waves [16:49:27] ok thnkas [16:49:31] yw [16:49:34] have fun [17:55:15] andrewbogott: hm. I need to get you shell access and root on at least the openstack stuff [17:56:42] since you likely know the most about how to manage nova. [17:56:49] and since I'll be out of town for like 9 days [17:58:53] I guess I need to rename werdna for that... [18:03:33] Ryan_Lane, the title for Labs-l is "A list for announcements and discussion related to the Wikimedia Labs project."... Isn't that supported to be a description or something? [18:04:36] Every message I get from that list is "To: A list for announcements and discussion related to the Wikimedia Labs project. " which doesn't seem right to me [18:10:17] hmm yes you're right [18:11:01] well, what should I change it to? [18:11:25] Wikimedia Labs project [18:12:13] ok. changed [18:14:35] is the list open subscription? [18:14:49] yes [18:28:31] 03/22/2012 - 18:28:31 - Creating a project directory for wikidata-dev [18:28:32] 03/22/2012 - 18:28:31 - Creating a home directory for laner at /export/home/wikidata-dev/laner [18:29:30] 03/22/2012 - 18:29:29 - Updating keys for laner [18:31:21] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2157 [18:31:32] ssmollett: heh [18:31:38] merge conflict ;) [18:32:21] lemme try to fix [18:33:08] yup ... i was expecting a conflict. [18:34:41] New patchset: Ryan Lane; "First iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2157 [18:34:52] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/2157 [18:35:13] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2157 [18:35:16] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2157 [18:36:37] Ryan_Lane: if you have a minute, would you review https://labsconsole.wikimedia.org/wiki/Managing_Multiple_SSH_Agents [18:37:01] sure [18:37:20] I got fed up yesterday trying to deal with keeping my prod key separate from labs. [18:37:30] * Ryan_Lane nods [18:37:36] I have two windows open [18:37:55] and in one window I start one agent, and in the other I start another [18:38:06] then add labs key to one, and production to the other [18:38:06] I want to have two screen sessions both running on bast1001. [18:38:17] that doesn't work so well in that scenario [18:38:18] you connect to labs via bast1001? [18:38:40] I connect to everything via bast1001. (I don't run screen on my laptop) [18:38:53] I run screen on bastion-restricted, and on fenari [18:39:36] * Ryan_Lane reviews [18:40:56] heh. your persistent agent code is complex :) [18:41:02] it is. [18:41:50] http://pastebin.com/7bujDmax [18:43:02] Ryan_Lane: reload the wiki page? [18:43:24] mine is incredibly simple and error prone :D [18:43:38] but hey, if it works... [18:43:55] hasn't bitten me yet, but sticking it into /tmp is likely a bad idea [18:44:33] not that I can think of how it would be insecure [18:45:16] hm. I like the check for ownership [18:45:18] * Ryan_Lane steals that [18:45:33] well, both our solutions allow for race conditions. There's a reason ssh doesn't use a predictable filename for its socket. 
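The ownership check mentioned above, for a persistent agent socket kept at a predictable path, might be sketched like this. It is an illustration under assumptions and not the code from the wiki page or the pastebin, neither of which appears in this log; the function name is hypothetical. As noted in the conversation, a check like this does not remove the race between checking the socket and using it.

```python
import os
import stat


def agent_socket_is_safe(path, uid=None):
    """Return True only if `path` is a Unix socket owned by `uid`
    (the current user when `uid` is None)."""
    if uid is None:
        uid = os.getuid()
    try:
        # lstat does not follow symlinks, so a link planted by another
        # user is reported as a link and rejected by the S_ISSOCK test.
        info = os.lstat(path)
    except OSError:
        return False
    return stat.S_ISSOCK(info.st_mode) and info.st_uid == uid
```

A login script could call this before reusing a socket path in SSH_AUTH_SOCK and start a fresh agent when it returns False.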
[18:45:47] * Ryan_Lane nods [18:46:10] this looks good to me [18:46:26] this is basically what I do, except I do things manually localy [18:46:33] then ssh to two different bastions [18:46:58] I added a link to it on the access page: https://labsconsole.wikimedia.org/w/index.php?title=Help:Access&diff=2805&oldid=2791 [18:47:09] thanks [18:47:27] if it looks good, I'm going to mail it out to ops@ to encourage us to make sure our keys are different. [18:47:32] (without it being a pain) [18:47:34] yeah. sounds good [18:47:40] thanks for the review! [18:47:57] technically it shouldn't matter much now with separate bastions in labs [18:48:04] but, it's still good practice [18:48:22] I'll wait to really say that until I disable agent forwarding for non-bastion instances [18:48:25] it does still matter if you ever ssh into an instance and forward your agent there. [18:48:30] ;) [18:48:42] because, you know, you're never going to want to ssh from one instance to another, right? [18:48:59] we have shared storage [18:49:06] which makes the necessity for that much smaller [18:49:32] we really need kerberos [18:49:50] it would solve a number of problems [18:50:01] the kerberos setup at Linden was frikkin awesome. [18:50:26] I wish freeipa was packaged for ubuntu [18:50:28] tickets were used both for ssh access and for http access. [18:50:42] I'd prefer to use SAML or OpenID for web [18:50:42] you were only presented with basic auth to log in to protected sites if you didn't have a key. [18:51:03] since not all web users will also have shell access [18:51:15] but yeah, that's a good way to handle it [18:51:16] the end result was that you got one ticket when you got to work and from there everything Just Worked(tm). 
[18:51:19] yep [18:51:37] on the web side, if we used SAML, you'd get a web token that would auto-log you in to everything [18:51:42] it's like kerberos, but for the web [18:51:58] kerberos isn't safe through web proxies ;) [18:52:22] since it's connection based and http is connectionless [18:52:38] IIRC we set up spnego for the web stuff. [18:52:42] * Ryan_Lane nods [18:52:55] that's a way around that issue [18:53:29] I started with trying kerberos for the web stuff at my last place [18:53:46] and decided to go with SAML, since kerberos requires browser configuration [18:53:51] any issue with me sending a link to that page to labs-l as well as ops? [18:53:53] in firefox you have to go into about:config :( [18:53:57] nope. go for it [18:54:48] oh. awesome [18:54:58] seems some openstack people are packaging freeipa [18:55:14] freeipa integrates kerberos and LDAP [18:57:19] hm. it may be included in precise, at least for the client [19:16:26] Ryan_Lane: Sorry, was at lunch. Is werdna also named 'andrew' in his shell account? [19:17:04] on production, yeah :( [19:18:13] OK. Well, if werdna doesn't mind changing then that's great (he said it was OK as I recall.) Otherwise I can probably survive having two different logins. [19:18:52] New patchset: Sara; "Revert "First iteration of adding ganglia for labs."" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3423 [19:19:03] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3423 [19:19:21] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3423 [19:19:23] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3423 [19:21:11] petan or petan|wk: are you here? [19:28:25] Ryan_Lane: It would help me if this diagram (http://wikitech.wikimedia.org/view/File:Virtual_architecture_diagram.svg) were marked with hostnames. 
Or , at any rate, if there were a list of hostnames & their respective services on http://wikitech.wikimedia.org/view/OpenStack. Does that exist someplace? Or, if not, is it something you could add before you skip town? [19:40:07] PROBLEM dpkg-check is now: UNKNOWN on analytics-main analytics-main output: Invalid host name analytics-main [19:41:12] 03/22/2012 - 19:41:12 - Updating keys for edsu [19:41:27] 03/22/2012 - 19:41:26 - Updating keys for edsu [19:42:13] 03/22/2012 - 19:42:12 - Updating keys for edsu [19:42:25] 03/22/2012 - 19:42:24 - Updating keys for edsu [19:45:07] PROBLEM dpkg-check is now: CRITICAL on analytics-main analytics-main output: Connection refused by host [19:52:45] can other people see http://wikistream-1.pmtpa.wmflabs/ ok? [19:53:54] edsu: I can. [19:53:55] edsu, what do you mean? [19:54:02] edsu: we haven't got a .wmflabs TLD... [19:54:05] :D [19:54:06] making a tunnel? [19:54:10] though it's doing some funny thing with teh scroll bar. [19:54:38] (yes, it has to be tunneled through the labs bastion host) [19:54:52] ^^^ Platonides and IWorld [19:55:48] ok [19:55:50] maplebed: yeah the size of the page is shrinking and growing so the scrollbar jumps sometimes ; need to fix that somehow [19:56:40] it's worse than just the scrollbar jumping around - my browser is apparently the perfect height for the scroll bar to appear and disappear constantly. [19:56:53] maplebed: ahh yeah, that's bad [19:58:25] edsu: another visual bug - when you pause it the grayet out wikistream title box doesn't have rounded corners. [20:01:00] maplebed: if you reload does that fix the height? [20:02:37] so far. [20:05:58] maplebed: cool, i think i fixed the rounded pause corners too [20:07:58] edsu: the rounded corner thing isn't fixed for me when the background is something other than black or white. [20:09:23] ok [20:20:10] maplebed: does it look any better now? [20:20:27] * maplebed reloads [20:20:39] nope. [20:20:42] wanna screen shot? 
[20:20:47] yeah, that would be great [20:21:18] http://screencast.com/t/9Bw6A9PAAPQA [20:22:18] another suggestion - do something about checking image sizes. It looks weird tiled. [20:22:22] what browser are you using? [20:22:24] :P [20:22:27] firefox 11.0 [20:22:40] (on a mac) [20:22:42] maplebed: i kind of like the tiled look myself [20:22:51] it works sometimes [20:23:00] maplebed: not going to spend much energy there, but you are free too if you want :) http://github.com/edsu/wikistream [20:23:03] other times it reminds me of late 90s web design. [20:23:12] maplebed: that's fine w/ me :-) [20:23:16] heh... [20:23:39] I'll pass on patching... [20:23:51] speaking of which maybe i should reomve the rounded corners all together, that should fix the problem [20:24:27] thanks for taking a look though, i appreciate it [20:24:37] I'm always happy to provide feedback. :D [20:26:04] ok now you have me thinking, what would you do w/ the smaller images? :-) [20:26:19] stretch them? [20:26:55] actually, I would ask commons for larger versions and if I get a 'no larger version exists' error, skip it and go on to the next. [20:27:07] there are plenty of large enough images for nearly any browser. [20:27:28] well they are images that have been updated recently [20:27:33] you mean filter out the smaller ones? [20:28:10] pretty much. [20:29:25] are you pulling from something like http://commons.wikimedia.org/wiki/Special:NewFiles? [20:30:39] maplebed: are you giving a short presentation tonight? [20:30:45] Ryan_Lane: if you want me to. [20:30:49] that would be good [20:30:53] ok! [20:30:55] you're using os x, right? [20:30:58] yes. [20:31:08] I'm going to give a short one on nova. [20:31:18] 03/22/2012 - 20:31:17 - Updating keys for maxsem [20:31:24] and likely discuss how we're using similar support architecture (gerrit and jenkins) [20:31:38] I should likely make a presentation soon [20:31:42] heh [20:31:52] I think I will spend the rest of the afternoon doing that. 
[20:31:53] :D [20:32:05] 03/22/2012 - 20:32:05 - Updating keys for maxsem [20:32:10] 03/22/2012 - 20:32:10 - Updating keys for maxsem [20:32:12] yeah, I need to do the same [20:32:12] 03/22/2012 - 20:32:12 - Updating keys for maxsem [20:33:25] maplebed: no, i'm sitting in 30 irc channels :) [20:34:00] andrewbogott: so, virt1-4 are the compute nodes [20:34:02] maplebed: well wikistream is, not me ... thank goodness, heh [20:34:13] andrewbogott: virt2 is the network and api node [20:34:28] OK, that's a pretty simple diagram :) [20:34:37] andrewbogott: virt0 has mediawiki, ldap, dns, mysql, the scheduler, and the queue [20:35:02] maplebed: we want audio, or just to speak loudly? [20:35:31] we aren't recording or broadcasting [20:35:41] I prefer to speak loudly. [20:35:44] same [20:35:46] I have no trouble projecting. [20:35:50] cool [20:36:44] andrewbogott: oh, puppet master is also virt0 [20:36:49] gerrit in on manganese [20:36:51] and formey [20:37:00] glance is on virt0 [20:37:12] that diagram is out of date.... [20:37:26] edsu: oh yeah... it doesn't surprise me that there's a channel with changes on commons. [20:37:45] hm. I wonder when my flight is [20:38:07] Ryan_Lane: Is the OS meetup tonight or was it last night? [20:38:15] tonight [20:38:35] we have 72 people signed up. should be interesting [20:38:51] wait, no. 75 now [20:38:55] <^demon> What time is that again? [20:39:01] Email you sent has Mar 21 in the subject line -- not sure if that matters to anyone. [20:39:12] shit. it does? [20:39:31] Well, it says 'tomorrow, mar 21' so... mixed message :) [20:39:55] Ryan_Lane: I'll be sorry to miss it, whenever it is. [20:40:17] heh [20:40:26] well, you'll be in town for the design summit :) [20:40:28] so will faidon [20:40:36] Did you get him a ticket, btw? [20:40:41] not yet [20:40:42] but... 
[20:40:48] we have two tickets for people for the booth [20:41:00] so, if we can't get him one, we'll use one of the booth tickets for him [20:41:11] 03/22/2012 - 20:41:10 - Updating keys for maxsem [20:41:12] 03/22/2012 - 20:41:12 - Updating keys for maxsem [20:41:16] 03/22/2012 - 20:41:16 - Updating keys for maxsem [20:41:34] Ryan_Lane: An email from the 19th says "The good news is that all the people that have asked us until today, will get an invite to the summit." [20:41:36] labs-home-wm: bah. I must fix you [20:41:48] oh [20:41:50] good [20:41:54] he'll get one then, I guess [20:42:05] 03/22/2012 - 20:42:05 - Updating keys for maxsem [20:42:06] hm. flight tomorrow is for 1pm [20:42:10] And Ben and Sumanah are coming too? [20:42:18] I dunno if sumanah is coming [20:42:33] Ah, I figured she was designated tabler. [20:42:41] Karen Chelini is coming [20:42:43] <^demon> Ryan_Lane: Is there a call-in line for the OS meeting? [20:42:58] ^demon: hm. likely won't be terribly useful [20:43:06] <^demon> Aw ok :( [20:43:12] we're doing two short presentations and the rest will be socializing and hacking, likely [20:43:30] * ^demon will just have to socialize on irc. [20:43:34] heh [20:44:00] Ryan_Lane: Karen Chelini isn't on the WMF staff page... is she someone I would've met in SF? [20:44:09] she's new [20:44:12] she's a recruiter [20:44:23] Ah, makes sense. [20:49:40] <^demon> Ryan_Lane: Is there a way to look up someone by CN? [20:49:49] <^demon> I've got someone who doesn't know the shell name. [20:50:06] ^demon: getent + grep? [20:50:20] <^demon> Oh duh [20:50:30] <^demon> ldaplist -l passwd | grep blahhhh [20:59:06] yeah. that'll work [20:59:10] you can also do an ldapsearch [20:59:19] using your own dn and password [21:09:52] New patchset: Sara; "Second iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3494 [21:10:04] New review: gerrit2; "Lint check passed."
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3494 [21:13:12] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3494 [21:13:15] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3494 [21:27:58] maplebed: changed the logic and css for the background image: http://wikistream-1.pmtpa.wmflabs/ [21:28:18] maplebed: maybe for the worse :) [21:28:23] heh... [21:28:34] I don't mind it much. [21:28:39] I think I'd lose the box around it though. [21:28:51] Being centered etc. it stands out pretty well. [21:28:54] oh... [21:28:55] hm. [21:28:59] now that an image has loaded, [21:29:07] it makes more sense to have the box. [21:29:23] though it's boxed by the background transparency pretty well on its own without the line. [21:29:25] hm. [21:29:26] meh. [21:29:26] ok same for the banner at the top too? [21:29:36] at least there are no corner bugs now. [21:29:37] ;) [21:29:56] which banner? the github one? [21:30:22] the box at the top of the screen with the wikimedia logo in it [21:30:30] it has a similar border [21:30:41] oh, sorry, that's the one I was talking about. [21:30:59] oh, gotcha; i tried removing both of them [21:31:49] to me, the one around the content looks appropriate. the one around the banner works when the banner is not the same as the background color, but looks odd when the background's white. [21:32:08] oh, and if you want to get fancy, I'd suggest preloading the image then swapping it out rather than blanking the page then loading the new image. [21:32:18] that would eliminate the white background in between images (as they change) [21:32:57] is it just the images I'm seeing or did you also take my suggestion on image size to avoid tiling? [21:33:19] maplebed: good thought, i like that [21:33:48] maplebed: yeah, i changed the css slightly to avoid tiling [21:33:50] you could also do fancy things like a fade transition all slideshow-like...
;) [21:33:55] (if you preloaded) [21:34:08] I'll stop short of suggesting you ken-burns the thing though. [21:34:22] to preload i guess i just fetch it, and then do the switch once the browser has it cached? [21:34:33] honestly, I've never actually done it, [21:34:35] maplebed: hahah, that would be nice [21:34:38] just know that it can be done. [21:34:41] k [21:35:00] IIRC you have to set it to load in a div that's out of the browser's viewable area [21:35:04] then move it or something. [21:35:07] but it's been a while. [21:35:29] hey, look at that. http://perishablepress.com/3-ways-preload-images-css-javascript-ajax/ [21:35:31] :P [21:35:47] interwebs++ [21:35:53] totally. [21:36:41] hey, lookit that! I think wikistream just crashed my browser tab. [21:37:15] http://screencast.com/t/Xc19hzTtSq [22:22:43] maplebed: tried out jquery-backstretch seems to work a bit better [22:22:57] * maplebed reloads [22:23:52] yay fades! [22:24:06] my cpu is going crazy now though ... hmm :) [22:24:11] lol [22:25:59] yeah, sent my firefox process from ~20% to ~60% CPU. [22:26:11] but hey, what else will I heat my office with? [22:26:54] <^demon|away> Ryan_Lane: Could we enable git:// in addition to https:// for anon users? [22:27:48] no [22:27:52] it's insecure [22:28:11] it's one of the reasons I didn't add it to begin with [22:28:24] ^demon|away: ^^ [22:28:32] <^demon|away> Wouldn't it behave just like an anon clone over https? [22:28:33] a MITM could replace data [22:28:50] I dislike that we allowed http with svn, too [22:29:16] <^demon|away> Mmk. I don't really need it (I do everything over ssh). But it was asked. [22:29:21] maplebed: heh, yeah i imagine the old version did too, i spaced out the checks for new images a bit more [22:30:02] tell them to use https? :) [22:31:20] <^demon|away> It was asked by an anon on one of the talk pages if we were gonna support git:// in addition to https:// [22:31:37] <^demon|away> I paraphrased and said no.
[22:31:53] <^demon|away> http://www.mediawiki.org/w/index.php?title=Talk%3AGit%2FConversion&diff=514381&oldid=514367 [22:46:38] PROBLEM Free ram is now: WARNING on orgcharts-dev orgcharts-dev output: Warning: 16% free memory [22:47:28] PROBLEM Free ram is now: WARNING on test-oneiric test-oneiric output: Warning: 14% free memory [23:01:38] PROBLEM Free ram is now: WARNING on nova-daas-1 nova-daas-1 output: Warning: 16% free memory [23:02:28] PROBLEM Free ram is now: CRITICAL on test-oneiric test-oneiric output: Critical: 5% free memory [23:05:48] PROBLEM Free ram is now: WARNING on utils-abogott utils-abogott output: Warning: 16% free memory [23:06:38] PROBLEM Free ram is now: CRITICAL on orgcharts-dev orgcharts-dev output: Critical: 4% free memory [23:13:30] RECOVERY Free ram is now: OK on test-oneiric test-oneiric output: OK: 97% free memory [23:15:58] New patchset: Sara; "Third iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3504 [23:16:09] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3504 [23:16:42] RECOVERY Free ram is now: OK on orgcharts-dev orgcharts-dev output: OK: 96% free memory [23:17:15] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3504 [23:19:40] PROBLEM Free ram is now: WARNING on test3 test3 output: Warning: 11% free memory [23:22:02] Ryan_Lane: I think I've more-or-less finished my swift slides. [23:26:40] PROBLEM Free ram is now: CRITICAL on utils-abogott utils-abogott output: Critical: 3% free memory [23:28:33] Change abandoned: Sara; "(no reason)" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3504 [23:31:40] RECOVERY Free ram is now: OK on utils-abogott utils-abogott output: OK: 97% free memory [23:31:40] PROBLEM Free ram is now: CRITICAL on nova-daas-1 nova-daas-1 output: Critical: 3% free memory [23:32:35] heh. 
still working on mine [23:33:01] (I meant in case you wanted to review them. :P ) [23:34:14] New patchset: Sara; "Third iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3506 [23:34:25] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3506 [23:35:23] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3506 [23:35:25] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3506 [23:36:40] RECOVERY Free ram is now: OK on nova-daas-1 nova-daas-1 output: OK: 93% free memory