[00:01:06] New patchset: Bhartshorne; "copying over swift files from production" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3778 [00:01:18] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3778 [00:02:01] alejrb: Yea, looks like MySQL is a no go at the moment, but that might change] [00:04:05] mk, I'll have a think [00:04:06] cheers [00:04:43] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3778 [00:04:46] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3778 [04:20:34] 03/27/2012 - 04:20:33 - Creating a home directory for daniel at /export/home/wikidata-dev/daniel [04:21:31] 03/27/2012 - 04:21:30 - Updating keys for daniel [12:36:58] <^demon> Ryan_Lane: I found gerrit 2.3 release notes :) [12:37:01] <^demon> https://gerrit-review.googlesource.com/#/c/33220/9/ReleaseNotes/ReleaseNotes-2.3.txt [12:37:10] heh [12:37:18] * Ryan_Lane is on vacation ;) [12:37:25] * Ryan_Lane isn't actually here [12:37:42] <^demon> I know, I'm just excited :D [12:37:44] <^demon> "* Allow superprojects to subscribe to submodules updates" [12:37:48] ah. draft status [12:37:49] nice [12:38:13] <^demon> Yeah, you can commit drafts by pushing to refs/drafts/foo like you do with /for/ [12:38:22] git review will need to be changed for 2.3 [12:38:25] Ryan_Lane: Sorry to disturb you.... but any news on MySQL for bots? [12:38:41] methecooldude: I'm not sure I know what you mean? [12:39:02] Ryan_Lane: http://bots.wmflabs.org/phpmyadmin [12:39:05] <^demon> Ryan_Lane: Likely :) In the meantime, I'll probably work on getting the 2.3rc0 running on gerrit-dev.wmflabs. 
[12:39:09] Who can log in to that [12:39:15] methecooldude: ask petan [12:39:20] Ryan_Lane: Mkay [12:39:22] I don't know [12:39:33] ^demon: cool [12:40:07] ^demon: Create project through web interface [12:40:09] \o/ [12:40:16] <^demon> Yes I know :D :D [12:40:21] <^demon> Life is good in 2.3 [12:40:41] Redirect the user to the reverted change (when reverting).<-- [12:42:37] !access | Daniel_WMDE [12:42:37] Daniel_WMDE: https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [12:43:07] Hey, I didn't even know this existed! [12:43:22] Ryan_Lane: ^demon, Ryan_Lane: yea, i read that. but i can't even log into bastion. [12:43:46] did anyone add you to the project? [12:43:49] * Ryan_Lane checks [12:43:51] yes# [12:44:00] <^demon> Quick, hide the booze. Coren found us :) [12:44:00] well, to my project. no idea about bastion [12:44:31] ^demon: No worries. Unless it's a fine port, I'm unlikely to hog the booze. :-) [12:44:45] Daniel_WMDE: what's your labsconsole user name? [12:45:31] <^demon> Ryan_Lane: Also, "Enable case insensitive login to Gerrit WebUI for LDAP authentication" [12:45:40] meh [12:45:54] it isn't case sensitive for mediawiki [12:46:02] Daniel_WMDE: can you add me to the labs project "Wikidata"? [12:46:06] Ryan_Lane: "Daniel Kinzler". [12:46:07] <^demon> People are used to wiki-casing where first letter doesn't matter. [12:46:13] <^demon> Right now, Demon != demon [12:46:42] you weren't in the bastion project [12:46:48] IWorld: not at the meoment, no. it's not clear yet how we will be using labs instances, and who gets which kind of access. [12:46:54] <^demon> Also, the fact that you login via CN rather than SN is kind of crazy imho. [12:47:00] Daniel_WMDE: ah [12:47:08] ^demon: oooooh *enable* case insensitive [12:47:13] 03/27/2012 - 12:47:13 - Creating a home directory for daniel at /export/home/bastion/daniel [12:47:30] Ryan_Lane: my shell account is "daniel". i hope. it least fpr git it is. 
[12:48:13] 03/27/2012 - 12:48:13 - Updating keys for daniel [12:48:20] Daniel_WMDE: ^^ now you can access it [12:48:49] Ryan_Lane: nope. but the error changed. "Connection closed by 208.80.153.194" [12:49:05] also... what was the problem? do we need to do that for all users? can you do it for Abraham too? [12:50:00] Daniel_WMDE: try again? It should definitely work now [12:50:11] everyone that needs labs access must be a member of bastion [12:50:21] *of the bastion project [12:50:41] ah. tell mutante :) [12:50:51] nope [12:50:51] docs say this :( [12:50:52] same error [12:51:34] debug2: we sent a publickey packet, wait for reply [12:51:35] debug1: Server accepts key: pkalg ssh-rsa blen 149 [12:51:37] debug2: input_userauth_pk_ok: fp c0:fc:05:93:80:76:da:b4:fd:24:a2:c9:15:7c:62:f9 [12:51:38] Connection closed by 208.80.153.194 [12:51:51] you are using daniel@bastion.wmflabs.org right? [12:52:00] yes [12:52:11] well, i have it in my .ssh/config [12:52:16] but i can do it again explicitly [12:52:35] daniel@brightpad ~/www/wikidata> ssh -vv -A daniel@bastion.wmflabs.org [12:52:40] ^--- same thing [12:53:12] hm. I can't log into bastion, since I've purposely locked myself out [12:53:18] this makes it hard to debug [12:53:35] well, it'S not urgent... but it sucks :) [12:53:37] Daniel_WMDE: you have a key uploaded to labsconsole? [12:53:43] yes. [12:55:46] Ryan_Lane: if i select a different identity file (with the wrong key), the error stays the same [12:56:06] could this be somethign silly, like a newline at the end of the key (or lack thereof)? [12:56:13] I dunno [12:57:01] Ryan_Lane: when i try to re-submit my key to labs console, i get "failed to import keypair" [12:57:07] <^demon> fwiw, I can login to bastion with demon@ [12:57:13] ...why "pair"? [12:57:21] this should just be the public key, right? 
[12:57:31] it's called a keypair thanks to EC2 [12:57:39] hehe [12:57:40] anyway [12:57:49] we're using standard "cloud" terminology [12:57:56] i guess it fails because the key is already in there [12:58:09] i could try to delete it and add it again.. [12:58:17] hm. it shouldn't fail [12:58:19] is that likely to solve problems? or create them? [12:58:25] it won't hurt anything [12:58:30] let me try to add anther key [12:59:02] adding another key worked fine [12:59:07] are you sure you are using daniel@deepthought ? [12:59:13] yes [12:59:13] 03/27/2012 - 12:59:13 - Updating keys for daniel [12:59:17] your key is fine [12:59:31] your home directory is accessible [12:59:31] 03/27/2012 - 12:59:31 - Updating keys for daniel [12:59:33] both keys produce the same error [13:00:30] actually, wait. does ssh ignore the -i option if an identity is specified in the config? [13:00:33] it seems so... [13:00:48] yes [13:01:50] bah. I see why root is locked out on this instance [13:02:54] New patchset: Ryan Lane; "Don't lock out root" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3807 [13:02:54] hm, ssh will always try id_rsa, even if a different identity is specified?! [13:03:02] I'm totally claiming this day as a work day rather than a vacation one now :D [13:03:05] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3807 [13:03:13] ooohhh [13:03:15] wait, etf?? [13:03:27] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3807 [13:03:29] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3807 [13:03:38] Ryan_Lane: sorry to keep you, can can bug ^demon or mutante instead :) [13:03:50] I'm waiting for a friend right now. no worries [13:04:25] Ryan_Lane: my key in labsconsole... i have the smae key twice there now. once prefixed with "ssh-rsa", and once without that prefix. 
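The duplicate-key symptom just described (the same key stored once with the "ssh-rsa" prefix and once without) comes down to the shape of an OpenSSH public key line: a key type, a base64 blob, and an optional comment, where the blob itself starts with the same type name. A minimal Python sketch of a check for that shape follows. The function name `looks_like_openssh_pubkey` is hypothetical, and nothing here is taken from how labsconsole actually validates keys.

```python
import base64
import binascii
import struct


def looks_like_openssh_pubkey(line: str) -> bool:
    """Return True if line looks like '<type> <base64-blob> [comment]'
    and the type name embedded in the blob matches the prefix."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        # A bare blob with no key type prefix ends up here.
        return False
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The blob begins with a 4-byte big-endian length, then the type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len]
    return len(embedded) == name_len and embedded == key_type.encode("utf-8")
```

A line missing its prefix fails this check, either because only one field is left or because the comment gets read as the blob and is not valid base64.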
[13:04:32] Ryan_Lane: i guess one of these is wrong :) [13:04:39] delete both [13:05:29] done [13:05:52] figure out which one is the correct one ;) [13:06:13] 03/27/2012 - 13:06:12 - Updating keys for daniel [13:06:31] 03/27/2012 - 13:06:31 - Updating keys for daniel [13:07:12] 03/27/2012 - 13:07:12 - Updating keys for daniel [13:07:30] 03/27/2012 - 13:07:30 - Updating keys for daniel [13:08:10] Ryan_Lane: ok. works with the "ssh-rsa" key. [13:08:23] great [13:08:36] Ryan_Lane: i think the one without the prefix was the one automatically imported when my svn account was converted [13:08:40] is that possible? [13:08:54] or maybe i screwed it up when i tried to copy my key to gerrit last week [13:08:57] also possible [13:09:08] just keep an eye out for this issue... [13:09:38] Ryan_Lane: thanks for your support. could you add Abraham to the bastion project, too? [13:09:56] what's his wiki name? [13:10:22] Daniel_WMDE: yes, whatever was your svn key was automatically imported [13:10:27] Abraham Taherivand [13:11:17] does he even have an account? [13:12:07] I don't see one for him [13:12:15] he told me he does [13:12:16] I'll let someone else handle this [13:12:34] yea, do that :) [13:12:35] but... [13:12:41] daniel@bastion1:~$ ssh 10.4.0.128 [13:12:42] Permission denied (publickey). [13:12:50] you sure it isn't https://labsconsole.wikimedia.org/w/index.php?title=User:Ataherivand&action=edit&redlink=1? [13:13:07] oh, i guess it it [13:13:15] New review: Demon; "This won't be necessary if we can get 2.3 deployed soon enough and test out the new submodule regist..." 
[operations/puppet] (test); V: 0 C: 0; - https://gerrit.wikimedia.org/r/3521 [13:13:26] (damn, why did he break the convention) [13:13:29] anyway [13:13:47] Daniel_WMDE: so, you need to make sure your agent is forwarded [13:13:51] using -i won't work here [13:13:57] yes [13:13:58] i know [13:14:07] i have agent forwardning enabled in the config file [13:14:12] on bastion, type this: ssh-add -l [13:14:14] i now tried again with ewxplicit -A [13:14:15] no dice [13:14:26] also, which instance is this? don't use IP addresses [13:14:26] 1024 c0:fc:05:93:80:76:da:b4:fd:24:a2:c9:15:7c:62:f9 /home/daniel/.ssh/id_rsa (RSA) [13:14:29] use the instance's name [13:15:14] daniel@bastion1:~$ ssh-add -l [13:15:15] 1024 c0:fc:05:93:80:76:da:b4:fd:24:a2:c9:15:7c:62:f9 /home/daniel/.ssh/id_rsa (RSA) [13:15:17] daniel@bastion1:~$ ssh wikidata-contenthandler-demo [13:15:18] Permission denied (publickey). [13:15:24] instances occasionally fail to build [13:16:32] ...so what do i do now? [13:16:58] seems puppet is broken in the test branch [13:17:09] >_< [13:17:12] someone needs to fix that before instance creation will work [13:17:13] yay :) [13:17:32] so, maybe it should have told me it didn't owrk :) [13:17:37] anyway. i'll leave it for today [13:17:48] maybe i have more luck tomorrow [13:17:53] well, from the console's perspective it worked [13:18:01] won't be at it before 8pm your time though.. [13:18:03] the instance was created. it's running [13:18:11] I'm in Tokyo [13:18:16] it's 10PM here [13:18:27] ah :) [13:23:37] but, I'm not coming into the office till the 2nd anyway [13:23:56] I'm on vacation and just happen to be waiting for a friend to show up, so I popped online [13:24:50] Ryan_Lane: Is it obvious what's broken in puppet? I can have a look at fixing it. 
[13:25:18] Mar 27 12:29:34 i-000001d3 puppet-agent[1747]: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to parse template ganglia/gmond_template.erb: Could not find value for 'mcast_address' at 48:/var/lib/puppet/templates/ganglia/gmond_template.erb at /etc/puppet/manifests/ganglia.pp:148 on node i-000001d3.pmtpa.wmflabs [13:25:32] somehow mcast_address isn't being defined [13:25:45] likely because we aren't using it in labs [13:25:53] ok; I'll poke about. [13:25:57] cool. thanks [13:27:25] fucking heater [13:27:41] the controls are all in japanese, on a remote [13:27:48] I have no clue what I'm doing when I push the buttons [13:28:00] it's cold :D [13:28:15] it *looks* like it turned on [13:28:42] where's my japanese friend. she always figures out how to turn the stupid thing on :( [13:29:56] Isn't there some smartphone tool that captions japanese writing? [13:30:16] hm. that would be amazingly helpful [13:32:22] \o/ I figured it out [13:32:35] I won't freeze to death [13:33:08] also, renting an apartment, rather than a hotel room, is definitely not more convienient. [13:37:36] cheaper? bigger? [13:37:45] cheaper [13:37:48] not much bigger [13:37:57] no housekeeping, though [13:38:12] no shampoo, no soap [13:38:33] but… there's a washer/drier [13:38:45] and there'e a full kitchen [13:38:49] Does it have a futuristic toilet seat and/or a soaking tub? [13:38:57] both :D [13:39:04] That makes it all worth it! [13:39:07] heh [13:39:19] Although maybe hotel rooms there have both as well. [13:39:34] not the cheap ones [13:39:44] well, they all have the futuristic toilets [13:39:47] I guess a futuristic toilet seat with unintelligible japanese labels is maybe just scary. 
[13:40:09] hahaha [13:40:13] indeed it is [13:40:31] everything is confusing [13:40:46] the controls for the tub are by the front door [13:40:55] wash/dry/post to flickr [13:41:05] :D [13:41:46] my friend laughed at me when I tried to turn on the heater [13:41:53] "You just turned on the tub" [13:42:22] I feel strongly that controls for a device should be ON THE DEVICE. I'm sort of opposed to TV remotes for this reason. [13:42:46] "no, no, now you're trying to buzz someone into the front door downstairs" [13:42:52] heh [13:42:54] indeed [13:43:07] it would be hard for the heater, though. it's on the ceiling, out of reach [13:43:17] True. [13:43:41] I'm not sure how I'm supposed to buzz someone in. there's three bug buttons [13:43:49] two are in red, one is in green [13:44:12] one is in red, and has daunting red text with an arrow pointing to it [13:44:21] I hope I don't accidentally call the police when she gets here [13:44:24] So that's obvious -- the green one lets them in, one of the red ones calls the police and the other electrocutes. [13:44:33] hahaha [13:45:13] <^demon> Or opens a trap door and sends the unwanted caller to the dungeon. [13:45:34] also a win, but I'm pretty sure that would make her angry [13:46:51] Ryan_Lane: iPhone app 'japan goggles' [13:47:06] (not the one I was thinking of, but it seems to do what I was talking about) [13:48:38] cool [13:48:59] any chance puppet has been broken since the 23rd? [13:51:03] it's possible it's due to sara's changes. yeah [14:01:49] New patchset: Andrew Bogott; "Clarified an '$realm == "labs"' clause." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3809 [14:02:01] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3809 [14:02:43] Everyone blame sara :P [14:02:45] New review: Hashar; "If it is in Gerrit 2.3 we want it deployed instead of an unmaintained third party plugin." 
[operations/puppet] (test); V: 0 C: -2; - https://gerrit.wikimedia.org/r/3521 [14:03:25] ssmollet! I'm going to go ahead and merge this patch, but I'd still appreciate you reviewing it when you're online: https://gerrit.wikimedia.org/r/3809 [14:03:41] ohh [14:03:41] (Unless you are up at 7AM for some reason) [14:03:44] operations log comes here [14:05:19] that is because one of my change made to the Gerrit hook was not merged yet :-D [14:07:55] hashar: no [14:08:01] hashar: test branch goes here [14:08:08] ohhh [14:08:16] I though I broken something :-] [14:08:22] well, everything except production branch goes here [14:08:25] Ryan_Lane: isn't it like 5am for you? [14:08:31] 11 PM [14:08:42] * Ryan_Lane is in tokyo [14:08:46] ohhh [14:08:48] :D [14:09:02] Vacation or just run away with a japaneese wife? [14:09:08] heh [14:09:09] we will not let you come back to USA till we have at least 2 japanese committing to our projects :-))))))) [14:09:15] is that for tourism or outreach ? [14:09:16] hahaha [14:09:25] vacation ; [14:09:28] ;) [14:09:28] cool [14:09:35] hashar, :D [14:09:42] I think my next holiday will either be iceland or the us hmmm [14:10:01] my next vacations will be cycling with wife and baby [14:10:06] Ryan_Lane: can I come join.. Im stuck in a deadend job at a school [14:10:10] cause planes just emit too much carbon [14:10:17] :D [14:10:21] Ryan_Lane: enjoy your vacations! [14:10:25] will do [14:10:38] oh, and Thanks for all the awesome work on mediawiki Ryan_Lane, I keep finding your code [14:10:46] Ryan_Lane: oh man, Tokyo was nice in that the subway and everything were made for people my height :-) [14:10:57] hahahaha [14:11:05] sumanah: I can see that :D [14:11:37] JRWR: totally welcome [14:11:41] Ryan_Lane: are you going to get a chance to take one of the bullet trains? that was cool [14:11:47] Is it ssmollett or sumanah that makes up for the small people in the channel? 
[14:11:53] I took one last time I was here [14:12:08] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3809 [14:12:08] Ryan_Lane: and there was a giiiiiiant Ferris Wheel out near some expo center, with transparent cars [14:12:10] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3809 [14:12:12] not for the acrophobic [14:12:12] this is 5th? 6th? time here or so [14:12:38] sumanah: in yokohama? osaka? [14:12:46] there's an amazingly huge one in osaka [14:12:51] Ryan_Lane: it was in Tokyo, the one I went to [14:13:18] oh. right. I think there's one near shibuya [14:14:18] Ryan_Lane: on my second day in Japan I was meeting someone and trying to say "nice to meet you" and I accidentally said "Irrashimase" [14:14:26] :D [14:14:48] My favorite tourism highlight was Ikebukuro Gyoza Stadium. But that's only because I /really/ care about dumplings. [14:14:52] No idea if that still exists. [14:15:03] * Ryan_Lane <3's dumplings [14:15:13] might be worth a trip then [14:15:20] other happy memories of Japan: onsens & the funiculars that take you from one to another; the restaurant supply district [14:15:28] best part of having a friend here is that I can go to places that don't have a picture menu [14:15:51] * sumanah will stop reminiscing now [14:16:03] The gyoza stadium is, I believe, downstairs from an all-creampuff foodcourt. [14:16:19] all creampuff foodcourt? awesome [14:17:24] * koolhead17 next trip has to b masai mara [14:17:27] Yeah the only bad thing about that part of the world is you might end up with dog for lunch [14:17:40] There are some business models that just require 20 million people to fly. [14:17:50] meeting. [14:18:24] Damianz: not in japan. maybe some other places [14:18:32] I think even in china you have to try for it [14:19:13] I think anyplace where they eat dog it's a luxury item, so you won't likely get it by mistake. 
[14:19:18] Hmmm, while I love food from japan/china I don't think I'd risk eating there. [14:19:34] it's so. fucking. good. [14:19:52] <^demon> Chinese food in China is awesome too. Beats the hell out of nasty westernized chinese food. [14:20:12] the only thing that worries me here right now is the radiation [14:20:25] Daniel_WMDE: puppet should be less broken now. Try deleting and recreating your instance? [14:20:58] Hmm radiation is probably ok is most of mainland japan. [14:21:30] Damianz: http://bogott.net/unspecified/?p=1247 [14:21:32] well, I was told it's best to not go outside when it's raining [14:22:07] :D [14:22:26] I wouldn't have a problem eating donkey, horse, or dot or cat, for that matter [14:22:36] as long as it's what I actually ordered [14:22:46] I'd eat it if it were put in front of me. But I don't think I'd order it. [14:23:28] o.0 [14:23:30] dog & cat seem gross just because eating carnivores is kind of gross. [14:23:41] yeah. if a friend ordered it, and gave it to me, I'd eat it. [14:23:50] Damianz: aren't you from a part of the world where eating horse is commonplace? [14:24:01] In salami maybe :P [14:24:14] Well, there you go. [14:24:16] My fav meat is Ostrich but I've only had it like twice. [14:24:37] There used to be a chain of burger places in MN that had emu burgers. Tasty! [14:25:01] I'm hungry, talking about all this food [14:25:09] I'm eating meatballs for lunch :D [14:25:22] Japan is suppose to have amazing streetfood. [14:25:45] I had a really great burger today topped with bacon, avacado, some kind of cream sauce, and a soft french cheese [14:26:18] mmm bacon [14:26:25] Bacon makes everything taste good [14:26:31] indeed [14:26:31] They put mayonnaise on everything in Japan, which grosses me out. (that's right, I find mayonnaise grosser than horsemeat. *shrug*) [14:26:35] :D [14:26:43] Yeah I don't like mayonnaise [14:27:09] Raw eggs and nucluer explosions probably don't go well togther. 
[14:27:32] Really anything + nuclear explosions... [14:28:09] mayonnaise is wonderful [14:28:18] and sauces made with it are equally as wonderful [14:28:41] hmm. I want a beer [14:28:48] * Ryan_Lane goes to the vending machine outside [14:29:50] ^^ most awesome thing about japan [14:31:34] I fancy a nice strawberry cider actually, it's a rather sunny day [14:31:47] mmm. beer [14:31:57] I didn't see strawberry cider [14:32:15] they do have that at the convenience store around the corner, though [14:32:45] or well, strawberry something alcohol related, anyway [14:32:54] Damianz: where are you again? [14:33:00] UK. [14:33:10] it's not cold for once? :D [14:33:22] It's been roasting since like friday. [14:33:39] that's pretty great [14:34:28] Well apart from there being a load of people cutting their grass which my breathing objects too yes it has. [14:34:41] Suppose to tail off over this week but as long as it's not raining at the weekend I don't mind., [15:14:54] PROBLEM Disk Space is now: WARNING on wikistream-1 wikistream-1 output: DISK WARNING - free space: / 78 MB (5% inode=47%): [15:24:04] RECOVERY Total Processes is now: OK on swift-fe1 swift-fe1 output: PROCS OK: 80 processes [15:24:14] RECOVERY dpkg-check is now: OK on swift-fe1 swift-fe1 output: All packages OK [15:25:44] RECOVERY Current Load is now: OK on swift-fe1 swift-fe1 output: OK - load average: 0.08, 0.13, 0.05 [15:25:54] RECOVERY Current Users is now: OK on swift-fe1 swift-fe1 output: USERS OK - 0 users currently logged in [15:25:54] RECOVERY dpkg-check is now: OK on deployment-nfs-memc deployment-nfs-memc output: All packages OK [15:26:04] RECOVERY Disk Space is now: OK on swift-fe1 swift-fe1 output: DISK OK [15:26:14] PROBLEM Puppet freshness is now: CRITICAL on aggregator1 aggregator1 output: Puppet has not run in the last 10 hours [15:26:14] PROBLEM Puppet freshness is now: CRITICAL on analytics analytics output: Puppet has not run in the last 10 hours [15:26:14] PROBLEM Puppet freshness is 
now: CRITICAL on asher1 asher1 output: Puppet has not run in the last 10 hours [15:26:14] PROBLEM Puppet freshness is now: CRITICAL on backport backport output: Puppet has not run in the last 10 hours [15:26:14] PROBLEM Puppet freshness is now: CRITICAL on bastion-restricted1 bastion-restricted1 output: Puppet has not run in the last 10 hours [15:26:14] PROBLEM Puppet freshness is now: CRITICAL on bastion1 bastion1 output: Puppet has not run in the last 10 hours [15:26:14] PROBLEM Puppet freshness is now: CRITICAL on bob bob output: Puppet has not run in the last 10 hours [15:29:37] !log nagios petrb: tweaked irc bot [15:31:54] PROBLEM host: deployment-web5 is DOWN address: deployment-web5 CRITICAL - Host Unreachable (deployment-web5) [16:11:18] * andrewbogott goes to dentist & lunch [16:17:54] I've skipped lunbch - and you have made me hungry [18:08:24] tewwy: what's the public facing URL of beta? I'm curious to look at the ganglia graphs for the host. [18:09:11] ssmollett: would you object to me putting a link to http://ganglia.wmflabs.org on https://labsconsole.wikimedia.org/wiki/Main_Page? [18:09:14] maplebed: http://labs.wikimedia.beta.wmflabs.org/ ? [18:09:22] sumanah: that's what I was looking for. [18:09:23] thanks. [18:09:27] maplebed: sure [18:09:30] glad to help [18:11:29] andrewbogott: do you know how I can get the instance ID from the publicly-visible hostname/IP? [18:12:09] maplebed: You mean programatically, or with your eyeballs? [18:12:15] eyeballs. [18:13:08] The 2nd column on the 'instance list' page has the guid. [18:13:11] Is that what you want? [18:13:26] I've got the name beta.wmflabs.org and the IP 208.80.153.219. [18:13:37] I don't see either of those in the instance list (that I have access to) [18:14:09] but my target is the instance ID, yeah. the i-0000000xxx string. [18:15:56] maplebed: Your instance list doesn't look like this? 
http://bogott.net/misc/instancelist.png [18:16:26] it does, but nowhere in that list is the word 'beta' or the IP 208.80.... [18:16:41] Oh, I see. So it's not the column that's missing but the row. [18:16:51] likely. [18:16:56] though all the IPs I see are internal, [18:16:57] Is that maybe because you're not a membed of the project that contains that instance? [18:17:01] *member [18:17:07] so I wouldn't be surprised if the 208 IP isn't listed even if the row were present. [18:17:20] that's certainly likely. [18:17:30] do I have to be a member of the project to map IP address to instance name? [18:17:45] Oh, wait -- that's a public ip that's mapped to a private one probably. Hang on... [18:18:41] Well, labs console has decided to not let me log in at the moment. But, does the 'manage addresses' tab have an entry for that ip? [18:19:31] still waiting to load. [18:19:39] but it probably does. [18:19:43] I think that was the step I was missing. [18:20:27] You can also get all this info via the commandline on nova0. But that seems like needless effort. [18:20:37] hey, it loaded! [18:21:14] but no, the IP is not in that list (because I'm not in the project) [18:21:36] ok. So in that case I can look it up for you, or tell you how to use the commandline, or you can add yourself to the project. [18:21:51] I don't know which project it is. [18:21:56] Or I could waste even more time by giving you additional useless choices. [18:21:59] OK, I will dig [18:22:04] because I can't look into the project to see which instance it is [18:22:25] this seems like a flaw. [18:22:47] but only for people that want to help with a project that aren't already in it, I suppose. [18:26:09] It is i-000000dc (deployment-prep) [18:26:16] Which I presume is in the deployment-prep project. [18:27:31] maplebed: Do you care to know where that came from, or happy to take it as revealed knowledge for now? 
[18:27:40] what I really wanted to load was http://ganglia.wmflabs.org/latest/?c=ganglia&h=i-000000dc.pmtpa.wmflabs&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [18:28:01] but the host is "down" (no data recvd within the last 5m) so ... ::sigh:: [18:28:34] I think there have been issues lately with that host getting crawled by google and becoming deeply overloaded. [18:28:38] Not sure if that's what you're seeing now. [18:29:44] I'm ok with accepting the name as coming from the gods, and I also think it would be useful to have a page on how to gain insight into your instance, which would, by definition, include things like how to find the ganglia graphs for your instance. [18:30:15] It'd probably be useful to have a separate page for ops on 'how to find info on someone else's instance' [18:30:33] so that when folks come over to me and complain I can at least find out if there's something obvious broken. [18:30:50] (that was the source of this investigation this morning. [18:30:51] ) [18:31:53] This http://wikitech.wikimedia.org/view/OpenStack is the right place for the second thing. [18:32:28] As for user information about instances... we should just add links to every possible useless thing to the instance page. [18:32:36] You could make an RT bug with a wishlist for that. [18:33:08] um... 'every possible useful thing' :) [18:33:37] I will be probably be redesigning that page sometime soon. [18:34:21] RT-2708 [18:38:48] maplebed: You have what you need, for the moment? [18:39:06] I think so. thank you, [18:39:24] I didn't succeed in finding out what I had hoped to, [18:39:27] but I think it's time to move on. [18:39:29] :P [18:40:01] I am ill-equipped to understand those ganglia graphs. [18:40:20] But it's easy enough to get you a login on that server if you want to poke about. [18:40:35] I'll pass. I need to get swift working. 
[19:24:34] New patchset: Bhartshorne; "adding stanza for labs swift role, adjusting ganglia stats collector to spoof the cluster name instead of always pmtpa prod" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3816 [19:24:46] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3816 [19:32:24] New patchset: Bhartshorne; "adding stanza for labs swift role, adjusting ganglia stats collector to spoof the cluster name instead of always pmtpa prod" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3816 [19:32:36] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3816 [19:32:45] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3816 [19:32:47] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3816 [19:39:59] andrewbogott: when adding puppet groups, what does 'group position' mean? [19:44:40] oh. seems I misunderstood what a puppet group was. [19:45:10] maplebed: I'm not sure, but I expect it just determines the order of entries in the gui. [19:47:56] yeah, it seems like it's one of those "you have to enter a value but it doesn't actually matter much." [19:50:05] andrewbogott: do you know how to fix this error: err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Sshkey[10.4.0.2] is already defined in file /etc/puppet/manifests/ssh.pp at line 36; cannot redefine at /etc/puppet/manifests/ssh.pp:36 on node i-000001d2.pmtpa.wmflabs [19:50:27] you got that from doing a puppetd -tv? [19:50:32] yeah. [19:50:39] When I get that message I just count to 20 and try again. [19:51:00] Which usually works, hence I have never tried to understand what it means :( [19:51:12] hm. [19:51:21] trying again I got a different error, but they seem unrelated. 
[19:51:27] err: Could not retrieve catalog from remote server: Error 400 on SERVER: value is a required option for Puppet::Parser::Resource::Param at /etc/puppet/manifests/iptables.pp:142 on node i-000001d2.pmtpa.wmflabs [19:51:41] maybe I should count to 30. [19:51:44] :P [19:52:14] The second thing looks like it might be an actual error. The first one surely isn't. [19:52:47] ok, I'll look into the second one. [19:52:48] thanks. [19:53:02] well... at this point it's safe to assume that I don't know any more than you do. [19:53:20] knowing how little I know, I would never assume such a thing. [19:53:39] I tend to assume that when I get a puppet error right at the beginning of a run it's because I'm stepping on the tail of an automatic run which isn't totally cleaned up yet. [19:54:02] Although in theory there is an official correct error message for that. [19:54:31] Is this email from Terry the same issue you were looking at early and set aside? [19:54:41] yeah. [19:54:51] I may or may not have thrown you under the bus. [19:55:15] It's ok, I'll have a look and then declare ignorance. [19:55:22] ;) [19:58:35] New patchset: Bhartshorne; "don't configure iptables in labs for swift" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3820 [19:58:47] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3820 [19:59:06] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3820 [19:59:09] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3820 [20:07:16] 03/27/2012 - 20:07:16 - Creating a home directory for andrew at /export/home/deployment-prep/andrew [20:08:14] 03/27/2012 - 20:08:14 - Updating keys for andrew [20:12:50] maplebed: where did you find that ganglia page? [20:24:45] Ryan_Lane: not really there, are you? 
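The "count to 20 and try again" approach to the duplicate-definition error can be written down as a small retry helper. This is only a sketch of that habit in Python: `retry_after_pause` and its parameters are hypothetical names, and the caller decides what counts as success for whatever command it wraps.

```python
import time
from typing import Callable


def retry_after_pause(action: Callable[[], bool],
                      attempts: int = 3,
                      pause_seconds: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call action until it returns True, pausing between failed tries.

    Returns False if every attempt fails. No pause follows the last try.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(pause_seconds)
    return False
```

Here `action` could wrap a run of `puppetd -tv` and report whether the catalog was retrieved. An error that survives every retry, like the iptables.pp one, is more likely a real problem than a collision with an automatic run.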
[20:33:43] andrewbogott: he's marked as /away and has been idle for 5 hours, 59 minutes, 50 second so unlikely [20:33:56] I know, just fishing. [20:38:29] andrewbogott: in the puppet configs... ;) [20:38:42] it's what ssmollett has ben working on. [20:38:46] ugh, ok. [20:38:56] Did you get puppet to run, ever? [20:39:17] I got further than I did before. [20:39:30] now I have an expected error (I didn't check in a file that I need) [20:40:51] but (re ganglia) I only needed the puppet configs to tell me the URL - ganglia.wmflabs.org [20:41:12] from there it's just clicky clicky. [20:49:44] maplebed: Ganglia is marking this host as yellow which I assume means 'troubled'. But the graphs don't look so bad at the moment, do they? http://ganglia.wmflabs.org/latest/?c=ganglia&h=i-000000d0.pmtpa.wmflabs&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [20:49:53] That's the SQL server for the beta cluster, btw. [20:52:13] which, come to think of it, I guess only the squid box should be in the game since I keep hitting the same page over and over. [20:54:47] New patchset: Bhartshorne; "committing placeholder ring files so puppet will be happy." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3827 [20:54:59] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3827 [20:55:31] andrewbogott: color in the overall view indicates "load". [20:55:45] (actually the load_one metric as a percentage of the number of CPUs) [20:56:37] I'm not confident the ganglia stuff is actually working right though; I didn't see changes in my instance that I thought I should. [20:56:47] (let alone the copious gaps) [20:57:10] The squid box is the one which ganglia reports as 'down'. [20:57:12] also, I saw one of my instances reporting a bunch of iowait that wasn't reflected in 'top'. [20:57:19] So that's suspicious but not especially useful. [20:58:09] I'm pretty sure ssmollett is only part way through actually implementing the ganglia service. 
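For reference, the figure behind the overview colour as described above (the load_one metric as a percentage of the number of CPUs) is simple arithmetic. A small Python sketch, assuming only that description: `load_percent` is a hypothetical name, and no colour thresholds are given in the discussion, so none are modelled here.

```python
def load_percent(load_one: float, cpu_count: int) -> float:
    """Express a one-minute load average as a percentage of CPU count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return 100.0 * load_one / cpu_count
```

By this measure a load of 2.0 on a four-CPU instance is 50 percent, while the same load on a single-CPU instance is 200 percent.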
[20:58:13] * andrewbogott nods [20:59:18] I'm tempted to just reboot the squid box. Being a caveman, that's the only intervention I understand. [20:59:35] It doesn't look at all sick, though. [21:00:59] andrewbogott: Being a caveman, you should also understand percussive maintenance. :-) [21:01:19] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3827 [21:01:22] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3827 [21:01:27] A.k.a. Hit machine with heavy object until it starts working or stops working for good. :-) [21:01:49] I have fixed surprisingly many hard drives by whacking and/or shaking them. But things weren't so microscopic back in the day. [21:02:14] I spent a sad afternoon trying to twist a hard drive /just right/ at just the right time to make it spin up. [21:07:41] andrewbogott: is there a bugzilla section to submit bugs against the labsconsole or should I put in RTs? [21:07:59] RT I think. But I don't know for sure. [21:08:10] k. [21:08:33] I want the instance name in the 'configure instance' screen somewhere. [21:11:32] me too! [21:11:57] When I'm trying to duplicate a config I always forget which I'm copying from and which I'm copying to... very dangerous. [21:18:58] RT-2710 [21:26:33] New patchset: Bhartshorne; "correcting forgotten number typo" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3829 [21:26:45] New review: gerrit2; "Lint check passed."
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3829 [21:27:09] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3829 [21:27:12] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3829 [21:53:49] New patchset: Bhartshorne; "replacing placeholder ring files with the real thing" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3832 [21:54:01] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3832 [21:57:45] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3832 [21:57:47] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3832 [22:22:24] andrewbogott: do you know if it will cause Bad Things if I reformat /mnt/ as xfs on an instance? [22:23:03] Not sure. [22:23:22] I don't know if the filesystem is virtualized or just running natively, for one thing. [22:23:27] I think I remember ryan telling me to reformat it with no problem [22:30:32] I wouldn't expect it to break anything outside of that instance, anyway. [22:32:30] * maplebed goes to try. [22:41:52] RT-2712: make labs instance's /var/log/ readable by default [22:43:25] New patchset: Bhartshorne; "changing backend for labs' swift instance from inaccessible ms5 to reachable upload." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3836 [22:43:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3836 [22:43:41] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3836 [22:43:44] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3836 [22:49:57] New patchset: Bhartshorne; "turn off swift container sharding for labs" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3837 [22:50:09] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3837 [22:50:17] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3837 [22:50:19] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3837 [22:55:56] maplebed, do you know anything about deploying mediawiki? [22:56:01] nope. [22:56:22] is Reedy here? [22:56:24] he does. [22:57:09] Ohai? [22:57:43] Reedy, have you had anything to do with the beta cluster? [22:57:59] Not really [22:58:14] I'm getting a message that I should run update.php, which I am trying to do. [22:58:25] But it seems to want sql to be hosted locally, which it isn't. [22:58:32] In production we most certainly don't do that [22:58:39] I still can't actually log into labs bastion [22:58:41] * Reedy whistles [22:58:57] most certainly don't do what? run update.php? [22:59:00] Yup [22:59:07] What's it wanting you to run it for? What's the error? [22:59:17] So... in that case, how do I get the schema up to date for the latest mediawiki? [22:59:59] Reedy: http://pastebin.com/Pg4c6ygc [23:00:05] preceded by a multi-minute load time [23:01:39] Oops, had to solve a captcha [23:02:04] You don't need to run update.php for that one... [23:02:45] ok then... what do I need to do? [23:03:07] That's usually a case of mysql being busy [23:03:18] People see them on WMF wikis, but they usually go away [23:03:32] Can you log onto the mysql server?
[23:03:40] Well, it's strangely selective. I load a dozen page elements no problem, then that last little bit hangs... [23:03:42] yes, I can. [23:04:06] It does not look especially busy. [23:04:07] "There were no Nova credentials found for your user account. Please ask a Nova administrator to create credentials for you." [23:04:10] * Reedy frowns [23:04:21] Reedy: log out and log back in again. [23:05:15] Is it possible that I have no connection to the sql server at all, and everything that works is coming from the cache? [23:05:17] And it already has an assigned key... [23:06:58] I don't think squid etc are properly setup yet on deployment, so presumably you shouldn't be getting any full cached responses [23:07:03] might get cached objects from memcached etc [23:09:13] So, any idea what I can do to make the sql server shape up? [23:10:55] Restarting mysql would probably help for now [23:11:07] I wonder if there are quite a few long running transactions [23:12:37] New patchset: Bhartshorne; "updating labs config with the correct AUTH key for the mw:thumbnail account" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3839 [23:12:49] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3839 [23:13:03] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3839 [23:13:05] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3839 [23:14:08] 03/27/2012 - 23:14:07 - Updating keys for reedy [23:14:14] 03/27/2012 - 23:14:14 - Updating keys for reedy [23:14:17] 03/27/2012 - 23:14:17 - Updating keys for reedy [23:19:53] hot damn, my labs-running swift cluster warks! [23:22:47] Wiggle wiggle wiggle wiggle, yeah! [23:24:07] Reedy: well, it looks like mysql is crashing every 30 seconds or so. 
Fatal error: Can't open and lock privilege tables: Incorrect information in file: './mysql/host.frm' [23:24:21] So that would explain the poor performance, presuming I didn't somehow cause that problem during my investigations. [23:24:25] Ah, that can't be helping [23:24:33] Any idea what that's about? And/or where '.' is? [23:32:04] Presumably it's /var/lib/mysql [23:32:18] * Reedy makes a note that he should probably fix his labs access in the near future [23:32:43] seems to be in /data/project/db/mysql [23:33:03] Do you have a labs account? [23:33:30] I have no idea what a hosts.frm is. It appears to be a binary file [23:34:16] Yeah, it's what mysql actually physically stores the data [23:35:03] Oh, that's no good. [23:35:04] 2012-03-28 00:34:35 Server refused our key [23:35:04] 2012-03-28 00:34:35 Disconnected: No supported authentication methods available (server sent: publickey) [23:35:27] Also need to create another ssh key for labs [23:39:02] ok, so you have a putty problem or something on your end? [23:40:56] Possibly, but this setup works fine for every other SSH server I use [23:43:59] Same key for git/gerrit also [23:44:03] Computers suck [23:44:54] What's your host os? [23:45:15] Hm... looks like host.* is very small, so it would be painless to recreate if I knew even one single thing about mysql. [23:46:43] Quick google suggests you just need to fix permissions [23:47:11] Again, if I knew a single thing about mysql... [23:47:40] You mean this, right? http://forums.mysql.com/read.php?10,172399,172572#msg-172572 [23:48:23] You should just need to stop mysql, chown the whole dir to the mysql user, and restart mysql [23:48:42] what are they currently? [23:48:57] some mysql and some root. [23:49:42] fixing them all to be mysql:mysql would be for starters [23:50:31] nope, exact same failure. [23:52:12] So the file might also be borked..
[23:53:35] yeah [23:54:28] More googling says backup form.* reinstall mysql/run /usr/bin/mysql_install_db [23:55:07] it'll screw up permissions though (grants need readding etc) [23:55:55] Do you know how to do that? [23:56:28] I mean, I know how to do the restore. but not how to fix things after [23:57:00] Will it let you login to the sql command line? It could be repairable [23:57:40] That's just a naked 'mysql' right? [23:57:56] (incidentally, the service mysql start has still not returned.) [23:57:56] depending on setup, yeah [23:58:01] should get mysql> [23:58:12] nah, it says ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' [23:59:11] which makes sense if mysql isn't running [23:59:29] yep