[00:02:00] heh
[00:11:38] Coren: around ? :)
[01:33:19] addshore: Am now. What's up?
[03:07:34] Coren, given that resolv.conf is messed up and hostname -d still returns pmtpa.wmflabs… what's a way for a migrated instance to notice that it's now in eqiad?
[03:08:35] Well, if resolv.conf is /removed/ the dhclient will replace it with the info coming from DHCP, which should be correct.
[03:09:06] right -- I'm trying to write a first-boot script that removes it as soon as the instance notices that it's been migrated.
[03:09:09] So it has to notice in the first place.
[03:09:28] I guess I can do a regexp on ip, maybe...
[03:09:37] seems fragile though
[03:09:58] It's probably too late by the time first-boot runs.
[03:10:07] Because that happens after networking is brought up.
[03:10:18] You probably want to remove it just before it's shut down for migration.
[03:11:26] after first boot can't we remove it and reboot?
[03:11:30] won't that accomplish the same?
[03:11:48] I'd prefer a method that didn't scramble the original instance, in case we need to fall back.
[03:13:03] Well, I suppose; that seems even /more/ brittle to me.
[03:14:13] might be… I've gone through several possible solutions here :)
[03:19:21] hm, ip doesn't work, it seems to still have a pmtpa ip
[03:19:23] curious
[03:20:54] Dafu? How could that be unless there is some static gunk in /etc/network?
[03:22:47] no idea.
[03:22:56] I'm going to try removing resolv.conf via salt, see if that changes behavior
[03:24:58] Did you check that /etc/network/interfaces wasn't made static for some reason?
[03:26:28] no… I'll look when I have an instance up again
[03:33:54] Clearing resolv.conf before the move seems to have settled the domain/ip issues
[03:34:01] let's see if I can automate that bit
[04:11:50] resolv.conf got weird in 12.04
[04:11:56] that whole resolvconf.d thing
[04:55:45] any opinions on augeas?
[06:06:01] We have some processes on tools-login "mono CVNBot.exe". Doesn't CVN have its own project?
[08:42:22] !log deployment-prep Upgrading all varnishes.
[08:42:24] Logged the message, Master
[09:39:33] I'm trying to log in to tools-dev.wmflabs.org as described on [[Nova_Resource:Tools/Help]], but I'm getting an error telling me that I don't have access rights to /home/bjelleklang
[09:48:27] Coren: looks like the home-directory issue was not a one-off
[09:49:34] .<
[09:49:41] * Nemo_bis lost one eye
[09:56:47] bjelleklang: Try again, please.
[09:57:23] Nikerabbit: You're probably right; I'll file a bug.
[09:59:51] scfc_de: No such file or directory
[10:06:03] bjelleklang: What's the complete error message?
[10:07:51] On login to the server I get "-bash: /home/bjelleklang/.bash_profile: Permission denied"
[10:08:18] I see the WM ASCII logo and welcoming text, but I'm placed in the root dir (/)
[10:08:41] cd home/bjelleklang gives me "-bash: cd: home/bjelleklang/: Permission denied"
[10:10:02] ls -al in /home gives me "drwx------ 2 root root 28 Feb 25 10:06 bjelleklang"
[10:14:07] I don't understand why it does that; I'll set up your home directory manually to get you going. One moment, please.
[10:16:01] bjelleklang: Try again, please.
[10:16:21] it works :9
[10:16:22] :)
[10:37:41] petan: Could it be that wm-bot's ! expansions are off by one? If I "!tr abc" in #wikimedia-labs-requests, it outputs "https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/tr?action=formedit" & Co.
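A minimal sketch of the detect-and-clean approach discussed above between 03:07 and 03:34. This is not the script that was actually deployed: the subnet prefix and interface name below are placeholder assumptions, and per Coren's caveat a first-boot hook runs after networking is already up, so in practice this would need to run before shutdown or be followed by a networking restart.

    #!/bin/bash
    # Placeholder: assumed pmtpa labs address prefix; verify before use.
    PMTPA_PREFIX="10.4."
    IFACE="eth0"    # assumed primary interface

    # Current IPv4 address of the instance.
    addr=$(ip -4 -o addr show "$IFACE" | awk '{print $4}' | cut -d/ -f1)

    case "$addr" in
      "$PMTPA_PREFIX"*)
        echo "still in pmtpa; leaving resolv.conf alone"
        ;;
      *)
        # Address no longer matches pmtpa: drop the stale resolver config
        # so dhclient regenerates it from the new DHCP lease.
        rm -f /etc/resolv.conf
        dhclient -r "$IFACE" && dhclient "$IFACE"
        ;;
    esac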
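The "removing resolv.conf via salt" experiment at 03:22 and the augeas question at 04:55 would look roughly like this; the salt target glob, domain, and nameserver address are illustrative assumptions, and on 12.04 (per the 04:11 remark) resolvconf may regenerate the file anyway.

    # Push the cleanup to matching instances from the salt master
    # (target glob is hypothetical):
    salt 'i-*' cmd.run 'rm -f /etc/resolv.conf'

    # Augeas alternative: edit resolv.conf in place instead of deleting it.
    # Domain and nameserver below are placeholder values.
    augtool <<'EOF'
    set /files/etc/resolv.conf/domain eqiad.wmflabs
    set /files/etc/resolv.conf/nameserver[1] 192.0.2.53
    save
    EOF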
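The manual home-directory repair at 10:14 is not shown in the log; a plausible reconstruction, given that ls showed the directory owned by root with mode 700, is below. The group name and skeleton path are assumptions.

    # Hypothetical fix: the directory existed but was root-owned, so the
    # user could neither enter it nor read .bash_profile.
    cp -rT /etc/skel /home/bjelleklang              # seed dotfiles (.bash_profile etc.)
    chown -R bjelleklang:wikidev /home/bjelleklang  # group is an assumption
    chmod 700 /home/bjelleklang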
[10:48:09] Sir_Lestaty: You read https://wikitech.wikimedia.org/wiki/User_talk:Sir_Lestaty_de_Lioncourt#Running_bots_on_tools-login.2C_etc. ?
[10:49:52] ty ;)
[16:19:20] bastion1 seems to be hanging for me
[16:30:58] manybubbles: ACK. bastion2 and 3 work, though. andrewbogott_afk, Coren: bastion1 seems to be stuck.
[16:31:13] scfc_de: Joy.
[16:31:46] Yeah, it looks dead-ish.
[16:33:24] * Coren kicks it.
[16:34:47] Beating it up fixed it.
[16:35:07] But /man/ gluster is slow-ass.
[16:36:50] beting what, reboot key?
[16:36:57] *beating
[16:37:45] Ah, *and* gluster is also broken for /data/project
[16:38:01] On bastion?
[16:39:05] Yeah
[16:39:23] Probably why bastion1 went down in the first place.
[16:39:35] Fix't
[17:14:41] Coren: Do you have a few minutes to slap deployment-scap.pmtpa.wmflabs and get it to actually mount /home via NFS? It's a new instance in the deployment-prep project
[17:17:40] bd808: Did you add the nfs client role?
[17:18:11] Ah, right, nevermind -- deployment-prep does so implicitly.
[17:18:25] Normally, just a reboot after the first puppet cycles should do it.
[17:18:35] I think I tried that
[17:21:35] I brought an instance up last night and it had gluster. I added the nfs role and rebooted and it didn't mount /home (Unable to create and initialize directory '/home/bd808'). I deleted the instance last night.
[17:22:00] Today I made a new instance with the same name. It came up and wouldn't even mount the gluster home
[17:22:18] I added the nfs role in a fit of optimism and rebooted
[17:22:32] It is now still not mounting home.
[17:25:53] This seems to happen to me each time I make an instance in any project (deployment-prep, wikimania-support, logstash). I'm not sure if I'm doing something canonically wrong or if the automounter setup is just really fragile.
[18:33:49] Coren, andrewbogott_afk: Tried to bring up yet another new instance (deployment-tin) in the deployment-prep project. During initial puppet runs autofs switched from gluster to nfs and then failed to mount. Running automount in the foreground shows that labnfs is claiming that /deployment-prep/home doesn't exist. Output at https://gist.github.com/bd808/6839349fdfe187f6eb17
[18:34:57] How odd.
[18:35:15] bd808: And yet it's mounted (and working) on the other instances?
[18:35:47] Coren: Yes. I can log into deployment-bastion and see the mount
[18:36:57] * bd808 blames pmtpa and gremlins
[18:42:45] Oh, crapness.
[18:43:00] Some of the eqiad fixes actually broke a bit in pmtpa.
[18:45:12] bd808: Try ls /home on one of the new instances?
[18:46:24] Coren: It's an empty directory
[18:46:33] May need reboot. :-(
[18:46:37] Do try though.
[18:46:45] The exports should be okay now.
[18:47:01] Ah. let me kick automount in the head again
[18:47:50] That could also work, though autofs is about as reliable as an unpatched Windows XP on the 'net.
[18:47:53] nfs down again?
[18:47:55] local-liangent-php@tools-dev:~/mw/maintenance$ touch ~/test
[18:47:55] touch: cannot touch `/data/project/liangent-php/test': Stale NFS file handle
[18:48:23] Yeah, you're stuck with a broken mount, which autofs will not be able to fix.
[18:48:55] hm so I should re-login?
[18:49:21] liangent: Oh, wait, sorry. I confused you with bd808 for a sec. :-)
[18:49:24] did someone just take beta labs out? "Forbidden: You don't have permission to access /wiki/Special:UserLogin on this server."
[18:49:40] that's from http://en.wikipedia.beta.wmflabs.org/wiki/Special:UserLogin
[18:50:03] happened just now
[18:50:03] Ooooh. Poopitude.
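Two of the moves mentioned above, spelled out as a sketch assuming stock autofs on Ubuntu 12.04 (mount paths as in the log): running automount in the foreground the way bd808 did at 18:33, and "kicking automount in the head" to clear a stale mount like the one liangent hit at 18:47.

    # Run the automounter in the foreground with verbose logging to see
    # why a map entry fails to mount:
    service autofs stop
    automount -f -v

    # Clear a stale mount: lazily detach it so the kernel drops the dead
    # filehandles, then restart autofs so it can remount cleanly.
    service autofs stop
    umount -l /home /data/project
    service autofs start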
[18:50:15] Coren: "Stale NFS file handle" from deployment-bastion now too. cc chrismcmahon
[18:50:22] Coren: anyway I pressed ctrl+D several times then mosh again
[18:50:29] Connection to tools-dev.wmflabs.org closed.
[18:50:30] /usr/bin/mosh: Did not find mosh server startup message.
[18:50:32] oh joy. all my Jenkins builds just kicked off :-)
[18:50:45] Yeah; some eqiad stuff got misapplied in pmtpa.
[18:50:53] * Coren fixes.
[18:51:18] thanks Coren!
[18:52:01] Coren: what did you do? All my bots died
[18:52:35] He tried to fix an NFS issue for me and possibly made the NFS gods angry
[18:53:45] bd808: yep everything crashed, https://tools.wmflabs.org/?status is down, and I've had a half dozen bots die
[18:54:09] bd808: Yeah, turns out that restarting a daemon when the wrong DC config got puppeted onto it is bad.
[18:54:25] NFS /should/ be correctly back now.
[18:54:43] https://tools.wmflabs.org/?status works for me now
[18:54:44] But basically everything trying to use it in the interim will have gotten stale filehandles.
[18:55:01] Coren: beta labs looks happier now, thanks
[18:55:24] Coren: thanks
[18:55:48] Sorry people, it's my fault for not checking that puppet didn't break /two/ things first.
[18:56:11] Coren: deployment-tin is mounting /home after reboot. Thanks!
[18:56:32] "Oh, look, puppet updated the wrong script. I shouldn't check that it also hasn't updated its /configuration/ too!" Derp derp.
[18:57:14] Feel free to tell people that the brief outage was all my fault but make up some heroically stupid story about how I did it
[18:57:43] Well, it /is/ your fault. If you hadn't complained, things would have been limping along just fine! :-P
[18:58:02] (And then bit me at a surprise point in the future) :-)
[18:58:02] I know ;)
[18:59:25] I seem to have a magical ability to spawn a new instance every time a minor issue like this has been introduced but not yet reported
[19:03:16] legoktm, YuviPanda, aude: if one of you is restarting the bot, !log
[19:04:56] not currently
[19:14:56] OK what the eff
[19:19:43] It's not an error
[19:20:24] Coren: Is there a way to figure out who's running commands in a tool account?
[19:22:14] "who"?
[19:22:33] You mean, amongst the maintainers?
[19:22:37] Yeah
[19:22:52] Not really, unless the bot was set up to log it specifically.
[19:23:01] Ugh
[19:23:05] But I'm going to guess that the bot is ill; I've seen it do that before.
[19:23:09] Yeah
[19:24:19] It'd be good to figure out why though
[19:24:27] is labs back?
[19:24:29] rdwrer: ?
[19:24:31] oh
[19:24:53] YuviPanda: It *looks* like it's dying on every patchset pushed
[19:24:56] oh
[19:24:57] wat
[19:31:16] I killed grrrit-wm
[19:32:00] murderer
[19:32:07] I know
[19:32:12] but I've to go now :(
[19:46:34] accusations :o
[19:52:21] Things are SO much better without gluster. Or autofs.
[20:11:15] zz_yuvipanda: rdwrer got the bot back
[20:11:27] let's see if it is ok now
[20:12:22] I didn't do anything
[20:12:47] Oh, you meant "zz_yuvipanda, rdwrer: got the bot back
[20:12:48] if it misbehaves, we can kill again
[20:12:48] "
[20:12:56] Punctuation is important
[20:13:09] yeah
[21:23:00] totally was not my fault
[22:48:54] !log tools Lol, so, something happened with grrrit-wm earlier and nobody logged any of it. It was yoyoing, Yuvi killed it, then aude did something and now it's back.
[22:48:56] Logged the message, Master
[22:48:58] Yay logs.