[02:45:24] Hi all.
[02:45:48] I'm trying to fix my upstart config for a service I'm running on my instance
[02:46:05] I currently have a "start on local-filesystems .."
[02:46:37] but what I really want is to start as soon as the gluster stuff on /data/project becomes available
[02:46:49] is there an event name for that?
[02:48:18] is "start on mounted MOUNTPOINT=/data/project" safe?
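For the upstart question above: Ubuntu's mountall does emit a "mounted" event with a MOUNTPOINT variable for each mount point, so a stanza along those lines is the usual answer. A minimal sketch, with a hypothetical job name and service path; whether mountall actually fires for a GlusterFS mount depends on how /data/project is listed in /etc/fstab, so treat it as untested for that case:

    # /etc/init/myservice.conf -- hypothetical job name
    description "service that needs /data/project"

    # mountall emits "mounted" once per mount point, with MOUNTPOINT set
    start on mounted MOUNTPOINT=/data/project
    stop on runlevel [!2345]

    respawn
    exec /usr/local/bin/myservice   # hypothetical service path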
[09:51:02] i'd like to run a java-based webapp on tool labs (*.war file). can anybody tell me how to set up tomcat on tool labs?
[09:54:43] danielnaber: iirc Coren has set up a tomcat server for that
[09:56:54] valhallasw: any idea where it is and how i can get my *.war file in there?
[10:21:03] danielnaber: no, not really. Check with Coren -- the docs don't actually mention it at the moment :/
[14:40:48] Coren: Nikerabbit can't log in due to "Could not chdir to home directory /home/nikerabbit: Permission denied"; /home/nikerabbit is "drwx------ 2 root root" and contains only an empty .bashrc owned by root. It looks like the home-directory-set-up-thingy is broken.
[14:42:03] scfc_de: That's part of PAM session setup. Try this: blow away the home and have him try again.
[14:42:41] k
[14:43:57] Nikerabbit: Could you try to log into tools-login.wmflabs.org?
[14:44:32] The other (less fun) possibility is that this is a side effect of some of the recent LDAP changes made to accommodate eqiad.
[14:46:10] That's why I asked you :-). We'll see.
[14:47:19] scfc_de: I'll try
[14:48:11] scfc_de: looks ok now
[14:49:29] Nikerabbit: Okay; let's assume that was a one-time glitch.
[15:11:23] scfc_de`: hope so
[16:14:55] I'm having problems with glusterfs in parsoid-spof, "cannot access /data/project: Transport endpoint is not connected" Help?
[16:15:22] marcoil: I'll have a look, just a moment..
[16:15:28] thanks!
[16:15:57] marcoil, what is the project name?
[16:16:41] andrewbogott: you mean the dir in /data/project? There are a few, one is parsoid-deploy, for example
[16:17:38] Ah, the answer is 'visualeditor'
[16:18:24] andrewbogott: I didn't know that :/
[16:18:34] andrewbogott, still in singapore or back in mpls?
[16:19:11] marcoil: is your home dir working ok?
[16:19:11] subbu: Singapore, for a few more weeks.
[16:19:25] andrewbogott: yes
[16:19:26] subbu: y'all don't have a car, right? So, no street parking hassles!
[16:19:37] marcoil: ok. /data/project should be happier now.
[16:19:42] andrewbogott, that is correct :)
[16:21:30] i don't pay attention to all the snow emergency and parking restriction announcements.
[16:21:33] andrewbogott: it certainly is, thanks a lot!
[16:22:33] I fear that my neighbors will be murdering each other over parking spaces before long
[16:22:33] marcoil: np
[16:26:33] Coren: a mediawiki dev question which you may know the answer to… https://gerrit.wikimedia.org/r/#/c/115185/
[16:26:59] that works except for a bit that tries to submit a job to the jobqueue. Which apparently doesn't exist for a maintenance script. Any idea what I need to do to allow that?
[16:31:07] * Coren looks at it.
[16:31:54] I honestly don't know; I've never fiddled with maintenance scripts before beyond DB maintenance. :-(
[16:32:21] ok. I'm sure there's a one-line init I need to include… but i can't find a lot of examples that do things like this.
[16:32:28] I could rewrite that job to not be a job, I suppose...
[16:34:42] andrewbogott: One point of note; I note that virt1000 is occasionally ridiculously slow. Artefact of its status that will self-fix once we use it "for real" as wikitech, or something to look into?
[16:35:07] you mean web pages are slow?
[16:35:13] Or the whole system?
[16:36:06] andrewbogott: Web pages; getting the list of instances in a project sometimes takes >5min; just doing a configure on one often over 3min. Then, sometimes, it's reasonably fast like in pmtpa.
[16:36:12] (It's happening now for me)
[16:36:21] hm...
[16:36:34] andrewbogott, Coren: Could one of you take the shell right away for a test from https://wikitech.wikimedia.org/wiki/Special:UserRights/Tim_Landscheidt_(Test) (the "_(Test)" is *very* important :-))?
[16:36:48] I'd like to think that that's related to Ryan's work but I don't actually know. Worth you looking into if you know where to begin
[16:37:19] scfc_de: Bah. Test schmtest. :-)
[16:37:31] scfc_de: (removed, I resisted the temptation to mess with you)
[16:37:44] Re LDAP, there's still the, eh, virtual groups (?) Ryan talked about some time ago.
[16:37:49] Coren: Thanks!
[16:39:55] Okay, so that didn't remove /home/scfc-test and its contents. I wonder why shuaib's home directory recently got purged, but maybe he deleted that before his shell right was removed.
[16:42:33] And re-granting the shell right and re-adding to Tools doesn't purge it either. So removing the shell right seems to be non-destructive.
[16:56:46] scfc_de: It's not intended to be.
[17:27:24] andrewbogott: wait, this is in pmtpa?
[17:27:38] because none of my work is affecting pmtpa yet
[17:28:07] oh, the redis driver isn't in pmtpa? Doesn't it need to be in both places to keep in sync?
[17:28:19] um… virt0, not virt2
[17:28:21] I haven't switched everything to it
[17:28:35] nemo is saying labs might be down?
[17:28:35] ok
[17:28:51] hasn't really indicated why he thinks that
[17:29:23] seems ok to me...
[17:29:35] well, quick look at least
[17:29:59] Nemo_bis: ^
[17:32:16] Ryan_Lane1: totally unrelated… I'm trying to write a migration script that mounts an instance's drive and mucks around a bit with the files inside. I see all different behaviors with the mount. Right now I have one that's hanging on access, sometimes qemu-nbd says 'disk not found' when the file is right there.
[17:32:38] Is this somehow subject to when instances were created? Would you expect any variation in volume type?
[17:32:44] Or is everything just 'disk' partition 1?
[17:32:52] it varies based on the image
[17:33:23] sometimes when you nbd connect, it'll be nbd0p1, for instance
[17:34:06] is there any programmatic way to predict?
[17:34:34] connect the nbd and see if there's a partition or not :)
[17:34:45] the thing that's worked for me (most of the time) is qemu-nbd --partition=1 -c /dev/nbd0 /path/to/volume
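The connect-and-probe approach just described, as a shell sketch (not the actual migration script). The volume path and mount point are placeholders; assumes root and the nbd kernel module:

    # make partition nodes (nbd0p1, ...) possible, then attach the image
    modprobe nbd max_part=16
    qemu-nbd -c /dev/nbd0 /path/to/volume
    partprobe /dev/nbd0                 # the pN nodes can take a moment to appear
    if [ -b /dev/nbd0p1 ]; then
        mount /dev/nbd0p1 /mnt/inst     # image with a partition table
    else
        mount /dev/nbd0 /mnt/inst       # bare filesystem image
    fi
    # ... swizzle files under /mnt/inst ...
    umount /mnt/inst
    qemu-nbd -d /dev/nbd0               # disconnect when done

The "qemu-nbd --partition=1 -c ..." form quoted above is the other route: it exposes just the first partition on /dev/nbd0 itself, which skips the probing but fails on images that have no partition table.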
[17:34:54] also, why are you changing things in the image's disk?
[17:35:22] it may be easier to use salt on the instance before it's brought down
[17:35:32] or is this when you move it over to eqiad?
[17:35:38] I don't know if it's the right approach… trying to get puppet to run without manual intervention
[17:36:02] you're doing it chroot'ed into the image, or something?
[17:36:10] !log deployment-prep Rebooted deployment-cache-mobile01 - was impossible to log into it though Varnish still worked
[17:36:12] Logged the message, Master
[17:36:12] or are you dealing with git?
[17:36:45] Ryan_Lane1: the process is… 1) create an image in eqiad with the same image,flavor,secgroup as the old instance...
[17:36:51] 2) halt both eqiad and tampa instances
[17:37:07] 3) copy files from pmtpa instance into the instance dir of the eqiad instance
[17:37:15] 4) mount eqiad instance disk, swizzle files
[17:37:18] 5) boot
[17:37:20] ah
[17:37:23] * Ryan_Lane1 nods
[17:37:33] 4) may be needed if puppet is ever to run again. I'm not sure.
[17:37:42] if you have dhclient set the host name it may not be necessary for all of that
[17:37:42] There are also resolv.conf issues
[17:38:06] hm. I guess you do need puppet to run properly for nfs and such
[17:38:26] yes, although it would be ok if it ran puppet still using virt0 as the master
[17:38:34] well, only kind of
[17:38:39] since the hostname would be wrong
[17:38:42] but also the ID of the instance changes.
[17:38:47] So it has to learn about that somehow
[17:39:04] that's why I was recommending dhclient setting the hostname
[17:39:05] * Coren just wasted 20 minutes trying to figure out why ssh rooot@tools-exec-01 didn't work. @*#&^$
[17:39:22] I think for that you need to write an ifup script?
[17:39:54] ok… but that script would still need to get injected via a disk mount? Or can nova do that somehow?
[17:42:20] I suppose if it works in pmtpa and eqiad it can be installed with puppet before migration
[17:42:30] maybe that's the angle I should be working
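One way to have dhclient set the hostname, per the exchange above: on Debian/Ubuntu, dhclient-script sources drop-ins from /etc/dhcp/dhclient-exit-hooks.d/ with $reason and $new_host_name already set. A sketch, assuming the DHCP server actually hands out a host-name option; the file name is arbitrary, and it could be shipped by puppet before migration as suggested:

    # /etc/dhcp/dhclient-exit-hooks.d/set-hostname
    # Sourced (not executed) by dhclient-script, hence no shebang.
    case "$reason" in
        BOUND|RENEW|REBIND|REBOOT)
            if [ -n "$new_host_name" ] && [ "$(hostname)" != "$new_host_name" ]; then
                hostname "$new_host_name"
                echo "$new_host_name" > /etc/hostname   # persist across reboots
            fi
            ;;
    esac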
[17:47:53] !log deployment-prep Investigating a mobile bug, might cause intermittent problems
[17:47:55] Logged the message, Master
[18:24:26] Coren, can you reach bastion-eqiad.wmflabs.org?
[18:27:00] andrewbogott: nope :(
[18:27:09] andrewbogott: Not anymore.
[18:27:46] andrewbogott: Probably not network, since tools-login-eqiad.wmflabs.org works.
[18:28:40] andrewbogott: Also, I can't reach it from other instances either (as bastion1) so it looks dead.
[18:28:47] ok
[18:41:23] I just rebooted. We will see...
[18:42:44] It doesn't look like the reboot took.
[18:52:53] andrewbogott: Had you changed things on that box before it went away? Because now I'm worried a bit.
[19:03:08] Coren: I was working on virt1001, but not doing anything to the vm
[19:03:28] The VMs I have in tools all seem okay.
[19:04:11] Wait, I lied. They /seemed/ okay.
[19:04:35] I seem to have lost one.
[19:09:16] Coren: I forced the unmount of some volumes that I was using as part of migrate. I can't think how that would matter unless there was an id collision
[19:09:19] which seems unlikely
[19:10:14] andrewbogott: One of my instances died. i-...3c tools-exec-01
[19:10:20] Same symptoms.
[19:10:28] My others all seem up and happy.
[19:13:26] Coren: I don't know what happened… I'm too sleepy to contribute anything useful right now I fear
[19:13:58] Ew. I understand.
[19:18:32] andrewbogott: All the KVMs are dead on 1001
[19:18:55] huh
[19:20:09] wasn't there something about not being able to access toolserver from labs?
[19:21:08] Coren, did you try to nova-reboot the instances?
[19:21:32] andrewbogott: Not yet. I just noticed.
[19:22:08] giftpflanze: No; wget http://toolserver.org/ works for me.
[19:24:17] wiki.toolserver.org times out for me
[19:27:59] virt1000 seems dead-ish. You've got some rm processes that are stuck on devices.
[19:28:14] 1001*
[19:28:37] I think I want to reboot it.
[19:28:45] andrewbogott: See any issues with that? ^^
[19:28:53] Yeah, this is all connected… I was running a migration script which mounted some volumes and a couple of them hung
[19:28:58] no, go ahead and reboot
[19:33:30] giftpflanze: wiki.toolserver.org is located at WMF Tampa IIRC; so there might be firewall settings that prohibit accessing it from Labs. Coren, andrewbogott: Is this a setting in Labs or in the network? Target is 208.80.152.234.
[19:34:27] *sigh*
[19:45:54] wm-bot5 keeps flooding out, then comes back hours later and dumps the backlog :/
[19:48:07] andrewbogott: Rebooting virt1001 allowed me to reboot -exec-01; Ima try bastion
[19:49:31] Coren: I will refrain from running my VM volume-mounting script for now, since that seems suspect. Although I don't know how...
[19:50:04] andrewbogott: It might just have been a temporal coincidence, but yeah. Not for the day play while I'm setting up eqiad tools. :-)
[19:50:43] It was very erratic anyway, probably need to find a puppet or salt solution instead
[19:51:29] andrewbogott: That worked; bastion-eqiad is back.
[19:51:47] strange but good
[20:08:51] Coren: it seems NAT is the problem now?
[20:33:53] mutante: Sorry, was putting out another fire. Lemme look at it for you.
[20:35:29] mutante: ... you forgot to associate your IP with an instance. :-)
[20:48:41] Coren: re.. i was away talking to Joel.. arg, yea, done and associated
[20:48:53] Coren: now.. The certificate is not trusted because no issuer chain was provided. :p
[20:49:02] i swear that worked too.. looking
[20:59:33] !log wikistats restored disappeared IP and DNS entries
[20:59:35] Logged the message, Master
[20:59:43] !log wikistats disabling SSL, certificate has been revoked
[20:59:44] Logged the message, Master
[21:18:10] !log wikistats putting it behind web proxy instead
[21:18:11] Logged the message, Master
[21:25:07] !log wikistats giving back public IP
[21:25:08] Logged the message, Master
[23:21:23] !log Deployment-prep chowned /data/project/apache/common-local/php-master/extensions/.git/modules/MobileFrontend/* to mwdeploy:mwdeploy
[23:21:23] Deployment-prep is not a valid project.
[23:21:53] !log deployment-prep chowned /data/project/apache/common-local/php-master/extensions/.git/modules/MobileFrontend/* to mwdeploy:mwdeploy
[23:21:54] Logged the message, Master
[23:21:57] right...
[23:25:38] !log deployment-prep recursively chowned extensions/MobileFrontend to mwdeploy:mwdeploy
[23:25:39] Logged the message, Master
[23:36:43] !log deployment-prep Rolled back
[23:36:44] Logged the message, Master
[23:49:59] any hints on using a desktop mysql client to hit a host in labs without opening up ports?
[23:50:09] looks like I need to double-hop ssh port forwarding
[23:50:12] via the bastion
[23:51:06] cajoel: Do you know https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Configuring_MySQL_Workbench ?
[23:52:14] that talks about tools-login
[23:52:17] (If you don't mean Tools, then replace tools-login.wmflabs.org with bastion.wmflabs.org & Co.)
[23:52:26] and implies that mysql is open to anyone on tools
[23:52:39] not sure I want 3306 open to all of labs..
[23:53:03] So your mysqld is only accessible from localhost?
[23:53:08] right
[23:53:13] Hmmm.
[23:53:32] port forward it?
[23:53:34] Linux?
[23:53:35] I think I can double hop it...
[23:53:37] yep
[23:53:38] -L
[23:54:18] but that just opens a port locally on bastion
[23:54:22] not much better..
[23:54:24] I use ProxyCommand to "hide" the first hop (so "ssh tools-exec-01.pmtpa.wmflabs" works). If you have a similar setup, I think you only need the second hop.
[23:54:48] interesting.
[23:55:16] https://wikitech.wikimedia.org/wiki/Help:Getting_Started#Using_ProxyCommand
[23:55:17] https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29
[23:56:08] second doc is much better.. :)
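What the ProxyCommand setup linked above boils down to, as a sketch; "youruser" and "myinstance" are placeholders, and -W needs a reasonably recent OpenSSH (older guides spawn nc in the ProxyCommand instead):

    # ~/.ssh/config on the desktop machine
    Host bastion.wmflabs.org
        User youruser

    Host *.pmtpa.wmflabs
        User youruser
        # tunnel each instance connection through the bastion
        ProxyCommand ssh -a -W %h:%p bastion.wmflabs.org

With that in place, a single "ssh -N -L 3306:localhost:3306 myinstance.pmtpa.wmflabs" forwards the desktop's port 3306 through the bastion to the instance's localhost-only mysqld, and the client just connects to 127.0.0.1:3306. The -L listener binds on the desktop, so no port is opened on the bastion or to the rest of labs, which addresses the 3306-open-to-all-of-labs worry above.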
[23:56:55] cajoel: btw, production uses mariadb instead of mysql
[23:57:07] drop-in replacement though
[23:57:07] that's ok for my uses
[23:57:09] yep
[23:57:29] with less Larry
[23:57:33] yea
[23:57:42] less annoying copyright messages in console client
[23:58:14] I don't mind Oracle as much as I find it incomprehensible that Larry 'owns' the island of Lanai.