[00:29:57] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c6 (bgwhite) (In reply to Marc A. Pelletier from comment #5) Any status update on documentation? I can't use 'jlocal' as I keep getting: "/usr/bin/jlocal": No such file or directory
[00:33:12] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c7 (Betacommand) Where are you using it? on your crontab or your submitted jobs?
[00:59:12] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c8 (bgwhite) On queue machines and tools-login. On crontab and submitted jobs.
[01:01:57] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c9 (Betacommand) thats your problem, jlocal should only exist on the -submit host. you basically have a wrapper script you invoke that submits other jobs. if you use jlocal in your crontab to start t...
[01:06:57] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c10 (bgwhite) That still doesn't solve my original question. jlocal is not found anywhere. I can't use it on any host as it is not found. I can't use it on the submit host as it is not found. Wh...
[01:08:41] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c11 (Betacommand) jlocal is on the submit host, I used it daily and just got a email from a cron using it ~2 minutes ago
[01:09:06] is bgwhite on IRC?
[01:22:20] Betacommand: Seldomly.
[01:22:48] All web tools are down?
[01:22:56] YuviPanda: Awake?
[01:23:02] scfc_de: yup
[01:23:22] sshing
[01:24:17] scfc_de: seems up?
[01:24:25] scfc_de: although I was seeing '2014/05/20 01:23:52 [alert] 18840#0: 768 worker_connections are not enough'
[01:24:28] a lot on it
[01:25:29] I could have sworn ... ?! Now works for me as well.
[01:25:41] scfc_de: I am sure there were difficulties.
[01:25:44] scfc_de: I did restart nginx
[01:31:56] scfc_de: looks like our nginx needs tuning
[01:33:26] scfc_de: http://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/ is a nice read, I should spend some time on it tomorrow
[01:36:40] YuviPanda: That sounds like fun, so I'll leave it for you :-).
[01:36:46] scfc_de: :D
[03:44:51] Hi there. Is there an admin around who could approve this? https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Nischayn22 I'm working with Nischay on a reference tool which will query OCLC's WorldCat API as part of The Wikipedia Library.
[08:17:58] (Abandoned) Hashar: Add usage() to take(1) [labs/toollabs] - https://gerrit.wikimedia.org/r/70058 (owner: Platonides)
[08:18:01] (Abandoned) Hashar: Make take non-recursive by default. [labs/toollabs] - https://gerrit.wikimedia.org/r/70059 (owner: Platonides)
[08:18:15] (Abandoned) Hashar: Implementing verbose messages. [labs/toollabs] - https://gerrit.wikimedia.org/r/70107 (owner: Platonides)
[08:18:55] (Abandoned) Hashar: WORK IN PROGRESS: Add robots.txt [labs/toollabs] - https://gerrit.wikimedia.org/r/77916 (owner: Tim Landscheidt)
[09:07:55] Wikimedia Labs / tools: Where to put OSM hillshading tiles - https://bugzilla.wikimedia.org/65519 (nosy) NEW p:Unprio s:normal a:Marc A. Pelletier I synced the OSM hillshading tiles off Toolserver and its sitting currently in my home. I'd love to copy it into a production place but I have n...
[09:10:09] Wikimedia Labs / tools: Where to put OSM hillshading tiles - https://bugzilla.wikimedia.org/65519#c1 (nosy) Forgot to say its about 270GB of space needed.
[09:23:23] Wikimedia Labs / tools: Copy contents of https://svn.toolserver.org/ to Wikimedia git - https://bugzilla.wikimedia.org/58801#c14 (nosy) I guess I will put all the stuff I backup to Labs first in my home. If this gets unhandy just let me know.
[09:24:35] Is there an issue with the web services? When reloading a page three times, I get 504 on http and a "Fehler: Datenübertragung unterbrochen" (Error: data transfer interrupted) on HTTPS
[09:24:58] https://tools.wmflabs.org/magnustools/ for example
[09:25:48] through a proxy, I get 502 Bad Gateway nginx/1.0.15
[09:30:12] yes there is ... failures have been intermittent and are getting worse
[09:31:29] I am unable to work with my wiki.
[09:33:38] Wikimedia Labs / tools: Where to put OSM hillshading tiles - https://bugzilla.wikimedia.org/65519#c2 (Alexandros Kosiaris) Already talked with nosy on IRC. The best place would be /data/project/tiles on the maps project VMs
[09:46:23] Change on mediawiki a page Wikimedia Labs/Tool Labs/Roadmap en was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=1010132 edit summary: /* Schedule */ some updates on what is done
[09:49:36] Change on mediawiki a page Wikimedia Labs/Tool Labs/Roadmap de was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=1010135 edit summary: /* Zeitplan */ Aktualisierungen:was ist erledigt? (Updates: what is done?)
[10:26:14] hello
[10:26:18] checking out the proxy now
[10:26:47] proxy should be slightly better now.
[10:26:51] it's just buckling under load
[10:26:57] rillke: GerardM- ^
[10:27:36] thanks, yuvi
[10:27:40] can you verify?
[10:27:54] it needs some tuning, looks like. let me tweak some settings
[10:28:30] yes, wiki loads now
[10:28:38] great!
[10:29:22] ok
[10:36:37] YuviPanda: hey my proxy issue?
[10:37:04] liangent: I couldn't figure out a way to check if upstream hasn't sent any data :(
[10:37:16] and I also haven't had much time to look, between firefighting tools proxy and also android app :(
[10:37:17] sorry!
[10:37:57] YuviPanda: by checking content-length?
[10:54:56] liangent: hmm, that doesn't sound bad. let me see if I can put up a custom error handler
[10:55:00] but first fixing the proxy itself
[10:55:58] thanks YuviPanda
[12:37:50] so I have installed goaccess on the proxy
[12:37:52] interesting logs
[12:38:23] imagemapedit seems to be among the most requested things
[12:38:39] YuviPanda: did traffic just coincidentally surge? Or does the new nginx version have worse performance somehow?
[12:38:58] andrewbogott: I am unsure. Perhaps worse default config?
[12:39:14] andrewbogott: I've a patch with some tuned config params
[12:39:27] Are you still hopeful that you'll be able to speed things up?
[12:39:49] ok
[12:40:14] andrewbogott: https://gerrit.wikimedia.org/r/134328
[12:40:57] andrewbogott: if this doesn't work let's revert to the older version and let it be. I'll also work with Coren on setting up a test tools proxy that's an exact copy of the live one
[12:43:38] Wikimedia Labs / tools: Move wiki.toolserver.org to WMF - https://bugzilla.wikimedia.org/60220#c26 (Silke Meyer (WMDE)) Reedy, I would like to communicate a "freeze" date for the wiki before it can't be edited any more. When will you be bale to migrate it and so when could that deadline be? End of May?...
[12:45:23] Wikimedia Labs / tools: Where to put OSM hillshading tiles - https://bugzilla.wikimedia.org/65519 (nosy) NEW>RES/FIX
[12:47:04] YuviPanda: probably have to force an nginx restart after that change applies
[12:47:14] andrewbogott: shouldn't the Notify take care of that?
[12:47:27] I'm running puppet
[12:47:30] Oh, you're right.
[12:47:31] Yes it should
[12:47:33] on the box
[12:48:05] I updated the labs proxy already, seems fine.
[12:48:15] Although there weren't perf issues there anyway
[12:48:30] yeah
[12:56:29] andrewbogott: seems stable now.
[12:56:37] andrewbogott: thoughts on adding another proxy box and DNS load balancing between them?
[12:56:43] should be trivial to do from the proxy side
[12:56:51] YuviPanda: will you respond to the email thread with your thoughts?
[12:57:05] andrewbogott: I will in about 5 mins.
[12:57:06] YuviPanda: I certainly don't object to setting up another VM if you think it will help.
[12:57:42] andrewbogott: I think the current patch itself will help, since there was no load spike in the VM, and also the error logs explicitly pointed out that nginx was running out of open FDs
[12:58:28] That seems pretty definite! Seems weird though, it's not like our current use case is that extreme.
[12:58:59] andrewbogott: true, but we do open extra FDs in redis connections, and proxying implies at least 2 fds per connection
[12:59:11] so that's 3 FDs per connection.
[12:59:36] hm, ok
[12:59:47] andrewbogott: magnus reports it is down again, I can't repro it
[13:00:10] I am tailing logs as well and got nothing
[13:01:00] YuviPanda: ok, I emailed him asking him to join us on IRC
[13:01:06] andrewbogott: ok!
[13:01:44] andrewbogott: is there any way I can get notified when tools webproxy is down?
[13:02:25] hey hedonil. is tools still down for you? is up for me and error logs are clean
[13:02:44] YuviPanda: Doesn't icinga already check it? If not, it should.
[13:02:49] YuviPanda: yep, it's down for newly restarted webservices
[13:02:58] hedonil: ow. example?
[13:03:05] http://tools.wmflabs.org/newwebtest/blame/
[13:03:17] hedonil: it shows me x's tools?!
[13:03:45] YuviPanda: :-) right at moment...
[13:03:56] hey Coren. I don't have a way for icinga to notify me in particular, and I don't know of labs' icinga status
[13:03:59] hedonil: so it is up? :)
[13:04:01] YuviPanda: there are some intermittend "hangs"
[13:04:12] hedonil: yeah, that was probably the nginx restart to apply the patch
[13:04:28] hedonil: That may be the webservice /itself/ running out of resources. The defaults are rather conservative.
[13:05:04] Coren: hmm, rather no
[13:05:15] hedonil: Checking the tool's error log should reveal this; lighttpd complains loudly when that happens.
[13:05:40] That URI works for me.
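The file-descriptor reasoning at 12:58:59 and 12:59:11 can be turned into a back-of-the-envelope capacity check. This is a minimal sketch, assuming three descriptors per proxied request (browser socket, backend socket, Redis lookup) and that all three count against nginx's per-worker `worker_connections` limit; the helper name `max_concurrent_requests` and the 4-worker figure are hypothetical, not the proxy's actual settings.

```python
# Back-of-the-envelope capacity check for the tools proxy.
# Assumption from the discussion: each proxied request holds three
# connections at once (browser, backend webservice, Redis lookup), and
# all of them count against nginx's per-worker worker_connections limit.
FDS_PER_REQUEST = 3


def max_concurrent_requests(worker_connections: int, workers: int = 1,
                            fds_per_request: int = FDS_PER_REQUEST) -> int:
    """Requests that fit before nginx logs 'worker_connections are not enough'."""
    if worker_connections < 0 or workers < 1 or fds_per_request < 1:
        raise ValueError("limits must be positive")
    return (worker_connections // fds_per_request) * workers


# 768 is the per-worker limit quoted in the 01:24:25 error message.
print(max_concurrent_requests(768))             # 256
print(max_concurrent_requests(768, workers=4))  # 1024 (hypothetical 4 workers)
```

Under this model capacity scales linearly with `worker_connections`; raising it generally also requires the process's open-file limit (nginx's `worker_rlimit_nofile`) to be at least as high.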
[13:05:54] * hedonil checks this nearly every second, as I'm testing things
[13:06:27] YuviPanda: http://tools.wmflabs.org/catscan2
[13:06:30] YuviPanda: hangs
[13:06:53] ... ah, I see issues as well. The connections regularily just 'sit there', and that's with a long-running webservice.
[13:07:35] and this happed occasionaly with other tools, too
[13:08:31] hmm, error logs still show nothing
[13:08:40] I don't know if this is just the backend server time outing
[13:09:39] it now redirects to http://tools.wmflabs.org/magnustools/
[13:09:42] is that accurate?
[13:10:04] and http://tools.wmflabs.org/catscan2/catscan2.php works alright
[13:10:37] access logs say things are fine, error logs don't indicate anything wrong
[13:10:59] * YuviPanda considers downgrading to nginx 1.5 and just leaving it there.
[13:11:49] YuviPanda: even if logs say no, users say yes :P
[13:12:02] I'm not denying problems, hedonil. Just frustrated I can't diagnose them
[13:12:19] YuviPanda: just kidding a bit
[13:12:24] :)
[13:12:29] hedonil: they seem to be up for me now.
[13:12:41] * andrewbogott is reluctant to move back from an official release back to a home-made one :(
[13:12:53] yeah, me too andrewbogott
[13:13:42] YuviPanda: if you're in the middle of testing an app, and somthing hangs, you dig deep into shit (thinking it's your crapy code...)
[13:14:09] hedonil: yeah, finding out that it is infrastructure and not your code can be frustrating
[13:14:38] I'm tailing catscan's logs too, no errors there
[13:15:15] YuviPanda: /right now/ it seems to be ok.
[13:15:44] YuviPanda: but I'll cry loud if it happens again :-)
[13:17:43] YuviPanda: o/: hangs. http://tools.wmflabs.org/newwebtest/ec/
[13:17:49] at the very moment
[13:18:48] hedonil: what browser are you using, btw? hangs for me in FF but not in chrome
[13:18:59] Coren: wasnt the new web supposed to be better? All Ive seen are headaches
[13:19:03] loads just fine for me in ff :(
[13:19:06] YuviPanda: chrome & firefox
[13:19:46] Betacommand: It is, an it was, but it looks like the upgrade to the latest version has issues.
[13:19:54] YuviPanda: but you're right, mostof the issues are in chrome
[13:20:25] andrewbogott: interesting. I am getting a bunch of 499 response codes in the access logs
[13:20:34] 499?
[13:20:57] client closed connection
[13:20:59] 'Used in Nginx logs to indicate when the connection has been closed by client while the server is still processing its request, making server unable to send a status code back.'
[13:21:07] In this case is 'client' the browser or the backend service?
[13:21:07] and there's a bunch of them
[13:21:26] browser, IIRC
[13:22:43] apparently caused if the server's timeout is longer than the browser's, and this would potentially mean that our backend connections are being saturated?
[13:22:44] YuviPanda: that would happen if the user sees a 'hang' and gives up
[13:22:57] So it could be a secondary symptom
[13:23:03] true, but there are a *lot* of these
[13:23:14] * Coren ponders.
[13:23:16] tail -f /var/log/nginx/access.log | grep 499 gives me a LOT of them
[13:23:25] ok, so probably not user behavior
[13:23:31] yeah
[13:23:39] YuviPanda: Have you looked in the redis logs to see if /it/ fails to return the right value in time?
[13:23:41] Your patch just shortened the server timeout though, didn't it?
[13:23:59] andrewbogott: that's the keepalive timeout
[13:24:03] not the server timeout
[13:24:07] oh, ok
[13:24:23] Coren: redis logs are empty, and am doing a monitor and it seems fine
[13:25:23] andrewbogott: I could downgrade and see if the 499s persist?
[13:25:32] if they do the problem is elsewhere, if not then nginx bug.
[13:25:40] sure.
[13:25:45] Is this still 1.7?
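`grep 499` matches any line containing those three digits anywhere (byte counts, URLs, timestamps), so it can overstate how many requests actually ended with status 499. A slightly more careful sketch parses the status field and tallies every code, which would also give a baseline to compare 499 rates against when run over archived logs. It assumes the proxy writes nginx's default `combined` log format (a custom `log_format` would need a different pattern); `count_statuses` and `report` are illustrative names.

```python
import re
import sys
from collections import Counter

# Status field of nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent ...
STATUS_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def count_statuses(lines):
    """Tally HTTP status codes; lines that do not match the format are skipped."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def report(counts):
    """Print each status with its share of all parsed requests."""
    total = sum(counts.values())
    for status, n in counts.most_common():
        print(f"{status} {n:8d} {100.0 * n / total:5.1f}%")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_status.py /var/log/nginx/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        report(count_statuses(log))
```

A 200 response of 4990 bytes, for example, would be counted by `grep 499` but not by this script.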
[13:25:50] andrewbogott: yup
[13:25:57] I think you should try 1.6 first
[13:26:00] since it's 'stable'
[13:26:26] 1.7 is just 1.6 with one series of patches, but let me see if I can find a build
[13:26:30] Unless, oh, was 1.6 unavailable?
[13:26:39] andrewbogott: can't find debs inside http://nginx.org/packages/ubuntu/
[13:27:18] andrewbogott: nevermind, found 'em
[13:27:29] andrewbogott: or not. no -extras 1.6
[13:27:37] so not useful
[13:28:20] andrewbogott: found
[13:28:21] https://launchpad.net/~nginx/+archive/stable/+packages
[13:30:16] andrewbogott: moved to 1.6
[13:33:19] no drop in 499s. downgrading to 1.5 to check.
[13:33:27] we also have no baselien for 499s from before, so this is frustrating.
[13:34:43] aha!
[13:34:58] I hadn't restarted nginx after the downgrade to 1.6, just did that.
[13:35:23] 499s seem to have slowed down, but still around
[13:35:43] hedonil: are things any better now?
[13:36:48] * hedonil checks
[13:37:36] 499s not tool specific either, see some for /
[13:39:13] andrewbogott: YuviPanda: "hang" rate dropped to ~ 1:50. A good ratio for today ,-)
[13:39:28] awww / lol / :'(
[13:39:28] :)
[13:39:52] let me test something
[13:41:35] just disabled spdy to test
[13:41:36] Hi folks, I'm trying to add Nischayn22 to the local-wikipedia-library-reference tool group but it's 'failing to add'. not sure what's going on, could someone take a look? https://wikitech.wikimedia.org/w/index.php?title=Special:NovaServiceGroup&action=managemembers&projectname=tools&servicegroupname=local-wikipedia-library-reference&returnto=Special%3ANovaServiceGroup
[13:42:49] Ocaasi: I'm looking.
[13:42:55] andrewbogott: thanks!
[13:43:05] YuviPanda: Is the automatic compression related to 1.5/1.6/1.7?
[13:43:16] gzip? don't think so
[13:43:23] are you facing issues with that?
[13:44:12] No, I'm just wondering what may be different in the different versions, and if nginx would have to compress all streams in one version, but in another not, that would cause a lot of load I assume.
[13:44:12] andrewbogott: hmm, 499s aren't a new phenomenon. they exist in the archived logs as well
[13:45:02] So, red herring :(
[13:46:11] possibly.
[13:46:43] hedonil: how's it holding up on your end? any more hung connections?
[13:47:09] !ping
[13:47:09] !pong
[13:47:09] YuviPanda: looks good, atm. knock on wood
[13:48:49] Ocaasi: I see the problem too, there's clearly a bug. I'm about to get called away, though, do you mind creating a bugzilla bug for this and assigning it to me?
[13:49:06] will do, thanks andrew
[13:49:47] could it have to do with this tool being named wikipedia-library-reference and there already being a wikipedia-library named tool?
[13:50:14] Ocaasi: I believe that...
[13:50:31] yes, that's what I was going to say :) I just set up a test with two groups where one's name was a subset of the other...
[13:50:33] same behavior.
[13:50:49] ok, will go on that hunch and create a new tool with unique name
[13:51:12] andrewbogott: thanks again!
[13:51:51] thanks, sorry for the inconvenience
[13:52:00] andrewbogott: Coren the 499 might be a red herring, and things *might* be alright. I've no way to tell. I might just wait for a bit to see if hedonil complains again
[13:52:07] I am also going to setup curl in a loop
[13:52:24] no problem. could an admin please remove local-wikipedia-library-reference from the tools service group list? the name is causing bugs.
[13:52:54] YuviPanda: I haven't seen the issue again yet, fwiw
[13:53:09] intermittent issues are the worst.
[13:53:20] Ocaasi: done
[13:53:29] andrewbogott: gracias, sir
[13:55:06] andrewbogott: feel free to leave this for someone else, but it's also failing for local-oclc-reference
[13:56:45] well… that one I cannot explain :(
[13:57:53] :(
[14:03:55] Wikimedia Labs / tools: Failed to set group members for local-oclc-reference - https://bugzilla.wikimedia.org/65534 (Ocaasi) UNC p:Unprio s:normal a:Marc A. Pelletier From here: https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup Try to add a member to a group. Even though there...
[14:05:31] Coren: andrewbogott hedonil things seem stable now
[14:05:39] Wikimedia Labs / tools: Failed to set group members for local-oclc-reference - https://bugzilla.wikimedia.org/65534 (Ocaasi)
[14:05:42] YuviPanda: is that with 1.6?
[14:05:55] YuviPanda: ack
[14:06:29] andrewbogott: https://bugzilla.wikimedia.org/show_bug.cgi?id=65534 (i'm not sure how to change the assignment to you)
[14:06:39] Wikimedia Labs / tools: Failed to set group members for local-oclc-reference - https://bugzilla.wikimedia.org/65534 (Ocaasi)
[14:06:44] but i added you to the cc list
[14:06:54] Wikimedia Labs / tools: Failed to set group members for local-oclc-reference - https://bugzilla.wikimedia.org/65534 (Andrew Bogott) a:Marc A. Pelletier>Andrew Bogott
[14:07:41] hedonil: andrewbogott Coren https://dpaste.de/3Wx4 is a script I'm running that consistently hits tools and checks for non 200s
[14:08:14] That seems heavy-handed.
[14:27:48] hello
[14:28:08] is it possible to change shell account username?
[14:30:48] Coren: hehe :)
[14:31:03] Coren: been trying to make it a habit to use python instead of bash wherever
[14:32:44] mgrabovsky: It is not especially possible. If you're truly desperate you can create a new account with the name you want.
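The dpaste script linked at 14:07:41 is not reproduced in this log. As an illustration of the same idea only, a minimal sketch that polls a few tool URLs and reports anything other than a 200 might look like this. The URLs are ones mentioned above; the function names, the 10-second timeout and the 30-second interval are assumptions. A request that times out is reported as `None`, since a hang rather than an error page was the symptom being chased.

```python
import time
import urllib.error
import urllib.request

URLS = [  # tools reported as hanging in this log
    "http://tools.wmflabs.org/catscan2/",
    "http://tools.wmflabs.org/magnustools/",
]


def fetch_status(url, timeout=10.0):
    """HTTP status of url, or None if nothing usable came back in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses, e.g. 502/504 from the proxy
    except (urllib.error.URLError, OSError):
        return None  # refused, DNS failure or timeout (a "hang")


def failures(urls, fetch=fetch_status):
    """Map every URL that did not answer 200 to what it answered instead."""
    result = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            result[url] = status
    return result


def poll(urls, interval=30.0, rounds=None, fetch=fetch_status, sleep=time.sleep):
    """Check all URLs every `interval` seconds; run forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        for url, status in failures(urls, fetch).items():
            print(time.strftime("%H:%M:%S"), status, url)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

Calling `poll(URLS)` runs the check until interrupted; the `fetch` and `sleep` parameters exist so the loop can be exercised without network access.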
[14:33:23] andrewbogott: there's a twist, though, I think it may have to do with SVN
[14:33:56] andrewbogott: I used to have commit access with my nick, and when I tried to use that when registering on Wikitech, I wasn't able to use it
[14:34:13] so I used a different one that worked
[14:34:38] andrewbogott: I got the 'There was either an authentication database error or you are not allowed to update your external account.' error
[14:35:24] it might be your old svn account was already migrated to labs. What was your svn name?
[14:35:41] andrewbogott: same as my nick here, mgrabovsky
[14:36:22] there is a labs account with that name; it already has a key registered as well.
[14:37:10] oh, is it possible to link my account on Wikitech to that one?
[14:37:49] I didn't know a kind of migration had taken place, I was gone for too long it seems
[14:38:39] hm, I'm not sure. I need to go in a minute but I will think about it -- can you send me an email with all the various vitals? your wikitech name, your old shell name, your new shell name, etc?
[14:38:59] andrewbogott: what's your email or a way to contact you?
[14:39:07] abogott@wikimedia.org
[14:39:26] thanks, I'll get in touch
[15:10:07] andrewbogott: Coren hedonil seems stable now? 499s still happening, but my script that hits tools seems happy so far. I'm inclined to just leave it as it is
[15:10:31] It's not breaking as badly, at the very list.
[15:10:40] oh?
[15:10:59] Coren: where?
[15:12:29] YuviPanda: works like clockwork \o
[15:12:45] Coren: oh, you meant it's *not* breaking as badly, I missed the not
[15:13:00] Yeah, that's what I meant. :-P
[15:16:04] Coren: :)
[15:16:11] Coren: I'll write up a report to labs-l shortly
[15:33:34] the homepage of my webpage is index.html. can i somehow change it to index.py. ie. anyone who visits http://tools.wmflabs.org/mytool gets to see index.py instead of index.html.
[15:34:20] rohit-dua: which webserver? apache or lighttpd , behind proxy or not
[15:34:38] oh, you answered that already kind of
[15:34:45] saying it's tools
[15:35:05] yes no proxy
[15:35:05] index-file.names = ( "index.html" )
[15:35:08] something like that
[15:35:11] and change the index.html
[15:35:43] http://redmine.lighttpd.net/projects/1/wiki/Index-file-names_Details
[15:36:13] mutante: but where do i change index-file.names
[15:36:21] inside tool labs
[15:38:30] rohit-dua: i _think_ on tools-webgrid-01, but only from looking at where scfc_de changed related things
[15:40:50] andrewbogott: Coren I've never really written 'outage reports' of any sort, so the email I just sent might be a bit weird. do let me know if there's any more / less info I could add
[15:42:08] rohit-dua: Web server settings can be changed in a file named .lighttpd.conf in your tool's home.
[15:42:38] YuviPanda: looks good to me, thank you.
[15:42:42] Coren: i do not get any .lighttpd when i ls -l in my home directory..
[15:42:53] ls-a
[15:43:14] andrewbogott: :) I still need to setup monitoring, though
[15:43:22] rohit-dua: There is none by default,
[15:43:41] so i create one and add the default settings?
[15:43:41] andrewbogott: is there a way I can get something to email or some other way of notifying me when this goes down?
[15:44:15] rohit-dua: You can simply add it with any directives you want. https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Configuring_the_web_server
[15:44:31] YuviPanda: Probably, but I don't immediately know what the best approach is.
[15:44:44] !log deployment-prep Deployed scap 7b6fc47 via trebuchet
[15:44:44] Coren: thank you :-)
[15:44:54] rohit-dua: In particular, you probably only want one line in it:
[15:44:57] index-file.names += ("index.py")
[15:45:06] !ping
[15:45:06] !pong
[15:45:42] Coren: so what would happen if both index.html and .py are present?
[15:46:24] rohit-dua: ... I'm not sure. It certainly will pick one but I wouldn't rely on which.
[15:46:30] @seen morebots
[15:46:30] bd808: Last time I saw morebots they were talking in the channel, they are still in the channel #wikimedia-operations at 5/20/2014 3:15:40 PM (30m50s ago)
[15:46:50] (I.e.: I'd make sure that you have only one or the other)
[15:47:11] morebots, everything ok?
[15:47:19] bd808: I'll restart it
[15:47:36] andrewbogott: ty
[15:49:02] labs-morebots, feeling better
[15:49:03] I am a logbot running on tools-exec-08.
[15:49:03] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[15:49:03] To log a message, type !log <msg>.
[15:49:19] !log testlabs this test message is testing the message test
[15:49:21] Logged the message, dummy
[15:53:55] !log deployment-prep Deployed scap 7b6fc47 via trebuchet
[15:53:57] Logged the message, Master
[16:08:39] YuviPanda: got some new hangs (few ~ 1/hour), made a pic of status
[16:08:43] YuviPanda: http://tools.wmflabs.org/tools-info/misc/proxy-hang.png
[16:09:05] Any idea why my webservice could be ceasing to work every once in a while?
[16:09:08] An atomatic restart would be nice
[16:09:13] auto
[16:09:36] otherwise it is borderline useless
[16:10:01] I cannot even set up a heartbeat script, because all cron jobs are forced to be run on the worker nodes
[16:10:15] (from where I apparently cannot restart the webservice)
[16:10:25] Am I missing something obvious here?
[16:11:09] YuviPanda: interesting thing is, that on reload /one/ resource load randomly fails
[16:12:06] YuviPanda: something with http keep alive request settings here ?
[16:14:52] dschwen: you need MORE magic with your script :P
[16:15:26] dschwen: I provide you a webwatcher script (a bestseller atm )
[16:15:35] dschwen: https://tools.wmflabs.org/paste/view/fde97e1a
[16:17:06] Coren: I'd suggest to create a continuos queue for webservice, would be the easiest solution, I think
[16:29:38] dschwen: here is version for vanilla (non-tweaked) webservices ;) https://tools.wmflabs.org/paste/view/5d976bb0
[16:33:27] dschwen: What's the tool's name?
[16:46:02] dschwen: Thing is, it's really abnormal for lighttpd to die and it doesn't seem wise to me to restart it unconditionally. How do your lighttpds die? (There should be at least some hint in the error.log)
[16:48:17] Coren: it doesn't seem to be abnormal, though, as there are often enough people complaining about tools that are down.
[16:48:37] Coren: one of the reasons seems to be out of memory-issues
[16:50:00] valhallasw: The hard limit on lighttpds is at 4G(!); if one of them manages to hit that limit I really *want* it stopped. :-) But my point is, it's really not supposed to be possible for lighttpd to exit without an explicit kill or something really bad happening (like lighttpd *crashing*) so any instance of a webservice being down really needs to be looked into.
[16:51:04] Coren: betacommand has reported OOM issues with cgi-bin python scripts
[16:51:34] Coren: well, there are some reasons. One is std config. I'll provide a analysis on lighty in a few days.
[16:51:53] and sure, stopping it when it hits 4G is reasonable enough, but restarting the damn server makes much more sense than leaving it for dead
[16:51:59] * hedonil became a lighty blackbelt :P
[16:52:18] hedonil: But then, if lighty ends because of a broken config you don't want to restart it with that same config.
[16:53:01] valhallasw: The grid will never restart a process *it* killed, only one that dies on its own.
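Coren's objection (16:46:02, 16:52:18) is to restarting unconditionally, for example when a broken config would make lighttpd exit again immediately. Below is a minimal sketch of a watchdog step that restarts a non-responding webservice but gives up after a few consecutive attempts. It assumes `webservice start` is the right restart command and that it can be run from wherever the check is scheduled, which may not hold: at 16:10:01 dschwen notes that cron jobs run on the worker nodes. `is_up`, `watchdog_step` and the limit of three restarts are hypothetical. The check uses a short timeout so that a server which accepts connections but never answers also counts as down.

```python
import subprocess
import urllib.error
import urllib.request

# Assumption: the command the tool owner would otherwise type by hand.
RESTART_CMD = ["webservice", "start"]


def is_up(url, timeout=5.0):
    """True only if the tool answers 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):  # error pages, refusals, timeouts
        return False


def default_restart():
    subprocess.run(RESTART_CMD, check=False)


def watchdog_step(url, restarts, max_restarts=3, check=is_up, restart=default_restart):
    """Run one check and return (new_restart_count, action).

    `restarts` counts consecutive restarts without a healthy check in
    between; once it reaches max_restarts the watchdog stops restarting,
    so a service that dies on every start is left for a human to inspect.
    """
    if check(url):
        return 0, "ok"
    if restarts >= max_restarts:
        return restarts, "gave_up"
    restart()
    return restarts + 1, "restarted"
```

A scheduled job would call `watchdog_step` every few minutes and persist the returned count between runs (for example in a small file in the tool's home); that persistence is left out here.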
[16:53:15] Coren: I don't care what 'the grid' does, I care what 'tool labs' does
[16:53:24] and I care about a web service that does not need me to keep it online
[16:53:54] ergo, if the grid kills the web service for some reason, there should be something in-place to restart it for me
[16:53:56] because, really, that's all I'm going to do when it's down anyway
[16:54:04] Coren: one thing is OOM, it's ok for most tools to die when the limit is reached. but restarted would be fine then.
[16:54:28] As suggested, a contiuous queue would be fine for tools-webgrid
[16:54:46] wouldn't help for OOM, though.
[16:55:02] valhallasw: Then you should care about its memory usage running out of bounds and why it gets to that point it the first place. Seriously. 4G is way, *way* past any sort of reasonableness and any web server that hits that barrier often enough that restarting it regularily is an issue is problematic to begin with.
[16:55:31] Coren: Yes, but not an issue that *I'm* paid to resolve.
[16:55:52] valhallasw: Wait what?
[16:56:01] if lighttpd is crappy enough to require 4G for simple cgi-bin scripts, that's *your* problem, not mine.
[16:56:29] valhallasw: It isn't. If your "simple" cgi-bin script eats up 4G of ram, then it's a problem with your script, not lighttpd.
[16:57:11] Talk to betacommand. I glanced at his scripts. They look reasonable enough.
[16:57:24] For some reason, Apache never had issues.
[16:57:49] valhallasw: Wrong; it regularily had issue in that the entire *server* ran OOM and had to be restarted.
[16:58:44] valhallasw: In fact, the fact that some tools became destructive because they ate all resources is one of the main reasons why I switch to per-tool servers -- this way misbehaving scripts kill themselves and don't bring everything else down with them.
[16:59:17] thanks, hedonil
[16:59:30] Coren, I find the following at the end of my error.log
[16:59:41] 2014-05-20 14:35:44: (server.c.1512) server stopped by UID = 0 PID = 30656
[16:59:41] 2014-05-20 14:35:44: (server.c.1502) unlink failed for: /var/run/lighttpd/zoomviewer.pid 2 No such file or directory
[16:59:59] I don't know what to make of it
[17:00:12] dschwen: Hm. That's just /after/ it wanted to end; what's a couple lines up from that?
[17:00:29] oh wait
[17:00:32] is this fatal?!
[17:00:33] 2014-05-20 14:35:44: (server.c.1512) server stopped by UID = 0 PID = 30656
[17:00:33] 2014-05-20 14:35:44: (server.c.1502) unlink failed for: /var/run/lighttpd/zoomviewer.pid 2 No such file or directory
[17:00:36] sorry
[17:00:37] one sec
[17:00:45] 2014-05-20 14:35:44: (mod_fastcgi.c.2701) FastCGI-stderr: PHP Notice: Undefined index: stage in /data/project/zoomviewer/public_html/index.php on line 6
[17:00:54] Coren: and now they silently die instead of getting restarted. That's still not good from a tool user perspective. The tool user *also* still doesn't know anything, except lighttpd randomly dies without any information.
[17:01:02] hm, where I'm from this is a warning
[17:01:12] Coren: and the tool user can only get that information with magic incantations of jstat and jacct
[17:01:27] dschwen: Hm, that shouldn't be fatal indeed. Interesting.
[17:01:30] at least, if you consider an exit code of 137 (or was it something else?) informative.
[17:01:55] sorry, that second 'tool user' should have been 'tool owner'
[17:02:02] dschwen: We can rule out OOM at least -- that gets a sigkill and wouldn't have had the server politely wrap up as shown by the last two lines.
[17:03:08] dschwen: yw. (modify starting script line 22 for your restart)
[17:03:40] dschwen: your webservice's problem was: maxvmem 3.925G
[17:04:03] Aha. So it hit the soft limit first.
[17:04:12] dschwen: it's not in error.log, but jobinfo: $ qacct -j lighttpd-zoomviewer
[17:04:13] I was just looking at the qacct
[17:05:37] Corem: hi again. is it possible i link my index homepage file from a path like index-file.names += ("/BUB/app/index.py")
[17:05:42] hedonil, what is ${HOME}/webstart.sh
[17:05:55] just "webservice $1" ?
[17:06:08] Coren: hi again. is it possible i link my index homepage file from a path like index-file.names += ("/BUB/app/index.py")
[17:06:09] thanks
[17:06:19] hm, why is this eating so much mem
[17:06:23] dschwen: yep. something like that
[17:06:31] rohit-dua: No; indices are "file in the directory which will be shown by default". You probably want a rewrite if you want your /toolname/ to show something elsewhere by default.
[17:07:11] Coren: is it possible to rewrite that?
[17:08:14] rohit-dua: Look at the "Url rewrite" section of the help page I pointed you at earlier. This is probably what you want.
[17:08:46] dschwen: Is it possible for user-provided data to your web interface to cause it to work on huge datasets?
[17:09:42] hm
[17:09:44] Coren: thank you. i got it :)
[17:09:45] oh...
[17:09:58] I'm launching vips image processing tasks
[17:10:11] as child processes I guess
[17:10:24] Wikimedia Labs / Infrastructure: filearchive table not available on labs - https://bugzilla.wikimedia.org/61813#c10 (Luis Villa (WMF Legal)) Can't they already do that by simply uploading the file instead of the SHA?
[17:10:25] I should probably submit those through grid enginr
[17:10:29] If we start webservices with "-m a", that should inform users about any jobs killed by the grid, but I don't think that would catch any "soft" kills.
[17:10:30] buuuuut
[17:10:32] Hm. Those would be added to the script's own usage.
[17:10:51] that memory eating bug was fixed in vips months ago
[17:10:58] let's see...
[17:11:06] which version of vips do we have...
[17:11:19] dschwen: Add a "ulimit" to be sure that it doesn't run away?
[17:11:22] scfc_de: It wouldn't indeed. Hm. Nontrivial to catch.
[17:11:50] 7.32.3
[17:11:55] scfc_de: Ah, indeed, if you ulimit your child process then the sum can't break you (but then that can cause the child to be killed so you have to cope with that)
[17:12:20] scfc_de: That's unarguably better behaviour though.
[17:13:01] We could also use "-m ae" with the assumption that tools owners would have to endure that one mail even on a "clean" shutdown/restart.
[17:13:04] scfc_de: still, it would only tell users that the job was killed, not why.
[17:13:28] which is a lot better than what we have now, but there has to be a way to improve on that.
[17:14:08] Wikimedia Labs / Infrastructure: filearchive table not available on labs - https://bugzilla.wikimedia.org/61813#c11 (Marc A. Pelletier) At best they could tell that some file with the same /name/ existed; the SHA will confirm content. AFAIK, uploading doesn't check against deleted files' SHAs.
[17:14:26] crap that VIPS is a year old
[17:14:45] valhallasw: I recently thought about that "webservice start" should add the tool to a list ("stop" removing it), and then check every hour or so whether the corresponding jobs are still running. Don't know if the Redis database already has that or if an uncommanded lighttpd shutdown removes the entry there as well.
[17:15:14] valhallasw: Part of the problem is that unixy programs often deal very badly with malloc() not returning memory because they always presume infinite ram (that libg++.so message is a good example). If programs simply gave proper handling of OOM that wouldn't be as much of an issue.
[17:16:11] Coren: right. however, it should be possible for the grid to give you a process tree with memory usage per process, right?
[17:16:14] dschwen: Is the newer version available for precise somewhere? I could add it to the tool's repo and upgrade it.
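Coren's point about capping each child rather than the whole job can be sketched in Python with the standard `resource` module. The limit is applied in the forked child just before it execs, so only that process is capped. The parent (the web service) survives, but it has to cope with the child failing. This is a Linux-only sketch: `preexec_fn` is POSIX-only and RLIMIT_AS is not enforced on every platform. The 512 MiB figure and the test command are placeholders, not values from this discussion.

```python
import resource
import subprocess
import sys


def run_with_memory_limit(cmd, limit_bytes, timeout=None):
    """Run cmd with its address space capped at limit_bytes.

    Similar in spirit to `ulimit -v` in a shell wrapper: the limit is set
    in the child only, so the calling process keeps its own limits.
    Returns the child's return code (non-zero if it failed or was killed).
    """
    def _apply_limit():
        # Runs in the child between fork() and exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # An unprivileged process cannot raise its hard limit, so never ask
        # for more than the current hard limit allows.
        cap = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    completed = subprocess.run(cmd, preexec_fn=_apply_limit, timeout=timeout)
    return completed.returncode


if __name__ == "__main__":
    # A child that tries to allocate 2 GiB under a 512 MiB cap should fail
    # with MemoryError instead of growing the job's total memory use.
    code = run_with_memory_limit(
        [sys.executable, "-c", "bytearray(2 * 1024**3)"], 512 * 1024**2)
    print("child exit status:", code)
```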
[17:16:17] not sure if that works for cgi-bin, though
[17:16:30] but I *think* those are seperate processes
[17:17:01] Coren: as an aside; 143 seems to be the "new" grid kill exit status. Got some kills recently (143) but didn't see any 137's for a time now
[17:17:03] valhallasw: They'd be with lighttpd because it doesn't use a mod_php equivalent and runs a php-cgi in FCGI mode instead.
[17:17:51] Right. Might not be too informative with php then, I guess.
[17:17:59] Coren, the version in precise is even older
[17:18:03] it would be even better to link the request to the memory usage
[17:18:07] 143 is SIGTERM; I changed the default some time ago so that a qdel wouldn't be quite as destructive.
[17:18:08] I wonder where you got the current one from
[17:18:11] Oh!
[17:18:30] valhallasw: That means it SIGTERMS ooms too? That's actually a good thing, for the most part.
[17:18:53] I think it does, yes.
[17:19:35] dschwen: It comes from prod.
[17:19:54] dschwen: apt on tools goes local - WMF - public repos
[17:24:27] hedonil, I had to add a timeout (-m 5) and a check for the curl return code
[17:24:41] your script does nothing if the webservice just does not respond at all
[17:25:15] dschwen: +2 ;-)
[17:25:23] oh man, my webthingie is soooo screwed up right now. I get an endless row of dots when I try to stop it :-(
[17:27:01] is there a "webservice kill" command
[17:27:13] oh, wow, now it stopped
[17:27:15] sge has trouble to qdel lighthttp from time to time, took ten minute for me this afternoon
[17:28:48] Webservice already running
[17:28:52] LIES!!!!!!!
[17:29:39] dschwen: "already running" includes "in the process of starting". :-)
[17:29:47] oh, ok
[17:29:50] :-)
[17:30:31] Well, also, "in the process of dying" but that requires a lot of happy fun timing to get.
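dschwen's fix to hedonil's restart script (a timeout, as with curl's `-m 5`, plus a check of the result, so that a service that does not answer at all counts as down) can be sketched in Python. The actual webstart.sh is not shown in the channel, so the policy here is an assumption: any HTTP answer below 500 counts as alive, and timeouts, refused connections and 5xx answers count as dead.

```python
import http.client
import subprocess
import urllib.error
import urllib.request


def is_responsive(url, timeout=5.0):
    """Return True if the server answers within `timeout` seconds.

    Like `curl -m 5`: a hung server counts as down, not as "still waiting".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server did answer; only treat 5xx (e.g. a proxy's 503) as down.
        return err.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout or a garbled response.
        return False


def watchdog(url, restart_cmd, timeout=5.0):
    """Run restart_cmd if the service does not respond; return True if it did."""
    if is_responsive(url, timeout):
        return False
    subprocess.run(restart_cmd, check=False)
    return True
```

From cron, a call such as `watchdog("https://tools.wmflabs.org/TOOLNAME/", ["webservice", "start"])` would restart a tool that has stopped answering. TOOLNAME is a placeholder, and as noted in the channel, `webservice start` may still report "already running" while the old job is dying.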
[17:32:50] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c12 (bgwhite) (In reply to Betacommand from comment #11) > jlocal is on the submit host, I used it daily and just got a email from a > cron using it ~2 minutes ago Could you provide an example of ho...
[17:45:04] dschwen: Is the trusty package recent enough to solve the issue? We could always backport it.
[17:45:27] I think it is
[17:49:23] provided it doesn't create dependency hell, I can do that. Please open a bz?
[17:59:59] !add-labs-user
[18:00:07] :(
[18:01:18] Wikimedia Labs / tools: jsub not installed on the queue machines - https://bugzilla.wikimedia.org/64988#c13 (Betacommand) 0 1 * * * jlocal python /data/project/betacommand-dev/svn_copy/email_logs.py
[18:02:12] Coren: andrewbogott I see a lot of discussion here but none about things being down, so yay?
[18:02:41] YuviPanda: I see nothing broken atm, so that's good.
[18:02:43] YuviPanda: Seems solid to me, but time will tell :)
[18:02:50] Coren: andrewbogott :)
[18:02:57] YuviPanda: I'll reserve celebration until at least 24h pass. :-)
[18:03:16] Coren: yeah, :)
[18:03:19] Coren: I agree
[18:05:56] andrewbogott, ping on the 14.04 image?
[18:21:57] YuviPanda: don't know here if tool or proxy : https://tools.wmflabs.org/paste
[18:28:16] hedonil: seems still down
[18:28:31] tools hangs way too much
[18:32:47] Reasonator is down
[18:33:07] all tools are down... :/
[18:33:13] am on it
[18:33:57] hedonil: GerardM- Steinsplitter back up now.
[18:34:08] Coren: andrewbogott issues popped up again. back to our home rolled deb now
[18:34:24] !log tools back to homerolled nginx 1.5 on proxy, newer versions causing too many issues
[18:34:26] Logged the message, Master
[18:34:43] :(
[18:35:04] GerardM-: Steinsplitter phe better now?
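Betacommand's example crontab entry, `0 1 * * * jlocal python ...`, runs the jlocal wrapper once a day at 01:00. For readers unfamiliar with the five schedule fields (minute, hour, day of month, month, day of week), here is a minimal Python sketch that splits such a line and checks whether a given time matches. It only handles plain numbers, `*` and comma lists (no ranges, steps or names). It also ANDs all five fields, whereas real cron ORs day-of-month and day-of-week when both are restricted, so treat it as an illustration of the format, not a cron replacement.

```python
from datetime import datetime

FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron_line(line):
    """Split a crontab line into its five schedule fields and the command."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    return dict(zip(FIELD_NAMES, parts[:5])), parts[5]


def _field_matches(field, value):
    # Only '*' and comma-separated integers are supported in this sketch.
    return field == "*" or value in {int(v) for v in field.split(",")}


def matches(schedule, when):
    """True if `when` falls on the schedule (all fields ANDed; see caveat)."""
    values = {
        "minute": when.minute,
        "hour": when.hour,
        "day_of_month": when.day,
        "month": when.month,
        # cron counts Sunday as 0; isoweekday() counts it as 7.
        "day_of_week": when.isoweekday() % 7,
    }
    return all(_field_matches(schedule[name], values[name]) for name in FIELD_NAMES)


LINE = "0 1 * * * jlocal python /data/project/betacommand-dev/svn_copy/email_logs.py"
```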
[18:35:34] y, tools back <3
[18:35:48] look like to work
[18:38:03] Wikimedia Labs / Infrastructure: !add-labs-user gone, fix or add docs to link SVN users to labs/wikitech - https://bugzilla.wikimedia.org/64596#c2 (Andrew Bogott) I dug the old docs out of wikitech history and updated them to work on virt1000: https://wikitech.wikimedia.org/wiki/Add-labs-user Probabl...
[18:42:48] Wikimedia Labs / Infrastructure: !add-labs-user gone, fix or add docs to link SVN users to labs/wikitech - https://bugzilla.wikimedia.org/64596#c3 (Andrew Bogott) jayvdb, if you have a wikitech account already, can you please tell me your on-wiki name? Or if you don't have one, just your preferred name?
[18:43:48] YuviPanda: yes .. it is back
[18:44:06] GerardM-: phe Steinsplitter should hopefully be more stable this time, since I just reverted it to an older state
[18:53:38] YuviPanda: I have a IIS 7 license spare :P
[18:54:03] hedonil: :P
[18:54:12] webservice should support USB
[19:03:30] is there a way to overwrite a lighttpd global var, I'm trying to move the log files (accesslog.filename and server.errorlog) to a sub dir but I can do it only in a conditional so lighttp create access.log/error.log even if this logs remains empty (the conditional is always true)
[19:04:35] phe: yes, 08:43 < Coren> rohit-dua: Web server settings can be changed in a file named .lighttpd.conf in your tool's home.
[19:06:10] mutante, yes I know but if I setup unconditionaly accesslog.filename = "/data/project/phetools/log/lighttpd.access.log" lighttpd doesn't start, and if I set it it conditionaly webservice start but create the empty file error.log
[19:07:02] phe: ugh, in that case i'm not sure
[19:12:36] can i change the server.document-root from "$home/public_html" to "$home/public_html/folder" ?
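Behind rohit-dua's document-root question, and Coren's earlier explanation that an index is just the "file in the directory which will be shown by default", is a simple lookup: the request path is mapped onto server.document-root, and for a directory each name in index-file.names is tried in order. Below is a rough Python model of that lookup. It illustrates why an index.py has to sit inside the served directory (~/public_html) rather than the tool's home. It is not lighttpd's code, and it ignores rewrites, aliases and the /toolname/ prefix mapping.

```python
import os


def resolve_index(document_root, request_path, index_names):
    """Return the file a request would be served from, or None (a 404).

    Illustrative model only: join the request path onto the document root,
    serve it if it is a file, and if it is a directory try each configured
    index name in order.
    """
    root = os.path.normpath(document_root)
    # Strip the leading slash so os.path.join does not discard the root.
    target = os.path.normpath(os.path.join(root, request_path.lstrip("/")))
    # Refuse paths that escape the document root (e.g. "/../index.py").
    if os.path.commonpath([target, root]) != root:
        return None
    if os.path.isfile(target):
        return target
    if os.path.isdir(target):
        for name in index_names:
            candidate = os.path.join(target, name)
            if os.path.isfile(candidate):
                return candidate
    return None
```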
[19:13:50] phe: configurations in lighty default cannot be overwritten in custom .lighttpd.conf
[19:14:04] rohit-dua: phe https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Default_configuration
[19:14:35] it should be possible if you manually submit the job
[19:14:37] even though it's possible with soem foo
[19:21:37] the vagrant user homedir of my labs is reporting it's user as jmorgan...
[19:21:44] is that normal ?
[19:23:59] * Coren is annoyed at the nginx problems.
[19:28:31] Coren: I downgraded to 1.5
[19:30:05] Yeah, I saw. Still annoying that you had to.
[19:30:44] Coren: yeah. now we are on a hand-rolled deb
[19:32:30] Hand-rolled doesn't bug me, having to use an older version because of an unspecified intermittent problem does.
[19:32:46] yeah, that too.
[19:32:50] i added index-file.names += ("index.py") to my ~/.lighttpd.conf. but still index.py is not being recognized... as homepage.. shows 404
[19:33:11] rohit-dua: Did you restart your webservice after the change?
[19:33:12] index.py is present in my home directory..
[19:33:15] no
[19:33:34] rohit-dua: There's your problem.
[19:33:57] Also, index.py is expected in ~/public_html not your actual home, but I'm guessing you knew that.
[19:34:54] yes. Coren. i'm restarting my webservice :)
[19:35:52] well webservice restart seems to take a lot of time...
[19:36:07] * Damianz eats Coren
[19:45:47] Coren: can you do me a favour and chown two items to my tools name? tools.tools-info (.bash_history && .ssh) -> old uid still from migration
[19:46:13] hedonil: You should be able to 'take .bash_history .ssh'
[19:46:29] Coren: I can't
[19:46:38] Ah, or not if the gid is also broken. Sure thing, gimme a sec.
[19:47:23] chowned.
[19:48:45] Coren: Thanks.
[19:51:42] Wikimedia Labs / deployment-prep (beta): cannot sudo on deployment-bastion - https://bugzilla.wikimedia.org/65548 (Aude) NEW p:Unprio s:normal a:None I used to be able to sudo as mwdeploy but now can't. I am in the 'svn' group.
sudo -u mwdeploy -- touch extensions/Wikidata/extensions/Wiki...
[19:55:31] Wikimedia Labs / deployment-prep (beta): cannot sudo on deployment-bastion - https://bugzilla.wikimedia.org/65548 (Aude) a:Bryan Davis
[20:06:00] How do I add myself to https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps?
[20:36:20] Dispenser, let me check if I can add you
[20:36:36] aude, are you cool with that?
[20:37:07] Crap, I can't
[20:37:32] I activatedtwo factor authentication and of all days I leave my phone home today :-)
[20:37:56] Lol, token authentication fail.
[20:39:57] :-)
[20:41:55] I think that's how you put in requests https://wikitech.wikimedia.org/wiki/New_Project_Request/maps
[20:49:57] Dispenser: what do you need ?
[20:50:31] To be put into the maps project
[20:51:20] Dispenser: i think that should be fine.
[20:52:32] dschwen: ok
[20:52:47] Dispenser: done
[20:53:41] yay! Now just have to finish working through these 600+ tabs :-(
[21:01:36] Wikimedia Labs / deployment-prep (beta): Parsoid dead on BetaLabs (getting "parsoidserver-http-bad-sta tus: 503" on VE load) - https://bugzilla.wikimedia.org/65553 (James Forrester) a:None
[21:04:19] Wikimedia Labs / deployment-prep (beta): Parsoid dead on BetaLabs (getting "parsoidserver-http-bad-status: 503" on VE load) - https://bugzilla.wikimedia.org/65553#c2 (James Forrester) *** Bug 65555 has been marked as a duplicate of this bug. ***
[21:08:39] !log deployment chown'ed /data/project/parsoid/parsoid.log from mwalker (?!?) to parsoid so Parsoid runs again
[21:08:40] deployment is not a valid project.
[21:08:46] !log deployment-prep chown'ed /data/project/parsoid/parsoid.log from mwalker (?!?) to parsoid so Parsoid runs again
[21:08:48] Logged the message, Mr. Obvious
[21:09:34] Wikimedia Labs / deployment-prep (beta): Parsoid dead on BetaLabs (getting "parsoidserver-http-bad-status: 503" on VE load) - https://bugzilla.wikimedia.org/65553 (Roan Kattouw) NEW>RES/FIX
[21:09:34] Wikimedia Labs / deployment-prep (beta): Parsoid dead on BetaLabs (getting "parsoidserver-http-bad-status: 503" on VE load) - https://bugzilla.wikimedia.org/65553#c3 (Roan Kattouw) Parsoid broke in beta labs because someone swapped the UIDs of the parsoid user and mwalker. This caused various things th...
[21:09:51] hahaha mwalker!
[21:10:03] Yeah
[21:10:14] He said someone changed his UID
[21:10:20] Apparently it was swapped with the parsoid UID
[21:10:23] RoanKattouw: might be related to andrewbogott's work of fixing uids
[21:10:39] Yeah that's what Matt tells me too
[21:10:46] hehe
[21:14:11] !log deployment-prep Converted deployment-stream to use local puppet & salt masters
[21:14:13] Logged the message, Master
[21:19:33] Wikimedia Labs / deployment-prep (beta): cannot sudo on deployment-bastion - https://bugzilla.wikimedia.org/65548#c2 (Daniel Zahn) Does this mean the users should be converted like in: https://bugzilla.wikimedia.org/show_bug.cgi?id=64596 (instead of working around it)?
[21:35:33] RoanKattouw_away: my vagrant user had a different username as well
[21:35:49] as owner in his /home
[21:36:09] thedj: were you able to log into your instance in labs btw?
[21:36:21] yeah at some point :)
[21:36:27] cool!
[21:37:00] ori: http://wikimaps-ext.wmflabs.org/wiki/Main_Page
[21:40:05] thedj: what is that, some kind of social networking site? "wiki"?
[21:41:03] j/k, cool
[21:41:05] glad it's set up
[22:00:37] anyone have any idea why a continuous job would just 'disappear' -- I start it with: "jstart -N jouncebot start_jouncebot.sh" it runs for 10-20 minutes and then just stops, silently. the job id no longer exists... there's no output in the .err or .out files explaining why
[22:01:45] mwalker: Coren may have more useful ideas, but I'm pretty sure that if a job runs amuck and hits its memory limit, the grid kills it without comment.
[22:03:13] andrewbogott, ok; that's a place to start at least; I can see how this bot could have some memory leaks
[22:03:45] mwalker: $ qacct -j jouncebot -> maxvmem 295.207M | exit_status 137 (=killed)
[22:05:04] thanks hedonil; I think that settles that this has some memory leaks
[22:06:01] mwalker: just try to give your job more mem: -mem 600m or -mem 1G
[22:06:42] keep in mind, thaht grid adds all the binaries to your account as well, not just the script
[22:07:19] mwalker: andrewbogott is correct that jobs dying unexpectedly are often caused by oom issues.
[22:07:45] I think this might actually be a byproduct of how python does threading -- two threads is > than 256M
[22:07:48] in virtual memory
[22:08:01] Eeew.
[22:09:24] is not
[22:10:47] try https://docs.python.org/2/library/gc.html , you might have a leak
[22:53:34] Wikimedia Labs / deployment-prep (beta): beta labs mysteriously goes read-only overnight - https://bugzilla.wikimedia.org/65486#c1 (Chris McMahon) I think this is the first time I've seen the wiki read-only during the day (PDT) https://wmf.ci.cloudbees.com/job/VisualEditor-en.wikipedia.beta.wmflabs.org...
[23:03:33] Wikimedia Labs / deployment-prep (beta): beta labs mysteriously goes read-only overnight - https://bugzilla.wikimedia.org/65486#c2 (Arthur Richards) If I recall correctly, this is something that can happen when things go sideways with the database. Not sure if that's what's going on here, but may be wo...
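Following the suggestion to look at Python's gc module, one crude way to confirm a leak in a long-running bot like jouncebot is to count live objects by type before and after a stretch of work and see which types keep growing. A minimal sketch, assuming the leak involves objects the garbage collector tracks (containers and class instances; plain strings and ints are not tracked, and `tracemalloc` is the better tool for those). `Leak`, `leaky` and `handle_event` are made-up stand-ins, not jouncebot code.

```python
import gc
from collections import Counter


def live_object_counts():
    """Count objects currently tracked by the garbage collector, by type name."""
    gc.collect()  # drop unreachable cycles first so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, min_delta=1):
    """Types whose live count grew by at least min_delta between two snapshots."""
    return {name: after[name] - before[name]
            for name in after
            if after[name] - before[name] >= min_delta}


# Hypothetical leak: every handled event is kept in a module-level list forever.
class Leak:
    pass


leaky = []


def handle_event():
    leaky.append(Leak())


if __name__ == "__main__":
    before = live_object_counts()
    for _ in range(10_000):
        handle_event()
    after = live_object_counts()
    print(sorted(growth(before, after, 1000).items(), key=lambda kv: -kv[1]))
```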
[23:06:32] !log deployment-prep Fixing puppet config for upstream rename of role::applicationserver -> role::mediawiki
[23:06:34] Logged the message, Master
[23:10:18] bd808: thanks for that
[23:11:18] !log deployment-prep deployment-apache01 needs more work: "Could not set shell on user[mwdeploy]"
[23:11:20] Logged the message, Master
[23:12:30] bd808: btw do you know what this is about:
[23:12:30] err: /Stage[main]/Role::Labs::Instance/Mount[/home]: Could not evaluate: Execution of '/bin/mount -o rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc /home' returned 32: mount.nfs: mounting labstore.svc.eqiad.wmnet:/project/deployment-prep/home failed, reason given by server: No such file or directory
[23:12:33] on deployment-prep?
[23:13:00] this is on a new instance i switched to local puppet/salt, but this error occurred both before and after
[23:13:13] hmm... looks like the labs nfs server freaking out
[23:13:40] That was rampant when we first moved to eqiad but I thought it was fixed now
[23:13:51] The old fix was "reboot unitl it works"
[23:14:16] There was/is some race condition with nfs server acl configuration
[23:39:44] q: what's the labs LDAP infra built on? (openldap or opendj)?
[23:40:09] bd808: do you have a sec to talk about mwdeploy?
[23:41:00] I guess. I'm sorta trying to unbreak puppet in labs right now
[23:41:05] cajoel: I think opendj, but Coren or andrewbogott would kn ow better
[23:41:22] opendj, sadly.
[23:41:42] on a scale of 1 to Mediawiki, how sad? :)
[23:42:21] Hello
[23:43:17] I'm tyning to connect with c++
[23:43:35] a database
[23:43:54] db.setHostName("tools-login.wmflabs.org");
[23:43:54] db.setDatabaseName("eswiki");
[23:44:15] but... database not open
[23:44:36] Someone could help me?
[23:44:40] Thanks
[23:45:58] ori: fixing things so that the user hacks aren't needed sounds fine. I think that mwdeploy may not be touching NFS in deployment-prep with the changes to use scap there.
[23:46:16] I'd like to get it all working again before I start changing other things however.
[23:48:10] ori: l1onupdate has it's home dir on an nfs mount however so that would need to change
[23:48:19] *l10update
[23:48:42] anyone=
[23:48:43] ?
[23:49:06] Harpagornis: the hostname is certainly not tools-login; you probably want something like "eswiki.labsdb" instead.
[23:49:43] !log deployment-prep Fixed puppet for deployment-apache[12] using https://gerrit.wikimedia.org/r/#/c/134519/2
[23:49:45] Logged the message, Master
[23:50:39] Coren..
[23:50:42] db.setHostName("eswiki.labsdb.wmflabs.org");
[23:50:42] db.setDatabaseName("eswiki");?
[23:50:58] No '.wmflabs.org'
[23:51:04] umm
[23:51:05] ok
[23:51:06] sorry
[23:56:10] the fault is ralgis
[23:56:27] .__.
[23:58:49] Coren, not open
[23:59:40] Hm, I just noticed the database name should be 'eswiki_p' not just 'eswiki'. That may also be your issue. But I'm not familiar with the class you are using for database access so my help is limited.
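Pulling together the corrections at the end of this exchange (host `eswiki.labsdb` rather than tools-login or a `.wmflabs.org` name, and database `eswiki_p` rather than `eswiki`), here is a minimal Python sketch that builds the connection parameters for a wiki replica. It assumes the credentials are in the tool's `replica.my.cnf`, a MySQL-style option file with a `[client]` section holding `user` and `password`. The helper name is made up. The four values would then go to whichever MySQL client is in use; in the Qt code above they correspond to `setHostName`, `setDatabaseName`, `setUserName` and `setPassword`.

```python
import configparser
import os


def replica_connection_params(wiki, cnf_path="~/replica.my.cnf"):
    """Build host/database/user/password for a wiki replica such as 'eswiki'.

    Assumes the host is '<wiki>.labsdb' and the database '<wiki>_p', as
    suggested in the channel, and that credentials live in a MySQL-style
    option file with a [client] section.
    """
    # interpolation=None so passwords containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(os.path.expanduser(cnf_path)) as fh:
        parser.read_file(fh)
    client = parser["client"]

    def unquote(value):
        # Option files may quote values, e.g. password='secret'.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            return value[1:-1]
        return value

    return {
        "host": f"{wiki}.labsdb",
        "database": f"{wiki}_p",
        "user": unquote(client["user"]),
        "password": unquote(client["password"]),
    }
```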