[02:36:23] [bz] (RESOLVED - created by: Maarten Dammers, priority: Immediate - critical) [Bug 54847] Data leakage user table "new" databases like wikidatawiki_p and the wikivoyage databases - https://bugzilla.wikimedia.org/show_bug.cgi?id=54847
[04:02:01] [bz] (NEW - created by: MZMcBride, priority: Unprioritized - normal) [Bug 55455] Provide dynamic report of differences between replica databases and production databases - https://bugzilla.wikimedia.org/show_bug.cgi?id=55455
[08:13:40] !ping
[08:13:41] !pong
[09:46:15] Is there any reason bots-bsql01 points to a tool server (apparently, from the motd) and is refusing connections from the bots project?
[09:46:36] Though maybe not tools - home dir looks like bots *confused*
[09:48:54] Also looks like someone changed the my.cnf files to point to a different server (which doesn't fix code - and I don't want to fix the code now, in case that db is wrong/old) *more confused*.
[10:21:39] bots-bsql01 server is going to be deleted 24. 9. 2013
[10:53:11] [bz] (PATCH_TO_REVIEW - created by: Krinkle, priority: Normal - enhancement) [Bug 49350] Tool Labs: Change logo - https://bugzilla.wikimedia.org/show_bug.cgi?id=49350
[11:06:58] tools.wmflabs.org/{user}/ not working?
[11:08:26] some but not all ..it seems
[11:08:36] https://tools.wmflabs.org/abot/ works .. other random ones don't
[11:08:57] "Proxy Error" "Reason: Error reading from remote server"
[11:09:07] https://tools.wmflabs.org/boteas/ also works
[11:09:15] Some FS got muddled?
[11:09:53] wants to look at labs icinga to see if a proxy is down
[11:10:01] but notices labs icinga itself is broken
[11:12:08] oops , got an internal server error now
[11:12:18] https://tools.wmflabs.org/intersect-contribs/
[11:12:35] Coren: ^
[11:18:14] a930913: what's a specific one you were after? seems to me more of them work again
[11:22:19] mutante: BracketBot/Cluestuff. Mainly the former.
[11:22:36] Just wanted to check something whereupon I found it down.
[11:25:33] hashar: where are the configurations for jenkins test workspaces and stuff? :>
[11:26:02] ie where does /srv/ssd/jenkins-slave/workspace/mwext-Wikibase-repo-tests/junit-phpunit-allexts.xml come from
[11:28:13] a930913: ah..hmm. but those don't have links on the index page
[11:29:03] maybe they are still in transition from toolserver.org ?
[11:29:25] mutante: I know, I was just generalising above.
[11:34:14] addshore: hey
[11:34:32] addshore: that xml file is the job configuration in Jenkins. It is generated by Jenkins Job Builder script
[11:34:32] this I just found them all ;p
[11:34:50] in jenkins-job-builder-config repo? :P
[11:38:46] addshore: yeah that is it :-]
[11:38:56] I got a clone of integration/jenkins-job-builder
[11:39:05] got one ;p
[11:39:07] and under that cloned the jenkins-job-builder-config in the config directory
[11:39:19] ahh
[11:39:20] !newlabs
[11:39:20] This is labs. It's another version of toolserver. Because people wanted it just like toolserver, an effort was made to create an almost identical environment. Now users can enjoy replication, similar commands, and bear the burden of instabilities, just like Toolserver.
[11:39:43] addshore: then I do something like: mkdir output; jenkins-jobs tests configs/ -o output/
[11:39:55] but hashar the definitions of the jobs look the same in the yaml file
[11:39:59] addshore: that should generate all the XML files under output/ (albeit without a .xml suffix)
[11:40:01] Coren, sooooo. Labs is busted again
[11:40:09] Cyberpower678: what happened this time?
[11:40:25] Looks like the proxy node is down.
[11:40:41] mhhm which node Cyberpower678 ?
[11:40:48] Proxy Error
[11:40:48] The proxy server received an invalid response from an upstream server.
[11:40:48] The proxy server could not handle the request GET /cyberbot/spambotstatus.php.
[11:40:48] Reason: Error reading from remote server
[11:40:53] addshore: for Wikibase you want to look at mediawiki-extensions.yaml and the mwext-Wikibase-{kind}-tests' template
[11:41:19] That. So it's either the webserver or the proxy server.
[11:41:19] addshore: Also, you should take a backup of the generated jobs using: jenkins-jobs tests configs/ -o output-reference/
[11:41:53] addshore: then you can hack the yaml files and generate the XML files under a different directory (i.e.: output) that lets you do a diff of the generated conf: diff -u -r output-reference/ output/
[11:43:08] mhhhm, I'm still slightly confused
[11:54:52] ahh I found it! :d
[11:56:32] addshore, so about that proxy error?
[11:58:29] file a bug, mark it as critical etc and ping people like petan Coren andrewbogott_afk YuviNoPower
[11:59:26] or Ryan_Lane ;p
[12:23:05] Steinsplitter: It was working yesterday hmmm, also where does it say that? Didn't see anything on mailing list
[12:24:13] ?
[12:32:31] Damianz: ?
[12:36:40] Bah! You guys made Tool Labs busier than anticipated and we keep running oom on the webservers. :_)
[12:38:22] :P
[12:38:25] make more? :D
[12:38:27] * Coren needs to add a webserver or two.
[12:38:42] Yeah, will do -- that's #3 on my list now.
[13:01:37] Can you at least restart them so they work?
[13:02:26] Coren, ^
[13:06:15] Cyberpower678: I have. Every tool I try now works. Which one is giving you trouble?
[13:06:41] http://tools.wmflabs.org/cyberbot/spambotstatus.php
[13:06:48] A simple tool
[13:06:58] returns
[13:07:04] * Coren looks into it
[13:07:06] Proxy Error
[13:07:07] The proxy server received an invalid response from an upstream server.
[13:07:07] The proxy server could not handle the request GET /cyberbot/spambotstatus.php.
[13:07:08] Reason: Error reading from remote server
[13:08:23] Aaaah. I think I see the proximate cause. Something is hammering *hard* on catfood.
[13:08:57] Wait what? Catfood.
[13:09:06] Why does labs need cat food?
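Editor's note: the before/after comparison described at 11:41 (generate a reference copy of the job files, edit the YAML, regenerate into a second directory, then run diff -u -r output-reference/ output/) can be approximated in Python as below. This is only a sketch under stated assumptions: the function name diff_trees and the sample file contents are made up for illustration, and it does not invoke Jenkins Job Builder at all. It only compares two directories that have already been generated.

```python
import difflib
import tempfile
from pathlib import Path
from typing import List


def diff_trees(reference: Path, candidate: Path) -> List[str]:
    """Return unified-diff lines for every file that differs between two trees."""
    ref_files = {p.relative_to(reference) for p in reference.rglob("*") if p.is_file()}
    new_files = {p.relative_to(candidate) for p in candidate.rglob("*") if p.is_file()}
    lines: List[str] = []
    for rel in sorted(ref_files | new_files):
        # A file missing on one side is treated as empty, so it shows up
        # as fully added or fully removed.
        old = (reference / rel).read_text().splitlines(keepends=True) if rel in ref_files else []
        new = (candidate / rel).read_text().splitlines(keepends=True) if rel in new_files else []
        lines.extend(
            difflib.unified_diff(old, new, fromfile=str(reference / rel), tofile=str(candidate / rel))
        )
    return lines


if __name__ == "__main__":
    # Hypothetical stand-ins for output-reference/ and output/.
    with tempfile.TemporaryDirectory() as tmp:
        ref = Path(tmp) / "output-reference"
        out = Path(tmp) / "output"
        ref.mkdir()
        out.mkdir()
        (ref / "some-job").write_text("<project>\n  <disabled>false</disabled>\n</project>\n")
        (out / "some-job").write_text("<project>\n  <disabled>true</disabled>\n</project>\n")
        print("".join(diff_trees(ref, out)), end="")
```

An empty result means the YAML edit did not change any generated job, which is the quick sanity check the reference directory is kept for.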
[13:12:00] Cyberpower678: You might want to lower that refresh value a bit though.
[13:12:55] My initial refresh was immediate at first. But 1 second seems appropriate for me.
[13:13:09] It's hardly using any resources.
[13:15:19] What in *blazes* is that thing doing?
[13:15:30] (Talking about catfood)
[13:15:54] Coren, I didn't know labs consisted of cats.
[13:16:13] My parrot would have lots of fun with them. :p
[13:18:07] * Coren had to disable that tool.
[13:18:27] It looks like it's some sort of RSS feed; but whatever uses it keeps connections open.
[13:18:35] Is that what was causing the webserver to fail?
[13:19:04] Almost certainly; when I restarted it (again) to unwedge yours, the workers were all stuck on that one tool.
[13:19:51] * Cyberpower678 's RAM is leaking
[14:21:27] Coren: is this apache? what did you use to query what the workers are doing at an instance?
[14:21:40] keeps connections open == long poll?
[14:21:59] jeremyb: It is; I used strace; and they're indeed stuck in a poll.
[14:25:08] ah, strace
[16:29:24] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - enhancement) [Bug 52382] automatically import some content from production (tracking) - https://bugzilla.wikimedia.org/show_bug.cgi?id=52382
[16:29:25] [bz] (NEW - created by: Željko Filipin, priority: Normal - enhancement) [Bug 47205] sync Sandbox gadget from production to en.wikipedia.beta.wmflabs.org - https://bugzilla.wikimedia.org/show_bug.cgi?id=47205
[16:29:26] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - enhancement) [Bug 49779] sync articles from production wikis (css/gadgets) - https://bugzilla.wikimedia.org/show_bug.cgi?id=49779
[16:44:54] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Highest - major) [Bug 48501] [OPS] beta: get SSL certificates - https://bugzilla.wikimedia.org/show_bug.cgi?id=48501
[18:12:56] Hello guys, are there any known issues to tools webserver? PHP-Sites are disgorging a straight 500.
[18:13:56] hedonil: I think there's an ongoing conversation about that on the labs-l list. I haven't read it yet though :)
[18:15:01] ok thanks, so it's known and worked on
[18:15:39] Well, actually, now I've read it and it does not seem helpful. Coren, your thoughts?
[18:16:21] hedonil: meanwhile you might contribute to that thread with whatever it is you're doing and seeing.
[18:16:29] Oh bah!
[18:16:40] I need to find a clean way of limiting resource use per user.
[18:16:47] (Yeah, aware and working on)
[18:17:34] hedonil: I restarted the affected webserver and will be disabling the culprit.
[18:17:57] andrewbogott, YuviNoPower: I think the api should be installed via a package
[18:18:07] * andrewbogott agrees.
[18:18:22] But yuvi is out for a few days with wrist pain.
[18:18:26] * Ryan_Lane nods
[18:18:42] Coren, can you explain how the yuviproxy helps with the betalabs cert issue?
[18:18:51] I skimmed that thread, maybe I misunderstood the question.
[18:19:05] Coren: isn't it possible to limit resource per user via virtual hosts?
[18:19:12] andrewbogott: It's a protected instance that can have "real" SSL certs that proxies on behalf of any project that needs a web thing.
[18:19:15] and yuvi's proxy doesn't help beta
[18:19:26] Ryan_Lane: Why not?
[18:19:34] Coren, atm the proxy runs on a labs box.
[18:19:46] even if we wanted to use the proxy itself, which we don't, it doesn't have the right kind of * cert
[18:19:59] beta is meant to look like production, which means the same https setup
[18:20:07] Ryan_Lane: Ah, good point then.
[18:20:16] it needs to have its own cert and it needs to have the correct * cert
[18:20:33] it probably won't be cheap
[18:20:40] webserver back on track again. thx
[18:21:24] Ryan_Lane: Yes, you can limit per vhost but you either end up having to do total resources/total users on every box (otherwise you can still run out) or limit to some higher fraction and *hope* not too many of 'em reach the limit.
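Editor's note: the trade-off in the 18:21:24 message (cap every vhost at total resources divided by total users, or pick a higher cap and hope few tools hit it) comes down to simple arithmetic. The sketch below works it through with made-up numbers; the 8192 MB box, the 200 tools, and the 512 MB cap are all hypothetical and do not describe the actual Tool Labs webservers, and the function names are invented for illustration.

```python
import math


def strict_cap_mb(total_mb: float, tools: int) -> float:
    """Per-tool cap that can never oversubscribe the box: total / number of tools."""
    if tools <= 0:
        raise ValueError("tools must be positive")
    return total_mb / tools


def tools_needed_to_exhaust(total_mb: float, cap_mb: float) -> int:
    """Smallest number of tools that, all sitting at their cap, use the whole box."""
    if cap_mb <= 0:
        raise ValueError("cap_mb must be positive")
    return math.ceil(total_mb / cap_mb)


if __name__ == "__main__":
    total_mb = 8192      # hypothetical memory on one webserver
    tools = 200          # hypothetical number of tools served by it
    generous_cap = 512   # hypothetical "higher fraction" cap per tool

    print("strict cap per tool (MB):", strict_cap_mb(total_mb, tools))
    print("tools at the generous cap that fill the box:",
          tools_needed_to_exhaust(total_mb, generous_cap))
```

With these figures the strict cap is about 41 MB per tool, which is too small to be useful, while the 512 MB cap lets just 16 busy tools out of 200 fill the box. That gap is why moving the heavy tools onto their own servers is attractive.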
[18:21:48] Ryan_Lane: I just need a way to spin off the expensive tools in their own ballpark so that when they consume everything they have they won't impact others.
[18:21:56] * Ryan_Lane nods
[18:22:47] Instances are (relatively) inexpensive; I might start doing per-tool httpds for the really heavy stuff.
[18:23:10] (catscan/catfood are major culprits)
[18:23:12] sounds sane
[18:58:10] in Tools, when I run my tool is it guaranteed to have a single IP for all time?
[18:58:51] or at least for a period of about a month? Allow for the tool to be called several times in different jobs on the grid
[19:07:52] notconfusing: I would say that it is not guaranteed, since the grid engine will start the tool on a system of its choosing...
[19:08:22] And in any case the host won't have a public IP.
[19:18:17] notconfusing: why do you need a stable IP?
[19:19:40] Ryan_Lane, I don't necessarily, it's for an external API for which I'm getting a key, i just want to know how to configure it
[19:19:51] I can get a token from them instead though
[19:21:08] notconfusing: oh, well, the IP in question is a private IP anyway
[19:21:21] it's in the 10. range, which is not publicly routable
[19:21:34] to talk to the outside world, NAT is used
[19:21:47] Ryan_Lane, great I'll get a token, thanks
[19:22:06] I think each application server has its own NAT IP, though, so it's the same problem, but worse because you don't know the IP :)
[19:22:15] yeah, likely a good approach :)
[19:36:12] andrewbogott: Actually, the exec hosts do have public IPs so that identd works. :-)
[19:36:27] Huh, ok.
[19:36:46] (Was necessary for the large number of IRC bots of all denomination)
[19:36:58] (Which reminds me that we probably want to tell them to use dickson)
[19:53:30] Hey Coren! Did you have any chance to look at fast-cgi support on labs (https://bugzilla.wikimedia.org/show_bug.cgi?id=52944)
[20:25:24] any idea why response times from bots.wmflabs.org are so slow?
I use the IRC logs there a fair bit, but it often takes 10-60 seconds or even longer to load the page for the first time
[20:41:46] Question: is there any way to get permission to do EXPLAIN queries on database replicas?
[20:44:14] parent5446: no, there was a bug about this a while back, let me find it
[20:44:25] parent5446: https://bugzilla.wikimedia.org/show_bug.cgi?id=48875
[20:44:50] hashar: ping?
[20:44:52] legoktm: Thanks
[20:45:01] bd808: more or less
[20:45:06] bd808: really close to heading to bed :D
[20:45:47] hashar: quick (I hope) question. I want to run purgeChangedPages.php in beta to see what happens.
[20:45:55] What wiki name should I use?
[20:46:12] * bd808 hasn't played in beta before
[20:47:00] bd808: the list of beta wikis is in the mediawiki-config repo ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config.git
[20:47:17] are you familiar with it already ?
[20:47:19] hashar: Awesome. Thanks
[20:47:21] $ ls *labs*
[20:47:21] all-labs.dblist wikiversions-labs.dat
[20:47:29] all-labs.dblist get them :-]
[20:47:32] you can try out on enwiki
[20:47:47] the purges are sent over unicast to all the cache instances
[20:48:54] bd808: and wmf-config/squid-labs.php has the $wgHTCPRouting
[20:49:07] Well, there goes the only reason for me to use tool labs. This is why I hate MySQL.
[20:49:12] that basically routes purge requests to various caches depending on the URL
[20:49:32] bd808: aka on beta we send purges to the text and mobile caches (see squid-labs.php)
[20:50:07] hashar: very helpful as usual :) Get some sleep now
[20:50:38] bd808: and if you go to the varnish text : ssh deployment-cache-text1.pmtpa.wmflabs
[20:50:42] you can watch purge requests
[20:50:49] varnishncsa -n frontend
[20:50:53] and: varnishncsa
[20:50:56] (for the backend)
[20:51:33] parent5446, I can run the explains for you in prod
[20:51:49] bd808: varnishncsa basically gives you something similar to apache access logs :-]
[20:52:06] MaxSem: Thanks.
I was really just looking for a way to test queries without having to prod somebody. ;)
[20:52:15] * bd808 is watching them now
[20:53:15] in any case, you can't get a 100% precise explain in labs
[20:53:40] Hmm I guess that's true. I figured at least the index choice would be the same as production.
[20:54:35] nah, depends on cardinality
[21:13:13] Reedy: Running mwscript on deployment-bastion is giving 'Undefined variable: wmgConfigDir' notice. Is that related to gerrit:84898?
[21:13:51] Probably
[21:13:58] I guess that means it's not defined in the labs file
[21:13:58] grr
[21:14:27] Uh
[21:14:46] It shouldn't be running that code on production
[21:15:22] Reedy: I'm in beta
[21:15:44] f not g
[21:16:46] That could be a t-shirt slogan
[22:04:43] !add-labs-user
[22:08:31] [bz] (NEW - created by: Dereckson, priority: Unprioritized - normal) [Bug 53793] Users with a former SVN account not migrated can't create an account - https://bugzilla.wikimedia.org/show_bug.cgi?id=53793
[23:24:05] is this the outage rollercoaster backchannel? :)
[23:28:33] to add to the fun, coren's switching addresses :P
[23:29:33] I just love how every bug is, somehow, the infrastructure's fault. :-)
[23:33:53] * Coren notes the nice complete lack of outage since catfood has been disabled. :-)
[23:42:13] Coren: well, if one tool can take down everything, then maybe it is the infra's fault? :)
[23:42:26] Ptbtbtb!
[23:42:46] Yes, but the tool going on a rampage in the first place isn't. :-)
[23:43:55] web tools could run inside of containers, dynamically routed to via yuviproxy
[23:43:59] then one tool misbehaving would only crash itself
[23:45:27] !ping
[23:45:27] !pong
[23:45:31] -_-
[23:46:49] Coren: well, if one tool can take down everything, then maybe it is the infra's fault? :)
[23:46:55] Ryan_Lane: That works for self-contained tools that service web/fcgi/wsgi requests -- not so much for static things and simple CGIs.
You don't want to need a container with an apache install for each of hundreds of tools.
[23:46:59] web tools could run inside of containers, dynamically routed to via yuviproxy
[23:47:03] then one tool misbehaving would only crash itself
[23:47:12] Ryan_Lane: You're repeating yourself. :-P
[23:47:31] chat when on plane wifi isn't great. it says it's connected, then just stops sending or receiving messages
[23:48:44] oh? so my messages sent, but I wasn't getting responses
[23:48:53] Coren: why not? setting up containers is cheap and easy
[23:49:30] I suppose it could be made to work; it's certainly worth looking into.
[23:49:44] *after* NFS, and some bugzilla hacking. :-)
[23:49:45] hm. I wonder if it's necessary to actually do a full containerization
[23:49:47] yeah
[23:49:53] it's not a huge priority
[23:50:27] I wonder how well apache integrates with cgroups :)
[23:51:30] https://github.com/MatthewIfe/mod_cgroup
[23:51:37] not sure if that's worth a shit, though
[23:52:09] !ping
[23:52:09] !pong
[23:54:36] heh, well, that module is interesting, at least. looks like the memory controller limitation could be dangerous :D
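Editor's note: the idea floated above (give heavy tools such as catscan/catfood their own httpd or container and route requests to them, so a misbehaving tool only takes down itself) amounts to a per-tool routing decision in front of the webservers. The sketch below shows that decision only. It is not how yuviproxy or the Tool Labs proxy is implemented; the backend names, the dedicated mapping, and the function pick_backend are all hypothetical.

```python
from typing import Dict

# Hypothetical name for the pool that serves every tool without its own backend.
SHARED_BACKEND = "shared-webserver"


def pick_backend(path: str, dedicated: Dict[str, str], shared: str = SHARED_BACKEND) -> str:
    """Choose a backend for a request path of the form /<tool>/<rest>.

    Tools listed in `dedicated` get their own backend; everything else,
    including static pages and simple CGIs, stays on the shared one.
    """
    tool = path.lstrip("/").split("/", 1)[0]
    return dedicated.get(tool, shared)


if __name__ == "__main__":
    # Hypothetical mapping: only the known heavy tools are split out.
    dedicated = {"catfood": "catfood-httpd", "catscan": "catscan-httpd"}
    for request_path in ("/catfood/feed", "/cyberbot/spambotstatus.php", "/"):
        print(request_path, "->", pick_backend(request_path, dedicated))
```

Keeping a shared default reflects the objection raised in the discussion: most of the hundreds of tools are small, so only the few expensive ones need an isolated backend.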