[00:40:42] Beta's question from earlier was silly. [00:48:54] ok, so I have a python script with a .cgi extension, it has the correct "shebang" at the top, it is in /data/toolname/public_html, it has execute permissions, it is owned by my tool account, and it runs without errors from the shell............. but trying to run it over the web server results in an Internal Error [00:49:19] how can I determine why the error is occurring? [00:51:29] Scottywong: If you use the new web service setup ... [00:51:34] !newweb [00:51:34] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb [00:51:38] You get (a) much improved performance and (b) error logs delivered to your $HOME [00:51:46] Which will help a great deal. :-) [00:52:08] do I have to do something with that example FCGI configuration code? [00:52:27] or can I just do a "webservice start" in the shell without doing any other steps? [00:52:34] No, just do 'webservice start'; if all you have are cgi it should just work. [00:52:38] ok [00:52:42] i'll give it a try [00:52:43] oh [00:52:44] Hm. [00:52:54] do I need to add something to enable the logs? that page seems to indicate that I need to [00:53:01] debug.log-request-handling = "enable" [00:53:11] Scottywong: That's just if you want extra debug logging of every request in detail. [00:53:15] ok [00:53:20] thanks, i'll give it a try [00:53:56] lol [00:54:11] ok, so I did webservice start, then I tried to load the page again, and my browser downloaded the cgi file instead of running it [00:54:13] :/ [00:54:58] https://tools.wmflabs.org/csbot/test.py works fine for me. [00:55:17] That is, literally just 'print "Foo!\n"' [00:55:50] let me change the extension to .py [00:55:51] What is your problem URL? [00:56:05] https://tools.wmflabs.org/afdstats/afdstats.cgi?name=Foo [00:56:29] Oh, I don't think random '.cgi' extensions get parsed by default with lighttpd. :-) [00:56:53] Should work with .py though.
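Coren's remark above — lighttpd doesn't hand `.cgi` files to an interpreter by default, so the browser just downloads them — could in principle be addressed with a `cgi.assign` mapping rather than renaming the script. A hedged sketch only: the file name `.lighttpd.conf` and the interpreter path are assumptions, not confirmed anywhere in this log.

```
# Hypothetical lighttpd fragment (e.g. a tool's own .lighttpd.conf):
# hand *.cgi files to the Python interpreter instead of serving them
# as static downloads. Renaming to .py, as done above, avoids this.
server.modules += ( "mod_cgi" )
cgi.assign    += ( ".cgi" => "/usr/bin/python" )
```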
[00:58:20] seems to be working a bit better with .py [00:58:39] "A bit" doesn't seem promising. [00:58:43] although it's just printing out all of the html as plain text [00:58:58] * Coren tries. [00:59:09] https://tools.wmflabs.org/afdstats/afdstats.py?name=Foo [01:00:15] Ah! Huh. Well, your script doesn't seem to output a format so it's defaulting to the (unhelpful) text/x-python. [01:00:20] or, this one would be faster [01:00:20] https://tools.wmflabs.org/afdstats/afdstats.py?name=Foostuff [01:00:22] oh [01:00:33] my script doesn't output a format [01:00:36] not sure what you mean by that [01:00:46] I know most people who do python use a toolkit of some kind, like Flask, that sets the headers. [01:00:53] top line is [01:01:07] The HTTP response includes a Content-Type: header; your script doesn't set it. [01:01:11] oh [01:01:18] ok, i'll look into that, thanks [01:02:04] I never had to do any of that on toolserver, I guess it was taken care of automatically somehow? [01:02:43] One thing I see that is an easy fix in your code that is a difference between the toolserver and here is that you select against the revision table; you probably want to use revision_userindex instead -- that'll be MUCH faster. [01:03:39] Scottywong: Try this: [01:03:49] Add these to your py script, before any other output: [01:03:58] print "Content-Type: text/html" [01:03:58] print [01:04:41] yeah, I haven't gotten around to modifying the code yet, just trying to get the damn thing to execute for right now.. :) I'll try adding those lines to the top, thanks [01:06:24] yup, that worked [01:06:31] jeez that was a lot easier than I thought it would be [01:07:22] thanks a bunch, gonna log off for a bit and log back on at home [03:00:44] i'm just going to keep peppering you guys with noob questions until you stop answering them :) here goes: [03:01:01] can i put things like common css files in my home directory, and have individual tools access them?
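The header fix Coren spells out above — a Content-Type line, then a blank line, then the body — looks like this as a complete script. The log's snippets are Python 2 (`print "..."`); this sketch uses Python 3 syntax, and the `render()` helper and page content are illustrative, not Scottywong's actual code.

```python
#!/usr/bin/env python3
# Minimal CGI response: HTTP-style headers come first, then a blank
# line, then the body. Without a Content-Type header, lighttpd falls
# back to an unhelpful default (text/x-python in the log above), so
# the browser shows the HTML as plain text.

def render(name):
    """Build a complete CGI response string (hypothetical helper)."""
    body = "<html><body><h1>AfD stats for %s</h1></body></html>" % name
    # CGI accepts bare "\n" line endings too; "\r\n" matches HTTP.
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    print(render("Foo"))
```

The blank line is the part that is easy to forget: it is what separates the header block from the document body.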
[03:01:18] if so, what would the url be to access them? i've already set permissions on my home folder to allow read access [03:01:32] they wouldn't be accessible via any url [03:01:42] ok [03:01:49] so I have to store everything in the tool directory [03:01:55] Scottywong: you may try ln -s [03:01:57] not necessarily [03:02:06] or have a dynamic endpoint [03:02:20] not familiar with either of those [03:02:46] have a CGI script named 'css.py', which takes a ?name= parameter [03:02:47] ok, so ln is basically making a shortcut [03:02:59] Scottywong: yes [03:03:08] the script could be with open('/your/dir/name.css') as f: print f.read() [03:03:20] of course, you need to whitelist it to avoid security issues [03:03:34] gotcha [03:03:38] ok, those are two good options, thanks [04:19:26] i see new keys! i guess it's fixed [04:20:54] praise the lord! [05:25:39] so, probably a stupid question, but does the replicated db in tool labs include revision text? [05:28:15] no [05:29:47] k just checking [10:17:49] !log integration refreshed slave-scripts on integration-selenium-driver , doing a git pull in /srv/deployment/integration/slave-scripts [10:18:16] Logged the message, Master [10:32:44] hm, I'm having an interesting connection issue: https://tools.wmflabs.org/ (SSL) works fine, but http://tools.wmflabs.org/ (non-SSL) is insanely slow [10:32:47] is this a known thing? [10:35:04] tto: yep, known issue. [10:35:27] is there a bug report? [10:40:23] tto: hmm, didn't find any.
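The "css.py with a ?name= parameter, plus a whitelist" idea discussed above might look like the sketch below. The whitelist check matters because a naive `open()` on a user-supplied name would allow path traversal (`?name=../../.ssh/id_rsa`). The stylesheet names, directory path, and `serve_css()` helper are illustrative assumptions, not anything from the log.

```python
#!/usr/bin/env python3
import os

# Only these stylesheet names may be served; anything else -- including
# any path-traversal attempt -- is rejected outright. (Illustrative.)
ALLOWED = {"common.css", "tables.css"}

CSS_DIR = "/data/project/mytool/css"  # illustrative path

def serve_css(name, css_dir=CSS_DIR):
    """Return a CGI response for a whitelisted stylesheet, or a 404."""
    if name not in ALLOWED:
        return "Status: 404 Not Found\r\n\r\n"
    with open(os.path.join(css_dir, name)) as f:
        return "Content-Type: text/css\r\n\r\n" + f.read()
```

The symlink (`ln -s`) option Coren mentions is simpler when the files live somewhere the webserver can already follow; the dynamic endpoint is for serving files from outside the tool's web root.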
[10:41:09] well, I guess so long as it is "known" in the minds of the right people, it won't be forgotten [10:41:35] tto:;-) [10:42:21] tto: it doesn't hurt to file a new one [10:42:41] yep, I may as well [10:44:30] seems like the user panos has run a big py script on tools-login :( [10:45:43] hashar: my vm has timeouts connecting to brewster.wikimedia.org [10:46:32] https://bugzilla.wikimedia.org/show_bug.cgi?id=57968 [10:46:42] matanya: yeah noticed that [10:46:45] might be overloaded [10:48:39] tools webproxy seems to be down: http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&m=load_one&s=by+name&c=tools&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [10:49:42] back on track now [10:50:39] aaand down again [10:50:50] hedonil: I was going to say it :D [10:51:13] fale::-D [11:42:39] the shared pywikibot seems to be an outdated version (r-1 (unknown), ???????, 2013/09/21, 23:13:04, OUTDATED) [11:57:29] fale: russblau needs to update it, as it's not +w for everybody... [11:58:32] fale: try again? [11:59:11] it's now linked to the nightlies [12:02:19] generate_user_files.py is broken... [12:07:21] fale: [12:07:21] valhallasw@tools-login:/shared/pywikipedia$ python /shared/pywikipedia/rewrite/scripts/version.py [12:07:25] Pywikibot: [https] r/p/pywikibot/core (r2402, 65c129b, 2013/12/04, 10:16:04, ok) [12:07:28] :-) [12:09:54] valhallasw: :) [12:44:16] Coren: is it possible to have an instance running something which isn't ubuntu 12.04/10.04? [12:52:30] krd@tools-login:~$ df [12:52:30] df: `/home (deleted)': No such file or directory [12:52:30] df: `/data/project (deleted)': No such file or directory [12:52:30] df: `/mnt/pagecounts (deleted)': No such file or directory [12:52:32] WTF? [13:00:37] krd: df /home works [13:01:08] could those you list be due to the backup system in place? [13:01:24] time machine stuff [13:01:27] maybe, don't know.
[13:11:57] Well, don't get it wrong, maybe I was misinformed when I thought that I have to migrate from toolserver to labs as soon as possible because it is faster and more stable. [13:13:28] krd: mainly because the TS is going down, but I have no clue how that relates to your message. [13:13:48] Because as far as I can see, /home is still there, so it's probably a df issue and not a storage issue [13:14:30] Currently I'm waiting for /public/pagecounts/ to reappear, as my script using it is currently stuck. [13:15:01] I'd like to avoid reverting to http pulling again. [14:27:19] matanya: It /might/ be doable, given a pressing need, but the argument will have to be compelling indeed. [14:28:06] Coren: I need to build debian packages for stuff that only made it to ubuntu 12.10 and onward [14:28:11] krd: "foo (deleted)" is an artifact of a leftover NFS mount of a different filesystem. It's internal silliness, but has no effect. [14:28:54] Coren: if not possible, i'll try and get a debian laptop or latest ubuntu, but this is not preferable [14:30:17] matanya: Is raring adequate for that use? [14:30:58] Coren: does raring have gem2deb higher than 0.3 ? [14:31:23] The point is that we don't want to have any part of the infrastructure rely on a non-LTS release; but we can build an image for a singleton. [14:31:23] that is the least dep i need to solve :) [14:31:27] Hm. [14:31:49] I'd have to look it up; but alternatively we could backport that dep to Precise instead if that's your last. [14:31:50] understood Coren. it is somewhat a blocker for the VE guys [14:32:23] let me check the deps tree a sec Coren [14:32:25] I already deploy a couple of backports in tools. [14:32:35] (For similar reasons) [14:35:31] Coren: 'rdiscount', '~> 2.1.6' 'json', '~> 1.8.0' 'parallel', '~> 0.7.1' 'rkelly-remix', '~> 0.0.4' 'dimensions', '~> 1.2.0' 'sass', '~> 3.2.0' [14:36:28] That's the dependencies to gem2deb?
[14:37:09] Coren: no, for the package i'm going to build [14:38:07] Coren: gem2deb has ruby2.0 and ruby2.0-dev as deps. [14:38:21] Hm, those are all ruby aren't they? It should be feasible to backport an entire ruby2 tree. [14:38:52] yes, coren. a full ruby2 tree would be the best [14:39:19] The "interesting" question is what is the better thing to do for maintenance: (a) maintain a bleeding-edge distro but rely on it for the packages or (b) maintain a ruby2 into precise and keep it maintained. [14:40:15] My instincts would have me lean towards (b), [14:40:44] but both of those have a maintenance cost. Who's going to feed and care for it? Ops? [14:40:48] Coren: as long there is a maintainer, i don't care :) [14:41:54] I'll discuss it with the rest of ops. A datapoint we'll need is the "oldest" distro that supports all the versions you need? [14:42:00] Coren: when thinking of it more, one bleeding-edge image would serve for future stuff for more than me [14:42:39] Coren: raring [14:43:07] and it doesn't have to have ruby 2.0 if it is raring. [14:43:50] Coren: Thx. Regarding https://bugzilla.wikimedia.org/show_bug.cgi?id=57617 , is there anything we can push? [14:46:23] matanya: The problem with relying on raring in labs is that you're not going to be able to convince ops to deploy a non-LTS in prod. [14:47:29] Coren: i'm not asking them to deploy non-LTS, i just need the dev tool from raring. the package itself is wmf-maintained. [14:48:20] matanya: Aaah, that does make things much simpler. If raring has all you need, then it shouldn't be a problem to make an image of it available for use as labs instance. [14:49:00] krd: Lemme poke legal about that and request they give us clearance fast. [14:49:08] Thx. [14:49:12] Coren: I'd much appreciate if you can prepare me such an image. [14:50:12] matanya: I'll try to squeeze it in this week. The problem isn't difficulty but the move of labs + new DC is keeping ops busy.
:-) [14:50:41] matanya: Can you open a bz for it so it doesn't fall between the cracks and/or other opsen can help? [14:50:55] * Coren doesn't like being the only one who knows of a request. :-) [14:50:58] boy do i know that :) want a bz or rt? [14:52:59] Coren: in addition, i do think it is a good idea to have a bleeding-edge for future features developments. (mainly i refer to the coming 14.04) [14:53:34] bz is best for me; that's user-facing. [14:54:35] ok, filing [15:02:45] Coren: https://bugzilla.wikimedia.org/show_bug.cgi?id=57982 [16:09:19] Coren: Hi, can you tell in a word what's the current issue? tools-webproxy is recurrently down, newWeb is very slow, some tools web give 503, https://tools.wmflabs.org/?status doesn't show up at all & last not least tools-login is blocked. [16:09:42] hedonil: Hang on, in a meeting at the moment. [16:11:57] hedonil: Looks like something is hammering on it hard. A bot of some kind. [16:13:36] Coren: yep. [16:15:51] Oh, ffs. Looks like a distributed spider. Probably an address scraper that's recursively hammering on scripts. [16:19:16] I've disabled the tool being hammered, that should gradually help. [16:21:27] Coren: nope. right now: tools-webproxy down again: http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&m=load_one&s=by+name&c=tools&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [16:21:36] I've seen a bunch of slowness on my instances as well [16:21:41] all morning, really [16:22:02] Excellent timing while I'm holding interviews. :-( [16:22:28] Yeah, there's a couple of thousand IPs hitting the web stack. [16:25:09] I can live with it [16:25:56] It's making the proxy hammer on the webservers and webproxy, which is impacting a lot of VMs. [16:32:34] !
[16:32:34] There are multiple keys, refine your input: !log, $realm, $site, *, :), ?, {, access, account, account-questions, accountreq, add, addresses, addshore, afk, airport-centre, alert, amend, ask, awstats, bang, bastion, be, beta, bible, blehlogging, blueprint-dns, borg, bot, bots, botsdocs, broken, bug, bz, channels, chmod, cloak, cmds, console, cookies, coren, Coren, cp, credentials, cs, Cyberpower678, cyberpowerresponse, damianz, damianz's-reset, db, del, demon, dependency, deployment-beta-docs-1, deployment-prep, derp, doc, docs, domain, dumb, enwp, epad, es, etherpad, evil, excuse, extension, failure, false, fff, filemoves.py, flow, FORTRAN, forwarding, gerrit, gerritsearch, gerrit-wm, ghsh, git, git-puppet, gitweb, google, group, grrrit-wm, hashar, help, helpmebot, hexmode, hodor, home, htmllogs, hyperon, icinga, IE6, info, initial-login, instance, instance-json, instancelist, instanceproject, ip, is, jenkins, keys, labs, labsconf, labsconsole, labsconsole.wiki, labs-home-wm, labs-l, labs-morebots, labs-nagios-wm, labs-project, labs-putty, labstore3, labswiki, leslie's-reset, lighttpd, limitations, link, linux, load, load-all, log, logs, logsearch, mac, magic, mäh, mail, manage-projects, mediawiki, mediawiki-instance, meh, mob, mobile-cache, monitor, morebots, msys-git, mw, nagios, nagios.wmflabs.org, nagios-fix, nc, newgrp, newlabs, newlabs2, newlabs-rl, new-labsuser, new-ldapuser, newweb, night, nocloakonjoin, nova-resource, op_on_duty, openstack-manager, origin/test, os-change, osm, osm-bug, paf, pageant, pang, password, pastebin, pathconflict, pc, perl, petan, petan..., petan:ping, petan-build, petan-forgot, ping, pl, po*of, pong, poof, poofing, port-forwarding, project-access, project-discuss, projects, proxy, pung, puppetmaster::self, puppetmasterself, puppet-variables, putty, pxe, pypi, python, pythonguy, pythonwalkthrough, queue, quilt, ragesoss, rb, reboot, redis, remove, replicateddb, report, requests, resource, revision, rights, rq, rt, 
rules, Ryan, Ryan_Lane, ryanland, sal, SAL, say, screenfix, search, searchlog, security, security-groups, seen, self, sexytime, shellrequests, single-node-mediawiki, snapshits, socks-proxy, srv, ssh, sshkey, start, stats, status, Steinsplitter, StoneB, stucked, sudo, sudo-policies, sudo-policy, sumanah, svn, t13, taskinfo, tdb, Technical_13, terminology, test, Thehelpfulone, tmh, todo, tooldocs, tools-admin, toolsbeta, tools-bug, tools-request, toolsvslabs, tools-web, trout, tunnel, tygs, unicorn, venue, vim, vmem, we, whatIwant, whitespace, whyismypackagegone:'(, wiki, wikitech, wikitech-putty, wikiversity-sandbox, windows, wl, wm-bot, wm-bot2, wm-bot3, wm-bot4, wmflabs, xy, you, [16:32:42] :O [16:32:48] \o/ [16:32:55] Heya johang [16:32:57] Heya JohannesK_WMDE [16:33:00] hi Coren [16:33:58] !log tools rebooting webproxy with new kernel settings to help against the DDOS [16:34:00] Logged the message, Master [16:37:05] !cyberpowerresponse [16:37:05] and I say what I need, get no response and realize I just wasted the effort of typing what I need. [16:37:24] !Cyberpower678 [16:37:25] addshore, how do you rollback? with the rollback button? :D [16:37:31] : [16:37:32] :D [16:38:03] great toy! [16:40:27] lol# [16:41:31] !cyberpowerresponse [16:41:31] and I say what I need, get no response and realize I just wasted the effort of typing what I need. [16:42:22] !a [16:42:23] There are multiple keys, refine your input: access, account, account-questions, accountreq, add, addresses, addshore, afk, airport-centre, alert, amend, ask, awstats, [16:42:33] !addshore [16:42:34] <(^.^)> [16:42:38] :D [16:42:40] !b [16:42:41] https://bugzilla.wikimedia.org/$1 [16:42:50] !c [16:42:50] There are multiple keys, refine your input: channels, chmod, cloak, cmds, console, cookies, coren, cp, credentials, cs, cyberpowerresponse, [16:45:05] Hi all! I have someone asking if it's possible to host in tool in Tool Labs that uses the users' Wikipedia login credentials. 
I feel the answer is no but I wanted to check... anyone? [16:47:57] Coren ^^ [16:48:12] Silke_WMDE: i think it's not forbidden, but it is frowned upon. what do they want to do? for getting the watch list, there is this token thing in user settings. but even better would be if they used https://blog.wikimedia.org/2013/11/22/oauth-on-wikimedia-wikis/ [16:49:09] Silke_WMDE: if they need to do something on the site as the user, they should use oauth; if they just want to identify the user, they can use oauth or tusc [16:50:41] JohannesK_WMDE valhallasw He wants to crowdsource error corrections in a grammar tool for dewp. And he was asking if that crowd had to create new accounts in the tool or if they could use their WP login. [16:52:12] Silke_WMDE: so the tool will make edits in the name of users. i think that's a job for oauth. [16:53:09] ok, I'll point him to this. thanks! [16:56:53] Capt. Harris: Be advised, we got DDOS and spiders in the wire down here. [16:56:58] Phantom Pilot: Roger your last, Bravo Six. Can't run it any closer. We're hot to trot and packing snake and nape, but we're bingo on fuel. [16:57:06] Capt. Harris: For the record, it's my call. Dump everything you got left on my pos. I say again, expend all remaining in my perimeter. It's a lovely fucking war. Bravo Six out. [16:57:15] Phantom Pilot: Roger your last, Bravo Six. We copy, it's your call. Get 'em in their holes down there, hang tough Bravo Six. We are coming cocked for treetops. [16:58:46] is tools down? [16:58:58] Under attack. [16:59:14] under attack in which way? [16:59:16] webtools-proxy didn't reboot by now. [17:00:21] Ryan_Lane: see earlier 17:13 [17:00:28] I wasn't online then [17:00:29] hedonil: No, I had only limited attention to give it because I was giving an interview. [17:00:42] I'd go to the logs, but aren't they on tools? :D [17:00:47] Ryan_Lane: Either a DDOS or some really idiotic distributed crawler. [17:01:05] do the logs show it coming from a bunch of places?
[17:01:11] Was recursively following links on scripts, and hammering every webserver hard. [17:01:17] Yeah, it was coming from all over the world. [17:01:24] fun [17:01:24] (Though most from cn) [17:02:56] Coren: How is the dewiki database replication? [17:02:59] guess I won't be taking screen shots of any tools for my presentation tomorrow [17:03:09] Silke_WMDE: AFAIK, it's back. [17:03:18] oh good! [17:03:20] Ryan_Lane: With a bit o' luck I should be able to bring things back up shortly. [17:03:26] * Ryan_Lane nods [17:03:46] * Coren turned on syn cookies. [17:04:12] It adds a roundtrip, so helps with the rate. [17:04:44] Disabling wikidata-todo helps; that was the one getting hammered hardest [17:04:55] still 1000s of requests per minute. [17:04:58] hi all [17:05:38] Is there a reason catscan is acting like a constipated grandma? [17:06:16] Ryan_Lane: tools.wmflabs.org should be back up, if a little slow. [17:06:28] Qcoder00: ಠ_ಠ [17:06:58] Given I'm a very heavy user of catscan [17:07:08] Qcoder00: mostly because something is hammering hard on tools atm. Looks like either a DDOS or a (my guess) spammer scraping bot gone stupid. Stupider. [17:09:56] Coren: It's not by any chance a power user like myself literally hammering the queries away? [17:10:17] it isn't working for me [17:10:18] Qcoder00: Unless you're doing it through a botnet from thousands of IPs, no. :-) [17:10:33] Ryan_Lane: Use https, it seems to not suffer as hard. [17:10:45] Coren : I don't use botnets... [17:10:51] ok [17:10:59] Unless some crackers Zombied me... [17:11:01] :( [17:11:10] * Qcoder00 goes off to check the last Spyware scan results [17:11:44] bleh. no matter what I'm getting blank pages for reasonator [17:12:44] catgraph is still reachable! http://sylvester.wmflabs.org:8090/list-graphs \o/ [17:14:00] JohannesK_WMDE: because it's not in tools ;) [17:14:19] it's on a different vm, yes [17:17:57] tools is unusable atm. very very slow [17:19:45] Steinsplitter: Two things hitting us.
As usual, miscreants running bots on login, but also what looks like a DDOS on the web stack [17:20:04] oh :( [17:20:17] * Coren is working on some countermeasures. [17:22:06] Coren: Hm.. something is up with tool labs web servers? Requests are taking up to a minute to respond here [17:22:07] https://tools.wmflabs.org/intuition/load.php [17:22:21] Qcoder00: mostly because something is hammering hard on tools atm. Looks like either a DDOS or a (my guess) spammer scraping bot gone stupid. Stupider. [17:22:52] hooray reasonator loaded for me [17:23:03] any other good tools to grab screenshots of? [17:25:43] Ryan_Lane: of course: https://tools.wmflabs.org/wikiviewstats/ 8-) [17:27:08] Ryan_Lane: if it's a presentation, design & GUI count. [17:27:17] # netstat -ant|wc -l [17:27:17] 4816 [17:28:53] 5152 [17:32:22] * Coren uses the BIG hammer. [17:32:26] grep wikidata-todo access.log|cut -d ' ' -f 1|sort -u|(while read r;do iptables -t filter -A INPUT -s $r -j REJECT;done) [17:37:05] Coren: so I should kill chaos monkey, then? [17:37:51] Heh. [17:38:17] I count 5200 distinct /24 hitting tools; most in APAC. [17:40:03] 502 Proxy Error [17:41:05] Ryan_Lane: Does he have a little hat? [17:41:53] marktraceur: http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html ;) [17:41:58] Because if so I don't want to kill him [17:42:02] Too adorable. [17:42:12] I'm aware :) [17:42:22] heh [17:42:55] I'm glad netflix cares so much about Cod Equality [17:43:06] Fish racism has gone far enough [17:43:49] Anyway, /me spectates as event continues [17:43:55] ugh. we've stopped adding labs stats to the engineering report? [17:45:19] * Coren disables HTTP [17:45:48] Is only tools.wmflabs being affected? 
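Coren's shell one-liner above (`grep | cut | sort -u` feeding `iptables -j REJECT`) and the "distinct /24" tally can be expressed in Python. This is a sketch, not the command actually run: it only builds the rule strings rather than executing anything, and it assumes a common access-log layout with the client IP as the first whitespace-separated field.

```python
from collections import Counter

def offending_ips(log_lines, needle="wikidata-todo"):
    """Unique client IPs (first field) from access-log lines matching needle."""
    return {line.split(" ", 1)[0] for line in log_lines if needle in line}

def count_slash24(ips):
    """Tally distinct /24 networks, like the 'distinct /24 hitting tools' count."""
    return Counter(".".join(ip.split(".")[:3]) + ".0/24" for ip in ips)

def reject_rules(ips):
    """Render one iptables REJECT rule per IP (rendered only, not executed)."""
    return ["iptables -t filter -A INPUT -s %s -j REJECT" % ip
            for ip in sorted(ips)]
```

As noted later in the log, iptables copes with large rule sets "okay-ish" via a hash table, but per-IP rules don't scale to tens of thousands of sources, which is why blocking by hand was abandoned.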
[17:46:01] Ryan_Lane: please review https://gerrit.wikimedia.org/r/99051 when you have time [17:46:41] heh, I should probably mention that I don't really work for wikimedia anymore :D [17:46:53] matanya: but I'll review it next week when I get a chance [17:47:35] Coren: Ryan_Lane closed my bug report :/ [17:48:03] really Ryan_Lane ? that is surprising to hear. i'm sad to hear you go :( [17:48:37] Ryan_Lane: HTTPS now works with reasonable performance for me, now that HTTP is closed. [17:49:47] * Coren should probably have a static page on HTTP to explain. [17:50:54] hi, where should i file a bug about beta labs using production components? [17:51:28] i notice that even though most bits come from beta, geoiplookup for some reason hits production bits [17:55:33] Nope. Stupid bot still hammers even a static page. [17:59:25] Yeah, completely disabling HTTP makes everything happy except legitimate users of HTTP. [18:02:11] The iptables reject didn't fix it? [18:02:35] anomie: Too many IPs to block by hand, and going to /8s would block 85% of the net. [18:02:44] Ryan_Lane: maybe you can use that for your presentation (now it works): https://tools.wmflabs.org/wikiviewstats/?locale=en&lang=en&page=Florida [18:02:53] I see at least 90k different IPs [18:04:43] hedonil: ah, neat [18:05:51] Ryan_Lane: yeah. and even with powered by Labs logo! :-D [18:05:55] Coren: maybe fail2ban can be of use for this? Not sure how iptables performs with 90k rules, though. [18:06:32] valhallasw: It does okay-ish for the most part, the implementation is a reasonably swift hash table. [18:37:44] Coren: I'm still stuck trying to add a header with the lighttpd config. :( [18:38:13] a930913: I'm in the middle of trying to deal with a DDOS, but I can probably sit down and help a bit later today if you want. [18:38:46] :o No problem, I might be on later. Good luck defending. [18:38:56] Coren: The Labs are under attack?
[18:39:31] a930913: Probably just a runaway spammer bot trying to recursively scrape webtools. [18:39:49] DDoS or DoS? [18:40:25] Very D. :-) https is unimpacted, and works fine now. [18:42:57] whats going on? [18:42:57] http://tools.wmflabs.org/wikidata-todo/autodesc.js [18:43:28] ah [18:43:28] ok [18:44:00] valhallasw: Good move with the topic. I'm sure I would have thought of it eventually. :-) [18:44:23] It was still set from the previous downtime ;-) [18:44:43] Coren: can you please reopen : https://bugzilla.wikimedia.org/show_bug.cgi?id=57982 ? [18:45:27] matanya: you can do that yourself, right? [18:45:47] although I'm not sure whether you should [18:46:05] valhallasw: technically, yes. but as you said, i think i shouldn't [18:47:08] t [18:47:49] matanya: Do comment on the bz and explain you understand the caveats. :-) [18:54:48] thanks Coren i will [18:56:38] aaaanyway, why should anyone ddos labs? [18:57:35] lazowik: It's probably a stupid email scraper bot gone wild. [18:58:11] pff [18:58:55] a930913: what's your issue with adding additional headers? [18:59:03] Coren, https seems to be affected now too … [18:59:31] ireas: No, it's an unexpected side effect, the lighttpds have stuck while http was enabled; so it depends what tool you are looking at. [18:59:40] ireas: I'm about to kick them to wake them up. [18:59:49] Coren, ah, okay, thx [19:00:17] I hadn't checked /that/ [19:01:57] Ah, it's worse than I thought. [19:12:05] Luckily dns and ping are unaffected. Whew! [19:15:59] Coren: ping [19:16:06] my labs instance appears to be down. [19:16:49] jorm: Something funky with labs DNS. On it. [19:16:54] kk. [19:16:57] carry on! [19:18:16] son of a - I *just* pinged tools.wmflabs.com sucessfully [19:19:04] of course I meant ssuucceessffuullyy [19:20:37] Oh, brother. I'm just going to stop saying anything, in the event the DDOSer is *reading this channel* [19:22:09] DDOSer: you suck!
:-) [19:48:50] I'm having trouble logging into various labs instances currently [19:49:03] parsoid rt testing is also broken [19:49:09] could that be related to DNS issues? [19:49:41] gwicke: Almost certainly. I'm trying to see how that is related to the DDOS in progress. [19:49:53] okay [20:05:35] grrr - I can't explain a query in tool labs? [20:05:45] that is why I'm there! [20:48:30] manybubbles: Limitation of mysql; can't explain a query unless you have select rights to the underlying table. [20:52:17] Coren: just a note: on toolserver explain is possible on views of replicated databases [20:53:05] Merlissimo: Older mysql; that has since been made impossible in newer ones. [20:53:06] Merlissimo: it is? I thought that disappeared when they upgraded MySQL a while back? [20:53:40] https://jira.toolserver.org/browse/TS-1585 [20:54:22] oh [20:54:58] Coren: sad:( oh well. [20:55:25] still think it would be great to have some way of figuring out if a query is likely to complete in finite time, though [20:55:27] manybubbles: If you're having performance issues with a query, labs admins can run explains for you. [20:55:59] Coren: I worked around it for now. Thanks! [20:56:36] i have just used it in the past so much that today i have all the indexes in my head [20:57:36] yeah, I also use the MediaWiki SQL table definitions often to make my best attempt at making sure the queries are efficient [20:58:08] in case someone wonders where to find it: https://git.wikimedia.org/blob/mediawiki%2Fcore.git/HEAD/maintenance%2Ftables.sql [20:59:12] Fun news. The DDOS that affects Tool Labs was in fact made by a very impressive bot net: wikipedia users. Someone has added javascript that causes a hit on the tools labs webserver for every pageview on several projects. :-) [20:59:24] Sadly, tool labs doesn't quite scale that well. :-) [20:59:46] Coren: Is there a plan to make EXPLAIN available to labs users eventually or is there something irreconcilable about the feature?
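The "some way of figuring out if a query is likely to complete in finite time" wish above is exactly what EXPLAIN answers: the plan shows whether the query can walk an index or must scan the whole table. As a self-contained illustration of reading a query plan, this sketch uses the stdlib sqlite3 module and its EXPLAIN QUERY PLAN purely as a stand-in; the actual discussion is about MySQL's EXPLAIN on the labs replicas, where the privilege restriction applies, and the table/index names here are toy versions of the MediaWiki schema.

```python
import sqlite3

# Toy stand-in for the replica schema; not the real MediaWiki tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_user INTEGER, rev_page INTEGER)")
conn.execute("CREATE INDEX rev_user_idx ON revision (rev_user)")

# Indexed predicate: the plan reports an index search.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM revision WHERE rev_user = 42"
).fetchall()

# Unindexed predicate: the plan reports a full-table scan -- the kind
# of query that may not "complete in finite time" on a huge table.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM revision WHERE rev_page = 42"
).fetchall()

print(plan)
print(plan_scan)
```

This is also the reasoning behind the earlier `revision_userindex` advice: the `_userindex` views exist so that user-based lookups hit an index instead of scanning `revision`.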
[21:00:28] halfak: If it becomes possible in MySQL/MariaDB without allowing selects on the private tables, then we'll get it. [21:01:00] Is there an open bug on MySQL for this? [21:01:23] halfak: It's unlikely to be possible, because that has been stated by the MySQL people as a design decision. Perhaps MariaDB would be willing to do something about it, but I expect they'd be hesitant to fork another bit of code without a very compelling reason. [21:01:58] halfak: It's not considered to /be/ a bug by Oracle. [21:02:05] Gotcha. It's too bad. EXPLAIN is probably one of the most important features for a shared DB. [21:02:19] As is a "public view" [21:02:35] Coren: So how's the weather forecast returning back to normal? [21:03:18] yeah, Oracle likes security. You should see what you need to do to explain something against Oracle's db. yikes! [21:03:50] hedonil: Yes, but it will probably take several hours for things to return to normal as user caches expire. [21:06:34] Coren: tip for new tools datacenter in Ashburn: 100Gb internet connection and more server power. [21:07:24] hedonil: There are only so many resources we can throw at Labs. :-) That said, the Labs setup in Ashburn /is/ beefier. [21:08:00] Also, the new proxy setup is going to be more resilient to usage spikes (though still couldn't handle enwiki-levels) [21:08:52] In this recent DDOS map I found Floria being a hot spot destination right now http://www.prolexic.com/plxpatrol/ [21:10:44] *Florida [21:10:53] Coren: even prod couldn't handle enwiki load in some cases... a number of "features" have taken down the cluster when deployed [21:12:08] 157.56.92.142 - - [04/Dec/2013:21:11:44 +0000] "GET /wikidata-todo/autodesc.js HTTP/1.0" 200 24385 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" [21:12:15] et tu, Mircosoft? [21:15:14] ^^ [21:15:54] Coren: supposedly googlebot executes (or at least reads) JS sometimes. why not bing?
:) [21:20:25] Coren, I haven't found anything interesting, but DNS also seems somewhat happier now. [21:20:34] So, probably it was a symptom of the ddos… maybe [21:20:55] That sounds a little dubious, but we'll wait-and-see [21:21:25] Oh -- was the ddos not a network congestion thing? [21:21:48] andrewbogott: No, it wasn't a flood so much as too many web requests. [21:22:02] Oh! Then probably not related :( [21:22:05] heh [21:22:06] so my switching the script over to HTTPS caused even more problems :D [21:22:10] Are you still seeing bad behavior? [21:22:22] andrewbogott: Not offhand. Lemme make some tests. [21:22:51] andrewbogott: Seems reasonably snappy now. [21:23:23] I restarted ldap and pdns -- two more random stabs. [21:28:50] jorm, can you reach your instance now? If not, what's the name and project? [21:30:55] funny edit summary in hindsight, https://en.wikipedia.org/w/index.php?title=MediaWiki:Wdsearch.js&diff=584561692&oldid=584449305 [21:30:59] legoktm: :) [21:31:09] indeed :D [21:31:22] who did the translation? nemo? [21:35:07] Coren: but honestly ~1Gb/s can bring this box down. tell me that I'm missing something :) http://ganglia.wmflabs.org/latest/graph_all_periods.php?h=tools-webproxy&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1386192645&g=network_report&z=large&c=tools [21:36:03] Coren: normal tcp handshake or syn flood? [21:38:28] hedonil: Full handshakes; but it's a little VM with no load balancing nor any sort of mitigation. I'm surprised it didn't crumble entirely. :-) [21:39:36] Coren: Ahh, bigger box soon :-D [22:00:36] My edit topic functionality is malfunctioning -- could someone else remove the DDOS notice? [22:00:48] andrewbogott: sorry, was at the dentist. and no, i cannot. it is unicorn.wmflabs.org [22:02:43] jorm: Looks like it was a casualty of the virt11 crash… it was in a 'shutoff' state. I just started it… now I get 'it works!' at http://unicorn.wmflabs.org [22:02:46] Is that what you'd expect? 
[22:03:14] seems to be back up, and yes! thanks! [22:10:50] so is the breakage also why I have jobs that are sitting queued for 6 hours? [22:13:44] MrZ-man: Not likely. Lemme see. [22:14:01] Oh, ow! [22:14:06] Need moar nodes! [22:14:28] Thankfully, exec-01 has drained so I can now put it back in rotation after a quick reboot. [22:15:32] !log tools tools-exec-01 rebooted to fix the autofs issue; will return to rotation shortly. [22:15:34] Logged the message, Master [22:16:48] MrZ-man: We're just out of slots for jobs. I'll add an exec node or two tomorrow to increase capacity. In the meantime, we are going back from 7/8 to 8/8 now. [22:17:21] k [22:17:44] That should help catch up the job backlog fairly quickly. [22:19:23] looks like the oldest queued one just started now on 01 [22:44:36] !newweb [22:44:36] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb [22:56:00] Coren|Dinner: Poke? [23:36:25] Web 4.0 [23:54:44] Coren|Dinner: ping [23:55:02] Semipong. I'm about to leave to the vet. [23:55:12] So if it's a quickie, I might be able to help. [23:55:24] Coren|Dinner: not this urgent [23:56:32] Coren|Dinner: http://tools.wmflabs.org/paste/view/5b16fecb - multiple webs running? [23:58:00] hedonil: Normally shouldn't happen, but might have occurred during the overload, I expect that all but one of them are idle and pointless. No harm in killing them and restarting one. [23:58:17] Coren|Dinner: k.