[00:00:19] I can extract IDs from 1k revisions with a kludgy regex in 2.8 seconds. With MWP, it took 88 seconds!
[00:00:25] Almost two orders of magnitude.
[00:00:29] Yeah DarTar
[00:00:49] So, if you go to https://en.wikipedia.org/wiki/biology (notice the lower case)
[00:01:00] You'll get a 301 redirect that will reload the page.
[00:01:03] yup
[00:01:07] that’s as expected
[00:01:35] this type of redirect hasn’t changed behavior AFAIK
[00:01:38] But "Bilology" will give you a 201 to "Biology"
[00:01:42] Indeed.
[00:01:47] Where are you seeing page reloads?
[00:02:28] Hey J-Mo
[00:02:36] hey dude.
[00:02:41] in the hangout now
[00:02:47] So, we should finalize housing for CSCW.
[00:02:54] yeah.
[00:02:59] hangout?
[00:03:01] halfak: /Obama -> /Barack_Obama
[00:03:21] I is in it. Meet me there!
[00:03:32] MW used to serve transparently /Barack_Obama when visiting /Obama
[00:03:45] I'm only getting one 200
[00:03:48] DarTar,
[00:04:03] I’m still getting a 200 too, which is why this is confusing
[00:04:06] J-Mo, was just going to ask lzia if she's still planning to push on the airbnb option.
[00:04:14] DarTar, we should be getting a 200
[00:04:17] oh I talked to travel,
[00:04:47] quick update is: conference venue is running out of rooms, they advised to book
[00:04:59] and if we still decide we want to go elsewhere presumably we could cancel
[00:05:16] so can we just book? I like the idea of an AirBnB in theory, but this seems like it's more complicated than it's worth
[00:05:24] but I don’t see lzia driving the airbnb search given her big visa issues
[00:05:32] J-Mo: yes, I got to the same conclusion
[00:05:37] J-Mo, our status is "booked" pending an AirBNB
[00:06:01] ah, perfecto. I will carry on not thinking about this then ;)
[00:06:13] meeting adjourned!
[00:06:14] J-Mo, +1 I think we're safe.
:)
[00:06:20] cool
[00:06:31] halfak, so back to redirects
[00:06:33] So DarTar about them redirects
[00:06:35] yeah
[00:06:41] I'm not seeing weird behavior
[00:07:13] there used to be a single 200 with the request for the source page and the browser would *not* reload the target page
[00:07:31] I have not seen any "reloads"
[00:08:12] what happens when you request /Obama
[00:08:27] the behavior of normal MediaWiki redirects changed pretty recently
[00:08:44] So when you say, "/wiki/Obama", varnish responds with "here's /wiki/Obama" but secretly, it's actually "/wiki/Barack_Obama"
[00:08:45] ah Emufarmers I’d love to know more about this :)
[00:09:07] platform denied any recent changes in MW redirect handling :)
[00:09:24] halfak: that’s what I mean, and it’s not happening any more
[00:09:35] my browser now actually loads /Barack_Obama
[00:09:39] DarTar, what do you mean. It is right now.
[00:09:48] Yeah, that's the old behavior; I only noticed the new behavior in the last couple months
[00:10:01] DarTar, are you watching the HTTP requests/responses.
[00:10:05] ^?
[00:10:08] I only see one
[00:10:14] For text/html that is.
[00:10:16] It's a 200
[00:10:28] halfak: I am and that’s where it’s confusing
[00:10:34] The request has /wiki/Obama, but the response is for /wiki/Barack Obama
[00:10:40] legoktm: do you happen to recall seeing the change that might have affected this?
[00:10:46] This is the behavior I have always expected.
[00:11:12] I still don't understand what the problem is. :(
[00:11:32] heh, if you jump on a hangout I can show you
[00:11:48] To the batcave!
[00:11:53] hi
[00:12:13] ah yes, it's MatmaRex's fault
[00:12:13] As most things are.
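The distinction being argued over (a real HTTP redirect versus a 200 that already carries the target page) can be checked by looking at the first response only, without letting the client follow redirects. A minimal Python sketch using the standard library follows. The names NoRedirect, first_response and describe, and the User-Agent string, are made up for illustration, and the statuses a live wiki returns today may differ from what this log reports.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow; the 3xx then
        # surfaces as an HTTPError that the caller can inspect.
        return None


def first_response(url):
    """Return (status, Location header or None) for the first response only."""
    opener = urllib.request.build_opener(NoRedirect)
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "redirect-check-sketch/0.1"}
    )
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def describe(status, location):
    """Turn a status/Location pair into a short human-readable label."""
    if status in (301, 302, 303, 307, 308):
        return "HTTP redirect to " + str(location)
    if status == 200:
        return "served with a 200 (any wiki redirect was resolved server-side)"
    return "other status: " + str(status)


# Example use (needs network access, so it is left commented out):
# for url in ("https://en.wikipedia.org/wiki/biology",
#             "https://en.wikipedia.org/wiki/Obama"):
#     print(url, "->", describe(*first_response(url)))
```

Per the log, the lower-case /wiki/biology URL is the case that produces a 301, while /wiki/Obama comes back as a single 200.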
[00:12:18] * legoktm finds change
[00:12:41] halfak: give me a sec, I want to see what the change is
[00:13:06] https://gerrit.wikimedia.org/r/#/c/143852/
[00:13:45] legoktm: ah, excellent
[00:13:51] that explains everything
[00:14:04] :P
[00:14:07] the HTTP code remaining the same but a visible change to the end user
[00:14:26] I was going to suggest that, but I thought we'd already been doing that
[00:15:46] I killed DarTar
[00:15:49] there are follow up patchsets for IE breakage and stuff, but that's the main one
[00:15:57] It was probably the extra battery juice the hangout took
[00:16:22] TL;DR: Nothing to see here folks. Are data are safe.
[00:16:27] Ooh, I didn't realize it fixed the double-back-button issue too
[00:17:10] Ironholds, FYI ^
[00:17:23] See my TL;DR about DarTar's redirect concerns.
[00:17:29] *our
[00:17:31] thanks
[00:17:34] thanks for the quick answer folks, that makes me feel better
[00:17:35] * halfak facepalms
[00:18:12] halfak: still wondering if this may affect other functionality that relies on reading the URL from the client
[00:18:19] like share a fact
[00:18:27] but if anything, that’s a desirable change
[00:18:33] Should be even better now
[00:18:35] o/ (ping me if you need anything else)
[00:18:36] yup
[00:18:37] Yeah :)
[00:18:41] o/ legoktm
[00:18:42] thanks legoktm
[00:38:47] DarTar, https://pypi.python.org/pypi/mwcites
[00:39:03] I just updated to include DOI extraction and I'm running it to generate another dataset.
[00:41:00] \o/
[00:41:29] nicely done
[00:42:35] halfak: did you see Nemo_bis’s comments on the OA list?
[00:42:57] I also need to reply to Jo and Pete Binfield but I wanted to hear back from you first
[00:43:24] I'm not on the oa list, so I don't think I'm seeing the replies.
[00:44:48] Oh wait. I do see an email from pete.
[00:44:49] * halfak reads
[00:52:20] Ironholds: do you know if the translation of Special:... in different languages exist somewhere I can look up?
[00:52:38] ah; yes!
[00:52:58] so, you want to grab the NamespaceNames and NamespaceAliases fields from the API
[00:53:39] leila, example: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&meta=siteinfo&format=json&siprop=namespaces|namespacealiases
[00:53:57] and it will have per-project localised versions
[00:54:50] Ironholds, thanks! should I look at specialpagealiases?
[00:57:36] oh!
[00:57:48] you meant the localised ... not the localised Special:?
[00:58:12] yep; that should work :)
[01:11:00] Bah! You must remove the period at the end of the citation from the end of the doi!
[02:20:19] fhocutt, Just got a chance to read through your README. Nice work! I feel like I could run my own matchbot without much time spent getting familiar. :)
[02:20:42] thanks, halfak.
[02:20:53] This is definitely one of the more thorough READMEs :)
[02:21:30] that was the idea, partly out of trying to make using json for the config as painless as possible
[02:21:38] I wish it allowed comments
[02:21:45] yaml!
[02:21:53] I highly recommend it.
[02:21:59] It's a superset of JSON
[02:22:13] So, you could just dump JSON into the yaml loader and it works.
[02:22:25] hm, handy
[02:22:26] It supports comments and it's a bit cleaner than JSON
[02:22:35] pyyaml is a convenient library
[02:22:42] Same behavior as json
[02:22:48] load/dump loads/dumps
[02:23:12] Anyway, I've got to run for now, but I'll look more deeply tomorrow.
[02:23:17] * fhocutt notes
[02:23:20] Have a good night.
[02:23:30] thanks, you too.
[02:27:12] Seriously debating ordering some kind of custom hybrid between a robe and a hoodie.
[02:27:24] Basically I want a hoodie I can bury myself in.
[02:34:18] Ironholds: sounds cozy.
[02:34:39] Hopefully!
[02:34:46] * Ironholds is trying to work out what the heck to do with his vacation
[02:36:30] SF is having very nice weather for the time of year
[02:38:19] yes, but then I'd be in SF ;). Lots of nice people: also a lot of bad memories and evocations.
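Going back to the siteinfo query suggested at 00:53 (siprop=namespaces|namespacealiases), a reply in format=json can be reduced to the list of names a wiki accepts for one namespace. The Python sketch below is only an illustration: SAMPLE is a hand-written fragment shaped like such a reply rather than real API output, the alias "Beispielalias" is hypothetical, and namespace_names is a made-up helper name.

```python
import json

# Hand-written fragment shaped like a siteinfo reply (format=json).
# The values are illustrative; "Beispielalias" is a hypothetical alias.
SAMPLE = json.loads("""
{"query": {
  "namespaces": {
    "-1": {"id": -1, "case": "first-letter", "canonical": "Special", "*": "Spezial"},
    "0": {"id": 0, "case": "first-letter", "content": "", "*": ""}
  },
  "namespacealiases": [
    {"id": -1, "*": "Beispielalias"}
  ]
}}
""")


def namespace_names(siteinfo, ns_id):
    """Collect the local name, canonical name and aliases of one namespace."""
    query = siteinfo["query"]
    info = query["namespaces"][str(ns_id)]  # namespaces are keyed by id as a string
    names = [info["*"]]  # "*" holds the localised name in this format
    canonical = info.get("canonical")  # the main namespace has no canonical name
    if canonical and canonical not in names:
        names.append(canonical)
    for alias in query.get("namespacealiases", []):
        if alias["id"] == ns_id and alias["*"] not in names:
            names.append(alias["*"])
    return names


print(namespace_names(SAMPLE, -1))
```

This covers the localised Special: prefix only (namespace id -1). The localised names of individual special pages are what the specialpagealiases property mentioned at 00:54 is for.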
[02:38:49] I suspect it would slightly reduce my ability to avoid work, too.
[02:39:35] oh, probably. :D
[08:10:34] halfak: You around by any chance?
[13:21:37] qchris, hey dude.
[13:22:05] I see that I accidentally started a big process on stat3 without Nice.
[13:22:23] Oh whoops. Just saw the PM. Looks like it is something else.
[13:32:41] halfak: Sorry. Gotta run. Is the stat1003 something urgent that needs attention right away?
[13:33:47] I'll be back in ~8 hours.
[13:34:05] If you need to kill something on stat1003 (stat1003 was not an issue), please ping ops,
[13:34:14] as I do not have root on stat1003 anyways.
[13:34:21] ttyl :-)
[13:34:23] * qchris waves
[15:09:01] morning halfak :)
[15:29:39] yo, milimetric :)
[15:36:19] wb, halfak :)
[15:36:39] hey Ironholds
[15:36:42] how goes?
[15:37:05] Just switched monitors and was using the built-in USB hub to support my ethernet dongle.
[15:37:07] Not bad.
[15:37:13] Hacking on some strategy stuff.
[15:37:23] Oh! I'm working on a new island grammar.
[15:37:28] It's for parsing DOIs.
[15:38:35] awesome!
[15:39:07] I'm writing endless UDFs :D
[15:39:32] :D UDFs --> less time Oliver spends on adhoc requests, right?
[15:39:54] hopefully? This morning I've written a "tell me if this request is zero-rated or not" UDF and got most of the way through host parsing
[15:40:01] which is really fun because it outputs a hashmap, not a string
[15:40:23] so you'll be able to say hostMap(uri_host)['project_class'] == "wikimedia", or something.
[15:40:33] or just return project_class. Or return both!
[15:40:47] What's the other think in "both"?
[15:40:49] *thing
[15:43:00] project_variant
[15:43:20] so, "en.wikipedia.org" = {"project_variant":"en","project_class":"wikipedia"};
[15:43:37] "commons.wikimedia.org" = {"project_variant":"commons","project_class":"wikimedia"}
[15:43:52] the language is deliberate, because we can't say "language code" (what about neutral or multi-lingual projects?
Argh)
[15:53:40] halfak: they wouldn't let me reserve a court right after squash class, but we've got it from 12:30, hopefully we can just show up around noon and play
[16:03:46] hrm
[16:03:52] symbol not found my ass!
[16:03:57] I just explicitly imported you!
[16:03:59] * Ironholds grumbles at code
[16:12:13] good morning Ironholds, halfak.
[16:12:25] o/
[16:12:37] Ironholds: sorry that I disappeared with no notice yesterday.
[16:12:54] the link you sent solved my problem with special: translation. thanks!
[16:14:55] Nettrom, sorry I missed the ping. 12:30 will be good anyway. We can always hit the gym first.
[16:20:10] I suspect that it won't be a problem and we'll get the court right after class, but their system doesn't allow reservations
[16:21:49] I see. That might be John's will -- so that we don't kick students off the court right after class ends.
[16:27:24] leila, that's okay.
[16:50:44] hey Ironholds: I missed your ping, hi :)
[16:50:57] milimetric, no problem, was just saying hi
[19:20:04] halfak, reasons to love data.tables no.300
[19:20:08] stratified sampling
[19:20:10] dataset <- data[,j=.SD[sample(1:.N,100000),], by = "type"]
[21:36:30] hey halfak, DarTar, Ironholds, others: how does one request/create a new IRC chan?
[21:36:50] J-Mo
[21:36:53] Join it
[21:37:03] muahaha!
[21:37:04] And then learn you some chanserv to register and set up shop.
[21:37:10] * halfak googles for chanserv commands
[21:37:14] J-Mo: it’s a wiki
[21:37:15] neat.
[21:37:25] https://blog.freenode.net/2008/04/registering-a-channel-on-freenode/
[21:37:46] thanks halfak
[21:39:14] FYI, ChanServ seems to be a bit slow today.
[21:39:28] So it might not respond immediately to commands.
[23:02:31] halfak: Sorry for running early today just after your pong.
[23:02:42] Glad to read that my killing did not do any harm.
[23:02:56] on that note.. ;p
[23:03:09] something in the cluster...blew up, again.
[23:03:19] Ironholds: Bob is not running any major queries.
just so you know, since your query has crashed. I checked with him and he said he will send an email to research-internal once he's ready to push the heavy job to the cluster.
[23:03:21] really?
[23:03:40] qchris, yup! Want me to throw the error into a phab ticket?
[23:03:48] it might just be a UDF problem or an input-reading problem, of course
[23:03:52] (more likely the former)
[23:04:12] Phab ticket? For ottomata to look at when he comes back? Sounds like a great idea!
[23:04:13] :-D
[23:05:11] Cough. stat1002 had a load of 120. Yes. That looks bad.
[23:05:21] But it seems to have resolved ~30 minutes ago.
[23:06:22] yup, totally
[23:06:29] and yep, that's about when my job died ;p
[23:07:07] Your job ... is that a Hive Job, or a local job?
[23:08:18] Because ... it sounded like a Hive job (UDF).
[23:08:31] And that should not raise the load on stat1002 that high.
[23:08:38] hive job
[23:08:49] I'm not sure what could be doing anything on stat1002 :/
[23:08:54] define load? mem or proc?
[23:09:13] http://en.wikipedia.org/wiki/Load_(computing)
[23:09:15] ^ load.
[23:09:33] As in:
[23:09:35] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=stat1002.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2&st=1423091063&g=load_report&z=large&c=Analytics%20cluster%20eqiad
[23:10:02] In the ganglia graphs ... If the grey surface is above the red line that is bad.
[23:10:13] s/surface/area/
[23:10:16] aha
[23:10:23] ew!
[23:10:57] I mean, the only thing running appears to be one of EZ's things.
[23:10:59] * Ironholds thinks
[23:11:14] I don't suppose the query could be...I dunno. buffering the result set locally or something?
[23:11:20] I can't think of why in god's name it would do that, but...
[23:11:40] I do not think that a Hive UDF would do that.
[23:12:00] hey, mister. I don't appreciate that kind of pessimism.
[23:12:13] You suggesting there are limits to how many things my slapdash code could break?! Hmph!
[23:12:24] I'll have you know I've written world-class terrible, side-effect-populated code in my time!
[23:12:26] ? Ok. ... Then optimism. It totally is your UDF? No. That sounds wrong.
[23:12:37] Oh, I know, I'm just being twee ;p
[23:12:52] I can't see anything else going, though.
[23:13:28] Hahahaha. I am having a strange conversation and I am having fun. But I am not sure what we are talking about.
[23:13:30] * qchris smiles.
[23:13:38] I was mocking my own code quality.
[23:13:59] Hmn. I mean, I don't see any other tasks on stat2, but then I wouldn't if they got killed when the load went high.
[23:14:00] I got that part :-/
[23:14:01] * Ironholds shrugs
[23:14:20] It might be a hdfs mount issue again.
[23:14:26] I'll leave the phab ticket for otto (who will I am sure be very pleased to come back from holiday and be met by that on day 1 :D) and mark the QAwerk blocked until then
[23:14:27] * Ironholds nods
[23:14:30] Still haven't had time to check the analytics100[12] logs.
[23:14:47] Yup. Do so.
[23:14:52] Sounds good.
[23:17:49] * halfak --> train --> home
[23:17:54] back in ~ an hour
[23:50:10] * Ironholds twiddles thumbs
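The host-parsing UDF described around 15:40 is not shown anywhere in this log. As a rough Python sketch of the same idea (not Ironholds' actual Hive UDF), the function below splits a host name into the two fields he names, project_variant and project_class. The function name host_map and the rule "first label is the variant, second-to-last label is the class" are assumptions made for illustration; a real implementation would need to handle more cases.

```python
def host_map(uri_host):
    """Split a host name into project_variant and project_class.

    Assumed rule for this sketch: the label just before the top-level
    domain is the project class, and the first label (when there are at
    least three labels) is the project variant.
    """
    labels = uri_host.strip().lower().rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError("not a usable host name: %r" % (uri_host,))
    project_class = labels[-2]
    # A bare "wikipedia.org" has no variant label at all.
    project_variant = labels[0] if len(labels) > 2 else None
    return {"project_variant": project_variant, "project_class": project_class}


# The two examples given in the log:
print(host_map("en.wikipedia.org"))
print(host_map("commons.wikimedia.org"))
```

Returning a dictionary mirrors the hashmap output mentioned in the log, so a caller can test host_map(host)["project_class"] == "wikimedia" in the same way as the hostMap(uri_host)['project_class'] example.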