[16:01:18] hey halfak, Ironholds, I'm going to ask on list, but I'd like to turn Hadoop off for a few minutes tomorrow morning. do either of you have specific objections?
[16:05:31] Ironholds is taking a sick day at least for this morning and he's on PTO tomorrow
[16:06:15] Can you tell if he has any jobs running now or scheduled for tomorrow?
[16:14:44] he does have a few now, ha, he launched it 2 hours ago
[16:14:59] he's the only one, actually :)
[16:20:04] ottomata, no worries here. I'll be running tests, but they can be killed if you need to.
[16:20:34] ok cool
[17:44:20] hey Nettrom
[17:44:36] Any chance we can grab you for the Research Group meeting?
[17:44:49] halfak: yep
[17:44:57] Cool
[17:44:57] just waiting for the 15 mins of Q3 to be over :)
[17:45:17] Q3?
[17:46:09] Etherpad said something about Q3 stuff, I figured that wasn't particularly for me
[17:48:38] Oh! of course.
[17:48:48] Sorry I should have told you I could pull you in afterward
[17:48:53] Nettrom,
[17:48:55] ^
[17:49:06] Can still do that. This might take a bit.
[17:49:13] We started late :(
[17:49:22] halfak: no worries, I'll pay 25-50% attention
[17:49:50] excellent :)
[17:50:17] Hello everybody, a French-speaking socio-anthropologist here who is discovering Wikimedia research and MetaWiki in general. Do you think this kind of article could interest your team? https://meta.wikimedia.org/wiki/A_Wikimedian_ambassador_in_south_of_India
[18:15:50] ZZZZzzzZZZ ?
[18:15:50] Ironholds, will you join research meeting? (sorry if it was discussed. I just noticed you're not in the meeting)
[18:15:51] hi LionelScheepmans ! I think most of the WMF people here are currently busy in a meeting, sorry :/
[18:15:51] Ok, I will be back later. Thanks Nettrom
[18:43:02] hey halfak
[18:43:03] do you have a sec for a quick update?
[18:43:08] literally 5 mins
[18:43:27] meh, scrap that – I know you’re presenting
[18:43:31] let’s talk after metrics
[18:47:14] kk Dar
[18:47:15] Tar
[19:09:14] halfak: do you have slides that I can tweet via WikiResearch?
[19:21:41] bleeeeh
[19:23:32] clap clap clap, yay halfak!
[19:23:45] Thanks Emufarmers :)
[19:57:51] FYI, I'm going to do a little refactoring of the webrequest refined table. i don't see any queries currently using it, but it will be funky for a few minutes
[20:09:25] wmf.webrequest table should be back and just fine
[20:21:56] ottomata, what sort of refactoring?
[20:28:31] mostly oozie
[20:28:33] names, etc.
[20:28:43] we also made the table into an external one, for safety
[20:28:50] external ones don't delete data on hdfs if you drop them
[20:28:54] that was mainly the change i did
[20:29:00] that made the table blink
[20:29:05] i had to remove it and rename it and recreate it, etc.
[20:29:10] re add partitions
[20:29:21] i'm now testing that the oozie jobs with the new dataset names work
[20:29:25] cool!
[20:32:36] http://guerillero.net/geolocation-fun/ - Guerillero did some awesome work with the geotags dataset. Check it out!
[20:37:07] Ironholds, feeling better?
[20:37:16] yeah, doing some Java :)
[20:37:21] woke up with stomach leurgies
[20:37:25] Speaking of Java
[20:37:30] * halfak goes to make more coffee
[20:37:34] brb
[20:46:02] Isn't this pretty
[20:46:02] http://guerillero.net/content/images/2015/Jan/20Breaks-1.png
[21:01:32] Woops. Got coffee forgot to say "back"
[21:01:47] * halfak googles "leurgies"
[21:30:12] DarTar, JFYI I'm around for the rest of today, including our 1:1
[21:30:32] today's stupid API discovery: you can filter external links to only those involving particular URI schemes. You may select a maximum of one possible URI scheme. Doy.
[21:36:14] Ironholds, should probably just use the DB.
[21:36:33] YuviPanda, did we ever discuss a quarry API?
[21:36:40] nope
[21:36:42] halfak, yeah, it's for WikipediR
[21:36:42] should we?
[21:36:47] Not sure.
[21:37:03] Was just thinking that the stuff that Ironholds is looking at with external links would be better served by a database query
[21:37:09] (like most things related to the API)
[21:37:26] you can just use mysql + ssh and connect to labsdb from anywhere...
[21:38:03] YuviPanda, how about a JS gadget?
[21:38:15] halfak: hmm, how / why? Also would need authentication
[21:38:27] Indeed it would.
[21:38:32] Just lightly thinking about it :)
[21:41:55] no, streaming first, all other things second!
[21:42:13] streaming first, shaving second! Streaming first, your loved ones second! Streaming first, sating your need for sustenance second! ALL HAIL THE STREAMING.
[21:42:36] s'ok. I've got some leftover chinese food within reach :)
[21:42:45] And Jenny works from the same office that I do.
[21:42:51] STREAM ALL THE THINGS
[21:44:28] ugh, now I want chinese food
[21:44:44] halfak, is it from that place near you with the incredible pork-based fried rice and the tremendously large portions?
[21:44:56] It is :)
[21:44:58] brb moving states again
[21:45:11] I heard about the General Tso documentary and had to get some and eat it.
[21:45:23] http://www.thesearchforgeneraltso.com/
[21:45:35] would you describe it as Tsopreme, Tso-Tso or Tso bad it's good?
[21:45:49] Tsopreme
[21:45:53] fo sho
[21:47:46] uggghghghghghgh
[21:47:52] ...whelp, I know what I'm ordering for dinner tonight
[21:49:28] https://c1.staticflickr.com/5/4134/4899429947_b60fb08e94.jpg
[21:53:12] tyo
[21:53:14] *yup
[22:01:55] Hey J-Mo, do we have an IRC channel for IEG or grants?
[22:02:56] #wikimedia-hella-awesome-people?
[22:03:10] halfak: we're on our way
[22:03:17] connecting one bit at a time :)
[22:06:19] takeout ordered
[22:06:24] * Ironholds nods firmly
[22:07:23] halfak: nope. AFAIK not many folks in grantmaking (or on the grant committees) do IRC
[22:07:30] LAME
[22:07:31] k
[22:07:34] thanks
[22:07:57] i know, right? I was a late adopter of the might chan, but I definitely appreciate it now
[22:09:38] I've been on IRC since I was nine
[22:09:38] I win!
[22:10:08] So have I.
[22:10:24] August 2001!
[22:10:52] * Ironholds high-fives
[22:11:27] * YuviPanda hmpfs
[22:11:34] I was writing code since I was 9!
[22:11:38] * YuviPanda wonders if he wins anything
[22:11:56] I created my first web page when I was 5
[22:12:17] o.O
[22:13:06] Ironholds: back
[22:13:26] YuviPanda, wasn't it VisualBasic?
[22:13:31] I think you actually lose points for that.
[22:13:34] * quiddity grumbles about kids on his lawn. Controlling synchronized herds of drones.
[22:13:34] Turbo C
[22:13:41] and GWBASIC
[22:13:46] Logo!
[22:14:20] the OLPC has a nifty version of Logo on it. if I ever had time+energy at once, I'd investigate further.
[22:15:14] Ironholds: so, I could use some time to GTD, but happy to hop on the hangout if you want to chat. Most important thing right now is that you join the call with Michelle
[22:15:36] I will
[22:16:10] danke
[22:23:53] halfak: great presentation!
[22:24:02] \o/ thanks dude :)
[22:24:04] where can I download the models you've got so far?
[22:24:13] I hate our API. I hate our API. I hate our API.
[22:25:00] great indeed! :-)
[22:28:05] ragesoss, no models ready for use yet, but I'd be happy to send you one i've been testing
[22:28:22] It's fitted for revert detection in enwiki.
[22:28:29] Let me know if that sounds interesting.
[22:28:57] FWIW, my next step is to rescale the features and I think that will push accuracy a lot :)
[22:29:13] I'm just curious about what a model looks like. It sounds like something we will want to build into the Dashboard system we're building.
[22:30:02] doing machine-learning-related stuff on student edits is a big part of our roadmap.
[22:30:09] at least, potentially.
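(Editor's note on the 22:28:57 remark about rescaling features: a minimal sketch of what such a step could look like, assuming plain z-score standardization. The log does not say which rescaling method was planned, and the feature names and values below are invented for illustration. Models that are sensitive to feature magnitude, such as linear SVMs, often benefit from this kind of step.)

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 so it maps to 0.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical per-revision features: (bytes changed, editor's prior edits).
revisions = [
    [12.0, 3.0],
    [4500.0, 1.0],
    [80.0, 250.0],
    [300.0, 40.0],
]
scaled = standardize(revisions)
```

(After this step both columns sit on a comparable scale, so a large raw range such as "bytes changed" no longer dominates the other feature.)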
[22:30:52] halfak: have you followed at all the progress of Jake and James Heilman's plagiarism bot project?
[22:31:15] ragesoss, negative. Link?
[22:32:01] halfak: https://en.wikipedia.org/wiki/User:EranBot
[22:32:20] Oh yeah! Saw that come by.
[22:32:30] The basic concept is, systematically send individual revisions through iThenticate (plagiarism checking service)
[22:32:58] ragesoss, check out http://nbviewer.ipython.org/github/halfak/Revision-Scoring/blob/feature_work/ipython/Demo_LinearSVC.ipynb for an example of how scorer training and classification works.
[22:33:36] but it would make more sense in the long run in a similar infrastructure: you want to have a service that anyone can use to get plagiarism check info about an individual revision, and then use that in different ways -- without needing to hit iThenticate for each time anyone wants info about any revision.
[22:34:49] +1 this could fit into the revscoring model.
[22:34:57] The idea is to keep "scoring" as a general concept.
[22:35:13] Meeting. Be back in 30 min.
[22:35:39] bye for now
[22:35:48] o/ good to chat
[23:34:47] and of course last_touched has a different timestamp format to every other mediawiki timestamp
[23:34:51] grr
[23:35:25] Ironholds: if you’re in the hangout with Michelle, we’re getting there: mac mini meltdown
[23:35:32] grand
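(Editor's note on the 23:34:47 complaint about timestamp formats: a minimal sketch of one way client code might cope, by trying a list of known formats in turn. The log does not say which format last_touched actually uses; the two formats below, the 14-digit form and the ISO 8601 form, are assumed here for illustration, and the function name and sample date are hypothetical.)

```python
from datetime import datetime, timezone

# Formats assumed for illustration; the log does not say which one
# last_touched actually uses.
KNOWN_FORMATS = (
    "%Y%m%d%H%M%S",        # 14-digit form, e.g. 20150129233447
    "%Y-%m-%dT%H:%M:%SZ",  # ISO 8601 form, e.g. 2015-01-29T23:34:47Z
)


def parse_timestamp(value):
    """Return a UTC datetime for a timestamp in any of the known formats."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # Not this format; try the next one.
    raise ValueError(f"unrecognized timestamp format: {value!r}")


# Both spellings of the same (made-up) instant parse to equal datetimes.
compact = parse_timestamp("20150129233447")
iso = parse_timestamp("2015-01-29T23:34:47Z")
```

(Normalizing at the edge like this keeps the rest of the code working with one datetime type, whichever format a given field happens to use.)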