[00:04:39] Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-editquality, Spike: Can we switch from rf model to gb to save memory? - https://phabricator.wikimedia.org/T139963#2467592 (Halfak) a:Halfak>Ladsgroup
[00:04:57] Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-editquality, Spike: Can we switch from rf model to gb to save memory? - https://phabricator.wikimedia.org/T139963#2448200 (Halfak) https://github.com/wiki-ai/editquality/pull/40
[16:10:17] o/ Amir1
[16:10:23] Turns out I had some volunteer hours today
[16:10:38] So I'm around for a bit to introduce a new volunteer :)
[16:10:56] o/ rjustin
[16:10:58] Welcome!
[16:11:16] This is our main communication space for AI projects in Wikimedia.
[16:11:26] Hello :)
[16:11:36] Mostly this is where my team -- Revision Scoring as a Service -- lives. But we have a lot of people working on other projects here too.
[16:12:05] We use Phabricator for asynchronous task tracking. Check out https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service/
[16:12:26] That board contains everything we hope to get done in the timescale of ~1 week.
[16:12:34] (with some notable exceptions -- heh)
[16:12:49] This board has all of our outstanding tasks: https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service-backlog/
[16:13:27] *viewing*
[16:13:53] Amir1 is the other major mover in the AI projects. He's in Iran, so his timezone doesn't match with ours all the time.
[16:14:27] Hi Amir!
[16:14:39] schana is a software engineer at the WMF. He's just getting onboarded to our team.
[16:15:04] ToAruShiroiNeko used to work with us based on some grant money, but that ran out, so now he contributes when he can.
[16:15:50] SoniWP and Sabya have been working on some projects in a volunteer capacity. SoniWP is doing some work on spammy new page creations and schana is working on some text processing signal extraction stuff.
[16:16:27] hey, what exactly does this group do?
[16:16:33] Many of the other people in this channel are expert-users of the systems we develop or engineers/analysts who work with us more distantly.
[16:16:48] o/ rg_
[16:16:54] Our biggest platform right now is https://ores.wikimedia.org/
[16:17:00] It's machine learning as a service.
[16:17:19] hmmm... interesting
[16:17:21] Right now, we have high-fitness prediction models for edit quality (is this vandalism?) and article quality.
[16:17:35] and it tells users where edits are required?
[16:17:53] rg_, much more for reviewing Special:RecentChanges
[16:18:10] got it
[16:18:29] :)
[16:18:44] i'm a machine learning researcher, so very interested in learning more about what you're working on
[16:18:48] so any user can use it to review recent changes to any article?
[16:19:01] hi Ross :)
[16:19:06] new here^
[16:19:32] yeah, i've stopped by a few times, but this is the first time others have been around
[16:19:32] rg_, cool! I'm Aaron Halfaker -- a principal scientist at the Wikimedia Foundation. We should catch up.
[16:19:44] nice to meet you Aaron
[16:20:01] i'm sort of a machine learning artist
[16:20:09] rg_, must not be in North America. We're usually active in here during the daytime UTC-5 -- UTC-9
[16:20:21] i'm in NYC
[16:20:34] so i must've just dropped by during strange hours
[16:20:39] Hmm. Weird. Maybe it's because most of our activities are on the weekdays.
[16:20:44] Some days are quiet too :)
[16:20:53] yeah, no worries!
[16:20:57] so, rjustin, I understand that you are interested in design work -- at least to start.
[16:21:10] And when I say "design" I mean, "front end everything"
[16:21:43] interface development, yeah and ux in general kind of comes along with ui dev
[16:21:56] front-end is my interest
[16:22:06] My goal today is to (1) give you a quick map of what's going on, who people are, and how we coordinate and (2) get you some introductory tasks so that you can start playing around.
[16:22:36] perfect
[16:22:38] (1) is mostly done for now, so I'd like to start on (2) :)
[16:22:53] ^-^
[16:23:22] I've been thinking about this in prep for your arrival. :D There's an outstanding task that I think is important and would give you some freedom to explore.
[16:23:49] https://phabricator.wikimedia.org/T102335
[16:24:21] So, quick intro to Wikilabels. https://meta.wikimedia.org/wiki/Wiki_labels
[16:24:36] Wiki labels is a system for allowing Wikipedia editors to help us train our models.
[16:25:10] Essentially, it shows random samples of stuff to people and lets them label it.
[16:25:22] E.g. edits to articles as "vandalism"
[16:25:35] snippets or full edit?
[16:25:43] full edit.
[16:25:45] k
[16:25:58] Could do snippets, but we'd have to do a bit of engineering work.
[16:26:25] We have labelling "campaigns" active right now for labeling the "type of edit" and whether or not the topic of articles is "academic" or "pop culture"
[16:26:45] and campaign is?
[16:26:48] We'll use the data from these campaigns to train new prediction models.
[16:27:15] A campaign is a specific random sample of stuff (edits, pages, users, whatever) and a form to complete when labeling.
[16:27:21] figured
[16:27:21] kk
[16:27:29] Users request "worksets" from the campaign and label the items in them.
[16:28:19] So, right now, when we want to set up a new campaign, I have to run some command-line utilities that we've built directly against the database.
[16:28:27] Or sometimes I even need to run SQL directly!
[16:28:41] and your campaigns expire and then you introduce a new one with new variables with different random sample sets? or is each campaign delineated based on duration alone?
[16:29:14] ok, so setting up campaign could use ui
[16:29:23] Not sure I understand the question. But it may help to say that sampling methods can differ arbitrarily between the campaigns.
[16:29:31] Yeah exactly.
[16:29:37] I think this UI will be mostly CRUDy
[16:30:02] But there are some considerations we'll need to work through.
[16:30:11] I guess what I'm asking is: is each campaign a completely randomized data set, or is it somewhat contrived at all by the campaign creator?
[16:30:27] It's all on the campaign creator to work that out
[16:30:38] that's what I was wondering. ok, cool
[16:30:38] The Wiki labels system has no facilities for actually generating samples.
[16:31:38] We have some other systems that would be effective in generating samples. E.g. quarry.wmflabs.org.
[16:32:04] That system lets you query replicas of the database behind Wikipedia/Commons/Wikidata/etc. from a web UI
[16:32:24] All you need is a Wiki(m|p)edia account
[16:32:49] Eventually, I'd like the CRUDy interface for Wiki labels campaigns to be as open as Quarry.
[16:33:00] I think we have a long way to go before that will make sense.
[16:33:20] E.g. our campaign browser can hardly support N campaigns where N is a large number.
[16:33:38] due to?
[16:33:46] Bad design. :)
[16:33:50] front-end?
[16:33:55] Anyway, I thought it might be nice to take a walk through the Wiki labels interface and you can see for yourself.
[16:34:05] Yeah just front-end. The DB and all that backend should be able to handle it.
[16:34:07] yes :) that sounds great
[16:34:14] roger
[16:34:25] Can we do a quick call over hangouts?
[16:34:31] yup!
[16:34:45] * halfak calls
[16:36:09] Log into Wikipedia and go to https://en.wikipedia.org/wiki/Wikipedia:Labels
[16:53:32] labels.wmflabs.org
[16:57:09] rjustin: hey,
[16:57:17] halfak: hey there, I was afk for dinner
[16:57:46] I hope I'm not too late :D
[17:05:02] o/ Amir1
[17:05:14] Giving rjustin a demo of Wiki labels and the backend system right now :)
[17:15:37] nice, tell me if I can help with anything
[17:15:43] nice to meet you rjustin :)
[17:18:07] rjustin, https://github.com/wiki-ai/wikilabels
[17:19:04] nice to meet you as well, Amir!
[17:24:42] thanks
[19:55:59] ello
[20:54:29] halfak so my country had its shooting too
[20:54:42] I think someone didn't get our memo of less guns :/
[20:58:03] ToAruShiroiNeko you and white_cate same person?
[20:58:13] indeed
[20:58:19] the more you know (:
[20:58:29] ToAruShiroiNeko = a certaub white cat
[20:58:37] *ToAruShiroiNeko = a certain white cat
[21:00:43] oh I got it ... とある白い猫 means white cat
[21:01:21] a certain white cat :3
[21:02:42] certain about what
[21:03:05] I am not any white cat, I am that certain specific white cat :3
[21:05:58] I am trying to make sense :P
[21:06:32] a white cat which is certain
[21:06:44] or certainly a white cat
[21:06:54] it's a play on words
[21:06:57] it can be both
[21:07:54] ahh paradox
[21:10:01] where do you live by the way