[01:13:04] halfak: I wrote a simple gadget that will display current prediction on a page!
[01:13:07] needs visual design though
[01:18:19] halfak: hmm we have no precache for wp10 model
[01:18:21] interesting
[12:50:30] In case anyone is interested: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey#Machine-learning_tool_to_reduce_toxic_talk_page_interactions
[15:50:50] yuvipanda, we should be precaching wp10 model
[15:51:10] It's probably that all the edits you are loading the interface for are older than our last model release.
[15:52:47] Fun times. I mentored the guy who wrote the first study (that I know of) about toxicity in League of Legends.
[15:53:00] Helder, ^
[15:53:15] Guy works for Riot now. I've been meaning to ask him how it's going.
[15:57:33] http://kraut.hciresearch.org/sites/kraut.hciresearch.org/files/open/shores14-Deviance%26ImpactOnRetentionInOnlineGame.pdf
[15:58:17] I'm a shadow author on that one. I helped a lot with the data analysis and paper direction. :)
[16:20:08] cool! :)
[16:21:36] We used that same dataset in this paper: http://arxiv.org/abs/1411.2878
[16:21:37] :D
[18:52:08] halfak: I filed a bug for ORES in phab, about getting the quality of a page given only its page name and wiki rather than revid
[18:52:12] (should get latest)
[18:53:43] yuvipanda, yeah, I've been thinking about that a lot. It seems like we should have a nice way to do that, but I'm not sure if it should be part of ORES.
[18:53:52] or if it is part of ORES, how we implement it.
[18:54:07] hmm
[18:54:13] E.g. we could have rev_id depend on page_id and have page_id depend on page_name.
[18:54:34] So if a page_id or page_name is provided, the dependency injection system will know how to get a rev_id.
[18:54:38] so I definitely think it should be part of ORES but I don't know enough about its dependency thingies to figure out how to do it
[18:54:49] right. that would be nice
[18:54:55] yuvipanda, also, how do we do the call pattern?
[18:55:03] The URL structure is kinda weird for that.
[18:55:14] After all, ORES is the Objective Revision Evaluation Service.
[18:55:16] we're already doing &revids= no
[18:55:36] Yes.
[18:55:42] And each rev_id gets a score.
[18:55:52] so one thing that some services do
[18:56:00] is to say page_name=something&rev_id=latest
[18:56:10] so you are explicitly scoring the latest rev_id of a page
[18:56:33] and the 'latest' is just a shortcut, and is also uncached
[18:56:51] another option is to have a different service of sorts that just does a redirect
[18:56:53] Well, presumably, we would cache the underlying rev_id
[18:56:55] so you'll get
[18:57:02] oh yeah, but the URL won't be cached
[18:57:07] Right
[18:57:22] (doesn't matter for ORES, since we have no frontend cache)
[18:57:36] so the different service of sorts would just be a redirector
[18:57:52] you ask it for whatever, and it'll find the revid and redirect you
[18:58:09] yuvipanda, what's a situation where you want to be able to provide ORES with a page_name/page_id and it is inconvenient to do so?
[18:58:37] halfak: wp10 mostly. if I'm writing a gadget and want to get wp10 scores for all links on a page
[18:58:45] Ahh... All links.
[18:58:48] That's a good one.
[18:58:51] How do you get the links?
[18:58:57] api call?
[18:59:07] gadgets, so I can just do $("#mw-content-area a")
[18:59:12] and then I've all links
[18:59:20] and I can use the title attribute and heuristics to get the page titles
[18:59:24] Ohh... So that'll get external links too.
[18:59:24] offtopic: talking about latest revs, I've generated tables of top scores for some other wikis:
[18:59:24] https://meta.wikimedia.org/wiki/Talk:Objective_Revision_Evaluation_Service#More_Artificial_Intelligence_for_your_quality_control.2Fcuration_work.
[18:59:39] Helder: right, so that's where the 'title attribute and heuristics' come in :)
[18:59:41] err
[18:59:43] halfak:
[18:59:56] halfak: lots of gadgets / code does this (Extension:Popups comes to mind)
[19:00:04] and it's fairly accurate
[19:00:08] so now I've a list of page titels
[19:00:11] *titles
[19:00:22] (The Android App also does this because extra network requests suck)
[19:00:29] yuvipanda, gotcha. So you want to skip the API call to get the most recent revision for those titles and have ORES make the API call for you.
[19:00:36] basically, yeah
[19:00:47] this is a lot easier since you're dealing with one round of paginating async calls
[19:00:54] instead of *two* that have to feed into each other
[19:01:05] I don't see what you mean
[19:01:19] so I'll need to first get the 50 page titles
[19:01:22] hit mediawiki API
[19:01:23] I see the same set of calls regardless.
[19:01:24] wait for response
[19:01:28] Gotta do that in ORES too
[19:01:31] and then call the ORES response
[19:01:45] I suppose ORES is *closer* to the API
[19:01:55] and then paginate in both ways since the MW API and ORES will return at different times
[19:02:01] "The ORES Response" sounds like a Ludlum novel.
[19:02:09] halfak: yeah but then we do it in ORES once and everyone who wants to do this doesn't have to deal with this issue
[19:03:36] do you know about https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids&titles=Example&generator=links ?
[19:04:15] that single API call gets the revids of all the links at https://en.wikipedia.org/wiki/Example
[19:04:18] (I think)
[19:04:45] Helder: right but that's n number of network requests
[19:05:00] where n = total number of links / pagination limit
[19:05:09] and that pagination limit can be different from ORES's pagination limit
[19:05:19] yuvipanda, I don't think we can reduce the number of network requests.
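The generator=links query linked at [19:03:36] could be consumed roughly like this. This is only a sketch: it assumes the API's default JSON layout (`query.pages` keyed by page id, each page carrying a `revisions` list), the helper name `revids_by_title` is made up for illustration, and continuation across result pages is left out.

```python
def revids_by_title(response):
    """Map page title -> latest rev id for one page of API results."""
    pages = response.get("query", {}).get("pages", {})
    mapping = {}
    for page in pages.values():
        revisions = page.get("revisions")
        if not revisions:
            # Missing pages (red links) come back without revisions; skip them.
            continue
        mapping[page["title"]] = revisions[0]["revid"]
    return mapping


# Hand-written sample in the assumed response shape (not real data).
sample = {
    "query": {
        "pages": {
            "-1": {"title": "No such page", "missing": ""},
            "101": {
                "title": "Example A",
                "revisions": [{"revid": 5001, "parentid": 4999}],
            },
        }
    }
}

print(revids_by_title(sample))
```

The resulting rev ids are what would then be handed to ORES in a second round of requests, which is the two-stage pagination being discussed here.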
[19:05:22] right
[19:05:29] It's just where the requests are coming from
[19:05:46] yes, and that's the important bit - a network request from my laptop to the API is a lot slower than from labs to the API
[19:05:52] Yeah.
[19:05:56] also it's far simpler for someone to program.
[19:06:07] But much more complex for us to maintain.
[19:06:09] I think a part of this is also my time in mobile where extra network requests kill everything
[19:06:13] And it pushes the scope of ORES.
[19:06:26] well, I don't think rev_id=latest pushes the scope of ORES...
[19:06:37] but you're the maintainer, etc :)
[19:06:43] yuvipanda, that honestly doesn't make any sense to me.
[19:06:49] which bit?
[19:06:49] "rev_id=latest"
[19:06:53] rev_id=latest?
[19:06:59] let me find the restbase example, moment
[19:07:02] I know what you want it to do, but the system would not know what that means.
[19:07:12] yuvipanda, yes. Restbase is designed to access pages.
[19:07:16] I know what it means.
[19:07:43] right, so the system would need something additional, I guess? so it'll be page_title=something&rev_id=latest
[19:07:52] but I guess at that point rev_id=latest is redundant
[19:08:04] Yes
[19:08:30] U aksi don't knfd
[19:08:32] err
[19:08:37] I also don't know how much complexity this'll add
[19:08:40] to the system in ORES
[19:08:47] Don't get me wrong. I'm with you on how this *could* work. I'm just not sure how it *should* work.
[19:09:02] I don't want to just serve page_name, but page_id too.
[19:09:21] so stepping back, are you with me on how this is complicated when doing mass requests from the client side?
[19:09:21] Should we also support user_name and user_id?
[19:09:38] yuvipanda, sure. I think it's complicated wherever it happens though.
[19:09:44] sure, I agree too
[19:09:54] I think our primary concern should be performance.
[19:09:57] it's totally possible that we can fix this in a lot of other ways
[19:10:03] putting it into ORES is one
[19:10:08] having another service is one
[19:10:12] hell, having a JS library is one
[19:10:16] Yeah +1
[19:10:35] so
[19:10:35] So, should we have another service? I'm not sure, but that helps me keep the scope of ORES small.
[19:10:40] right
[19:10:42] So I see these two options.
[19:11:09] 1. Implement page_id and page_name in the dependency injection framework of ORES and provide alternative routes to access those models.
[19:11:32] 2. Stand up a secondary service that uses ORES (OPES?)
[19:12:09] For 1, I don't see a clear way to modify the routes that makes sense long term. I'd want to section it off from our URL structure as much as possible.
[19:12:18] do (1) has the problems of: scope creep, additional complexity
[19:12:20] *so
[19:12:21] For 2, I get tired thinking of maintaining another service.
[19:12:36] yeah, I don't think it should be on you
[19:12:48] But I have other reasons to want these secondary services.
[19:12:55] go on!
[19:13:13] I want to stand one up for newcomers -- one that you give a user_id and it makes a prediction whether that editor is working in good-faith or not.
[19:13:20] Basically the model that Snuggle uses internally.
[19:13:24] That should not be in ORES.
[19:13:26] But it would use ORES.
[19:13:34] right
[19:13:43] +1
[19:13:50] So maybe we have a little farm of these secondary services.
[19:13:54] so we need a way to stand up simple 'remixers' of sorts
[19:14:09] that consume ORES and maybe another API (the MW API?) and serve that out
[19:14:17] Yeah.
[19:14:37] and it should be set up in a way
[19:14:40] that's super low cost
[19:14:45] to maintain and upgrade and what not
[19:15:29] and ORES remains simple and well scoped (we should write down its scope, btw)
[19:15:42] and people who want additional things can get additional things in this 'framework' of sorts
[19:16:02] how does that sound, halfak
[19:16:24] Sounds like the right direction to me.
[19:16:28] we can call it ores-contrib? or some other bikeshedded name
[19:16:34] yeah.
[19:16:49] which makes it clear it's not for arbitrary services though - it's gotta interact with ores in some form
[19:17:07] * halfak thinks
[19:17:19] Good for scope of what we're talking about, but a weird requirement otherwise.
[19:17:26] why so
[19:18:01] I just want to think about this work as part of a farm of remixes. E.g. we might have another service that mixes article recommendations, semantic relatedness (from WikiBrain) and the MediaWiki API.
[19:18:09] I think something being in ores-contrib will have some vague form of support (in form of CR? or ops? or whatever) and so would need limits.
[19:18:12] I see
[19:18:34] Are you imagining that we'd have one root endpoint for ores-contrib?
[19:18:45] exactly
[19:18:53] so things would register for /
[19:18:54] E.g. ores-contrib.wmflabs.org/pages/enwiki/wp10/...
[19:18:58] yup
[19:19:05] E.g. ores-contrib.wmflabs.org/users/enwiki/goodfaith/...
[19:19:32] and they get access to some things they don't have to worry about - maybe redis, maybe a live feed of RC, caching mechanisms, etc
[19:19:47] halfak: essentially, they'll implement a class and then there'll be a base framework that figures it out (tm)
[19:19:58] and this base framework will constrain them
[19:19:59] Gotcha. That'd be nice.
[19:20:21] I can probably take a shot at it
[19:20:42] since I've been wanting to write some real code :)
[19:20:48] Essentially, you'd need to turn a request into a set of rev_ids/models.
ORES would go make appropriate scores and give them back. Then you do something with those scores to generate a new score.
[19:20:56] \o/
[19:21:01] yesh
[19:21:04] YuviProgrammer
[19:21:11] I'm terrible!
[19:21:16] so out of touch >_>
[19:21:36] anyway, essentially you'd register a name and some routes
[19:22:31] so you can be 'pages/wp10' and route can be '/'
[19:22:37] hmm
[19:22:43] maybe
[19:23:01] page/wp10//title/
[19:23:05] or
[19:23:18] Yeah
[19:23:22] So that we can do page_ids too
[19:23:23] page/wp10//id//
[19:23:28] Noooo
[19:23:31] Why namespace?
[19:23:32] oh
[19:23:34] right
[19:23:36] page_id doesn't require namespace
[19:23:36] :P
[19:23:40] <- rusty
[19:23:56] what would a user request look like?
[19:23:57] * halfak provides polishing compound
[19:24:20] user/goodfaith//username/
[19:24:31] also where to put wiki
[19:24:57] user//goodfaith/
[19:25:31] I dunno if we should be in the business of looking up usernames.
[19:25:39] Then again, that's sort of what got us started.
[19:26:05] We'd always want to store the score under the ID.
[19:26:15] Since usernames and page_names change.
[19:26:51] We could have the plain path assume IDs and let a query string work for names.
[19:26:57] right so I think ORES shouldn't be doing these but I think ores-contrib should
[19:27:02] E.g. user//goodfaith/ assumes ID
[19:27:14] but user//goodfaith/?usernames=...|...|...|etc
[19:27:23] yeah
[19:27:28] I'm wondering if that's just the more flexible way
[19:27:42] So we always clearly support IDs, but we'll provide support for names.
[19:28:13] so everything is
[19:28:22] * halfak just finished https://meta.wikimedia.org/wiki/ORES/What
[19:28:26] Well... not finished
[19:28:35] But I think we need it and that is a start.
[19:28:40] ///?just-a-bunch-of-parameters
[19:29:12] problem with that will be caching
[19:29:23] if we want to do that
[19:29:25] err
[19:29:27] frontend caching
[19:29:40] halfak: nice!
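The "implement a class, base framework figures it out" idea from [19:19:47] and the three steps described at [19:20:48] might look something like the sketch below. Everything here is hypothetical: the class names, the method names and the `fetch_scores` callable are invented for illustration and are not taken from ORES or from any ores-contrib code.

```python
class Remixer:
    """Hypothetical base class: a subclass says what to score and how to combine."""

    name = None  # e.g. 'pages/wp10'

    def rev_ids_and_models(self, request):
        """Step 1: turn a request into (rev_ids, models)."""
        raise NotImplementedError

    def combine(self, request, scores):
        """Step 3: turn the returned scores into the new output."""
        raise NotImplementedError

    def handle(self, request, fetch_scores):
        rev_ids, models = self.rev_ids_and_models(request)
        # Step 2: fetch_scores stands in for the call out to ORES.
        scores = fetch_scores(rev_ids, models)
        return self.combine(request, scores)


class PageWp10(Remixer):
    name = "pages/wp10"

    def rev_ids_and_models(self, request):
        # Assumes titles were already resolved to rev ids upstream.
        return list(request["title_to_rev_id"].values()), ["wp10"]

    def combine(self, request, scores):
        return {
            title: scores[rev_id]["wp10"]
            for title, rev_id in request["title_to_rev_id"].items()
        }


def fake_fetch_scores(rev_ids, models):
    """Stub used only so the sketch runs without a network call."""
    return {rev_id: {model: "stub" for model in models} for rev_id in rev_ids}


result = PageWp10().handle({"title_to_rev_id": {"Example": 1}}, fake_fetch_scores)
print(result)
```

Passing the score fetcher in as an argument keeps the subclass free of any HTTP or caching concerns, which is the kind of constraint the base framework is meant to impose.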
[19:32:06] halfak: ok how about I write up a page model for enwp and put it up and we can iterate on it if you've the time and I'll set up ores-contrib.wmflabs.org
[19:32:18] +1
[19:32:18] err
[19:32:20] wp10
[19:32:22] not enwp
[19:32:39] dammit I wanted to write some new thing in go but oh well
[19:32:41] * YuviPanda writes this in py3
[19:33:08] You *could* write it in go. But I might find it difficult to contribute.
[19:33:19] yeah I think that's a bad idea
[19:33:27] I wish I could just download the best practices and code style of a new language into my brain.
[19:33:35] Learning the language takes almost no time in comparison.
[19:33:35] halfak: yeah even I'm struggling with go atm
[19:33:46] but it's a lot better now once I've started thinking of it as 'high level C'
[19:33:52] instead of 'concurrent and parallel python'
[19:34:17] and once my C idioms kicked in a bit (i wrote fairly high-levelish C (with glibc and GNOME) for a couple years)
[19:34:49] halfak: cheap and easy concurrency *and* parallelism makes code fairly nice and different and enables things that I wouldn't have thought of before
[19:34:56] it's a nice language to learn at least for the CSP aspect of it
[19:35:40] CSP?
[19:35:56] https://en.wikipedia.org/wiki/Communicating_sequential_processes
[19:36:05] you can call it an alternative to threading and multiprocessing
[19:36:28] Gotcha.
[19:36:36] Pools and queues?
[19:36:49] channels mostly
[19:37:02] which are kind of like queues but by default can contain only one active item
[19:39:14] * YuviPanda misses actually writing code
[19:40:48] halfak: did you end up adding continuation support for mwapi?
[19:41:15] YuviPanda, yup
[19:41:20] It doesn't do anything fancy.
[19:41:53] See https://github.com/mediawiki-utilities/python-mwapi/blob/master/mwapi/session.py#L154
[19:42:18] Also, if you want, get() and post() have continuation arguments. E.g.
https://github.com/mediawiki-utilities/python-mwapi/blob/master/mwapi/session.py#L232
[19:42:33] halfak: nice! so I can just use it as a generator and it'll DTRT
[19:42:56] See my use of the generator here: https://github.com/mediawiki-utilities/python-mwapi/blob/master/demo_queries.py#L53
[19:43:09] Still messy because of the response structure, but not awful for sure.
[19:43:23] The hard part was dealing with a custom "limit"
[20:06:50] OK. I'm off for a bit. Have a good one, folks.
[20:06:53] o/
[21:46:52] halfak: I've run into a conceptual problem with ores-contrib
[21:46:56] halfak: so now I have code that lets you do
[21:47:10] /page/wp10/?page_titles=whatever|whatever
[21:47:17] and it redirects you to the appropriate ORES page
[21:47:27] like https://ores.wmflabs.org/scores/enwiki/?models=wp10&revids=639573859|683548387
[21:47:34] except that response only contains revids!
[21:47:36] and not the pagename
[21:47:48] so as a requestor I have no way of finding out which page has which score
[21:48:02] so this means I can't just redirect but I'll have to make the request myself and parse and augment it
[21:48:24] oh well
[21:48:34] I suspect that'll be true for most things
[21:49:08] I think I'll just add an extra key there with 'mapping information'
[21:49:19] and leave the original ORES response as untouched as possible
[22:39:22] halfak: ok I think I've a nice and restricted API with minimal boilerplate producing consistent outputs
[22:39:27] * YuviPanda cleans it up a bit before pushing
[23:30:53] hmm, so things I can think of a mapper interface as mapping...
[23:31:02] holla
[23:31:14] would be: page titles, page ids, users, maybe more? (categories, etc)
[23:31:28] hello aetilley
[23:32:08] (I think halfak might be out for the day)
[23:32:47] no problem
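The "extra key with mapping information" idea from [21:49:08] could be sketched as follows. The assumptions are loud ones: the ORES response is taken to be a JSON object keyed by rev id, the key name `mapping` and the function name `augment_with_mapping` are invented here, and the titles and score values in the sample are placeholders rather than real output. Only the two rev ids come from the URL quoted above.

```python
def augment_with_mapping(ores_scores, title_to_rev_id):
    """Return a copy of the ORES response plus a title -> rev id 'mapping' key."""
    augmented = dict(ores_scores)  # shallow copy; the original dict is not modified
    augmented["mapping"] = {
        title: str(rev_id) for title, rev_id in title_to_rev_id.items()
    }
    return augmented


# Placeholder response in the assumed shape (score values are not real).
ores_scores = {
    "639573859": {"wp10": {"prediction": "placeholder"}},
    "683548387": {"wp10": {"prediction": "placeholder"}},
}
title_to_rev_id = {"Page A": 639573859, "Page B": 683548387}

augmented = augment_with_mapping(ores_scores, title_to_rev_id)

# A requestor can now look up a page's score by title.
rev_id = augmented["mapping"]["Page A"]
print(rev_id, augmented[rev_id])
```

Keeping the per-revision entries exactly as ORES returned them means a client that already parses ORES output can ignore the extra key.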