[14:45:15] o/
[17:57:00] * halfak labels 100 edits
[18:01:41] It takes me much longer to go through the filtered edits since each one was reverted for an interesting reason.
[18:01:51] Very often, it's the original author who reverts.
[18:04:04] o/ madhuvishy
[18:04:21] When you get back, drop me a ping. I'd like to get wikilabels deployment worked out today.
[18:04:37] (of course I can wait if you don't have time) :)
[18:05:00] * halfak works on filtering scripts in the meantime.
[18:05:53] halfak: btw, did the requests thing get fixed?
[18:05:56] requests version thing
[18:06:16] Yeah. I've been pinging you on the pull request.
[18:06:17] :P
[18:06:31] oh
[18:06:32] link?
[18:06:34] Oh.. I guess on the phab card
[18:06:35] sorry been on a train :D
[18:06:35] https://github.com/wiki-ai/revscoring/pull/150
[18:06:53] halfak: is that format supported? if so I'll just merge
[18:06:59] Yes
[18:07:05] Took me a long time to find it.
[18:07:12] https://phabricator.wikimedia.org/T106638
[18:07:13] There
[18:07:15] 's my ping
[18:08:17] merged
[18:10:04] YuviPanda, sweet. So with that, you should be able to continue work on precached stuff.
[18:20:16] * awight sniffs around for anyone who might want to merge my Travis stuff
[18:20:39] https://github.com/wiki-ai/revscoring/pull/146
[18:20:40] * halfak looks at YuviPanda
[18:20:42] d'argh, conflicts
[18:22:01] resolved.
[18:28:16] * halfak uses ORES in a related project :)
[18:29:55] halfak: btw, checkout tools.wmflabs.org/crosswatch
[18:29:58] has ORES integration now
[18:30:46] YuviPanda, yeah. I saw that. Woot!
[18:31:21] awesome tool!
[18:31:27] pretty slick too
[18:31:42] it's a gsoc project, it's me and legoktm who are the maintainers
[18:32:05] * halfak is kinda frustrated with his GSOC experience
[18:32:16] moar!
[18:32:39] Working with a volunteer in an IRC channel >>>>> Massive bureaucratic process of GSOC
[18:32:49] oh hello
[18:32:57] halfak: really? mine has been pretty smoot
[18:32:58] h
[18:32:59] hi legoktm
[18:33:06] we were just talking about crosswatch
[18:33:08] is this the new -research?
[18:33:10] :o
[18:33:10] halfak: hi, i'm a CDC contractor. You don't know *shit* about bureaucracy.
[18:33:36] legoktm: haha, not quite yet
[18:33:56] halfak: I had to leave work early today because I'm not allowed to use any of the computers because **the operations director did not know I was being hired**
[18:34:14] halfak: I was talking with legoktm about integrating ORES with Special:NewPagePatrol, he thinks we should use Special:RecentChanges to begin with.
[18:34:16] should do that
[18:34:32] awight: i see it say travis tests in progress
[18:34:41] yeah!
[18:34:43] https://travis-ci.org/wiki-ai/revscoring/builds/73937689
[18:34:46] nice
[18:34:52] We are running on Special:RecentChanges
[18:35:08] But I'm not sure what we'd want to do with NewPagePatrol with the models we have now.
[18:35:36] Though, I'd love to build a spam detector and slow down the review of non-obvious spam.
[18:36:35] hmm
[18:36:42] I was thinking of running article quality predictor on them
[18:36:46] halfak: as a beta feature
[18:37:27] It seems like that could contribute a bit, but I don't think it would do much better than a simple page length measurement.
[18:37:42] Right now, NewPagesFeed is making it hard to be a new page creator because it helps people flag pages with CSDs really fast
[18:37:42] hmm
[18:37:42] ok
[18:37:44] Too fast
[18:38:06] Faster than the newcomer can make a second edit to the draft they just created.
[18:38:06] halfak: how to get ORES on special:recentchanges now? is it a gadget or a userscript?
[18:38:16] is there docs somewhere?
[18:38:22] Userscript
[18:38:31] https://github.com/he7d3r/mw-gadget-ScoredRevisions
[18:38:45] legoktm: ^
[18:38:52] I think step1 would be to propose that as a gadget
[18:43:51] ok
[18:46:09] neat
[18:46:38] and this info can be fetched in bulk? what's the limit?
[18:46:43] 50 revs at a time
[18:46:44] so far
[18:46:50] per request, that is
[18:47:17] can it be computed ahead of time?
[18:47:25] yes!
[18:47:29] there's a precaching daemon
[18:47:31] will a revision's score change?
[18:47:43] not for the same model
[18:47:49] and the cache key has the model version in it
[18:47:52] what is a model?
[18:47:59] and that doesn't happen that often
[18:47:59] some AI thingy
[18:48:06] :P
[18:48:10] lol
[18:48:25] so is the precaching daemon thing turned on?
[18:48:34] halfak: hey, so I'm wondering about something about ORES. My understanding is that the models can change over time, and that this therefore produces different scores for the same revision over time. Is that correct? If it is, are the models versioned in such a way that we can attach the version as metadata to the scores (or through an API request)?
[18:48:48] Ideally we'd be able to do a JOIN on rc_id => score
[18:48:55] legoktm: hmm this is a HTTP API
[18:49:09] guillom, models are versioned
[18:49:14] legoktm: the pre-caching daemon I'll hopefully turn on later today
[18:49:37] Currently, we don't surface this. It's used to manage caching -- so when we deploy a new model, the old cache becomes invalid and scores will need to be regenerated.
[18:49:49] I see
[18:50:02] It's not a big problem at the moment, but I was wondering :)
[18:50:05] YuviPanda: right, I was thinking that post-save MW hits the API, stores it in a database table for quick joins.
[18:50:14] legoktm: the pre-caching daemon listens to RCStream now
[18:50:25] legoktm: but yeah, that's logical next step for MW itself
[18:50:43] how easy is it to set this up locally?
[18:50:51] legoktm: there's a vagrant setup that works :D
[18:50:56] it's not part of MWV tho
[18:51:00] ah, fantastic
[18:52:37] legoktm: so I was thinking step 1 is to make this available as a gadget and get feedback, then move on to making this work as a betafeature
[18:52:58] when do we expect ores to be in production?
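
The two mechanics mentioned in the exchange above (at most 50 revisions per request, and a cache key that carries the model version so a new deployment invalidates old scores) might be sketched roughly as follows. This is only an illustration under assumptions, not ORES code: the function names `batches` and `cache_key`, the key layout, and the example wiki/model/version strings are all hypothetical.

```python
from typing import Iterator, List, Sequence

# "50 revs at a time ... per request", as stated in the discussion.
BATCH_LIMIT = 50


def batches(rev_ids: Sequence[int], size: int = BATCH_LIMIT) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` revision IDs."""
    for start in range(0, len(rev_ids), size):
        yield list(rev_ids[start:start + size])


def cache_key(wiki: str, model: str, model_version: str, rev_id: int) -> str:
    """Build a cache key that includes the model version.

    The layout is an assumption made for this sketch. The point is only
    that a score cached under one model version is never returned for
    another, so deploying a new model effectively invalidates the old cache.
    """
    return f"{wiki}:{model}:{model_version}:{rev_id}"


if __name__ == "__main__":
    revs = list(range(1000, 1120))  # 120 made-up revision IDs
    for chunk in batches(revs):
        # A real client would send one API request per chunk here.
        print(len(chunk), cache_key("enwiki", "reverted", "0.1", chunk[0]))
```

With a key like this, the score for a given revision stays stable for as long as the model version is unchanged, which matches the answer given to "will a revision's score change?".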
[18:53:14] I don't think the gadget is very useful.
[18:53:31] I just want a feed of bad edits
[18:53:42] (it's a good POC though)
[18:53:53] legoktm: I see.
[18:54:10] halfak: hello
[18:54:10] I have an implementation in my head :p
[18:54:14] haha :D
[18:54:15] do it!
[18:54:19] o/ madhuvishy
[18:54:32] legoktm: basically, step 1 of the deployment is to do the integration, and then we get it in prod
[18:54:48] wikilabels deployment - YuviPanda and I talked about getting it done on Wednesday
[18:54:54] when he's back here
[18:54:58] OK. That works for me :)
[18:55:00] YuviPanda: er, which integration?
[18:55:02] yes, do you want to setup a calendar event?
[18:55:15] YuviPanda, me or madhuvishy ?
[18:55:16] legoktm: with MW
[18:55:19] okay
[18:55:21] YuviPanda: okay i'll do that. would 2pm SF time work?
[18:55:23] halfak: madhuvishy, but I think you should be there too
[18:55:25] when do you want this by?
[18:55:37] OK. 2PM PDT works for me
[18:55:40] legoktm: I think halfak can answer the 'when do you want this by' question :D
[18:55:45] halfak: cool, I'll set it up
[18:55:49] halfak: looks like we have converted legoktm :)
[18:55:54] WOoT!
[18:56:06] legoktm, what's the "this"?
[18:56:36] filtering Watchlist/RC based on rev score things
[18:56:58] I'm not sure what to call it. ORES? "revscoring"?
[18:57:39] heh
[18:57:41] bikeshed bikeshed
[18:58:22] ORE-colored-bikeshed
[18:59:08] We'd want to have something we could have as a beta feature within the next 6 months. 3 months would help us keep some momentum
[18:59:28] Right now, Lila et al. seem to grok what we're doing.
[18:59:48] when do you expect to have ORES in production?
[19:00:19] legoktm: next quarter, but I think we need to have the integration in place by then and then schedule / budget for hardware
[19:00:35] * legoktm flips a coin
[19:00:52] I forget when the next quarter starts
[19:00:52] legoktm: I'm pushing it through as soon as we have a user facing feature we can deploy :)
[19:00:56] okay
[19:01:01] I wrote down some notes: https://www.mediawiki.org/wiki/User:Legoktm/ORES
[19:01:03] it has general support from the ops folks who matter, etc
[19:01:15] nice
[19:01:19] does ORES let me know if an edit has already been reverted?
[19:01:30] halfak: ^
[19:01:42] legoktm, it doesn't
[19:01:48] I'd like to make an event stream for that.
[19:02:03] We can do it in realtime, but it requires an absurd amount of resources.
[19:02:12] do we have an API for that halfak?
[19:02:13] hh
[19:02:21] Not yet. We could.
[19:02:22] so I guess for now the answer is 'no'?
[19:02:22] wait I forgot that MediaWiki knows if edits are rollbacked.
[19:02:42] legoktm, it does? Short of a structured comment?
[19:02:44] legoktm: hmm, will this be a post save job or just post-save?
[19:02:47] On the reverting edit
[19:03:16] halfak: if it goes through action=rollback, MW knows
[19:03:23] action=undo is more sketchy
[19:03:43] YuviPanda: I was thinking job, it could be a deferred update.
[19:03:51] Aaron would probably say job
[19:04:42] legoktm, which DB table do I check?
[19:04:58] legoktm: hmm, what's the latency on those like? but yeah, probably job, I guess. esp. if we run the pre-caching daemon off RCStream still, the job will complete pretty quickly
[19:07:28] halfak: it's not a db table, but you can use the ArticleRollbackComplete hook.
[19:07:44] Oh! So, mediawiki "knows" but immediately forgets?
[19:07:59] sure.
[19:08:20] But it doesn't need to.
[19:08:34] doesn't need to know or forget?
[19:09:25] YuviPanda: it depends on the job. It could be push based too, where ORES hits a special API endpoint to set the score after it pre-caches it
[19:10:00] but I'd go with the job first, and then look into other things if it's too slow.
[19:10:29] yup
[19:11:03] legoktm, doesn't need to forget.
[19:11:45] bbl, lunch
[19:13:00] o/ valhallasw`cloud
[19:13:11] \o halfak
[19:13:12] just noticed you've joined us in the skynet cabal
[19:13:14] ;)
[19:13:19] yes, YuviPanda forced me
[19:13:22] :D
[19:13:27] even though I know nothing about AI
[19:13:45] wait, that's a lie. I did a course in artificial intelligence during my bachelors
[19:13:46] Know anything about using good sources of signal to build interesting tools?
[19:14:05] getting a signal out of noise is my day job, yes
[19:14:26] A linear regression is a machine learning strategy :)
[19:14:38] combined with a bit of https://upload.wikimedia.org/wikipedia/commons/f/ff/LabVIEW_Block_diagram.JPG
[19:15:00] heh, fair enough
[19:15:09] * halfak waves his hand at Fourier Transforms too
[19:15:12] halfak: valhallasw`cloud is a tools admin as well :D
[19:15:18] and definitely way more sciency than me
[19:15:32] * valhallasw`cloud convolutes halfak with YuviPanda
[19:15:36] convolves?
[19:15:40] convolves, yes
[19:16:17] heh
[19:16:32] That's probably because when me and YuviPanda bump knuckles, we make Science-Engineer-Megaton and build cool stuff.
[19:16:46] heh
[19:16:51] I'd convolve me too
[19:17:00] * YuviPanda looks up convolve
[19:17:08] oh that
[19:17:09] ok
[19:17:13] :D
[19:17:39] brb, shower and food and things
[20:09:18] YuviPanda, all but one worker is offline and I can't restart them from flower. Any suggestions on what I should retry?
[20:09:22] Maybe cycle the boxes?
[20:12:37] halfak: service celery* restart
[20:12:44] On the boxes
[20:15:46] Worked great.
[20:15:51] Still no idea why they are crashing.
[20:15:57] Or locking up or whatever.
[20:16:22] Once we push this next version of revscoring, our error rate will fall dramatically and we'll get a better view of what errors could be involved.
[20:17:06] Ya
[20:17:15] We need to have centralized logging and graphite
[20:24:49] YuviPanda: not joining the scalable events system meeting?
[21:08:12] madhuvishy: nope
[21:08:26] YuviPanda: okay it's almost over anyway :)
[22:43:34] madhuvishy: how did it go?
[22:43:57] YuviPanda: I think it went great. Lot of people are invested in it
[22:44:04] madhuvishy: nice
[22:59:25] madhuvishy: anyone from ops outside of otto?
[22:59:41] Faiden was there
[23:00:01] faidon*
[23:00:11] Nice
[23:00:13] Cool then
[23:00:41] i dont know if performance and services are ops
[23:01:15] i din't think so - but there was representation from those teams
[23:24:02] They aren't
[23:24:03] Nice
[23:43:35] halfak: hey, around?