[00:01:01] brb
[00:01:40] our work is better than we thought, it feels good....
[00:02:01] https://www.youtube.com/watch?v=CmwRQqJsegw
[00:02:25] Amir1, so, naively, we could just take the unusual predictions from our dataset and post them for re-review.
[00:02:43] I'd want to edit those somehow in the Wiki labels system, but I don't even know how to think about that yet.
[00:02:56] So maybe we just edit it and check it in to wiki-ai/editquality
[00:03:17] I think the latter is better
[00:03:21] So, non-naively, we're getting into dangerous territory by being skeptical of our false-positives and false-negatives.
[00:03:29] We could be fooling ourselves.
[00:03:49] Practically, I think that this is a fine idea.
[00:04:17] it's fine for me
[00:04:24] do you want to get input from other people?
[00:04:33] like Helder
[00:05:37] Sure. Not sure Helder will draw from expertise. Would be nice to get some thoughts from an ML expert.
[00:05:50] * halfak thinks.
[00:06:32] OK. If we can stratify the training set by how it gets scored -- e.g. randomly sample 50 @ 90%, 50 @ 80%, 50 @ 70%, ...
[00:06:36] these results and false-false-positives made me extremely happy :D
[00:06:50] We could get a sense for where in the scale the false labels appear
[00:07:05] That would be mostly fair.
[00:07:06] have you considered semi-supervised learning?
[00:07:20] especially self-training
[00:07:38] You're saying we could use a clustering alg and then see which edits seem to appear in the wrong cluster?
[00:07:45] no no
[00:07:53] self-training works this way
[00:08:17] we have two sets: one is labeled data and one is unlabeled data
[00:08:39] then we train our classifier based on the labeled data
[00:09:19] and label everything from the unlabeled data that gets a really extreme score (either close to zero or one)
[00:09:29] and add them to our labeled data
[00:09:36] and retrain
[00:09:49] and we do this several times
[00:10:15] this method boosts our efficiency
[00:11:19] Amir1, interesting.
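Amir1's self-training recipe above (train on the labeled set, pseudo-label the unlabeled edits that score close to 0 or 1, fold them into the training set, and retrain for several rounds) could be sketched roughly like this. A minimal illustration using scikit-learn; the classifier choice, thresholds, and round count are placeholders, not the actual editquality pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, confidence=0.9, rounds=5):
    """Self-training loop: train on labeled data, pseudo-label the
    unlabeled examples whose predicted probability is close to 0 or 1,
    add them to the training set, and retrain."""
    X, y = np.asarray(X_labeled), np.asarray(y_labeled)
    pool = np.asarray(X_unlabeled)
    model = LogisticRegression()
    for _ in range(rounds):
        model.fit(X, y)
        if len(pool) == 0:
            break
        probs = model.predict_proba(pool)[:, 1]
        # keep only predictions the model is very sure about
        confident = (probs >= confidence) | (probs <= 1 - confidence)
        if not confident.any():
            break  # nothing scores strongly enough; stop early
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, (probs[confident] >= 0.5).astype(int)])
        pool = pool[~confident]
    return model
```

As the discussion notes, a test set of explicitly labeled data should be withheld from this loop entirely, since pseudo-labels can reinforce the model's own mistakes.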
Haven't considered that.
[00:11:40] I actually suggested it in one of our emails with Arthur
[00:11:51] I must have misunderstood.
[00:12:27] Amir1, Travis is happy, please merge https://github.com/wiki-ai/revscoring/pull/233
[00:12:51] with pleasure
[00:13:08] search for "self-training" in your inbox
[00:13:09] :)
[00:14:26] Amir1, so we'd still withhold a test set, right?
[00:14:49] yes
[00:15:13] OMG pull request of the decade is done!
[00:15:34] So, next question: what does the workflow look like for this?
[00:15:58] first we need labeled data
[00:16:04] Not from an abstract point of view, but what we could put into the makefile/revscoring
[00:16:18] and then we sample from the recent changes table
[00:16:43] we randomly take five times as much as the labeled data (~100K edits)
[00:16:47] we extract features
[00:16:54] Amir1, what do you think about starting with a `reverted` model and loading a sample of strongly-scored revisions into Wiki labels for "damaging", "goodfaith"?
[00:17:21] So we'd start with implicitly labeled data and train it with new explicitly labeled data
[00:18:36] we should start with explicitly labeled data and move on with unlabeled data (in theory)
[00:18:50] OK.
[00:19:03] so it would be better to start with damaging and then add reverted data
[00:19:24] But "reverted" data is messy :/
[00:19:46] So maybe we do two runs with Wikilabels. 1st run is ~2k edits and that trains the initial model. Then we load 5k of balanced strong scores for additional labeling.
[00:20:21] +1
[00:22:25] OK. So, what we need to get started is a step-by-step plan for what to do starting from scratch, including the utilities we'll want to use to get datasets/extreme scores/etc.
[00:22:45] Then we'll want to back-port this process to the wikis we already have models for.
[00:23:06] yessssssssss
[00:23:21] I would love to help
[00:24:07] Cool :)
[00:29:15] halfak: let's fix the data issue before we move on
[00:29:33] should we change the labels of the badly labeled data?
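The "balanced strong scores" step halfak describes (load ~5k strongly scored revisions, balanced between likely-damaging and likely-good, for additional labeling) could look something like the sketch below. All names here are illustrative, not actual revscoring/editquality utilities:

```python
import random

def sample_strong_scores(scored_revisions, threshold=0.9, per_side=2500):
    """Balanced sample of revisions the model scores strongly.

    `scored_revisions` is an iterable of (rev_id, probability) pairs.
    Revisions scoring >= threshold (likely damaging) and <= 1 - threshold
    (likely good) are sampled in equal numbers, so the second labeling
    round sees both kinds of confident predictions."""
    scored = list(scored_revisions)
    high = [rev for rev, p in scored if p >= threshold]
    low = [rev for rev, p in scored if p <= 1 - threshold]
    n = min(per_side, len(high), len(low))
    return random.sample(high, n) + random.sample(low, n)
```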
[00:30:08] Amir1, right now, I think we should check them into editquality and manually update the file.
[00:30:18] With a commit message that says what's up.
[00:30:24] So commit the file and then commit manual changes.
[00:30:38] This is the fastest route to better fitness.
[00:31:00] kk
[00:31:03] I'll do it
[00:31:27] please make a plan for the self-training implementation
[00:31:35] if you have time
[00:32:07] Feel free to beat me to it! But I might be inspired. If I do, I'll make sure you know where I start drafting.
[00:32:46] awesome
[00:32:57] so I'll go back to work
[00:40:32] same!
[00:58:24] * pipivoj goes to sleep, bye
[01:03:19] halfak: https://en.wikipedia.org/w/index.php?diff=605843328
[01:03:22] :))))
[01:04:24] lol
[01:05:10] That's some hard damage to catch
[01:06:01] https://en.wikipedia.org/w/index.php?diff=619478391
[01:06:17] this one is crazy, someone dated everything back one year
[01:06:27] Ohhhhhh We could do this I think
[01:06:46] So, we get "numeric tokens" from our model.
[01:06:55] *in our feature set
[01:07:01] We can do all sorts of things with them
[01:07:24] E.g. "numeric tokens added/removed"
[01:07:35] yeah
[01:07:37] +1
[01:07:42] If we have *a lot* of numeric tokens affected, that's probably a good indicator.
[01:07:49] We can use numeric term frequency measures too. :)
[01:07:53] https://en.wikipedia.org/w/index.php?diff=641982090 this one is the hardest to catch
[01:08:16] Is that even damage?
[01:08:30] sneaky vandalism
[01:08:35] Sneaky indeed.
[01:08:44] I'm not sure that we'll ever catch that kind of vandalism.
[01:08:51] Not without being inherently skeptical of anons.
[01:08:53] Even then
[01:09:09] If you ask me, I would say let's keep the is_anon feature
[01:09:18] and it automatically goes up for review
[01:09:31] Amir1, that's a rough time for anons though.
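The "numeric tokens added/removed" feature discussed above could be sketched as a simple multiset diff over number tokens. This is a toy regex version, not revscoring's actual feature extraction, which runs over its own tokenizer:

```python
import re
from collections import Counter

NUMERIC_TOKEN = re.compile(r"\b\d+\b")

def numeric_tokens_changed(old_text, new_text):
    """Count numeric tokens added and removed between two revisions.

    A multiset difference catches edits like shifting every date back a
    year: each changed number counts once as removed and once as added,
    so sneaky mass tampering produces large values on both sides."""
    old_counts = Counter(NUMERIC_TOKEN.findall(old_text))
    new_counts = Counter(NUMERIC_TOKEN.findall(new_text))
    added = sum((new_counts - old_counts).values())
    removed = sum((old_counts - new_counts).values())
    return added, removed
```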
[01:09:45] it won't go up so much that it gets reverted by a bot
[01:09:50] I'm down for putting "is_anon" back if we don't reduce our false-positive rate by removing it.
[01:10:07] * halfak needs to actually do some research there.
[01:10:11] Been engineering for too long.
[01:10:18] teeehee
[01:10:24] :P
[01:10:28] :D
[01:10:36] researchineering!
[01:10:57] researgineering
[01:10:59] ^ better
[01:11:48] BTW, I'm running the model tuner on our article quality model and it looks like we can get a bunch more fitness :)))
[01:12:04] Also, we can train most models in 10 minutes
[01:12:12] Big benefit over the 1-hour train time we have now
[01:12:27] OK. I'm going to let that run and call it a day.
[01:12:37] Amir1, you have what you need to keep working for a bit?
[01:13:02] yeah
[01:13:03] :)
[01:13:04] * halfak will seriously consider the semi-supervised workflow proposal in the AM.
[01:13:06] enjoy
[01:13:11] Have a good one! o/
[01:13:20] you too
[02:18:47] halfak: When you get a minute, check out the extended README
[02:18:50] https://github.com/aetilley/pcfg
[04:35:15] violetto: hey, I'm working on the UI, I couldn't make a button disabled in Wikimedia UI
[04:35:19] is there a way to do that?
[04:35:35] hey Amir1
[04:35:46] hang on a sec
[04:39:31] Amir1: http://dev.munmay.com/experiment/ores-experiment.html
[04:40:04] used this instead https://www.irccloud.com/pastebin/sO6hGktT/
[04:40:07] awesome
[04:40:13] thanks :)
[04:40:39] it's not available yet as a web component, so I just used a regular bootstrap snippet
[04:41:46] gI see
[04:41:48] *I
[05:51:23] violetto: I'm about to finish. One question: you haven't made any suggestions regarding the results table. Is it okay in this shape?
[14:47:45] o/ halfak
[14:47:50] I haven't slept yet
[14:47:54] I'm about to :)
[14:47:56] o/
[14:48:14] >:( BE HEALTHY plz <3
[14:48:37] I fixed the bug in the Scored Revisions gadget
[14:49:26] thanks for your care :)
[14:49:49] Which bug is this?
[14:50:20] https://fa.wikipedia.org/w/index.php?title=%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Gadget-ScoredRevisions.js&diff=prev&oldid=16491280
[14:50:41] it disabled rollback
[14:50:53] = no one would use it
[14:50:53] Error: Command “no” not recognized. Please review and correct what you’ve written.
[14:51:05] now lots of people can use the gadget
[14:51:10] Oh!
[14:51:28] = silly AsimovBot
[14:51:28] Error: Command “silly” not recognized. Please review and correct what you’ve written.
[14:52:27] :))))))
[14:56:58] Amir1, I didn't realize this was an issue. It seems like we should write an announcement of some sort.
[14:57:51] I told the fa.wp people about it
[14:58:06] but since Helder didn't make his tool a gadget even on pt.wp
[14:58:14] no one noticed this bug
[15:02:01] Amir1, do we need to gadget-ify ScoredRevisions on every wiki individually?
[15:02:48] unfortunately yes
[15:03:53] Boo
[15:04:12] the good thing is we can configure it
[15:04:43] so I changed it to high numbers since people on fa.wp complained this tool was marking everything as possible vandalism
[15:05:19] Nice. Yeah, I saw that in the diff you sent.
[15:06:29] Amir1, re. the new GUI, it looks great. I'll look into other ways we can represent JSON. We might want some hierarchical divs or something.
[15:06:36] I'll think about that some.
[15:06:52] https://tools.wmflabs.org/dexbot/scorer.html
[15:06:54] sure
[15:07:02] make a "mock up" and I'll do it :)
[15:07:06] I like it
[15:07:38] Will do.
[15:08:35] Actually... before I do that and the semi-supervised mockup, I think I'm going to clean up our phab board and publish the last month of progress reports.
[15:08:38] We're way behind.
[15:09:03] oh great
[15:09:17] Also, I think a final report for the IEG would be good
[15:09:53] my report for the Wikimania TEG(?) is still not reviewed
[15:10:12] Amir1, yeah. Waiting on ToAruShiroiNeko for that report.
[15:10:21] I'm not excited about taking on the full reporting load again :S
[15:11:07] I don't know what their time scale is for reviewing reports
[15:11:28] but I hope we don't get warnings, errors (:D) or things like that
[15:11:47] Yeah. I hear you.
[15:34:01] yes halfak I did submit it
[15:34:48] I'll put in a good chunk of work today
[15:35:04] ToAruShiroiNeko, I was just preparing to update our progress reports.
[15:35:07] Want to take that?
[15:35:14] sure
[15:35:16] I'll get everything lined up in the "Done" column
[15:36:54] I think we only have 2 or 3 weeks; we did skip a week IIRC
[15:39:17] ToAruShiroiNeko, I'll be resolving cards in the "Done" column, so make sure that you turn off the "open" filter
[15:39:59] sure, no probs, that's what I do all the time anyway
[15:40:42] :)
[15:45:45] ToAruShiroiNeko, {{done}} The "done" column is now all resolved.
[15:45:46] How cool, halfak!
[15:45:52] Tasks should be in order.
[16:13:17] alright
[18:24:30] halfak sorry for the delay
[18:24:36] the fire was being very stubborn -_-
[18:24:51] fireplaces are fun and all but -_-
[18:25:09] I am writing weeks 3 & 4 (this ongoing week)
[18:29:40] Hello all
[18:30:08] whoa! activity!
[18:30:33] we are always active
[18:30:35] :3
[18:30:50] okay maybe not ALWAYS.
[18:32:33] I sat in on an ORES meeting a couple of weeks ago, been curious about it ever since
[18:33:03] I need to take a look around, been busy doing some stuff with d3 in the interim
[18:54:04] ResMar sure
[18:54:13] we'd be more than happy to explain anything you desire
[18:54:46] I know :P Wikimedians are great like that
[19:02:32] ToAruShiroiNeko: Actually, this is maybe a slightly tangential question, but I felt it was worth asking...
[19:03:09] I put together a pretty cool Python library that I want other people to take a look at and try playing around with; what are some good places to advertise it to others?
[19:03:41] well, I would start by publishing it somewhere public such as GitHub
[19:04:02] if you haven't already, that is
[19:04:28] Yep! I have.
[19:04:56] https://github.com/ResidentMario/watsongraph, if you're curious.
[19:04:59] then it is a matter of starting a Phabricator board perhaps and seeking people
[19:05:17] Ah, sorry, this is a bit more general than Wikimedian tech.
[19:05:31] It's a project I'm trying to finish up before I jump into figuring out ORES
[19:05:50] well, we use Phabricator too
[19:08:31] halfak weekly reports are out
[19:08:33] enjoy :3
[19:08:42] Will work on the final report in a minute
[19:16:10] ResMar what is your intended audience?
[19:16:19] Often, to me, the wikimedia crowd extends outward
[19:16:37] Do you mean with regards to the watsongraph library?
[19:16:47] yes
[19:17:05] I'm not sure. :) This is a case of my building the tool first and figuring out what to do with it later.
[19:17:10] if something gets used on wikimedia projects, that tends to be noticed, for example
[19:17:43] ores for example could benefit from graphs; I am assuming this is where your library excels
[19:17:52] I couldn't read about it as I have my hands tied a bit
[19:17:53] I was thinking of using it for some sort of analysis of the cohesiveness of a user's contributions, for example.
[19:18:19] imagine integrating scores from ORES for that? :)
[19:18:40] The in-your-face application is generating a recommendation service of some sort, but I'm still looking for an IRL jQuery developer I can borrow to make a nice example web-app.
[19:20:22] I still need to read up on ORES, but I don't think this is too useful for revision scoring
[19:21:27] You can probably leverage it to do some sort of network analysis on editor interactions, though.
[19:30:31] on the contrary
[19:30:40] you can use ORES in your system
[19:31:00] it isn't ORES making use of your library, it is your library making use of ORES
[19:31:25] Interesting
[19:31:52] we hope our system is adopted by 3rd-party developers; we do not create anything that would be used directly. We provide the framework
[19:32:12] so if you want to create ClueBot Version 2, you can use our AI, but it's your bot, not ours.
[19:32:31] Yeah :)
[19:33:08] we are taking the necessary steps for this, so if 10 bots hit our API at the same time, we will probably brush it off.
[19:33:18] each requesting a revision per recent-changes edit
[19:33:31] so we are talking about tens of thousands per minute, probably
[19:36:24] It sounds to me like you guys are mostly done building the infrastructure at this point
[19:43:56] we did more than that
[19:44:01] we are adopted all over the place
[19:44:05] including by Huggle, for example
[19:44:20] we are expanding to many languages too
[19:44:25] they each bring a different challenge
[19:44:28] like Chinese Wikipedia
[19:44:36] they have two variants of writing and 7 dialects
[19:44:45] and it's not space-delimited
[19:58:51] Alright, you've got me excited :-)
[20:05:38] :)
[20:06:06] YuviPanda, I'm getting a 404 on PAWS. Is it out of order again?
[20:06:08] think of us as a middleware supplier you can capitalize on. And if our implementation is insufficient, we will probably fix that based on your demands.
[20:06:25] yes
[20:06:27] we're doing mass restarts
[20:06:29] now
[20:06:39] the same stuff that started yesterday
[20:06:41] sorry about that
[20:06:52] :D
[20:07:02] if something breaks, who you gonna call? :D
[20:08:31] Do you have an estimate for how long this will keep occurring in the future, or is it difficult to predict due to the stochastic nature of security risks, vulnerabilities, etc.? ;)
[20:11:24] I'm thinking of switching to offline work.
:) Installing this Python notebook stuff on my MS machine would probably be overly optimistic, but if this happens more often it would be a smart alternative... methinks
[20:13:12] pipivoj: it should probably all be good in a few hours
[20:13:40] pipivoj: the last time a security issue required wholesale reboots was... January 2015
[20:13:43] so a full year ago
[20:16:24] > stochastic nature
[22:36:58] Found it: https://github.com/NARKOZ/hacker-scripts
[22:49:14] halfak: btw I'm making some progress on the side towards getting ORES into prod
[22:49:27] mostly looking at ways to not do full debianization
[22:49:41] I've realized it's busywork that I keep putting off and that hence blocks us from doing other things
[22:50:05] been talking to akosiaris about alternatives that aren't as much of a busywork-y thing
[22:50:08] will keep you updated
[22:50:21] um
[22:50:25] am running some tests on ores-compute
[22:50:30] (just installing ores to time it)
[22:50:34] do you want me to not use that machine?
[22:53:10] YuviPanda, na. go ahead and use it
[22:53:19] What do you think about python bdist wheels?
[22:53:40] that's an option
[22:53:44] one option we were thinking of
[22:53:46] was
[22:53:51] virtualenv
[22:53:53] and commit it
[22:53:55] to a git repo
[22:53:57] and deploy from that
[22:54:04] bdist wheels could be another
[22:54:06] option
[22:54:23] we considered docker containers too, but dismissed that as needing waaaay too much work to work with our nifra
[22:54:25] *infra
[22:54:35] (in prod that is)
[22:55:06] Gotcha
[23:06:25] Error: Command “recibir” not recognized. Please review and correct what you’ve written.
[23:53:01] YuviPanda, I want to move the "Productization process" from the Research page on wikitech to its own page. I'm thinking that we should call it "Service productization" or maybe say something about "Meso" support in the title. What do you think?
[23:53:49] I don't know where the word 'Meso' came from :D
[23:53:51] also
[23:54:03] calling it "Service productization" might lead to unnecessary turf wars