[00:10:14] halfak: I noticed another mistake I made in following Zache’s suggestions [00:11:25] According to https://phabricator.wikimedia.org/T166235#3449799, I took the “next best solution”, but there might be a significant difference [00:12:04] because the signal I lost was probably some of the higher-quality contributions: multi-edit chains by a single author, approved as a chain. [00:12:33] I diluted the negatives with so-so changes [00:16:41] ooh, interesting: If we can break out of scoring pure revisions, the diff between start and end of a Flagged Revs approval is a high-confidence good edit. [00:32:19] awight: Whoa thanks for merging my patch before I noticed [00:32:26] I was about to ask this channel to give input on it [00:33:03] awight: BTW the reason I moved away from the testwiki fake model is that it doesn't have test_stats, and so RCFilters can't configure itself with it [01:05:27] 10Scoring-platform-team, 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Flagged revs approve model to fiwiki - https://phabricator.wikimedia.org/T166235#3495853 (10awight) @Zache We would love it if you weighed in with how you would like to proceed. Our second experiment showed a slight... [01:05:38] 10Scoring-platform-team, 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Flagged revs approve model to fiwiki - https://phabricator.wikimedia.org/T166235#3495854 (10awight) Work is described in more detail here: https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_... [01:10:48] RoanKattouw: It’s because we don’t want people knowing that testwiki has the best test stats of all!!! [01:11:41] Sorry to jump the gun on that config patch though, Reedy had to do a (not-so-urgent) emergency deployment cos I had forgotten that mediawiki-config +2 means, “I’m in the middle of deployment”.
[03:02:13] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10MW-1.30-release-notes (WMF-deploy-2017-08-01_(1.30.0-wmf.12)): Hide ORES review letter from the change list legend. - https://phabricator.wikimedia.org/T172338#3496022 (10Catrope) > @Catrope:... [07:44:54] 10Scoring-platform-team, 10ORES, 10Operations, 10Scap, 10Release-Engineering-Team (Watching / External): Simplify git-fat support for pulling from both production and labs - https://phabricator.wikimedia.org/T171758#3496219 (10fgiunchedi) CC'ing #operations here too for wider distribution [09:47:44] 10Scoring-platform-team, 10editquality-modeling, 10Performance, 10RfC, 10artificial-intelligence: [RfC] Should we remove all reverted models when there is a damaging one? - https://phabricator.wikimedia.org/T171059#3496458 (10Ladsgroup) After two weeks, no objections has been raised. I'm moving forward t... [09:54:08] 10Scoring-platform-team, 10editquality-modeling, 10Performance, 10RfC, and 2 others: [RfC] Should we remove all reverted models when there is a damaging one? - https://phabricator.wikimedia.org/T171059#3496466 (10Ladsgroup) a:03Ladsgroup [09:56:32] 10Scoring-platform-team, 10editquality-modeling, 10Performance, 10User-Ladsgroup, 10artificial-intelligence: Remove reverted models from editquality repo - https://phabricator.wikimedia.org/T172370#3496483 (10Ladsgroup) [13:27:45] Reading scrollback, it looks like testwiki needs some fake stats ;) [13:57:06] 10Scoring-platform-team, 10ORES: Design Meta ORES data storage schema - https://phabricator.wikimedia.org/T153152#3497044 (10Halfak) a:03awight [14:18:24] 10Scoring-platform-team, 10draftquality-modeling, 10artificial-intelligence: Experiment with Sentiment score feature for draftquality - https://phabricator.wikimedia.org/T167305#3497198 (10Halfak) Here's what I got. It roughly matches Adama's 2nd model. But one surprising bit that I've only noticed now is... 
[15:17:04] 7am might not be my forte [15:18:21] lol [15:18:32] awight if it's any better it's 4:18pm here [15:18:33] heh [15:18:42] * awight squints [15:18:46] that does sound better [15:18:52] lol :) [15:19:27] I’m in Pacific time so it’s just after 8, but that means I missed an hour-long meeting with the rest of the staff [15:21:06] oh [15:24:14] Bah. Missed awight [15:24:18] Been trying to ping him. [15:24:35] Offered to move the meeting but that got shot down. I think it's time to just move it. [17:12:03] * paladox likes the new create change dialog he created in polygerrit :) https://gerrit-review.googlesource.com/#/c/gerrit/+/117290/ [18:23:47] o/ [19:25:07] halfak: you like this filter_data utility? [19:25:08] https://github.com/wiki-ai/editquality/compare/fiwiki.flaggedrevs#diff-b67911656ef5d18c4ae36cb6741b7965R1146 [19:25:33] CLI signature is like, python oneoff-fiwiki.flaggedrevs/filter_data.py autolabel.review_reason --unequal="reverted edit" [19:26:04] source: https://github.com/wiki-ai/editquality/compare/fiwiki.flaggedrevs#diff-fcc1c6091615b8792534bb13602cbbb6 [19:26:20] If you think it’s useful, I’ll extract a patch for editquality [19:27:10] I’m about to rework the flaggedrevs patch to merge the data into the main model training+test set, unless you think I should wait for feedback from Zache? [19:41:25] halfak: also, I implemented the fix from your last comment and https://github.com/wiki-ai/revscoring/pull/338 is ready for rereview. [19:44:54] * halfak looks at PR [19:44:59] wow. That took longer than expected. [19:45:07] ^ my lunch [19:45:10] yikes [19:58:30] halfak i also tried to make wikilabels-wmflabs-deploy's repo irc bot's name different from wikilabels repo itself but the name was too long for irc [20:09:38] halfak: hey, I just woke up from a very long nap. Can I work on some stuff for ORES? [20:09:52] Hey Amir1! [20:09:58] Hmm...
let's look :D [20:10:04] I just finished the wikilabels deploy [20:10:25] Awesome, I want to start with removing all of the reverted models when there is a damaging one [20:10:26] halfak ty for that fyi [20:10:41] is that okay? Two weeks passed and no one objected [20:10:52] Amir1, +1 sounds good. [20:11:02] In parallel, I'm going to start experimenting with revscoring 2.0 [20:11:07] I'm going to cut the version now :D [20:11:19] and also lv, but beside that, is there anything? [20:11:45] Amir1, help me experiment with 2.0 and/or help me flesh out content for the hackathon next week [20:11:52] halfak tested the oauth, nothing broke, so that's a good sign! [20:12:27] halfak: you get my questions about filter_data.py, PR #338 and where to go with flaggedrevs? [20:13:37] awight, sorry missed it. Looking now. [20:13:41] Zppix, awesome thank you! [20:13:47] * awight flushes internal buffer [20:13:48] I forgot to test that on staging :S [20:14:00] filter_data [20:14:57] * awight nods solemnly [20:15:47] awight, why is https://github.com/wiki-ai/editquality/compare/fiwiki.flaggedrevs#diff-fcc1c6091615b8792534bb13602cbbb6 not grep? [20:16:55] halfak im going to close that task for oauth for wikilabels then considering theres not really a reason to use home wikis unless anyone thinks otherwise [20:17:35] Because structured data is nicer than randomly looking for a string, IMO… [20:18:04] Most importantly to my long-term goals, it’s also much better to include in a declarative definition of the data set recipe [20:18:05] awight, help me not cite YAGNI :) [20:19:24] * awight waits for opinion on my second point about code-generating this nonsense [20:22:53] Sorry, I’m not sure if I’m waiting for you to respond or not. [20:22:58] I can continue though [20:23:48] awight, OK so I see a recorded sequence of core utils calls as declarative. [20:24:08] I guess I'm not sure what you're talking about re. declarative.
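[Editor's note: the `filter_data.py` under discussion lives in the linked editquality branch; as a reading aid, here is a minimal sketch of what the CLI shown above (`filter_data.py autolabel.review_reason --unequal="reverted edit"`) plausibly does: stream jsonlines on stdin and keep rows whose dotted-path field differs from a value. The implementation details are assumed, not taken from the branch.]

```python
# Hypothetical sketch of a filter_data-style jsonlines filter.
# The dotted-path and --unequal semantics mirror the CLI quoted in the
# log; everything else is an assumption, not the real editquality code.
import argparse
import json
import sys


def dig(doc, dotted_path):
    """Walk a dotted path like 'autolabel.review_reason' into nested dicts."""
    for key in dotted_path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def filter_lines(lines, path, unequal):
    """Yield jsonlines rows whose value at `path` differs from `unequal`."""
    for line in lines:
        row = json.loads(line)
        if dig(row, path) != unequal:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    parser = argparse.ArgumentParser()
    parser.add_argument("path")
    parser.add_argument("--unequal")
    args = parser.parse_args()
    for kept in filter_lines(sys.stdin, args.path, args.unequal):
        sys.stdout.write(kept)
```

This is roughly what halfak's later `jq` suggestion gives you for free, e.g. `jq -c 'select(.autolabel.review_reason != "reverted edit")'`.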
[20:25:51] that when I try to get rid of the makefile, and there’s a sequence like: https://github.com/wiki-ai/editquality/blob/master/Makefile#L648-L653 , I would rather that the recipe is a semantic description of what we want to achieve rather than the exact implementation. [20:26:55] It’s “grab 20k unreverted revisions, merge with all the reverted revisions, and shuffle" [20:27:21] To me, this reads like SQL [20:27:40] It's a structured language for common file operations [20:27:45] It’s key to decouple our description from the implementation [20:27:52] YAGNI [20:27:59] stop that :p [20:28:25] Abstractions where abstractions are really important and concreteness everywhere else. [20:28:37] For example, when we come to grips with the fact that “shuf” is actually really crappy and uses a horribly memory-inefficient algorithm, we’ll be able to replace it [20:28:44] without sed’ing 100 usages [20:29:17] There will be plenty of concreteness in the code-generated Makefile [20:29:28] shuf is the fastest randomization I've ever seen for a large chunk of data. [20:29:34] I... I can't imagine something better [20:30:04] shuf stores everything in memory. anything would be better [20:30:15] awight, how are you going to make a better shuf? [20:30:25] And why haven't the people who have been working on shuf for so long done that already? [20:30:56] gah /me allows self to rabbithole into shuf land [20:31:14] 10Scoring-platform-team, 10User-Zppix: Early Aug 2017 Wikilabels Deployment - https://phabricator.wikimedia.org/T172332#3498877 (10Zppix) [20:31:16] 10Scoring-platform-team, 10Wikilabels, 10User-Zppix: Wikilabels should authenticate on the right wiki - https://phabricator.wikimedia.org/T166472#3498875 (10Zppix) 05Open>03Resolved Resolving as Meta is internationalize in regards to oauth process, so I see no need to send each user to home wiki. [20:31:32] want me to resolve T172332 halfak? 
[20:31:33] T172332: Early Aug 2017 Wikilabels Deployment - https://phabricator.wikimedia.org/T172332 [20:31:48] http://lemire.me/blog/2010/03/15/external-memory-shuffling-in-linear-time/ [20:31:55] Zppix, sure [20:32:09] ack [20:32:29] 10Scoring-platform-team, 10User-Zppix: Early Aug 2017 Wikilabels Deployment - https://phabricator.wikimedia.org/T172332#3495230 (10Zppix) 05Open>03Resolved Deployment done and requested by @Halfak [20:33:00] so... sort -R? [20:33:07] 10Scoring-platform-team, 10User-Zppix: Early Aug 2017 Wikilabels Deployment - https://phabricator.wikimedia.org/T172332#3498883 (10Zppix) [20:33:09] 10Scoring-platform-team, 10Wikilabels, 10editquality-modeling, 10artificial-intelligence: Change "yes/no" in damaging_goodfaith form to "damaging/good" and "good-faith/bad-faith" - https://phabricator.wikimedia.org/T171493#3498881 (10Zppix) 05Open>03Resolved Deployed [20:33:10] lol that link didn’t actually answer our question [20:36:35] awight, how does N calls to filter_data end up being easier to address later than N calls to grep? [20:36:44] Also, there are utilities for this -- like jq [20:36:46] Because they’re not calls [20:37:03] The concrete implementation is created by code generation [20:37:32] ah cool: til jq [20:38:02] So it seems you are telling me something about a larger vision you have -- yet I've not heard the details of that vision so I don't know what you are talking about or why generating code is ever a good idea. [20:39:02] halfak: We’ve talked about it a few times, it’s this one: T168455 [20:39:02] T168455: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455 [20:39:53] Right. So that's still just a part of a larger vision. [20:40:00] And I'm also skeptical of code generation [20:40:32] "awight renamed this task from [Spec] Bury horrors of the editquality makefile to Investigate code generation for model makefile maintenance." 
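[Editor's note: the Lemire post linked above does describe the alternative to `shuf` being debated: scatter lines into k temporary bucket files uniformly at random, then shuffle each small bucket in memory, so peak RAM is about n/k lines instead of the whole file. A sketch of that idea, with made-up parameter names:]

```python
# External-memory shuffle per the linked Lemire post: scatter lines into
# k bucket files at random, shuffle each bucket in RAM, concatenate.
# Unlike `shuf`, only ~n/k lines are ever held in memory at once.
import random
import tempfile


def external_shuffle(lines, num_buckets=16, rng=random):
    """Return all newline-terminated `lines` in random order."""
    buckets = [tempfile.TemporaryFile(mode="w+") for _ in range(num_buckets)]
    for line in lines:
        rng.choice(buckets).write(line)  # uniform random bucket
    out = []
    for bucket in buckets:
        bucket.seek(0)
        chunk = bucket.readlines()
        rng.shuffle(chunk)  # each bucket is small enough to fit in memory
        out.extend(chunk)
        bucket.close()
    return out
```

Each line lands in a uniformly random bucket and each bucket is uniformly permuted, so the concatenation is a uniform random permutation of the input.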
[20:40:39] The reason it’s a good idea is that our Makefiles contain a huge amount of boilerplate, are error-prone, and make it difficult to see meaningful variations between how we train each wiki. [20:41:00] See, at an earlier point, I think I was suspecting you were planning to use a general dependency-solving framework based on a yaml config file. [20:41:34] code generation is not something scary—it’s the same idea as exporting jsonlines, just more interesting [20:41:38] awight, I'm not sure I agree with any of that. I think that having one giant file is problematic though. [20:42:06] If we had a separate file for each wiki/modelset, I'd consider the older version of that task resolved. [20:42:26] ha—that really would bury the horrors [20:42:50] but we would unearth them again each time we had to make a CLI syntax change to everything that uses cvtrain [20:42:53] awight, well, I guess I don't agree that makefiles are that horrible. What's so wrong with using coreutils? [20:43:19] awight, right. It would be like any other refactoring. [20:43:39] Everything that used the old API would need to be updated to call the new API correctly. [20:44:02] and if we were using CG, there would be a single template to update [20:44:32] same idea as in the code: you don’t want 100 calls to the same function strewn about the code, you’re best off having minimal interaction with APIs [20:45:12] What do you think about the fawiki demonstration I give in the task description? [20:46:08] I think it's an interesting thought experiment. But building it would take forever and it might end up being a bigger pain in the ass than we already have. So far, it's pretty straightforward to use the Makefile -- it's just very big. [20:46:20] Currently, you have to be a wiki-ai expert to go into that Makefile.
With a simplified YAML config file, anyone would be able to come in and say “oh, fiwiki mixes flagged revs approvals into the training set” [20:46:24] And I don't like very big files [20:46:41] awight, you'd need to dig through more code to use a CG [20:46:46] Because what is it doing? [20:46:48] “building it would take forever” is not fact-based... [20:47:00] I'm not sure there are facts involved here [20:47:07] I'm stating an opinion [20:47:19] CG would be reading configuration and feeding it into templates as parameters [20:47:23] hehe [20:47:45] And you'd need to read that CG code to figure out what it's doing. [20:47:56] but it will generate a Makefile [20:48:01] And then we're back where we started because you'd need to understand what a certain set of commands and Make does [20:48:02] that you can read just like today's [20:48:06] Right [20:48:07] ! [20:48:13] um… point for me :p [20:48:25] no, I don’t see this as a debate to be solved now [20:48:27] I'm not sure you see the point I'm making [20:48:30] I’ll keep annoying you about this [20:48:35] Didn't you just say the Makefile was impossible to read. [20:48:39] yes [20:48:49] Well, that's how you propose we debug the CG? [20:49:09] Scientists who care about repeatability can crib from the makefiles [20:49:19] we as devs can debug by checking the makefiles [20:49:50] but wikimedians trying to help us with their wiki pipeline can also reference the YAML which we have optimized to be as readable as possible [20:50:13] why would they want to look at that yaml? [20:50:27] good question. Maybe they don't [20:51:05] cmake is a similar thing. lemme see if I can show… [20:51:51] I can’t quite. But here’s the input: https://github.com/adamwight/photo-booth/blob/master/CMakeLists.txt [20:52:36] cmake reads that file, which explains our high-level dependencies, and it generates a huge Makefile that I have never needed to care about. 
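[Editor's note: the code-generation idea being debated (T168455) can be sketched in a few lines: a small declarative per-wiki config is fed through one template that emits the repetitive Makefile rules, so a CLI change to e.g. `cvtrain` touches one template rather than every rule. The target names, flags, and config keys below are invented for illustration; they are not the real editquality Makefile.]

```python
# Illustrative Makefile code generation: one template, many wikis.
# Rule shapes and config keys are assumptions, not the real pipeline.
RULE_TEMPLATE = """\
datasets/{wiki}.labeled.json: datasets/{wiki}.sampled.json
\tcat $< | autolabel {wiki} > $@

models/{wiki}.damaging.model: datasets/{wiki}.labeled.json
\trevscoring cvtrain {learner} --labels=damaging < $< > $@
"""

# Stand-in for a parsed YAML config file, one entry per wiki/model set.
CONFIG = [
    {"wiki": "enwiki", "learner": "GradientBoosting"},
    {"wiki": "fiwiki", "learner": "RandomForest"},
]


def generate_makefile(config):
    """Render one block of Makefile rules per config entry."""
    return "\n".join(RULE_TEMPLATE.format(**entry) for entry in config)
```

The generated output is still an ordinary Makefile, which is halfak's point that you end up debugging Make anyway; awight's counterpoint is that per-wiki differences live only in the short config.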
[20:57:56] Looks like cmake was designed specifically for C code [20:57:58] Is that right? [20:58:04] I don’t want to make my point by contradiction, but do you have any arguments for why the Makefile is the best solution? * standard technology, * coreutils, * deterministic [20:58:20] yeah cmake is not related to our problem, other than demonstrating a dope CG application [20:58:41] Think of CG as compilation-lite [20:58:49] I'm not saying the Makefile is the best solution [20:59:19] But I would say it has some really nice characteristics. [20:59:33] sorry, didn’t mean to put words in your mouth [20:59:35] Flexibility and replicability being among them. [20:59:43] Standard, sure. [21:00:17] It seems that you're looking for buy-in on a solution and I don't think I buy in to your formulation of the problem. [21:00:18] CG will be annoying if we want to go off the beaten path… we would have to write code to support one-off utils [21:00:36] that’s fine—I can continue working on buy-in to the problem [21:00:57] IMO it’s really hard to see what each wiki is doing differently [21:01:35] It’s also totally unclear whether the tuning params are set because that was the best practice when the wiki model was added, or if they have actually been tuned for each wiki. [21:02:39] I’m less concerned about the mechanics of grep/shuf, it just happens to be in my crosshairs [21:04:13] awight, check the tuning reports? [21:04:57] :D that’s a good answer [21:10:06] Anyway, jq answers my question about data_filter [21:10:14] Great :) [21:10:29] We'll need to add something about making sure jq is available in the makefile. [21:10:45] *Readme [21:10:53] Oooh! Good opportunity for json2tsv too [21:11:31] isn’t that a python requirements.txt thing? [21:11:58] Hmm... good q. I suppose it can work that way just fine [21:13:13] I can just grep for now, it’s what the other kids are doing.
[21:13:47] +1 for keeping things consistent until we agree on a change of strategy [21:13:53] Meanwhile, shall I go ahead and merge flagged revs data into the main fiwiki training+test set? [21:14:01] Or did you prefer that I wait to hear from Zache? [21:14:14] Hmm... Let's let Zache block that a bit longer [21:14:22] ok cool [21:14:35] Maybe schema stuff for a bit. I'm curious what you think of my recent edits to the ores meta stuff [21:14:47] I still haven't added a "preference" dimension yet. [21:18:17] great, will do [21:29:24] halfak: oh hey, this is almost related to meta-ores schema... [21:29:54] yesterday I realized that Flagged Revs approvals are of the entire edit chain, so the only thing they’re actually verifying is that end-begin is a good diff [21:30:24] We could have a new entity to represent diffs that don’t correspond to a single revision. [21:31:13] That can of worms aside, I want to implement Zache’s original suggestion to include multi-change approvals if all the changes were made by the same author [21:31:27] I think we’ve lost a lot of the higher-quality edits, by omitting those [21:32:16] Imagining that what I should do with those is include *every* revision in the chain. [21:38:40] awight, +1 for including every revision. [21:38:52] However, I'm skeptical that we should include revision chains as a "thing" [21:39:17] or "artifact" as I'd renamed it [21:52:19] Revision chains would be missing a lot of the features we need, but possibly we could synthesize those features… luckily we can avoid all that for now by just taking every revision. The only catch is that, there could be an accidentally damaging change in the chain which is later fixed. [21:53:39] 10Scoring-platform-team, 10Edit-Review-Improvements-RC-Page, 10ORES, 10Wikidata, and 3 others: ORES: Don't highlight changes propagated from Wikidata - https://phabricator.wikimedia.org/T168487#3366057 (10Etonkovidova) Checked the fix in production (wmf.12). 
Wikidata (and log entries) do not display ORES h... [21:54:26] awight, should be OK so long as we have a much lower rate of damage than in the population [21:54:56] kk [21:55:20] I do want to at least get the number of revisions I missed by omitting multi-change approvals with one author. [21:55:50] Do you agree those are probably going to be unusually high quality? [21:57:03] awight, hard to say. I have little experience with flaggedrevs [22:35:17] 10Scoring-platform-team, 10Edit-Review-Improvements-RC-Page, 10ORES, 10Wikidata, and 3 others: ORES: Don't highlight changes propagated from Wikidata - https://phabricator.wikimedia.org/T168487#3499151 (10jmatazzoni) 05Open>03Resolved [22:42:29] out for a few hours...
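[Editor's note: the proposal awight lands on above, labeling every revision in a FlaggedRevs approval chain as good, but only when one author made the whole chain, can be sketched as below. The field names (`rev_ids`, `authors`) and the label value are assumptions for illustration, not the real extraction schema.]

```python
# Sketch of expanding multi-edit FlaggedRevs approvals into per-revision
# labels. Only single-author chains are expanded: for mixed-author chains
# the approval vouches for the start-to-end diff, not each intermediate
# edit. Input field names are assumed, not the real schema.
def expand_approval_chains(approvals):
    """Yield (rev_id, label) pairs for single-author approval chains.

    `approvals` is an iterable of dicts like:
        {"rev_ids": [101, 102], "authors": ["Alice", "Alice"]}
    """
    for approval in approvals:
        if len(set(approval["authors"])) == 1:
            for rev_id in approval["rev_ids"]:
                yield rev_id, "approved"
```

As halfak cautions, a chain can contain an accidentally damaging edit that a later edit in the chain fixes, so this labeling is only safe if such cases are rarer than damage in the general edit population.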