[01:04:11] Scoring-platform-team-Backlog, articlequality-modeling, editquality-modeling, Documentation, artificial-intelligence: Document nuances of training data - https://phabricator.wikimedia.org/T168912#3462483 (awight) Some notes I'm taking for myself, https://etherpad.wikimedia.org/p/revscoring_fi...
[04:25:01] wiki-ai/revscoring#1115 (data_utils - 69f159a : Adam Wight): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/256292555
[07:41:48] Scoring-platform-team, draftquality-modeling, artificial-intelligence: Experiment with Sentiment score feature for draftquality - https://phabricator.wikimedia.org/T167305#3462767 (awight) @Halfak when you have time to review these results, ping me cos I want to learn how to debunk. For example, how...
[07:46:13] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3462772 (Ladsgroup)
[07:46:24] Scoring-platform-team, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3157434 (Ladsgroup)
[08:15:10] Scoring-platform-team-Backlog, editquality-modeling, artificial-intelligence: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455#3462799 (awight)
[08:28:39] Scoring-platform-team-Backlog, editquality-modeling, artificial-intelligence: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455#3462802 (awight)
[09:53:19] Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Add new data for damaging models of Persian Wikipedia - https://phabricator.wikimedia.org/T170960#3462835 (Ladsgroup) Reverted: ``` ScikitLearnClassifier - type: GradientBoosting - params: init=null, max_leaf_...
[09:55:41] Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Add new data for damaging models of Persian Wikipedia - https://phabricator.wikimedia.org/T170960#3462836 (Ladsgroup) https://github.com/wiki-ai/editquality/pull/86
[10:14:10] Scoring-platform-team, Wikilabels, User-Ladsgroup: linting tests for wikilabels - https://phabricator.wikimedia.org/T171084#3462854 (Ladsgroup) stylelint: https://github.com/wiki-ai/wikilabels/pull/196 It's not passing probably because of a bug in stylelint itself: https://github.com/stylelint/style...
[16:06:40] o/
[16:06:44] Hey folks.
[16:06:46] Sorry I'm late.
[16:07:06] I'm mostly planning to work on reviews for CSCW today, but let me know if there's something you want me to take a look at.
[16:48:36] Scoring-platform-team, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3463087 (Halfak) @Baba_Tabita, do you think you could help us by generating a curated list of bad words (cu...
[17:12:48] halfak: hey
[17:12:58] I'm around now, had to be a call with Amir
[17:13:12] *in a call
[17:24:09] o/
[17:24:17] Sorry stepped away to have some lunch
[17:24:18] Amir1, ^
[17:24:26] anything you want me to look at?
[17:24:36] yeah, some PRs
[17:24:46] halfak: one in editquality for Persian Wikipedia new models
[17:25:36] the other one needs some tests
[17:25:47] (stylelint was broken today)
[17:25:55] I need to make sure if it's fixed
[17:26:48] okay, it's fixed now but I need to work on the PR
[17:29:48] OK I'll work on editquality PR
[17:31:21] Amir1, it looks like we lost a minor amount of fitness with the new data. Is that right?
[17:31:26] (by the numbers)
[17:32:07] For reverted yes, but for damaging it was improved AFAIK
[17:32:27] Looks like damaging had a very minor loss
[17:33:46] when the test data is different, it's hard to say we are really losing fitness or not
[17:34:18] Right. Even so, we should see a clear boost from this. I wonder if we need to extend our tuning params.
[17:34:59] E.g. learning_rate and n_estimators
[17:36:20] hmm, maybe
[17:36:45] I haven't seen any huge difference when changing the params
[17:36:55] see the tuning reports
[17:37:32] halfak: for damaging, roc-auc is at 0.973 in both of them
[17:37:38] https://ores.wikimedia.org/v2/scores/fawiki/?models=damaging&model_info
[17:37:50] ROC-AUC:
[17:37:50] ----- -----
[17:37:50] False 0.964
[17:37:50] True 0.973
[17:38:09] Interesting. For the tuning report, it has 0.967
[17:38:19] vs. the old .969
[17:38:28] Which is hardly different, but the difference is consistent.
[17:39:30] Amir1, here's what I propose. Let's merge this and can you file a task to investigate WTF and assign it to me?
[17:39:41] yeah, sure
[17:39:55] My sense is that this *is* better and our test statistics are wacky
[17:40:15] You can get weirdness in test stats by messing up interpolation when calculating auc
[17:40:19] So they are a little fragile.
[17:40:27] It will cause minor changes.
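
The remark about interpolation and AUC can be illustrated with a small sketch. It is not how revscoring or ORES computes its statistics; the function names (`roc_points`, `auc_trapezoid`, `auc_step`) and the toy labels and scores are made up for illustration. The idea is only that the same scores can yield different AUC values depending on how many thresholds are sampled and how the gaps between ROC points are filled. The toy data exaggerates the effect; on a real test set the shift would usually be much smaller.

```python
def roc_points(labels, scores, thresholds):
    """Return sorted (FPR, TPR) points for the given thresholds, plus both ends."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    points = {(0.0, 0.0), (1.0, 1.0)}
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= t)
        points.add((fp / neg, tp / pos))
    return sorted(points)


def auc_trapezoid(points):
    """Area under the curve with linear interpolation between points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_step(points):
    """Area under the curve holding TPR at its left-hand value (no interpolation)."""
    return sum((x1 - x0) * y0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 4 positive and 4 negative observations.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

fine = roc_points(labels, scores, thresholds=scores)   # one threshold per score
coarse = roc_points(labels, scores, thresholds=[0.5])  # a single threshold

print(auc_trapezoid(fine))    # 0.8125
print(auc_trapezoid(coarse))  # 0.625
print(auc_step(coarse))       # 0.375
```

With every score used as a threshold, the trapezoid value matches the rank-based AUC (13 of 16 positive/negative pairs ordered correctly). Coarser thresholds or a different fill rule change the number without any change to the model.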
[17:40:51] Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Investigate small loss in accuracy with the new data in fawiki - https://phabricator.wikimedia.org/T171386#3463133 (Ladsgroup)
[17:40:52] https://phabricator.wikimedia.org/T171386
[17:41:12] halfak: Also, I want to see it in practise
[17:41:18] +1
[17:41:32] so for example, test it in false positives of the old model and see if it gets improved
[17:41:33] Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Investigate small loss in fitness with the new data in fawiki - https://phabricator.wikimedia.org/T171386#3463146 (Halfak)
[17:41:52] It's also possible that vandals have adapted a bit so the model has learned some new important things.
[17:42:05] Which would improve the model but not necessarily the test stats.
[17:42:21] good point
[17:42:34] back to fixing stylelint
[18:02:01] o/ halfak I kept it simple, btw: https://github.com/wiki-ai/revscoring/compare/data_utils
[18:02:39] hehe, that might be too worky for a Saturday
[18:03:49] awight, why "normalize"?
[18:05:52] Also thinking about deduplicate_revs, see https://github.com/wiki-ai/revscoring/blob/master/revscoring/utilities/util.py for read_observations() and dump_observations()
[18:06:28] Also, I think the name "union_merge_observations" makes sense.
[18:06:41] 1. not necessarily a rev. An observation can be lots of things.
[18:06:48] 2. We're performing a set union
[18:06:58] 3. for overlapping IDs, we'll do a dict merge.
[18:07:44] One more thought: Why define a class? I'm genuinely curious what your thoughts are re. wrapping up these two functions in a class.
[18:13:44] Bah. Lost awight.
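
The three points about "union_merge_observations" (observations are dicts, a set union over IDs, a dict merge where IDs overlap) can be sketched as a single function. This is a minimal illustration and not the code in the data_utils branch: the signature, the `id_column` parameter, and the rule that later inputs win on conflicting keys are all assumptions.

```python
def union_merge_observations(*observation_sets, id_column="rev_id"):
    """Union several iterables of dict observations, keyed on `id_column`.

    Observations that share an ID are merged with dict.update(), so keys
    from later inputs overwrite keys from earlier ones (an assumed rule).
    """
    merged = {}
    for observations in observation_sets:
        for observation in observations:
            ob_id = observation[id_column]
            # setdefault() adds unseen IDs (the union); update() merges overlaps.
            merged.setdefault(ob_id, {}).update(observation)
    return list(merged.values())


# Hypothetical inputs: a human-labeled set and an autolabeled set.
human_labeled = [
    {"rev_id": 1, "damaging": True},
    {"rev_id": 2, "damaging": False},
]
autolabeled = [
    {"rev_id": 2, "autolabeled": True},
    {"rev_id": 3, "damaging": False, "autolabeled": True},
]

for observation in union_merge_observations(human_labeled, autolabeled):
    print(observation)
```

All state lives inside the function call, which is the single-function shape argued for later in the conversation; a class wrapper would only be needed if that state had to persist between calls.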
[18:23:59] Scoring-platform-team, Wikilabels, User-Ladsgroup: linting tests for wikilabels - https://phabricator.wikimedia.org/T171084#3463170 (Ladsgroup) https://github.com/wiki-ai/wikilabels/pull/196
[18:30:41] o/ awight
[18:30:58] https://pastebin.ca/3845613
[18:31:09] See conversation bits I think you missed.
[18:37:16] * halfak sends another highly negative review to some poor authors
[18:37:28] I'm going to get bitten by karma.
[18:37:32] But the science is important
[18:37:46] No drawing conclusions from crappy analysis!
[18:38:25] halfak: ohey thanks for the repost, I had in fact missed all that. laptop is annoying about saving power when I step away
[18:38:33] Down with conclusions!
[18:38:43] :D
[18:40:10] deduplicate revs: perfect, I’ll reuse existing file stuff, though maybe holding off until we decide whether the new utils are keepers.
[18:40:18] +1 union_merge_observations
[18:40:38] ah /phabricator add [doc] Glossary of terms
[18:41:53] Defining a class is my favorite thing in the world :p. Even if it were just a wrapper around the one function, it’s a win if we’re going to treat that function as a first-class entity, which I’m imagining is the next logical step here
[18:42:16] so that we can define each model’s transformation pipelines in a mini DSL
[18:44:51] halfak: Mostly unrelated question: I realized the Makefile doesn’t encode how we get from sampled_revisions to uploading into the Wiki Labels campaign. I’d like to see how that works. There seems to be variation, i.e. we aren’t just uploading “*sampled*” files each time. From what I can tell, we’ll sometimes craft the wikilabels inputs to tweak population rates, maybe to get more detail on rare cases.
[18:48:57] Back to why class though (and I’d be thrilled to learn of a better way to do these things). Cos state is useful even when shared between just two functions. And when we wrap stuff up as a class, we can make the contract more obvious, it’s a transformation stage with a “run(in, out)” method.
[18:49:08] I have an example of this strategy working pretty well recently...
[18:50:14] https://github.com/wikimedia/mediawiki-extensions-DonationInterface/blob/master/gateway_common/StagingHelper.php
[18:50:53] https://github.com/wikimedia/mediawiki-extensions-DonationInterface/blob/master/gateway_common/FiscalNumber.php
[18:51:17] It gives really clean encapsulation, IMO
[18:52:26] In Python, module-level encapsulation is pretty nice, and almost equivalent to class-level, but modules don’t have state
[18:52:58] and I would feel personally uncomfortable passing a module in a chain of transformations, for example.
[18:59:41] sorry for amending and pushing too much, travis is very stupid (and I am too)
[19:03:40] My long-term vision for the experiment build spec is kinda pervy… Each model draws its inputs from a small subgraph that describes the dependencies and data flow, so (QuarrySample(59580) -> WikiLabelsCampaign(fawiki, 6) -> AutoLabel()) plus (autolabeled -> UnionMerge() <- human_labeled) plus (union_merged -> CVTrainModel(tuning_params) -> TestModel()) etc. etc.
[19:04:05] subgraphs that overlap, i.e. models in a language, reuse common dependencies
[19:04:33] Amir1: I have no doubt you’ll eventually conquer Travis!
[19:05:13] awight: Thanks!
[19:06:27] Thanks for the upload and wiki gnoming, it’s looking fantastic with nirzar’s new graphics
[19:07:13] Hey! Stepped away for a bit. Reading scrollback.
[19:08:19] awight, all functions are first-class entities in python.
[19:08:45] re. Makefile and wikilabels, whenever we produce sampled balanced sets of needs_review True/False, then that is what is uploaded.
[19:08:56] Otherwise it's just the "needs_review": true subset.
[19:11:02] Re. capturing state in the class, I think it might be more obvious to just have one function that takes a sequence of dict "observations" that is called "merge_and_union"
[19:11:15] And then have state only exist within that function.
[19:12:06] I think that might be more obvious than "self.read_file()" just doing an in-place state change.
[19:12:12] halfak: Functions would work for the same purpose, no argument there. You could even implement the same idea in C using scalar variables. I’m just going off about why the extra wrapping paper is so cozy.
[19:12:24] thx for needs_review tips, that answers it for me.
[19:12:53] Right. I'm talking about why the extra wrapping paper is surprising to me and why I don't find it as obvious as you do.
[19:13:07] But am learning about why you like it too :)
[19:13:37] I’m fine with raw-dogging it as much as the next person
[19:14:03] but I guess now that I’ve added the potentially unnecessary baggage, it’s a low priority to clean that up :)
[19:14:21] probably a ten-line change when we get to it though
[19:14:33] meanwhile, I will reap the benefits of statefulness :p
[19:15:28] to your point, I like a single function too, being functional is polite cos your caller knows there’s no more interaction expected
[19:17:48] yeah read_file is extra messy, and is poorly named
[19:17:56] Right ^ I feel this deeply.
[19:18:07] I WON
[19:18:11] Travis is happy
[19:18:12] If it can be a function, it should. If it is cumbersome to pass things around, consider a class.
[19:18:13] yeaaaaaah
[19:18:15] \o/ Amir1
[19:18:31] I love the new logo in https://phabricator.wikimedia.org/tag/scoring-platform-team/
[19:19:58] Oh yeah! Looks sharp
[19:20:13] I love that little oval shadow.
[19:20:18] It adds so much for something so simple.
[19:20:22] halfak: I’ve gotten some mileage out of a pattern where the module has global functions that define its interface to the outside, but internally the module deals with encapsulated objects. It might even return one of these objects in response to a public API.
[19:20:40] halfak: shadow is key to friendliness
[19:20:46] Do not trust anyone without a shadow
[19:21:53] awight, +1.
[19:22:11] E.g. https://github.com/wiki-ai/revscoring/blob/master/revscoring/dependencies/functions.py
[19:24:53] halfak: That is dope. I’ve been relishing the thought of diving into Dependency but hadn’t gotten the chance yet.
[19:25:55] It was really satisfying to put that together. It could totally be a separate library.
[19:26:21] halfak: +1 doit
[19:26:29] halfak: What do you think about context and cache...
[19:26:38] Those are perfect candidates to become state, IMO
[19:27:41] functional is maximum explicitness, but for a function like _expand_many, which never touches either var, it seems a shame to be passing that boilerplate around
[19:29:49] One thing I haven’t figured out, which your code does well, is how to avoid ActiveRecord and pare classes down to their container responsibilities.
[19:30:11] awight, they come from the user.
[19:30:15] e.g. https://github.com/wiki-ai/revscoring/blob/master/revscoring/dependencies/dependent.py
[19:30:27] cache and context
[19:31:03] yes
[19:31:06] E.g. here's a user of solve() https://github.com/wiki-ai/revscoring/blob/master/revscoring/extractors/api/extractor.py
[19:31:49] Here, the painful state was "session"
[19:32:20] Which https://github.com/wiki-ai/revscoring/blob/master/revscoring/dependencies/context.py makes easy to wrap up
[19:32:48] If you're going to keep some "context" and reuse it a ton, then use a Context to do that :)
[19:33:30] yeah that’s awesome. Make state look small
[19:36:42] halfak: https://github.com/wiki-ai/wikilabels/pull/196 is waiting for your review :D
[19:37:57] Ewww. Tabs in our CSS...
[19:38:23] halfak: To a much earlier comment, I’m not attached to “DataNormalize”, but not sure what else to call it. Here it’s used to cast int to bool, but since we’re just passing first-class things and eventually execute as a function, we could do much more interesting things to each column using DataNormalize.
[19:39:13] awight, oh I missed that part. Hmm... We could just use sed in our Makefile.
[19:40:06] lol yeah sure but for 0.4s more processing time we have the world
[19:40:20] sed 's/"damaging": 1,/"damaging": true,/' | sed 's/"damaging": 0,/"damaging": false,/'
[19:40:28] haha I can’t believe you have me arguing for *more* complexity, this isn’t the dynamic I’m used to.
[19:40:37] hahaha :D
[19:41:37] I thought this was kinda sick, https://github.com/wiki-ai/revscoring/compare/data_utils#diff-d2fbd6f420cd1d5a38e271bff2bddb3cR30
[19:42:10] We could add anything we need in there, reject or warn about invalid data, etc.
[19:42:18] halfak: Thanks@
[19:42:23] *!
[19:59:46] The good news is, ^ those miniscripts smoke tested OK
[20:00:15] I’ll try to kick off the fiwiki experiment this weekend so the results are ready on Monday 8d
[20:02:43] Cool :D
[20:03:24] Scoring-platform-team-Backlog, editquality-modeling, artificial-intelligence: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455#3463254 (awight)
[20:03:40] awight, I wonder if we should bring back the --label-type argument.
[20:03:48] Since that's the only field we really want to normalize.
[20:04:15] Normalizing and even validating rev_id seems prudent
[20:04:21] Arg but then again we'd need to make sure everything has a "--label-type" argument.
[20:05:11] I've done something like this before. E.g. https://github.com/mediawiki-utilities/python-mwxml/blob/master/mwxml/utilities/normalize.py
[20:05:29] It takes all old formats and updates them to the most recent schema.
[20:05:46] It doesn't touch any fields that the schema doesn't talk about.
[20:05:58] I guess I appreciate this pattern.
[20:06:05] Oh! One more note. Be careful of bool.
[20:06:10] I think it will work in this case.
[20:06:18] but bool("false") == True
[20:06:30] Cool, except we should always treat columns dynamically, not hardcode like in that mwxml
[20:06:38] and bool("0") == True
[20:06:48] harr good call
[20:06:55] awight, if we're working within the editquality repo, I think we can hard-code.
[20:07:04] But yeah, +1 for revscoring not hardcoding.
[20:07:06] I shouldn’t be such a bool() in a china shop
[20:07:09] Too much potential variation
[20:07:12] lol
[20:07:29] Actually, now that I think of it, I think normalize() should be in editquality.
[20:07:33] I was headed for revscoring, fwiw
[20:07:33] mmm?
[20:07:46] haha now you’re getting into “I should be paid to talk about this crap” territory :p
[20:07:56] wat
[20:08:33] It sounds like we’re about to make actual decisions about what the responsible path forward is.
[20:08:37] Totally weekday talk
[20:08:55] But I’m still curious, why in editquality?
[20:09:38] because we only have this problem in editquality now.
[20:10:07] I see what you mean re. weekday talk.
[20:10:09] ok fair enough
[20:10:16] I'm used to doing serious work during my office hours ^_^
[20:10:36] Yikes!
[20:12:23] :D This is often the best time for volunteers :) If I'm ever late to work on a weekday, I don't feel bad about it ;)
[20:12:47] Ouch, python on macos is not as fun as it should be. Getting errors from https://github.com/cloudmatrix/esky/commit/f2ef69a91e0b9ec5ef1a0cfe2ceacba6310fb54e
[20:12:49] I can’t watch
[20:13:28] What am I looking at here?
[20:13:33] don't...
[20:13:47] I’m idly wondering why my python3 can’t do squat
[20:13:55] and those lines are crashing for me
[20:14:01] What do you mean.
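
The bool() warning earlier in this stretch is worth a concrete sketch: in Python any non-empty string is truthy, so bool("false") and bool("0") both evaluate to True. Below is one hedged way a label normalizer could parse values strictly instead. The names `parse_bool` and `normalize_label` and the set of accepted spellings are assumptions for illustration, not the DataNormalize code from the branch.

```python
TRUE_STRINGS = {"true", "1"}
FALSE_STRINGS = {"false", "0"}


def parse_bool(value):
    """Convert 0/1, "0"/"1", "true"/"false" or a real bool to bool; reject the rest."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in TRUE_STRINGS:
            return True
        if text in FALSE_STRINGS:
            return False
    raise ValueError("Cannot interpret {!r} as a boolean".format(value))


def normalize_label(observation, label_name):
    """Return a copy of the observation with one label field cast to bool.

    Fields other than `label_name` are left untouched.
    """
    normalized = dict(observation)
    if label_name in normalized:
        normalized[label_name] = parse_bool(normalized[label_name])
    return normalized


print(bool("false"))        # True: the pitfall
print(parse_bool("false"))  # False
print(normalize_label({"rev_id": 5, "damaging": 1}, "damaging"))
```

Raising on unrecognized values is one option for the "reject or warn about invalid data" idea mentioned above; logging a warning and skipping the observation would be another.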
[20:14:05] * halfak <3's python 3
[20:14:14] Python is busted on my new laptop
[20:14:29] Ohhh
[20:14:30] I’m working around that but would like to fix one day soon
[20:16:00] * awight hollers “make models/fiwiki.damaging_w_flaggedrevs.gradient_boosting.model” and drops the mic
[20:16:15] \o/
[20:16:25] I'm also ready to drop the mic on my reviews.
[20:16:32] And go outside for a bit.
[20:16:40] have a good rest of your day!
[20:16:41] o/
[20:16:51] enjoy. I’m gonna finish up my last metalwork project before moving out of my studio
[20:17:21] Fancy one of these, https://www.etsy.com/listing/163370199/no-wood-wrought-iron-bookshelf?ref=shop_home_active_6
[20:17:44] * awight submits Conflict of Interest form section B12-2