[12:38:45] (CR) Ladsgroup: "I will work on it in later patches." [extensions/ORES] - https://gerrit.wikimedia.org/r/400625 (https://phabricator.wikimedia.org/T181892) (owner: Ladsgroup)
[14:51:01] o/ Nice to be back!
[15:04:30] o/
[15:07:32] * halfak starts in on the email
[15:07:46] Oh! I have fiwiki models.
[15:07:49] * halfak pokes at that
[15:07:54] nice work
[15:08:15] PROBLEM - puppet on ORES-worker08.experimental is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:10:20] I’m going to push our logging changes at 18:00 UTC
[15:12:08] awight, sounds good. :)
[15:12:14] fiwiki is WEIRD
[15:12:24] PROBLEM - puppet on ORES-redis02.experimental is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:12:30] * halfak starts working on his assumptions.
[15:12:43] In what way? Test stats?
[15:13:08] This is the model with FlaggedRevs mixed in, right?
[15:13:28] goodfaith model gets high ROC-AUC and damaging gets very low ROC-AUC
[15:13:31] right
[15:14:24] O_o
[15:15:31] Maybe we could use the FR data to train goodfaith, but not when training damaging… until the mystery is dispelled.
[15:18:08] hmm... There's definitely something strange going on here.
[15:21:31] Scoring-platform-team, editquality-modeling, artificial-intelligence: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455#3867303 (awight) @Halfak @Ladsgroup This seemed like a fruitful project, and my prototype is c. 50% complete. Is there a good t...
[15:37:46] RECOVERY - puppet on ORES-worker08.experimental is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[15:41:54] RECOVERY - puppet on ORES-redis02.experimental is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[15:44:22] ^ puppet show
[15:46:05] lol
[15:49:58] :D
[15:50:02] So many emails.
[15:54:19] awight: hey, can you review this patch soon? https://gerrit.wikimedia.org/r/#/c/400183/ It would be great if I can merge this before the branch cut (in a couple of hours)
[15:55:24] Amir1: Will do!
[15:55:37] Thank you!
[16:01:47] Amir1: Do you have time to write a test that integrates using the top-level API response?
[16:02:23] I think I should do it in another patch, especially since I will be moving this method around
[16:02:33] kk reviewing
[16:02:59] awight: I have integration tests for the API up and working: https://gerrit.wikimedia.org/r/#/c/400623/
[16:03:16] awight & Amir1: Do we need to do anything to get simple english enabled in rc filters this week?
[16:03:34] halfak: It needs more beta testing.
[16:03:49] RCFilter pieces were failing intermittently.
[16:05:49] Amir1: Those tests still wouldn’t catch the key glitch we saw earlier...
[16:06:52] Which key glitch?
[16:07:23] The one where checkModelVersions expected the full API response in some places and a subtree in others
[16:07:28] gotcha. Thanks awight
[16:07:33] These tests would catch changes in the API hooks regarding the database query, as they do the round trip to the database and back
[16:07:56] Yeah, there are two different things
[16:09:16] Sorry to harp on this, but I still think it’s a good idea to log notices if the API response structure is surprising, rather than silently return null.
[16:10:11] I agree with your argument that it should be “safe”, but IMO it’s even safer to not crash, and also log any funkiness
[16:10:29] * awight shakes off holiday conservativism
[16:10:49] anyway, yeah let’s get this thing out and see what it does. It does look safe.
[16:13:16] o/
[16:14:19] (CR) Awight: [C: -1] "Lacking a call to updateModelVersion" (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[16:15:09] O/
[16:15:38] (CR) Awight: [C: -1] "This or a future patch should include a test that integrates across getScores, and checks that we've updated the model version if necessar" [extensions/ORES] - https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[16:18:02] (PS5) Ladsgroup: Update model version when it's different in Scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468)
[16:19:01] (CR) Awight: [C: 2] "Great, concise!" [extensions/ORES] - https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[16:19:36] (CR) Ladsgroup: "I will write integration tests for that ASAP." [extensions/ORES] - https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[16:24:15] (Merged) jenkins-bot: Update model version when it's different in Scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[16:46:02] Amir1: Aw man so I'm working on what you said for normalizing the tag schema and realizing two things
[16:46:33] 1) This really makes the change_tag_statistics table the table that defines tags (the ID->name mapping) and so it should really be called change_tag... but that table name is already taken
[16:46:45] 2) There is a valid_tag table with basically no documentation as to what it does
[16:47:03] facepalm
[16:47:04] * RoanKattouw grumbles and goes off to do some code archeology into valid_tag
[16:47:26] For laughs, go take a look at the schema of the valid_tag table in tables.sql
[16:47:40] RoanKattouw: Yeah, I think the name should be changed to something more meaningful but I can't find a proper name either
[16:48:25] If there isn't some kind of crazy reason why we can't, I think my preferred approach would be to expand the valid_tag table to contain both the ID+name mapping and the stats
[16:48:36] But first I have to figure out WTH that table is used for now
[16:48:41] https://github.com/wikimedia/mediawiki/blob/master/maintenance/archives/patch-valid_tag.sql
[16:48:48] That's amazing
[16:49:09] enwiki.valid_tag contains 5 rows so that's not very promising
[16:53:02] O_O burn it!
[16:54:03] Yeah I probably can
[16:54:15] halfak: I wrote a Model class that holds a collection of these classifiers, it works perfectly but the scoring seems to have taken a hit, scoring one instance with 40 diff classifiers taking 6s
[16:54:25] But dinner beckons, so I'll eat first, then figure out whether I can burn it
[16:54:25] I'm looking for a way to optimize right now
[16:54:53] But yes it basically looks useless
[16:55:00] RoanKattouw: We checked and it's used in ChangeTags.php but not sure why the methods are orphaned
[16:56:21] RoanKattouw: On another topic, would you feel like talking through our wacky JADE-in-ContentHandler ideas some time, for maybe 30min?
[16:59:34] awight: for when you have some free time: https://gerrit.wikimedia.org/r/#/q/owner:Ladsgroup%2540gmail.com+status:open+project:mediawiki/extensions/ORES
[17:00:49] Cool!
[17:03:25] codezee, damn. I figured that might be a problem. I'm still surprised that it takes 6 seconds. That seems like a really long time.
[17:03:44] Is all the time spent waiting on estimator to return a predict_proba()?
[17:04:44] codezee: halfak: Random thing I’ve been wondering is, what volume of new articles will we be scoring and does the scoring latency matter?
[17:05:11] ^ good point. Maybe it's OK to score an article once every second.
[17:05:17] I don't think that once every 6 will work
[17:05:23] awight: yes I'm not sure about that, and if we score a bunch of articles together we could gain on that front
[17:06:19] halfak: do we not fire parallel scoring requests in ORES currently?
[17:06:46] codezee, we do not. That might be preventative, but maybe it is worth a shot.
[17:06:58] codezee, oh wait. Yes parallel.
[17:07:18] I was thinking it's trivial if the requests are independent and the requester has the patience to wait for 6s
[17:07:18] But I was just thinking, what if you did parallelization within the prediction itself.
[17:07:27] https://en.wikipedia.org/wiki/Special:NewPagesFeed suggests it’s 1-2 per minute
[17:08:03] halfak: I'm trying to do exactly that but when I used multiprocessing it took 54s! and 52s of that were stuck in acquire_lock
[17:08:07] something I'm missing here
[17:08:26] codezee, let's not do that. Now that I think of it, ORES will fork bomb if that happens.
[17:09:59] halfak: the real problem is with the number of estimators per classifier, currently 400, if we drop that to 50 while retaining the fitness we gain 8 times, I've generated results with n_estimators as 50, looking into them
[17:12:15] OK that sounds good :)
[17:15:04] oh, clearly that hypothesis holds, with n_estimators as 50, it takes 1.3s \o/ we just need to balance off n_estimators till the limit we can
[17:17:45] (CR) Awight: [C: 2] Introduce ScoreStorage and its Sql implementetion (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/398651 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[17:19:28] (Merged) jenkins-bot: Introduce ScoreStorage and its Sql implementetion [extensions/ORES] - https://gerrit.wikimedia.org/r/398651 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[17:20:09] and looks like the results aren't even that much affected
[17:20:43] (CR) Awight: [C: 2] "Thanks!" [extensions/ORES] - https://gerrit.wikimedia.org/r/400623 (https://phabricator.wikimedia.org/T182942) (owner: Ladsgroup)
[17:22:01] Scoring-platform-team (Current), ORES, Operations, Graphite, User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3867919 (Halfak) @fgiunchedi, can you help me figure out what our next step should be here?
[17:22:09] (Merged) jenkins-bot: Integration tests for API [extensions/ORES] - https://gerrit.wikimedia.org/r/400623 (https://phabricator.wikimedia.org/T182942) (owner: Ladsgroup)
[17:23:54] awight: Sure! But let's schedule a time. I wanna make progress on the tag stuff today, and tomorrow I have a lot of meetings
[17:24:05] RoanKattouw: k that sounds best to me, too.
[17:24:18] I’ll pencil something in for Thursday, feel free to move it.
[17:24:23] Sounds good
[17:24:32] Actually Wed and Thu are equally bad so feel free to use either
[17:28:12] As for tags: some docs seem to claim that only tags defined in the valid_tag table can be used, but that's clearly not true
[17:28:15] lol next week is fine, too.
[17:29:46] Hmm but now you've moved it to 8am :(
[17:29:57] RoanKattouw: Oops, yeah I see you’re back on Pacific time by then.
[17:30:08] Yeah sorry
[17:30:19] gcal's TZ feature is nice but gets a bit confusing when people travel
[17:30:25] I'm on CET this week and PST next week
[17:30:57] I'd rather have it this week at a time that I'm awake than next week at a time that I'm wishing I were still asleep ;)
[17:31:32] It’s not a huge rush, cos MediaWiki is explicitly not happening in the current iteration of our project.
[17:31:39] *MediaWiki integration
[17:31:57] halfak: before I push this major model addition, I want to check this - I've added a new Model class EnsembleClassifier(ProbabilityClassifier) that does everything and made a new RandomForestEnsemble as its subclass, rather than just inserting this functionality in ProbabilityClassifier, does that sound okay?
[17:32:05] RoanKattouw: Your calendar is lamentable :p
[17:32:12] Yeah, I know :/
[17:32:26] Comes with the territory of being on the annual planning core group this year, I guess
[17:32:47] That plus my volunteering engagement eats up a bunch of time, but that's mostly time that non-PST people can't have meetings anyway
[17:33:35] Can I ask what that is?
[17:33:58] codezee, only concern is that "Ensemble" is already taken as a word.
[17:34:00] https://en.wikipedia.org/wiki/Multi-label_classification
[17:34:17] I'm not sure we have a one-vs-rest or one-vs-all classifier.
[17:36:11] awight: Certainly! https://medium.com/@Srish_Aka_Tux/volunteering-at-scripted-what-i-knew-taught-and-learned-b8174545b8d0 (not with Srishti, I only found out she was doing the same thing today when she published this, but she and I do the same work in different schools)
[17:36:16] codezee, I think it's One-vs-the-rest
[17:36:22] awight: Also, more calendar lamentations, I had to decline again, sorry
[17:37:14] halfak: any difference b/w one-vs-rest and one-vs-all?
[17:37:43] Yes. One vs. all is multiclass and one vs. rest is multilabel, it seems.
[17:44:06] RoanKattouw: Maybe you can find a slot? halfak and I normally work c. 14:00-23:00 UTC
[17:44:17] OK will do
[17:44:28] The one you had before, on Thursday the 4th, would be fine actually
[17:44:45] codezee, actually, it seems like one-vs-all and one-vs-rest are the same. I'm unclear on that :/
[17:45:10] yes I'm also referring to the docs and they seem to be the same
[17:45:29] anyways I think it wouldn't harm to go with one, I'm going with onevsrest
[17:46:04] Cool. :)
[17:46:05] RoanKattouw: Sounds good, thx. ScriptEd looks fun! I’ve been hoping to get into exactly that sort of unpaid work, I’ll check that out if I can ever afford to move back to the U.S. ;-)
[17:47:55] RoanKattouw: FYI your “contributions leadership” conflicts
[17:48:08] NVM
[17:48:11] wrong meeting
[18:13:17] (CR) Awight: Clean up ThresholdLookup (3 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/400625 (https://phabricator.wikimedia.org/T181892) (owner: Ladsgroup)
[18:19:29] wiki-ai/revscoring#1403 (ensemble - 2759b3d : Sumit Asthana): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/324230380
[18:22:18] Drafttopic coming soon.... :D
[18:23:57] awight, you're a saint for letting me get some lunch before we chat.
[18:24:10] halfak: lolol better outcomes
[18:24:42] Any time, really. It’s pouring cats and dogs on the tin roof I’m under, it might not let up within the next 30 min actually
[18:25:56] Scoring-platform-team (Current), ORES, Operations: Investigate why ORES logs are being written to syslog despite explicit logging config. Fix. - https://phabricator.wikimedia.org/T182614#3868235 (awight) A little additional excitement... Now that we're seeing all the logs, some previously hidden er...
[18:28:28] ^ I was reading `ls` wrongly :)
[18:35:03] aaand… power went out to the town
[18:35:04] tiny little lightning storm.
[18:35:10] *Get a ground pin, y’all*
[18:38:57] Scoring-platform-team: Jinja error in ORES - https://phabricator.wikimedia.org/T183949#3868333 (awight)
[18:39:11] Scoring-platform-team, ORES: Jinja error in ORES - https://phabricator.wikimedia.org/T183949#3868344 (awight)
[19:03:25] Scoring-platform-team, Collaboration-Team-Triage, Edit-Review-Improvements: "Hide probably good edits" should not hide my own edits on Special:Contributions/Myself - https://phabricator.wikimedia.org/T182462#3868633 (jmatazzoni) The answer here is to turn the feature off locally, on the Contributions...
[19:03:39] Scoring-platform-team, Collaboration-Team-Triage, Edit-Review-Improvements: "Hide probably good edits" should not hide my own edits on Special:Contributions/Myself - https://phabricator.wikimedia.org/T182462#3868639 (jmatazzoni) Open>Resolved a:jmatazzoni
[19:13:23] Ugh this change_tag ID stuff is going to be a bit of a pain
[19:13:43] There are unique indexes on (ct_rc_id,ct_tag) etc
[19:13:59] So migrating from ct_tag to ct_tag_id is going to be pretty annoying
[19:16:10] I think what we have to do is not have indexes on (ct_rc_id, ct_tag_id) et al initially, and only introduce them once ct_tag_id is populated
[19:16:24] But until then we also have to keep ct_tag populated for the unique index to keep working
[19:16:25] sigh
[19:16:36] DB migrations are hard
[19:29:40] (PS5) Ladsgroup: Clean up ThresholdLookup [extensions/ORES] - https://gerrit.wikimedia.org/r/400625 (https://phabricator.wikimedia.org/T181892)
[19:30:22] (CR) Ladsgroup: Clean up ThresholdLookup (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/400625 (https://phabricator.wikimedia.org/T181892) (owner: Ladsgroup)
[19:38:49] (PS1) Ladsgroup: Remove maintenance/CheckModelVersions.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401592 (https://phabricator.wikimedia.org/T183468)
[19:42:19] (CR) Awight: [C: 2] Clean up ThresholdLookup [extensions/ORES] - https://gerrit.wikimedia.org/r/400625 (https://phabricator.wikimedia.org/T181892) (owner: Ladsgroup)
[19:44:47] (Merged) jenkins-bot: Clean up ThresholdLookup [extensions/ORES] - https://gerrit.wikimedia.org/r/400625 (https://phabricator.wikimedia.org/T181892) (owner: Ladsgroup)
[19:46:36] (CR) Awight: [C: 2] Remove maintenance/CheckModelVersions.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401592 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[19:49:05] (Merged) jenkins-bot: Remove maintenance/CheckModelVersions.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401592 (https://phabricator.wikimedia.org/T183468) (owner: Ladsgroup)
[19:53:32] arg. I somehow got a different roc-auc for true and false in the goodfaith model for fiwiki
[19:53:37] should not be possible.
[19:53:40] * halfak looks into it.
[19:56:16] WTF >:(
[19:56:21] Doesn't happen with any other models.
[19:56:37] Maybe it's because of something we did with RandomForest in recent revscoring code.
[19:56:38] hmmm
[19:58:03] Looks like it predicts 100% True all the time.
[20:08:29] Ohhh... Seems like we have a rounding error here.
[20:08:30] Hmm.
[20:10:02] Just saw this. Looks fun!
[20:10:15] Anything I can write tests for?
[20:10:50] I just cleared my plate of “hard” coding tasks
[20:11:39] err, not the best choice of adjective. I mean to say, the remaining stuff is reading + thinking rather than coding.
[20:13:17] I'm not sure. Let me talk it out.
[20:13:32] So we set up threshold_ndigits so that we could limit the number of thresholds that we report statistics for.
[20:13:42] This is primarily because it was taking up too much space.
[20:14:13] By limiting thresholds to ndigits (3 by default) we limit the number of rows of stats substantially.
[20:14:45] For most models, this means that we'll have ~500 - 1000 rows of data.
[20:15:11] With the fiwiki model, the damaging/not-goodfaith case is so uncommon that the estimator gives a very low likelihood prediction.
[20:15:31] This is because we are boosting 20k representative observations with 200k known good observations.
[20:15:58] This vanishingly small likelihood estimate has useful increments smaller than 0.001
[20:16:04] So the rounding breaks the statistics.
[20:16:22] Options I see (1) find a better way to generate a small-ish set of useful thresholds.
[20:16:36] (2) round to more digits for this model specifically.
[20:17:00] I think that's it. I can't think of a better option.
[20:17:02] awight, ^
[20:17:39] IMO (2)
[20:17:42] for now, at least.
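The rounding problem described above can be sketched in a few lines. This is a minimal illustration only, assuming made-up scores; the helpers `distinct_thresholds` and `recall_at` are hypothetical names and are not revscoring code. It shows how rounding to 3 digits collapses every very small probability onto a single threshold, while 5 digits keeps them apart.

```python
# Hypothetical illustration of the threshold_ndigits rounding issue.
# None of these names come from revscoring; the scores are made up.


def distinct_thresholds(scores, ndigits):
    """Return the sorted set of thresholds left after rounding each score."""
    return sorted({round(s, ndigits) for s in scores})


def recall_at(threshold, scores, labels):
    """Fraction of positive observations whose score is >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y]
    return sum(s >= threshold for s in positives) / len(positives)


# Made-up likelihood estimates, all well below 0.001.
scores = [0.00002, 0.00005, 0.0001, 0.0002, 0.0004]
labels = [False, False, False, True, True]

# With ndigits=3 every score rounds to 0.0, so only one threshold survives.
coarse = distinct_thresholds(scores, ndigits=3)

# With ndigits=5 each score keeps its own threshold.
fine = distinct_thresholds(scores, ndigits=5)

print(coarse, len(fine))
print(recall_at(0.0, scores, labels), recall_at(0.0003, scores, labels))
```

With the single surviving threshold of 0.0, everything counts as a positive prediction, so no statistic can vary across thresholds; the finer rounding restores thresholds at which recall actually changes.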
[20:18:09] I’m just wondering what that actually looks like in the number of stats rows, though.
[20:18:20] Oh about 10
[20:18:22] Maybe 20
[20:18:24] heh
[20:18:30] Assuming we set ndigits=5, though
[20:18:38] Ahh yeah. Let's see.
[20:18:55] We get a long tail of almost nothing, then 10,000 useless data points at the very edge?
[20:19:04] s/useless/nearly redundant/
[20:19:34] Not clear to me.
[20:19:54] I think that if we want to do (1), we're going to want some information theoretic measure of the usefulness of a threshold.
[20:20:04] E.g. how much did the statistics change at this increment.
[20:20:22] I like recall specifically for this because it's not very sensitive to frequency of the positive class.
[20:20:22] Yeah or less sophisticated, a 2d line graph compression algo
[20:20:54] Arguably a threshold is only interesting if it includes more stuff.
[20:21:12] Oh wait. It's also useful if it excludes stuff while still including the same amount of stuff.
[20:21:13] Hmm.
[20:25:02] halfak: There’s something I’m not understanding. Are we limiting ndigits in the inputs or outputs?
[20:25:18] Cos it seems that limiting just the output precision would give us what we want.
[20:25:18] Both.
[20:25:35] Right. We generate statistics based on what we publish.
[20:25:45] But we could generate statistics and only publish a subset of thresholds.
[20:25:57] Still we have the problem that it's hard to pick a useful threshold for fiwiki.
[20:25:59] :|
[20:26:01] Arbitrary precision on the input dimension, but limited precision for the output might give us a high-fidelity curve with finite data size.
[20:26:03] (PS1) Ladsgroup: Fully deprecate Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334)
[20:26:06] haha that’s for real though.
[20:28:03] OK I have code that I think will work. Re-CV-ifying
[20:30:18] (CR) jerkins-bot: [V: -1] Fully deprecate Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[20:30:19] ^ The final patch to deprecate the most horrible class in ORES extension is now up ^_^
[20:30:27] \o/
[20:35:34] (PS2) Ladsgroup: Fully deprecate Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334)
[20:40:24] wiki-ai/revscoring#1405 (threshold_ndigits_option - 0b4d664 : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/324277893
[20:43:51] Okay! And away we go!
[20:44:17] (CR) Awight: [C: 2] "I didn't even feel the surgery happen!" (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[20:45:57] (Merged) jenkins-bot: Fully deprecate Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[20:47:35] Gotta do grocery things, back in a few hours.
[20:48:25] Good time for me to take a break too so I'll be AFK for about 30 mins. Time to pedal!
[20:51:29] (PS1) Ladsgroup: Follow up to I4246706 [extensions/ORES] - https://gerrit.wikimedia.org/r/401611
[20:51:42] (CR) Ladsgroup: "Done in https://gerrit.wikimedia.org/r/401611" (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[20:57:13] I'm calling it a day
[20:57:14] o/
[21:17:06] o/
[21:17:14] Have a good one, Amir1 :)
[23:03:40] OK I'm out of here.
[23:03:45] See ya, folks!
[23:24:22] (CR) Petar.petkovic: "This change causes:" [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)