[10:29:41] 10Scoring-platform-team (Current), 10ORES, 10Operations: Reboot oresrdb - https://phabricator.wikimedia.org/T189781#4067975 (10akosiaris) 05Open>03Resolved a:03akosiaris Indeed. Here it is https://wikitech.wikimedia.org/wiki/Incident_documentation/20180314-ORES. Anyway, let's close this for now and f...
[13:46:03] o/
[13:58:57] 10Scoring-platform-team (Current), 10ORES, 10Operations: Reboot oresrdb - https://phabricator.wikimedia.org/T189781#4068740 (10Halfak) Great! Thank you.
[14:33:16] 10Scoring-platform-team (Current): Drafttopic estimators take very less time but tune hangs up forever - https://phabricator.wikimedia.org/T190288#4068892 (10Sumit)
[14:33:33] 10Scoring-platform-team (Current): Drafttopic estimators take very less time to train but tune hangs up forever - https://phabricator.wikimedia.org/T190288#4068903 (10Sumit)
[14:34:40] 10Scoring-platform-team (Current): Drafttopic estimators take very less time to train but tune hangs up forever - https://phabricator.wikimedia.org/T190288#4068892 (10Halfak) I wonder if you could figure out where the hangup is happening by adding "--debug" to the tune utility call.
[14:36:08] 10Scoring-platform-team (Current): Drafttopic estimators take very less time to train but tune hangs up forever - https://phabricator.wikimedia.org/T190288#4068907 (10Sumit)
[14:36:23] 10Scoring-platform-team (Current): Drafttopic estimators take very less time to train but tune hangs up forever - https://phabricator.wikimedia.org/T190288#4068892 (10Sumit) >>! In T190288#4068904, @Halfak wrote: > I wonder if you could figure out where the hangup is happening by adding "--debug" to the tune uti...
[14:36:50] awight: hi, is JADE still in development or do you plan to roll it out to production soon?
[14:39:54] Hi Hauskatze! We just deployed to Beta.
[14:40:12] I don't think JADE will be in "production" very soon. We're pretty understaffed at the moment.
[14:40:42] ok thanks :) - I'll keep posting true/false positives on wiki for now then
[14:40:49] Hauskatze, we're working with a designer to develop the UI that people will use to submit reports from on-wiki
[14:40:56] Would you be interested in helping support that work?
[14:42:02] We're just starting the process of building user-scenarios. Your onwiki work would help make sure we get them right.
[14:42:26] Also, thanks for filing the issues on-wiki. Is the list getting big? I'd be interested in discussing any trends you are seeing.
[14:42:57] Well, I'd love to help, but I don't have any coding knowledge beyond very basic-ish stuff
[14:43:03] not very big atm
[14:43:12] not even big at all
[14:43:18] just 4 entries for eswb
[14:43:54] Hauskatze, right now, we need insight from a real user about what makes sense :)
[14:44:04] No programming -- just imagination and experience.
[14:44:19] Good to know re. reports. Let me know when you think you see something worth discussing :)
[14:44:35] Will do. Thanks for all your work.
[14:45:13] when i'm running cv_train manually it's certainly not 1.5 minutes
[14:45:36] codezee, ?
[14:48:01] halfak: when i started cv_train with 200 estimators it's certainly taking more than 5 minutes, something's wrong with that script
[14:48:24] codezee, is it actually training a model?
[14:48:24] i'm now running with all the debugging and stuff on terminal, rather than redirecting it to /dev/null
[14:49:34] oh, i see the problem: there was an invalid argument so it was exiting, running again
[14:50:02] :)
[14:50:18] going down for a reboot. BRB
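For context on the tuning-runtime question above: a single cv_train-style run fits one parameter set, while a tune-style hyperparameter search fits one model per fold per candidate setting, so the total work multiplies quickly. The sketch below is not the revscoring tune utility; it is a generic scikit-learn illustration (the model, grid, and data sizes are made up) of why the 200-estimator search can appear to hang.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for extracted features (sizes are arbitrary).
X, y = make_classification(n_samples=5000, n_features=40, random_state=0)

# A single cv_train-style run: one parameter set, one fit.
GradientBoostingClassifier(n_estimators=200).fit(X, y)

# A tune-style run: every parameter combination is fit on every CV fold,
# so 3 x 2 x 5 = 30 fits here (plus one final refit), and each
# 200-estimator fit costs roughly 4x a 50-estimator fit.
grid = {"n_estimators": [50, 100, 200], "learning_rate": [0.1, 0.01]}
search = GridSearchCV(GradientBoostingClassifier(), grid, cv=5, verbose=2)
search.fit(X, y)
print(search.best_params_)
```

Here verbose=2 serves the same purpose as the "--debug" suggestion above: it shows each fit starting, so you can tell the process is grinding through combinations rather than actually hung.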
[14:50:30] 10Scoring-platform-team (Current): Investigate runtime of tune with high number of estimators - https://phabricator.wikimedia.org/T190288#4068995 (10Sumit)
[15:04:08] wiki-ai/editquality#213 (simplify_template - e055679 : Adam Wight): The build was broken. https://travis-ci.org/wiki-ai/editquality/builds/356391240
[15:35:59] halfak: If an observation is autolabeled as needs_review, but no human labeling happened, we should: (1) discard the observation, and (2) tally how many rows are like this, right?
[15:36:21] Currently, instead of (1), we assume defaults, which seems wrong to me.
[15:36:38] defaults=(good faith, not damaging)
[15:38:59] awight, +1
[15:39:25] cool. Training to get model_info signal
[15:39:59] * awight UGHs at repeated extraction runs which could somehow be cached.
[15:42:51] awight, oh! you can :)
[15:43:17] * awight holds breath
[15:43:22] sec
[15:44:01] OK! so. the extract utility should use old cache values if they are present.
[15:45:16] * halfak digs
[15:45:21] It can't look in its own output file, cos that's deleted on ">"
[15:45:30] https://github.com/wiki-ai/revscoring/blob/master/revscoring/utilities/extract.py#L205
[15:45:33] Oh
[15:51:47] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10Patch-For-Review, 10Regression, and 2 others: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#4069229 (10Trizek-WMF)
[16:00:33] wiki-ai/editquality#217 (simplify_template - 8ee96d0 : Adam Wight): The build was broken. https://travis-ci.org/wiki-ai/editquality/builds/356413225
[16:05:31] halfak: https://phabricator.wikimedia.org/P6872
[16:05:56] That's the damaging model, somehow everything looks a hair lower to my untrained eye.
[16:06:03] Running the goodfaith training...
[16:06:10] Looks pretty good though
[16:06:30] Damn that macro-pr_auc
[16:06:35] I wish we had that for enwiki
[16:07:01] Looks slightly better to me
[16:07:19] Probably insignificant statistically speaking
[16:07:54] okay, this is a valid way to do merge_labels, then...?
[16:08:05] My headphones finally bit the dust. These travel headphones have seen a lot of use.
[16:08:19] I think so awight. Let's stick with it.
[16:08:28] It should solve huwiki
[16:08:36] Oh hey -- what do we do in the case where there's a goodfaith label but no damaging label?
[16:08:44] YAGNI intersection, I suppose
[16:09:15] halfak: https://github.com/wiki-ai/editquality/pull/144/commits/334c412565c0a06e44f0d70ce9cd78d260344590
[16:12:38] halfak: The goodfaith model_info looks like a bigger change, https://phabricator.wikimedia.org/P6874
[16:14:00] awight, how did the false prop fall for goodfaith?
[16:14:12] Shouldn't it have only increased?
[16:14:24] I was just gonna say. Only count("true") should have dropped
[16:14:25] Since we aren't blindly applying the goodfaith=true default?
[16:14:43] * halfak thinks
[16:14:51] I'll grep through the files
[16:17:55] The master and simplify_template branches have the same count of goodfaith=true rows in labeled_revisions.
[16:18:39] halfak: with 50 estimators it took 45min real time and 60min user time, so i'm expecting it to blow off till it reaches 200 :/
[16:19:09] "blow off"?
[16:19:20] i mean take "a lot of time"
[16:19:30] gotcha :)
[16:19:34] It works while you sleep
[16:19:53] I don't understand why the model_info shows lower numbers of observations than the labeled_revisions file. Is there another filter that would throw out rows?
[16:23:11] I don't think so.
[16:23:20] Oh! Wait maybe in feature extraction?
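The rule agreed at 15:35 above is: an observation that was autolabeled as needing review but never received a human label should be discarded (and counted), not given the (goodfaith=true, damaging=false) defaults. A minimal sketch of that filter, assuming newline-delimited JSON observations with hypothetical needs_review, damaging, and goodfaith fields rather than the actual editquality schema:

```python
import json
import sys

kept, dropped = [], 0
for line in sys.stdin:
    obs = json.loads(line)
    # Autolabeled as needing review, but no human ever supplied labels:
    # discard instead of assuming defaults.
    if obs.get("needs_review") and obs.get("damaging") is None and obs.get("goodfaith") is None:
        dropped += 1
        continue
    kept.append(obs)

for obs in kept:
    print(json.dumps(obs))
print("discarded %d unreviewed needs_review observations" % dropped, file=sys.stderr)
```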
[16:23:30] Are you looking at the rows in "w_cache"?
[16:23:48] If feature extraction fails for a row, it's discarded with a loud log message.
[16:23:56] no, w/o cache. That explains it, thanks!
[16:23:57] It's common for deleted pages/suppressed revisions.
[16:24:00] :)
[16:25:32] Interesting, now I see the right number of goodfaith=false, but there are 17712 goodfaith=true rows in the w_cache file, and the model_info reports 17592
[16:26:04] Doesn't make sense to me.
[16:26:12] ok, I'll dig around
[16:26:58] https://github.com/wiki-ai/revscoring/blob/master/revscoring/scoring/statistics/classification/counts.py#L12
[16:28:20] nice, ty
[16:28:47] I'm curious to learn how pop_rates are calculated.
[16:30:06] they come from the user
[16:30:36] +1 they're plugged into the config, but how do I calculate one?
[16:31:33] Oh, I start with the random sample, and then apply filters and algebra
[16:31:47] E.g. if we sampled 20k revisions
[16:31:53] And 4556 were labeled
[16:32:05] 452 of those labeled edits were "damaging"
[16:32:19] then prop-rate of damaging=true is 452/20k
[16:32:36] 0.02
[16:32:37] ok cool
[16:32:59] I'll make sure we have that documented...
[16:33:01] This doesn't account well for deleted revisions and the like, but that's a relatively minor issue.
[16:35:56] ooo nvm me. I've got a few copies of datasets, and was measuring the wrong one.
[16:36:01] model_info is sane.
[16:36:32] I'll try to account for the difference using new code, though.
[16:38:06] \o/
[16:38:10] me feels more sane again
[16:38:16] */me
[16:38:18] :D
[16:44:11] Okay, it got weirder
[16:44:38] I found duplicate rows in the master branch cswiki.labeled_revisions.20k_2016.json output
[16:45:51] e.g. rev_id 12424101
[16:46:39] wtf, that row is in master/cswiki.autolabeled_revisions.20k_2016.no_review.json and in master/cswiki.human_labeled_revisions.5k_2016.json as needing review
[16:49:23] awight, tgr changed the trusted groups
[16:49:26] see my note on his PR
[16:49:32] that changes the labeling by autolabel
[16:49:48] in cswiki, though?
[16:50:07] I'll rebuild all the cswiki datasets again, using master code...
[16:50:24] Oh. was thinking huwiki. I think the same concept still applies.
[16:50:30] Also users rights change
[16:50:39] We look at their rights at the time we run the script.
[16:51:08] Doing lunch and a meeting now. Will be back soon.
[16:51:27] kk
[16:51:29] Actually, it looks like the research showcase comes after that :|
[16:52:03] SoS is about to eat my lunch
[16:59:33] Master branch is still writing contradictory and duplicate rows. That gives me something concrete to do, at least.
[17:11:08] With revision 12424101, it seems to be "trusted user" vs "reverted edit"
[17:16:28] I think I get it. So wikilabels takes autolabeled data and passes those values back in human_labeled. Autolabeling has changed between cutting the original wikilabels campaign and now.
[17:21:58] This is something merge_labels can deal with, and I suppose we should use the autolabeled values in the autolabeled file, over those in the human_labeled file. This is easy to program, but feels weird cos it will prioritize different fields in each input.
[17:23:01] oh! Now I'm thinking that we were overestimating the goodfaith model health, since it was getting duplicates in the training set it would overfit to those and seem to perform better?
[18:10:32] awight, yes, that's possible.
[18:10:45] We should not have duplicate labels in the training/test set.
[18:10:52] Can you confirm where the duplicates are coming in?
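The population-rate arithmetic described at 16:31 above works because autolabeling already decided that everything outside the "needs review" subset is not damaging, so the human-labeled positives can be spread over the whole random sample. A worked version of the exact numbers quoted (variable names are only illustrative):

```python
sample_size = 20_000   # random sample of revisions drawn for the campaign
labeled = 4_556        # revisions that needed review and were human-labeled
damaging = 452         # of those, labeled damaging=true

# Everything autolabeled as not needing review is treated as not damaging,
# so the observed positives are divided by the full sample, not by `labeled`.
pop_rate_damaging = damaging / sample_size
print(round(pop_rate_damaging, 4))   # 0.0226, i.e. roughly the 0.02 quoted above
```

As noted in the log, deleted or suppressed revisions that drop out during extraction make this a slight approximation.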
[18:11:00] Just starting up the research showcase now.
[18:11:11] Looks like Leila et al. are presenting on section recommendation.
[18:11:17] https://www.youtube.com/watch?v=ACevHs0sMMw
[18:11:26] See #wikimedia-research for the official backchannel
[18:36:42] our research showcase on article expansion algorithms just started. Tune in —> https://www.youtube.com/watch?v=ACevHs0sMMw or join the discussion in #wikimedia-research
[18:39:13] repost :P
[18:42:23] dartarbot
[18:54:32] https://www.irccloud.com/pastebin/9qUfzLeY/
[18:55:35] It also makes common sense, at least according to my explanation, earlier.
[18:55:51] & your guess
[18:57:09] Do you agree with my proposal to have merge_labels take human labeling from human_labeled, and autolabeled from the autolabeled stream?
[18:58:28] halfak: ^
[18:58:50] awight, yes. I think that makes the most sense.
[18:58:54] * awight wanders off to code it
[18:58:55] ty
[18:59:01] Too bad we can't just use deep_merge.merge()
[18:59:52] We could if it included parameters for how to prioritize fields between inputs.
[19:00:08] Weird problem to have, I don't think I've seen the need before now.
[19:00:30] Right. For now, I think we should just do what makes sense for editquality
[19:00:52] Yup & it happens to be complicated
[19:20:51] I'm still stuck on how to take advantage of old w_cache when making small changes to the pipeline.
[19:21:06] Re-running the extractor for the tenth time :-]
[19:22:08] awight, you can pipe the old w_cache into the extractor.
[19:22:20] This is less useful if there's a change to the observations
[19:22:24] Which I guess is your case :\
[19:22:34] It's better if there's a change to the extracted features.
[19:22:45] It's a small change to the set of labeled_revisions
[19:23:16] Sounds like this ad-hoc piping will probably merge revisions that appear in the old w_cache but not in the new labeled_revisions?
[19:24:57] you could get that by using revscoring_union_merge_observations :)
[19:25:03] - _
[19:25:12] O_o
[19:26:34] I wouldn't want that, I think we should be dropping observations that only appear in the old w_cache
[19:28:07] Ahh good point.
[19:28:14] So not even intersect would work
[19:28:24] Because that would drop observations that don't appear in both
[19:28:56] +1
[19:29:05] * halfak smells his slippers start to cook and moves his feet away from the heater
[19:29:11] lol
[19:29:18] my life too
[19:29:19] ^ not a metaphor. Actually happened.
[19:29:42] I'm like *sniff sniff* What is that smell?
[19:29:46] lol
[19:30:54] 11226/17903
[19:31:08] argh I want to automate w_cache reuse
[19:32:27] FWIW, you're probably just hitting varnish with the API calls
[19:32:30] So it's super cheap
[19:33:03] :) great, I'm saving WMF money as usual
[19:34:07] I'm like *sniff sniff* What is that smell? <-- /me hears on the scanner that a fire engine has been dispatched to halfax's house
[19:35:28] lol
[19:36:04] awight, and saving jynus some stress hormone.
[19:38:49] I'm trying to decide how to measure the model health delta, and it seems the only fair thing to do is to modify master-branch w_cache to remove the dupes, then re-train the old model.
[19:38:56] I might not go to that length...
[19:41:43] awight, I don't think you need to go that far. Just make sure we don't see a huge drop.
[19:41:54] How many dupes were part of master?
[19:42:20] * halfak picks up JADE blog post
[19:42:22] I MADE IT
[19:42:40] awight, I want to make an aggressive editing pass on the outline.
[19:42:48] Actually I might fork and we can discuss.
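The merge_labels proposal agreed at 18:57 above (human judgments win from the human_labeled stream, autolabel-derived fields win from the autolabeled stream) amounts to a keyed merge with per-field priorities. A sketch under those assumptions; rev_id, damaging, and goodfaith come from the discussion, while the autolabel field names and the assumption that the human-labeled campaign was drawn from the autolabeled sample are mine, and this is not the editquality implementation:

```python
import json

# Fields that should win from each source when both files mention a revision.
HUMAN_FIELDS = ("damaging", "goodfaith")

def read_observations(path):
    """Read newline-delimited JSON observations keyed by rev_id."""
    with open(path) as f:
        return {obs["rev_id"]: obs for obs in map(json.loads, f)}

def merge_labels(autolabeled_path, human_labeled_path):
    auto = read_observations(autolabeled_path)
    human = read_observations(human_labeled_path)
    # Iterate the autolabeled sample, so a revision that only appears in a
    # stale human_labeled (or old w_cache) file is dropped, not resurrected.
    for rev_id, obs in auto.items():
        merged = dict(obs)  # autolabel-derived fields come from this stream
        if rev_id in human:
            for field in HUMAN_FIELDS:
                if field in human[rev_id]:
                    merged[field] = human[rev_id][field]
        yield merged
```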
[19:42:57] +1 both sound good
[19:43:04] you can own the blog post if you want...
[19:43:52] goodfaith differences, with the newest model: https://phabricator.wikimedia.org/P6878
[19:45:30] awight, oh!
[19:45:35] I see what's going on here.
[19:45:41] The old process was bananas.
[19:45:57] This process is 100% more fair and appropriate.
[19:46:08] This is a good change. +1
[19:47:39] awight, combining your work and the pop-rate params, this is a much cleaner way of dealing with complications around wikis like cswiki.
[19:47:49] I think this closes the door on some weirdness in our past :)
[19:48:02] :D
[19:48:10] Here's the damaging diff: https://phabricator.wikimedia.org/P6879
[19:52:27] halfak: This is ready for CR, then, https://github.com/wiki-ai/editquality/pull/144
[19:53:15] awight, want me to squash this?
[19:53:22] +1 sounds good
[19:54:29] Biggest commit message ever
[19:54:55] \o/ nice work, dude
[19:54:59] That was a lot of stuff :)
[19:55:06] Want to try it out on huwiki?
[19:55:10] Or should I?
[19:55:20] What was that revert about?
[19:55:47] Oh. Misclick.
[19:55:49] I'm happy to look at huwiki, probably will ask some questions.
[19:55:55] Did it stick awight?
[19:56:04] lemme see
[19:56:13] I clicked and then canceled.
[19:56:23] yep. it's just a rogue branch
[19:56:33] probably useful for when we see everything melt down :)
[19:56:56] lol
[19:57:11] * halfak kills it
[19:57:15] For https://github.com/wiki-ai/editquality/pull/145 , I don't think there's an easy way to do exactly what you're suggesting, passing a file stream
[19:57:31] Right. Not passing a file stream, but passing a string itself.
[19:57:39] What do you think about just catching the TemplateNotFound exception?
[19:57:44] There's an easy way to do that detailed in the comment.
[19:57:53] I think we should let that propagate up probably.
[19:57:55] oh, passing the string! sure, lemme see
[19:58:05] I wonder if there's a method for reading a file.
[19:58:16] open(file).read()
[19:58:20] Seems that jinja was not designed to be a simple utility -- more a framework
[19:58:21] lol
[19:58:22] That works.
[19:59:35] and what we're achieving is to improve architecture, or catch the not_found exception in a better place, or what?
[19:59:40] testability was mentioned.
[20:17:08] awight, catch the exception in a better place is one benefit.
[20:17:27] E.g. if a param is invalid (file not found) we should error out while reading the commandline params if possible.
[20:18:06] Testability is another one. Much easier when we can provide a function with a template string rather than creating a temp file and asking Jinja to pretend that we've implemented a framework directory in "/tmp"
[20:18:15] The latter sounds super fragile.
[20:18:33] * halfak always needs to remind himself of the difference between latter and ladders
[20:19:06] Oof, I accidentally snuck a WIP into the last PR
[20:21:12] Sorry to make you explain such a small change, I was disturbed by duplicating template loader code, but I think it's a good suggestion at the end of the day.
[20:21:17] PR is ready for CR :)
[20:21:35] Got a quick link handy?
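The PR 145 discussion above is about rendering a Jinja2 template from a string instead of pointing a loader at a pretend template directory, which lets the caller validate the path while handling command-line arguments and makes the render function trivially testable with a literal string. A minimal sketch of that pattern using generic Jinja2 calls, not the editquality utility's actual code; the function name and usage line are hypothetical:

```python
import os
from jinja2 import Environment

def render_template_file(path, **params):
    # Fail fast while reading command-line params, instead of letting a
    # TemplateNotFound surface from deep inside rendering.
    if not os.path.exists(path):
        raise SystemExit("Template not found: {0}".format(path))
    with open(path) as f:
        template_text = f.read()
    # from_string() renders the text directly: no FileSystemLoader, no
    # framework directory in /tmp, and easy to unit test with a string.
    return Environment().from_string(template_text).render(**params)

# Hypothetical usage:
# print(render_template_file("Makefile.j2", wikis=["cswiki", "huwiki"]))
```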
[20:22:33] https://github.com/wiki-ai/editquality/pull/145
[20:22:54] hehe that's the problem with shortened URLs, they're opaque
[20:23:28] goes to the branch -- not the PR :P
[20:25:22] 10Scoring-platform-team, 10Community-Tech, 10ORES, 10Teahouse: Improve HostBot invitation efficiency by integrating with ORES - https://phabricator.wikimedia.org/T190338#4070518 (10TBolliger)
[20:25:42] ^_^ ^
[20:35:34] This is a thing I've never had to do in a Makefile before. I'm trying to rebuild the model_info text file whenever a model is trained.
[20:40:47] A wrapper target made easy work of this
[20:40:59] Should tuning reports be generated every time a model is rebuilt?
[20:41:02] I'll assume yes
[20:43:46] Actually, it sounds expensive.
[20:45:35] awight, I agree. Too expensive
[20:46:03] halfak: ^
[20:46:37] There's some junk in there from not rebuilding the Makefile after a [wip]
[20:49:34] wiki-ai/editquality#228 (model_info - ba9d7db : Adam Wight): The build passed. https://travis-ci.org/wiki-ai/editquality/builds/356560398
[20:55:51] awight, I added some notes on that PR, but I need to run away now.
[20:55:57] Taking the ferret to the vet.
[20:56:03] Will be on in a couple of hours.
[20:56:40] godspeed!
[20:58:43] wiki-ai/editquality#230 (awight-huwiki-damaging-goodfaith - 5873f4f : Gergő Tisza): The build passed. https://travis-ci.org/wiki-ai/editquality/builds/356565002
[21:09:19] 10Scoring-platform-team (Current), 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: [Blocked] Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070684 (10mmodell) @awight: Scap 3.7.7 is deployed now, are you...
[21:10:07] 10Scoring-platform-team (Current), 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: [Blocked] Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070691 (10awight) Cool! Yes it was 3.7.7 I was waiting for, wi...
[21:10:28] 10Scoring-platform-team (Current), 10Operations, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070707 (10awight)
[21:10:39] 10Scoring-platform-team (Current), 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: [Spike] Write reports about why Ext:ORES is helping cause server 500s and write tasks to fix - https://phabricator.wikimedia.org/T181010#4070710 (10awight)
[21:10:41] 10Scoring-platform-team (Current), 10Operations, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3778593 (10awight) 05stalled>03Open
[21:11:05] 10Scoring-platform-team (Current), 10Operations, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070712 (10mmodell) I probably _should_ have called it 3.8 ;)
[21:17:08] (03PS13) 10Awight: [DNM] Build venv into deployed source dir [services/ores/deploy] - 10https://gerrit.wikimedia.org/r/392682 (https://phabricator.wikimedia.org/T181071)
[21:17:40] (03PS14) 10Awight: Build venv into deployed source dir [services/ores/deploy] - 10https://gerrit.wikimedia.org/r/392682 (https://phabricator.wikimedia.org/T181071)
[21:42:21] 10Scoring-platform-team (Current), 10Operations, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070861 (10awight)
[22:25:35] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/ORES] - 10https://gerrit.wikimedia.org/r/421165 (owner: 10L10n-bot)
[22:48:27] o/
[22:48:48] * halfak gets back to reviewing 144
[22:50:17] halfak: How's the wild animal?
[22:50:38] She's OK. Got an infection on a benign tumor, so we're doing a round of antibiotics.
[22:50:48] aww, glad there's a happy ending
[22:51:01] (03CR) 10Ladsgroup: [V: 032 C: 032] Build venv into deployed source dir [services/ores/deploy] - 10https://gerrit.wikimedia.org/r/392682 (https://phabricator.wikimedia.org/T181071) (owner: 10Awight)
[22:51:06] Yeah. It's fun to go to the vet with her these days because everyone comments on how old she is.
[22:51:18] She's very old for a ferret.
[22:51:33] how old?
[22:51:45] She'll turn 8 soon.
[22:51:55] Amir1: thanks! Please let me deploy that tomorrow, there's some stuff I want to babysit.
[22:52:02] I've never heard of an 8 year old ferret. My vet says they are very rare.
[22:52:09] sure :)
[22:52:11] she's seen the inside of pants that don't even exist any more
[22:52:13] o/ Amir1
[22:52:17] I just got to do some ORES stuff for today :D
[22:52:18] you back from vacation?
[22:52:31] well, technically not but I have some free time
[22:52:41] halfak: btw I'm applying the new code to huwiki_models, and running into things caused by my lack of understanding. I'll get through it tomorrow
[22:53:00] :D looks like the fawiki stuff is getting close. I almost pinged you earlier today to re-announce on fawiki :)
[22:53:10] awight, gotcha.
[22:54:06] yeah, I can also do the rest too :D It's not much
[22:54:19] o/
[22:55:44] o/
[22:56:40] 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Simplify hewiki, cswiki, plwiki, svwiki - https://phabricator.wikimedia.org/T188270#4071117 (10Ladsgroup) plwiki is not, as it's different than others. I leave it to you if you want to get it done or not.
[22:57:03] 10Scoring-platform-team (Current), 10User-Ladsgroup: Fix or decide on edge cases of Makefile - https://phabricator.wikimedia.org/T186453#4071120 (10Ladsgroup) This is done
[22:57:17] 10Scoring-platform-team (Current), 10User-Ladsgroup: Fix or decide on edge cases of Makefile - https://phabricator.wikimedia.org/T186453#4071133 (10Ladsgroup)
[23:03:29] Amir1, exciting to see that cleaned up ^
[23:04:11] :) I'm officially leaving that part to do other things \o/
[23:04:47] Amir1, did you get an email re. hackathon travel?
[23:05:03] yup, it's booked now yay
[23:08:41] Awesome. :)
[23:30:34] https://www.mediawiki.org/wiki/JADE/Intro_blog/Short_story
[23:30:40] OK that's good enough for today :)
[23:31:40] 10Scoring-platform-team (Current), 10JADE, 10WMF-Communications: Blog about JADE - https://phabricator.wikimedia.org/T183200#4071272 (10Halfak) https://www.mediawiki.org/wiki/JADE/Intro_blog/Short_story
[23:46:26] I'm deploying a rather scary change for the ORES ext.
[23:46:36] Oh?
[23:46:41] Right now?
[23:46:44] Amir1, ^
[23:46:54] yup, SWAT
[23:50:01] * halfak is around for moral support
[23:50:23] * halfak fiddles with his cycling fitness metrics.
[23:55:25] Nothing exploded so far :D
[23:57:10] boom
[23:59:27] Hauskatze: Don't scare me :D