[05:00:27] wiki-ai/revscoring#811 (poc_hashing_vector - 33f1b2b : sabya): The build has errored. https://travis-ci.org/wiki-ai/revscoring/builds/160072803
[06:57:09] (CR) Ladsgroup: [C: 2] Drop unique part from oresm_model index [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432) (owner: Ladsgroup)
[06:57:14] (CR) jenkins-bot: [V: -1] Drop unique part from oresm_model index [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432) (owner: Ladsgroup)
[07:04:04] (PS2) Ladsgroup: Drop unique part from oresm_model index [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432)
[07:04:36] (CR) Ladsgroup: "PS2 is rebase only" [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432) (owner: Ladsgroup)
[07:05:47] (CR) Ladsgroup: [C: 2] Drop unique part from oresm_model index [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432) (owner: Ladsgroup)
[07:05:52] (CR) Ladsgroup: [C: 2] Move storeScores stuff into another method [extensions/ORES] - https://gerrit.wikimedia.org/r/309824 (owner: Ladsgroup)
[07:06:56] (Merged) jenkins-bot: Drop unique part from oresm_model index [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432) (owner: Ladsgroup)
[07:06:59] (Merged) jenkins-bot: Move storeScores stuff into another method [extensions/ORES] - https://gerrit.wikimedia.org/r/309824 (owner: Ladsgroup)
[10:47:42] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#2640113 (MarcoAurelio) I've just announced it and fixed my cookies, so I've been able to do an initial tagging of 50 revisions.
[13:50:46] o/
[13:51:44] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#2640480 (Halfak) Thanks @MarcoAurelio. We're working on making that user script easier to use. Thanks for your patience with it.
[14:07:57] Looks like I'm backlog grooming by myself today
[14:13:11] halfak: I'm joining
[14:13:14] wait a sec
[14:13:43] Joined
[14:14:04] refreshed
[14:41:05] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Deploy ORES review tool in es.wikibooks - https://phabricator.wikimedia.org/T145394#2640657 (Halfak)
[14:42:44] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Ensure ORES data violating constraints do not affect production - https://phabricator.wikimedia.org/T145356#2640661 (Halfak)
[14:43:56] Revision-Scoring-As-A-Service, ORES: Investigate short period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145353#2640665 (Halfak) a:Halfak
[14:44:53] Revision-Scoring-As-A-Service, rsaas-articlequality : Generate monthly article quality dataset - https://phabricator.wikimedia.org/T145655#2640686 (Halfak) p:Triage>Low
[14:46:09] Revision-Scoring-As-A-Service, rsaas-articlequality : Generate monthly article quality dataset - https://phabricator.wikimedia.org/T145655#2637137 (Halfak) https://github.com/wiki-ai/wikiclass/pull/28
[14:52:54] Revision-Scoring-As-A-Service, revscoring: Train on all data, Report test statistics on cross-validation - https://phabricator.wikimedia.org/T142953#2640719 (Halfak) ``` $ head -n 5648 datasets/enwiki.observations.damaging_w_cache.20k_2015.json | ./utility cv_train revscoring.scorer_models.GradientBoosti...
[15:00:49] Revision-Scoring-As-A-Service-Backlog, revscoring: Chinese language utilities - https://phabricator.wikimedia.org/T109366#2640761 (Halfak)
[15:00:51] Revision-Scoring-As-A-Service-Backlog, Bad-Words-Detection-System, revscoring: Add language support for Chinese - https://phabricator.wikimedia.org/T145663#2640763 (Halfak)
[15:01:54] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#2629130 (Halfak) p:Triage>Normal
[15:05:36] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Deploy ORES review tool in es.wikibooks - https://phabricator.wikimedia.org/T145394#2628616 (Halfak) p:Triage>Normal
[15:06:05] Revision-Scoring-As-A-Service-Backlog, Data-release, Research-and-Data, rsaas-articlequality , Research-and-Data-2017-Q1: Formal publication of article quality score dataset - https://phabricator.wikimedia.org/T145332#2640794 (Halfak)
[15:07:32] Revision-Scoring-As-A-Service-Backlog, Data-release, Research-and-Data, rsaas-articlequality : Formal publication of article quality score dataset - https://phabricator.wikimedia.org/T145332#2626819 (Halfak)
[18:20:41] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2641684 (Sabya) @Halfak: **Plots with and without sample weights:** The plots below also include roc score for greater number of estimators to find where the return fl...
[18:28:22] Revision-Scoring-As-A-Service-Backlog, revscoring, rsaas-editquality: [Research] What's the difference in scoring ranges when we don't balance sample weight? - https://phabricator.wikimedia.org/T145809#2641699 (Halfak)
[18:28:50] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2063555 (Halfak) Looks good.
Seems like a clear win here. I think that we're sure to see an ROC drop due to the increased weighting of fewer observations. But I wonde...
[18:36:50] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2641726 (Sabya) @Halfak, Also, regarding ROC score difference between T128087#2600696 and current: current one is correct. GridSearchCV is calculating it. Earlier I was...
[18:47:12] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Implement ~100 most important hash vector features in editquality models - https://phabricator.wikimedia.org/T145812#2641766 (Halfak)
[18:49:25] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2641783 (Halfak) Next step: T145812 :)
[18:51:52] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Implement ~100 most important hash vector features in editquality models - https://phabricator.wikimedia.org/T145812#2641786 (Halfak) So, I've been thinking that we might want to discover our high utility hash vector using a larger analysis of rev...
[21:48:56] wiki-ai/revscoring#818 (model_cv - 20c6eb6 : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/160297460
[21:55:40] o/ RoanKattouw
[21:55:46] Revision-Scoring-As-A-Service, revscoring: Train on all data, Report test statistics on cross-validation - https://phabricator.wikimedia.org/T142953#2552210 (Halfak) Woo! Seems to work as intended. See https://github.com/wiki-ai/revscoring/pull/288
[21:56:38] Hey halfak
[21:58:46] RoanKattouw, I heard you suggest that we convene a smaller group to talk ORES integrations post product event. I just wanted to say +1 and thanks :)
[21:59:12] Yes, once I get to a real computer I'll set that up
[21:59:30] Stuck with a loaner?
[21:59:58] What I wanna do is for us to agree on ownership of things like WikiLabels and the ORES extension and a few other things
[22:00:01] Nah I'm on my phone
[22:00:08] Gotcha
[22:00:11] My laptop is in my bag which is somewhere
[22:00:17] on this floor