[00:02:49] (CR) Legoktm: "So...I think upsert is wrong here, sorry. The indexes are not unique to be used in an OR, they have to be used together in AND...so doing " [extensions/ORES] - https://gerrit.wikimedia.org/r/307624 (https://phabricator.wikimedia.org/T144195) (owner: Ladsgroup)
[04:35:31] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2600696 (Sabya) @Halfak: New results with below params: ``` gbc = GradientBoostingClassifier(n_estimators=700, max_depth=7, learning_rate=0.01) sample_weight=[18939...
[07:54:09] (CR) Ladsgroup: "I disagree. We definitely have an issue in db schema of ores_model (see T144432) but other than the unique index, everything is okay. I te" [extensions/ORES] - https://gerrit.wikimedia.org/r/307624 (https://phabricator.wikimedia.org/T144195) (owner: Ladsgroup)
[08:06:07] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Patch-For-Review, User-Ladsgroup: Redundant results in ORES review tool - https://phabricator.wikimedia.org/T144233#2600884 (Ladsgroup) Okay, imagine we had this in ores_model | oresm_id | oresm_name | oresm_is_current | 1 | damaging | 0...
[08:35:46] Platonides: ^_^ if you label edits and finish the labeling campaign we can make more accurate predictions and enable ORES review tool as a beta feature in Spanish Wikipedia
[08:35:51] 1- the labeling campaign https://es.wikipedia.org/wiki/Wikipedia:Etiquetando
[08:36:09] 2- ORES review tool en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures
[08:53:08] btw. We keep informal and swear words of each language (I wrote the swear finding part :D)
[13:03:50] o/
[13:13:36] halfak: o/
[13:19:59] Amir1, so it's kind of anti-climactic to review the results of my TFiDF selector since it used a one-way hash to get here.
[13:20:16] I can't review the important keys without re-hashing everything!
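A minimal sketch of why inspecting hashed features means re-hashing: the hash cannot be inverted, so the only way to learn which grams sit behind a selected bucket is to hash every candidate again and keep the matches. Everything named here (`bucket`, `reverse_index`, `N_BUCKETS`, the sample grams, and the use of `zlib.crc32`) is a hypothetical stand-in for illustration, not the hash or selector discussed in the channel.

```python
import zlib
from collections import defaultdict

N_BUCKETS = 2 ** 20  # hypothetical hash-table size


def bucket(gram, n_buckets=N_BUCKETS):
    """Map a tuple of tokens to a bucket index with a stable hash."""
    key = " ".join(gram).encode("utf-8")
    return zlib.crc32(key) % n_buckets


def reverse_index(grams, selected):
    """Re-hash every candidate gram; keep those that land in a selected bucket."""
    found = defaultdict(set)
    for gram in grams:
        b = bucket(gram)
        if b in selected:
            found[b].add(gram)
    return found


# Hypothetical grams seen in a corpus, and two buckets a selector marked as important.
corpus_grams = [("cell",), ("cell", "biology"), ("the",), ("dna", "replication")]
selected = {bucket(("cell", "biology")), bucket(("dna", "replication"))}

recovered = reverse_index(corpus_grams, selected)
for b, grams in sorted(recovered.items()):
    print(b, sorted(grams))
```

The cost is one full pass over the corpus per review, and collisions mean a bucket may map back to more than one gram.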
[13:20:21] Stupid one-way-ness
[13:20:29] :)
[13:20:40] Also, I'm still working on making hashing faster
[13:20:55] Right now, it takes 0.15 seconds to tokenize [[:en:Biology]]
[13:20:56] [3] https://meta.wikimedia.org/wiki/:en:Biology
[13:21:04] One-way hash has its own perks, like protecting our passwords :D
[13:21:17] Then it takes 0.18 seconds to turn that into ngrams and skipgrams
[13:21:31] Then it takes another 0.50 seconds to turn that into hashes.
[13:21:44] So, it's *sllllooooowwww*
[13:21:50] that's a lot :(
[13:22:39] Yeah... Working on it. :|
[13:22:56] We're down from hashing taking 1.5 seconds
[13:26:42] It takes about 2 minutes to read 20k pickle caches and convert them to feature-label pairs.
[13:26:51] Which seems a little bit too long
[13:58:33] Revision-Scoring-As-A-Service, revscoring, Spike: [Spike] Investigate HashingVectorizer - https://phabricator.wikimedia.org/T128087#2601422 (Halfak) Interesting. It could be that we're already getting all the signal that the new features can provide. I'm guessing that some tuning could help here....
[13:58:58] I think that sabya is really close with his work on hash vectors.
[14:03:39] Revision-Scoring-As-A-Service, rsaas-editquality: Extend user group features - https://phabricator.wikimedia.org/T143909#2601423 (Halfak)
[14:09:06] (CR) Thiemo Mättig (WMDE): [C: 1] "The name of this column sounds like ORES is keeping outdated, non-current data in the database. Why is this done?" [extensions/ORES] - https://gerrit.wikimedia.org/r/307870 (https://phabricator.wikimedia.org/T144233) (owner: Ladsgroup)
[14:14:59] Revision-Scoring-As-A-Service, rsaas-editquality: Extend user group features - https://phabricator.wikimedia.org/T143909#2601433 (Halfak) a:Halfak
[14:24:49] Revision-Scoring-As-A-Service, revscoring: Implement abstraction for Sparse Feature Vectors - https://phabricator.wikimedia.org/T132580#2601449 (Halfak) Running tests producing hash tables for https://en.wikipedia.org/wiki/Biology It looks like we're pretty slow. With grams: ``` my_grams = [(0,), (0,1)...
[14:39:14] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Patch-For-Review, Schema-change: Add uniqueness constraints to ores_classification - https://phabricator.wikimedia.org/T143962#2584746 (Halfak) @Catrope, do you want to assign this task to yourself since you started work?
[14:47:02] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Train reverted model for metawiki - https://phabricator.wikimedia.org/T144163#2590382 (Halfak) p:Triage>Normal
[14:50:57] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Train reverted model for metawiki - https://phabricator.wikimedia.org/T144163#2601504 (Halfak) I think we should do some comparison in unicode character ranges to detect language type issues generically. E.g. ``` >>> ord("a") # Latin (English)...
[18:54:31] Revision-Scoring-As-A-Service, Research-and-Data, Research-management: ORES and Product: resourcing discussion - https://phabricator.wikimedia.org/T144517#2602407 (DarTar)
[20:55:41] Revision-Scoring-As-A-Service, revscoring: Implement abstraction for Sparse Feature Vectors - https://phabricator.wikimedia.org/T132580#2603158 (Halfak) OK. I did some more optimizations and I reduced the gram set as follows: Then I added some tests to compare our performance against the raw HashingVe...
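A rough sketch of the tokens → grams → hash table pipeline timed in the conversation, written under stated assumptions: each gram is taken to be a tuple of token offsets, so `(0,)` is a unigram, `(0, 1)` a bigram and `(0, 2)` a skip-gram. The `GRAMS` list, `N_BUCKETS`, the helper names and the `zlib.crc32` hash are all hypothetical; this is neither the truncated `my_grams` list from the task nor the revscoring implementation.

```python
import time
import zlib
from collections import Counter

# Hypothetical gram set: each tuple lists token offsets from a start position.
GRAMS = [(0,), (0, 1), (0, 2)]  # unigram, bigram, skip-gram (assumed meaning)
N_BUCKETS = 2 ** 18  # hypothetical hash-table size


def extract_grams(tokens, grams=GRAMS):
    """Yield one token tuple per start position and offset pattern that fits."""
    for i in range(len(tokens)):
        for offsets in grams:
            if i + offsets[-1] < len(tokens):
                yield tuple(tokens[i + o] for o in offsets)


def hash_grams(grams, n_buckets=N_BUCKETS):
    """Count grams per hash bucket, giving a sparse {bucket: count} table."""
    counts = Counter()
    for gram in grams:
        key = " ".join(gram).encode("utf-8")
        counts[zlib.crc32(key) % n_buckets] += 1
    return counts


tokens = "biology is the natural science that studies life".split()

start = time.perf_counter()
gram_list = list(extract_grams(tokens))
gram_time = time.perf_counter() - start

start = time.perf_counter()
table = hash_grams(gram_list)
hash_time = time.perf_counter() - start

print(len(gram_list), "grams;", len(table), "non-empty buckets")
print(f"grams: {gram_time:.6f}s, hashing: {hash_time:.6f}s")
```

Timing each stage separately, as above, shows which step dominates before any optimization is attempted; a shorter gram list reduces the work in both later stages.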
[21:25:50] wiki-ai/revscoring#799 (feature_vector - f095611 : halfak): The build was fixed. https://travis-ci.org/wiki-ai/revscoring/builds/156935756
[21:26:01] shaddap travis
[22:00:49] Revision-Scoring-As-A-Service, revscoring: Implement abstraction for Sparse Feature Vectors - https://phabricator.wikimedia.org/T132580#2603389 (Halfak) OK. We're good to go. I have everything wrapped up in one giant commit right now. I'll be coming back to this tomorrow to split it up into a few logic...
[22:01:03] (CR) Ladsgroup: "We do but we have maintenance script to clean them, here's the thing. ORES services updates its model so all scores for all revisions beco" [extensions/ORES] - https://gerrit.wikimedia.org/r/307870 (https://phabricator.wikimedia.org/T144233) (owner: Ladsgroup)
[22:03:09] Revision-Scoring-As-A-Service, rsaas-articlequality, Research-and-Data-2017-Q1, User-Ladsgroup: Generate recent article quality scores for English Wikipedia - https://phabricator.wikimedia.org/T135684#2603391 (Ladsgroup) a:Halfak>Ladsgroup
[22:03:17] Revision-Scoring-As-A-Service, rsaas-articlequality, Research-and-Data-2017-Q1, User-Ladsgroup: Generate recent article quality scores for English Wikipedia - https://phabricator.wikimedia.org/T135684#2603393 (Ladsgroup) It'll be fun
[22:15:54] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES: ORES extension should Assume good faith page creator's revisions - https://phabricator.wikimedia.org/T137846#2603450 (Ladsgroup) @Yamaha5's bug is something different than what @Iniquity says. Reza says we need to determine if the page is patrolle...
[22:16:04] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES: ORES extension should Assume good faith page creator's revisions - https://phabricator.wikimedia.org/T137846#2603451 (Ladsgroup) Open>declined