[09:00:38] 10Scoring-platform-team, 10Wikimania-Hackathon-2017, 10Documentation: [Wikimania doc sprint] docs on how to install ORES - https://phabricator.wikimedia.org/T170506#3862235 (10Aklapper) No reply, hence removing tags. Feel free to re-add once my previous comment has been answered. Thanks!
[09:02:25] 10Scoring-platform-team, 10Wikilabels, 10Easy, 10Google-Code-in-2017: qunit tests for wikilabels - https://phabricator.wikimedia.org/T171083#3862241 (10Aklapper) @Ladsgroup: Any chance / time to reply to my previous comment so we could maybe turn this into GCI tasks (how much to do per 1 task)? Thanks in a...
[14:31:41] (03CR) 10Awight: [C: 04-1] "Please also include tests for updateModelVersion" (037 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: 10Ladsgroup)
[15:02:06] 10Scoring-platform-team, 10MediaWiki-Vagrant, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3862683 (10awight)
[16:17:28] o/
[16:17:40] Accidentally slept in a bit today
[16:17:46] Just woke up
[16:17:46] o/
[16:22:21] codezee: you'd said yesterday that you have some exciting news about draft topic 😁
[16:26:19] halAFK: I'll share the stats in a moment; you can look at the predictions for yourself then
[16:26:51] Cool 😊. Was it weighting that ultimately got it?
[16:29:26] halAFK: https://gist.github.com/codez266/5fbf7ab5853225680a598a5bb954f7c1
[16:30:33] halAFK: the weighting experiment wasn't directly useful, in the sense that weighting with sklearn's multilabel RFs was still pretty bad (hardly 2-3 true positives); the above results use one classifier per class explicitly, scoring each label individually
[16:30:54] although I did use the same weighting scheme there, and it helped in balancing the true/false ratio
[16:31:35] in theory, sklearn's multilabel version should train one classifier per class, which is what I did manually, but in practice it somehow isn't learning properly
[16:32:41] they aren't super awesome, but when I look at the prediction labels, they are mostly sane...
[16:32:45] and seem useful
[16:33:06] How much RAM/disk does a big set of classifiers occupy?
[16:35:13] halAFK: I didn't do that benchmarking, but they seemed pretty small and fast; model sizes are around 1-2MB for most
[16:35:52] That seems tractable
[16:36:00] so at most we're using 40MB or so of disk space
[16:36:09] halAFK: 100 predictions - https://gist.github.com/codez266/ddd4b22b4e9ec86fe76f393aadea48d1
[16:36:22] "prediction" and "actual" are the keys of interest
[16:36:44] the rest are in /srv/drafttopic.10k.scored
[16:37:32] and this was when I hadn't tuned the models individually at all, just using n_estimators=400, max_depth=4 for everyone
[16:42:10] the format seems horrible to read; I'll make it more friendly
[16:43:48] Did you end up building a scoring model class that contains mainly estimators?
[16:43:55] codezee: ^
[16:45:29] halAFK: :D as of now it's an improvised, cooked-up script, as I wasn't sure it was worth it... I'll add it now
[16:45:53] given that we have good results
[16:45:56] Ok, sounds good. 😁
[16:48:13] halAFK: as a next step, we can maybe exploit the interdependence of labels under the four major categories. From all I've read, exploiting label dependence is a very standard thing to do
[16:48:38] E.g. labels under STEM are not entirely independent
[16:49:26] Right. How would we exploit that though?
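For reference, a minimal sketch of the one-classifier-per-class ("binary relevance") setup codezee describes above, assuming scikit-learn and the n_estimators=400, max_depth=4 settings mentioned in the log; the label list, data shapes, and the balanced class weighting are illustrative stand-ins, not the actual drafttopic script:

```python
# Sketch: one RandomForestClassifier per label ("binary relevance"),
# instead of sklearn's built-in multilabel mode. Labels and shapes are
# hypothetical; the weighting scheme stands in for the one in the log.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

LABELS = ["STEM.Biology", "STEM.Physics", "Culture.Arts", "History"]  # illustrative

def train_per_label(X, Y):
    """X: (n_samples, n_features); Y: (n_samples, n_labels) binary matrix."""
    models = {}
    for i, label in enumerate(LABELS):
        clf = RandomForestClassifier(
            n_estimators=400,
            max_depth=4,
            class_weight="balanced",  # counter the skewed true/false ratio per label
        )
        clf.fit(X, Y[:, i])  # each classifier sees only its own label column
        models[label] = clf
    return models

def predict_labels(models, x):
    """Score one observation against every per-label classifier."""
    x = np.asarray(x).reshape(1, -1)
    return [label for label, clf in models.items() if clf.predict(x)[0] == 1]
```

Collecting the per-label predictions this way yields one multilabel "prediction" list per article, matching the "prediction" / "actual" keys in the gist linked above.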
[16:51:02] in one of the approaches they develop classifier chains, where, roughly speaking, the output of one classifier is input to another in addition to the features
[16:51:08] https://jmread.github.io/talks/Tutorial-MLC-Porto.pdf
[16:51:34] I'll read it thoroughly and take it up in one of the syncups
[16:52:41] on a side note - "Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages" takes this idea to the next level, pruning the label search space to reduce label complexity
[16:52:53] although it's not very related to our case; we have very few labels
[16:56:09] We could build a classifier for the first level of the label and then use that as input.
[16:56:25] Hmmm. In the meantime, let's ignore that and charge forward.
[16:57:42] halAFK: this is more amenable to the eye and to analysis - https://gist.github.com/codez266/ddd4b22b4e9ec86fe76f393aadea48d1
[16:59:19] Looks like some of the "mistakes" are more likely a missing label
[16:59:51] halAFK: yes, that was one of the things that excited me... it's the primary purpose of drafttopic
[17:00:07] the model does learn something meaningful and predicts it
[17:01:27] E.g. the labels for the article "360s" aren't wrong when I look at it
[17:05:09] meanwhile, I've added a PR in revscoring to improve cross-validation speed - https://github.com/wiki-ai/revscoring/pull/388
[17:05:24] I did see a marked difference in speed, which is why I added ^
[17:38:32] Hey folks, I wanted to give you a sneak preview of my embedding model. It finds cross-lingual "neighbors" in both wikidata and search space.
[17:40:12] https://www.irccloud.com/pastebin/PVR2E80U/minnesota_results.txt
[17:41:14] codezee: I'll get a review in today
[17:41:27] The results are a little wonky for searches right now because it's only using fulltext searches (not autocomplete) and is missing the most popular searches because of that. So the "Minnesota" article is missing. That should be fixed very soon.
[17:42:06] Shilad: cool! I wonder if we could use these like word2vec to boost some of our prediction models
[17:42:36] I bet! What are you trying to predict?
[17:43:11] Recently we have been focused on predicting the topics of new pages before they get tagged
[17:44:20] https://www.irccloud.com/pastebin/BrlZPxom/wikidata_minnesota
[17:45:03] what is the universe of topics you are considering?
[17:46:35] shilad: WikiProject topics, which are predefined topic namespaces on Wikipedia
[17:47:47] Neat! And how are you using word2vec right now?
[17:52:18] At a high level, this should be interchangeable with word2vec trained on article text. The distinctions are that 1) it includes embeddings for both concepts/articles and search phrases, 2) it has fewer phrases/words than a standard article-text NLP model, but 3) it should generally be more accurate for the information it does have.
[17:54:27] shilad: on what data were the embeddings generated?
[17:55:52] webrequest data, so page views and searches
[17:59:16] (03PS2) 10Ladsgroup: Update model version when it's different in ScoreParser [extensions/ORES] - 10https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468)
[17:59:33] (03CR) 10Ladsgroup: "Tests added" (034 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/400183 (https://phabricator.wikimedia.org/T183468) (owner: 10Ladsgroup)
[18:01:23] codezee: For webrequests, a single user session corresponds to a word2vec sentence, and the individual events (pageviews or searches) correspond to words.
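A toy sketch of the session-as-sentence scheme Shilad describes above, assuming gensim (4.x) Word2Vec as an in-memory stand-in for the actual Spark pipeline; the session data and token format are invented for illustration:

```python
# Sketch: treat each user session as a "sentence" whose "words" are the
# pageview / search events in it, then train a standard word2vec model.
# The real pipeline runs on Spark over webrequest data; this toy uses gensim.
from gensim.models import Word2Vec

# Hypothetical sessions: pageviews as Wikidata item IDs, searches as
# prefixed tokens, so both event types land in one embedding space.
sessions = [
    ["Q1527", "search:minnesota", "Q1581", "Q30"],
    ["search:twin cities", "Q1527", "Q5107"],
    # ... millions more sessions in the real webrequest data
]

model = Word2Vec(sentences=sessions, vector_size=100, window=5, min_count=1, sg=1)

# Cross-type nearest neighbors, i.e. "neighbors" in both concept and search space:
print(model.wv.most_similar("Q1527"))
```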
[18:03:31] shilad: is it this? - https://meta.wikimedia.org/wiki/Research:Wikipedia_Navigation_Vectors
[18:03:39] or something similar
[18:05:18] codezee: Similar. Ellery's model was produced on an ad-hoc basis. I re-engineered it for Spark, built models from three months of data instead of one month, and folded in user searches.
[18:06:02] I see
[19:21:16] codezee, just dropped a comment on your revscoring PR
[19:22:53] halfak: yes, I saw; I'll write a simple script to time it and post the results
[19:23:05] cool :)
[19:43:07] (03CR) 10Thiemo Kreuz (WMDE): [C: 032] Clean up ThresholdLookup, make the cache key use model version (033 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/400196 (https://phabricator.wikimedia.org/T182111) (owner: 10Ladsgroup)
[19:43:37] halfak: done, the gain for cv_train on 10k samples is 5x... left a comment
[19:43:48] wow!
[19:44:33] good enough for me
[19:44:49] (03Merged) 10jenkins-bot: Clean up ThresholdLookup, make the cache key use model version [extensions/ORES] - 10https://gerrit.wikimedia.org/r/400196 (https://phabricator.wikimedia.org/T182111) (owner: 10Ladsgroup)
[19:45:22] * halfak is 50 minutes into today's training ride
[19:52:27] I've added a todo to my current work to "Run tune on a standard existing dataset to verify sanity after the above change"
[19:52:50] with roc_auc.macro as the fitness metric
[23:03:47] 10Scoring-platform-team (Current), 10MediaWiki-extensions-ORES, 10ORES, 10User-Ladsgroup: Wikidata beta edit filters are showing every edit in watchlist as damaging - https://phabricator.wikimedia.org/T180686#3766448 (10Ladsgroup) 05Open>03Resolved
[23:35:09] halfak: hey there :D https://github.com/wiki-ai/editquality/pull/111
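The roc_auc.macro fitness mentioned above corresponds to a macro-averaged ROC AUC; a minimal sketch of computing it for a multilabel model with scikit-learn, using invented data:

```python
# Sketch: macro-averaged ROC AUC for a multilabel model, matching the
# "roc_auc.macro" fitness mentioned above. Inputs are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

# Y_true: (n_samples, n_labels) binary indicator matrix.
# Y_score: per-label probability estimates for the positive class.
Y_true = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 1]])
Y_score = np.array([[0.9, 0.2, 0.1], [0.3, 0.8, 0.7], [0.6, 0.1, 0.9]])

# average="macro" computes AUC per label, then takes the unweighted mean,
# so rare labels count as much as common ones.
macro_auc = roc_auc_score(Y_true, Y_score, average="macro")
print(macro_auc)
```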