[05:30:49] 10Scoring-platform-team-Backlog, 10Wikilabels, 10User-Zppix: Change the skip button's confirmation message on Wikilabels - https://phabricator.wikimedia.org/T168185#3417715 (10Zppix) patch is there still awaiting merge...
[08:34:11] 10Scoring-platform-team, 10draftquality-modeling, 10artificial-intelligence: Experiment with Sentiment score feature for draftquality - https://phabricator.wikimedia.org/T167305#3417846 (10Sumit) New PR - https://github.com/wiki-ai/draftquality/pull/9
[13:46:29] o/
[13:46:57] I'm going to need to miss the hack session today. I'll be working to help my buddy clean up an old deck from his yard.
[13:47:10] Might be online around this time tomorrow.
[15:37:43] if anyone knows, the previous wp10 dump contains a weighted_sum field, is it possible to get this from the api? I need to fill in a few thousand missing page_ids in the dump
[15:38:14] i could also impute them with means or some such, but it seems getting the actual values would be best
[16:06:55] o/
[16:07:09] my connection is horrible today :(
[16:16:59] Amir1: hi! quick question, the wp10 dump has a weighted_sum field, and randomly guessing it seems like the chosen class is approximately just rounding this weighted sum. In the api though there are 5 predictions and a chosen class, which seems like max(predictions) or some such. Is the dump perhaps from a previous version that used a single classifier, and there is some new classifier that uses 5
[16:17:05] different models to predict each possible class?
[16:18:14] ebernhardson: hey, no, the weighted sum is the sum of the probability of each class
[16:18:20] but weighted
[16:18:32] let me show you an example. Do you have anything in hand?
[16:18:56] Amir1: how is that calculated? Basically i need to fill in ~6k pages that are missing from the dump (created since then probably) but couldn't figure out how to get the weighted sum from the api
[16:20:11] just taking the first rev_id, 762420475, so https://ores.wmflabs.org/v2/scores/enwiki/?models=wp10&revids=762420475
[16:20:12] it's super simple, the stub class * 0 + the start class * 1 + ... + the fa class * 5
[16:20:33] ahh yea that's pretty straightforward
[16:21:26] The order is super important here: stub (0), start (1), c (2), b (3), ga (4), fa (5)
[16:23:50] i'm currently trying to figure out a massive overfitting problem, i took a sample of ~1.2M labeled points, added in wp10 scores for them but had to drop ~30k rows representing 6k page ids. The odd thing though is that just dropping those 30k rows leads to massive overfitting. Attempting now to fill in those 30k rows or see if i did something else crazy wrong :)
[16:26:58] oh, keep me posted
[16:42:01] halfak: [05:13] celery seems to mean it's an analytics thing
[16:42:01] [05:13] is this about getting into stats?
[16:42:01] [05:13] +https://wikitech.wikimedia.org/wiki/Wikimetrics, ?
[16:42:01] [05:13] eh, getting data OUT of stats?
[16:42:01] [05:14] maybe it's a question for #wikimedia-analytics if they run celery ?
[16:42:02] [05:14] the word "celery" doesn't appear in site.pp either
[16:45:02] Zppix: what are you trying to get out of stats?
[16:47:06] IM NOt
[16:47:09] sorry caps lock
[17:19:34] 10Scoring-platform-team, 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-ORES, 10ORES, 10User-Ladsgroup: ORES spamming Beta Cluster's logstash - https://phabricator.wikimedia.org/T170026#3418110 (10Ladsgroup) a:03Ladsgroup
[18:11:15] Halfak are you around or no?
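A minimal sketch of the weighted_sum calculation described above (16:20–16:21): fetch the wp10 probabilities for a revision from the ORES v2 endpoint quoted in the log and combine them with the weights stub=0, start=1, c=2, b=3, ga=4, fa=5. The JSON nesting and the capitalization of the class names are assumptions and may need adjusting to the actual payload.

```python
import requests

# Class weights as described by Amir1 above; key capitalization is an assumption.
WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}

def wp10_weighted_sum(wiki, rev_id):
    """Recompute the dump's weighted_sum for one revision from the ORES v2 API."""
    resp = requests.get(
        f"https://ores.wmflabs.org/v2/scores/{wiki}/",
        params={"models": "wp10", "revids": rev_id},
    )
    resp.raise_for_status()
    data = resp.json()
    # Assumed nesting: scores -> wiki -> model -> scores -> rev_id -> probability
    probs = data["scores"][wiki]["wp10"]["scores"][str(rev_id)]["probability"]
    return sum(WEIGHTS[cls] * p for cls, p in probs.items())

# Example from the log: the first rev_id in the missing set.
print(wp10_weighted_sum("enwiki", 762420475))
```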
[18:13:27] 10Scoring-platform-team-Backlog, 10ORES, 10Wikimedia-Logstash: Send celery logs and events to logstash - https://phabricator.wikimedia.org/T169586#3402704 (10Zppix) [05:13] celery seems to mean it's an analytics thing [05:13] is this about getting into stats? [05:13] +https://wikit...
[18:17:08] 10Scoring-platform-team-Backlog, 10ORES, 10User-Zppix, 10Wikimedia-Incident: ORES disk space monitor - https://phabricator.wikimedia.org/T147163#3418172 (10Zppix) @halfak are we still working on this or am I okay to close this?
[18:20:50] filling in the missing data during training doesn't seem to have helped :S i'm doing something very wrong but not sure what yet. getting cv-test-ndcg@10: 0.8953, cv-train-ndcg@10: 0.9120, holdout-test-ndcg@10: 0.8350. This is all very odd. Using the same training code without adjusting to add the feature i get cv-test-ndcg@10: 0.8479, cv-train-ndcg@10: 0.8622, holdout-test-ndcg@10: 0.8614
[18:21:39] what model are you doing this on?
[18:21:48] mjolnir
[18:23:11] err, that last holdout-test-ndcg@10 should be 0.8489
[18:24:03] it's not too far off? is it really that bad to be a bit higher?
[18:24:48] no, the original that is reasonable is 0.8479 on cv-test and 0.8489 on the holdout, which is perfect. the new one, which is the same dataset with one additional feature, is 0.8953 on cv-test and 0.8350 on holdout-test, which is a huge gap
[18:30:45] sorry had connection issues
[18:31:00] sorry connection issues
[18:31:10] ebernhardson i see... sorry i don't know what to say
[18:36:54] i don't know either :P i've tried a few different things and i'm obviously doing something wrong but can't figure out what yet... i guess keep trying things, but taking 20 minutes to train a model makes it slow going :)
[18:37:32] Amir1 halfak are you around? ebernhardson needs help with models
[18:38:10] oh, they aren't familiar with mjolnir either, i'm just complaining to the ether :P This is all rather different from ores because everything is done in spark on the hadoop cluster
[18:38:35] oh disregard, sorry for pinging
[18:39:01] ebernhardson what exactly do you do besides helping us out from time to time?
[18:39:12] Zppix: i'm tech lead for the search team
[18:39:26] isn't that discovery?
[18:39:29] not any more
[18:39:34] oh
[18:39:59] are you affiliated with a non-us chapter or are you thru wikimedia sf?
[18:40:22] discovery doesn't exist anymore, with two parts split into reading and technology. search is in technology now
[18:40:37] i'm a wmf employee
[18:40:38] Oh that's right, i remember that now
[18:40:55] ebernhardson i knew that, i just know that some employees are a part of wmde for example
[18:41:47] i think the two are mostly independent, except for wmf giving wmde funding each year.
[18:42:13] oh you learn something new every day
[19:35:31] 10Scoring-platform-team-Backlog, 10Wikilabels: Wikilabels should authenticate on the right wiki - https://phabricator.wikimedia.org/T166472#3297052 (10Zppix) I mean the oauth process is the same regardless of what wmf wiki is used, and is (IIRC) i18n'd, so besides unfamiliarity why should this need to be changed?
[19:42:41] 10Scoring-platform-team, 10User-Ladsgroup, 10User-Zppix: upgrade pytz to 2017.2 for revscoring - https://phabricator.wikimedia.org/T167604#3418241 (10Zppix) 05Open>03Resolved
[19:55:49] ebernhardson sorry to bug you but have you messed with travis ci before?
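For reference, the gap ebernhardson describes at 18:24 can be stated mechanically: a cv-test score well above the holdout score is the overfitting/leakage signal, while the baseline run agrees across the two sets. A purely illustrative check (the function name and the 0.02 tolerance are invented here, not part of mjolnir) would flag only the run with the added feature:

```python
def flag_cv_holdout_gap(cv_test, holdout_test, tolerance=0.02):
    """Warn when cv-test ndcg@10 exceeds holdout-test ndcg@10 by more than the tolerance."""
    gap = cv_test - holdout_test
    status = "suspicious" if gap > tolerance else "ok"
    print(f"{status}: cv-test {cv_test:.4f} vs holdout {holdout_test:.4f} (gap {gap:+.4f})")

flag_cv_holdout_gap(0.8479, 0.8489)  # baseline run from the log: ok
flag_cv_holdout_gap(0.8953, 0.8350)  # run with the additional wp10 feature: suspicious
```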
[20:04:13] Zppix: nope, but if you want jenkins help i can :)
[20:29:27] 10Scoring-platform-team-Backlog, 10Wikilabels: Wikilabels should authenticate on the right wiki - https://phabricator.wikimedia.org/T166472#3418255 (10Tgr) Not sure why you'd need something else besides unfamiliarity.