[08:18:54] 10Quarry, 06Discovery, 10Labs-project-other, 10Wikidata, and 2 others: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry) - https://phabricator.wikimedia.org/T104762#2635939 (10Multichill) With the current SPARQL setup it's easy to share queries either by full url or by short url. I think...
[10:49:24] 10Quarry, 06Discovery, 10Labs-project-other, 10Wikidata, and 2 others: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry) - https://phabricator.wikimedia.org/T104762#2636433 (10Base) Do I get it right that now a query cannot be longer than URL length limit? How much exactly is that number...
[10:55:56] 10Quarry, 06Discovery, 10Labs-project-other, 10Wikidata, and 2 others: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry) - https://phabricator.wikimedia.org/T104762#1426314 (10jcrespo) @Base, your questions are very interesting, and you seem to have really nice suggestions, but I would s...
[15:03:56] o/
[18:50:57] halfak: not sure if you saw this, but I missed it during the wishlist process last year: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Miscellaneous#Editor_Stats_API.2FGUI
[18:51:06] they're basically asking for wikicredit
[18:51:18] which I can't wait to help with :)
[18:51:32] do yall have thoughts about a timeline for that?
[18:52:11] milimetric, ORES success --> WikiCredit backburner :(
[18:52:28] it's all good, we'll get there :)
[18:52:31] But really, if we could design a system that could generate the productivity stats in semi-realtime.
[18:52:36] o/ halfak
[18:52:42] That's the major barrier to progress now.
[18:52:44] o/ sabya
[18:53:33] regarding the model performance, I think it is performing better if we don't pass sample_weight
[18:54:25] generating another graph for proving this.
[18:55:11] so, roc is around 92 without sample weight, and I think it will be around 90 with weight.
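(A minimal sketch of the comparison sabya describes above: fit the same classifier with and without `sample_weight` and compare ROC AUC on held-out data. The dataset, class ratio, and up-weighting factor here are illustrative assumptions, not the real ORES training setup. Since AUC is a ranking metric, re-weighting usually moves it less than threshold-dependent scores like accuracy or precision, which matches halfak's surprise below.)

```python
# Compare ROC AUC for a model fit with vs. without sample_weight.
# Toy imbalanced data standing in for the edit-quality corpus (assumption).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Up-weight the rare class 9:1, as if passing sample_weight during fit.
weights = np.where(y_tr == 1, 9.0, 1.0)

unweighted = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
weighted = GradientBoostingClassifier(random_state=0).fit(
    X_tr, y_tr, sample_weight=weights)

for name, model in [("unweighted", unweighted), ("weighted", weighted)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC AUC = {auc:.3f}")
```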
[18:55:38] I'm surprised that roc auc changes
[18:55:50] I know that accuracy/precision/etc. should change.
[18:56:06] Since essentially we're saying, "Pretend that this one case isn't all that uncommon"
[18:56:09] When really it is.
[18:57:19] ok, will confirm once I see the plot.
[18:57:33] it is currently generating the grid.
[19:00:58] also, since in the last plot, the lines were still going upwards till the last value of n_estimators, I'm generating roc for estimators till 2100. Want to see where it becomes flat.
[19:02:04] makes sense?
[19:09:52] Sounds good. Thanks sabya
[19:10:36] milimetric, do you think that anything analytics has in the pipeline would help us generate the massively CPU-intensive content persistence measures in easier ways?
[19:10:46] I suppose the stream processing will be key.
[19:11:03] We need to not miss an event (kafka) and we need to distribute processing with flexible memory
[19:11:23] yeah, stream processing
[19:11:40] also though, we have to find a good way to store/retrieve/operate on text chunks
[19:12:36] nothing turned up on my past searches but when we do this "for real", I intend on searching and thinking for at least a week, because if there's nothing out there, there should be
[20:50:23] milimetric, +1 to that. Seems like a clear technical proposal is a good next step
[21:37:40] I'm just going to leave this here: https://en.wikipedia.org/wiki/Ore
[21:37:56] Better link: https://en.wikipedia.org/wiki/ores
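(A sketch of the `n_estimators` sweep discussed above. Rather than refitting the whole grid per value, scikit-learn's `staged_predict_proba` scores every boosting stage from a single fit, which makes it cheap to see where the ROC curve flattens. The data is a toy stand-in, and the sweep here stops at 600 trees for brevity where the chat goes to 2100; both are assumptions about the real experiment.)

```python
# Fit once with a large n_estimators, then score each stage on held-out
# data to find where ROC AUC stops improving.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(n_estimators=600, random_state=0)
model.fit(X_tr, y_tr)

# ROC AUC after each boosting stage, without retraining per grid point.
aucs = [roc_auc_score(y_te, proba[:, 1])
        for proba in model.staged_predict_proba(X_te)]

# Print a few checkpoints to eyeball where the curve goes flat.
for n in (50, 100, 300, 600):
    print(f"n_estimators={n}: ROC AUC = {aucs[n - 1]:.3f}")
```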