[00:07:26] ORES, Scoring-platform-team: Review promethius ORES rules for completeness - https://phabricator.wikimedia.org/T233448 (colewhite) If the statsd-exporter sidecar approach is appropriate for ORES, there are quite a few metrics with unclear type and meaning. I've constructed a tree to assist us in definin...
[00:15:04] ORES, Scoring-platform-team: Review promethius ORES rules for completeness - https://phabricator.wikimedia.org/T233448 (colewhite)
[15:02:52] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @nuria - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:52:36] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[16:07:03] I have a huge set of revisions for 4-5 articles, about 22k, and I want to obtain the reverted status of each of them. What would be the best way to go about it? Using the mwreverts api doesn't seem to be a very scalable option - https://pythonhosted.org/mwreverts/api.html
[16:47:00] o/ codezee!
[16:47:08] halfak: o/
[16:47:32] Re. the mwreverts API, I agree that it'll take a little while, but I think it will scale to 22k no big deal.
[16:48:17] We use the mwreverts api module in our "autolabel" utility in editquality.
[16:48:22] We'll do up to 100k batches.
[16:48:51] https://github.com/wikimedia/editquality/blob/master/editquality/utilities/autolabel.py
[16:49:00] You could copy this script and re-use it.
[16:49:18] It uses "multiprocessing" to distribute the API requests a little bit.
[16:49:45] halfak: oh i see, yeah that'll make it faster thanks for the link
[16:50:08] Sorry I didn't respond earlier. Feel free to ping me with your questions in the future :D
[16:52:29] Hey accraze! Good morning :)
[16:52:29] halfak: sure! i was wondering that given there's already an article quality model what's stopping us from using it to rate unassessed articles?
[16:52:45] codezee, nothing! That's a great use of it.
[16:53:22] Although, I wouldn't necessarily use it to create new assessment labels as that would result in a feedback loop when we re-train the model.
[16:53:39] accraze, did you join the tech-all meeting that's going on right now?
[16:54:36] oh shoot.... no
[16:54:50] halfak: okay, also i was looking at the nomination process of articles for quality upgrades. They are assessed by reviewers and sometimes rejected. In this regard, do you see any gap that can be addressed? in terms of helping editors either judge which articles need an upgrade or what improvements they can make?
[16:57:43] accraze, no sweat. Nothing important. Except I asked Grant to give you some public kudos for your amazing work in the last 4 months :D
[16:58:03] my basic point is - to speed up article quality improvements, can we better manage the process, so that the limited number of editors that are already there utilize their efforts at the right place.
[16:59:01] codezee, I think there are a lot of opportunities there.
[16:59:14] LOL bummed I missed it, well thanks anyways halfak :)
[16:59:26] looks like I'm still not in tech-all
[16:59:34] Bah! Dang.
[17:00:01] E.g., ragesoss has been using the article quality model to give recommendations to students by using the feature injection system. Essentially it asks ORES if one more section, one more citation, or one more image would help the quality of the article the most.
[17:00:15] * halfak jumps into a meeting.
[17:00:34] halfak: thanks! i'll follow up with you later, since you're in a meeting
[17:50:42] codezee, back from meeting if you want to chat some more.
[18:21:16] halfak: i'm aware that a limitation of the article quality model is its statistical nature. Sometimes new content or a better way of writing may be needed to get an article to a new quality level. I'm wondering if there's any value there
[19:16:31] Bah. Missed codezee
[19:47:01] o/ codezee
[19:47:06] sorry I missed your last message.
[19:47:23] I do think there are opportunities to improve the functionality of the article quality models WRT quality of writing.
[19:47:49] There was a researcher who made some progress with LSTM modeling but it is super bad performance.
[19:48:25] https://dl.acm.org/citation.cfm?id=2910917
[19:49:51] I'm not sure if their model was able to pick up the kind of writing quality signals you have in mind.
[20:06:52] lunchin for a bit
[20:31:04] halfak: ok, i see. and on another note, similar to routing for new page revision, is there any potential value in routing pages for article quality reviews? sth like making efficient use of the review bandwidth of users in judging article quality
[20:47:00] codezee, ooh. I like it.
[20:47:04] I'd not considered this before.
[20:47:33] You might imagine a recommended review queue that automatically calls attention to articles as they cross important thresholds.
[20:50:02] halfak: yes, that's what i'm thinking of. Right now, I'm trying to imagine the best way i can get some data around this scenario which can tell us that it indeed has value. Something like showing some potential good quality articles not being upgraded soon enough
[21:25:22] codezee_, re. getting some data, I might look for articles that ORES ranks as FA that are not labeled as GA and then check to see if they should be GA.
[21:25:51] You could do this by (1) manually checking if it looks like they should be promoted to GA and (2) nominating them.
[21:26:13] In the past, people used ORES to blindly nominate for GA and that didn't go amazingly.
[21:26:31] I think with a little pre-nomination review, it could go really well.
[21:26:53] Alternatively, you could check out the GA/FA review process to see if some topic routing could be done around there.
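A minimal sketch of the candidate search suggested at 21:25:22, assuming the predicted and currently assigned quality classes have already been collected for each article. The input shape (dicts with "title", "predicted" and "labeled" keys), the function name review_candidates and the sample rows are all hypothetical and made up for illustration; they are not an ORES response format or real data. The class order is the English Wikipedia assessment scale. The output is meant to feed the manual pre-nomination check from 21:25:51, not to nominate anything automatically.

```python
# English Wikipedia article quality classes, lowest to highest.
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]


def review_candidates(articles, predicted_at_least="FA", labeled_below="GA"):
    """Return titles whose predicted class is high but whose current label is low.

    `articles` is a hypothetical list of dicts with "title", "predicted"
    and "labeled" keys; it is an assumption of this sketch.
    """
    rank = {name: i for i, name in enumerate(QUALITY_ORDER)}
    candidates = []
    for article in articles:
        predicted = rank[article["predicted"]]
        labeled = rank[article["labeled"]]
        # Keep articles the model rates highly that have not yet been promoted.
        if predicted >= rank[predicted_at_least] and labeled < rank[labeled_below]:
            candidates.append(article["title"])
    return candidates


# Made-up sample rows, for illustration only.
sample = [
    {"title": "Article A", "predicted": "FA", "labeled": "B"},
    {"title": "Article B", "predicted": "FA", "labeled": "GA"},
    {"title": "Article C", "predicted": "C", "labeled": "Start"},
]
print(review_candidates(sample))
```

Lowering predicted_at_least to "GA" would widen the queue; keeping it at "FA" matches the stricter filter described in the chat and leaves fewer articles for a human to check before nominating.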
[21:31:40] wikimedia/articlequality#257 (euwiki_match_eu-en-es - 976955c : Aaron Halfaker): The build has errored. https://travis-ci.org/wikimedia/articlequality/builds/608427064
[21:32:23] halfak: thanks, i'll see if i can find sth around there
[21:49:50] Scoring-platform-team, Bad-Words-Detection-System, revscoring, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271 (Halfak) @kevinbazira, could you take a look at this task? I figure you might be cut out for it. Right now, we need to genera...
[21:51:16] Jade, Scoring-platform-team: Clean up Jade title calculations - https://phabricator.wikimedia.org/T207858 (Halfak) Open→Resolved a:Halfak Looks like we already have this.
[21:53:09] Scoring-platform-team, Analytics: Investigate formal test framework for Oozie jobs - https://phabricator.wikimedia.org/T213496 (Halfak)
[21:53:12] ORES, Scoring-platform-team, Analytics, Dumps-Generation: Produce dump files for ORES scores - https://phabricator.wikimedia.org/T209739 (Halfak)
[21:53:53] Scoring-platform-team (Research), Analytics: Investigate formal test framework for Oozie jobs - https://phabricator.wikimedia.org/T213496 (Halfak)
[21:55:11] ORES, Scoring-platform-team, Documentation: ELI5 ORES docs - https://phabricator.wikimedia.org/T190411 (Halfak) Some work started on Statistics here: https://mediawiki.org/wiki/ORES/Model_info/Statistics
[21:58:30] ORES, Scoring-platform-team, Documentation, Epic: Improve documentation for answering "What is ORES and how can I use it?" - https://phabricator.wikimedia.org/T190814 (Halfak)
[21:59:26] ORES, Scoring-platform-team, SEO: Improve search-ability of ORES stuff - https://phabricator.wikimedia.org/T179630 (Halfak) Open→Resolved a:Halfak See
[22:00:14] ORES, Scoring-platform-team: Precache should include bot edits to wikidata - https://phabricator.wikimedia.org/T212264 (Halfak) It looks like Google might be running into a QPS limitation because they are now scoring all of the edits to Wikidata. We can help by precaching all of the edits -- bot or othe...
[22:00:31] ORES, Scoring-platform-team: Precache should include bot edits to wikidata - https://phabricator.wikimedia.org/T212264 (Halfak) p:Lowest→Low
[22:02:20] ORES, Scoring-platform-team: Precache should include bot edits to wikidata - https://phabricator.wikimedia.org/T212264 (Halfak) Edit should happen here: https://phabricator.wikimedia.org/source/ores-deploy/browse/master/config/00-main.yaml$540
[22:02:43] ORES, Scoring-platform-team: Precache should include bot edits to wikidata - https://phabricator.wikimedia.org/T212264 (Halfak) But changes should be submitted to https://gerrit.wikimedia.org/r/admin/projects/scoring/ores/deploy
[22:06:30] ORES, Scoring-platform-team, Wikidata, editquality-modeling, artificial-intelligence: ORES is too slow for ORC tool - https://phabricator.wikimedia.org/T226120 (Halfak) Open→Invalid Thanks for the notes @YMS. (Sorry to not respond sooner. Lost track of this ticket.) We'll close for...
[22:08:01] Jade, Scoring-platform-team, Design: Design user experience for Jade sentence-level annotation - https://phabricator.wikimedia.org/T185247 (Halfak) Open→Declined This is pretty deeply out of scope for us at the moment so declining.
[22:12:26] Scoring-platform-team, Wikilabels: Behaviour of "Abandon", "unsure" is problematic - https://phabricator.wikimedia.org/T166891 (Halfak) Open→Resolved a:Halfak We changed it to "Skip" so I'm calling this resolved.
[22:16:19] ORES, Scoring-platform-team (Research), Wikidata: Explore using ShEx to support ORES in Wikidata - https://phabricator.wikimedia.org/T225944 (Halfak) @WMDE-leszek, has any progress been made on item #1.
[22:18:15] Scoring-platform-team (Research), Paper: Paper: Designing for Empowerment: ORES & Wikipedia - https://phabricator.wikimedia.org/T128195 (Halfak) Open→Resolved Submitted!
[23:56:00] hey halfak
[23:56:16] Can you change a setting on github for me?
[23:56:21] https://github.com/orgs/wikimedia/teams/scoring-platform/repositories
[23:56:36] is it possible to set ores_bias_project to "Admin"