[14:50:51] halfak: o/ I can't be at today's backlog grooming, it's the yearly review of the WMDE engineering dept.
[15:00:27] Gotcha Amir1. Thanks for letting me know
[15:13:57] (PS5) Sbisson: Add classes to RC lines so they can be highlighted [extensions/ORES] - https://gerrit.wikimedia.org/r/327007 (https://phabricator.wikimedia.org/T152797)
[16:55:48] Amir1, ^ if you have the time today
[16:56:42] halfak: still in the meeting :(
[16:56:49] I'll check once I'm done with this
[16:56:53] No worries :)
[18:06:19] HELLO?
[18:06:38] o/
[18:07:04] srrodlund: I was mentioning to halfak that you would have the brilliant for our docs
[18:07:15] Here's a sort-of overview: https://etherpad.wikimedia.org/p/ORES_docs_split
[18:07:46] Ah! Hello!
[18:07:47] o/
[18:07:58] In meeting now, but I'll be back in ~25 mins
[18:08:05] ok
[18:08:39] srrodlund, immediately, I guess the questions are, would you like to help, do you have the time to help, and what would you need to help?
[18:08:58] :D halfak does not mince the words
[18:09:14] :D
[18:09:21] I'd love to help; time will be better post all hands but I can take a cursory look if it is urgent; and I don't know, you tell me :-)
[18:09:43] * awight stealthily pencils "technical writer" into ORES team plan
[18:10:07] srrodlund, last one was intended to be more "What would you need in order to help?"
[18:10:53] But yeah, post allhands seems reasonable. This isn't terribly urgent, but if we have you take a look or just help us know how you think, we can get ourselves organized better to work with you in the meantime :)
[18:11:18] +1 awight for tech writer time. I'll make a note.
[18:11:36] Yup! I can look!
[18:12:46] Oh ha I read the last one wrong :-) Let me look first, and then I can tell you if I need anything. I may have some questions about what you need once I have a look at the doc. It's on this etherpad?
[18:13:09] The Etherpad is the best reference we have to where all the docs live
[18:13:23] We're ready to start thinking about where they *should* live and where the gaps are.
[18:13:37] * halfak pays attention to meeting
[18:14:04] srrodlund: I've been making drawings, meanwhile, cos it's all I'm good for: https://github.com/adamwight/ores-diagrams
[18:14:41] :P awight
[18:15:21] Thanks @awight
[18:17:25] srrodlund: any time you need a change of stale air, I'd love to chat about my understandings and misunderstandings
[18:17:52] I'm definitely at the "learn" pay grade of volunteer tho
[18:18:35] Yes DEFINITELY -- I'm really interested; sometimes it's nice to do things that require more thought than what I'm currently doing :-)
[18:57:08] o/ Amir1 Thanks for the merge. I just noticed :D
[18:57:26] Now to figure out how to maintain the current wikilabels while we prepare to migrate people off-wiki.
[18:57:39] halfak: you are very welcome :)
[18:58:03] 1- deploy it 2- announce it 3- put a notice on WP:Labels on its wikilinks 4- wait for a while :D
[19:38:33] Revision-Scoring-As-A-Service: Remove host from wikilabels config -- infer from request - https://phabricator.wikimedia.org/T154693#2920629 (Halfak)
[19:38:36] Revision-Scoring-As-A-Service: Remove host from wikilabels config -- infer from request - https://phabricator.wikimedia.org/T154693#2920642 (Halfak) https://github.com/wiki-ai/wikilabels/pull/145
[19:38:47] Revision-Scoring-As-A-Service, Wikilabels: Remove host from wikilabels config -- infer from request - https://phabricator.wikimedia.org/T154693#2920643 (Halfak) a:Halfak
[19:39:40] Amir1, +1 for that plan.
[19:39:54] Happily, the current on-wiki code will still work.
[19:40:04] So we can do a gradual transition
[19:56:10] Revision-Scoring-As-A-Service, ORES: Split wheels repo into Prod/WMFLabs branches and maintain independence - https://phabricator.wikimedia.org/T154436#2920751 (Halfak) Working on https://github.com/wiki-ai/ores-wmflabs-deploy/pull/71
[19:56:18] Revision-Scoring-As-A-Service, ORES: Split wheels repo into Prod/WMFLabs branches and maintain independence - https://phabricator.wikimedia.org/T154436#2920752 (Halfak)
[19:56:32] Revision-Scoring-As-A-Service, ORES: Split wheels repo into Prod/WMFLabs branches and maintain independence - https://phabricator.wikimedia.org/T154436#2911361 (Halfak) Also, I pushed a `wmflabs` branch to the wheels repo
[20:35:42] OMG we had a stupid bug that was caused by just not shuffling. I can't believe this was the fix.
[21:54:41] Amir1, if you're around, they're doing some hacking on the ORES extension in #wikimedia-collaboration
[22:29:54] Revision-Scoring-As-A-Service, Edit-Review-Improvements, rsaas-editquality, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Automatically adjust ORES threshold settings when ORES models are updated - https://phabricator.wikimedia.org/T152161#2921341 (jmatazzoni)
[22:30:15] Revision-Scoring-As-A-Service, Edit-Review-Improvements, rsaas-editquality, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Implement new precision-based test stats for editquality models - https://phabricator.wikimedia.org/T151970#2921345 (jmatazzoni)
[22:44:47] Revision-Scoring-As-A-Service, UI-Standardization, WMF-Design, Design: Add Yellow70 to the color palette - https://phabricator.wikimedia.org/T151938#2921495 (Volker_E)
[22:46:20] ello
[22:46:23] hi
[22:47:12] so I was discussing with halfak on trying to identify (potential) copyrighted files on en.wikipedia
[22:47:42] https://quarry.wmflabs.org/query/15190
[22:47:46] I had a query like this
[22:47:54] Looking at it now.
[22:48:05] o/ Earwig :D
[22:48:09] hey there
[22:48:31] Unrelated, but maybe relevant to your interests... I just moved this to mainspace: https://www.mediawiki.org/wiki/Mediawiki-utilities
[22:48:36] * halfak dives back into query
[22:49:10] I am in the know of copyright issues of files, which indeed is non-trivial. I don't expect automation to help with that, however some user patterns that lead to problematic uploads are known to me from experience.
[22:50:00] halfak: hm, I wonder if mwparser could be adapted to use these libraries to solve the perennial "we don't know wiki-specific info" problem
[22:50:00] Consider https://en.wikipedia.org/wiki/File:For.my.love.png
[22:50:08] and really https://en.wikipedia.org/wiki/Special:Contributions/Pritam_priya_oraon
[22:50:57] Earwig, all of those libraries try to avoid wiki-specific things too. Maybe we need a "wmf" library that can help or something like that.
[22:51:27] Earwig, is there any way you can assert wiki-specific info in mwparserfromhell
[22:51:49] E.g. "file_namespace=['File', 'Image']"
[22:51:56] halfak: no way currently; it just avoids dealing with things that are wiki-specific or makes broad generalizations
[22:52:02] yeah, e.g. currently it doesn't treat files in a special way
[22:52:05] there's an open issue for it though
[22:52:15] ToAruShiroiNeko: makes sense
[22:52:33] ToAruShiroiNeko: so, originally you made me think of running reverse-image searches on large quantities of images; that's not the plan here?
[22:52:50] not at the moment.
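The `file_namespace=['File', 'Image']` idea floated above can be sketched without any parser library. This is a rough, regex-based illustration only: it is not mwparserfromhell's API, the function name `file_links` and its `file_namespace` parameter are hypothetical, and file embeds whose captions contain nested links are not handled.

```python
import re

# Matches simple [[Target]] or [[Target|options]] links with no nested brackets.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def file_links(wikitext, file_namespace=("File", "Image")):
    """Return file names embedded in wikitext, given caller-supplied namespace aliases.

    `file_namespace` is the wiki-specific knowledge the caller asserts,
    e.g. ("Datei",) for a German-language wiki. Hypothetical helper.
    """
    aliases = {name.lower() for name in file_namespace}
    found = []
    for match in WIKILINK_RE.finditer(wikitext):
        title = match.group(1).strip()
        prefix, sep, rest = title.partition(":")
        # A leading colon ([[:File:X]]) is a plain link, not an embed;
        # its prefix is empty, so it is skipped here.
        if sep and prefix.strip().lower() in aliases:
            found.append(rest.strip())
    return found
```

Calling `file_links(text)` uses the English defaults; passing `file_namespace=("Datei",)` would pick up files on a wiki that uses that alias instead.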
[22:52:54] I don't think that is necessary
[22:53:15] the main reason is you do not upload an image to english wikipedia and NEVER edit any mainspace page
[22:53:34] cleaning those up should be semi-trivial
[22:53:38] or the page was deleted
[22:53:44] that is indeed true
[22:53:56] which, again, is likely to indicate copyvios
[22:53:59] esp if the deletion was for G12 (but I don't think we can check that with the db)
[22:53:59] but if page is deleted and that removes all of your contribs, that is also a red flag
[22:54:02] Arg. So yeah... This is not going to be a terribly efficient query.
[22:54:03] copyvios or worse
[22:54:09] we often get over-commercialised bs
[22:54:23] deletion reason isn't too important
[22:54:51] anyone with exclusive edits to one page and not edit anything else is pretty much a single purpose user.
[22:54:56] hmm
[22:55:02] deletion strongly implies a problem with their work
[22:55:13] be it notability, npov or blatant attack pages
[22:55:31] `user` stores editcount, but it's not namespace-specific
[22:55:34] the reason doesn't matter, their files have dubious copyright status and probably aren't useful.
[22:55:35] so that doesn't really help
[22:55:42] no, unfortunately it doesn't
[22:55:46] but it can be an indicator
[22:55:52] you could have an edit count of 500
[22:55:55] or 100
[22:56:01] typically drive-byers don't have much
[22:56:23] but it would indeed be noisy
[22:56:34] Was thinking the same thing.
[22:56:37] Let me try that out
[22:56:41] you could use that as an initial filter on the query to reduce the number of subqueries
[22:56:49] oh uh
[22:57:07] That is what I tried I think, the not in is super expensive probably
[22:57:08] revision_userindex
[22:57:29] you have millions of users and hundreds of millions of revisions :/
[22:57:42] millions of uploads too
[22:58:26] Here's my attempt. https://quarry.wmflabs.org/query/15259
[22:58:33] It looks OK in the optimizer.
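A minimal sketch of the join-based approach discussed above (uploaders with no mainspace edits, pre-filtered by edit count, no `NOT IN` subquery). It is not the Quarry query itself, whose text does not appear in the log. It assumes simplified stand-ins for the MediaWiki `user`, `image`, `revision`, and `page` tables with 2017-era column names such as `img_user` and `rev_user`; the function names, the demo rows, and the cutoff of 500 are illustrative.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, hypothetical subset of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, user_editcount INTEGER);
        CREATE TABLE image (img_name TEXT PRIMARY KEY, img_user INTEGER);
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_page INTEGER);

        INSERT INTO "user" VALUES (1, 3), (2, 10), (3, 9000);
        INSERT INTO image VALUES ('A.png', 1), ('B.png', 1), ('C.png', 2), ('D.png', 3);
        -- page 10 is in mainspace (namespace 0); page 11 is a File page (namespace 6)
        INSERT INTO page VALUES (10, 0), (11, 6);
        INSERT INTO revision VALUES (100, 1, 11), (101, 2, 10), (102, 2, 11), (103, 3, 11);
        """
    )
    return conn


def uploaders_without_mainspace_edits(conn, max_editcount=500):
    """Return (user_id, upload_count) for uploaders with zero namespace-0 revisions.

    The total edit count is used only as a cheap initial filter; the LEFT JOINs
    replace a NOT IN subquery, and HAVING keeps users whose revisions never
    matched a mainspace page.
    """
    query = """
        SELECT i.img_user, COUNT(DISTINCT i.img_name) AS uploads
        FROM image AS i
        JOIN "user" AS u ON u.user_id = i.img_user
        LEFT JOIN revision AS r ON r.rev_user = i.img_user
        LEFT JOIN page AS p ON p.page_id = r.rev_page AND p.page_namespace = 0
        WHERE u.user_editcount <= ?
        GROUP BY i.img_user
        HAVING COUNT(p.page_id) = 0
        ORDER BY i.img_user
    """
    return conn.execute(query, (max_editcount,)).fetchall()
```

On the real replicas the same shape would need the indexed views mentioned in the chat (for example `revision_userindex`) to be practical; this toy version only shows the logic.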
[22:59:11] that looks a lot better, no subquery
[22:59:14] where is this optimiser?
[22:59:27] I would like to use it too.
[22:59:51] I'm using the analytics replicas :/
[23:03:52] I have the query running on quarry and the analytics boxes. If you don't beat me to it, I think I'll have an answer for you tomorrow.
[23:35:21] ToAruShiroiNeko, where should I paste when this finishes?
[23:38:17] Revision-Scoring-As-A-Service, ORES: Split wheels repo into Prod/WMFLabs branches and maintain independence - https://phabricator.wikimedia.org/T154436#2911361 (Halfak) https://gerrit.wikimedia.org/r/330823 For new wheels.
[23:40:15] halfak umm
[23:40:22] I can accommodate any method of publishing
[23:40:31] you could post on your userspace on wiki
[23:40:35] that would be most convenient
[23:40:54] how much info is there?
[23:48:32] https://quarry.wmflabs.org/query/15259 finished
[23:48:39] 80k users
[23:49:11] https://en.wikipedia.org/wiki/Special:Contributions/Xi_Mingze
[23:49:13] wow, look at this guy
[23:50:27] query says he has 8k uploads though, which doesn't seem right
[23:53:19] hmm
[23:53:48] he has edits
[23:54:06] few
[23:54:16] but definitely problematic
[23:54:26] \o/
[23:54:49] then again uploads seem to have proper fair use
[23:54:50] wtf
[23:55:35] it seems like he overwrites existing copyrighted files
[23:58:07] yeah, I just see him uploading resized images
[23:58:17] which is not necessarily OK
[23:58:22] fair use is supposed to be low res
[23:58:23] when convention isn't to have them be that big
[23:58:24] yeah
[23:58:34] I'd tell him to stop
[23:58:38] to be fair new resolutions aren't huge
[23:58:49] no, but bizarre
[23:58:55] he did it very quickly
[23:58:55] unless we have a definition of low res
[23:59:02] I am more puzzled by him having 7 edits
[23:59:06] I've heard < 100,000 pixels
[23:59:12] but that's kind of arbitrary
[23:59:16] 8
[23:59:19] honestly I wish we had clearer guidelines for that
[23:59:27] but that would be gamed
[23:59:35] we can enforce it at mediawiki level
[23:59:37] how?
[23:59:37] like with thumb
[23:59:39] oh
[23:59:47] fair use would have such an enforced size
[23:59:58] huh actually yeah, what if mediawiki resized fair-use images for you
[23:59:59] linking to it directly wouldn't work
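The "< 100,000 pixels" rule of thumb mentioned above can be turned into a small size calculation. This is only a sketch of the arithmetic, assuming that figure as the budget; it is not a MediaWiki feature, and the function name and `max_pixels` parameter are hypothetical.

```python
import math


def fit_within_pixel_budget(width, height, max_pixels=100_000):
    """Return (width, height) scaled down, preserving aspect ratio, to fit max_pixels.

    Images already within the budget are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Scaling both sides by s multiplies the area by s**2.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    # Guard against floating-point rounding pushing the area over budget.
    while new_width * new_height > max_pixels and new_width > 1:
        new_width -= 1
    return new_width, new_height
```

For example, a 400 x 1000 upload has four times the budgeted area, so each side is halved to 200 x 500.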