[07:51:44] Scoring-platform-team, translatewiki.net: Manual export of translations for wikilabels/wikilabels-wmflabs-deploy - https://phabricator.wikimedia.org/T206041 (Nikerabbit) Open>Resolved a:Nikerabbit
[10:48:59] o/
[15:01:39] o/
[15:02:27] o/
[15:02:29] Amir1: so... mwparserfromhell ? :D
[15:02:33] halfak: as well ^
[15:02:50] There's some work that earwig is doing to make everything interruptible :)
[15:02:59] So that'll solve the general problem
[15:03:07] I don't know if there is a new release.
[15:03:45] there is from what I saw
[15:03:47] 0.5.2
[15:04:57] I still haven't even grasped how on earth this revision makes it busy loop
[15:06:12] Oh yes. No idea :|
[15:12:28] akosiaris: yup :P
[15:13:02] afk for lunch
[15:26:49] back
[15:27:56] halfak: btw, would it make sense for us to do a blog post about this? (after we are done upgrading and are sure we've fixed that issue)
[15:33:08] akosiaris, +1
[15:33:47] :D
[15:39:07] btw. ores-staging.wmflabs.org is operating with celery 4.1 (on celery branch of ores) and it's fine ^^
[15:39:47] Amir1, I want to see a good summary of your tests. If we fail to dedupe in production, I'm worried we'll see a 4x increase in traffic :|
[15:42:08] halfak: I will write it in the PR
[15:43:04] In the short term, I want to (1) get mwpfh 0.5.2 tested to make sure it *is* interruptible, (2) get 0.5.2 into production, and (3) switch to working on Tech Conf stuff.
[15:46:43] halfak: https://github.com/wikimedia/ores/pull/270
[15:46:44] :D
[15:47:01] Right. So focusing on the production issue first :P
[15:47:15] Unless you want to get 0.5.2 tested and deployed for me :)
[15:47:36] Oh! I see your note about testing
[15:47:38] reading through
[15:48:23] I'd like to see duplicate precache calls. One with a --delay=0.1
[15:48:39] Then check CPU to see if it's higher when the second process runs.
[15:50:11] 0.5.2 requires rebuilding all models, that would take lots of time (you also need to update editquality/itemquality for wikidata to work with mwbase, otherwise things will explode)
[15:50:38] It shouldn't require rebuilding any models.
[15:50:50] Only change is related to signal handling.
[15:51:05] hmm, if it's like that, that seems simple
[15:51:49] Gotta make sure it is interruptible first
[15:51:52] code in task
[16:21:00] ORES, Scoring-platform-team, Mobile-Content-Service, RESTBase-API, and 4 others: Add ORES articlequality data to summaries? - https://phabricator.wikimedia.org/T157132 (mobrovac)
[16:57:54] hoo: I've invited you to the party, https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/JADE,dashboards/default
[17:18:34] I'm back
[17:25:28] woo! Got some new volunteers started. They are maybe going to take on the development of words-to-watch features. :) They'll join IRC next week.
[17:25:43] lol @ "Have you ever used IRC before?" "What's that?"
[17:25:55] It's like slack but old and difficult.
[17:26:07] * halfak --> lunch
[17:34:40] IRC is from when emojis were hewn from the bones of other punctuation.
[17:45:50] !bash ^
[17:45:50] Error: Command “bash” not recognized. Please review and correct what you’ve written.
[17:45:50] halfak|Lunch: Stored quip at https://tools.wmflabs.org/bash/quip/AWZ41YJQ9KCHCtNUghO2
[17:46:08] Why not AsimovBot? That would be the best feature
[17:53:14] https://office.wikimedia.org/w/index.php?title=Bash&type=revision&diff=239658&oldid=239450
[17:53:34] OMG stashbot <3
[17:56:58] ty
[18:20:00] ORES, Scoring-platform-team, Patch-For-Review: Consistent TimeoutErrors when using Celery 4 - https://phabricator.wikimedia.org/T179524 (Ladsgroup) I'm running celery 4.2 on ores-staging.wmflabs.org and ran a hammer test with ?features= on and it worked just fine.
[18:25:52] ORES, Scoring-platform-team, Patch-For-Review: Consistent TimeoutErrors when using Celery 4 - https://phabricator.wikimedia.org/T179524 (awight) @Ladsgroup That's great news! Note that in T179524#3735543 I disclose that I may have worked around this problem with a barbaric hack.
[18:28:34] Looks like our precached script doesn't work. Trying to figure out why
[18:30:21] ORES, Scoring-platform-team, Patch-For-Review: Consistent TimeoutErrors when using Celery 4 - https://phabricator.wikimedia.org/T179524 (Ladsgroup) >>! In T179524#4667923, @awight wrote: > @Ladsgroup That's great news! Note that in T179524#3735543 I disclose that I may have worked around this proble...
[18:32:39] halfak: I discovered something interesting about the MW XML dumps. Contributor ids are always local IDs.
[18:33:02] Yes they are. This is frustrating for many reasons.
[18:33:14] Local IDs can be matched to global IDs but it's very difficult.
[18:33:20] halfak: How do you feel about a schema, user: { local_id: 123, global_id: 321 } where the global ID is only written for wikis with CentralAuth enabled?
[18:33:30] Local IDs have no utility. They are purely legacy.
[18:33:40] I like it.
[18:33:47] kk
[18:33:50] It seems useful and it doesn't suffer from the suppression problem.
[18:34:56] hmm, and of course there's still a oneOf switch, so if this is an IP editor then the structure is user: { ip: "::1" }
[18:35:23] Just to confirm,
[18:35:52] If CentralAuth is present: user: { local_id: 123, global_id: 321 } or user: { ip: "::1" }
[18:36:14] If CentralAuth isn't present: user: { local_id: 123 } or user: { ip: "::1" }
[18:37:52] I generally agree we should be using UIDs rather than usernames
[18:38:06] Since usernames are a vandalism vector and, even when they are not, can change, leading to out of date revisions
[18:40:35] harej: +1, I liked the username as a courtesy to human readers, but dropping is okay with me and I've changed the code accordingly.
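The user object summarized at 18:35:52 and 18:36:14 can be sketched as a small validator. This is a minimal Python sketch for illustration only, not the JADE code (JADE is a MediaWiki extension). The names `validate_user` and `central_auth_enabled` are hypothetical, and treating `global_id` as optional when CentralAuth is present is an assumption; the log only says the global ID is written for wikis with CentralAuth enabled.

```python
import ipaddress

# Fields allowed for a logged-in editor (names taken from the discussion).
ID_FIELDS = {"local_id", "global_id"}


def _is_positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_user(user, central_auth_enabled):
    """Return a list of problems; an empty list means the object is acceptable.

    Hypothetical helper: mirrors the "oneOf" switch between an IP editor
    ({"ip": ...}) and a logged-in editor ({"local_id": ..., "global_id": ...}).
    """
    if not isinstance(user, dict):
        return ["user must be an object"]
    keys = set(user)

    # Branch 1: IP editor. The ip field must stand alone.
    if "ip" in keys:
        if keys != {"ip"}:
            return ["ip cannot be combined with other fields"]
        if not isinstance(user["ip"], str):
            return ["ip must be a string"]
        try:
            ipaddress.ip_address(user["ip"])
        except ValueError:
            return ["ip is not a valid IPv4 or IPv6 address"]
        return []

    # Branch 2: logged-in editor. local_id is always required.
    problems = []
    unknown = keys - ID_FIELDS
    if unknown:
        problems.append("unexpected fields: " + ", ".join(sorted(map(str, unknown))))
    if not _is_positive_int(user.get("local_id")):
        problems.append("local_id must be a positive integer")
    if "global_id" in keys:
        if not central_auth_enabled:
            problems.append("global_id is only written when CentralAuth is enabled")
        elif not _is_positive_int(user["global_id"]):
            problems.append("global_id must be a positive integer")
    return problems
```

For example, `validate_user({"local_id": 123, "global_id": 321}, central_auth_enabled=False)` reports a problem, while the same object passes when CentralAuth is enabled.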
For reference, signatures in talk page content have all the same problems as the proposal to allow usernames in JADE JSON.
[18:41:09] Oh yeah, talk page signatures are an existing problem but there's not much we can do there.
[18:41:18] For showing the username as a courtesy, couldn't we do that in the UI?
[18:41:26] Have some magic that converts the UID into the current value for that name?
[18:42:16] harej, yeah totally. Kind of like how Structured Discussions does it. But for data consumers, they will need to do a lookup.
[18:42:53] That's definitely a downside, but I think the tradeoff is reasonable (given vandalism and change over time)
[18:43:34] I think sticking to UIDs is better for the integrity of the data
[18:43:36] Even if it's less convenient
[18:43:39] totally. The JSON content is optimized for machines anyway, my only "red line" would be if we did something to make it completely unusable to the naked eye, e.g. pickle a field.
[18:44:08] halfak: Are you okay with my change summarized at 18:35?
[18:44:51] We could also generate a convenience dump that swaps out the user IDs for user names, but those dumps could end up becoming outdated, which is a weird thing for revisions which are supposed to be immutable
[18:44:55] Actually, how do the dumps normally do it?
[18:44:59] +1 awight
[18:45:14] apergos, ^
[18:45:26] We're wondering how usernames work for old revisions in XML dumps.
[18:45:37] Do they pull the rev_user_text field or do a lookup into the user table?
[18:45:45] I can answer how the dumps are currently produced, fwiw
[18:45:48] This would matter for when users are renamed.
[18:45:52] nice
[18:46:22] would require more swap space than I am currently willing to allocate...
[18:46:41] once I have this doc written (30 min to one hour?)
I can look and weigh in
[18:46:44] https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/export/XmlDumpWriter.php$349-359
[18:47:17] ORES, Scoring-platform-team: Build a brand-new GUI for ORES - https://phabricator.wikimedia.org/T207071 (Ladsgroup)
[18:47:20] ^ for IP users, the "ip" field is written and nothing more.
[18:47:58] For logged-in editors, we write a "username" and "id" field, for revisions that will be the content of rev_user_text and rev_user.
[18:49:15]
[18:54:45] right. So that means the username doesn't get updated -- unless rev_user_text gets updated. I'm not aware that it does get updated.
[18:54:48] Let me check.
[18:55:47] rev_user_text would be updated if a username was renamed, no? isn't that why user renames are so expensive?
[18:56:18] Hmm. It looks like they are changed!
[18:56:31] Yes. I didn't realize we were performing such expensive operations.
[18:57:13] Ugh. I don't know why our precached script isn't working and I haven
[18:57:19] (CR) Awight: "After some IRC discussion, we're going to split the logged-in user identifiers into local_id (always present) and global_id (present when " (1 comment) [extensions/JADE] - https://gerrit.wikimedia.org/r/461502 (https://phabricator.wikimedia.org/T206573) (owner: Awight)
[18:57:30] 't figured it out yet. Gonna have to drop this for a bit. Will come back to review your work later, Amir1.
[18:58:27] okay, is there anything worrying
[19:20:42] I'm done for the day
[19:20:51] See you on Wednesday
[19:22:33] halfak: WikiProject Women Scientists would be a women (i.e. biographical) project and WikiProject Women's Health would be a women's issues project, correct?
[19:24:07] Because this would make Women's Health a politics project, per your ontology. Is this intentional?
[19:52:49] Yes. I think that's up for debate.
[19:53:21] But from my (western) view, women's health has been very political.
Maybe we could put women's health under STEM.Health.Women's health
[19:53:50] * halfak looks for where it is in the council directory
[19:54:03] Are projects limited to one categorization? That makes things harder
[19:54:28] It's under STEM.Women's health.
[19:54:41] harej, nope. Multi-class.
[19:54:59] Hi there, harej: replied to your email (sorry for the late reply: family/health issues)
[19:55:03] So "Women Scientists" would be "Culture.Biography.Women" And "STEM" generally.
[19:55:22] Hauskatze, sorry to hear about that. Been missing you in IRC recently especially re. JADE discussions :)
[19:55:24] Hauskatze: hi there! Thank you very much; it's a useful email
[19:55:35] Sorry to hear about the family and health issues.
[20:03:06] Thanks both :)
[20:24:18] harej, does ^ make sense?
[20:25:02] seems fine to me, yes
[20:27:18] Woops! I got something wrong here.
[20:27:26] Seems that "Women's issues" is under politics.
[20:27:38] I don't have a "Women's health"
[20:27:59] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women%27s_Health
[20:28:21] So maybe "WikiProject Women's Health" would be under STEM.Health and Society.Politics.Women's issues
[20:28:40] "including related social and political issues."
[20:28:42] Seems right.
[20:29:25] * halfak jumps head first into political battle.
[20:29:29] :D
[20:29:47] also halfak, one use case I am considering is JADE for the Small Wiki Monitoring Team
[20:29:57] This would have us deploy JADE on... like 95% of wikis
[20:30:09] I swear I'm not trying to game the metrics :P
[20:30:13] lol
[20:30:19] It does seem like a fine idea to me.
[20:30:38] I'd also like to build a "small wiki" vandalism detection model.
[20:32:22] JADE, Scoring-platform-team (Current), WMF-Communications: Blog about JADE - https://phabricator.wikimedia.org/T183200 (Harej) https://wikimediafoundation.org/2018/10/10/mitigating-biases-artificial-intelligences-wikipedian-way/
[20:32:31] Scoring-platform-team: Scoring Platform FY18 Q3 - https://phabricator.wikimedia.org/T183198 (Harej)
[20:32:33] JADE, Scoring-platform-team (Current), WMF-Communications: Blog about JADE - https://phabricator.wikimedia.org/T183200 (Harej) Open>Resolved
[20:33:22] The main concern I have is that it would probably lead to labels being posted in English to non-English wikis.
[20:33:49] But that may be seen as an acceptable cost for wikis not big enough to have their own communities
[21:22:16] Sorry. I don't understand harej. What labels being posted?
[21:22:35] Oh! I think I see.
[21:22:55] As in English speaking patrollers will post notes about their judgments in English.
[21:22:59] So, say I'm a Small Wiki Monitoring Team member. I'm an English speaker. I judge an edit on Kazakh Wiktionary. For my rationale, I write it in English, and not Kazakh.
[21:34:45] Right. How is that handled now?
[21:35:32] That's what I would like to know. Presumably SWMT members just revert without leaving commentary of any kind
[21:35:42] Alterrrrnatively
[21:35:56] I think krinkle is involved in the small wiki monitoring
[21:35:56] We could mitigate this problem by (in the UI) providing some stock comments which are translated into as many languages as possible
[21:36:12] That's a great idea.
[21:36:30] Maybe not something *we* are equipped to do, but I really like it.
[21:36:45] Maybe *tagging* is a better way to think about it.
[21:36:57] And each wiki can provide a translation of relevant tags for others to use
[21:37:35] E.g. #repeatchars #keymash #racialslur #hoax #massdelete etc.
[21:38:16] Could we define these tags in the schema?
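The tagging idea above (a judgment stores language-agnostic tag identifiers, and each wiki supplies translations that the UI renders) could look roughly like the following. This is a hedged Python sketch, not anything from JADE: `TAG_MESSAGES` and `render_tags` are hypothetical names, the tag identifiers come from the 21:37:35 message, and the label strings are made up for illustration.

```python
# Hypothetical catalogue: tag identifier -> {language code -> display label}.
# In practice each wiki would contribute its own translations.
TAG_MESSAGES = {
    "repeatchars": {"en": "Repeated characters", "de": "Wiederholte Zeichen"},
    "keymash": {"en": "Keyboard mashing"},
    "hoax": {"en": "Hoax"},
}


def render_tags(tags, lang, fallback="en"):
    """Turn stored tag identifiers into labels for the reader's language.

    Falls back to the `fallback` language when a tag has no translation
    for `lang`; raises KeyError for a tag that is not in the catalogue.
    """
    labels = []
    for tag in tags:
        translations = TAG_MESSAGES.get(tag)
        if translations is None:
            raise KeyError(f"unknown tag: {tag}")
        labels.append(translations.get(lang) or translations[fallback])
    return labels


# The stored judgment only carries identifiers, so it reads the same
# regardless of which language the patroller speaks.
judgment_tags = ["repeatchars", "keymash"]
print(render_tags(judgment_tags, "de"))
```

Because only the identifiers are stored, a patroller who does not speak the wiki's language can still leave a rationale that local readers see in their own language, wherever a translation exists.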
[21:38:42] It'd be some kind of language-agnostic tag that would show up in the UI as a message string
[21:51:40] (CR) Hoo man: [C: -1] "Wasn't able to fully grasp this, but other than my comments this looks good to me." (2 comments) [extensions/JADE] - https://gerrit.wikimedia.org/r/461502 (https://phabricator.wikimedia.org/T206573) (owner: Awight)
[22:05:51] hoo: Thanks!
[22:07:19] I was actually in the midst of that patch, so perfect timing.
[22:08:58] :)
[22:23:27] OK. I've learned something very deep and frustrating about python's requests. It seems they made a change recently that does not allow for multiple requests to be made in parallel.
[22:23:32] * halfak lights hair on fire.
[22:23:39] Been doing that a lot recently.
[22:29:25] Using async?
[22:39:19] (CR) Awight: Split user schema into ID or IP; validate (2 comments) [extensions/JADE] - https://gerrit.wikimedia.org/r/461502 (https://phabricator.wikimedia.org/T206573) (owner: Awight)
[22:41:52] Using a thread pool
[22:42:03] Got it working now and am finally able to run a test
[22:42:18] * halfak douses hair fire
[22:43:07] halfak: "ores precached https://stream.wikimedia.org/v2/stream/recentchange ores-staging.wmflabs.org --debug" doesn't work, I even tried stream.wikimedia.org but still no success
[22:43:25] yes. We need to use the new revision-create
[22:43:25] oh! halfak, how much do you know about Special:Import?
[22:43:34] harej, not much
[22:43:41] Hmm, okay
[22:43:44] I just know about the XML format
[22:43:46] sorry :|
[22:47:19] I still can't make it work :/
[22:47:28] OK I'm running a test now
[22:47:36] Amir1, ^
[22:49:11] okay, make sure you run it on staging.
[22:51:22] Yes. :)
[22:51:35] * halfak twiddles thumbs while watching graph
[22:52:35] Uh oh. It doesn't seem to be working.
[22:52:53] CPU usage doubled when I started a parallel precache with delay of 0.1 seconds.
[22:53:02] (PS7) Awight: [WIP] Split user schema into local, global ID, and IP [extensions/JADE] - https://gerrit.wikimedia.org/r/461502 (https://phabricator.wikimedia.org/T206573)
[22:53:07] * halfak lets it run for a bit longer.
[22:53:32] That's fun, `$globalUser->getLocalId( wfWikiId() );` returns null.
[22:53:40] I'm out for the evening o/
[22:54:46] OK stopping test
[22:54:52] o/ awight
[22:55:25] Amir1, I'm convinced that this doesn't work at 0.1 second delay. It could be that setting up the initial map-job takes too long.
[22:55:34] * halfak makes a graphic
[22:57:23] (CR) jerkins-bot: [V: -1] [WIP] Split user schema into local, global ID, and IP [extensions/JADE] - https://gerrit.wikimedia.org/r/461502 (https://phabricator.wikimedia.org/T206573) (owner: Awight)
[22:57:24] halfak: do you want to try with 0.2?
[22:58:04] https://phabricator.wikimedia.org/M267
[22:58:27] I'm on it.
[22:58:36] halfak: what's the precache command you are running? I couldn't make it to work
[22:58:59] ./utility precached https://stream.wikimedia.org/v2/stream/revision-create https://ores-staging.wmflabs.org --debug --config ../ores-wmflabs-deploy/config/
[22:59:12] I've got a hacked version to deal with the issue that set my hair on fire earlier.
[22:59:39] Will submit a PR shortly.
[23:03:46] Hmm yeah. It's not working.
[23:04:39] wikimedia/ores#1064 (fix_precached - cd515fe : halfak): The build passed. https://travis-ci.org/wikimedia/ores/builds/441920788
[23:04:46] Same pattern
[23:05:34] halfak: shit, I think I know what's going on
[23:05:39] Oh good.
[23:05:40] I haven't enabled it yet
[23:05:44] lolol
[23:06:08] I need to run but I'll get this PR in place so hopefully you can do it too :)
[23:08:12] Amir1, https://github.com/wikimedia/ores/pull/274
[23:08:17] I think that'll make it work for you.
[23:08:32] I have to run. Will be looking at this again tomorrow. Have a good night!
[23:08:39] halfak: can you try again now?
[23:48:35] still not working
[23:48:42] will try again tomorrow