[13:18:34] Hi everyone! I am well-versed in Python and ML concepts and have basic knowledge of WebDev. Please suggest some issues to work on.
[13:19:36] Hellooo!!
[13:25:18] hello
[13:25:55] people are not always active or looking here; many people are in a US timezone and it would be quite early for them
[13:26:49] if you can wait a few hours (3?) it will be more active
[13:27:13] in the meantime let me find the phabricator workboard which has some tasks that you might get familiar with
[13:28:10] Ohh no worries. It's evening here. Thanks apergos!
[13:28:11] https://phabricator.wikimedia.org/project/view/1467/ this is the biggest project using ml here
[13:28:30] it's afternoon for me, hazards of a global movement :-)
[13:29:50] https://phabricator.wikimedia.org/tag/scoring-platform-team/ well this might be a bit scary to just jump in on
[13:30:08] but you could look at ores and see if it interests you
[13:30:26] https://www.mediawiki.org/wiki/ORES
[13:34:57] https://gerrit.wikimedia.org/r/#/admin/projects/?filter=ores some of the relevant repos are here in case you want to look at code
[13:39:25] o/
[13:39:53] Hi Reshu! What brings you to #wikimedia-ai?
[13:43:38] ah, that was a quick few hours, thanks hal fak!
[13:44:48] :) Thank you for being around and helping out apergos :D
[13:45:19] I'm always excited to see new possible contributors!
[13:59:11] Was searching for ml-based projects in Wikimedia, read somebody's blog. :)
[14:01:50] Cool! I can help you get started with a task when you are ready. What's your background and interests? I'll try to match you with something appropriate :D
[14:03:59] I am pursuing a Bachelors in Electronics. Tho I have worked with some ML-based mini-projects, usually prediction models, NLP.
[14:04:49] Have you worked with python at all?
[14:05:12] Yes :)
[14:05:45] Would like to contribute aiming for the Outreachy Winter session.
[14:06:11] Any experience with scikit-learn?
[14:06:33] Yups
[14:07:12] Cool. Now, would you be more interested in building up our model training/validation pipeline or training a specific model for a specific context?
[14:09:33] 2nd one more preferred :)
[14:10:52] OK!
[14:11:03] What languages can you read/write in?
[14:11:54] C/C++, Python, SQL
[14:13:12] Sorry. What spoken languages :)
[14:13:18] heh
[14:13:27] o/ SQL :)
[14:13:31] Heya!
[14:13:39] lol
[14:13:40] Ohh It's English, Hindi
[14:13:49] haha :D
[14:14:16] SQL, I've been talking to folks about improving the class names/structure for the topic model. Thanks for picking it up in the meantime. It's really helping socialize the next step of work :D
[14:14:36] Reshu, OK! Fun story is that our support for Hindi is lack-luster right now, if you are interested in that.
[14:14:49] halfak: NP, I think I need to patch it for the change in names (wp10), just busy right now
[14:15:06] No rush on that. We won't pull the rug out from under you soon.
[14:15:20] I think it's already failing? IDK for sure
[14:15:40] Reshu, Otherwise, we have some work building article quality prediction models for Glacian.
[14:15:47] SQL it shouldn't be!
[14:16:04] lack-lustre means not currently practiced?
[14:16:22] Reshu, more that it's behind on the state of the art.
[14:16:37] halfak: I see a lot of blanks in prediction / predicted class, but - I haven't had the opportunity to look more into it lately.
[14:16:39] what's the 2-letter code for hindi?
[14:17:02] SQL, I just double-checked and you should still have wp10 coming from ores.wikimedia.org. Maybe it is something else.
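
For anyone following along: ORES, mentioned above, exposes the team's models over a public scoring API at ores.wikimedia.org. Below is a minimal sketch of fetching a "damaging" score for one English Wikipedia revision, assuming the v3 endpoint layout the service exposed around the time of this log; the revision ID and the response handling are illustrative only.

    # Minimal sketch: ask ORES whether one edit looks damaging.
    # Assumes the public v3 endpoint layout,
    # https://ores.wikimedia.org/v3/scores/{context}/{rev_id}/{model};
    # the revision ID below is only an example.
    import requests

    ORES = "https://ores.wikimedia.org/v3/scores"

    def damaging_score(context, rev_id, model="damaging"):
        url = f"{ORES}/{context}/{rev_id}/{model}"
        response = requests.get(url, headers={"User-Agent": "ores-example/0.1"})
        response.raise_for_status()
        data = response.json()
        # The score is nested under context -> "scores" -> rev_id -> model.
        return data[context]["scores"][str(rev_id)][model]["score"]

    score = damaging_score("enwiki", 123456)
    print(score["prediction"], score["probability"])

The same pattern should work for the other models discussed below (e.g. wp10/article quality) by swapping the model name.
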
[14:17:10] Could be
[14:17:13] HI I think btw
[14:17:16] is the iso code
[14:17:34] OK. Thanks SQL. So reshu, it looks like we have *no support at all* for hindi.
[14:17:50] We could set them up with basic damage (vandalism and other problematic edit) detection.
[14:18:01] them = Hindi Wikipedians.
[14:19:08] Ohh whatever is currently executed for English, making some basic detections for Hindi, am I getting it right?
[14:20:11] But isn't it usually that edits etc are submitted in English and they are translated to Hindi... Hindi edits are also submitted?
[14:21:32] Yes. That's right.
[14:21:58] Each language wiki has its own editing community that does some translation, but mostly it's direct writing for articles.
[14:25:54] " work building article quality prediction models for Glacian" What is Glacian here?
[14:26:44] Oh I mean Galician Wikipedia. https://en.wikipedia.org/wiki/Galician_Wikipedia
[14:28:19] Oh okay! It's yet another lang. :)
[14:29:16] :) Yup. An old dead language. Still, one we can help with.
[14:29:34] Wait. no.
[14:29:37] 2.4 million people
[14:29:41] So not dead!
[14:33:02] where can I find repos or code etc to begin with?
[14:34:58] https://github.com/orgs/wikimedia/teams/scoring-platform/repositories
[14:35:56] editquality is the repo with the code most specific to damage/vandalism detection.
[14:37:34] Looks like we have the basic language assets to start working on hindi damage detection in our framework repo
[14:37:35] https://github.com/wikimedia/revscoring/blob/master/revscoring/languages/hindi.py
[14:38:44] The next step is to generate a sample of edits and train a basic revert-detection model on them.
[14:39:09] Ohh cool :) ..1st link not opening..
[14:39:19] https://quarry.wmflabs.org/query/29657
[14:40:58] awight, traffic is switching to codfw right now.
[14:41:06] FYI
[14:41:19] * halfak pulls up the dashboard.
[14:41:29] Reshu, I'm looking into why you can't click on that first link.
[14:43:15] yup this one https://github.com/orgs/wikimedia/teams/scoring-platform/repositories not opening
[14:45:02] I found this research paper https://pdfs.semanticscholar.org/d358/33c08bd2607f16798fdf23eda51a19417d06.pdf ... want to understand what revert detection is.. is it minor undoing being detected ?
[14:45:48] Error 404 showing
[14:50:18] Reshu, this works for now: https://github.com/search?q=topic%3Aartificial-intelligence+org%3Awikimedia+fork%3Atrue
[14:50:49] Reshu, regretfully, that's a very bad paper to start with :)
[14:51:04] That paper re-defines reverts in a very confusing way.
[14:51:11] We like to stick with the common usage :)
[14:51:42] https://meta.wikimedia.org/wiki/Research:Revert
[14:51:51] "identity revert"
[15:00:00] halfak, awight: https://phabricator.wikimedia.org/T204064
[15:00:31] [citation needed]
[15:01:00] Related: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.8435&rep=rep1&type=pdf
[15:01:02] "Accuracy and Confidence in Group Judgment" by Sniezek and Henry, 1988
[15:01:17] This is a seminal paper about "meta-moderation" or moderation of moderators.
[15:01:23] Put it in the task :P
[15:01:38] Preferably with a PDF link ;)
[15:02:53] What do you think of the broader concept discussed by the task?
[15:05:20] harej, it's good. Right now, we don't have any real meta-moderation.
[15:05:35] If we did, we could fix issues like patrollers being unnecessarily unkind.
[15:05:38] Or making mistakes
[15:05:54] And we might be able to better reward patrollers who are kind towards newcomers.
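
The "identity revert" definition linked above (an edit that restores a page to a checksum-identical earlier state) is the basis of the "reverted" labels used for training. Here is a minimal from-scratch sketch of the idea, assuming the history is given as a chronological list of (rev_id, sha1) pairs; the team's own tooling does this more carefully, but the bounded look-back window is the same basic shape.

    # Sketch of "identity revert" detection as described at
    # https://meta.wikimedia.org/wiki/Research:Revert: an edit reverts earlier
    # edits if it restores the page to a checksum-identical previous state.
    # `history` is assumed to be a chronological list of (rev_id, sha1) pairs.

    def find_identity_reverts(history, radius=15):
        """Yield (reverting_rev, reverted_to_rev, reverted_revs) tuples."""
        for i, (rev_id, sha1) in enumerate(history):
            # Look back up to `radius` revisions for an identical checksum,
            # preferring the most recent matching state.
            window = history[max(0, i - radius):i]
            for j in range(len(window) - 1, -1, -1):
                past_rev, past_sha1 = window[j]
                if past_sha1 == sha1:
                    reverted = [r for r, _ in window[j + 1:]]
                    if reverted:  # something was actually undone
                        yield rev_id, past_rev, reverted
                    break

    # Toy example (hypothetical revision IDs and checksums):
    history = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "aaa")]
    print(list(find_identity_reverts(history)))
    # -> [(4, 1, [2, 3])], i.e. revision 4 reverts revisions 2 and 3.
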
[15:06:12] Also I understand an endorsements schema has been brought up in the past. This doesn't necessarily have to be data that is encoded in the content.
[15:06:29] The threat of being observed and judged tends to make people more careful which will also make patrolling more difficult/stressful.
[15:07:06] harej, we discussed having endorsements/!votes on the talk page. This only makes sense in the case of content being a single judgement per schema.
[15:07:26] What is lost is *who* performed the judgment and in what context they did so.
[15:07:38] As the talk page would not be easy to parse as talk pages are generally awful to parse.
[15:14:16] this is just a random idea, but one idea i have is for a wikilabels-like interface where you are shown an old judgment and you are asked "do you agree?" and if yes, this gets noted in a table somewhere, and if not, that is also noted but you are further invited to edit
[15:14:34] Reshu, I started a query to get a random sample of hiwiki edits. I can show you how to generate "reverted" labels for them. See https://quarry.wmflabs.org/query/29657
[15:15:12] harej, maybe we could riff off of past meta-moderation interfaces :)
[15:18:52] Ok :)
[15:19:25] harej, I'm imagining that the same feed of judgments that tools like Huggle uses could be fed back into ORES to check the distance between the judgment and ORES predictions. Then unexpected judgments can be highlighted for meta-moderation.
[15:19:43] That would also help us quickly find trends in mistakes that ORES is making.
[15:19:58] Reshu, what operating system are you generally programming in?
[15:22:02] Amir1, I can't clone from wikimedia/articlequality
[15:22:10] Will paste the error in a minute.
[15:22:31] https://pastebin.com/FByD38q6
[15:23:09] Ohh... maybe I'm missing lfs.
[15:24:14] * awight tears open a gallon of popcorn for the backscroll
[15:24:33] linux
[15:24:50] Reshu, excellent! That'll save us time and energy. What distro?
[15:25:11] brb
[15:25:12] ubuntu
[15:25:54] ...."This query is waiting to be executed "
[15:28:56] Quarry will be subject to server maintenance on Wednesday, September 12 from 7pm UTC. The site will be read-only for a few hours, but should remain online.
[15:29:10] Is this issue on your side too?
[15:36:07] There's an ongoing datacenter migration today and tomorrow
[15:38:10] harej: Thanks for making T204064--IMO it should be "normal" priority, though?
[15:38:10] T204064: Find way to re-surface judgments for continuous evaluation - https://phabricator.wikimedia.org/T204064
[15:42:21] harej: I have a few more papers related to the points in Sniezek, so far it seems like we want * asynchronous discussion * anonymity *
[15:42:44] * diverse groups
[15:43:49] awight, I don't think we should prioritize that one
[15:44:12] Except for the basic ability for others to meta-moderate, we shouldn
[15:44:16] 't be building a tool
[15:44:39] * discourage majority-rules voting, * discussion of yes-no propositions whenever possible
[15:44:55] Shouldn't prioritize which one?
[15:44:58] halfak: ^
[15:45:29] T204064
[15:45:30] T204064: Find way to re-surface judgments for continuous evaluation - https://phabricator.wikimedia.org/T204064
[15:46:03] ah, I see. But I'd say it's actually important regardless of whether or not it's within our scope.
[15:46:38] awight: it's low priority insofar as the current priority is to get the actual thing out the door first.
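
halfak's idea above, feeding patroller judgments (e.g. from Huggle) back through ORES and surfacing the ones that disagree most with the model, can be sketched in a few lines. This assumes each judgment has already been paired with ORES's "damaging" probability for that revision; the tuple layout and the threshold are illustrative, not part of any existing tool.

    # Sketch: rank patroller judgments by how strongly they disagree with ORES,
    # so surprising judgments can be queued for meta-moderation.
    # Each judgment is assumed to be (rev_id, human_says_damaging, ores_p_damaging),
    # where ores_p_damaging is ORES's probability that the edit is damaging.

    def disagreement(human_says_damaging, ores_p_damaging):
        """0.0 = human and model agree completely, 1.0 = maximal disagreement."""
        human_p = 1.0 if human_says_damaging else 0.0
        return abs(human_p - ores_p_damaging)

    def review_queue(judgments, threshold=0.8):
        """Return judgments worth a second look, most surprising first."""
        flagged = [j for j in judgments if disagreement(j[1], j[2]) >= threshold]
        return sorted(flagged, key=lambda j: disagreement(j[1], j[2]), reverse=True)

    # Hypothetical data: a patroller called rev 111 damaging although ORES was
    # confident it was fine, so it floats to the top of the queue.
    judgments = [(111, True, 0.03), (222, False, 0.10), (333, True, 0.95)]
    print(review_queue(judgments))  # -> [(111, True, 0.03)]
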
new features can come later
[15:48:03] harej, if we're not doing the work, we shouldn't set the priority, IMO
[15:48:12] +1
[15:48:28] What if we want to do the work eventually?
[15:48:39] I don't think we should. I think this is out of scope.
[15:48:50] Generally, not talking about this task specifically, I think https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels is really problematic.
[15:49:05] Agreed!
[15:49:08] "someone is still planning to work on it" is downright weird
[15:49:37] There's another dimension which is sometimes used in planning, importance vs. urgency
[15:50:04] I think phabricator's "priority" column misses that detail
[15:50:32] like, for this task it seems * really important, * not something we're going to do any time soon
[15:51:50] Scoring-platform-team (Current), JADE, WMF-Communications: Blog about JADE - https://phabricator.wikimedia.org/T183200 (Halfak) I resolved everything except something that needs review from @DannyH. I think that's our final blocker.
[15:53:05] Generally, I think that purpose-built UIs should always be out of scope for our team.
[15:53:19] Unless they serve the direct needs of the service or data stores.
[15:53:55] Maybe we build a simple toy UI that probably can't work in practice but lets people see what is possible.
[15:53:59] I'm okay with that, as long as we're chronically understaffed at least.
[15:54:24] But it's worth mentioning that harej's task has implications for the content schema and other stuff which is still in scope.
[15:55:35] I think halfak is generally right that we should not be building consumer products
[15:55:52] I wouldn't go so far as to say no UIs ever, but I think a meta-moderation system falls clearly in the consumer product category
[15:56:12] Things other than backends for other things.
[15:56:50] It's something I'd like to shop with Contributors when JADE picks up more steam.
[15:58:11] I guess we could split that task into two facets, one is to build a UI which encourages a specific behavior, the other is that the JADE content schema and API should support this behavior.
[16:02:00] harej, +1
[16:02:13] In the meantime, it's a good idea to start documenting the what and why of meta-moderation.
[16:02:21] +1 awight
[16:02:32] I don't think it necessarily needs to be built into the schema, or that it needs to be included in the schema any time soon, but if we did that, it could be an optional "endorsements" sub-object
[16:04:02] Migrating later sounds unnecessarily painful. What is the cost of including this functionality at launch time?
[16:04:03] Especially since we already know the data will be used.
[16:04:29] It wouldn't require a migration. This would be a backwards-compat change. But if there's an easy way to include it we can do that.
[16:04:56] In any case, before we overwork ourselves figuring out this exact thing, it may be worth figuring out other schema requirements.
[16:05:31] for the Nth time :P
[16:07:11] awight: what do you think it would take to support this... I guess we can call it the "endorsements" use case?
[16:07:30] halfak: This of course happened while you were on vacation, but I've driven myself back to the starting line, plus a bit further back even. I'm currently looking at how we might be able to use pure wikitext, totally free-form propositions and conclusions.
[16:07:52] awight, unstructured wikitext? O_O
[16:08:01] I'm blocked though, cos the next step for designing anything like that is to see what editors will do in the wild.
[16:08:04] :-)
[16:08:05] exactly.
[16:08:30] Here's a crazy example,
[16:08:33] {{Diff|123}} is {{damaging}} because it adds an {{WP:IRRELEVANT}} paragraph without bringing value to the...
[16:08:34] Editors are already doing things in the wild with wikitext
[16:08:34] [2] https://meta.wikimedia.org/wiki/Template:Diff =>
[16:08:37] [3] https://meta.wikimedia.org/wiki/Template:damaging =>
[16:08:38] lol
[16:08:40] [4] https://meta.wikimedia.org/wiki/Template:WP:IRRELEVANT
[16:08:48] * awight tases AsimovBot
[16:09:31] What about MCR slots with freeform content within each slot?
[16:09:51] The slot being responsible for providing the necessary context
[16:10:04] harej: I'm open to it, what are the slots?
[16:10:24] I mean, I'd honestly like to avoid MCR for now but happy to hear how it might help.
[16:10:58] Hmmm
[16:11:06] Thinking about it more, there's still certain pieces of structured data that we want.
[16:11:10] So inevitably we fall back to JSON
[16:11:21] One way I think we would use MCR is by having separate slots for different types of judgments being rendered.
[16:13:35] For this particular use case, MCR is either a godsend or a gimmick, and I'm not sure which.
[16:17:14] We've been advised not to use MCR because it doesn't have any integrations and won't have them any time soon.
[16:18:48] We may have an updated understanding on MCR since the RFC meeting, that it's better at supporting use cases. awight, can you confirm?
[16:19:15] In any case, I'm not sure we should be going down strange rabbit holes.
[16:19:21] harej: It still has no AbuseFilter integration, and various UI stuff is either completely unplanned or planned and lacking.
[16:19:29] Oh. Then it's clearly a nonstarter.
[16:19:44] We don't even know what it will look like in page history, AFAIK
[16:21:08] I like the idea of capturing more freeform content, but I wouldn't want to sacrifice all structure for it.
[16:23:20] Someone flagging something as true or false, that's something that should be captured in a structured way and not based on freeform text.
[16:23:28] Same with any other controlled vocabulary or enum.
[16:25:34] I want the propositions to be freeform, so that users can express "sure smells like a paid sock edit", then people are not simply flagging true or false but having discussion and perhaps revising the proposition and synthesizing into a conclusion.
[16:25:39] or whatever they do :-)
[16:27:54] As an example of why I think freeform is the best place to start testing, the "endorsements" concept is based on an existing practice of "!vote" consensus, but no matter what we do in the schema we're going to be providing a stunted version of the normal process.
[16:29:21] I'd rather see the full range of propositions people want to make, and let them discuss using "native" methods, then determine how to get the machine-readable stuff out and see if the tools should be providing some common structure.
[16:29:29] You have an interesting point, but I don't think this is the way to do it. From a UI perspective, it'd be nice if it looked like people were just editing other wiki pages. I think there should be an underlying structure that gives the context necessary to make this data useful.
[16:30:56] I have that instinct too, which I'm trying to resist :-) Why would we decide on a particular structure for discussion? All we need to get at the end of the day is a machine-readable answer, "is this damaging?"
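
A rough sketch of getting that machine-readable answer out of a freeform wikitext judgment like awight's example above, using the mwparserfromhell library. The template names ({{Diff}}, {{damaging}}, {{nondamaging}}) are the hypothetical ones from this discussion, not templates that actually exist.

    # Sketch: pull a machine-readable verdict out of a freeform wikitext judgment
    # like "{{Diff|123}} is {{damaging}} because it adds an irrelevant paragraph...".
    # Requires mwparserfromhell (pip install mwparserfromhell).
    import mwparserfromhell

    def extract_verdict(wikitext):
        code = mwparserfromhell.parse(wikitext)
        diff_id, verdict = None, None
        for template in code.filter_templates():
            name = str(template.name).strip().lower()
            if name == "diff" and template.params:
                diff_id = int(str(template.params[0].value).strip())
            elif name in ("damaging", "nondamaging"):
                verdict = (name == "damaging")
        return diff_id, verdict

    text = "{{Diff|123}} is {{damaging}} because it adds an irrelevant paragraph."
    print(extract_verdict(text))  # -> (123, True)
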
[16:32:06] If {{conclusion|damaging}} or nondamaging is enough to flag that, I think we've accomplished our goals and created a generative and inspiring platform.
[16:32:06] [5] https://meta.wikimedia.org/wiki/Template:conclusion
[16:32:15] AsimovBot: I knew you might say that.
[16:32:16] Error: Command “i” not recognized. Please review and correct what you’ve written.
[16:32:21] got me that time.
[16:33:54] awight, FWIW, I think the endorsements pattern serves many more needs than simply replicating the !vote pattern.
[16:34:25] E.g. tracking the origin of the person making the judgment as well as their identifier so we can easily process that later.
[16:34:51] E.g. let's say we build this meta-moderation interface. How do we know that a judgment has been moderated and confirmed by another user?
[16:35:11] Structured data is the best strategy IMO
[16:38:07] In the spirit of !vote and from the literature I've seen so far, we shouldn't be doing anything like tallying the number of endorsements a proposition has, at least not to produce a machine conclusion, so I'm not sure what purpose tracking the editors will serve.
[16:38:36] It'll be very interesting to analyze, but won't give us higher quality data AIUI
[16:38:37] We can track the context using edit tags.
[16:38:53] awight, there are ways to review without tallying. E.g. "do two unique users agree?"
[16:38:57] Insofar as we need to track the context for individual edits, such as a person clicking on an ORES-highlighted edit
[16:39:36] awight, will help us know which judgments have been reviewed. See the law of four eyes.
[16:39:56] * halfak digs
[16:40:09] Hmm, I'd think we want to use the data whether or not it gets reviewed?
[16:41:04] awight, depends who "we" are and what our "use" is.
[16:41:57] Argh, can't find this but essentially the idea is that all serious decisions in Wikipedia should be reviewed by at least two people who agree before being implemented.
[16:42:04] It's sort of a minimal-consensus pattern.
[16:42:20] E.g. article deletions shouldn't just be performed by a single admin using their own judgment in a vacuum.
[16:43:56] I love the four-eyes idea and have tried to use it for my own community projects, but in our case it seems like a minor decision to say "this is damaging"
[16:44:19] it's obviously lower quality data than if 4 or 40 eyes have looked
[16:44:34] awight, right. But let's say we're looking for places that ORES was wrong. Well, if we have two people who agree that ORES is wrong, we can probably be pretty sure that ORES is wrong.
[16:44:38] but are we tracking data quality, or just designing a system that encourages max quality?
[16:44:45] awight, the difference between 1 and 2 is huge.
[16:44:51] +1 :)
[16:44:52] The difference between 2 and 40 is smaller.
[16:45:41] I think this speaks to harej's task though, that we want to steer an extra reviewer to surprising or suspect judgments in order to confirm.
[16:45:54] We can do that just by looking at the number of editors, it wouldn't require structure.
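
halfak's "do two unique users agree?" check is easy to express once endorsements are structured. A minimal sketch, with hypothetical field names (the JADE schema was still a draft at this point):

    # Sketch of the "four eyes" / "do two unique users agree?" check over a
    # structured judgment. The field names (judgment["endorsements"],
    # endorsement["user"]) are hypothetical, not the final JADE schema.

    def meets_four_eyes(judgment, minimum_unique_users=2):
        """True if at least `minimum_unique_users` distinct users endorse it."""
        endorsers = {e["user"] for e in judgment.get("endorsements", [])}
        return len(endorsers) >= minimum_unique_users

    judgment = {
        "damaging": True,
        "endorsements": [
            {"user": "Patroller1", "origin": "Huggle"},
            {"user": "Patroller2", "origin": "meta-moderation queue"},
        ],
    }
    print(meets_four_eyes(judgment))  # -> True
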
[16:48:36] err, quality of number of eyes
[16:48:36] wat
[16:48:47] you said the difference between 1 and 2 is huge and between 2 and 40 is smaller
[16:48:59] Oh maybe log-ish
[16:49:39] halfak: good point about multiple revisions
[16:49:44] I mean, multiple judgments.
[16:51:08] I like the idea that people can withdraw their own endorsements or remove vandalism endorsements too.
[16:51:15] Hard to do that with the revision history.
[16:52:37] Another consideration is that I don't think we can double the amount of reviewer labor, so we assume that multiple reviewers are a rare case.
[16:53:04] Right.
[16:53:23] Slashdot uses randomization to assign meta-moderation work
[16:53:26] *used*
[16:53:31] I think that paper is now out of date ;)
[16:56:14] Sniezek? hehe yeah I think so too. It was a very specific context, anyway.
[16:56:51] I noticed that papers in this area seem very ideologically motivated, and come to extreme conclusions about democratic consensus being good or bad.
[16:57:46] It would be really easy to cherry-pick and conclude that the thing I already wanted to hear is correct.
[16:58:53] awight, uh. Certainly not what I was taking from it.
[16:59:18] I took from it a system that mostly worked using a distributed moderation strategy based on statistical principles (randomness)
[16:59:46] Send me the out of date paper ;-)
[16:59:58] I was looking at other stuff, I think.
[17:00:47] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.8435&rep=rep1&type=pdf
[17:00:55] Here's an example of a paper that concludes that discussion is bad, https://chicagounbound.uchicago.edu/cgi/viewcontent.cgi?article=1087&context=public_law_and_legal_theory
[17:01:11] halfak: neat, ty
[17:02:20] I'm going to head to lunch.
[17:02:35] So, what would you think about parsing wikitext !votes where a proposition is followed by {{support}} .... ~~~~
[17:02:36] [6] https://meta.wikimedia.org/wiki/Template:support
[17:02:56] kk, thanks for entertaining round 5 of JADE design
[17:03:41] Re. your last question, I think the 5 months of work that went into processing deletion discussions for http://files.grouplens.org/papers/lam_group2009_wikipedia-longer-tail.pdf would say that is a very bad idea.
[17:05:00] fun!
[17:07:27] I like the idea of doing sentiment analysis of unstructured content being *possible* but I don't want it to be the only option.
[17:10:16] Not really sentiment analysis, IMO that's too subtle and probably won't work for many languages.
[17:10:32] I'm just giving an alternative to endorsements, since they don't seem to serve much purpose.
[17:10:59] They're a rough indicator of data quality and agreement or disagreement, but what do we do with that?
[17:12:12] Can't we just design for encouraging high data quality, take all the data we get at face value, then use it for whatever purpose and see if it's good or not?
[17:12:48] If we have to cull lower-quality data, it seems like we have a few tools for doing that even with the extreme case of freeform judgment content.
[18:14:14] halfak|Lunch: FYI, the Lam and Riedl paper never actually analyzed deletion discussions. They parsed deletion log comments for keywords like "prodded", but they leave the discussion itself for future work.
[18:14:37] awight, oh! must have been a follow-up
[18:14:38] Doesn't invalidate your point of course, it probably means they thought about analyzing the discussion and decided it was too difficult.
[18:15:17] Na. they definitely did.
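
For reference, the "{{support}} ... ~~~~" parsing that halfak floats above would look roughly like the sketch below. It also illustrates the objection raised in the same exchange: scraping discussion wikitext is brittle, since this only catches one signature format and one template spelling. The discussion text is invented for the example.

    # Rough sketch of parsing "{{support}} ... ~~~~" style !votes, assuming each
    # endorsement and its signed username sit on the same line.
    import re

    SUPPORT = re.compile(r"\{\{\s*support\s*\}\}", re.IGNORECASE)
    SIGNATURE = re.compile(r"\[\[User:([^|\]]+)")  # "[[User:Name|...]]" links

    def count_support_votes(wikitext):
        voters = set()
        for line in wikitext.splitlines():
            if SUPPORT.search(line):
                user = SIGNATURE.search(line)
                if user:
                    voters.add(user.group(1).strip())
        return voters

    discussion = """
    * {{support}} seems clearly damaging. [[User:Alice|Alice]] ([[User talk:Alice|talk]]) 17:02, 11 September 2018 (UTC)
    * {{Support}} per Alice. [[User:Bob|Bob]] 17:05, 11 September 2018 (UTC)
    """
    print(count_support_votes(discussion))  # -> {'Alice', 'Bob'}
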
[18:16:11] http://www.pensivepuffin.com/dwmcphd/syllabi/insc547_wi13/papers/wikipedia/lam-decisionmakingRfA-GROUP10.pdf
[18:16:12] Got it!
[18:16:30] dang that was fast
[18:19:32] I think I might apply to https://algorithmsworkshop.wixsite.com/mysite/call-for-papers
[18:19:41] Looks like it might be productive.
[18:20:39] Question about open access: I've been noticing that a lot of authors have a pre-publication "draft" of the paper posted on their personal sites, is that a workaround that avoids the weird copyright ownership after publication?
[18:21:17] halfak, so that means you are attending cscw? :)
[18:21:29] halfak: yes! That's a JADE-y workshop if I've ever seen one.
[18:22:04] codezee, yes. I'll be there. I have, like, 4 papers in CSCW this year lol
[18:22:30] nice! finally a chance to meet you
[18:22:47] Oh yeah! We haven't met in person! I'd forgotten. lol
[18:22:54] So much working online!
[18:23:59] halfak, btw the conference ends on 7th evening, so we can checkout on 7th itself right? no need to book for the night of 7th i think
[18:24:58] Yeah. It depends on your flight arrangements. You might regret having to cut your conference attendance short in order to catch a flight.
[18:25:27] 4/5 times I end up spending another night. Sometimes I can book a flight that fits perfectly. When I can, I take that opportunity.
[18:44:45] * halfak looks into building a workshop paper.
[19:27:53] Scoring-platform-team, Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for Galician Wikipedia - https://phabricator.wikimedia.org/T201146 (Halfak) I think the best place to do that would be to simply describe a labeling scale in a similar way to what you...
[19:28:56] Scoring-platform-team, ORES, Documentation, User-srodlund: Feedback on ORES threshold optimization docs - https://phabricator.wikimedia.org/T203505 (srodlund) p:Triage>Normal @awight I made some super quick changes to style and grammar. I'm assuming your audience is individuals who are al...
[20:05:00] Docs meeting?
[20:05:05] awight, srrodlund?
[20:06:14] halfak: I don't have it on my calendar
[20:06:30] Oh! I see you have a "No"
[20:06:52] Did you say "no" to all future meetings? I haven't seen you at one of these in a while.
[20:07:13] Confirmed. You are a "no" for all of them :P
[20:07:41] Our last todo was to have you review our work at https://www.mediawiki.org/wiki/ORES/Thresholds
[20:07:52] And advise on what needs to be done to call this "good"
[20:08:11] It is not showing up on my calendar at all, so that is possible. I just updated the Phab task for that, though
[20:08:54] Argh! I thought it was later for some reason. Are we doing this?
[20:09:00] There is still a To Do section that needs to be completed. Otherwise the page fits its audience well. I made some minor grammar changes
[20:09:17] No. I actually have another meeting I need to prep for. Sorry
[20:09:23] I can do it later this week.
[20:09:33] I just re-invited you to future events.
[20:09:39] Dang, I have to transfer Mari between school and afterschool at this hour every day now, I'll put that on my calendar.
[20:10:06] ok or just change the time
[20:12:05] Adam and Aaron can you let me know best times? I have a few meetings and standing appts these days and don't have as much flexibility but we can probably find a time that works for us all
[20:13:38] I'll let awight move.
[20:14:21] kk, back on duty at ~ 21:00 UTC
[21:17:15] Yay! Another surprise slide deck to be reviewed while I'm on vacation.
[21:21:16] lol glad to hear everything is the SNAFU
[21:35:26] OK I'm out of here. Gotta go bike to downtown to meet up with Subbu and the parsing team for dinner.
[21:35:43] Still holding out hope that y'all can come for an offsite in MSP some time ;
[22:05:35] harej: eureka? We do use MCR, one slot for freeform wikitext and the other strictly structured json
[22:05:42] that... would be the whole point of MCR.
[22:06:23] and also leaves us in a great position to evolve the schema, going back and forth between the two slots as needed to express things better.
[22:07:18] Migrations between the two are not difficult, and make linear change to the
[22:07:27] err hehe that last part isn't right, yet.
[22:08:15] Something like 1:1 migrations, where a semi-structured stanza in the wikitext would map nicely to a structured representation, so it's plainly lossless.
[22:09:51] Scoring-platform-team, ORES: Explore alternative model serializations - https://phabricator.wikimedia.org/T201047 (awight)
[22:10:37] Scoring-platform-team (Current), DBA, JADE, Operations, and 2 others: [Epic] Extension:JADE scalability concerns - https://phabricator.wikimedia.org/T196547 (awight)
[22:12:55] Scoring-platform-team (Current), Community-Tech, JADE, WMF-Communications: Blog about JADE - https://phabricator.wikimedia.org/T183200 (awight)
[22:21:51] halAFK: ^ you might be interested in the latest MCR twist. I think it unifies the two worlds we've been juggling, and we're free to make a best-attempt initial schema for the JSON slot.
[22:23:28] In the same vein, maybe that schema should be liberal rather than strict, such as our last draft was before my latest unmooring.
[22:23:47] We accept multiple judgments, have optional sections for endorsements and provenance fields
[22:26:00] That way, we have the leeway to experiment with various methods of inputting data.
[22:27:36] The liberal schema does complicate clients, but I'm thinking it's easier to write that hard parser in a few languages, than parse even semi-structured wikitext by tallying templates.
[22:27:57] or at least, we get much more machine-readability for the effort.
[22:46:52] Scoring-platform-team (Current), DBA, JADE, Operations, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) I'm making some changes to the proposal, which I hope emphasize the role of Judgment pages to car...
[22:55:51] A judgment can also have *only* a wikitext or JSON content. I'm really liking this, so far.
[23:16:35] This also simplifies the JSON schema, we can remove judgment.notes since that *is* the wikitext.
[23:17:15] I'm working on something wacky with endorsements, realized that a rev_id can serve as your ~~~~
[23:17:44] * awight pretends to be surprised by AsimovBot's silence on that
[23:18:20] The rev_id can be from a content page edit or talk page edit, and it's the revision where the author states their endorsement in prose.
[23:18:57] I haven't figured out how that would be updated, maybe just give a new rev_id and that should be fairly easy to analyze using both main slot and json histories.
[23:21:24] awight: out of curiosity, what do you think resolution of https://phabricator.wikimedia.org/T196547 would look like?
[23:21:59] what. this actually works: endorsement also has a proposition_rev, which is the proposition being endorsed. The proposition can then be updated, and we can poll people to update their !vote.
[23:22:14] harej: cool, thanks for asking.
[23:22:38] I think it's nearly done, sort of a vestigial subtask of the RFC now.
[23:22:57] Once the RFC is closed, that's concluded.
[23:29:02] harej: Do take a peek at backscroll when you get the chance, I think this is going somewhere.
[23:34:26] I like the idea of having a JSON slot and a wikitext slot, especially if it unblocks us on the Forbidden Four.
[23:34:46] And generally makes TechCom happier
[23:35:00] I'm not sure why we're going back to multiple judgments
[23:38:05] Scoring-platform-team (Current), DBA, JADE, Operations, User-Joe: Write our anticipated "phase two" schemas and submit for review - https://phabricator.wikimedia.org/T202596 (awight) Here's a new proposal for the anatomy of a judgment (in this case, of a diff): ``` wikitext (main slot): n...
[23:40:00] harej: I went back on both the multiple judgments and endorsements only because it seemed in the spirit of having the multiple slots. It will be up to the tools to suggest or enforce different usage patterns.
[23:40:30] Here's another major issue which it resolves: there's nowhere to hang endorsements off of for non-preferred judgments.
[23:41:34] i.e., if someone toggles a judgment from false to true because they think that should be preferred, then the endorsements are all invalid.
[23:41:53] I included endorsements.rank so that non-preferred judgments can remain.
[23:42:17] Our business logic can enforce a single normal or preferred judgment any time we want to do that.
[23:43:22] I'm thrilled that I got the judgment.notes out of JSON, btw, that's been haunting me. Editing snippets of wikitext inside of a JSON document? I'd like to not be the first person on that remote planet.
[23:44:09] k I gtg pick up from school, feel free to comment on T202596 or here!
[23:44:10] T202596: Write our anticipated "phase two" schemas and submit for review - https://phabricator.wikimedia.org/T202596
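
Purely for illustration, here is roughly what the two-slot judgment page discussed above might carry: a freeform wikitext main slot plus a structured JSON slot. The field names (judgments, endorsements, rank, proposition_rev, origin) are taken from the conversation, but the actual phase-two schema proposed in T202596 may differ.

    # Hypothetical sketch of the two-slot judgment page awight describes; not
    # the schema actually submitted for review in T202596.
    import json

    wikitext_main_slot = (
        "This edit removes sourced content without explanation, so I'd call it "
        "damaging. ~~~~"
    )

    json_slot = {
        "judgments": [
            {
                "schema": "damaging",
                "value": True,
                "preferred": True,
                "endorsements": [
                    # A rev_id standing in for a ~~~~ signature, as suggested in
                    # the log; proposition_rev points at the revision of the
                    # proposition being endorsed.
                    {"user": "Patroller1", "rev_id": 901, "proposition_rev": 900,
                     "origin": "Huggle", "rank": "preferred"},
                ],
            },
            {
                "schema": "damaging",
                "value": False,
                "preferred": False,
                "endorsements": [],
            },
        ],
    }

    print(json.dumps(json_slot, indent=2))
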