[21:00:11] #startmeeting RFC meeting
[21:00:11] Meeting started Wed Aug 22 21:00:11 2018 UTC and is due to finish in 60 minutes. The chair is kchapman. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:00:11] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:00:11] The meeting name has been set to 'rfc_meeting'
[21:00:21] #topic RFC: Introduce a new namespace for collaborative judgments about wiki entities https://phabricator.wikimedia.org/T200297
[21:00:34] #link https://phabricator.wikimedia.org/T200297
[21:00:45] hi! I can kick off with a short overview of the proposal, if that's helpful.
[21:00:46] who is here to discuss the RFC?
[21:00:53] <_joe_> o/
[21:00:53] o/
[21:01:01] o/
[21:01:05] o/
[21:01:06] o/
[21:01:09] o/ just listening
[21:01:19] o/
[21:01:22] o/
[21:01:29] oi
[21:01:32] awight: go ahead and kick it off
[21:01:38] ty
[21:01:44] lemme know if I'm using the bot wrong:
[21:02:25] #info Judgment and Dialogue Engine (JADE) is a data store for subjective opinions about wiki entities, be they wiki pages, revisions (snapshots), or diffs (edits).
[21:02:28] mostly the #info tag is your friend to note things like agreements and important points
[21:02:31] aha
[21:02:59] We brought this as an RFC because our proposed backend storage is both elegant and horrifying in equal measure.
[21:03:32] Specifically, we're planning to use ordinary MediaWiki article pages to store a JSON representation of the type of data that will be produced through JADE.
[21:04:11] A namespace "Judgment" (exact name to be determined) would have a special content-handler to do stuff like validate that the contents are a legal set of judgments.
[21:04:44] is this meant only for manual labels, or are automatic labels part of the use case too?
[21:04:47] Another namespace "Judgment_talk" is provided, to give users a natural place to have conversations about the content, and come to consensus about values e.g.
whether an edit was damaging or not.
[21:04:48] can you point to a non-trivial example of the content?
[21:04:49] Purely manual labels.
[21:05:34] jynus: In https://www.mediawiki.org/wiki/Extension:JADE#Data_structure, you'll find a few examples of content.
[21:05:36] what about ORES labels (which are semi-automatic)?
[21:05:44] ORES will continue to use its own system.
[21:05:46] These pages can hold multiple judgments, but all about a single wiki entity.
[21:05:47] also, what is the data type - is it binary labels, tags, classification, categorization, all of the above?
[21:06:10] Yep, ORES or any other automatically created scores are inappropriate for this namespace.
[21:06:18] will there be an API so that feedback can be solicited on the diff page, without going to the JADE edit page?
[21:06:37] The presence of a 'note' field I found surprising. Would that not be more suitable for the edit summary and/or talk page?
[21:06:53] SMalyshev: The data types are currently: single boolean, single string enum, and multi-string set. See https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/jsonschema/judgment/v1.json
[21:07:13] TimStarling: Yes, here's the prototype API, https://www.mediawiki.org/wiki/Extension:JADE#API
[21:07:47] <_joe_> awight: the entity can be either a page, a revision or a diff, right? Do they all need the same solution? Does each of those need a wiki page for holding a judgement?
[21:08:15] #link https://www.mediawiki.org/wiki/Extension:JADE#Data_structure
[21:08:20] how would a jade namespace page be identified?
[21:08:40] Krinkle: I agree, there was some discussion about this so far and we decided that we want the users to be able to edit their notes. Also, users should be able to edit each other's notes. If edit summaries were editable, we might have considered that, but as it is the idea is: * edit summary is "why you made the change", * notes are "why you believe the value you supplied"
[21:08:42] e.g.
normal pages have a title
[21:08:45] <_joe_> i can imagine a judgement about a revision to be quite different from a judgement about a page
[21:09:01] Would WikiProject assignment be a potential use case? i.e. "This page belongs to WikiProject Medicine"?
[21:09:01] but you don't have titles for revisions, diffs?
[21:09:16] jynus: The title is awkward, see https://www.mediawiki.org/wiki/Extension:JADE#Wiki,_namespace,_and_title . For example, "Judgment:Diff/123456"
[21:09:28] awight can correct me, but: Judgment:(Page|Revision|Diff)/{page_id|revision_id|revision_id}
[21:10:01] _joe_: agreed, revision vs page judgments are stored on separate pages and have mutually exclusive allowed schemas.
[21:10:03] revision referring to the page as it existed at a point of time, diff as in the difference between the given revision and its parent
[21:10:03] That's not pretty, but doesn't have to look like that on the page, just in the URL
[21:10:05] awight: I understand, but what is the meaning of such a note in the context of a machine-readable judgement system? I assume the outcomes/goals are primarily to indirectly power things, such as categorising pages as good articles, adding page-indicators atop the page view to show its rating, etc, and for ORES to learn from (like WikiLabels does now, correct?).
[21:10:25] <_joe_> awight: so, do you really need a wiki page for judgements about revisions/diffs?
[21:10:27] kaldari: I'd like to support exactly that, yeah.
[21:10:36] cool
[21:10:44] kaldari: In fact, a page might be categorized into multiple topics of course.
[21:10:50] sure
[21:11:21] awight: on the topic of associating titles with their subject: the experience with wikidata shows that maintaining such a mapping in the database is annoying and pointless, if a programmatic mapping is possible.
we ended up ditching the table after a couple of years
[21:11:23] Krinkle: in case that helps, JADE is supposed to empower the backend of wikilabels, so wikilabels in future can hold a set of edits to review and put them into JADE (whatever storage JADE wants to use)
[21:11:46] I also imagine this could replace some of the "maintenance categories" system, like "Pages needing copyediting", "Pages lacking citations", etc.
[21:11:55] Krinkle: the machine-readability is downstream, judgment.notes are mostly for human use. Also, from what I've read it seems that data quality is much higher when you provide a freeform text field where people are encouraged to justify their work.
[21:12:13] Yes, which is why the presence of a note surprises me in the data model. I would expect such a note to be identical to the edit summary explaining why the user chose that score etc.
[21:12:25] _joe_: You cut to the heart of the question :-) We don't know, but the use cases do point towards pages being a great fit.
[21:12:27] it's kind of ugly from the user's perspective to use the page_id in the URL
[21:12:51] kaldari, awight: i can well imagine using the proposed JSON structure for that kind of info about a page. but for page level things, why use a separate namespace, instead of MCR?
[21:12:52] DanielK_WMDE_: Cool, a programmatic mapping is possible, but I don't like the thought of including that in SQL queries.
[21:12:54] <_joe_> kaldari: we're (I suppose) here to discuss the scalability issues of the proposed solution; if we want to add more uses to this system - which is great - we should tackle the issues first, imho
[21:13:24] <_joe_> storing a json into a page looks to me a lot like you want a versioned object storage
[21:13:27] yeah, MCR was supposed to be the place to move those page metadata templates into
[21:13:36] kaldari: if you have ideas for use cases I'd love to discuss them with you. I also thought of the pageassessments parallel
[21:13:53] _joe_: got it.
I'll be quiet now :)
[21:13:55] awight: putting judgements about different things into a single namespace seems odd to me. judgements about pages (and perhaps about users) could easily and naturally use MCR. judgements about revisions are the odd one out - they could have their own namespace, if editing is absolutely needed.
[21:13:58] TimStarling: I don't like that part either. I also have unresolved questions about the scope of a judgment page, which can hold multiple judgments and schemas, but haven't found a better alternative yet.
[21:14:33] you could have a JADE object in MCR if you wanted to
[21:14:39] <_joe_> kaldari: heh not what I meant :)
[21:14:39] using the same schema
[21:14:59] awight: yes, the sql gets icky, but it turns out to be more efficient that way. joins can get rather expensive
[21:15:05] The associating of data with a page does seem problematic to me. For two reasons 1) page ids are not always stable, everything else in mediawiki exclusively associates with titles (except for internal linkage, sometimes). and 2) I would expect as a user to be able to view a linear history of the page. Right now, this is how other page properties and categories are changed and modified. Moving that away means watchlist notifications are
[21:15:05] missed, and reverting becomes more difficult, as well as viewing older revisions. E.g. viewing a featured article at a time from 2 years ago, should /not/ query JADE and see the page ID is currently marked as good.
[21:15:05] DanielK_WMDE_: MCR is great but as you point out it will only work for certain wiki entities. I'd rather have the mechanism be uniform across entity types. We intend to support judgments of logged admin actions, for example.
[21:15:55] awight: well, you could have the same ContentHandler (or subclasses). The same editing interface, the same rendering, etc. The only difference is namespace name and slot name.
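The title scheme quoted earlier in the log, Judgment:(Page|Revision|Diff)/{id}, is the kind of thing a programmatic mapping can cover without a lookup table. A minimal sketch in Python, assuming only that title shape; the function and type names here are hypothetical and are not taken from the JADE extension:

```python
import re
from typing import NamedTuple

# Only the "Judgment:<Type>/<id>" shape comes from the discussion.
# Every name below is a hypothetical illustration, not JADE's real code.
ENTITY_TYPES = ("Page", "Revision", "Diff")
_TITLE_RE = re.compile(r"Judgment:(Page|Revision|Diff)/([1-9][0-9]*)")


class JudgmentTarget(NamedTuple):
    entity_type: str  # "Page", "Revision" or "Diff"
    entity_id: int    # page_id for pages, revision_id for revisions and diffs


def title_for(entity_type: str, entity_id: int) -> str:
    """Build the judgment page title for a wiki entity."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    if entity_id <= 0:
        raise ValueError("entity id must be a positive integer")
    return f"Judgment:{entity_type}/{entity_id}"


def target_for(title: str) -> JudgmentTarget:
    """Recover the judged entity from a judgment page title."""
    match = _TITLE_RE.fullmatch(title)
    if match is None:
        raise ValueError(f"not a judgment title: {title!r}")
    return JudgmentTarget(match.group(1), int(match.group(2)))
```

Because the mapping is a pure function in both directions, nothing has to be stored or kept in sync; the cost, as noted in the discussion, is that SQL queries have to build or parse these titles themselves.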
[21:15:57] Krinkle: that's surprising, since I'd expect page IDs to be much more stable than titles. Either way, we can redirect from a hook if necessary.
[21:16:27] awight: wikibase does the same thing: you just say which entity goes into which slot in which namespace. you can mix&match freely. ContentHandler works the same, no matter where the stuff is stored.
[21:16:59] MCR is nicely structured, thanks to an extremely expensive development process which is finally coming to its conclusion, it doesn't require page move hooks or ugly URLs
[21:17:01] Krinkle: Not sure I understand the second part of your question, we'll have easy joins from page ID to all judgments about page revisions, but if the page ID is changed we have to maintain that somehow.
[21:17:41] the JSON schema would be consistent between log judgements and page judgements, just not the storage system
[21:17:43] I don't like the multiplexing page -> revision -> jade page -> revision, we already have issues with joins there, you are adding self joins to a very large table
[21:17:45] awight: what I mean is, if you store page-related data on a separate page, how do you correlate? If we use jade to show "this is a grade B article", how does that work if I view last year's version of an article that was grade A at the time?
[21:17:53] Krinkle, awight: page IDs vs Page titles gets odd when deletion, undeletion, and renames get mixed together. the current behavior is... unclear. there's no way to make either thing totally stable, it's really a conceptual problem
[21:18:04] also watchlist and view=history wouldn't mix. MCR would solve that.
[21:18:09] MCR would fix that though
[21:18:14] ^_^
[21:18:34] DanielK_WMDE_: there are a few notes about potential MCR integration here, https://www.mediawiki.org/wiki/JADE/Implementations#Other_ideas -- but I see the lack of support for non-page entities as a big problem.
[21:18:43] I think page ids *will* become stable, and should. But unfortunately, not yet.
It'd become a source of bugs and technical debt.
[21:18:54] why can't revisions be properties of pages?
[21:19:14] in the context of jade content, I mean
[21:19:15] jynus: you might have to explain what you mean by that
[21:19:30] jynus: that was considered, but if we have 1 blob storing all revision judgements, it would grow rapidly and be costly to read and write.
[21:20:00] Krinkle: the article quality (e.g. grade "FA") is a judgment on revisions, for that reason, so you can see the history over time.
[21:20:05] Krinkle: I think the problem is then what is the intended and expected usage
[21:20:11] awight: what would be the problem with using slots for page level judgements, and putting revision level judgements into a separate namespace? how would that make things complicated? in my mind, it would avoid a lot of complications you might otherwise have to deal with wrt associating judgements with pages
[21:20:36] Basically unless we only want to load judgements for an entity, any other use case requires a custom table
[21:20:41] awight: Does that mean for every un-notable edit, JADE creates a duplicate page with the new rev id, based on the previous page/rev ID judgement?
[21:20:46] as json blobs are not query-able
[21:20:49] Question from a product perspective: if we use MCR, would it result in a weird/unfamiliar user interface because MCR is a new concept? Or would this be totally transparent to the end user?
[21:21:09] jynus: There are a few use cases here, if that makes the expectations more concrete: https://docs.google.com/spreadsheets/d/1RPb8VHbseE_xPe46nFqo4QVYmwzgfFJrO4_Wh-QKBSw/edit#gid=0
[21:21:22] how would it correlate the "current" status of every revision? Or would the page status indicator go away after each edit and have to be re-inserted?
[21:21:28] <_joe_> Amir1: I really dislike the idea of storing json blobs into mysql tables, it seems like the mother of all anti-patterns to me
[21:21:51] the multiplexing for the most flexible options all pages -> all revisions -> jade pages -> revisions of judgements will make some queries impossible
[21:22:02] it is a similar issue to watchlist + recentchanges
[21:22:04] harej: it would be more transparent than storing stuff on separate pages. it would be better integrated with RC/Watchlist, page protection, etc.
[21:22:06] <_joe_> have you thought of how you would perform schema migrations for the json objects, for instance?
[21:22:07] DanielK_WMDE_: All of diff, revision, and page judgments are associated with pages somehow and we should possibly surface them all in history views.
[21:22:16] _joe_: +1
[21:22:17] except with revision instead of only 30 days of data
[21:22:23] _joe_: True, my suggestion was using logging instead
[21:22:28] + custom tables
[21:22:46] or a key/val store?
[21:23:03] mobrovac: for which part?
[21:23:09] Krinkle: no, JADE does nothing without human intervention. Judgments of two revisions on the same page are stored in separate JADE pages, e.g. Judgment:Diff/123 and Judgment:Diff/124
[21:23:10] json blobs
[21:23:13] awight: binding to a revision is easy, binding to a page (without a specific revision) is hard
[21:23:28] awight: different question: how will judgments be queried?
[21:23:38] other than lookup by "target thing" Id, I mean
[21:23:40] mobrovac: with custom table + logging, there wouldn't be any json blob
[21:23:40] <_joe_> mobrovac: that's currently stored in external storage, AIUI
[21:23:53] awight: yes, so if a page A is at revision 100, and I score it as "Good, Grade A", and then I edit it to fix a missing comma, the score is no longer associated with page A because the revision is now 101?
[21:24:02] _joe_: For better or worse, we already have a bunch of JSON creeping into wiki pages, but also, a pure json blob with no other text is becoming a standard and modern DB engines like postgresql have primitives to query and update the json.
[21:24:15] wow
[21:24:24] "modern DB engines like postgresql"
[21:24:35] Amir1, mobrovac: a k/v store (or nosql store) wouldn't have version management. If version management isn't needed, it's a good option.
[21:24:38] because unmodern mysql doesn't have json native types :-)
[21:24:38] no trolling please
[21:24:52] <_joe_> awight: that's postgres chasing the document storage systems IMHO, but I think it's off-topic
[21:24:57] https://dev.mysql.com/doc/refman/8.0/en/json.html
[21:25:01] In the long term, it would be good to be able to search judgements (at least via API), for example, give me all the stubs that need copyediting. This will be very difficult/expensive if the data is in json blobs.
[21:25:31] I was assuming there is a computed table in JADE that stores things in an indexable way.
[21:25:33] _joe_: kaldari: migration will suck. I think we'll follow the wikidata precedent, and just announce that we're going to make a breaking change, then bot-edit all the pages :-/
[21:25:41] DanielK_WMDE_, Amir1: it is my impression that rev and diff judgements don't need to be editable? or am i missing something?
[21:25:53] kaldari: the use cases you mention sound like way more than the 1% that was projected. They sound more like 100% of the article namespace...
[21:26:21] mobrovac: awight seems to think otherwise
[21:26:29] I'm not quite sure what the rationale is
[21:26:45] apologies, I'm trying to go chronologically but am like 3 minutes behind at this point.
[21:26:45] mobrovac: they need to be editable but it can be handled by logs
[21:26:50] instead of pages
[21:26:56] *log action = logging table
[21:27:15] Amir1: editable -> updatable?
[21:27:22] DanielK_WMDE_: yes
[21:27:34] DanielK_WMDE_: queries so far will be: * get all judgments on entity, * join all judgments from a revision pager, * write new judgment page content, * use the "append judgment" api
[21:27:40] kaldari: what you said is very valid and it means regardless of what backend we choose to use, we eventually need to build a custom table anyway
[21:27:50] Krinkle: I think so too, and the schema of that would be a critical point of the design
[21:28:11] #info migration [..] follow the wikidata precedent, and just announce that we're going to make a breaking change, then bot-edit all the pages
[21:28:22] awight: so only queries *for* judgments, not *by* judgments?
[21:28:28] DanielK_WMDE_: Is that what wikidata does? I was expecting that would break the ability to view older revisions.
[21:28:51] kaldari: For things that need to be indexed, we'll have to maintain separate tables and indexes with some of the data extracted, and updated using hooks.
[21:28:53] <_joe_> Krinkle: you will have to make your code backwards compatible, I guess
[21:29:05] awight: uh, *we* didn't bot-edit all the pages...
[21:29:09] <_joe_> Krinkle: it's technical debt
[21:29:19] Yeah, but if you need to be compat indefinitely, why bother editing.
[21:29:19] Does anyone have a good example of a page with multiple MCR slots in action?
[21:29:27] https://phabricator.wikimedia.org/T200297#4524704 So my main concern here is how "revision LEFT JOIN revision (and I guess JOIN recentchanges)" is going to work
[21:29:30] Krinkle: i have no idea what that was referring to.
[21:30:06] when right now we cannot even do many queries with a single scan of revision on enwiki, wikidatawiki
[21:30:12] So the Wikidata approach (for better or worse) is schema cannot be changed, only extended. Indefinite compat, zero migration.
[21:30:51] Krinkle, awight: the wikidata policy for breaking changes is .
Compatibility with old revisions is considered extremely important and is, to my knowledge, very close to 100%
[21:31:09] <_joe_> ok, I don't think we're making a lot of progress, we have at least 3 different threads going on.
[21:31:16] DanielK_WMDE_: I call "editable -> updatable" a good compromise on user-level but people might have different ideas
[21:31:36] DanielK_WMDE_: So you mean, a query like "get all judgments where schema=damaging and data=true"? There's no use case for that, yet. But we can extract into custom tables if it becomes necessary.
[21:31:38] Amir1: ok, but if it's doable with log tables, then presumably the history for such judgments is not relevant?
[21:31:42] Amir1: that would avoid the overhead of full version management for revision judgments
[21:32:12] yeah, same ^
[21:32:27] <_joe_> awight, Amir1 what would prevent us from using (maybe MCR) + logs + custom tables, thus avoiding the need to add more records to the revision table?
[21:32:33] I do not know if the use cases for Jade include querying the data when normally browsing the wiki, but it seems to me like a perf concern if we have to query other revisions of other pages to be able to view a normal page/revision. That's a fair amount of read queries + json parsing etc. That would presumably need some form of secondary index table that is updated on-save in Jade namespace.
[21:32:36] mobrovac DanielK_WMDE_ : Yes, exactly
[21:32:49] awight: extracting into custom tables (or into a different service, like elastic) is the solution, but thinking early about what and how, and how that scales, is important
[21:32:58] Krinkle is saying what I mean, but better
[21:33:00] awight should we go through each thread at a time? _joe_ mentioned we have 3 going on
[21:33:16] _joe_: correct assessment
[21:33:31] <_joe_> Amir1: you mean it is possible?
[21:33:51] kchapman: Thanks, either way. I'm getting a bit dizzy interleaving responses, at least.
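The "secondary index table that is updated on-save" raised above, together with the example query "get all judgments where schema=damaging and data=true", can be sketched as follows. This is only an illustration in Python with SQLite: the table name, column names, and the JSON layout of a judgment page ({"schemas": {name: [{"data": ...}]}}) are assumptions made for the example, not the JADE schema or a MediaWiki table.

```python
import json
import sqlite3


def create_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical secondary index over judgment pages."""
    conn.execute(
        """CREATE TABLE judgment_index (
               entity_type TEXT NOT NULL,
               entity_id   INTEGER NOT NULL,
               schema_name TEXT NOT NULL,
               value       TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX judgment_by_value ON judgment_index (schema_name, value)"
    )


def reindex_page(conn, entity_type, entity_id, page_json):
    """Stand-in for an on-save hook: replace the index rows of one page."""
    content = json.loads(page_json)  # assumed layout, see the lead-in
    conn.execute(
        "DELETE FROM judgment_index WHERE entity_type = ? AND entity_id = ?",
        (entity_type, entity_id),
    )
    rows = [
        (entity_type, entity_id, schema_name, json.dumps(judgment["data"]))
        for schema_name, judgments in content["schemas"].items()
        for judgment in judgments
    ]
    conn.executemany("INSERT INTO judgment_index VALUES (?, ?, ?, ?)", rows)


def find_entities(conn, schema_name, value):
    """Query *by* judgment, e.g. schema=damaging and data=true."""
    cursor = conn.execute(
        "SELECT entity_type, entity_id FROM judgment_index "
        "WHERE schema_name = ? AND value = ? ORDER BY entity_id",
        (schema_name, json.dumps(value)),
    )
    return cursor.fetchall()
```

The JSON page stays the source of truth and the index is derived data that can be rebuilt at any time. That is also the crux of the objection in the log: once such a table is needed anyway, the page abstraction no longer avoids extra tables.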
[21:33:59] _joe_: it's possible at the price of not having revision management, just a change log.
[21:34:11] There are more tradeoffs than just that ^
[21:34:22] <_joe_> DanielK_WMDE_: for revisions, not for pages AIUI?
[21:34:31] _joe_: It has some downsides as outlined in https://docs.google.com/spreadsheets/d/1y7CPeAFpjOO-FTXLhp9qfO3lx6-OsaroCMNSNJMUFqc/edit?ouid=115707764633511727503&usp=sheets_home&ths=true
[21:34:45] <_joe_> Amir1: yeah I took a quick look at that
[21:34:55] _joe_: that's how I'd do it, yea. for page level stuff, use mcr.
[21:35:41] DanielK_WMDE_: harej keeps asking what the MCR UI looks like, is there a labs instance or something for that?
[21:35:41] DanielK_WMDE_: Can we go through what TechCom needs to make a decision on this?
[21:36:13] <_joe_> So, it seems clear to me - correct me if I'm wrong - that the proposed model creates the potential for very expensive queries that both Krinkle and jynus think could be critical
[21:36:16] DanielK_WMDE_: I wanted to address "the use cases you mention sound like way more than the 1% that was projected"--that rough number is the sum of all patrolling workflows that I'm aware of. JADE volume is limited by human labor, so I hardly see people suddenly becoming interested in reviewing every single edit, especially if it's mostly going to be a moot point.
[21:36:21] <_joe_> I think that is a deal breaker?
[21:36:46] <_joe_> kchapman: I think we should discuss this ^
[21:36:55] I don't see the argument for queries being expensive, so would like to discuss.
[21:37:15] harej, TimStarling: there is no MCR UI, really. Most things will just work exactly as before. Except diffs may cover multiple slots and such. No demo yet, give it another two or three weeks.
[21:37:23] It's similar to any other query included in the revision pager.
[21:37:31] no
[21:37:38] it is a self join on revision
[21:37:43] to be clear I am saying absolutely no on the idea of putting wikiproject membership or featured article status into a separate page in a JADE namespace
[21:37:48] and we already have huge issues with revision
[21:37:57] <_joe_> awight: < Krinkle> I do not know if the use cases for Jade include querying the data when normally browsing the wiki, but it seems to me like a perf concern if we have to query
[21:38:01] <_joe_> other revisions of other pages to be able to view a normal page/revision. That's a fair amount of read queries + json parsing etc. That would presumably need some
[21:38:05] <_joe_> form of secondary index table that is updated on-save in Jade namespace.
[21:38:08] <_joe_> argh sorry for the bad paste
[21:38:08] harej, TimStarling: the most we'll ever have as an "MCR UI" would be an edit page with multiple text fields. But we are not working on that yet, since SDC doesn't need atomic free text edits.
[21:38:34] _joe_: that's correct, and we're planning for that secondary index, see https://phabricator.wikimedia.org/T200297#4524547
[21:38:36] <_joe_> but yes, basically they argue that we would be joining the revision table with itself, which is surely expensive on large wikis
[21:38:39] DanielK_WMDE_: the idea for SDC is to have a separate action for editing the structured slot, right?
[21:38:50] TimStarling: it will use the wikibase api
[21:38:56] so, yes.
[21:39:03] DanielK_WMDE_: so it would be content that is adjunct to a given page, edited through whatever interface we come up with, yes?
[21:39:07] <_joe_> awight: so add a custom table?
[21:39:15] harej: yes.
[21:39:16] Yes. Unless jade is only used on-wiki to store judgement when interacted with and queried offline/in aggregate, we would need a table or other store for at least the parts that need to be queryable.
With that in mind, it seems to me like there is potential for a solution that does not involve pages for jade at all, whilst still reaping the benefits of the 'wiki model' and the 'abuse management features'. Specifically, the wiki model would be
[21:39:16] adhered to for page-level stuff if using MCR (even better than a jade namespace because watchlist would work naturally, and move/delete would work without needing to "sync" between Article and Jade). And revision stuff could be stored in the table directly, with only a log entry - which is subject to abuse filter and revision delete (revision delete works on log entries).
[21:39:23] awight: so if extra tables are needed
[21:39:25] One thing I want to mention is that limiting JADE to human labor is basically not fixing the scalability problem, it's actually acknowledging that it's not scalable
[21:39:26] ¿don't we already have a table for that?
[21:39:30] why not just use extra tables?
[21:39:41] Platonides: yes, it is called page properties
[21:39:44] or something
[21:39:49] page_props
[21:39:52] that one
[21:39:55] * Platonides is looking for the schema
[21:40:01] why not use that?
[21:40:21] Krinkle: jynus: It's a good question, the reason we want to use pages is so that they can be edited and suppressed normally.
[21:40:26] you are hyper-normalizing just to denormalize again
[21:40:43] <_joe_> awight: same question as jynus: if we need additional tables, then the whole reason to use the page abstraction (not adding additional tables) is lost.
[21:40:44] as you will need index-tables to make things work
[21:40:56] awight: i get the 1% for revision judgements. but kaldari brought up page level judgements, with use cases such as missing citations, stubs, or portal association, which would cover pretty much all pages in the article namespace
[21:40:56] That's not the case, see wikibase.
[21:41:56] Platonides: page_props doesn't give you version management.
You can update it, but that wouldn't show in the history, or watchlists. no diffs, no reverts, etc
[21:42:01] To answer the "MCR UI" question, there isn't really a question of what the UI is, it's a storage model, like revisions. The UI is up to the extension. But it *can* be present on action=edit if desirable (second editor), or via another edit interface e.g. action=edit-judgement. MCR would be stored, much like wikitext, and be on action=history, abuse filter, revision suppression etc. all works for free.
[21:42:07] (catching up) it seems like judgements on revisions and diffs are really similar?
[21:42:07] sure, use those extra tables, and put the normalized content outside of the main metadata dbs
[21:42:21] that is, they could both use the 'revision/XYZ' namespace, just with different judgement titles?
[21:42:21] Platonides: I'm not saying that is needed, but as far as I understand, it's a stated product requirement
[21:42:25] I think it would be a better use of our time to not talk any more about kaldari's ideas for splitting existing templates into JADE
[21:42:41] Sorry, ^ elaborating: we use wiki pages because that's the only feasible way to get wiki behaviors. We cannot emulate any of that on top of custom tables. The custom tables we're planning are strictly secondary, think of them as a fancy way of just adding indexes to our custom table or whatever.
[21:42:42] implicitly by annotating revision/XYZ we are also specifying the diff from XYZ to XYZ-1
[21:43:10] not really
[21:43:13] and if we can use just page/title and revision/xyz as namespaces, then it seems like we could use MCR for storage
[21:43:21] cscott: not conceptually, a revision judgement can be the quality of the article at that point of time but a diff judgement would be whether it's vandalism or not
[21:43:25] cscott: They're very closely related, but judgments will be about a different thing. e.g. revision judgments are never about the edit itself.
[21:43:40] the article at revision XYZ may be very good
[21:43:52] but the last diff could have removed a lot of important info
[21:44:06] even though XYZ is still not bad
[21:44:12] cscott: We can use MCR for page-level things indeed, but presumably not for diffs and revisions, since those are meant to be immutable. (how would I see the previous value of a slot of a revision, if it is added/overwritten into the same revision).
[21:45:06] awight: one point I'd like you to consider is if it's really necessary to store all kinds of judgement in the same place. You are encoding types using subpage syntax. you could just as easily encode it using slot names, or namespaces.
[21:45:11] time check: 15 mins left
[21:45:22] I'd like to second what awight said, the discussion boils down to whether it is worth reinventing the wheel (suppression, etc.) for the sake of storage or not
[21:45:38] cscott: judgement of revisions is mostly what you think of as judgement of pages. it's about the current state of the page as a whole (more like Flaggedrevs perhaps?). Diffs is about the edit as made.
[21:45:57] and how we can reduce the cost of this reinvention if we are going in that direction
[21:45:59] suppression will work with MCR
[21:46:09] what about abusefilter?
[21:46:12] awight: the wiki-like behavior requirement seems to be a major point of contention. can you point us to some kind of rationale for this? since it's expensive on the technical side, there should be a very good reason for it on the use case side.
[21:46:14] it's not just about suppression vs storage but about the expense of queries too
[21:46:22] Amir1: I believe Daniel and myself have both indicated that suppression would *not* need to be re-invented when using MCR and logging with custom tables.
[21:46:35] harej: abusefilter will also work with MCR, though some of its features will need some tweaking.
[21:47:05] cscott uses the term annotation, which reminds me of T185607, I don't think I spelt out the schema there but I imagined it would be identical to Flow
[21:47:06] T185607: Provide an inline discussion feature, "DiscussThis" - https://phabricator.wikimedia.org/T185607
[21:47:09] Krinkle: yup, that's why I lean towards logging + custom (Maybe MCR)
[21:47:14] <_joe_> Amir1: we just demonstrated you can't query these pages directly, because of the storage issues. I don't think it's really up for discussion at this point. Frankly, if we need to add additional tables just to be able to query the pages in this namespace, it should tell us the storage model can't really cope with the abstraction we created on top of it.
[21:47:15] I'd like to sideline the MCR thread if that's okay, since IMO it's a small adjustment to the main proposal.
[21:47:32] harej: specifically, anything that matches against raw wikitext is limited to the main slot for now. but stuff like external links should work out of the box.
[21:47:32] that is to say, Flow discussions attached to revisions by means of a new tracking table that links them
[21:47:44] harej: but let's discuss that some other time :)
[21:48:15] so you don't have the rev_id in the page title as a subpage, the page title can just be autoincremented
[21:48:47] awight are there other items you need clarity on to adjust your proposal?
[21:49:02] DanielK_WMDE_: Thanks, all we know so far is that * freeform text input is what makes judgments meaningful to humans, so we want to keep that. * editing is required because people make accidents or change their mind. * suppression will be required for any freeform text.
[21:49:38] * normalizing jade content as regular content will decrease the revision workflow scalability- scale revision so it can scale infinitely and I personally will have no problem with doing that (that is not exclusively of jade, but it is part of the problem) [21:50:49] amir/awight: yes, but "quality of revision XYZ" is a different keyed judgement than "isVandalism revision XYZ" (which is implicitly a judgement about the diff to XYZ-1) [21:51:01] <_joe_> jynus: well jade given the relationship with other revisions makes the problem even larger, right? [21:51:02] the semantics can be in the judgement key, you don't need separate namespaces [21:51:04] kchapman: This has been useful so far, even if we don't have a conclusion, getting the concepts in front of peers is a big step forward. I don't have specific questions. [21:51:21] #info * freeform text input is what makes judgments meaningful to humans, so we want to keep that. * editing is required because people make accidents or change their mind. * suppression will be required for any freeform text. [21:51:29] unless you want to talk about arbitrary diffs, like between 123 and 456. but i don't think that was mentioned as a use case? [21:51:47] Judging diffs between arbitrary revisions is not a use case we're planning on at the moment. [21:51:47] _joe_: given there is "no limits", I would predict it makes it exponentially worse, but that is only a guess [21:51:51] TimStarling: that kind of thing proved a real pain for wikibase... [21:52:04] DanielK_WMDE_ and other TechCom folks is there anything else? (9 minutes left) [21:52:04] awight: are you keeping the functionality to judge jade judgements too? [21:52:05] <_joe_> #info < jynus> * normalizing jade content as regular content will decrease the revision workflow scalability [21:52:23] awight: MCR is a major difference for page-level data. revisions don't get renamed or merged, but pages do.
MCR means the data is "on the same page" logically, so watchlist, delete, discussion are all associated with it. As well as API queries can generate and search based on page ids and page titles the same way all other MW APIs work. Storing it under a separate namespace for page-level data requires re-invention of all that, and [21:52:23] ironically by being a wiki page will break most expectations I have of being a wiki page. [21:52:26] awight: with infinite recursivity? [21:52:52] #question I understand that Jade namespace only hosts info like diffs, etc; not actual content? I say so because of privacy reasons (think of an outing edit we need to remove, etc.) [21:53:02] jynus: Same as a normal wiki page, but it's not possible to make cycles. [21:53:23] Hauskatze: it will contain free form content. hence the requirement for suppression [21:53:35] DanielK_WMDE_: is there an MCR content type that is *not* pulled forward into new revisions? That is, I want to record a judgement on revision XYZ that will not show up if someone queries that content type for revision XYZ+1 [21:53:54] can we come to a conclusion as to whether we are recommending MCR? [21:54:03] Hauskatze: It will contain actual content, but not the actual content of the diff/revision it describes. Jade is "real content" the same way a talk page is real content. [21:54:20] cscott: it would be possible to achieve this, but then your judgement would not be editable [21:54:59] TimStarling: i recommend mcr per-page and per-user judgments. not for revision or diff level stuff [21:55:03] hm. if you allowed branching it potentially would be. but that's not in the MCR roadmap yet. [21:55:25] okay 5 minutes left. Last chance to get things into the minutes with the #info tag [21:55:30] <_joe_> #info < DanielK_WMDE_> i recommend mcr per-page and per-user judgments.
not for revision or diff level stuff [21:55:43] Krinkle: I mean, think of diff 12345 that outs an editor and an OS removes it, if it happened to be "judged" if on Jade: there's nothing about that outing but just "diff" "bad faith" (no private data) then I guess I'm okay. [21:55:44] DanielK_WMDE_: it's a little awkward to have to change representations based on the entity type? [21:55:53] DanielK_WMDE_: do you have any recommendation for other types of judgments? [21:56:24] awight: i'd really like to see some more exploration of what kind of custom extra tables for querying one might expect. and clarity of whether extra tables for associating judgment tables with their subjects are planned. [21:56:25] Hauskatze: ah yes exactly. It will refer to a missing revision, but with no clues as to the content. If the judgment.notes repeat some of the bad stuff, that will have to be suppressed separately. [21:56:38] Hauskatze: if someone uses JADE to out someone or post illegal/defamatory content then that edit should be suppressed as it would in any other namespace, and we need to be able to technically support this. [21:56:47] awight: if such association tables are needed, i'd recommend using separate tables for different target types, instead of having polymorphic fields [21:56:48] Hauskatze: Indeed. It is possible that a user says (with their own words) about it, but that is the same as if someone reports a problem to WP:AN and mentions the problem too much. That will always require additional suppression. [21:56:51] <_joe_> cscott: well they are different entities and it's awkward to treat an object linked to them as exactly equal [21:56:52] i think i would like to see the same MCR content type used for all judgements, but a specific namespace selected for revisions, diffs, admin actions, etc which is *not* jade-specific [21:56:55] One concern I have is that overseers will be annoyed by editing JSON in order to suppress. Maybe rollback alone will be good enough.
[21:57:26] cscott: there is no change in representation, just in namespace and slot name [21:57:30] "if such association tables are needed, i'd recommend using separate tables for different target types, instead of having polymorphic fields" <- I second this very much, look at change_tag table.... [21:57:33] DanielK_WMDE_: makes sense, I hate polymorphic db columns as well. [21:57:44] that is, if we decide to associate other info with (say) admin actions in the future, it would be better to use a new MCR slot on the generic name (strawman) Entity:AdminAction/345 than have everyone invent new namespaces [21:57:49] <_joe_> Amir1: +10 :) [21:57:56] harej: *if* editability is *really* needed for other kinds of content, then they should indeed be wiki pages. [21:57:56] awight / harej / Krinkle - thanks, it's clear now for me. Indeed suppression should be allowed in that namespace, but as I understand normally their contents will just be the diff number and the "tags" [21:58:07] to be honest I don't know why judgements about pages are even included in this proposal [21:58:09] awight: I assume the JSON will not be in any way exposed to users for Jade, right? that's just internal. More like Wikidata pages, not like EventLogging Schema pages. [21:58:14] there's no ORES for pages [21:58:18] harej: but i'd like to see more extensive justification for that [21:58:24] The UI can be as pretty as you want. [21:58:30] Krinkle: not yet [21:58:38] People can edit json directly [21:58:41] #info let's not have polymorphic columns when designing the secondary tables [21:58:46] like template metadata [21:58:48] OK. but that's not a technical issue.
[21:58:54] Ie, instead of Jade:AdminAction/345, Annotation:AdminAction/345, CoolNewThing:AdminAction/345, there would be one canonical title for that concept selected, eg AdminAction:345, and everyone will use MCR the way it is meant to be used in order to chain different data types off that concept [21:59:00] the proposal starts with "let's give feedback to ORES" and ends with "let's do all our hobby horse projects while we're at it" [21:59:21] :P [21:59:24] TimStarling: there is ORES for pages, e.g. drafttopic [21:59:55] okay one minute, last chance [21:59:59] ORES also does wp10 qualifications, which I assume are page-associated effectively as of a particular revision. [22:00:12] Krinkle: no, only revision-associated [22:00:24] you can query "latest assessment", of course [22:00:31] Yeah [22:00:45] So if everything is revision associated, what is an example of something page related? [22:00:45] #info please do not roll all arbitrary user-edited page metadata into this proposal [22:00:58] +1 [22:01:17] defining new slots with new jsonish content models is easy enough [22:01:21] Krinkle: draft topic, for example. An article is about "history of cooperatives" for all time [22:01:26] no need to fit everything into the same model [22:01:26] thanks all [22:01:33] #endmeeting [22:01:33] Meeting ended Wed Aug 22 22:01:33 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [22:01:33] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-08-22-21.00.html [22:01:33] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-08-22-21.00.txt [22:01:33] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-08-22-21.00.wiki [22:01:33] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-08-22-21.00.log.html [22:01:34] o/ thanks!
[22:01:45] restating my proposal: Diff:345, AdminAction:345, Special:Revision/345, NormalPageTitle; with the same MCR content type used for each type of entity (as opposed to Jade:Diff/345, Jade:AdminAction/345, etc) [22:02:01] awight: congratulations on surviving this unruly mob ;) [22:02:12] haha it's... sort of fun [22:02:14] #toomanynamespaces [22:02:18] ^ +1 [22:02:42] However, I do support (seriously) a generic approach to metadata storage about revisions and pages. [22:02:43] <_joe_> well better to die by namespaces explosion than by database explosion :) [22:02:54] Krinkle: yeah, but better we all agree on what name to use when adding content about a specific admin action (say) than have that data hanging off multiple different tool-specific namespaces [22:03:04] Agreed. [22:03:13] * cscott dances in joy [22:03:22] Metadata: might be better than Jade:Page/<id> [22:03:28] <cscott> sure [22:03:33] <Krinkle> to match with other associated namespaces, like for Talk. [22:03:57] <cscott> and use MCR consistently for everything, rather than MCR only for articles and JSON directly in Jade:Foo for other things [22:04:00] <Krinkle> There is precedent for that, and also standardisation (an RFC from daniel) to break the link that only 1 namespace is associated (Talk) and to make that one optional. [22:04:04] <awight> That falls apart with e.g. AdminAction, but we haven't spec'ed that out yet [22:05:00] <cscott> awight: i'm just saying to deal with the "naming an admin action" problem separately from the jade issue. [22:05:17] <awight> ah gotcha, that would be great. [22:05:18] <Krinkle> Yeah, all data associated with the page, or with the page as of a particular revision as a whole, could be in MCR directly in the page. The only thing not suitable for in-same-page-MCR would be judgement about specific diffs/edits.
[22:05:25] <cscott> once you've decided what to name it (ie, what Title to use), then it should be "obvious" how to get the MCR blob containing the judgement for it [22:06:01] <cscott> Krinkle: right, but there's potentially other things we might want to associate with diffs. annotations, inline chat even. [22:06:25] <TimStarling> Dan Garry so cruelly crushed any hopes we might have of inline chat [22:06:27] <cscott> storage would be obvious if only we could agree on a Title to give for the diffs [22:06:39] <cscott> TimStarling: did he now? [22:06:49] <TimStarling> > Deskana triaged this task as Lowest priority. [22:06:54] <TimStarling> > This seems out of scope of the annual plan, and the relevant teams have all deprioritised it, so I doubt this will be worked on for quite some time. [22:07:15] <cscott> that sounds like a completely normal WMF feature then [22:07:16] <cscott> ;-p [22:07:41] <TimStarling> come back next decade [22:07:49] <cscott> all the good features are skunkworks features [22:09:08] * cscott looks at halfak [22:09:29] <awight> ferretworks, sometimes. also stinky [22:09:33] <Hauskatze> > come back next decade <-- lol [22:10:09] <Krinkle> Amir1: What do the colors and numbers mean in the spreadsheet? [22:11:14] <awight> Krinkle: The numbers are an estimate of difficulty, 0 = impossible 100 = for free. [22:11:16] <harej> Krinkle: in comparing the two implementation approaches i used red/green shading for aspects where one had an advantage/disadvantage over the other [22:11:56] <harej> the top two implementation approaches, I should clarify [22:12:15] <Krinkle> harej: would you like to open it for comments (me or @wmf), or should I comment elsewhere? [22:12:17] <harej> As an exercise we compared implementation strategies for feasibility and product fit [22:12:37] <harej> Krinkle: are you not able to comment? [22:12:58] <Krinkle> It's view only. [22:18:01] <awight> Krinkle: permissions granted. U can have edit too if that's helpful. 
[22:18:25] <awight> {{done}} [22:22:15] <Krinkle> thx, i'll look through later and leave some thoughts. [22:22:51] <awight> great! Thanks for the tough questions :-)