[07:00:15] <KateChapman>	 #startmeeting RFC meeting
[07:00:36] <awight>	 o/ thanks!
[07:00:39] <KateChapman>	 #topic Today’s discussion: Introduce a new namespace for collaborative judgments about wiki entities https://phabricator.wikimedia.org/T200297
[07:01:47] <marostegui>	 hey
[07:01:49] <KateChapman>	 TimStarling could you kick the bot?
[07:03:09] <_joe_>	 KateChapman: do we need to kick meetbot?
[07:03:12] <TimStarling>	 done
[07:03:18] <_joe_>	 oh ok
[07:03:27] <KateChapman>	 #startmeeting RFC meeting
[07:03:27] <wm-labs-meetbot>	 Meeting started Thu Nov 22 07:03:27 2018 UTC and is due to finish in 60 minutes.  The chair is KateChapman. Information about MeetBot at http://wiki.debian.org/MeetBot.
[07:03:27] <wm-labs-meetbot>	 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[07:03:27] <wm-labs-meetbot>	 The meeting name has been set to 'rfc_meeting'
[07:03:32] <KateChapman>	 #topic Today’s discussion: Introduce a new namespace for collaborative judgments about wiki entities https://phabricator.wikimedia.org/T200297
[07:03:49] <KateChapman>	 hi! who is here to discuss this RFC?
[07:04:17] <KateChapman>	 and while we are waiting for people to chime in remember that the #info, #link and #action tags are your friends during the meeting.
[07:04:19] <apergos>	 o/
[07:04:26] <marostegui>	 o/
[07:04:28] <harej>	 o/
[07:04:39] <Nikerabbit>	 at least following o/
[07:04:50] <awight>	 :) I'm here to represent the Jade development team.
[07:05:44] <awight>	 It looks like everyone who waved probably has enough background that I don't need to go into an overview of what we're trying to build.  Or should I?
[07:05:47] <_joe_>	 I'm following too
[07:06:07] <_joe_>	 awight: I think it's more useful if you state the questions you want to be addressed in this meeting
[07:06:21] <_joe_>	 it will help keeping the discussion on topic as much as possible
[07:06:29] <marostegui>	 awight: Probably not, but an overview on which point are we now in terms of development, rough timelines and so forth
[07:06:44] <awight>	 marostegui: Okay, thanks for the suggestion.
[07:07:28] <awight>	 So, last time we discussed Jade at this meeting, we identified some questions that needed to be addressed before we could consider deployment.
[07:07:37] <awight>	 The two biggest questions were about storage.
[07:08:32] <awight>	 First was the possibility that Jade judgments, stored as wiki pages, would grow mariadb metadata beyond the tipping point.
[07:09:08] <awight>	 That was mostly addressed by an agreement that we wouldn't deploy to the biggest 5 or so wikis, where the revision table is already close to 100GB or might grow that big in the next year or two.
[07:09:40] <awight>	 The second point, which AIUI is the main remaining question to clear before deployment, is whether our "secondary schema" makes sense.
[07:10:21] <awight>	 This consists of two link tables and two indexes, deployed as custom extension schema.
[07:10:32] <awight>	 https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/sql/
[07:10:58] <awight>	 To help make the context more clear, I went ahead and implemented all the things that would be necessary to maintain these tables.
[07:11:18] <awight>	 Hooks, e.g. https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/includes/Hooks/LinkTableHooks.php
[07:11:47] <awight>	 A maintenance script for cleanup, https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/maintenance/CleanJudgmentLinks.php
[07:12:46] <awight>	 My main goal is to discuss this schema, whether it might be unnecessary, or if there are gotchas that I still need to address.
[07:13:24] <awight>	 If there's time at the end, I'd be happy to get feedback on the view rendering and a proposal for how to edit this content.  Example view rendering: https://en.wikipedia.beta.wmflabs.org/wiki/Judgment:Diff/376901
[07:13:31] <awight>	 <end monologue>
[07:13:36] <awight>	 </>
[07:13:43] <duesen>	 awight: do you have a writeup anywhere of what queries would be run against these tables, and how often?
[07:14:18] <marostegui>	 awight: What is the expected usage for those tables? how many writes? how many reads? how would the clean up work (how often will it run)?
[07:14:28] <duesen>	 in other words, what are the access patterns that drive the schema design?
[07:14:30] <marostegui>	 awight: expected growth for those two tables?
[07:14:35] <awight>	 duesen: There's some discussion in Phab notes, but IMO I gave bad examples, so I'm concerned these might not help here.
[07:15:24] <awight>	 The main use for the link tables will be to join revision pagers like Special:RecentChanges to judgment pages targeting those revisions.
[07:15:36] * awight dusts off ill-advised example query
[07:16:39] * RoanKattouw perks up at the mention of RecentChanges queries
[07:16:53] <duesen>	 awight: Special:RecentChanges is very high traffic. Is the idea to hook into the RC feed and query the JADe tables every time someone looks at RC?
[07:16:58] <duesen>	 or would that be an opt-in feature?
[07:17:10] <awight>	 Okay here's the example query https://phabricator.wikimedia.org/T202596#4631229
[07:18:13] <awight>	 marostegui: The expected usage is, writes will only happen when a judgment page is edited, maximum about 1% of the total volume of edits on a wiki.
[07:18:40] <awight>	 marostegui: Read usage is, joining on these indexes for every revision pager on the wiki.
[07:19:08] <TimStarling>	 joining recentchanges to jade_diff_judgment, but then you only get a page ID
[07:19:19] <awight>	 marostegui: Expected growth is also up to a rough maximum of 1% of the number of revisions being added to a wiki.
[07:19:28] <TimStarling>	 joining is cheap enough but then if you have to read the page and parse the JSON that would take longer
[07:19:56] <TimStarling>	 unless it is just some little icon link that goes to the JADE page?
[07:20:13] <marostegui>	 awight: The writes will be append only I guess?
[07:20:16] <RoanKattouw>	 The impact on the query itself looks like it'd be pretty low, it'd be no worse than the existing joins against watchlist (which happens sometimes) and page (which happens often but not always)
[07:20:16] <duesen>	 even if it's just a link, you need to join the page table to get the title
[07:20:34] <awight>	 duesen: Exactly, the initial deployment would almost certainly be protected by an opt-in before joining in RC pages.  Actually, harej was thinking that our first phase would be to only join in Special:Watchlist.
[07:20:54] <TimStarling>	 yeah, it could join with the page table, that would be OK
[07:21:11] <apergos>	 cn you talk about later phases too please?
[07:21:19] <duesen>	 +1
[07:21:33] <duesen>	 you don't want to do schema changes on large tables when you go from beta to fully deployment
[07:21:40] <RoanKattouw>	 You'd have to write the query carefully because you're  joining against the page table twice, but it'd be fine. Most RC entries are going to be non-Jade anyway, you wouldn't usually end up with a result set of 50 changes that are all JADE
[07:22:22] <marostegui>	 duesen: good point, if we can avoid schema changes on larger tables that'd be a good plus
[07:22:30] * duesen thinkgs a subquery may be better than a join, or even programmatic post-processing of the result set.
[07:22:33] <harej>	 My hope was that in developing something for watchlists, we would also be able to have such features deployed for recent changes.
[07:22:33] <RoanKattouw>	 Oh wait, I misunderstood, this is for RC to find judgments associated with the edits being displayed, not the judgments themselves
[07:22:55] <awight>	 TimStarling: Thanks for bringing up the judgment content--there are a few options here.  We can start with a simple "5 judgments" pill which doesn't require fetching the content.  Alternatively, we could fetch judgment revision metadata and show them nested "enhanced RC"-style.  We should never have to pull judgment content for this view.
[07:23:29] <RoanKattouw>	 duesen: There are joins against page and watchlist in these queries already, I think an indexed join against the jade table will be fine, it's basically the same hting
[07:23:33] <duesen>	 harej: since both are based on the recentchanges table and share a lot of code, that's sensible. But building something that forks for opt-in on watchlist, but will not would for full-use on RC, would be sad.
[07:23:55] <awight>	 apergos: Later phases have changed since we last spoke :-)  Most importantly, we *never* want bot-authored content in this namespace now.  My argument for that is here: https://www.mediawiki.org/wiki/Extension:JADE#Bots,_abuse,_and_quality_standards
[07:24:39] <RoanKattouw>	 awight: Hmm wait, a revision can have multiple judgments? So the join that you outlined could potentially duplicate RC rows then?
[07:25:10] <harej>	 If a revision has multiple judgments they would still lie on the same judgment namespace page
[07:25:22] <duesen>	 awight: "we could fetch judgment revision metadata and show them nested "enhanced RC"-style.  We should never have to pull judgment content for this view." -> that doesn't make sense to me as a use case. when looking at RC, I'm porimarily interested in the *judgment*, not edits to the judgment.
[07:25:34] <awight>	 RoanKattouw: Sorry, you're right--the "5 judgments" pill is not possible.
[07:25:43] <RoanKattouw>	 Oh right, the (revision, judgment) index is a unique index, so there's a 1:1 mapping from rev ids to judgment page ids
[07:25:43] <duesen>	 I'm surprised that there are no plan to ever display anything like "this edit was judged to be bad"!
[07:25:53] * RoanKattouw un-freaks out
[07:25:53] <duesen>	 is this not the idea? people will assume that and ask for it
[07:25:59] <duesen>	 and the schema as proposed can't deliver
[07:26:17] <TimStarling>	 there should be a unique key just on jade_diff_judgment
[07:26:39] <apergos>	 the nice thing about rc is that besides authot, byte count there's a big fat comment which gives the viewer something to latch onto; what would that be for the judgment info?
[07:26:42] <RoanKattouw>	 For mapping judgment page IDs back to revisions?
[07:26:43] <marostegui>	 awight: How will you "protect" against bots abuse?
[07:26:52] <TimStarling>	 I mean jade_diff_judgment.jaded_revision
[07:27:11] <TimStarling>	 a unique key on (revision,judgement) means you can have multiple rows for each revision
[07:27:20] <TimStarling>	 need a unique key just on (revision)
[07:27:37] <RoanKattouw>	 Yes you're right
[07:27:40] <harej>	 duesen: from a UX perspective, the watchlist display is already very complicated. What I currently have in mind is that for a given diff, there is an indicator of “someone already looked at this edit” vs otherwise. This way, a user would not have to do duplicative work.
[07:28:05] <RoanKattouw>	 Assuming that it really is a 1:1 relationship. You're right that the index as written is many-to-many, I misread that (I blame the late hour)
[07:28:08] <harej>	 Someone already looked at, and rendered an opinion on, the edit, that is.
[07:28:09] <awight>	 duesen: I think you're right that people will want to run queries against the content as well, currently I'm planning for that to do into ElasticSearch, but I'm not sure if that'll be enough to satisfy.  Maybe we would need to introduce additional secondary tables if people demand it.
[07:28:23] <duesen>	 harej: and you do not anticipate people asking to see immediately how that other person judged that edit?
[07:28:35] <duesen>	 *immediately see
[07:29:01] <harej>	 It’s certainly possible.
[07:29:14] <RoanKattouw>	 I guess if you wanted to implement something like that, now would be the time to work that into this schema. You could add one or more summary fields to these judgment tables
[07:29:20] <TimStarling>	 our revision list UIs are very cluttered, everyone just tacks on features
[07:29:24] <duesen>	 awight: querying *by* judgment is probably best left to elastic. I'm wondering about querying *for* judgment.
[07:30:05] <awight>	 duesen: Your point about judgment edit vs judgment is critical, and I don't have a good answer at the moment.  noting for myself...
[07:30:54] <duesen>	 awight: e.g. having something like a "judment summary" in the table, for display. To be clear, I'm not saying that's a must-have. I'm just saying it would not be great if that would have to be added later, instead of being part of the design from the start,.
[07:30:57] <apergos>	 think about what's going to be beneficial to the patroller or vandalism fighter when they look at rc output; I'm not sure I understand who your target audience is with the results you want to display, as you have described them
[07:31:13] <awight>	 I had imagined that people would see the recent change in one line, then a "j" letter for "it's been judged", plus an indented, enhanced RC line which shows what that judgment edit was, but I think you might be right.
[07:32:45] <duesen>	 the first thing they'll as for is to see "j+" or "j-".
[07:32:51] <awight>	 marostegui: We have to rely on social conventions to protect against bot abuse.  This mostly works for the rest of wiki, but it does take a community effort, and we still need some technical monitoring to catch floods of bad data getting inserted.
[07:32:58] <KateChapman>	 *** about 30 minutes left ***
[07:33:26] <awight>	 TimStarling: > need a unique key just on (revision)
[07:33:33] <awight>	 ^ thanks for noticing, that's right!
[07:34:34] <marostegui>	 awight: Is there no way to prevent it in a technical way? It could be a target to spam or just vandalism which could end up with lots of unexpected growth and writes if some bots start to judge every single page
[07:34:46] <marostegui>	 awight: Maybe some sort of rate limiting per page per time?
[07:34:48] <apergos>	 we want to prevent the flood before it gets in eh
[07:35:23] <harej>	 marostegui: to the extent these are wiki pages, the technical restraints that apply to wiki pages would apply to Jade judgments as well
[07:36:16] <duesen>	 yea, i agree that the threat of spam is no worse than for regular pages - *if* all the spam protection mechsnims apply.
[07:36:25] <_joe_>	 marostegui: I'd prefer a kill switch
[07:36:28] <duesen>	 abuse filter integration should be looked into
[07:36:32] <duesen>	 it's not trivial
[07:36:40] <awight>	 duesen: just noting that there are other types of judgment beyond "damaging", e.g. "content quality".  I don't think we want to pack anything more into the letter matrix in the sidebar, but I do agree that we might want this data to be available in the link table for filtering or highlighting.
[07:36:44] <marostegui>	 harej: Yeah, but this is a new feature, so I was thinking that if we can implement some protection from the start (so way of rate limiting) that would be an additional safety net against unexpected table growth/writes
[07:36:46] <_joe_>	 specifically a way to stop addition of new judgements
[07:36:52] <marostegui>	 _joe_: that is also a must I would say
[07:37:07] <TimStarling>	 you could protect the whole namespace with an AbuseFilter rule
[07:37:16] <TimStarling>	 or throttle or whatever
[07:37:25] <duesen>	 awight: yea, i agree. but people will ask for immediate access to that info. if you are ready to just tell them no, fine.
[07:37:31] <apergos>	 a kill switch is fine but it is after the fact; I like the rate limiting being applied right away so damage is limited from the start
[07:37:31] <awight>	 marostegui: We have all the usual technical approaches at our disposal, yes.  We can do rate limiting, blacklisting, spam filtering, permissions configuration and so on.
[07:37:32] <marostegui>	 Sure, if that works, I am fine, but I think it needs some way to throttle, at least at the start until we see how it is used
[07:37:49] <awight>	 (ah harej beat me to it by 5 minutes)
[07:38:02] <RoanKattouw>	 The DB load of adding a thousand new judgments is the same as that of creating a thousand pages with a single category, AIUI, and less than that of creating a thousand pages with two categories each
[07:38:26] <awight>	 duesen: https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/tests/phpunit/AbuseFilter/AbuseFilterTest.php
[07:38:37] <awight>	 :) I verified AbuseFilter and SpamBlacklist integration.
[07:38:42] <RoanKattouw>	 So unless I'm missing something, I don't see how this would be a more worrisome abuse vector than the existing ability to create pages
[07:38:49] <TimStarling>	 there's $wgNamespaceProtection
[07:38:49] <duesen>	 marostegui: why do you conisder this more problematic for jade pages than for other pages? what drives the need for extra caution?
[07:39:16] <awight>	 My big breakthrough was actually the wikitext rendering in our ContentHandler, so it can be scanned like normal articles.
[07:39:23] <RoanKattouw>	 Yeah the kill switch is $wgNamespaceProtection[NS_JUDGMENT] = 'steward'; or something
[07:39:24] <duesen>	 awight: oh nice!
[07:39:31] <marostegui>	 duesen: It is a new feature, and I think we should make sure we have some way to be protected from the start rather than have to develop it in a rush after deploying it.
[07:39:49] <marostegui>	 If it is already there, I am happy! :)
[07:39:53] <TimStarling>	 so no need for special kill switch features, we have enough already
[07:40:30] <RoanKattouw>	 marostegui: For context, $wgNamespaceProtection is an MW config setting that lets you say "only users with this right are allowed to edit in this namespace", so with one line of config we can apply extremely restrictive permission controls to the JADE namespace
[07:40:39] <duesen>	 marostegui: we should ensure that the regular mechanisms properly work for jade pages. if they do, i think they are sufficient. if jade would just bypass abouse filter or throttles or token checks or something, that would of course be bad.
[07:41:00] <awight>	 +1.  I'm happy to set rate limiting, but would push back on any requests to block to certain user groups, etc.
[07:41:01] * duesen can't type, needs more coffee
[07:41:08] <marostegui>	 RoanKattouw: Great then!
[07:41:22] <apergos>	 but blockng bots is fine, yes?
[07:41:33] <awight>	 apergos: yes--is there a way to do that?
[07:41:39] <awight>	 ah
[07:41:53] <apergos>	 well that's the question
[07:42:10] <awight>	 well it depends on the mechanism.  We're going to have almost exclusively API-mediated edits, so hard to say what's a bot then.
[07:42:17] <marostegui>	 duesen: Agreed - just making sure. As I read https://www.mediawiki.org/wiki/Extension:JADE#Bots,_abuse,_and_quality_standards and I wasn't clear to me whether we were just leaving this to have a good netiquette or we do have proper rate limiting in place in case we have to apply it. I don't know MW that well :)
[07:42:33] <duesen>	 i don't think there is - or can $wgNamespaceProtection be used to say "must NOT be in group X"?
[07:42:44] <TimStarling>	 is this API action=edit or a special action?
[07:43:09] <KateChapman>	 (reminder #info will get things into the minutes)
[07:43:10] <RoanKattouw>	 MW isn't really set up to have "is not in group X" permission controls AFAIK, but if JADE has its own API module(s) it could throw errors if $user->isBot() returns true
[07:43:27] <duesen>	 marostegui: there is rate limiting, but I'm not sure it can be set per-namespace. afaik it's a global thing
[07:43:27] <awight>	 TimStarling: some tools might use action=edit, others will use an API provided by the extension
[07:43:55] <duesen>	 RoanKattouw: works if all edits happen via api
[07:44:02] <RoanKattouw>	 (if anons can add judgments, such a restriction would be trivially circumventable though)
[07:44:12] <duesen>	 but i think the current pla is manually edited json in EditPage
[07:44:18] <duesen>	 awight: can you confirm --^
[07:44:27] <TimStarling>	 isBot() is a very high level privilege, seems weird to use that for abuse control
[07:44:30] <awight>	 We're hoping that nobody ever has to edit json :)
[07:44:43] <awight>	 duesen: ^
[07:44:46] <TimStarling>	 isBot() causes edits to be hidden from RC
[07:44:49] <RoanKattouw>	 Yeah we don't really ever do reverse permission controls like that
[07:44:57] <duesen>	 awight: +1
[07:45:43] <awight>	 "bot" is an unlikely user group to get into trouble here, I'd think.  With a registered bot user, we can detect and shut them down pretty quick I'd think.
[07:45:46] <duesen>	 can we just rely on "bots should not make jade judgments, and communities should enforce that by bot policy"?
[07:46:13] <RoanKattouw>	 They don't really make much sense in MW's permissions model, and are often going to end up being circumventable. But depending on what the architecture looks like you could at least add frictions for bots trying to make edits
[07:46:14] <duesen>	 awight: who's responsible for monitoring for that problem that?
[07:46:47] <awight>	 duesen: The argument I'm making now is a bit different than last time.  It's that judgments are only high-quality if a human is making them, basically bot opinions are "non-notable".
[07:46:48] <KateChapman>	 *** 15 minutes remaining ***
[07:47:31] <duesen>	 awight: yea, bot judgments aren't desirable. someone will still try.
[07:47:41] <awight>	 duesen: The scoring platform team will have monitoring for our pilot wikis, with metrics like "# of judgment pages" and "# of judgment revisions created"
[07:47:57] <duesen>	 perhaps even maliciously, to poison the ORES model
[07:48:03] <marostegui>	 ^ that is my worry
[07:48:09] <awight>	 duesen: Agreed they will try, so I'm hoping that we can establish that bot opinions are meaningless and should be reverted in this domain
[07:48:11] <apergos>	 a community may disagree with the ORES team's view of bot edits; and then what?
[07:48:20] <awight>	 apergos: Definitely a possibility!
[07:48:30] <TimStarling>	 disable JADE on that wiki
[07:48:31] <awight>	 Then we have... wikipedia ;-)
[07:48:34] <duesen>	 awight: if it's malicious, the bot withh not have the bot flag.
[07:48:40] <awight>	 +1 ^
[07:48:53] <duesen>	 but anyway, this is a tangent. it's not really different from other spam fighting, and the community is pretty good at that
[07:48:59] <awight>	 I don't have plans to do anything with the bot flag, yeah
[07:49:06] <Nikerabbit>	 I am little bit skeptical of having special bot restrictions for this thing... our tools should be cabable of detecting and handling issues caused by bots for everything
[07:49:10] <TimStarling>	 this was the advantage of putting JADE in pages, we have lots of tools
[07:49:56] <awight>	 All I can say about this is that we're no longer planning to allow bulk imports in a later phase, so I think we have a better story to tell the communities.  For example, we decided not to import a trove of similar data from Huggle, because if it isn't collaboratively created, it's not Jade content.
[07:50:20] <awight>	 TimStarling: 8D
[07:50:38] <marostegui>	 awight: I've got another question, about the clean up script, how will that work? I mean, will it be running constantly or will it run on a daily/weekly/xx basis?
[07:50:40] <duesen>	 the only question left for me is: is the page ID enough info in the proposed tables, or do they need some kind of "data" or "summary" field, for efficiently injecting more than just a link into RC or Watchlist views
[07:51:43] <awight>	 marostegui: Good question--it should never be needed.  At first, we run once a week or something.  If corruption is detected, we try to fix our hooks so that the maintenance becomes increasingly irrelevant :)
[07:51:50] <RoanKattouw>	 #info 23:50:40 <duesen> the only question left for me is: is the page ID enough info in the proposed tables, or do they need some kind of "data" or "summary" field, for efficiently injecting more than just a link into RC or Watchlist views
[07:52:02] <awight>	 ^ thanks!
[07:52:31] <awight>	 So ah while I have people on the horn, I'd like to ask if we should be cloning the Wikidata edit interface.
[07:52:40] <RoanKattouw>	 +1 to what duesen said, I think this abuse control stuff is a bit of a red herring because it's not materially different from abusers (or bots) editing/creating plain wiki pages
[07:52:47] <marostegui>	 awight: So the idea for that script is basically to clean up "bad" judgments?
[07:52:49] <awight>	 I like it, but want to know if the authors would do anything differently if they could do it again?
[07:53:14] <awight>	 marostegui: exactly, it's for the cases where our hooks failed to maintain data integrity.
[07:53:24] <marostegui>	 awight: got it - thanks
[07:53:29] <TimStarling>	 if summary data is needed, it can be added later one way or another, I don't think that blocks the RFC
[07:53:33] <TimStarling>	 I think the RFC should go to last call
[07:54:08] * awight looks warily around the room ;-)
[07:54:16] <awight>	 and wearily
[07:55:40] <RoanKattouw>	 I guess this is going to be a small table and easy to ALTER later, is that the idea?
[07:55:50] <duesen>	 awight: don't clone the wikidata edit interface. wikidata uses a *completely* different data model. the fact that it uses json as a storage format is really not relevant. and the JS code needs a rewrite
[07:55:57] <RoanKattouw>	 We typically strive for some amount of future-proofing in schema designs
[07:56:04] <KateChapman>	 *** 5 minutes left, please add anything to the minutes that you want noted ***
[07:56:21] <marostegui>	 RoanKattouw: Once it is deployed, it will not be that easy to alter (in most cases)
[07:56:24] <awight>	 #info 07:27 < TimStarling> a unique key on (revision,judgement) means you can have multiple rows for each revision[. You] need a unique key just on (revision)
[07:56:34] <duesen>	 +1 for last call on this
[07:56:43] <awight>	 duesen: ah I meant the UI concept, not the code itself
[07:57:12] <duesen>	 awight: for that, ask UX folks :)
[07:57:12] <RoanKattouw>	 marostegui: Yeah that's why I'm a bit wary of Tim's assertion that summary data can be added later
[07:57:31] <RoanKattouw>	 Unless "later" means "after the RFC is approved but before the tables are created in production"
[07:57:34] <marostegui>	 RoanKattouw: If it has a few rows (meaning less than 1000 or so) it can easily be alter directly on the master (still depends) but if not, it has to be done on a per server basis in most cases. Just saying
[07:58:14] <TimStarling>	 yeah well, you know what I think about schema changes on slaves
[07:58:16] <duesen>	 awight: what's your size estimate for the table?
[07:58:25] <TimStarling>	 you know you can switch DC now faster than you used to be able to switch a single master
[07:58:27] <marostegui>	 TimStarling: Yeah, but that is a different topic :)
[07:58:32] <awight>	 RoanKattouw: +1 I'll think about how to add the judgment data into our link table, probably just a byte for each of the 3 schemas so far.
[07:58:32] <TimStarling>	 you shouldn't have an excuse
[07:58:47] <TimStarling>	 what was it last time, 7 minutes?
[07:58:57] <marostegui>	 TimStarling: less than 5
[07:59:17] <awight>	 duesen: The size estimate is that, once the community is warmed up, the table will grow at 1% of revision growth.
[07:59:37] <TimStarling>	 so the secondary data schema can be changed later at a cost of 5 minutes r/o time, most likely as part of a batch of other schema changes
[07:59:37] <KateChapman>	 #info 11:59 PM <awight> duesen: The size estimate is that, once the community is warmed up, the table will grow at 1% of revision growth.
[08:00:01] <KateChapman>	 okay let's wrap things up. Anything else needed for the minutes?
[08:00:03] <duesen>	 awight: right. that's totoally fine in terms of storage space, but still makes schema changes non-trivial work.
[08:00:05] <TimStarling>	 assuming you don't just add a new summary table and join to it
[08:00:06] <marostegui>	 TimStarling: We have 8 masters, but I don't think this is part of this discussion anyways
[08:00:12] <KateChapman>	 awight: you at a good place?
[08:00:20] <RoanKattouw>	 awight: I think it's not that important to be stingy with space there, I'm more concerned about not having to add it to the schema later
[08:00:37] <KateChapman>	 *** we are at time ***
[08:00:46] <awight>	 +1 to the concerns about including the judgment data, it makes sense
[08:00:46] <RoanKattouw>	 We have relationship tables like this that spend more space on rollup/summary data than they do on describing the actual relationship
[08:01:03] <awight>	 ^ good to know!  I like that approach.
[08:01:10] <awight>	 very graph-y
[08:01:22] <RoanKattouw>	 Good point re a new summary table, since this one has auto-increment IDs that 'll be easy to do
[08:01:50] <RoanKattouw>	 #info Per Tim, a new summary table could be added later and JOINed in
[08:01:53] <duesen>	 yea, we have NameTableStore for this now.
[08:01:58] <duesen>	 works like a charm
[08:02:13] <duesen>	 (if the set of possible "names" is small)
[08:02:21] <KateChapman>	 thanks all who joined.
[08:02:32] <KateChapman>	 #endmeeting
[08:02:33] <wm-labs-meetbot>	 Meeting ended Thu Nov 22 08:02:32 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[08:02:33] <wm-labs-meetbot>	 Minutes:        https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-22-07.03.html
[08:02:33] <wm-labs-meetbot>	 Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-22-07.03.txt
[08:02:33] <wm-labs-meetbot>	 Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-22-07.03.wiki
[08:02:34] <wm-labs-meetbot>	 Log:            https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-22-07.03.log.html
[08:02:37] <awight>	 KateChapman: Thanks for moderating!
[08:02:48] <RoanKattouw>	 I don't think this is really a NameTableStore use case, but either way you can join by ID
[08:03:45] * awight pats self on head for following best practice and including an autoincrement ID
[08:03:47] <KateChapman>	 awight: no problem. Since we weren't great with the minutes this meeting if there is anything else you think should be noted could you add to the phab ticket?
[08:04:04] <awight>	 KateChapman: will do!
[08:12:27] <awight>	 duesen: I just noticed something awkward about adding judgment values to the link table--we're anticipating adding more judgment schemas later.  This suggests that each schema (e.g. "damaging", "content quality") should be stored in its own table, right?
[08:15:08] <awight>	 making a note in the task, will look at this later...
[08:15:09] <duesen>	 awight: not necessarily. the summary is basically a denormalization aimed at display in the rc feed. it could just be a string, or a reference to a string in another table.
[08:15:38] <duesen>	 there's no need to represent the data "properly" here, if it's just for display.
[08:15:53] <duesen>	 (i18n is a concern though, if it's a plain string)
[08:15:54] <awight>	 But to query efficiently it would have to be a single field per schema.
[08:16:51] <duesen>	 In my mind, that field is never queried *by* (WHERE), only queried *for* (SELECT)
[08:18:00] <awight>	 I was imagining "by", like filtering for judged-damaging but ORES-nondamaging
[08:18:26] <duesen>	 then we are discussing a completely different use case now.
[08:18:36] <duesen>	 i was just optimizing injection of info into the rc feed
[08:18:42] <awight>	 :) harej ^ just a note for us to get into later
[08:18:46] <duesen>	 querying by judgment is probably best done with elastic
[08:19:06] <duesen>	 if you want to do it in sql, you'll indeed need dedicated schemas
[08:19:09] <awight>	 Currently, "patrolled" and ORES scores are integrated into RC filtering though
[08:19:43] <awight>	 I think we have to go back to our use cases... thanks for pointing out the denormalized alternative too.
[08:19:43] <duesen>	 patrolled are - ORES is not really, as far as I know. At least not properly/efficiently.
[08:19:58] <duesen>	 Patrolled has it's own flag in the rc table for this purpose
[08:20:12] <awight>	 the ORES stuff is pretty efficient, we actually join on ores_score
[08:20:51] <duesen>	 oh ok then. but joining on multiple schemas for multiple kinds of judgements will add a lot of weight there.
[08:21:13] <duesen>	 filtering by judgment in RC was not part of what we just discussed, and not mentioned in the rfc afaik.
[08:21:20] <awight>	 yeah it makes me feel a bit ill :)
[08:21:23] <duesen>	 it will need another round of discussion with dbas
[08:21:48] <duesen>	 dedicated tables, properly. or make it "search" and use elastic
[08:23:49] <awight>	 Right on.  Thanks for your time!
[08:23:55] <awight>	 o/