[13:02:33] #startmeeting Language Team office hour - September 2016 [13:02:34] Meeting started Wed Sep 21 13:02:34 2016 UTC and is due to finish in 60 minutes. The chair is Nikerabbit. Information about MeetBot at http://wiki.debian.org/MeetBot. [13:02:34] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [13:02:34] The meeting name has been set to 'language_team_office_hour___september_2016' [13:02:51] Welcome to the online+IRC office of the WMF Language team [13:03:09] Our main conversation is happening on Google Hangout/youtube: [13:03:13] #link https://www.youtube.com/watch?v=NXgMZ7myEA4 [13:03:44] Please let us know if you would like to join on the hangout [13:04:01] We will also be taking questions here [13:04:21] Reminder that the logs of this channel will be recorded and posted on meta wiki [13:05:33] The recording from the last meeting is at: [13:05:40] #link https://www.youtube.com/watch?v=0FrowkpBEnQ [13:05:50] and logs are at: [13:05:54] #link https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2016-06-15 [13:06:10] and we just finished introductions in the stream [13:11:06] Amir and Runa spoke about compact language links [13:11:25] Now there will be a demo [13:35:13] #link https://blog.wikimedia.org/2016/03/29/wikipedias-essential-vaccines/ [13:37:11] Thanks [13:49:24] International Society of Service Innovation Professionals https://www.youtube.com/watch?v=JRHJawWIPXo [13:50:10] http://www.slideshare.net/issip/pres-124-evren-ay-may-18-2016 Pres 124 evren ay may 18 2016 [13:59:43] #endmeeting [13:59:44] Meeting ended Wed Sep 21 13:59:43 2016 UTC. 
Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [13:59:44] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-13.02.html [13:59:44] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-13.02.txt [13:59:44] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-13.02.wiki [13:59:44] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-13.02.log.html [14:00:06] thanks everyone who joined here on irc :) [14:00:35] Thanks Nikerabbit [14:03:57] arrbee: bit confusing with two bots [20:59:25] howdy all [21:00:28] * robla fumbles to start up E273 [21:01:41] #startmeeting ArchCom Meeting about Multi-Content Revisions (T107595) [21:01:41] T107595: [RFC] Multi-Content Revisions - https://phabricator.wikimedia.org/T107595 [21:01:41] Meeting started Wed Sep 21 21:01:41 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. [21:01:41] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [21:01:41] The meeting name has been set to 'archcom_meeting_about_multi_content_revisions__t107595_' 
[21:01:42] hm, I'm still wondering whether we should go for the details questions first to get stuff done, or the broader questions first, for guidance... [21:02:11] #topic Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: https://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:02:53] hi everyone [21:03:08] robla: do you think it would be ok to talk about schema details for half an hour, and then cut off and move on to discussing the migration? [21:03:55] DanielK_WMDE: possibly. what are you hoping we accomplish in today's conversation? [21:04:19] 1) sort out the remaining details of what the schema should look like [21:04:34] 2) get feedback about whether the migration plan is sane [21:05:21] (Hello:) [21:06:02] DanielK_WMDE: I'm assuming we're not ready to actually resolve the schema in the course of this hour though, correct? [21:06:39] not as a final decision. i do hope to get opinions on my questions. [21:06:42] that plan sounds good to me [21:06:52] and perhaps even answers :) [21:07:06] so, the most important question regarding the schema is whether we should add one layer of indirection, or two. Adding only one layer of indirection means repeating the meta-data about the content of each slot for every revision. [21:07:44] Can you please post an example URL - re "The idea of this RFC is to allow multiple Content objects to be associated with a single revision (one per "slot"), resulting in multiple content "streams" for each page"? In what ways are Wikidata Q items involved here? [21:07:47] Doing it that way keeps the schema simpler, but means a lot of redundant data. 
The basic schema is then: [21:08:02] Scott_WUaS: they are not involved [21:08:22] Thanks [21:08:24] The "basic" version of the schema looks like this: [21:08:26] [page] --page_current--> [revision] <--cont_revision-- [content] --cont_address--> (text|external) [21:08:38] ok [21:09:08] As an alternative, we can add another table, the "slot" table, to tell us which content belongs to which revision, so the content-meta-data can be re-used for multiple (typically consecutive) revisions [21:09:44] so if we store categories in a separate slot, and the categories are not touched by 10 edits, we would recycle the meta-data about the content of the category slot 10 times. [21:09:51] the schema would look like this: [21:09:57] [page] --page_current--> [revision] <--slot_revision-- [slots] --slot_content--> [content] --cont_address--> (text|external) [21:10:10] I guess we have no jynus this week [21:10:11] (DanielK_WMDE: Is there an existing example URL which you may develop further?) [21:11:01] schema details: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Database_Schema [21:11:03] #link https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Database_Schema [21:11:15] #link https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Re-using_Content_Rows [21:11:31] thanks [21:11:48] TimStarling: looks like it... who else would have an opinion on the schema? [21:12:01] DanielK_WMDE: is there an asynchronous conversation that is still moving forward? [21:12:20] no. not with me anyway [21:12:52] I can try to be surrogate jynus and raise a few of his points [21:13:19] great :) [21:13:20] TimStarling: that would be helpful. [21:13:24] my fear is that most of the asynchronous conversation has been in private email. that makes it hard to then hope for a good public IRC conversation [21:13:29] TimStarling: thanks! [21:13:35] surrogate jynus says: you want to store media info in a slot. 
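The two schema variants above can be sketched in miniature. This is an illustrative in-memory model, not MediaWiki code; the names (`cont_revision`, `slot_revision`, `slot_content`, `cont_address`) follow the proposal page and the row data is invented:

```python
# Variant 1: content rows point directly at revisions ("one layer of
# indirection"), so metadata for an unchanged slot repeats per revision.
content_v1 = [
    {"cont_revision": 1, "slot": "main", "cont_address": "tt:100"},
    {"cont_revision": 1, "slot": "categories", "cont_address": "tt:101"},
    {"cont_revision": 2, "slot": "main", "cont_address": "tt:102"},
    {"cont_revision": 2, "slot": "categories", "cont_address": "tt:101"},  # repeated metadata row
]

# Variant 2: a slots table binds revisions to content rows ("two layers"),
# so a slot untouched by an edit reuses the same content row.
content_v2 = {
    10: {"cont_address": "tt:100"},
    11: {"cont_address": "tt:101"},
    12: {"cont_address": "tt:102"},
}
slots = [
    {"slot_revision": 1, "slot": "main", "slot_content": 10},
    {"slot_revision": 1, "slot": "categories", "slot_content": 11},
    {"slot_revision": 2, "slot": "main", "slot_content": 12},
    {"slot_revision": 2, "slot": "categories", "slot_content": 11},  # reused content row
]

def content_for(revision_id):
    """Resolve slot name -> content address for one revision (variant 2)."""
    return {
        row["slot"]: content_v2[row["slot_content"]]["cont_address"]
        for row in slots
        if row["slot_revision"] == revision_id
    }
```

Either way the binding table grows by one row per slot per revision ("super tall"); variant 2 keeps the content table shorter because consecutive revisions can share content rows.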
Let's have a media_info table [21:13:47] yeah need to distill it down, the email convos were pretty high-bandwidth :) [21:13:48] then that table will be small and easy to handle [21:14:04] DanielK_WMDE: I wonder if it's good to hold current and old content in the same place... [21:14:14] TimStarling: what would the media_info table contain? the actual json blob? [21:14:17] in history, present a union between revision and media_info if users really really want that [21:14:34] unclear [21:14:38] SMalyshev: that's actually a good point leading -> to ideas about partitioning 'hot' and 'cold' data. for another time probably but we need to be thinking about it at some point [21:14:54] if we're already refactoring DB structure... [21:14:58] SMalyshev: so far, the answer looks like yes: moving data between tables when the current version becomes an archived version is a major pain. [21:15:10] (nitpick: if the slot table is only used as a many-to-many binding between revision and content, can we just call it revision_content? it's hard to keep up with the terminology) [21:15:33] SMalyshev: we (tim, mostly) moved main storage away from that 10 years ago, we are now planning to move image meta data away from it too. but it's a possible parameter for partitioning. [21:15:43] tgr: I think the idea is that some of the slots are revision_content_derivedcontent though. [21:15:52] from my perspective, what I'm really lacking about this MCR thing is any context on its higher-level purpose and utility. All of the details are deep, but no simple big picture about why we're doing this. [21:16:07] tgr: E.g. revision 3 -> wikitext -> JSON representation of the template or whatever. [21:16:12] tgr: it was called that, I changed it to be in line with the use of "slots" in the conceptual model. 
i don't care about the name [21:16:27] bblack: at a high level, we want to be able to break things out of wikitext into structured data that's still atomically versioned with the wikitext [21:16:35] DanielK_WMDE: what's the idea behind reusable content? I.e. is that useful for something? [21:16:46] brion: higher-level than that :) [21:16:56] bblack: there's a list of use cases [21:17:00] :) [21:17:02] I mean, wikitext does have some kind of structure. a single content can have internal structure in general [21:17:07] https://www.mediawiki.org/wiki/Multi-Content_Revisions#Use_Cases [21:17:21] bblack: "We want to move awat from MW's 1:1 relationship between "page" and "content"." [21:17:24] TimStarling: that "unclear" bit is the problem i have with discussing the "store in dedicated table" option. how will the content be versioned? [21:17:24] Err. Away. [21:17:47] bblack: https://www.mediawiki.org/wiki/Multi-Content_Revisions#Use_Cases [21:17:48] DanielK_WMDE: it would be linked to page and have its own timestamp [21:17:59] like a clone of revision [21:18:08] TimStarling: and its own edit comment, reference to user, and so on? [21:18:09] TimStarling: So we'd JOIN on string-matched timestamps? [21:18:13] yes [21:18:15] Eww. [21:18:22] no [21:18:30] yes to DanielK_WMDE, no to James_F [21:18:34] Ah. [21:18:36] a related alternative would be to have each 'slot' live in a separate table, but all use the same revision key with metadata in revision. thus text edits would (or could) live in a separate table from revision too [21:18:37] TimStarling: so we would duplicate the revision table for each kind of content, and use unions everywhere we want to list revisions? [21:18:39] So it would have the revision_id in it? 
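The per-slot-table alternative Tim describes (analogous to oldimage living apart from revision today) can be sketched as an application-layer UNION ordered by timestamp. The rows and the hypothetical media_info table are invented for illustration, using MediaWiki-style 14-digit timestamps:

```python
# Each content kind gets its own revision-like table; a combined page
# history is produced by merging them on timestamp.
revision = [
    {"rev_id": 1, "rev_timestamp": "20160921210000", "kind": "wikitext"},
    {"rev_id": 3, "rev_timestamp": "20160921212000", "kind": "wikitext"},
]
media_info = [
    {"mi_id": 2, "mi_timestamp": "20160921211000", "kind": "mediainfo"},
]

def unified_history():
    """Merge the two 'tables' into one timestamp-ordered history (the UNION)."""
    rows = [
        {"timestamp": r["rev_timestamp"], "kind": r["kind"]} for r in revision
    ] + [
        {"timestamp": m["mi_timestamp"], "kind": m["kind"]} for m in media_info
    ]
    return sorted(rows, key=lambda row: row["timestamp"])
```

This is where Daniel's objection bites: without a shared revision_id, correlating edits across kinds means joining on timestamps, or else duplicating comment/user metadata in every per-kind table.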
[21:18:46] but you'd have a consistent revision_id and place to search on [21:19:15] but there's some benefit in consistency and normalization, especially when we need to bulk-fetch data for dumps or otherwise handle them opaquely [21:19:22] at the SQL level you'd have several totally distinct revision concepts, like how oldimage and revision are separate now [21:19:34] TimStarling: i can't see that working, it sounds hideously complex to me. but maybe i'm just not seeing the elegance of it all. [21:19:35] #chair robla brion DanielK_WMDE TimStarling [21:19:36] Current chairs: DanielK_WMDE TimStarling brion robla [21:19:37] at the application layer these may optionally be merged by a UNION [21:19:42] (what are the implications for multiple languages and translation here in Multi-Content Revisions, if any?) [21:20:01] brion: so, have one revision table, but basically one "content" table per slot? [21:20:02] * robla steps afk for 2 minutes [21:20:18] Scott_WUaS: interesting question. one _could_ store multiple wikitext Content items as well, one per language [21:20:19] brion: that's more doable, but still needs big joins or unions. [21:20:23] Scott_WUaS: "Complicated". There are options to fundamentally re-work Translate and parallel translation based on MCR, but this is a bit out of scope. [21:20:29] though i'm not sure it's ideal for the way translations get versioned [21:20:33] brion: *cough*DOM-based translation*cough* [21:20:36] FWIW, I think most of those use-cases sound like metadata more than parallel alternative content, except for the ones that seem like they could just be separate objects (e.g. 
template+css), or embedded documentation [21:20:48] thanks [21:20:59] bblack: the big reason i want MCR for 'separate objects' is atomic versioning [21:21:11] having a high-level abstraction in MW around several similar tables is an idea that was mentioned in that book jynus was passing around [21:21:13] template + css, gadget js+css, etc [21:21:20] you know, feature table and bug table [21:21:26] bblack: File description (wikitext), meta-data (JSON), and file (pointer to the BLOB) versioned together is the ambition. [21:21:30] TimStarling, brion: can we assume that the revision or content tables that would exist per slot would all contain *exactly* the same fields? [21:21:42] no [21:22:05] i think if we had separate tables they'd explicitly want to be different, otherwise it's only a partitioning mechanism [21:22:15] but that changes the interfaces [21:22:16] brion: that's what i'm thinking [21:22:19] if they're exactly the same then you have sharding, and jynus doesn't really seem keen on sharding [21:22:22] i just don't see how they would be different [21:22:30] I'll switch back from being pseudo-jynus to TimStarling for a second [21:22:30] and for data where the structured data would go straight into a table that makes sense [21:22:38] let's do sharding, I like sharding [21:22:38] for where everything's a big blob, i don't see the benefit of splitting [21:22:38] :) [21:22:52] what's your preferred axis to shard on here tim? [21:22:58] TimStarling: Do we have a plan for stopping the current tables from getting "too long" other than sharding? (Ignoring this change, which might make the rate of growth faster.) [21:23:18] TimStarling: yes, +1 for sharding/partitioning. let's have an RFC about that [21:23:33] yups [21:24:23] well, the existing recentchanges partitioning hack splits on user ID [21:24:26] brion: to what level do you expect it to be atomic? you'd still be fetching js+css as 2x http fetches, right? 
it seems like there are ways to solve the problem of always fetching synced revs of such things simpler... [21:24:31] (i like the idea of a 'hot'/'cold' separation with a union-like interface, with a consistent revision id lineage so most things won't notice the difference other than potentially issuing two queries and combining them) [21:24:35] #info discussion of sharding for much of the first part of the meeting [21:24:40] which optimises for contributions queries [21:24:43] brion: re "everything is a big blob": if we want to move away from that, we need a document oriented db. the content models we have would be a pain to model on an rdbms. not to mention that they would create absolutely humongous tables. [21:24:45] I've been lazily assuming that at some point we'd shard revision based on something (modulo the page_id?) but I don't know what's ideal. [21:24:52] bblack: http? oh no i mean inside, like the parser [21:25:02] or the html that specified which js/css to load [21:25:36] anyway i think we should address sharding/partitioning later, more explicitly [21:25:37] i would prefer to shard by mod(page_id). or timestamp blocks. [21:25:57] Yeah, let's fork that to another RfC. [21:26:12] one possibility is to duplicate the revision table: once with user-based sharding (for contributions), and again with page/timestamp sharding (for history) [21:26:21] so, if that's for another rfc, can we move forward with this one? [21:26:22] denormalize the revision table, in other words [21:26:44] bblack: so the alternative to atomic updates of multiple content blobs in one revision is to build another versioning abstraction on top of multiple pages [21:26:59] bblack: which is certainly possible too [21:27:08] TimStarling: basically, duplicate it. yea. [21:27:23] so, key question: is it ok to maintain the meta-data for all slot content in a single table? [21:27:35] with sharding to be discussed? 
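The sharding-by-mod(page_id) idea floated above can be sketched as follows; the shard count of 4 and the table contents are invented for illustration:

```python
N_SHARDS = 4  # arbitrary shard count for the sketch

def shard_for(page_id: int) -> int:
    """Pick a shard by mod(page_id): all of a page's revisions co-locate."""
    return page_id % N_SHARDS

revisions = [
    {"rev_id": 1, "rev_page": 7},
    {"rev_id": 2, "rev_page": 7},
    {"rev_id": 3, "rev_page": 12},
]

# Distribute the rows across shards.
shards = {i: [] for i in range(N_SHARDS)}
for rev in revisions:
    shards[shard_for(rev["rev_page"])].append(rev)

def page_history(page_id: int):
    """A page-history query touches exactly one shard."""
    return [r for r in shards[shard_for(page_id)] if r["rev_page"] == page_id]
```

The flip side, as noted above, is that a by-user contributions query would have to fan out to every shard, which is why duplicating the table with user-based sharding is also raised.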
[21:27:45] I think the key question is project order: does sharding/partitioning block MCR? [21:27:45] brion: or question why we're trying to version-sync css+js inside wiki articles in the first place... [21:27:51] DanielK_WMDE: i say yes, as long as we keep it compact and have a future plan to shard that won't explode based on our changes :D [21:28:11] bblack: well "scratch mediawiki, just use github" is a third option ;) [21:28:30] TimStarling: that's also an important question, yes, though i think we can decide on the schema without knowing whether implementation is blocked on sharding [21:28:34] I suspect jynus is on the verge of vetoing MCR until we have better scalability [21:28:43] it seems to be ok to have _lots of rows_ (tall tables) as long as those table rows are small (narrow) [21:29:09] data size is a relevant metric, yes [21:29:16] TimStarling: i'm fine with him vetoing implementation on these grounds. but i need to know whether and how i should change the design. [21:29:36] implementation on the cluster = deployment [21:29:38] for example, you have to copy all the data in a table during ALTER TABLE, and that is becoming a problem [21:29:54] remember it was a problem in the olden days too [21:30:09] brion: or any of the thousands of saner ways to develop->deploy css and js than "do it inside the wiki it's meant to operate on, shoe-horning it in as if it's like article content, and then remodel the wiki software to support that use case poorly" [21:31:01] bblack: if you want it to be user-maintained, i don't really see an alternative. but the css/js use case isn't really at the focus of this. [21:31:06] (not entirely fair, but as fair as your github retort) [21:31:22] bblack: oh sure, you're not wrong. :) there's tradeoffs in all these directions [21:31:41] and honestly using a git-oriented backend for code? 
not an awful idea at all [21:31:53] i'm still trying to find out whether i can go ahead with implementing the revision<-slot->content schema [21:32:04] brion: It's on the backlog. Let's not get further distracted from the RfC. ;-) [21:32:07] but even if we broke out gadgets/userscripts we've got these on-wiki data objects :D [21:32:09] yep [21:32:12] or whether all work on this needs to rest until we have an rfc on optimizing revision storage & sharding [21:32:24] or whether there is a concrete request to change the db schema i propose [21:32:25] I don't see how you can implement it if you can't deploy it [21:32:35] I get an impression that jynus has to answer that :) [21:33:17] jynus is always reluctant to use the veto power we keep wanting to give him :) [21:33:18] be gentle [21:33:27] TimStarling: we can get the code ready for deployment while we are also working on, or deciding on, optimization strategies for revision storage. [21:33:37] I don't think we're going to get on board with jynus's idea of splitting the revision concept [21:33:54] but I think we should work by consensus [21:34:03] *nod* [21:34:24] is jynus's idea spelled out somewhere? [21:34:36] so if we want consensus but won't get on board with his idea, then we need to convince him?... [21:34:45] we've got some bits of discussions, no concrete alt proposal [21:34:59] robla: no, not really, he was reluctant to dive in and do fully worked schema [21:35:12] DanielK_WMDE: right [21:35:35] i have tried and failed [21:36:38] DanielK_WMDE: I think one thing that may be slowing this conversation down is it getting too bogged down in details [21:36:59] there's a *lot* to sort through here: https://www.mediawiki.org/wiki/Multi-Content_Revisions [21:37:04] I don't want to get into detail about tactics in this discussion [21:37:40] how would it work to implement it but not deploy it? would you be able to have a feature flag in MW? or would it have to be a branch? 
[21:37:52] robla: yes, that's why I announced only the schema bit as today's topic: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data [21:37:59] Branch or unmerged commit. [21:38:21] robla: that's already quite a bit, but I think it is manageable. [21:38:32] where are we meant to have the bigger discussion? I just don't get architecting the details before having some consensus that this is the right model for some real use-cases. The use-cases section mentions its own speculative nature, many of them are more metadata than parallel separate content, which is an entirely simpler case to handle. the rest are questionable, IMHO... [21:38:58] maybe that's for my lack of information, but still [21:39:01] TimStarling: We will need feature flags for the migration/transition anyway. So, yes. [21:39:22] it would be nice to have say two initial use cases which will be initially implemented [21:39:31] TimStarling: hopefully, if/then/else cruft can be kept to a minimum by swapping in alternative implementations of the relevant components. [21:40:13] * brion hmms [21:40:18] TimStarling: the first two in the list: MediaInfo and PageAssessments. [21:40:29] That could work. [21:40:39] MassMessage is also a hot candidate I think [21:40:45] And TemplateData. ;-) [21:40:55] (As it's so simple.) [21:41:00] ok i think i'm going to try fleshing out an alt proposal along some, but not all, of jynus and surrogate-jynus's lines, and we can just compare that [21:41:06] presumably we will have an MCR-aware API, and all the if/else will be in the implementation of that API [21:41:08] it'll be good to have some key use cases to go along with that [21:41:29] RevisionLookup [21:41:33] bblack: if it's editable and versioned, it's not meta-data [21:41:50] cause if we do concentrate on cases where the secondary slots are special kinds of data, maybe extra tables aren't too awful. 
but maybe they are ;) [21:42:02] TimStarling: yes, exactly [21:42:34] maybe we should start moving towards rev_id being opaque rather than an auto-increment integer [21:42:55] but still an integer? [21:42:56] brion: i'm not thinking of secondary (derived) slots any more. just primary user editable content. [21:43:06] a UUID might make more sense if it is sharded [21:43:06] right, sorry wrong term :) [21:43:13] i mean non-main-wikitext slots [21:43:25] but yes, still an integer initially [21:43:30] but maybe type-hinted as a string [21:43:34] TimStarling: or a time-uuid. gabriel loves those. [21:43:38] TimStarling: for multi-master insert that can be important yes [21:43:48] But they are big. We are trying to make that table smaller, right? [21:43:58] bigints are smaller :) [21:44:00] (but we are discussing the revision table again) [21:44:02] i will just warn about Bigints and the JavaScript/node 53-bit limit though [21:44:15] brion: extra tables would need some PHP-layer abstraction on top of our current DB abstraction, for all code that needs to search or iterate all content. That seems scary. [21:44:29] tgr: very. [21:44:30] reminds me of https://gerrit.wikimedia.org/r/#/c/16696/20/includes/rdbstore/RDBStore.php [21:44:45] * AaronSchulz almost forgot about that, haha [21:44:48] tgr: yeah, at least some would need to add to the tables joined on things. others would not actually need to touch those tables, though, and would only care about what's in revision i think [21:44:50] brion: you are worried that we will exceed 2^53 rows in a table? ;) [21:45:07] (of course half of that was wild experimentation that would never be used) [21:45:11] depends how fine-grained we make editing ;) [21:45:16] AaronSchulz: PTSD flashbacks to that code? ;-) [21:45:16] AaronSchulz: that's basically home grown partitioning, right? 
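brion's warning about 53 bits can be demonstrated directly: JavaScript numbers are IEEE 754 doubles, so integers above 2**53 lose precision. That is simulated here by round-tripping through a Python float, which uses the same 64-bit double representation:

```python
# Number.MAX_SAFE_INTEGER in JS is 2**53 - 1; beyond that, adjacent
# integers collapse to the same double value.
LIMIT = 2**53

assert int(float(LIMIT - 1)) == LIMIT - 1  # still exactly representable
assert int(float(LIMIT + 1)) != LIMIT + 1  # precision already lost

# So a 64-bit rev_id (e.g. a "mini uuid") would need to travel as a
# string in any JSON consumed by JS clients.
rev_id = 2**63 - 37  # hypothetical 64-bit id, invented for the sketch
assert int(str(rev_id)) == rev_id  # string round-trip is lossless
```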
[21:45:26] non-sequential revids may be problematic as it'd be impossible to know the order [21:45:37] mainly i was thinking if we do something clever like a 64-bit mini uuid [21:46:02] i'm getting worried that I'm stranded with this with no way to actively move forward. [21:46:11] yeah :( [21:47:00] can i at least get some feedback on "super tall content table" vs "not-so-tall content table + super tall slots table"? [21:47:17] i am strongly in favor of super tall slots table [21:47:22] I like the second one better [21:47:23] as in https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Re-using_Content_Rows [21:47:24] lets us keep the content table much smaller [21:47:26] bblack: maybe you can discuss your concerns on https://www.mediawiki.org/wiki/Talk:Multi-Content_Revisions ? [21:47:43] if we're going to have huge table, it's better to have it as "narrow" as possible [21:47:46] basically this proposal is blocked on deciding how to handle very tall tables, which is something that needs to be decided soon anyway, right? [21:48:00] so maybe just give up for now and make that decision happen as soon as possible? [21:48:01] ok. how about I work on some strawman code that allows us to look at the schema with some data in it, maybe on labs? [21:48:13] would that help, or would it be a waste of time? [21:48:14] is there a wiki page / talk page / phab task that discusses ops concerns with the MCR proposal? [21:48:30] subbu: I'm not aware of any [21:48:43] tgr: i'm not sure there is a generic answer to that question. it may very much depend on the table. 
[21:49:41] subbu: there is one comment by jynus: https://www.mediawiki.org/wiki/Topic:Tb6fok3z43ar16fe [21:49:58] i frankly can't extract much guidance from it [21:50:56] "I will create an alternative one" -- maybe we just need to nag jynus to write that [21:51:03] thanks for the refresher about the link, DanielK_WMDE [21:51:16] #link https://www.mediawiki.org/wiki/Topic:Tb6fok3z43ar16fe [21:51:27] TimStarling: please do, i'm quite curious [21:52:00] well, on the tactics front, I'm hoping that ArchCom doesn't become NagCom ;-) [21:52:08] heh [21:52:20] or ArgCom... [21:52:47] I think it may be a useful conversation starter to *attempt* to come up with what jynus is shooting for [21:53:00] DanielK_WMDE: I'm up for discussing partitioning, since I still remember thinking about that a lot in the past. My inclination is tall-and-narrow metadata => sharded blobs though [21:53:20] robla: i honestly can't imagine how it would work. if i could, i would have proposed it. [21:53:49] AaronSchulz: yes, i'm with you there. And I also think we should discuss sharding. [21:53:53] I mentioned some ideas about making revision narrower, jynus was receptive to those [21:54:02] AaronSchulz: who's going to drive that conversation? [21:54:41] like splitting out rev_comment, you know we have a bug to make rev_comment be larger than 255 bytes [21:54:55] * AaronSchulz shrugs...probably would be good to know about what parameters jynus wants [21:55:00] yeah rev_comment and rev_user_text are easy wins [21:55:03] (Hoping all can keep this helpful conversation going - https://www.mediawiki.org/wiki/Topic:Tb6fok3z43ar16fe ) [21:55:43] AaronSchulz: yes, that would be good [21:56:04] TimStarling, brion: so, who's going to write an rfc about optimizing row size in revision? [21:56:18] TimStarling: Would we make rev_comment just another slot? 
i can do that if TimStarling isn't excited about it, we have some good ideas from last week's offline discussion [21:57:13] * brion compacts ALL the rows! [21:57:19] brion: yay :) [21:57:27] ok brion, compact away, I will comment on it [21:57:30] i'm happy to help and give input, but i don't see me driving this [21:57:33] too much on my plate [21:57:51] no worries [21:57:55] the problem with pausing MCR is: i have made room for this in my schedule *now* [21:57:55] #info 14:55:00  yeah rev_comment and rev_user_text are easy wins [21:58:06] if we drop this for 3 months, I have *no* idea when i can get back on working on it [21:58:11] great i'll write those up next couple days [21:58:15] it also pushes back the schedule for structured commons [21:58:31] do we need MCR for structured commons? [21:58:39] (Thanks, All!) [21:58:51] I mean need like "no way we can do structured commons without it"? [21:58:56] It would be super if we could not delay that again... [21:59:06] #info re "super tall content table" vs "not-so-tall content table + super tall slots table": i am strongly in favor of super tall slots table I like the second one better [21:59:43] #info brion to write up additional RfC on compacting rows in revision table (should apply with or without MCR) [22:00:06] ok...should we end the official part of this meeting on that? [22:00:09] brion: will partitioning be part of that? [22:00:23] * robla plans to hit #endmeeting in 120 seconds [22:00:36] SMalyshev: pretty much, yes. [22:00:41] DanielK_WMDE: not explicitly but i'll mention some related concerns [22:01:06] can expand to that if we decide we must super-prioritize it [22:01:12] SMalyshev: at least if we want to stick to the product requirements as set out by the WMF back in the day. [22:01:17] brion, thanks for taking that on! 
:D [22:01:36] DanielK_WMDE: well, you say you can implement it with a feature switch, which should be relatively uncontroversial [22:01:36] so, reading that talk page topic, iiuc, jynus is objecting to using a single unified table for all slots and prefers different tables for different slots? [22:02:09] we can continue the conversation in #wikimedia-tech for those that want to [22:02:27] thanks all! [22:02:32] #endmeeting [22:02:33] Meeting ended Wed Sep 21 22:02:32 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [22:02:33] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-21.01.html [22:02:33] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-21.01.txt [22:02:33] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-21.01.wiki [22:02:34] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-09-21-21.01.log.html [22:02:39] DanielK_WMDE: can we extract MVP for commons and see if jynus would be fine with that? or current proposal pretty much sizeof(MVP)? [22:02:42] subbu: sounds like it. but i can't see how that would actually work. and if we want to partition stuff, there are other parameters that seem much saner. 
[22:03:18] * subbu moves to #wikimedia-tech [22:03:21] SMalyshev: i don't see how. the proposal would still be pretty much the same. [22:03:30] DanielK_WMDE: I see [22:03:47] SMalyshev: we can use an "associated namespaces" like approach. [22:03:56] SMalyshev: that would work at least half-way [22:04:39] DanielK_WMDE, i suppose you mean because introducing a new slot => new table and associated code changes to incorporate it? and huge joins? i can go back and read the backlog above if you guys discussed it already. i came in late. [22:05:07] moving to wikimedia-tech? [22:05:11] sure. [22:05:35] subbu: yes, that and other issues, like tons of redundant php code. [22:06:38] ok i gotta run, will post some docs tomorrow