[20:58:39] archcomm here in a couple, right?
[21:01:39] apergos: 1 hour
[21:01:47] gah
[21:01:52] unless i got my timezones wrong ;)
[21:02:15] ok what time utc?
[21:02:26] let's start with that
[21:02:44] it is currently 21:02 UTC
[21:02:48] 22:00
[21:04:31] sorry, I missed the beginning of that. is the archcom meeting not until 22:00?
[21:04:59] YairRand: correct
[21:05:04] i hope i didn't say 21:00 somewheres
[21:05:09] i might have if so sorry :D
[21:05:47] brion: you did, on wikitech-l yesterday. might make sense to send a correction :)
[21:05:51] poop
[21:05:54] ok moment
[21:06:40] sent :D
[21:06:45] :)
[21:06:59] yeah i see i calculated for 2pm PDT instead of PST
[21:07:03] timezones are harrrrrd
[21:07:16] indeed
[21:08:19] anyway we'll be in a hangout for the next 50 minutes so feel free to look over the page & talk page if you're bored for now :)
[21:08:27] ah so that's why I was here an hour early
[21:08:29] mmeeehhh
[21:08:46] sorry apergos :)
[21:08:53] https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2 && https://www.mediawiki.org/wiki/User_talk:Brion_VIBBER/Compacting_the_revision_table_round_2
[21:08:58] well what's done is time for me to shovel some food in
[21:09:15] hehe
[21:09:16] are those not... the same link?
[21:09:20] no
[21:09:22] user vs user_talk
[21:09:23] I see
[21:10:21] agh i can't change the /topic
[21:10:25] ah well
[21:15:33] has anybody seen daniel online today?
[21:15:50] (kinzler)
[21:17:50] * bawolff not going to be here for the meeting, but I was wondering if we could more clearly document the problems we are running into in the current schema, and what precisely we are optimizing for and why
[21:18:06] bawolff: *nod*
[21:18:27] I've heard vague talk about the revision table being too big, so not enough of it fits into the innodb buffer pool
[21:18:49] yeah, plus there are some machines with custom indexes on it which is a problem for ops when machines need to be maintained
[21:18:56] i need more details from jynus on that part
[21:19:16] And something about too many queries end up reading large numbers of rows (Which kind of confuses me, because I'm aware of very few queries on the revision table that read large amounts of rows, except maybe special:Contributions on edge cases)
[21:20:00] But I feel like everything I know about what the actual issues are, I've kind of gotten second hand if that makes sense
[21:20:33] yeah :)
[21:20:47] should make sure we more clearly spell out the issues
[21:21:19] we're also kind of combining three updates into one: compaction, prep for multi-content revs, and some general cleanup
[21:21:26] should spell them all out separately
[21:21:39] is jy nus going to be here for this one?
[21:21:42] * apergos hopes
[21:21:49] i hope but i don't have a confirmation :D
[21:21:54] ok
[21:22:41] Which totally reminds me I have to add unit tests to https://gerrit.wikimedia.org/r/#/c/328623/
[21:37:50] 23 minutes to schema meeting :D
[21:42:52] ah good we found DanielK_WMDE :D
[21:44:32] tick tick tick
[21:45:35] :D
[21:45:41] 15 minutes to SCHEMA
[21:51:04] 10 minutes... time enough to take a bathroom break, make sure I have snack food and water and get settled in
[21:58:52] wooooo
[21:59:21] * brion coffees up quickly
[22:00:50] schema plan discussion about to start :)
[22:00:56] I'm here, but I am performing a maintenance
[22:01:04] ding
[22:01:14] brion do we have a link for information on this discussion?
[22:01:21] welcome jynus ! i'll have a few qs for ya later, no rush :)
[22:01:49] Zppix: https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2 and https://www.mediawiki.org/wiki/User_talk:Brion_VIBBER/Compacting_the_revision_table_round_2 (talk page)
[22:01:50] #startmeeting RFC meeting
[22:01:50] Meeting started Wed Feb 15 22:01:50 2017 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:01:50] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:01:50] The meeting name has been set to 'rfc_meeting'
[22:02:30] !link current notes for our schema change planning at https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2 and the associated talk page
[22:02:46] or was that #link
[22:02:47] #topic RFC: Compacting the revision table round 2
[22:02:54] was meant to be #link
[22:02:59] #link current notes for our schema change planning at https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2 and the associated talk page
[22:03:17] Ok folks!
[22:04:01] I think we need to improve the intro on that page a bit more to clarify exactly which changes we're making and for what reasons, so feel free to ask for details to make sure we flesh it out :)
[22:04:26] But roughly: this'll be the biggest MediaWiki schema change since 1.5 created page/revision/text :)
[22:04:39] while still being relatively minor in terms of functionality
[22:04:57] ...with the exception of a very cool preparation for storing multiple chunks of data per revision ('multi-content revisions')
[22:05:12] the other changes are for cleanup, performance, and future-proofing reasons...
[22:05:31] ...and we're kind of planning to do them in one big change set.
[22:05:40] brion I may have overlooked this if it does in fact mention this on the page itself but will this affect the UI or is it all database and code related changes?
[22:05:42] Any initial questions?
[22:05:55] A question for clarification: this is in part to stave off the day when we'll have to shard the revision table for certain ginormous wikis?
[22:05:56] Zppix: initially, all database internals.
[22:06:01] apergos: yes :)
[22:06:10] brion ack thanks
[22:06:16] Zppix: but in the future, there will be user-visible features that use the multi-content revisions system
[22:06:24] for instance, to store structured metadata for uploaded files
[22:06:33] alright, sounds good
[22:06:43] potentially for storing things like categories in a more structured way
[22:06:51] comment_text remains varbinary(767)? it says "TODO: change to a blob", seems like that would be pretty easy to change in this document
[22:06:59] looking at it, one question: not sure what rev_sha1 was used for, but if it was used to quick-compare and identify revisions, maybe still worth having it?
[22:07:07] TimStarling: yeah go ahead and tweak it if you want while you've got it open :)
[22:07:24] SMalyshev: *nod* i'm a bit unsure on the placement of the _sha1 field
[22:07:36] well, we can have both :)
[22:07:37] it can either stay on revision, or move to the individual content blobs, or exist for both
[22:08:05] if it stays on revision it eats a few bytes per row, but it's not super huge.
[22:08:08] however if there is anything with an index on it, i need to know about it :D
[22:08:12] making one sha out of all content sha's would be trivial, and in most cases we'll have one content so far anyway
[22:08:18] TimStarling: perhaps that blob should be json, so we can easily add more info there. for wikidata, we are encoding translatable summaries in the comment, and it sucks horribly
[22:08:27] would be nice to have a better place for that
[22:08:41] or have a revision_props table?...
[22:08:48] brion, SMalyshev: it is used for that even now, by folks analyzing eg reversions
[22:08:57] DanielK_WMDE: gabriel also mentioned the possibility of using a content blob for comments, but i'm not sure that's the best association
[22:09:04] SMalyshev: that's the plan for mcr
[22:09:09] if you want it to have json in it then it should probably be mediumblob not blob
[22:09:14] apergos: yep, i know it's used for comparisons, but don't know if anybody does global indexed lookups on it
[22:09:20] apergos: then I'd say worth keeping it as a whole in revision and also in contents
[22:09:33] brion: no, i wouldn't want to load hundreds of blobs from external store to show an rc page
[22:09:37] #info use mediumblob not blob for comment to make sure is future-proof space
[22:09:43] I have no preference about where it lives, as long as it lives on someplace.
[22:09:43] DanielK_WMDE: *nod*
[22:10:06] with rev_sha = sha(all of content_sha)
[22:10:20] #info consider keeping sha1 on both the rev and the content for comparisons. relatively cheap field, fixed size
[22:10:23] brion: but any info we put into a comment-blob could be exposed as a "virtual" content object, if we want that
[22:10:42] perhaps :)
[22:10:57] brion maybe instead of having both rev_user and rev_userid why not just have one or the other?
[22:11:02] DanielK_WMDE: oh also -- we might want to have common storage for *_comment fields also for logs and uploads
[22:11:38] I think that's part of the plan, Zppix
[22:12:00] Zppix: the provisional schema plan right now proposes using a user_entry table, where we'd have a single rev_user_entry that links to user_entry row which then links to either a user_id or an IP
[22:12:08] there is some controversy over this in the comments so far :)
[22:12:34] I'd be happy with comment_text being blob for now to be honest, if we want to extend it then we'll need to alter the comment table anyway to add a format field
[22:12:47] apergos it may be I'm still reading the last couple sections of the plan
[22:13:19] blob size is ..... 64k?
[22:13:20] brion: The user_entry table is the one that I most love, BTW.
[22:13:38] James_F: thanks :)
[22:13:52] can someone make me feel good about the "slot" name? I'm not loving it yet
[22:14:11] apergos: Better names welcome. It's just what we've been using.
[22:14:18] apergos: we could also just call it 'content role'
[22:14:19] yeah, 64k for blob, 16MB for mediumblob
[22:14:40] re user_text + user_id: i have this crazy idea that we just store an ipv6 address, and reserve a block of "fake" addresses to refer to the user_entry table.
[22:14:45] there is already a content role proposed, brion
[22:14:52] #info per tim blob actually plenty for text, if change to json would need other changes in addition to the mediumblob upgrade
[22:14:58] so instead of conflating the ip of anons with the user name, we'd conflate it with the user id
[22:14:59] cr_id and cr_role being the fields
[22:15:43] ah right
[22:15:53] apergos: "slot" comes from the mcr proposal. happy to discuss it. it's basically the association between revision and content.
[22:15:55] that being the list of available slot role ids/names
[22:15:56] jynus: any opinion on blob versus mediumblob for comment_text?
[22:16:22] ok, that shouldn't eat any more time during this hour but maybe it could be marked down as "find another name that's a little more explanatory"
[22:16:31] #info terminology question on 'slot' and 'content role', may want to clarify these and have something that sounds awesomer
[22:16:38] TimStarling, for me it doesn't matter, make sure you understand the indexing requirements
[22:16:39] #info s/awesomer/clearer/
[22:16:49] max index around 1K
[22:16:49] you can't index on either
[22:16:56] can be the prefix
[22:16:59] if needed
[22:17:00] yeah we wouldn't index it
[22:17:06] it being the comment_text
[22:17:09] ok, then no difference to me
[22:17:18] if we needed to do lookups for duplicate row reuse, we'd want to add a hash field
[22:17:22] limit on app level
[22:17:30] so people do not store 1GB :-)
[22:17:31] .oO(a fulltext index on the comment field might be nice, but probably a job for cirrus)
[22:17:32] I'll just make it mediumblob, I'm being stingy, average row size is enormous
[22:17:42] DanielK_WMDE: definitely outsource that to cirrus
[22:17:44] one extra byte in the length field won't blow out the table size ;)
[22:17:48] make it on the db as large as you want
[22:17:52] #info (for future: consider searchable index for comments. -> cirrus)
[22:18:32] #question do we need an index to look up comment rows by text match? if so add a hash column probably
[22:18:37] any thoughts on crazy fake ipv6 addresses to identify logged in users?
[22:19:02] DanielK_WMDE just use 8.8.8.8 and its equivalent in ipv6?
[22:19:05] DanielK_WMDE: like using a 128-bit column?
[22:19:20] a) i'd be leery of conflating the values
[22:19:20] (re search) searchable comments are a pain, today we don't have to care about revdel or anything because we only index the 'live' viewable data. not impossible just annoying
[22:19:33] and b) i like foreign keys in the database that "just work" even if we don't use the real constraints :D
[22:19:39] brion: well, we already conflate ip addresses with names.
[22:19:43] heh true
[22:19:44] :D
[22:20:08] re comment indexing: yeah actually shouldn't be hard to make indexing. Querying is a bit harder but indexing is about 2 lines of code now :)
[22:20:15] #info daniel has another idea on combining user & ip refs to one field -- link on IPv6 addr with a block of reserved address values that correspond to user_id
[22:20:29] SMalyshev that could probably be shortened via some sort of API
[22:20:31] SMalyshev: actually lots more, because today we only index the most recent document. my understanding is this would be indexing every revision
[22:20:36] Zppix: my idea is to get rid of the need to store ip addresses as fake names, thus removing the need for the user_text column
[22:20:39] #info searchable comments would need to handle rev_del too
[22:20:50] ebernhardson: ah if you want only most recent then yes. I was thinking about all of them
[22:20:54] DanielK_WMDE then remove the need for that for logged in users
[22:21:20] terminology... available rough synonyms for "slot" include "compartment", "division", "chamber", "cubbyhole", ... (maybe a thesaurus wasn't the best place to look)
[22:21:22] DanielK_WMDE: we do have one other case where rev_user is 0 and rev_user_text is not an IP: imports
[22:21:23] ebernhardson: wait, the reverse, I was thinking about current one
[22:21:30] Zppix: that's the idea: use special fake ipv6 addresses that encode the user id. then you can look up the name.
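The "fake IPv6 address" idea floated above (one address column, with a reserved block whose low bits carry the user id) might look roughly like the sketch below. It is an illustration only: the prefix `2001:db8:ffff::/64` (taken from the IPv6 documentation range) and the function names `encode_user_id` and `decode_user_id` are assumptions, since the discussion never says which block would be reserved or how the mapping would be implemented.

```python
import ipaddress
from typing import Optional

# Hypothetical reserved block (inside the 2001:db8::/32 documentation range);
# the discussion never names an actual prefix.
RESERVED_BLOCK = ipaddress.IPv6Network("2001:db8:ffff::/64")


def encode_user_id(user_id: int) -> ipaddress.IPv6Address:
    """Map a positive user id onto an address inside the reserved block."""
    if not 0 < user_id < RESERVED_BLOCK.num_addresses:
        raise ValueError("user id does not fit in the reserved block")
    # Indexing a network returns the address at that offset from its base.
    return RESERVED_BLOCK[user_id]


def decode_user_id(address: str) -> Optional[int]:
    """Return the encoded user id, or None for an ordinary (anonymous) address."""
    addr = ipaddress.IPv6Address(address)
    if addr not in RESERVED_BLOCK:
        return None
    return int(addr) - int(RESERVED_BLOCK.network_address)
```

A caller would store `encode_user_id(user_id)` for logged-in edits and the real address for anonymous ones, then use `decode_user_id` to tell them apart. This keeps a single column at the cost of conflating two kinds of value, which is the objection raised in the channel.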
[22:21:30] SMalyshev: right
[22:21:45] YairRand: lol
[22:21:50] :)
[22:22:10] DanielK_WMDE why use an ip address at all then just leave it as null?
[22:22:12] maybe we should have a naming contest ;)
[22:22:22] for what brion?
[22:22:23] I'd rather not have fake addresses in there; just have the username or the ip depending which it is
[22:22:28] Zppix: for slots
[22:22:32] in the same field, I mean
[22:22:38] Zppix: then you still need two columns, one for ip and one for user-id.
[22:22:50] cleaner, but less compact
[22:22:53] brion lets name them... well sorry theres possibly minors about
[22:23:06] apergos: a third possibility is to have three columns: a user_id key, an ip, and a 'non-linked imported username'
[22:23:20] brion: I'm not sure I have enough confidence in people not to call them "cubbyholes"
[22:23:29] "nooks"
[22:23:39] brion: imported user names should probably get a user_entry row
[22:23:49] with a negative id?...
[22:24:01] brion or just make the user_id either an user or ip or an id given by DB instead of having multiple colums
[22:24:03] user names change over time, I think it's good to reflect those changes in the revision history
[22:24:04] columns*
[22:24:23] if only because people use those names directly in talk pages and whatnot without always wikifying the references
[22:24:23] apergos: Which way around do you mean that?
[22:24:24] DanielK_WMDE: with the current plan they'd get a user_entry row with ue_user of 0 and ue_text of the imported name, equivalent to rev_user=0 and rev_user_text of the imported name
[22:24:34] brion: i'm not too happy with the ue_id -> user_id indirection...
[22:24:43] apergos: yeah that came up a couple times
[22:24:44] would be nice if it was the same thing
[22:24:45] apergos: Oh. Then I disagree. :-) Most users want to walk away from their old user name.
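The user_entry layout described above (a single rev_user_entry pointing at a row that carries ue_user and ue_text) could be sketched as follows. Only the names user_entry, ue_id, ue_user, ue_text and rev_user_entry come from the discussion; the column types, the uniqueness constraint, the use of SQLite, and the `get_user_entry` helper are assumptions made for illustration and are not the proposed MediaWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_entry (
    ue_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    ue_user INTEGER NOT NULL DEFAULT 0,  -- user_id, or 0 for IPs and imported names
    ue_text TEXT    NOT NULL,            -- IP or imported name (assumed: user name otherwise)
    UNIQUE (ue_user, ue_text)
);
CREATE TABLE revision (
    rev_id         INTEGER PRIMARY KEY,
    rev_user_entry INTEGER NOT NULL REFERENCES user_entry (ue_id)
);
""")


def get_user_entry(db, user_id, text):
    """Hypothetical helper: return the ue_id for an author, creating the row on first use."""
    row = db.execute(
        "SELECT ue_id FROM user_entry WHERE ue_user = ? AND ue_text = ?",
        (user_id, text),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = db.execute(
        "INSERT INTO user_entry (ue_user, ue_text) VALUES (?, ?)",
        (user_id, text),
    )
    return cur.lastrowid


# An anonymous edit and an imported edit both use ue_user = 0, as in the log.
anon = get_user_entry(conn, 0, "192.0.2.1")
imported = get_user_entry(conn, 0, "ImportedName")
conn.execute("INSERT INTO revision (rev_id, rev_user_entry) VALUES (1, ?)", (anon,))
conn.execute("INSERT INTO revision (rev_id, rev_user_entry) VALUES (2, ?)", (imported,))
```

Each revision row then carries one small integer instead of a user id plus a repeated name string, and the author text is stored once per distinct author.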
[22:24:54] DanielK_WMDE: yeah it's the most controversial part so far ;)
[22:24:57] you know I don't really like the idea of using IP addresses as usernames in the way we do right now
[22:25:12] possibly stupid question: this changes how authorship is stored, yeah? will this make any eventual switch to multiple-author revisions at all easier?
[22:25:14] #question should we preserve the old username on revision in case of renamed users?
[22:25:16] so I don't like the idea of baking it into the schema
[22:25:23] YairRand: hmmmmmmmm
[22:25:32] I am 100% with tim here
[22:25:38] YairRand: potentially user_entry could be extended to point to a group of users, yes
[22:25:39] TimStarling: so what about using ips as user *ids* (or vice versa)?
[22:25:45] but i'm not sure if i like that model either
[22:25:45] TimStarling: what would you like to see?
[22:26:01] I think we should have session identifiers which users can claim, reassigning those edits to an account
[22:26:10] upgrade editors as a 1st class citizen, then fix the needs with extra stuff
[22:26:16] brion: A better use for negative user_entry IDs, though.
[22:26:20] DanielK_WMDE: sounds inelegant
[22:26:41] yeah, my proposal was a hack
[22:26:52] #question should we make bigger changes to authorship storage of IP edits, such as a session id that can be upgraded into a user? (per tim)
[22:27:17] when I mean 1st class citizen
[22:27:17] TimStarling: I'd love us to be able to do that.
[22:27:23] I mean db-wise
[22:27:27] not functionality
[22:27:35] having its table, id, etc.
[22:27:44] the problem with this on the UI side is that we break all those feeds of anonymous edits from political offices
[22:27:50] heh
[22:27:52] TimStarling: well, i like to think it's an elegant hack :) but it's not clean. But... a session identifier wouldn't allow people to review contribs from an ip when deciding whether to block the ip
[22:28:09] TimStarling: whatever identifiers we use for anon edits have to correspond to the identifiers we use for blocking
[22:28:24] a long-expiry cookie is probably more stable than an IP in most cases
[22:28:43] so, we'd block that cookie, instead of blocking an ip?
[22:28:57] both measures are relatively easy to circumvent
[22:29:04] #info hiding ips has issues for blocking, reporting, etc, needs consideration in detail
[22:29:07] how does this translate into the equiv of range-blocking?
[22:29:21] autoblocks extending to IPs perhaps
[22:29:21] I don't really want to side track too far
[22:29:26] yeah it gets complex :)
[22:29:54] I would love us not to have ips in there at all... if we could do it.
[22:29:56] i will say that starting with user_entry means we can extend that table's relations and features in detail later...
[22:30:02] I appreciate that it's easier to maintain the same UI by whatever means
[22:30:34] brion: can we make ue_id the same as user_id? is there a problem with that?
[22:30:57] DanielK_WMDE: we could if we put the non-user_id items into negative space, though that feels icky to me
[22:31:06] since auto_increment PK wouldn't be available
[22:31:13] would have to maintain own sequence
[22:31:15] brion: or into a separate table, or put a flag on them
[22:31:26] ah
[22:31:36] DanielK_WMDE: The problem is that we can't enumerate IPs in advance (IPv6 => 2^128 >> bigint)
[22:31:36] or, if we extend user_id space to include the non-logged-in folks
[22:31:47] negative ids is just me trying too hard to be smart ;)
[22:31:51] 0.0.0.0/8 is reserved space
[22:31:58] Let's please not use negative IDs. :-)
[22:32:11] so the first N user_ids are the existing user rows, then we append stub rows for every IP that's ever edited so far during transition
[22:32:20] then when new IPs edit, add rows for them too
[22:32:43] brion: So much for centralising the WMF-wide user IDs to only have one user table.
[22:32:45] my worry there is it breaks the assumption that user_id != 0 means "logged in"
[22:32:55] James_F: good luck :D
[22:32:59] brion: Inorite.
[22:33:18] brion: that assumption is already false
[22:33:24] how so?
[22:33:30] imported revisions
[22:33:36] #question should we break the "user_id 0 == not logged in" assumption and store user rows for IP editors?
[22:33:51] Platonides: indeed -- those are "not IPs" but they're also "not user accounts"
[22:33:58] they're kind of a third thing
[22:34:03] they were logged in someplace, and in the post-sul world that's "good enough"
[22:34:10] well, inside wikimedia they are user accounts
[22:34:15] brion: i guess that's a point in favor of having ue_id != user_id: we can normalize the storage of names of non-logged-in users (anons and imports)
[22:34:15] brion: Your current proposal stores user_entries for them but not user_ids, which would allow us to change that afterwards without a schema change I guess?
[22:34:21] given that we only accept them from trusted sites
[22:34:27] James_F: hmmmmm, true.
[22:34:37] brion: So… park for now? ;-)
[22:34:46] James_F: but then we'd end up with ue_id that doesn't match user_id.
[22:34:47] yeah :D
[22:34:51] shudder
[22:34:58] #info park the user_entry discussion for now, need to work it out in more detail
[22:34:59] apergos: imports can come from wikis outside the sul-family
[22:35:05] brion: Yes, but that's also going to happen with Special:Import.
[22:35:10] DanielK_WMDE: we do??!!
[22:35:22] apergos: even if we don't, others do.
[22:35:22] special:import upload an .xml :D
[22:35:34] if you have sufficient rights
[22:35:37] and of course there are imports from usemod which have user_id=0
[22:35:39] And we have edits "made" in 1970 on zhwiki from a template that was edited on enwiki or wherever and apparently got poorly imported.
[22:35:43] haha
[22:35:44] apergos: no idea if we actually do. definitely not often.
[22:35:55] So… let's not make Special:Import the shining example of what to do.
[22:36:02] we know there are user_id = 0 entries which aren't anon edits, we suffer through that
[22:36:16] as the original author of special:import let me say it's a piece of shit ;)
[22:36:37] as the "so-called" maintainer of special:import, let me say thanks a whole lot
[22:36:43] sowwy :D
[22:36:52] hehehe
[22:37:22] ok, any other big concerns we haven't covered yet?
[22:37:36] jynus: if you have a chance i'd love to pick your brain about the custom live-hack index issues
[22:37:45] i don't have a good idea of which ones are actually in use right now
[22:37:46] someone tell me about the rev_content_address thing?
[22:37:49] if we introduce ue_id then wgUserId in JS can still be 0
[22:38:02] so we don't have to break gadgets
[22:38:08] apergos: ah -- i think i half-copied some of the earlier MCR docs there, it needs finalizing
[22:38:09] we also have some ancient "IPs" that are actually hostnames, or where the last octet was removed
[22:38:11] TimStarling: Yeah, and we can slowly transition to wgUserEntryId or whatever later.
[22:38:29] Platonides: those are imports
[22:38:43] The 123.123.123.xxx edits?
[22:38:50] brion, give 30 minutes
[22:38:52] yeah, those are imports from usemodwiki
[22:39:02] brion: what's supposed to go into that field, do we have an example (since apparently it's not the text_id)?
[22:39:03] apergos: so we were thinking of having a new abstraction for the backend storage blobs, with URL-like addressing
[22:39:06] Good-old useMod, when we cared about IP's anonymity. ;-)
[22:39:18] apergos: that would replace a reference to a 'text' row that just contains a URL ref to external storage
[22:39:19] does ext store kinda already do that?
[22:39:22] *doesn't
[22:39:26] kinda ya
[22:39:36] so cut out the middleman
[22:39:39] so we could either go ahead and make that change fully, or else not make it at all
[22:39:43] ok
[22:39:51] usemodwiki also put reverse DNS hostnames into the username field on occasion
[22:40:02] #info need to either finish or roll back the _text_id -> cont_address change. decide on this
[22:40:36] DanielK_WMDE: did you have any strong opinions on the cont_address usage?
[22:40:46] what about the archive table?
[22:40:49] vs keeping a cont_text_id ->
[22:41:05] Platonides: excellent question! it needs updating to match revision, probably, I forgot to include it ;)
[22:41:19] #info make plans for archive table as well, this didn't make it into the last draft
[22:41:25] currently revision is much nicer than archive
[22:41:39] didn't you forget the archive table last time around too? ;)
[22:41:39] we need to plan how archive will become
[22:41:42] another possibility is to say "screw archive table, switch everything to use rev_deleted"
[22:41:49] lucky you have Platonides
[22:41:51] which we kinda wanted to do years ago and then never did
[22:42:02] heh
[22:42:08] yes, we could clean it up like that
[22:42:18] brion: the indirection ( cont_text_id -> strange blob in the text table -> external store ) annoys me. it would be nice to get rid of that, especially if we want other storage mechanisms in the future.
[22:42:24] what's the relative size of the archive table compared to revision?
[22:42:27] (for big wikis)
[22:42:29] archive and rev_deleted have different UI personalities now
[22:42:34] DanielK_WMDE: that's my inclination too
[22:42:36] would probably need some more flags to mimic how things are now
[22:42:43] #question relative size of archive table? (for transition planning)
[22:42:54] and Aaron was not interested in merging them when he was doing his rev_deleted work
[22:42:57] brion: we could have content_address just *be* the text_id for now, with a prefix (or none)
[22:43:08] #info archive and rev_deleted have different behaviors. if want to drop archive entirely need to extend rev_deleted semantics
[22:43:18] we can then easily encode addresses for different storage backends later
[22:43:29] hmm
[22:43:33] there's also that deleted pages don't have a page_id
[22:43:42] can one join on a string_field=int_field?
[22:43:52] Platonides: Yeah, we'd definitely want to fix that.
[22:43:55] you can
[22:44:06] jynus might kill you after that, though ;)
[22:44:09] :-D
[22:44:24] James_F: it's not that easy
[22:44:25] #info for deleted -- what about pages completely deleted, where page entry is gone? no good rev_deleted model there yet
[22:44:34] and recreated pages, too
[22:44:36] do restored pages get a new id then? I thought they got their old id back
[22:44:40] Platonides: ar_page_id ?
[22:45:01] are we really restoring it?
[22:45:02] ar_page_id can dead-end, if the page was deleted
[22:45:24] Platonides: i think we are, for a while now, actually. i seem to recall the rfc
[22:45:48] hmmm, maybe
[22:45:51] brion: you'd just re-create the page row with that id. i think that's actually what we do these days.
[22:45:57] there was some change in behavior a while ago, either from "they used to get their old ids and now they don't" or the other way around
[22:46:01] the archive table has the page title, so ar_page_id is not necessary
[22:46:12] i forget if we preserve the ar_page_id on recreation... can't do it in all cases (eg if a new page with same title was recreated)
[22:46:16] revision has rev_page already but does not have the title
[22:46:18] actually, before 1.11, it wasn't preserved
[22:46:35] and in early enough versions we didn't save ar_rev_id either ;)
[22:46:50] iirc
[22:47:04] yes, this gets back to my dig about forgetting archive
[22:47:09] :)
[22:47:40] how about archiving page rows, instead of revision rows?
[22:47:44] there's quite a bit of playing with ar_page_id at SpecialUndelete
[22:47:48] just leave the revisions rows where they are
[22:47:49] what is the action item on user_entry?
[22:47:54] so yes, I think it finally is kept
[22:48:07] DanielK_WMDE: LGTM
[22:48:13] TimStarling: current action item is to make some kind of decision. none decided yet :)
[22:48:29] note that you must be able to merge revisions from different archived pages, though
[22:48:34] will you add links from more tables?
[22:48:52] or drop it from the proposal for now?
[22:48:53] oh, and while we are fixing all of this, image should get an id :P
[22:48:58] TimStarling: ah yes -- that should be done (where we have log_user_text, img_user_text, etc)
[22:49:29] #info also note image has no id still. change that later or now?
[22:49:41] Platonides: yes, rev merging would need some code. and page_archive would need to be able to have multiple entries with the same title.
[22:49:42] if I could +20 that, Platonides, I would
[22:49:45] I think we have a proposal for that already
[22:49:48] oh. I just did :-P
[22:49:51] brion: Later. When we do the epic MCR re-write.
[22:49:54] #question should we change all the various *_user_text rows to *_user_entry, or wait on the change until model is consistent?
[22:50:15] https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tables
[22:50:19] \o/
[22:50:24] this is an approved RFC which proposes adding img_id
[22:50:26] (We have a different simple RfC for fixing image that we approved before.)
[22:50:27] brion: what do you think about archiving page instead of revision?
[22:50:27] Snap.
[22:50:51] DanielK_WMDE: hmm, so using rev_delete on the revisions (if needed) and disconnecting them from the page row? possible
[22:50:52] accepted feb 6. nice!
[22:51:09] brion: yes. i think it's more logical, and simpler
[22:51:17] alternately we could decouple titles from page
[22:51:20] I missed that!
[22:51:31] then a page row can be in a state that's disconnected from a title
[22:51:41] this starts to sound like a lot of work to do in one project
[22:51:41] what would that mean?
[22:51:43] how do you handle "disconnected" titles? :s
[22:51:57] i know jynus wants to decouple (*_namespace,*_title) from the various places it's in, though that has index ordering issues perhaps
[22:52:03] TimStarling: we better plan everything right
[22:52:05] ic
[22:52:14] yeah, could potentially become a very big project ;)
[22:52:22] namespace is easier in that there are ids along with the text name
[22:52:31] brion: nice idea. though "page" is a strange name, then. it's a non-page. an un-page.
[22:52:42] better to have a plan I guess, even if we end up splitting it up in practice
[22:52:50] +1
[22:52:52] #question make bigger changes to archive & deletion & page? or stick with making archive look like revision and keeping the deletion model as is for now? (may become too big a project to do all at once)
[22:53:12] I think we should retain the same deletion model for now
[22:53:19] DanielK_WMDE: well, it's like an inode that's been mved to ~/.Trashes/ :D
[22:53:21] TimStarling: Which one?
[22:53:32] TimStarling: The revdel one or the archive one?
[22:53:41] both
[22:53:44] since this project is now approaching the size of my working memory
[22:53:45] in unixy filesystems the name and the file details live separately and can be decoupled
[22:53:48] I don't think we should keep the current deletion model ("copying things around to/from archive table")
[22:53:54] heh
[22:53:56] Oh, right. Yeah, I agree that we should de-scope that.
[22:54:10] But having a plan for it is mostly "I agree with Platonides".
[22:54:10] scope is not important
[22:54:21] the inode analogy is not horrible. I would have to think on it some
[22:54:22] but make sure you have very small milestones
[22:54:27] #action probably don't completely change the deletion model just yet, but start thinking about it for next big change
[22:54:36] we will hit it as soon as needing to delete a page with revisions in the new model
[22:54:46] like #1 separating content_mode #2 separating bla bla
[22:54:52] #info about decoupling title from page idea -- analogy with unix directory entries & inodes
[22:54:59] yes
[22:55:03] but now
[22:55:17] maybe fix archive first, then adding multi-content
[22:55:18] title would be a level 1 entity
[22:55:22] although that slows the latter
[22:55:29] and a property of a page
[22:55:35] time notification: 5 minutes left
[22:55:36] but also can exist without a page
[22:55:38] Platonides: We're doing MCR sooner.
[22:55:46] (links to pages that do not exist)
[22:55:50] yeah we have features we want to ship that'll need MCR
[22:55:58] yes
[22:55:59] whereas deletion can stay awkward for a while longer ;)
[22:56:02] leave titles for later
[22:56:05] And we signed a contract saying we'd them.
[22:56:23] Err. + ship.
[22:56:24] and what will you do when the first MCR page is going to be deleted?
[22:56:25] we have to alter archive, yes, I'm just saying keep the same schizophrenic UI and data model, where rev_deleted is separate from archive
[22:56:36] * James_F nods.
[22:56:48] Platonides: well we need a model that works. ;) it may just remain ugly (copying revision rows and maybe even a separate role/slot association table to the content rows)
[22:57:03] ugh
[22:57:09] but we won't have to move content rows around, so that's something
[22:57:16] this is probably why you didn't remember archive this time either... eww
[22:57:19] That would mean either having slots referred to by archive (eww) or adding an archive_slots table which is bad.
[22:57:24] wouldn't doing things right be easier than adding more ugliness on top of that?
[22:57:43] #info even if we keep the ugly revision->archive row copying model, can avoid copying the content rows (similar to keeping text rows)
[22:57:44] #action brion to add user_entry links to more tables
[22:57:55] Platonides: Changing the user experience for deletion in a rush feels like a bad idea.
[22:58:04] it would be easier but only if we had a timetable that permitted it
[22:58:08] James_F: I am not asking for changing the user experience
[22:58:10] #action brion to propose archive schema modification
[22:58:12] better is the enemy of good enough ;)
[22:58:13] only the backing tables :P
[22:58:19] OK. :-)
[22:58:20] ok two minutes left :)
[22:58:26] though i'll be around a bit longer
[22:58:41] ok, let's stop general discussion now
[22:58:51] let's just talk about action items
[22:58:54] any other action items?
[22:59:14] we should schedule another round of discussion in a couple/few weeks
[22:59:17] can someone recap the existing list?
[22:59:19] Next step is… Brion to finish and we re-discuss?
[22:59:57] apergos: i will update the archive model (slightly), extend the user_entry usage to other tables, a couple other tweaks, and then we rediscuss
[22:59:58] or: write code?
[23:00:16] yes, should start a gerrit patch with the updated tables.sql
[23:00:23] and start putting bits of code into it
[23:00:36] ok, sounds good
[23:00:40] +1
[23:00:40] #action brion: start a gerrit patch with the updated tables.sql so we can iterate and prototype more
[23:00:49] thanks everyone
[23:00:53] thanks all!
[23:01:01] Thanks especially to TimStarling and brion.
[23:01:01] thanks for hosting and for the work
[23:01:09] #endmeeting
[23:01:10] Meeting ended Wed Feb 15 23:01:09 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[23:01:10] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.html
[23:01:10] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.txt
[23:01:10] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.wiki
[23:01:11] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-02-15-22.01.log.html
[23:01:20] it's.... 11pm (i'm in london atm) so i'll be up a little longer then go to bed, i have a flight back to portland in the morning :D
[23:01:33] it's 1 am so I'm pretty much gone
[23:01:39] g'night apergos :)
[23:01:40] brion: Lots of quiet time to write the patch. ;-)
[23:01:41] thanks for your feedback!
[23:01:42] what will happen to the "info" items?
[23:01:52] apergos: they should appear called-out in the meetbot log iirc
[23:01:58] ok, great
[23:02:00] * apergos bookmarks
[23:02:10] thanks again, happy schema-ing!
[23:02:16] apergos: rather in the 'minutes'
[23:02:19] :)
[23:02:42] I think sr_id should be cr_id
[23:02:55] Platonides: yep that was a typo :)
[23:03:25] see ya in the other channels :-)
[23:03:37] I was looking how you had an orphan content_role table ;)
[23:04:33] hehe
[23:08:58] the slots table is there just to ensure that several revisions may have the same content?
[23:09:12] I wonder if it is worth an extra table
[23:10:49] Platonides: once we roll out uses of multi-content revisions it will be, since many edits will only change one or the other content item
[23:11:05] that way we only have to repeat the foreign keys in the slots table for the unchanged bits
[23:11:17] I see
[23:19:37] ok it's bedtime for bonzo, i gotta get up earlyish to go to the airport :)
[23:19:44] gnight all!
[23:23:48] I have finished the maintenance, but send me an email if you need it
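
Editor's note: the point made at 23:10:49 and 23:11:05 about the slots table can be sketched roughly as below. This is only an illustration under stated assumptions, not the schema from the proposal page: the table names (content, slots), the column names (content_id, body, slot_revision, slot_role, slot_content) and the helper functions are all hypothetical. It shows that an edit touching one content item adds one new content row, while the unchanged item is carried forward by repeating only its foreign key in the slots table.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with hypothetical content/slots tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE content (
            content_id INTEGER PRIMARY KEY,
            body       TEXT NOT NULL
        );
        CREATE TABLE slots (
            slot_revision INTEGER NOT NULL,
            slot_role     TEXT    NOT NULL,
            slot_content  INTEGER NOT NULL REFERENCES content(content_id),
            PRIMARY KEY (slot_revision, slot_role)
        );
        """
    )
    return db


def save_revision(db, rev_id, parent_id, changed):
    """Store a revision; `changed` maps role name -> new body text.

    Roles not listed in `changed` reuse the parent's content row, so only
    the foreign key is repeated for them. Returns {role: content_id}.
    """
    slots = {}
    if parent_id is not None:
        # Start from the parent's role -> content_id mapping.
        slots = dict(
            db.execute(
                "SELECT slot_role, slot_content FROM slots WHERE slot_revision = ?",
                (parent_id,),
            )
        )
    for role, body in changed.items():
        # Only changed roles get a new content row.
        cur = db.execute("INSERT INTO content (body) VALUES (?)", (body,))
        slots[role] = cur.lastrowid
    db.executemany(
        "INSERT INTO slots (slot_revision, slot_role, slot_content) VALUES (?, ?, ?)",
        [(rev_id, role, content_id) for role, content_id in slots.items()],
    )
    return slots


if __name__ == "__main__":
    db = build_demo()
    first = save_revision(db, 1, None, {"main": "wikitext v1", "metadata": "license=CC"})
    second = save_revision(db, 2, first and 1, {"main": "wikitext v2"})
    print("revision 1 slots:", first)
    print("revision 2 slots:", second)
```

In this sketch the second revision points at the same metadata content row as the first, which is the sharing Platonides asked about; whether that saving justifies the extra table depends on how often edits change only one item, as brion notes.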