[21:04:09] #startmeeting oldimage/image discussion T589 [21:04:09] T589: RfC: image and oldimage tables - https://phabricator.wikimedia.org/T589 [21:04:09] Meeting started Wed Aug 31 21:04:09 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. [21:04:09] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [21:04:09] The meeting name has been set to 'oldimage_image_discussion_t589' [21:04:18] #topic Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:04:51] hi everyone! [21:05:09] hi [21:05:29] hi All! [21:06:53] hi [21:06:58] I am a bit confused by this topic, I think most people agreed in general about the idea; implementation details could be filled and discussed offline? [21:07:13] Oh herp derp I'm here :) [21:07:23] it may be a short discussion then :D [21:07:25] * ostriches puts on his releng hat [21:07:48] wait, it was off the rest of the day? [21:08:12] have you seen the comment Krinkle wrote after the last meeting? [21:08:17] specially because I didn't see any were concerns, just a question [21:08:20] jynus: we found that you were giving a lot of great feedback. 
Krinkle was hoping for more input on release strategy [21:08:25] greg-g: I was wearing https://www.amazon.com/Adult-Propeller-Beanie-Hat-Made/dp/B001QK4RZC instead [21:08:29] Hi [21:08:44] So the main questions for this meeting are written here - https://phabricator.wikimedia.org/T589#2541630 [21:08:46] no, that's a mistake; stop listening to what I say! :-) [21:08:51] These were also on wikitech-l [21:08:57] 3 open questions. [21:09:26] robla: Shall I start with the first one? [21:09:30] sure [21:10:22] So before I do, I want to help set the stage and clarify any misunderstanding about the scope of this RFC. This was raised in the previous IRC meeting. We cleared it up at the time but not everyone may be caught up with that. https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tables#Problems [21:10:41] We've reduced the scope to two core problems we want to address. [21:10:48] 1. File revisions should have better unique identifiers than "current file title + timestamp". [21:10:54] 2. Uploading file revisions must not involve rows moving across tables, or rows being replaced. [21:11:26] The first one is primarily from an architecture perspective to simplify a lot of our internal classes and API consuming, as well as to remove conflicts in race conditions. [21:11:55] The second one is because of an anti-pattern in our database schema that jynus wants us to get rid of (and I wholeheartly agree) [21:12:08] 2 is to stop commons from exploding like it happens every 6 months [21:12:18] :) [21:12:28] jynus: exploding as in replication failure? [21:12:29] and "make Labs great again" [21:12:44] yes, TimStarling [21:13:11] it has to be a special case for it to happen in production, but it definitely happens on labs [21:13:22] how is this related to multi-content revisions? 
it seems like a plan to refactor something that will need to be refactored again not much later [21:13:23] like a master failover [21:14:19] tgr: I think Multi-content revisions is for the future right now. It's too big a step to move into right away. However once MCR is stable and used elsewhere, by doing this RFC, we'll be in a position where migrating to MCR is fairly straight forward. [21:14:32] Krinkle, +1 [21:15:06] question 1 was what fields to keep in image? [21:15:09] yea, what Krinkle said... is discussing the MCR option in scope of this meeting? I like the idea, but it would probably mean shelving this until we have MCR, and have some experience with it (let's say, in a year). [21:15:29] Yeah, we more or less shelved MCR for file tables in the last meeting. [21:15:36] But it's a good question to ask. [21:15:54] I think it is worth it, once I propose a solution for 3; independently of the status of MCR [21:15:54] migrating twice is extra work, but this sounds urgent, going for the more immediate options seems wise for now. [21:16:21] TimStarling: Yeah. So question 1: Given that we'll put all image revisions in 1 table (instead of one for current and one for older ones), which fields do we want to keep in our generic 'image' table (the equivalent of 'page' basically). [21:16:48] I notice that the following fields have indexes: img_name, img_timestamp, img_user_text, img_sha1, img_media_type, img_major_mime, img_minor_mime [21:16:56] The minimal set of fields would be: img_id, img_name, img_latest [21:17:08] DanielK_WMDE: urgent? [21:17:38] img_sha1 is used to provide duplicate upload warnings [21:17:52] AaronSchulz: if i understood correctly, jynus sais commons blows up because of data being moved between tables [21:18:29] "blows up" could be clarified [21:18:30] AaronSchulz: the point being: the problems is urgent. while managing file revisions in mcr is a nice idea, it's blocked for a while. 
so let's do something else for now [21:18:33] it's probably still needed for that purpose [21:18:40] Yeah, so we'll need to keep that in sync. This is not unprecedented as we do this for page as well (page_touched, page_is_redirect, page_len) whenever we update page_latest. But ideally fewer is better I guess. [21:18:42] AaronSchulz, https://wikitech.wikimedia.org/wiki/Incident_documentation/20160705-commons-replication [21:19:06] We also have a query page (special page and API) for finding files by mime type. [21:19:08] isn't that mostly just INSERT SELECT [21:19:11] ? [21:19:21] the img_user_text image is for contributions, that will be replaced by a filerevision index [21:19:22] AaronSchulz, yes [21:19:25] I would still like to see content-based image ids (hashes) rather than other random ids if that's feasible to do while we are at it [21:19:29] s/image/index/ [21:19:46] gwicke: For the revisions? [21:19:52] the img_media_mime is a joke, do we seriously use that? [21:19:59] yes, for each image original [21:20:09] gwicke: We do currently have duplicates, however. [21:20:22] gwicke, do not we have already a hash column? [21:20:22] And given we are not separating content from revision right now, that wouldn't be semantically possible. [21:20:34] jynus: we only do that for moving the current to the oldimage table and the current to the archive table (e.g. 1 row). The others now do the SELECT in the app. [21:20:35] #info agreement to shelve the idea to manage upload history using MCR. Discussing the two-table file / file_revision option for now. [21:20:42] since after a rollback it'd be the same content, but a different timestamp+user. [21:21:10] Krinkle: how does the current proposal support revision deletion? (do we support rev-deletion for images at all right now?) 
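[Editor's note: for reference while reading the field discussion above, the two-table layout being proposed (a small `file` table as the analogue of `page`, plus one `filerevision` row per upload) can be sketched as below. All table and column names are illustrative assumptions, not the final RFC schema, and SQLite stands in for MySQL.]

```python
import sqlite3

# Sketch of the proposed two-table layout. 'file' plays the role that
# 'page' plays for wikitext; 'filerevision' gets one row per upload.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (
    file_id     INTEGER PRIMARY KEY,
    file_name   TEXT NOT NULL UNIQUE,
    file_latest INTEGER NOT NULL DEFAULT 0,  -- like page_latest
    file_sha1   TEXT NOT NULL DEFAULT ''     -- kept for duplicate-upload warnings
);
CREATE TABLE filerevision (
    fr_id        INTEGER PRIMARY KEY,        -- problem 1: a real unique id per upload
    fr_file      INTEGER NOT NULL REFERENCES file(file_id),
    fr_timestamp TEXT NOT NULL,
    fr_sha1      TEXT NOT NULL
);
""")

def upload(conn, name, timestamp, sha1):
    """Record an upload: insert one revision row, then bump file_latest.

    No rows move between tables (problem 2), unlike image/oldimage today.
    """
    conn.execute("INSERT OR IGNORE INTO file (file_name) VALUES (?)", (name,))
    file_id = conn.execute(
        "SELECT file_id FROM file WHERE file_name = ?", (name,)).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO filerevision (fr_file, fr_timestamp, fr_sha1) VALUES (?, ?, ?)",
        (file_id, timestamp, sha1))
    conn.execute("UPDATE file SET file_latest = ?, file_sha1 = ? WHERE file_id = ?",
                 (cur.lastrowid, sha1, file_id))
    return cur.lastrowid

upload(conn, "Example.jpg", "20160831210000", "aaa")
rev2 = upload(conn, "Example.jpg", "20160831220000", "bbb")
latest = conn.execute(
    "SELECT file_latest FROM file WHERE file_name = 'Example.jpg'").fetchone()[0]
```

The denormalized `file_latest`/`file_sha1` pair mirrors how `page_latest` and friends are kept in sync today, as Krinkle notes below.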
[21:21:25] Krinkle: documenting those considerations on the RFC might be useful [21:21:32] jynus: https://gerrit.wikimedia.org/r/#/c/307078/ will banish even those [21:21:42] for later reference, as well [21:22:26] gwicke: Even if we separate them later, we still need a revision ID. Which is the first step (this step). [21:22:56] TimStarling: the original idea was to allow people to search for "videos" or "bitmaps". never happened, should be done with elastic now. img_media_type can die in the db. the concept may still be useful in code, even if we just feed it to the search engine. [21:23:05] unless we decide that having two different copies of the same thing & under possibly different licenses isn't really useful [21:23:05] AaronSchulz, I say thanks, but why moving things in the first place? [21:23:17] Special:MIMEsearch.php mentions that the index is wrong [21:23:21] SMalyshev: does cirrus index media type (not mime) for files? [21:23:21] but, agreed, that brings up a whole can of worms [21:23:22] we do support rev-deletion for images, but that just updates the value of a bitfield, does not seem problematic [21:23:26] DanielK_WMDE: Elastic specfiic? Or possible in mysql default search too? [21:23:27] it doesn't contain a sorting field [21:23:27] jynus: oh, the moving still sucks, I'm just saying the urgent part isn't apparent to me [21:23:38] DanielK_WMDE: do you mean normal deletion of single revisions? [21:23:46] tgr: single revision. [21:23:48] I'm all on board for refactoring to be have filerevision [21:23:50] AaronSchulz, I want things fixed, I do not matter how [21:23:56] DanielK_WMDE: hmm not sure [21:24:02] I think he means 'revision deletion' (hiding of revisions within the same table, like rev_deleted) [21:24:06] DanielK_WMDE, SMalyshev: I don't believe we do [21:24:11] I don't think right now a lot is indexed for files [21:24:17] let me check [21:24:18] And yes, oi_deleted exists. 
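[Editor's note: as tgr says above, revision deletion is just a bitfield update on the row (img_deleted / oi_deleted), so it migrates trivially. A sketch of how such a bitfield works; the constant values are assumed to mirror MediaWiki's DELETED_* flags.]

```python
# Bitfield-based revision deletion, as used for img_deleted / oi_deleted.
# Flag values here are assumptions mirroring MediaWiki's DELETED_* constants.
DELETED_FILE = 1
DELETED_COMMENT = 2
DELETED_USER = 4
DELETED_RESTRICTED = 8

def hide(bitfield, flag):
    """Suppress one aspect of a revision by setting its flag bit."""
    return bitfield | flag

def is_hidden(bitfield, flag):
    return bool(bitfield & flag)

row = 0                        # nothing suppressed
row = hide(row, DELETED_USER)  # hide the uploader's name only
```

Because the row itself never moves, this carries over to the merged table unchanged.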
[21:24:31] Krinkle: we have a field-based system for feeding the search engine now. core could expose the field, but the default (sql) search engine would just ignore it for now [21:24:31] that would ideally need to be integrated with the SearchEngine interface in core [21:24:43] AaronSchulz, I wasn't the one saying it was urgent; however, I think MCR is a bit ambitious to block lots of small changes on it [21:25:20] TimStarling: SMalyshev has recently implemented an interface that could be used to do that I think [21:25:41] Sidenote: SearchEngine & friends are terrible and need a lot of cleanup. [21:25:44] (never got to that) [21:25:45] Current fields we are discussing: https://www.mediawiki.org/wiki/Manual:Image_table [21:25:47] jynus, DanielK_WMDE: I just want us not to feel a need to rush it :) [21:25:55] Krinkle: that's being done via img_deleted / oi_deleted / fa_deleted currently, does not seem hard to migrate [21:26:13] so let's suppose that Special:MIMEsearch can be deleted from core [21:26:28] that would let us remove that index [21:26:29] * gwicke would be very interested in a summary outlining how this will integrate with the longer term image naming roadmap, including things like content based addressing [21:26:29] tgr: Yeah, oi_deleted is just a revision property. That'll migrate just fine and isn't part of the generic image object. [21:26:32] but normal deletion does move rows to filearchive (and page revisions work identically), do we intend to change that? [21:26:35] AaronSchulz: well, if it's ok to push this back for a year, I'm all for the MCR option :) [21:26:35] DanielK_WMDE: I don't think for files we index more than text [21:26:38] (the same way there is no rev_del in the page table) [21:26:39] the only thing related to fields I do not have clear is the whole old-revision-handling [21:26:54] tgr: No, archiving is not in the scope of this RFC. [21:26:55] SMalyshev: there is some special case code for file pages, let me find it... 
[21:26:56] does that require some tuning on indexes, extra fields, etc? [21:27:25] DanielK_WMDE: we could of course add more fields, shouldn't be hard [21:27:52] do we have currently issues with lots of old revisions of the same file? [21:27:59] gwicke: I'd say the outline is that it'll be a whole lot easier once we have primary keys for images and their revisions. It's mostly just technical debt that makes everything easier after it. [21:28:31] jynus: For some files, yes. Most files have very few revisions. [21:28:34] So, long tail. [21:28:44] We should think about those things a little, but not much imho as there isn't much we could do wrong or would different in that perspective. We're just moving the table around basically. [21:28:47] from what I saw, there were relatively very few old revisions [21:28:55] Yeah, that surprised me too. [21:29:06] I expected there to be more old revisions than current revisions. [21:29:11] But we didn't include filearchive here [21:29:12] meh, I've known that for a while [21:29:16] That doesn't surprise me. [21:29:21] DanielK_WMDE: though I wonder if there's intersection between this and structured commons work... [21:29:26] The vast majority of files get uploaded and never edited again [21:29:27] yeah, it makes sense that most images don't have multiple revisions. [21:29:32] I remember when I added paging to the file history due to big history files [21:29:44] On the other hand, there's some images that get edited all the time (think SVGs with routinely updated data) [21:29:45] SMalyshev: ah, right, file_text is just one field. there's a few more useful things we could index for use on commons, i think. 
we could supper type:video or resolution:>800 [21:29:51] I don't recall any scaling issues we are facing right *now* aside from moving all those files around [21:29:53] *support [21:30:00] but that's not affected by this work [21:30:14] jynus: there are some edge cases, like the image that the smoke tests update every day, but they are extremely rare [21:30:15] DanielK_WMDE, SMalyshev: Yeah, we never stuffed a ton of data about files into Elastic originally [21:30:23] Mainly file_text so we could search inside of pdf/djvu files [21:30:24] So back to the original question :) Aside from img_latest and img_sha1 (for duplication detection). Any other fields we need to keep? [21:30:26] I suppose a pdf with many revisions could hurt [21:30:29] SMalyshev: i see no overlap with structured commons work tbh. it's orthogonal [21:30:32] DanielK_WMDE: it should be pretty easy to do now that he have infrastructure to add fields ieasily [21:30:35] or djvu...all those big blobs of text [21:30:41] If we don't keep img_media_type we need to migrate Special:MIMESearch first. Do we want to block on that? How trivial would it be? [21:30:43] in oi_metadata [21:30:50] SMalyshev: yay! make a ticket? [21:31:03] Will we be able to make that special page work with default mysql search? [21:31:20] DanielK_WMDE: and we have also much easier keyword adding in Cirrus, btw - it's not proper structure instead of one humngous switch [21:31:24] Or do we want to remove it from core or somehow make disable for simple backends? [21:31:27] can I be pedantic and suggest, as usual, not to store in text things that only have a few values? [21:31:42] like mime types [21:31:49] Krinkle: Not easily with the way things are right now. [21:31:54] DanielK_WMDE: so yeah I think making a ticket and figuring out what we want to put there won't hurt [21:31:55] I doubt there is anyone who really relies on that special page [21:31:56] I would be +10000 to removing it from core. 
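[Editor's note: the img_sha1 duplicate-upload warning discussed above boils down to hashing the uploaded bytes and probing an index. A minimal sketch; MediaWiki actually stores the SHA-1 in base-36, hex is used here for brevity.]

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (img_name TEXT, img_sha1 TEXT)")
conn.execute("CREATE INDEX img_sha1_idx ON image (img_sha1)")

def find_duplicates(conn, data):
    """Return names of already-uploaded files with identical content."""
    digest = hashlib.sha1(data).hexdigest()
    rows = conn.execute(
        "SELECT img_name FROM image WHERE img_sha1 = ?", (digest,))
    return [name for (name,) in rows]

data = b"fake image bytes"
conn.execute("INSERT INTO image VALUES (?, ?)",
             ("Example.jpg", hashlib.sha1(data).hexdigest()))
dupes = find_duplicates(conn, data)
```

Note this is a content-equality check only: a rollback produces a duplicate digest with a different uploader and timestamp, which is why the sha1 alone cannot serve as the revision identifier.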
[21:31:56] jynus: It's currently ENUM [21:32:03] Krinkle, good! [21:32:08] /** [21:32:08] * The index is on (img_media_type, img_major_mime, img_minor_mime) [21:32:08] * which unfortunately doesn't have img_name at the end for sorting. [21:32:08] * So tell db to sort it however it wishes (Its not super important [21:32:08] * that this report gives results in a logical order). [21:32:23] this is a special page which gives you images of a given type, in random order [21:32:28] Krinkle: one question that could have an impact on how we want to evolve things is which metadata should be associated with the image itself, and which should be associated with a name + description [21:32:39] DanielK_WMDE: byw there's a big reindex coming IIRC sometime soon... so maybe worth discussing with all search team if we want to add stuff before it [21:32:54] We also have https://commons.wikimedia.org/wiki/Special:MediaStatistics [21:32:56] DanielK_WMDE: because adding stuff after would take long time to become useful [21:33:00] Which is arguably less random than MIMESearch [21:33:14] can we go to question 2? [21:33:42] Yeah, time is moving. OK. So mime remains an open question for later. No other fields? [21:34:11] question 2: What about img_metadata field. It's fairly large. Do we just move it to the new imagerevision? (like it is now). Or differently? [21:34:26] Krinkle: MIMESearch is basically only useful for the less-used mimetypes :) [21:34:30] This field is a blob of serialised PHP (typically representing the Exif data of an image). [21:34:42] Like, who on earth is doing [[Special:MIMESearch/image/jpeg]] [21:34:44] ;-) [21:34:44] it's pretty horrible for djvu files especially [21:34:48] megabytes of xml [21:34:53] Krinkle: maybe external store support? 
[21:35:00] caused some outages [21:35:06] EXIF data is usally pretty small, the problem is that on wikisource it stores OCR text of enormous scanned books [21:35:10] I mean if we still have filearchive, we don't want to be moving those around even after a big refactor [21:35:13] #info Krinkle: question 2: What about img_metadata field. It's fairly large. Do we just move it to the new imagerevision? (like it is now). Or differently? [21:35:17] that would be a missed opportunity [21:35:49] It should've been normalized when Brian redid our Exif/etc support a few years back, we just didn't get to that point. [21:35:56] TimStarling: yeah, tiny ones should be able to stay, and a flag field could say if it use ES or something perhaps [21:36:12] I'd like to avoid another 'text' table. [21:36:25] well, we have metadata upgrade which means we need to load it all the time [21:36:54] this comes back to my earlier question about which metadata should be associated with the image itself, vs. the name -- exif data and OCR XML look more like a property of the image itself [21:36:54] we probably need to load it for other things... [21:37:10] gwicke: image revision itself? [21:37:17] no, image [21:37:19] we load it for thumbnails I think [21:37:31] the same image under different names would still have the same OCR XML [21:37:41] just use blob storage for the extracted text? [21:37:42] might be nice to have metadata that can point to more metadata [21:37:50] 'image' = equivalent to page, not image content. There is no image content right now. [21:37:51] so thumbnails could render from the minimal metadata [21:37:53] or jpg metadata, for that matter [21:37:54] that's going to be its eventual fate anyway [21:37:57] and pages of djvu from the rest [21:38:04] gwicke: One image with two revisions needs two meta data blobs [21:38:21] to record user / timestamp / comment, yeah [21:38:24] It can't be in 'image'. 
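[Editor's note: TimStarling's flag-field idea above (small metadata blobs stay inline, oversized ones move to external store) could look roughly like this. The column names, the 64 KB threshold, and the FakeExternalStore stub are all assumptions for illustration; real ExternalStore addresses look like DB://cluster/id.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE filerevision (
    fr_id INTEGER PRIMARY KEY,
    fr_metadata BLOB,                                -- inline blob, or an ES address
    fr_metadata_external INTEGER NOT NULL DEFAULT 0  -- 1 = fr_metadata is an address
)""")

class FakeExternalStore:
    """Stand-in for ExternalStore, for illustration only."""
    def __init__(self):
        self.blobs = {}
    def put(self, blob):
        addr = "FAKE://cluster/%d" % len(self.blobs)
        self.blobs[addr] = blob
        return addr
    def get(self, addr):
        return self.blobs[addr]

INLINE_LIMIT = 64 * 1024  # threshold is an assumption, not a decided value

def store_metadata(conn, fr_id, blob, store):
    # Tiny Exif-style blobs stay in the row; megabytes of DjVu/OCR XML go to ES.
    if len(blob) <= INLINE_LIMIT:
        conn.execute("UPDATE filerevision SET fr_metadata = ?, "
                     "fr_metadata_external = 0 WHERE fr_id = ?", (blob, fr_id))
    else:
        addr = store.put(blob)
        conn.execute("UPDATE filerevision SET fr_metadata = ?, "
                     "fr_metadata_external = 1 WHERE fr_id = ?", (addr, fr_id))

def load_metadata(conn, fr_id, store):
    blob, external = conn.execute(
        "SELECT fr_metadata, fr_metadata_external FROM filerevision "
        "WHERE fr_id = ?", (fr_id,)).fetchone()
    return store.get(blob) if external else blob

conn.execute("INSERT INTO filerevision (fr_id) VALUES (1), (2)")
es = FakeExternalStore()
store_metadata(conn, 1, b"small exif blob", es)
big = b"x" * (INLINE_LIMIT + 1)
store_metadata(conn, 2, big, es)
```

This keeps the common case (thumbnail rendering needs small metadata) to a single row read, while the long-tail giants stop bloating the hot table.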
[21:38:26] but the XML etc can be unchanged [21:38:34] do we currently store "metadata" for old revisions, it is not clear to me? [21:38:37] and the minimal metadata would fit in cache with the rest of the filerevision fields [21:38:45] jynus: we do [21:38:49] so, if that was keyed on content hash, then we'd get the right association [21:38:54] A new revision means a new upload, which would always bring new Exif. It could be unchanged, but it's part of the image file. [21:38:55] ok, then keep doing it [21:39:01] & avoid the need to touch it when reverting [21:39:20] Can we deterministically hash Exif and other meta data? [21:39:27] we can improve it in the future, as we will probably will, but I think that is out of the scope? [21:39:29] (Do we normalise first?) [21:39:31] we can hash the original [21:39:41] and just reuse the existing extracted metadata [21:39:54] what's wrong with having another text table? [21:40:06] Either way, yeah, we could have a separate image_metadata table. Would that table associate primary keys with blobs and/or references to external store? Whcih we reference in imagerevision? Or do we reference it directly? [21:40:28] TimStarling, seems like a good idea whenever even if there is 1:1, you will not always read or write the entire row [21:40:50] we said years ago that we would change the image / oldimage structure [21:40:51] could the image revision metadata reference a hash, and the blob metadata then key on hash as well? [21:40:55] maybe metadata for most things could be in filerevision, and the DjVu handler could have its own special pointer to ES [21:41:17] that way you can efficiently load things like video duration when it's needed for list display [21:41:20] gwicke: we currently cash djvu tree metadata by base file hash [21:41:21] TimStarling: I meant that if we need a generic place to store blobs in a way that won't be moved, altered or deleted. We may want to re-use text instead of creating another one. 
I don't know if that's beneficial or not though. [21:41:32] Given we already have sharding etc. and a scalable solution for that. [21:41:33] gwicke, you keep saying hash, and if it is a hash or another kind of id, it doesn't matter [21:42:00] when djvu OCR was introduced, we put it in img_metadata because there wasn't really a simple alternative [21:42:02] we can store the hash separately or as an id [21:42:02] jynus: it matters a lot for dedup and cache invalidation [21:42:09] ^ [21:42:33] it would be nice to have ids for images [21:42:33] TimStarling: I like that. So we have reason to avoid a separate table because we typically need to query both anyway? [21:42:56] Krinkle, that is the right question [21:43:25] separte if logically or physically is separated [21:43:28] And that balances okay with the cost of having to read/write both since we would mostly only be writing it once? (aside from archiving). In the current schema in production, we'd avoid this because it'd need to move the metadata when a new revision is uploaded. But we won't have that probelm anymore. [21:43:30] I'm not sure of the exact numbers, but file description pages for example obviously need metadata loaded, for all revisions of the file [21:43:48] gwicke: you could always change the ES id if you change the metadata. We probably shouldn't focus on hash vs ID too much for that :) [21:43:53] however, maybe that could be a decision that we do not need to take now? [21:44:00] things that use ImageListPager probably need metadata too, since they generate thumbs [21:44:05] yeah OCR text should be split out of generic metadata [21:44:11] Okay, so that means for the current migration it is orthogonal. 
(Supporting external store for img_metadata) [21:44:15] that's T32906 / T99263 fwiw [21:44:15] T32906: Store DjVu extracted text in a structured table instead of img_metadata - https://phabricator.wikimedia.org/T32906 [21:44:15] T99263: Store Pdf extracted text in a structured table instead of img_metadata - https://phabricator.wikimedia.org/T99263 [21:44:19] hmm, and using images from within the parser will generate thumbs and so need metadata [21:44:21] TimStarling: Should we do it before or after the schema change? [21:44:45] can be separate, the only question we need to answer right now is whether filerevision.fr_metadata will exist [21:44:51] and I guess the answer to that is yes [21:45:01] #info discussion of q2 has been about historic reason for having it (e.g. djvu) [21:45:08] we don't have nearly enough time for question 3 now you know [21:45:22] but we can give it a go [21:45:28] Yeah :) [21:45:34] SMalyshev: https://phabricator.wikimedia.org/T144447 [21:45:38] So last question 3: Migration. [21:45:45] haha [21:45:47] #info question 3 discussion starts. Migration [21:46:02] tgr: perhaps OCR text can wait for MCR? seems like an obviour candidate [21:46:04] Background in 3. Migration at https://phabricator.wikimedia.org/T589#2541630 [21:46:25] I chatted with Tim about a potential path at the data side, I think I summarized on the ticket response [21:47:06] I'd like to avoid adding temporary logic to MediaWiki for ignoring 'in-mid-air' rows since it's complicated technical debt to have to maintain. Especially since we support upgrades from older versions. [21:47:15] keep X table doesn't mean we have to keep the names [21:47:20] I fear we'd have to keep it forever. [21:47:38] last time jynus mentioned views [21:48:04] Good point. 
[21:48:07] I don't have a fully-worked plan for using them but it seems like that could avoid the need for MW migration code [21:48:07] AaronSchulz: I agree that any blob id will help with cache invalidation, but there are differences in dedup, multi-repo use (commons vs. local project) etc; numeric ids are basically less efficient for a lot of those problems, but have the advantage of being slightly shorter [21:48:52] So that would allow for migrating rows *to* the new table while keeping the old MediaWiki logic live. [21:49:00] how does non-duplication interact with 1) het deploy of MW versions and 2) ability to back it out if it fails? [21:49:02] can you insert into a view? [21:49:19] TimStarling: no [21:49:21] TimStarling, yes [21:49:26] what? [21:49:30] :D [21:49:32] if you have an "insertable view" [21:49:48] https://dev.mysql.com/doc/refman/5.7/en/view-updatability.html [21:49:49] which basically meas 1:1 relationship with the original table [21:49:53] * DanielK_WMDE just googled [21:49:54] robla: it's not like we are running two versions of commons at the same time right now [21:50:22] you cannot insert into a "GROUP BY" view, means mostly [21:50:39] #info  can you insert into a view?  if you have an "insertable view" [21:50:40] We always require backward compatibility indeed because of FileRepo on wmf wikis querying commons. [21:50:48] Which can be one version ahead or behind Commons [21:50:53] robla: thanks [21:50:59] but you can into a "WHERE deleted=0" [21:51:22] I had created https://phabricator.wikimedia.org/T134827 precisely because I thought (after testing) that you could not insert into views [21:51:33] ha ha [21:51:51] so that is another reason to keep the original tables around (even if they are renamed) [21:51:51] So if we temporarily add a 'fixme' column, and migrate rows from image to oldimage (to be renamed). 
[21:52:00] actually [21:52:05] the plan was the other way [21:52:10] yeah, the other way [21:52:12] I suppose we'd do that in the background and then keep doing that until it's small enough and then we go into read-only mode while we update the software? [21:52:14] as image was larger [21:52:19] Sorry, yeah, that's what I meant. [21:52:22] rename image to filerevision, create an image view that filters filerevision [21:52:44] I hope we don't do any "*" select anywhere [21:52:45] how efficient are table renames, jynus ? [21:52:51] table renames are O(1) [21:52:55] reana tables are cost "0" in mysql * (* conditions may apply) [21:53:02] #info  rename image to filerevision, create an image view that filters filerevision [21:53:03] xD [21:53:13] TimStarling: depends on the storage [21:53:15] aka metadata lock, but that is an infra issue [21:53:29] I think there was something to take into account [21:53:31] Given we need less fields in the new table, should we do the view the other way around? [21:53:42] just like renaming a db should be O(1) [21:53:46] but is actually not supported [21:54:00] Krinkle, for performance reasons, you do not want to write heavility into a view [21:54:09] poor indexing support [21:54:28] Sure [21:54:33] views do not have indexes, only the underlying table does [21:54:36] if it's really necessary we can disable uploads for a few hours while migration takes place [21:54:42] but if it works, it works [21:54:52] (but it is a pain in labs) [21:54:58] how does that matter for writes? [21:54:59] I think the whole migration will take longer than a few hours though, no? [21:55:07] We need to do some of it in the background ideally [21:55:40] what is the slowest part? copying all rows of commons.oldimage to commons.filerevision [21:55:42] ? [21:55:46] image->filerevision, create 'image' view for back-compat, add all oldimage rows to image. [21:55:53] Right [21:56:06] And adding primary keys [21:56:07] can that be done with INSERT SELECT? 
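[Editor's note: the rename-and-view migration jynus and TimStarling sketch above can be walked through concretely. SQLite stands in for MySQL here, with two caveats: SQLite views are read-only, whereas the MySQL plan relies on a simple 1:1 WHERE view being insertable, and production would backfill oldimage rows in small batches (and add a proper fr_id primary key) rather than one big INSERT SELECT on the master.]

```python
import sqlite3

# Starting state: the legacy split, simplified to two columns each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (img_name TEXT, img_timestamp TEXT);
CREATE TABLE oldimage (oi_name TEXT, oi_timestamp TEXT);
INSERT INTO image VALUES ('A.jpg', '20160801'), ('B.png', '20150101');
INSERT INTO oldimage VALUES ('A.jpg', '20140101'), ('A.jpg', '20130101');
""")

conn.executescript("""
-- Step 1: rename the big table out of the way (RENAME TABLE is O(1) in MySQL).
ALTER TABLE image RENAME TO filerevision;
-- Step 2: mark which rows are current; backfilled old rows will get 0.
ALTER TABLE filerevision ADD COLUMN fr_current INTEGER NOT NULL DEFAULT 1;
-- Step 3: compatibility view so deployed code can keep reading 'image'.
CREATE VIEW image AS
    SELECT img_name, img_timestamp FROM filerevision WHERE fr_current = 1;
-- Step 4: merge the old revisions into the single table.
INSERT INTO filerevision (img_name, img_timestamp, fr_current)
    SELECT oi_name, oi_timestamp, 0 FROM oldimage;
""")

total = conn.execute("SELECT COUNT(*) FROM filerevision").fetchone()[0]
current = conn.execute("SELECT COUNT(*) FROM image").fetchone()[0]
```

Because the view in step 3 is a plain 1:1 filter (no GROUP BY, no joins), MySQL treats it as insertable, which is what lets the old write path keep working during the transition.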
[21:56:21] on a depooled slave? [21:56:37] that's why I say hours [21:56:40] What? [21:57:00] TimStarling: You want to do a musical chairs dance? [21:57:23] I think the "new" table didn't have but an id and a few fields? [21:57:33] a title, maybe [21:58:10] a page_id, maybe? [21:58:11] thanks, All! [21:58:16] sorry, I do not have the structure in front of me right now [21:58:18] yes [21:58:18] the link between file and page is a bit strange right now, no? [21:58:19] Hm.. If we project old 'image' on top of new 'filerevision', we still need to create the new 'file' table. [21:58:21] as well. [21:58:31] you can't do INSERT SELECT on tens of millions of rows on the live master can you? [21:58:59] Okay, I'll summarise input from question 1 and 2 on the task. [21:59:00] TimStarling, I would say you cannot do it on a slave either- how do you do that an keep inserting? [21:59:08] we're just about out of scheduled time, but we'll go over 5 minutes (or longer if everyone prefers) [21:59:16] I think that would lag slaves [21:59:17] we should wrap up [21:59:35] #info  Okay, I'll summarise input from question 1 and 2 on the task. [21:59:35] We can continue talking in #wikimedia-tech jynus and TimStarling if you have time, and anyone else is welcome to join. [21:59:54] I can't stay for long today [22:00:10] #info  We can continue talking in #wikimedia-tech jynus and TimStarling if you have time, and anyone else is welcome to join. [22:00:15] TimStarling: Id like to make sure I understnad what you meant though. [22:00:52] it really just needs design work [22:01:05] whether it can be done in SQL should be more obvious once the design is further along [22:01:24] Yeah, I think migration is clear enough for now that we know we have several optoins. [22:01:38] My advice: avoid sql and insert selects [22:01:43] We know that it can be done in a performant way. The details can be figured out in code review and elsewhere. [22:02:09] For the actual migration that is. 
As long as we know we can do it. [22:02:09] ok, is that what we should wrap up on? [22:02:30] 60 second warning [22:02:42] I guess it's a bit late for last-call now. I'll finalise over the coming week and propose for next week (no rfc meeting but just the last-call anouncement) [22:02:50] sounds good? [22:03:07] ah, right, yeah, I think that sounds good [22:03:27] we don't have a full design to approve at the moment, there are a lot of (literal) question marks in the phab task [22:03:45] Krinkle: you're proposing that you do the design, put it up, give people one week in last call? [22:03:45] we can discuss procedure in next week's committee meeting [22:04:30] robla: one week from next week. [22:04:33] yup. ok. now 30 second warning :-) [22:04:47] thanks everyone! [22:04:58] TimStarling: Yeah, but good stuff to think about :) [22:05:09] Thanks! [22:05:12] #endmeeting [22:05:13] Meeting ended Wed Aug 31 22:05:12 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [22:05:13] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-08-31-21.04.html [22:05:13] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-08-31-21.04.txt [22:05:13] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-08-31-21.04.wiki [22:05:14] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-08-31-21.04.log.html