[09:28:52] volans, how busy are you?
[09:31:43] working on the puppet config for TLS, what can I do for you? :)
[09:33:21] I do not want to move you away from that, but do you think setting up a slave of a small shard will take you long?
[09:34:30] should not take much "real" time, I can work while copying data
[09:34:33] I am thinking of SPOF when failovered, and we do not have a proper x1 slave on codfw
[09:35:01] true, we have only 2029
[09:35:09] *2009
[09:35:13] technically, the load there is very small
[09:35:19] and we already have dbstores
[09:35:33] but having one extra host, just in case would be great
[09:35:44] it's not symmetrical, this is already a good reason :-P
[09:36:02] do we have a spare host I could use?
[09:36:07] Compulsive-obsessive disorder? :-)
[09:36:16] we "don't"
[09:36:20] but we can get one
[09:36:59] maybe using one of these: https://phabricator.wikimedia.org/T125827
[09:37:02] are you thinking of one of the old es?
[09:37:29] nah, an even older one will serve, this is only "just in case"
[09:37:36] ok
[09:38:01] it should not take more than 1-2 hours, because x1 is "small"
[09:39:05] but if I can assign that to you, and then you can do it in any way you want, it would be great
[09:40:10] sure, I can do it, no prob
[09:40:36] any preference between 2001-2007 for the replacement?
[09:40:37] https://phabricator.wikimedia.org/T130098
[09:41:10] not really, if one has been "cleaned-up" (recently reimaged), choose that
[09:41:18] so you do not need to reimage again
[09:41:45] I'll check to get the "best" one :)
[09:41:57] they should all be the same specs
[09:42:56] I do not intend to use it, but I do not want to discover too late that db2009 has some irrecoverable problem
[09:45:09] ok
[09:52:17] so 2006 and 2008 don't have puppet running but are like the others, Ubuntu 14.04, I'll have to reimage one, I'll probably take 2008: not being configured, I'm even more sure it is not used
[09:54:01] when chosen, put a note/alter title on the decom task, so they are not stopped by accident
[09:54:12] yep :)
[09:55:15] also, some dbstores replicate from x1, but not all of it (just a subset of the databases). Warning if you are going to clone from there
[10:00:28] but given that I don't have any other slave I can depool in eqiad from where to copy the data I guess I'm forced to copy from 2009. Given that it is not in production and the DB is ~110GB, what do you think about using xtrabackup? should not affect the dbstore slaves
[10:02:57] it is ok, you do not need to ask me every time, I trust your judgement
[10:03:16] if I didn't, I wouldn't let you touch production :-)
[10:04:21] I'm always wondering if there is a better way that you might know :)
[10:04:45] [and if there is, you probably know it ;) ]
[10:05:13] it was ok at first, when you didn't know all our deployment quirks and issues
[10:05:23] you are past that
[12:30:44] hey again jynus! :) So it looks like wl_id went in!
[12:31:09] any rough estimate on a timeline for adding the field to all production dbs? (I'm aware it will be a long while)! :)
[13:31:02] within 6 months
[13:32:00] (I am giving it high priority)
[14:24:51] jynus: Internal MariaDB error code: 1030
[14:25:43] ha! that is like saying "sky is blue"
[14:25:54] rotfl
[14:26:22] db bnwikisource, table shorturls, from ShortUrlUtils::encodeTitle
[14:26:32] probably table corruption
[14:26:38] mysqld: Aria engine: checkpoint failed
[14:26:39] I do not know why that is in Aria
[14:26:59] I can imagine, tokudb gives problems for s3
[14:27:45] can you take over? It is probably just a matter of rebuilding the table, and that is a very small one
[14:29:35] ok, I'm checking
[14:32:54] Page 3010: Wrong data in bitmap. Page_type: 1 full: 0 empty_space: 2916 Bitmap-bits: 4 'full'
[14:33:28] a repair table should be enough? (I'm not familiar with Aria)
[14:37:17] yes, if it isn't, reimport
[14:38:46] the table is fixed now, restarted slave but now it gives
[14:39:54] same error
[14:40:00] and again corrupted :(
[14:40:48] dump, drop, recreate?
[14:41:21] I do not like Aria, it is a more complex engine with a false sense of security
[14:41:49] indeed the table is created with PAGE_CHECKSUM=1
[14:43:53] yep, tried to repair again and now it found a duplicate key and removed an item, but it failed again, I'll dump and re-import it
[14:44:19] reimport it as innodb or tokudb, preferably the first
[14:44:45] I do not trust toku or aria (that is why we only use innodb in production)
[14:47:11] I will try to finish soon a "reimport from production" script I have for labs and share it on the common repo
[14:47:44] great
[16:13:58] jynus: just wondering, if we want to make another schema change to the watchlist table (introducing wl_timestamp) would you roll out both patches at the same time or one first and then the other
[16:13:59] ?
[16:14:34] ie. would the 2 col additions at once take the same amount of time as the single col addition?
[16:15:02] I imagine the timestamp col would be easier as it is not the PK etc and wouldn't really need populating
[16:24:54] at the same time would take the same amount of time
[16:25:17] in fact, the PK would be a hard blocker for any other change, unless it is trivial
[16:25:23] (on that table)
[16:25:41] once there is a PK, changes become "easy"
[16:25:51] without it, it involves a lot of downtime
[16:26:10] that is why PKs are so important and why I am pushing for them
[16:27:07] in general, the more changes we can pack at the same time, the better
[17:43:13] * volans start thinking that dbstore2002 is in a shutdown deadlock :)
[17:50:31] still stopping? Then yes probably blocked on myisam
[17:51:12] kill it in the most clean possible way (kill mysqld_safe, then mysqld with sigint, then with -9)
[17:52:11] we may have to rebuild it fully, and that is a lot of time (unless we clone dbstore2001)
[17:52:20] yes, full of FUTEX_WAIT_PRIVATE in strace, probably waiting on each other. I'll be gentle :)
[17:52:47] we will also want to kill aria everywhere
[17:53:19] better single tables easily corrupted (but also easily reimported)
[17:53:26] than that
[17:53:49] fully agree, why was it changed to aria given that the original is innodb?
[17:54:19] I suppose dbstores were initially in myisam, before tokudb was possible
[17:54:35] and it carried on, as codfw was not fully production
[17:54:43] (I did not take that decision)
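
Note on the 17:51:12 message: the "most clean possible way" of stopping a stuck mysqld (sigint first, -9 only as a last resort) can be sketched as a small signal ladder. This is only an illustrative sketch, not the procedure actually run on dbstore2002: the `escalate` helper, the `MYSQLD_STEPS` list and the grace periods are hypothetical names and values, and the PIDs would have to be looked up by the operator. Following the chat, mysqld_safe would be stopped first so that it does not restart mysqld, and the ladder would then be applied to mysqld itself.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this PID still exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def escalate(pid, steps, send=os.kill, alive=pid_alive, sleep=time.sleep):
    """Send each (signal, grace_seconds) step in order until the process exits.

    Returns the list of signals that were actually sent.
    """
    sent = []
    for sig, grace in steps:
        if not alive(pid):
            break  # already gone, no need to escalate further
        send(pid, sig)
        sent.append(sig)
        waited = 0
        # Give the process up to `grace` seconds to exit before escalating.
        while waited < grace and alive(pid):
            sleep(1)
            waited += 1
    return sent


# Hypothetical ladder: the chat names sigint and then -9 (SIGKILL);
# the grace periods are assumptions made for this sketch.
MYSQLD_STEPS = [(signal.SIGINT, 60), (signal.SIGKILL, 5)]
```

The `send`, `alive` and `sleep` parameters exist only so the ladder can be exercised against a fake process without signalling anything real; in actual use the defaults would apply, e.g. `escalate(mysqld_pid, MYSQLD_STEPS)` with `mysqld_pid` supplied by the operator.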