[07:05:54] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3056868 (Marostegui) dbstore1001 is looking good after the weekend! :-) Let's wait till Wednesday to see how backups got generated and if all went well, I believe we should be good to close this t...
[07:07:37] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3056883 (Marostegui) db2070: ``` Table: revision Create Table: CREATE TABLE `revision` ( `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `rev_page` int(...
[07:31:59] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3056893 (Marostegui) I am now going to alter the RC slaves on codfw, which both need bigger alter tables as they have all the indexes a bit messed up. The follow...
[08:03:53] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3056934 (Marostegui) I am running pt-table-checksum again on s2 (keeping the replication filters for the rc slaves - db1036 and db2035.) I have added `--no-check-re...
[09:46:37] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3057275 (Marostegui) The table has been checksummed already and there was no lag generated, it took around 1:30h in the end. Now it is checking for the differences...
[09:51:09] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3057279 (jcrespo) @Marostegui Are you checking the dbs with filters? It may be stuck trying to read the rows from those hosts, which will never arrive: http://kedar...
[09:52:43] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3057280 (Marostegui) >>! In T154485#3057279, @jcrespo wrote: > @Marostegui Are you checking the dbs with filters? It may be stuck trying to read the rows from those...
[09:59:35] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3057287 (jcrespo) You can keep them on the replication list, so they cannot lag, but run it with `--no-replicate-check`, we can do the checks later, independently.
[13:50:59] DBA, Gerrit, Operations, Release-Engineering-Team, Patch-For-Review: Gerrit: Schedule downtime to migrate db to utf8mb4 - https://phabricator.wikimedia.org/T155764#3057861 (ema) p:Triage>Normal
[14:06:26] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3057875 (jcrespo) I am prepared for this (e.g. backups and prepared to recover ASAP), but I will not drop anything until just before Dereckson or anyone else start the re...
[14:07:14] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#2025865 (jcrespo) a:jcrespo>None
[14:13:22] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#3057885 (Marostegui) I have manually deployed gtid_domain_id on all the hosts in s2. Will prepare the patch to enable it on the config file across production in a bit, and deploy it in a coupl...
[14:13:44] ping me when you are back for revision table fun
[14:14:00] I am here :)
[14:14:03] let's have fun
[14:14:28] so my best guess for revision on rcs is current db1045:dewiki current structure
[14:14:36] let's see
[14:15:07] https://phabricator.wikimedia.org/P4992
[14:15:52] I am going to pool it anyway soon, when I check wikidata
[14:15:56] to verify that
[14:16:30] so, from that, compared to db1026 we'd only miss KEY `page_user_timestamp` (`rev_page`,`rev_user`,`rev_timestamp`,`rev_id`)
[14:16:34] the rest is the same
[14:16:41] yeah, that is what I meant
[14:16:51] but it is not documented anywhere
[14:16:53] let me check one thing
[14:17:08] it is also closer to enwiki, which never gave problems
[14:17:16] and are the oldest ones with that structure
[14:18:05] it makes sense, as in the non-rc slaves we have: KEY `page_user_timestamp` (`rev_page`,`rev_user`,`rev_timestamp`
[14:18:27] when was that index added?
[14:18:42] was that recently, and that is why those slaves do not have it?
[14:18:54] is that the #1 change you do for others?
[14:19:17] https://phabricator.wikimedia.org/T148967
[14:19:20] it was added there
[14:19:32] and I guess it wasn't added to the RC slaves because they were RC slaves and we were not 100% sure
[14:19:45] no, I mean mediawiki-wise
[14:19:54] ah
[14:20:00] that I don't know
[14:20:01] is that generally missing?
[14:20:08] if yes, that is good
[14:20:21] I guess it is, because enwiki rc slaves do not have it for example
[14:20:22] if not, is there a reason why it was missing only there?
[14:20:45] (in codfw)
[14:21:10] sorry, my font doesn't have that char on IRC
[14:21:26] uh?
[14:21:39] what did you write before "(in codfw)"
[14:21:53] I said: ˜/marostegui 15:20> I guess it is, because enwiki rc slaves do not have it for example
[14:22:00] after that
[14:22:07] nothing
[14:22:07] before the parenthesis
[14:22:08] haha
[14:22:13] nothing
[14:22:23] i just wrote: ˜/marostegui 15:20> (in codfw)
[14:22:30] there is a char for me
[14:22:37] very weird
[14:23:46] so the plan is to have 6 indexes everywhere
[14:23:49] the ones on table.sql
[14:23:54] tables.sql
[14:24:11] exactly as they are everywhere, except on rc slaves
[14:24:28] it is weird that that index exists on eqiad slaves but not on codfw
[14:24:35] and the pasted definition (the 6 with extra rc_id)
[14:24:36] as we haven't touched those recently
[14:24:37] oh, no
[14:24:39] I added that
[14:24:42] just now
[14:24:51] ah, then they are missing everywhere most likely
[14:24:52] I wanted to show you
[14:25:10] plus the modified PK
[14:25:25] also, I would have to change the pager scripts
[14:25:41] to use the table.sql as a base
[14:25:46] rather than anything else
[14:25:58] https://phabricator.wikimedia.org/P4992 -> so this is going to be the canonical schema?
[14:26:02] (which is fine by me)
[14:26:13] for rc, yes
[14:26:15] but I was asking
[14:26:18] more than asserting
[14:27:09] the extra rev_id may not be needed for most indexes
[14:27:15] but it is needed for at least one
[14:27:29] because of the partitioning
[14:27:38] so better having it in advance
[14:27:43] plus, it is like that on enwiki
[14:28:10] yes, it makes sense to me to have it like that
[14:28:15] there they have the rev_id
[14:28:21] but not the extra index
[14:28:49] how was db1026?
[14:28:56] was it missing the index too?
[14:28:59] yes
[14:29:36] it was nowhere regarding the rcs
[14:30:05] the thing is, those indexes are not cheap, they take double the data size
[14:30:52] but I prefer something that works and is uniformized
[14:31:03] and we can later optimize
[14:31:08] except if
[14:31:20] adding more indexes forces a bad plan, that could happen, too
[14:31:34] but that is not the case for the rc slaves as far as you've seen
[14:31:48] and the new indexes have been in dewiki, commonswiki, wikidata for a long time already
[14:31:52] well, wikidata had a really bad plan for one query mostly
[14:31:55] and we haven't noticed anything, right?
[14:31:57] no
[14:32:06] the rc_id new col, yes
[14:32:11] not the new full index
[14:32:20] KEY `rev_page_id` (`rev_page`,`rev_id`)
[14:33:58] on the other side, most problems are with API nodes, not rcs
[14:34:28] so did you have to add the new index on most servers?
[14:34:47] rc slaves or in general?
[14:34:56] in general
[14:35:07] yes
[14:35:11] in most of them
[14:35:16] or, even, all of them
[14:35:37] let me blame tables.sql, then
[14:36:53] points to https://phabricator.wikimedia.org/T142725
[14:37:39] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3057932 (Dereckson) a:Dereckson Next step is to plan a deployment window, something we initially wanted to do this morning. @jcrespo I ask greg a green light for a w...
[14:37:51] aha
[14:38:08] they just convert the UNIQUE into an index because the PK was also modified
[14:38:37] and there was multiple confusion there
[14:38:46] because unique != PRIMARY
[14:39:05] so we should be as confident as "nothing unexpected breaks"
[14:39:39] so I would like to take care of all rcs
[14:39:55] sure, feel free
[14:39:57] starting with eqiad because it is outage-important
[14:40:05] I can finish db2034 and the other one in enwiki
[14:40:08] but if you feel the need to do others
[14:40:10] (for codfw)
[14:40:18] ok :)
[14:40:20] please just ping me and we can both give it a look
[14:40:23] yeah
[14:40:25] will do
[14:40:53] I will put a comment on the rampant ticket
[14:41:08] the meta one
[14:41:10] sounds good!
[14:49:27] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3057955 (jcrespo) So, after months of archeology, trying to understand the reasons why things are the way they are without breaking anything in the way, the final sta...
[14:51:02] But then there is https://phabricator.wikimedia.org/T102532
[14:52:04] yes, but how is db1026 behaving with the new index?
[14:52:34] the new one is not the one that was questioned to be deleted
[14:53:12] have you added page_user_timestamp recently?
[14:53:22] let me check
[14:53:54] on dewiki yes
[14:53:56] let me check commons
[14:55:10] commons too
[14:55:37] enwiki has the index, right?
[14:55:56] yes
[14:56:01] because the initial ticket claim was that it wasn't there in the first place
[14:56:15] not all the servers in enwiki though
[14:56:48] let's make a list on eqiad-production
[14:57:43] actually
[14:57:50] all the servers but the old master (db1057)
[14:57:51] have the index
[14:58:18] so maybe T102532 is no longer a problem
[14:58:18] T102532: revision page_user_timestamp index problematic on large wikis - https://phabricator.wikimedia.org/T102532
[15:01:05] DBA, MediaWiki-Database, Performance: revision page_user_timestamp index problematic on large wikis - https://phabricator.wikimedia.org/T102532#3057975 (jcrespo) @Anomie @Marostegui I want your thoughts on this- I would like to proceed with the plan for revision as stated here: T132416#3057955 That...
[15:08:16] DBA, MediaWiki-Database, Performance: revision page_user_timestamp index problematic on large wikis - https://phabricator.wikimedia.org/T102532#3057987 (Marostegui) >>! In T102532#3057975, @jcrespo wrote: > @Anomie @Marostegui I want your thoughts on this- I would like to proceed with the plan for re...
[15:22:31] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#3058052 (Marostegui) So the above patch is ready to be deployed in production. I will roll it out in a couple of days. I know I am being careful, but I prefer to be so than enabling it on s2 t...
[15:22:34] DBA, MediaWiki-Database, Performance: revision page_user_timestamp index problematic on large wikis - https://phabricator.wikimedia.org/T102532#3058053 (Anomie) >>! In T102532#3057975, @jcrespo wrote: > @Anomie @Marostegui I want your thoughts on this- I would like to proceed with the plan for revisi...
[15:30:52] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3058116 (jcrespo)
[15:30:58] DBA, MediaWiki-Database, Performance: revision page_user_timestamp index problematic on large wikis - https://phabricator.wikimedia.org/T102532#3058114 (jcrespo) Open>declined > if we later decide that page_user_timestamp or any other index doesn't work generally on the cluster for some reas...
[17:12:11] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3058532 (Marostegui) I've got a list of tables per wiki from s2 that do not have a PK (unfortunately they are not the same across all the wikis, although they are p...
[17:43:02] DBA, MediaWiki-Categories, Community-Tech-Sprint, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3058644 (kaldari) @thiemowmde: We seem to be at a stalemate, which is blocking the merge of two other features. What are your thoughts...
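
Note on the revision index comparison discussed in this log: the check being done by hand (does a host's `revision` table carry the same indexes as the canonical definition?) could be sketched roughly as below. This is only an illustration, not the tooling the DBAs used. The function names `parse_indexes` and `diff_indexes` are hypothetical, the two schemas are heavily abbreviated stand-ins rather than the real tables.sql or P4992 content, and only the two KEY definitions are taken from the conversation.

```python
import re

# Matches index lines as printed by SHOW CREATE TABLE, for example:
#   KEY `rev_page_id` (`rev_page`,`rev_id`),
KEY_LINE = re.compile(r"(PRIMARY KEY|UNIQUE KEY|KEY)\s*(?:`([^`]+)`)?\s*\((.*)\)")


def parse_indexes(create_table_sql):
    """Return {index_name: (kind, columns)} for each index line in the statement."""
    indexes = {}
    for line in create_table_sql.splitlines():
        match = KEY_LINE.match(line.strip())
        if not match:
            continue  # column definitions and table options are ignored
        kind, name, cols = match.groups()
        columns = tuple(c.strip().replace("`", "") for c in cols.split(","))
        # The primary key has no name of its own, so file it under "PRIMARY".
        indexes[name or "PRIMARY"] = (kind, columns)
    return indexes


def diff_indexes(canonical, actual):
    """Compare two parsed index maps: what is missing, extra, or defined differently."""
    shared = set(canonical) & set(actual)
    return {
        "missing": sorted(set(canonical) - set(actual)),
        "extra": sorted(set(actual) - set(canonical)),
        "changed": sorted(n for n in shared if canonical[n] != actual[n]),
    }


# Abbreviated, illustrative schemas (assumption: not the full production definitions).
CANONICAL = """CREATE TABLE `revision` (
  `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`rev_id`),
  KEY `rev_page_id` (`rev_page`,`rev_id`),
  KEY `page_user_timestamp` (`rev_page`,`rev_user`,`rev_timestamp`,`rev_id`)
)"""

HOST = """CREATE TABLE `revision` (
  `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`rev_id`),
  KEY `rev_page_id` (`rev_page`,`rev_id`)
)"""

result = diff_indexes(parse_indexes(CANONICAL), parse_indexes(HOST))
print(result)
```

Comparing parsed column tuples rather than raw text means whitespace differences between hosts do not show up as false positives, while a reordered or extended index (such as one with the extra `rev_id`) lands in "changed".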