[05:58:43] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3246305 (Marostegui) After almost 5 days, dbstore1001 is done: ``` root@neodymium:~# mysql --skip-ssl -hdbstore1001 enwiki -e "show create table revision\G" *********...
[06:29:23] DBA: db2057 replication broken due to delete on echo_notification and echo_event - https://phabricator.wikimedia.org/T164815#3246348 (Marostegui)
[06:29:52] DBA: db2057 replication broken due to delete on echo_notification and echo_event - https://phabricator.wikimedia.org/T164815#3246360 (Marostegui) Open>Resolved a:Marostegui I am resolving this because replication is no longer broken.
[06:30:15] DBA: db2057 replication broken due to delete on echo_notification and echo_event - https://phabricator.wikimedia.org/T164815#3246363 (Marostegui)
[06:32:31] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3246364 (Marostegui)
[06:34:52] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3246365 (Marostegui) arwiki is done: Differences found on the archive table (the same for all the hosts): ``` Differences on db1086 TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY arwiki.archive 13786 0 1 PRI...
[06:42:54] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3246370 (Marostegui) Open>Resolved a:Marostegui The scope of this ticket (enwiki) is done. All the hosts have the new table schema in place. The only hos...
[06:54:44] I would like to remove replication on s6 eqiad - codfw to start with tag_summary, change_tag and watchlist reimport into codfw.
For that I'd like to run: db1061 -> stop slave; reset slave all;
[06:54:54] So whenever you have time, please confirm that makes sense :)
[06:55:00] ok
[06:56:39] FYI, I'll be removing rpcbind/nfs-common on various analytics hosts today for https://phabricator.wikimedia.org/T106477#3236038 (they're blocked by ferm anyway)
[07:01:44] database hosts ofc
[07:02:27] ok, to me, no issue- as long as it doesn't bring down the servers :-)
[07:02:42] I'm quite sure it won't :-)
[07:13:14] DBA, Operations, Patch-For-Review: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3246422 (Marostegui) Open>Resolved As there were no differences found by compare.py I have repooled this host. Fingers crossed so it doesn't crash anymore!
[07:17:08] jynus: the "ok" was as in: the command and host looks good or as in: I will check and let you know :)
[07:20:38] oh, I forget aviation banned ok as a response :-)
[07:20:48] hahaha
[07:21:29] Yes, we have to pretty much read back all the important information as the controller says it to you
[07:21:42] Warrior three five foxtrot, Boston Tower, runway two two right, cleared for immediate takeoff.
[07:21:47] HAHAHAHAHAHAHAHAH
[07:22:06] clear for takeoff runway 22R, W35F!
[07:22:57] there are also some banned words you cannot say until the controller says it to you first, like takeoff
[07:31:48] jynus: marostegui how does moving forward with cognate today sound? :)
[08:06:43] Blocked-on-schema-change, DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3246538 (Marostegui)
[08:07:06] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 3 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3246539 (Marostegui)
[08:28:11] so marostegui you stopped db1050 in sync with db2028 and are not importing it?
[08:28:16] *now
[08:28:57] yes
[08:29:11] did you do it manually or did you use the script?
[08:29:27] i did it manually
[08:29:32] which script?
[08:29:35] he he
[08:29:49] manually = one liners at least :)
[08:30:05] ./repl.pl --stop-siblings-in-sync
[08:30:08] ah
[08:30:09] yes
[08:30:14] I thought you meant to move tables :)
[08:30:20] no no
[08:30:29] yeah, to stop them, yes, i used that one
[08:30:52] and no issues, I assume (I haven't touched it after GTID for that)
[08:31:05] nope, no issues at all
[08:31:08] they stopped correctly
[08:31:11] on the same position
[08:31:16] good
[08:33:03] one last question, as I won't be here from thursday
[08:33:32] should I try to do the PKs on s6 now or just do it later
[08:33:45] I was going to say, that you can try today, I think i will be done in 1 hour or so
[08:33:46] now == after your imports finish
[08:33:51] So I was planning to leave the replication stopped
[08:33:53] so you can go for s6
[08:34:03] that is my question, s6 no problem
[08:34:04] I will move to another shard, stop replication and continue
[08:34:10] but what about the others
[08:34:23] We can do it in parallel if you want
[08:34:25] ok, so the plan is to not reconnect the replication
[08:34:37] as after all, it will not affect replication itself
[08:34:39] You can take another shard, do it there, and when you are done, I will do my changes
[08:34:48] Yeah, I am not reenabling replication until you are done :)
[08:34:54] no, I am not in a hurry
[08:35:08] I was actually asking the other way- can this take longer?
[08:35:17] sure
[08:35:19] because I won't be here first
[08:35:21] as you wish :)
[08:35:26] and then you will be off one day
[08:35:44] I will leave replication stopped on s6 today if you want.
So you can try to do that one between today and tomorrow
[08:35:47] and we can restore replication
[08:35:53] but I don't really mind :)
[08:36:16] I think we can leave replication stopped for a few days
[08:36:33] I am stopping only replication on the shard I am working on
[08:48:49] just to be clear- I do not need replication stopped- in fact I need it running- just reseted on the codfw->eqiad direction
[08:49:25] you don't need replication stopped?
[08:49:36] I mean stopped on codfw -> eqiad
[08:49:44] I think we are talking about the same thing with different words
[08:49:52] So you need what is currently happening on s6, right?
[08:49:57] eqiad master does not replicate from codfw master
[08:50:23] then yes, I think stopped meant what you are doing on import
[08:50:30] yeah
[08:50:39] stopped means: I have done reset slave all on eqiad master :)
[08:50:51] that doesn't mean stopped to me
[08:50:59] stopped to me means STOP SLAVE;
[08:51:17] ok, let's call it reseted then :)
[08:51:31] reseted for me means RESET SLAVE ALL;
[08:51:41] ok, so then yes, it is reseted :)
[08:51:50] and stopped?
[08:52:02] yes, i ran stop slave; reset slave all
[08:52:29] but I see db2028 stopped
[08:52:39] yes, until the import finishes
[08:52:46] db1061 (eqiad master): stop slave; reset slave all;
[08:52:50] db2028: stop slave
[08:52:54] that is what I do not need
[08:53:06] you can start replication on db2028
[08:53:13] when you finish
[08:53:18] ok, will do as soon as the last import is done (currently running)
[08:53:55] Look, you said " So I was planning to leave the replication stopped"
[08:53:59] that confused me
[08:54:04] haha yeah, I meant reseted
[08:54:06] :)
[08:58:01] import finished on s6, replication started on db2028 and db1050
[08:59:17] DBA, RESTBase, Reading List Service, Services (watching), User-mobrovac: Investigate requirements for MySQL access in RESTBase - https://phabricator.wikimedia.org/T164805#3245958 (GWicke) MediaWiki does indeed have a lot of features for working with MySQL at scale. However, not all of those f...
[09:00:23] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 3 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3246625 (Marostegui) s6 codfw is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in frwiki...
[09:00:33] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 3 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3246626 (Marostegui)
[09:01:12] Blocked-on-schema-change, DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3246628 (Marostegui) s6 in codfw is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in frwiki jawiki ruwiki; do echo $i; mysql --s...
[09:01:29] Blocked-on-schema-change, DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3246629 (Marostegui)
[09:06:11] DBA, RESTBase, Reading List Service, Services (watching), User-mobrovac: Investigate requirements for MySQL access in RESTBase - https://phabricator.wikimedia.org/T164805#3246636 (GWicke) /cc @aaron @jcrespo
[09:09:38] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3246644 (Marostegui) cawiki is done Some of the hosts (including dbstore1002) have a few differences on the following tables: ``` archive logging revision user ```
[09:09:40] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3246645 (Marostegui)
[09:10:08] DBA, RESTBase, Reading List Service, Services (watching), User-mobrovac: Investigate requirements for MySQL access in RESTBase - https://phabricator.wikimedia.org/T164805#3246646 (jcrespo) For a "new" service with modest number of writes and no current concerns for cross-dc consistency, we ca...
[09:12:56] jynus: I want to reset replication on codfw -> eqiad for s4, so I want to run: db1068: stop slave; reset slave all
[09:13:00] confirm if that looks good
[09:13:34] I confirm it looks good
[09:13:41] thanks :)
[09:21:52] DBA, RESTBase, Reading List Service, Services (watching), User-mobrovac: Investigate requirements for MySQL access in RESTBase - https://phabricator.wikimedia.org/T164805#3246657 (jcrespo) The above is storage-wise. Architecture wise (and I have no power here), using storage as a shared mecha...
[09:27:48] DBA, Operations, Phabricator, ops-eqiad: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3246659 (Marostegui) This just happened again: ``` root@db1048:~# megacli -AdpBbuCmd -a0 BBU status for Adapter: 0 BatteryType: BBU Battery State: Unknown Battery backup ch...
[09:34:22] DBA, Operations, Phabricator, ops-eqiad, Patch-For-Review: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3246675 (Marostegui) And after the manual relearn it is back to normal state: ``` root@db1048:~# megacli -ldinfo -l0 -a0 | grep Policy Default Cache Policy...
[10:52:34] DBA, RESTBase, Reading List Service, Services (watching), User-mobrovac: Investigate requirements for MySQL access in RESTBase - https://phabricator.wikimedia.org/T164805#3247611 (GWicke) @jcrespo: I think we are all on the same page that a single service should exclusively own and access its...
[10:56:09] DBA, RESTBase, Reading List Service, Services (watching), User-mobrovac: Investigate requirements for MySQL access in RESTBase - https://phabricator.wikimedia.org/T164805#3247614 (jcrespo) That is the part that I would ask to the platform team 0:-)
[11:19:05] Blocked-on-schema-change, DBA, Patch-For-Review: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3247664 (Marostegui) s4 in codfw is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql --skip-ssl -hdb2019.cod...
[11:19:20] Blocked-on-schema-change, DBA, Patch-For-Review: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3247665 (Marostegui)
[11:20:31] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 4 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3247666 (Marostegui) s6 in codfw is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql --skip...
[11:20:55] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 4 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3247680 (Marostegui)
[11:29:38] I am going to reset replication on s5 codfw-> eqiad so: db1063: stop slave ; reset slave all
[11:34:31] good
[11:34:52] cheers!
[12:30:00] hi
[12:31:15] nevermind, want to ask about https://phabricator.wikimedia.org/T162539 (wb_terms) but see it's being done in codfw now
[12:32:56] yes, there is only one host pending, which is being done at the moment
[12:33:06] cool
[12:33:09] thanks :)
[12:33:09] Once it is done, I will double check again it is in the whole wikidata shard
[12:33:12] and close the ticket
[12:33:13] ok
[13:17:41] DBA, Operations, Patch-For-Review, Prometheus-metrics-monitoring: MySQL monitoring with prometheus - https://phabricator.wikimedia.org/T143896#3247903 (jcrespo)
[13:21:57] DBA, Operations: In some database hosts, performance schema loses digest statistics - https://phabricator.wikimedia.org/T164834#3247935 (jcrespo)
[13:23:09] DBA, Operations: In some database hosts, performance schema loses digest statistics - https://phabricator.wikimedia.org/T164834#3247953 (jcrespo) Only added volans because once we didn't see a query being registered, and this is the explanation why- but it should be fixed (you do not need to do anything,...
[13:24:29] jynus: thanks! ^^^
[13:24:58] I saw this while debugging x1
[13:25:24] the thing is that we limit digests to 10000 thinking no server will do more than 10000 "essentially" different queries
[13:25:37] we actually do it :-)
[13:26:04] but giving more entries is more memory usage, so we will see if we clean up after X iterations or what
[14:03:24] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3248114 (Marostegui) eswiki is done.
Some of the hosts (including dbstore1002) have a few differences on the following tables: ``` archive ``` dbstore1002, has differences on: ``` archive geo_tags revision ```
[14:03:33] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3248115 (Marostegui)
[14:08:18] DBA, Operations, Phabricator, ops-eqiad, Patch-For-Review: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3248124 (Cmjohnson) @Marostegui Let me know if you want to do the bbu swap today?
[14:35:34] DBA, Operations, Phabricator, ops-eqiad, Patch-For-Review: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3248207 (Marostegui) Open>Resolved @Cmjohnson has changed the battery, we will see how it goes. ``` root@db1048:~# megacli -AdpBbuCmd -a0 BBU stat...
[14:36:57] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3248215 (Cmjohnson)
[14:36:58] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1040 - https://phabricator.wikimedia.org/T164057#3248213 (Cmjohnson) Open>Resolved wiped, removed from rack, switch was already updated, racktables updated.
[14:37:10] DBA, Operations, Phabricator, ops-eqiad, Patch-For-Review: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3248218 (Marostegui) I am going to leave m3-slave pointing to the codfw master, until tomorrow just in case. If the host goes fine overnight, I will revert...
[14:43:20] ^if it is because you will be gone soon, I can do it
[14:43:44] I do not like the idea of having cross-dc queries for long
[14:43:54] jynus: no no, it is because I just want to make sure the host has no issues with the new BBU
[14:44:10] jynus: but if you are going to be working till late (YOU SHOULD NOT!) feel free to do it!
[14:44:13] what better than a bit of load :-)
[14:44:14] haha
[14:44:25] if the reports fail, they can be run again
[14:44:38] that is true
[14:44:43] ok, i will revert in a bit then :)
[14:45:05] and better having a bit of load than nothing
[14:52:21] reverted!
[15:15:11] DBA, Labs, MediaWiki-extensions-Linter, Patch-For-Review: Make "linter" table available on Labs - https://phabricator.wikimedia.org/T160611#3248410 (Andrew) Open>Resolved Now fixed on 1001 as well.
[15:16:15] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 4 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3248415 (Marostegui) s5 in codfw is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in dewi...
[15:16:44] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 4 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3248416 (Marostegui)
[15:17:49] Blocked-on-schema-change, DBA, Patch-For-Review: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3248418 (Marostegui) s5 in codfw is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in dewiki wikidatawiki;...
[15:18:00] Blocked-on-schema-change, DBA, Patch-For-Review: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3248419 (Marostegui)
[16:05:53] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3248673 (Ladsgroup) Hey @Marostegui, I can't see db2023 (master in codfw) gets deployed and repo...
[16:17:59] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3248709 (jcrespo) Please @aude @Ladsgroup don't stress Marostegui! :-) As you can see he is curr...
[16:21:18] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3248711 (Ladsgroup) My apologies. I was a little bit confused about the codfw master. Thanks for...
[18:02:49] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3249014 (Marostegui) >>! In T162539#3248673, @Ladsgroup wrote: > Hey @Marostegui, I can't see db...
[19:18:33] DBA, Tracking: Cleanup x1 database connection patterns - https://phabricator.wikimedia.org/T164504#3249230 (Catrope)
[19:21:09] DBA, ArchCom-RfC, MediaWiki-Database, RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3249238 (Reedy) >>! In T161232#3159527, @tstarling wrote: > Yes, I think it needs an initial wikitech-l announcement, then RFC meeting, then last call announcemen...
[19:22:18] DBA, ArchCom-RfC, MediaWiki-Database, RfC: Should we bump minimum supported MySQL Version to 5.5?
- https://phabricator.wikimedia.org/T161232#3249241 (Reedy)
[20:22:40] DBA, MediaWiki-Database, MediaWiki-Recent-changes: Special:RecentChanges can show recently created page as red link - https://phabricator.wikimedia.org/T129399#3249506 (Krinkle)
[22:23:19] DBA, Labs, wikitech.wikimedia.org: Drop Semantic Database tables from wikitech wikis - https://phabricator.wikimedia.org/T164887#3250009 (Reedy)
[22:23:28] DBA, Labs, wikitech.wikimedia.org: Drop Semantic Database tables from wikitech wikis - https://phabricator.wikimedia.org/T164887#3250022 (Reedy)
[22:26:02] DBA, Labs, wikitech.wikimedia.org: Drop Semantic Database tables from wikitech wikis - https://phabricator.wikimedia.org/T164887#3250030 (demon)
[22:56:19] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3250098 (demon) p:Triage>Low
[23:02:52] DBA, Labs, wikitech.wikimedia.org: Drop Semantic Database tables from wikitech wikis - https://phabricator.wikimedia.org/T164887#3250110 (bd808)
[23:15:20] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1040 - https://phabricator.wikimedia.org/T164057#3220228 (Dzahn) is still in puppet site.pp and Icinga
[23:15:27] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3250128 (Dzahn)
[23:15:29] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1040 - https://phabricator.wikimedia.org/T164057#3250127 (Dzahn) Resolved>Open
[23:35:39] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3250147 (Dzahn)
[23:35:41] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1040 - https://phabricator.wikimedia.org/T164057#3250146 (Dzahn) Open>Resolved
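
Note on the 08:48-08:54 exchange: the confusion there was between "stopped" (STOP SLAVE, the replication configuration is kept and the threads are halted) and "reseted" (RESET SLAVE ALL, the replication configuration is removed). A minimal sketch of how the two states could be told apart from `SHOW SLAVE STATUS` output follows. It assumes each host's result has already been fetched as a single row dictionary, or `None` when the statement returns no rows. The function name `replication_state` and the sample rows are hypothetical and are not taken from repl.pl or any other tool mentioned in the log.

```python
from typing import Mapping, Optional


def replication_state(status_row: Optional[Mapping[str, str]]) -> str:
    """Classify a host from one SHOW SLAVE STATUS row (None or empty if no row)."""
    if not status_row:
        # After RESET SLAVE ALL the connection settings are gone,
        # so SHOW SLAVE STATUS returns no row at all.
        return "reset"
    io_running = status_row.get("Slave_IO_Running") == "Yes"
    sql_running = status_row.get("Slave_SQL_Running") == "Yes"
    if io_running and sql_running:
        return "running"
    if not io_running and not sql_running:
        # After STOP SLAVE the row is still there, with both threads reported as not running.
        return "stopped"
    return "partially running"


# Hypothetical rows matching the situation described at 08:52:
# db1061 had "stop slave; reset slave all", db2028 only had "stop slave".
hosts = {
    "db1061": None,
    "db2028": {"Slave_IO_Running": "No", "Slave_SQL_Running": "No"},
}
states = {host: replication_state(row) for host, row in hosts.items()}
print(states)
```

Reporting the state with one of these fixed words, rather than free text, is in the same spirit as the read-back joke at 07:21: both sides use the same term for the same condition.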