[08:17:18] DBA: Remove muswiki and mhwiktionary from s3 - https://phabricator.wikimedia.org/T260112 (Marostegui) Deletion progress: NOT NEEDED labsdb1012 NOT NEEDED labsdb1011 NOT NEEDED labsdb1010 NOT NEEDED labsdb1009 [] dbstore1004:3313 [] db2127 [] db2109 [] db2105 [] db2098 [] db2094:3313 [] db2074 [] db1124:3313...
[08:23:24] https://mariadb.com/kb/en/mariadb-10416-release-notes/ the blackhole fix is included
[08:23:32] I will try to create the packages for it during the week
[08:26:25] oh nice
[08:27:48] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat) >>! In T266483#6601121, @jcrespo wrote: > BTW, check if prometheus exporter daemon needs a shake on restarted host, there is quite a few collection failures showin...
[08:32:15] DBA: Remove muswiki and mhwiktionary from s3 - https://phabricator.wikimedia.org/T260112 (Marostegui) This has been cleaned up on codfw. Waiting a few hours before going for eqiad.
[09:50:41] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat) s6/s7 in codfw are done, excluding the sanitarium masters + sanitariums, and dbstore hosts.
[09:59:04] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[10:00:35] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[10:03:16] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[10:09:06] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[10:33:48] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[10:58:51] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:02:16] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui) >>! In T261717#6598934, @Marostegui wrote: > On-going transfers: > es1011 -> es1026 > es1012 -> es1027 > es1014 -> es1028 > es1026 pooled in es2 es1027 pooled in es1 es1028 pooled...
[11:02:25] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[11:03:07] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui) On-going transfers: es1016 -> es1029 es1013 -> es1030 es1017 -> es1031
[11:12:55] jynus: are you using db2102? i'd like to restart mariadb for T266483
[11:12:56] T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[11:13:12] it's the `test-s1` host
[11:14:43] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:16:42] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:17:09] what's db2102, test backups?
[11:17:40] yes, you can restart it, I am using the test-s1 on eqiad only (1133)
[11:17:45] i don't know. it's in the `test-s1` section, running `mariadb::core_test`
[11:17:47] ok cheers.
[11:18:10] both of those are test-backup hosts
[11:18:42] I will be absent soon due to reasons on my calendar
[11:18:54] 👍
[11:22:21] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:32:27] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:46:21] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:48:54] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:52:32] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:53:43] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:54:38] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[11:55:31] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[12:08:00] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 1742 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[12:08:21] ^ I started replication there
[12:08:23] see -operations
[12:11:29] marostegui: oops, thanks for catching that
[12:11:39] :*
[12:13:27] oh, cute. i can make icinga show me all non-OK services: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=^db2&servicestatustypes=29
[12:13:40] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[12:42:36] DBA, Orchestrator: Investigate moving replicas around with Orchestrator doesn't result on skipped transactions - https://phabricator.wikimedia.org/T267133 (Marostegui) More testing: ` root@dborch1001:~# orchestrator -c move-up -i db1077 2020-11-04 12:41:10 DEBUG Hostname unresolved yet: db1077 2020-11-04...
[13:13:43] back
[13:40:13] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Cmjohnson) a:RobH→Jclark-ctr John, on Thursday can you swap the motherboard out please. The new one is the flex space.
[13:41:50] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) I will have db1139 down and downtimed for a day by Thursday, unless you tell me not to.
[13:45:31] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (kostajh)
[13:47:53] DBA, Orchestrator: Investigate moving replicas around with Orchestrator doesn't result on skipped transactions - https://phabricator.wikimedia.org/T267133 (Marostegui) Some investigation shows strange results, this requires more digging: ` REPLACE /* SqlBagOStuff::updateTable */ INTO `pc200` (keyname,v...
[14:05:22] DBA, Orchestrator: Investigate moving replicas around with Orchestrator doesn't result on skipped transactions - https://phabricator.wikimedia.org/T267133 (Marostegui) Working with GTID this time shows more reliability (expected too): ` root@dborch1001:~# orchestrator -c relocate -i db1077 -d pc1010 2020...
[14:11:36] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (LSobanski) @kostajh what is your preferred time horizon for this DB to be available?
[14:19:39] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (kostajh) @LSobanski is next week a possibility? What options would work for your team?
[14:44:44] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (jcrespo)
[14:45:23] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (jcrespo)
[14:46:38] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (jcrespo) I restarted mysql on all backup source instances, as well as the backup testing host. Will run now a script to double check they set the variable correctly.
[14:48:18] I just thought that report_port wouldn't be set correctly and all the restarts I just did wouldn't work for multi-instance hosts
[14:48:29] but I was happy it reports correctly on the master :-D
[14:48:35] (the port)
[14:48:52] why wouldn't it work for multiinstance hosts?
[14:49:07] I thought it would report port = 3306
[14:49:23] but it doesn't, it works as intended
[14:49:33] ah
[14:49:40] as the master doesn't really know the port of the replica, doesn't use it
[14:49:47] but it does the right thing
[14:50:03] * jynus sighs in relief!
[14:54:53] DBA, Operations: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat)
[15:06:58] DBA, Operations: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat) I did a second reboot while attached to the console. It hung at "Loading ramdisk..." for a minute or two, and then finally booted successfully.
[15:07:12] DBA, Operations: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat)
[15:07:15] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Kormat)
[15:07:30] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Jclark-ctr) @jcrespo thanks please have host down will change mainboard tomorrow
[15:08:40] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) Thanks, will do and report here when done (will do on my -Europe- morning).
[15:35:13] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[15:36:02] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[15:40:21] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (RhinosF1) Can this be unstalled now?
[15:41:38] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (RhinosF1) Should this task be unstalled?
[15:42:27] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat) Stalled→Open It should, fixing :)
[15:44:51] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Kormat) Stalled→Open Unstalling,...
[21:40:37] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Urbanecm)