[00:12:06] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 3.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:21:26] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:28:26] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:39:58] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:56:14] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[01:03:22] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[04:58:10] 10DBA, 10Patch-For-Review: Migrate codfw sanitarium hosts (db2094/db2095) to Buster and 10.4 - https://phabricator.wikimedia.org/T275112 (10Marostegui) Only s4 and s8 are still being fixed. The rest of the sections are done and replication has been restarted.
[05:02:24] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) We can start with that and then check how it goes.
[05:10:08] 10DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (10Marostegui)
[05:10:26] 10DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (10Marostegui)
[05:10:55] 10DBA, 10OTRS, 10Performance-Team, 10Recommendation-API, 10SRE-tools: Upgrade mysql on db1107 (m2 db master) - https://phabricator.wikimedia.org/T280251 (10Marostegui) 05Open→03Resolved This has been done. The RO timing: start: 05:05:22, stop: 05:05:52.
[05:32:14] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) Pooled db1179 into s3 with minimal weight. If it all goes fine I will slowly pool it fully.
[05:38:31] 10DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (10elukey) >>! In T279281#6971906, @Marostegui wrote: > @elukey can you take care of db1108? > Thanks! >>! In T279281#6971913, @Marostegui wrote: > @elukey can you also please upgrade its kernel for T273281? @razzi can...
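In these alerts, "3.2 ge 2" means the measured lag in seconds (3.2) is greater than or equal to the critical threshold (2); the recovery lines spell out the full ladder, e.g. "(C)2 ge (W)1 ge 0.8" is critical at 2s, warning at 1s, current value 0.8s. A minimal way to eyeball lag by hand on a replica is shown below; this is only a sketch, since the sustained-lag check linked above derives its value from heartbeat data rather than from this counter, which can under-report through intermediate masters.

```sql
-- Rough manual lag check on a MariaDB 10.4 replica (sketch only; the
-- alerting pipeline measures lag differently, via heartbeat records).
SHOW SLAVE STATUS\G
-- Inspect Seconds_Behind_Master in the output. NULL means replication
-- is stopped, not merely lagging.
```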
[05:39:43] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[05:48:03] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s1 moved
[05:48:10] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[06:06:52] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s2 moved
[06:07:02] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[06:33:35] o/ Just FYI again so you're not surprised: the watchlist table in wikidata is now basically one third of what it used to be. It has not shrunk on disk yet, but alter tables might expose it. enwiki is also slightly smaller, but not by that much.
[06:34:09] If someone can get me a list of the wikis with the biggest watchlist tables, I can do more cleanups (maybe from backups?)
[06:35:46] Amir1: Ohhh nice
[06:36:02] I might optimize it to see how much it has been reduced on disk
[06:36:06] sav K¨^P
[06:36:10] Sorry, that was the cat
[06:36:23] awww
[06:36:33] I thought you had a dog
[06:37:20] No, I have a cat; the dog in the photo is a friend's
[06:39:15] aaah
[06:39:44] I will get a dog once I manage to convince Mate :D
[06:49:48] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[06:49:52] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s3 moved
[06:52:30] 10DBA: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) db1179 is being automatically pooled in s3.
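For the request above (a list of the wikis with the biggest watchlist tables), one way to approximate it from a replica is via information_schema. This is a hedged sketch, not whatever tooling was actually used: the sizes it reports are estimates, and after large deletions the freed space sits in DATA_FREE until the table is rebuilt, which is exactly why an OPTIMIZE TABLE afterwards reveals the real on-disk reduction.

```sql
-- Approximate on-disk size of every wiki's watchlist table, biggest
-- first. information_schema sizes are estimates; DATA_FREE shows space
-- not yet reclaimed, pending a table rebuild (e.g. OPTIMIZE TABLE).
SELECT table_schema AS wiki,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb,
       ROUND(data_free / 1024 / 1024 / 1024, 2) AS unreclaimed_gb
FROM information_schema.tables
WHERE table_name = 'watchlist'
ORDER BY (data_length + index_length) DESC
LIMIT 20;
```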
[06:52:39] 10DBA: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui)
[07:04:57] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[07:05:04] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s4 moved
[07:15:39] 10Blocked-on-schema-change, 10DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (10Marostegui)
[07:35:00] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s5 moved
[07:35:11] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[07:45:23] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s6 moved
[07:45:32] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[07:58:08] 10Blocked-on-schema-change, 10DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (10Marostegui)
[07:59:04] Amir1: ^ s6 is done, I am going to give it another 24h to make sure nothing is forcing the index
[07:59:24] oh awesome
[07:59:42] Thanks. We have done some already
[08:00:07] the structure is so much cleaner in general
[08:07:57] 10DBA, 10cloud-services-team (Kanban): Upgrade mysql on db1128 (m5 db master) - https://phabricator.wikimedia.org/T279657 (10ops-monitoring-bot) Icinga downtime set by dcaro@cumin1001 for 2:00:00 1 host(s) and their services with reason: Restarting mysql ` labstore1004.eqiad.wmnet `
[08:33:07] 10DBA, 10cloud-services-team (Kanban): Upgrade mysql on db1128 (m5 db master) - https://phabricator.wikimedia.org/T279657 (10Marostegui) This has been done. Mysql was down from 08:31:00 to 08:31:32. Thanks @dcaro and @aborrero for the support!
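T276292 renames the recentchanges index new_name_timestamp to rc_new_name_timestamp; the 24-hour wait mentioned above is to confirm no query still does FORCE INDEX on the old name, which would start erroring once the old index is gone. A hedged sketch of the change follows. The column list mirrors MediaWiki's recentchanges definition and is an assumption here, not taken from the task, and it assumes a drop-and-add since MariaDB 10.4 lacks RENAME INDEX support.

```sql
-- Hedged sketch of the index rename (assumed column list; drop-and-add
-- because ALTER TABLE ... RENAME INDEX is assumed unavailable on 10.4).
ALTER TABLE recentchanges
  DROP INDEX new_name_timestamp,
  ADD INDEX rc_new_name_timestamp (rc_new, rc_namespace, rc_timestamp);

-- Any query still shaped like this would fail after the rename:
--   SELECT ... FROM recentchanges FORCE INDEX (new_name_timestamp) ...
```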
[08:37:26] 10DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (10Marostegui)
[08:37:34] 10DBA, 10cloud-services-team (Kanban): Upgrade mysql on db1128 (m5 db master) - https://phabricator.wikimedia.org/T279657 (10Marostegui) 05Open→03Resolved a:03Marostegui
[08:39:41] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[08:39:50] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s7 moved
[08:49:07] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui)
[08:49:14] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) s8 moved
[08:49:33] 10DBA, 10Patch-For-Review: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 (10Marostegui)
[08:49:39] 10DBA, 10cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (10Marostegui) 05Open→03Resolved labsdb* hosts are all now running under 10.4 sanitariums.
[08:50:44] 10DBA, 10SRE, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui) db1124 and db1125 can now be reimaged to buster and moved to their final destinations.
[09:21:07] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10Marostegui)
[09:21:20] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10Marostegui) p:05Triage→03Medium
[09:21:31] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10Marostegui)
[09:21:35] 10DBA, 10SRE, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui)
[09:27:00] 10DBA, 10Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1161.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20...
[09:42:19] 10DBA: Migrate sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280315 (10Marostegui)
[09:42:21] 10DBA, 10Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10Marostegui)
[09:42:35] 10DBA, 10Patch-For-Review: Migrate codfw sanitarium hosts (db2094/db2095) to Buster and 10.4 - https://phabricator.wikimedia.org/T275112 (10Marostegui) db2094 is fully done
[09:48:28] 10DBA, 10Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1161.eqiad.wmnet'] ` and were **ALL** successful.
[09:50:55] 10DBA, 10Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10Marostegui) Checking db1161 tables after the reimage.
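"Checking tables after the reimage" can be done several ways; the sketch below uses plain CHECK TABLE, which is only an illustration of the primitive. The actual workflow very likely involves mysql_upgrade and dedicated data-comparison tooling, and the schema and table names here are hypothetical.

```sql
-- Minimal post-reimage sanity check for one table (names illustrative).
-- EXTENDED does a full row-and-index consistency pass, which can take
-- a long time on large tables; run it on a depooled host.
CHECK TABLE enwiki.watchlist EXTENDED;
```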
[09:50:59] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[09:55:12] 10DBA, 10Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1156.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20...
[09:56:22] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[10:17:22] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1156.eqiad.wmnet'] ` and were **ALL** successful.
[10:36:12] 10DBA, 10OTRS, 10Performance-Team, 10Recommendation-API, 10SRE-tools: Upgrade mysql on db1107 (m2 db master) - https://phabricator.wikimedia.org/T280251 (10Krenair) > OTRS: @akosiaris @Krenair Sorry, just saw this. I might not be the best of contacts for OTRS, am just an ordinary user with technical kno...
[10:37:37] 10DBA, 10OTRS, 10Performance-Team, 10Recommendation-API, 10SRE-tools: Upgrade mysql on db1107 (m2 db master) - https://phabricator.wikimedia.org/T280251 (10Marostegui) Thanks @Krenair - I thought you were an OTRS admin :)
[11:06:10] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 3.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[11:10:58] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[12:07:29] PROBLEM - MariaDB sustained replica lag on db2133 is CRITICAL: 3.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2133&var-port=9104
[12:09:09] RECOVERY - MariaDB sustained replica lag on db2133 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2133&var-port=9104
[12:10:04] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2072.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202104191209_marostegui_...
[12:15:21] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2126.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202104191215_marostegui_...
[12:32:33] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2072.codfw.wmnet'] ` and were **ALL** successful.
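The db1107 (m2 master) upgrade referenced above was resolved earlier in the day with a roughly 30-second read-only window. Reduced to its SQL primitive, such a window is just a read_only toggle around the maintenance; this is a hedged sketch only, since the actual procedure involves downtiming, replication checks, and coordination not shown here.

```sql
-- Sketch of a short maintenance window on a master (illustrative only).
-- With read_only=1, clients without SUPER privileges cannot write, so
-- replicas see no new traffic while the maintenance happens.
SET GLOBAL read_only = 1;
-- ... perform the maintenance (e.g. the mysql upgrade) ...
SET GLOBAL read_only = 0;
```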
[12:42:23] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2126.codfw.wmnet'] ` and were **ALL** successful.
[12:48:13] 10DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (10Marostegui) db2072 and db2126 reimaged. Checking their tables now.
[13:26:06] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) db1182 is slowly being pooled in s2.
[13:26:15] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui)
[13:56:31] jynus: question, do you have a rough ETA for when you'll be using the media backup dbs at https://phabricator.wikimedia.org/T275633 ?
[13:58:07] this quarter
[13:58:16] good, thanks!
[13:58:18] do you need them?
[13:58:28] no no
[14:35:36] PROBLEM - MariaDB sustained replica lag on db2072 is CRITICAL: 330 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2072&var-port=9104
[14:45:23] 10DBA, 10decommission-hardware: decommission db1086.eqiad.wmnet - https://phabricator.wikimedia.org/T278229 (10Marostegui)
[15:13:24] RECOVERY - MariaDB sustained replica lag on db2072 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2072&var-port=9104
[15:21:27] marostegui: just to make sure I interpreted your last reply correctly, you/DBAs have no specific preference of when the switchover is?
[15:22:31] legoktm: yes, we do; sobanski is coordinating that
[15:23:12] ack, got it, I'll check with them then
[15:23:17] thank you!
[15:44:33] Tendril is acting weirdly... I am checking
[15:45:07] the stupid global_status tables again
[15:47:02] It should start working fine again in a few minutes
[18:56:46] 10DBA, 10Patch-For-Review, 10Performance-Team (Radar): Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10Krinkle)