[02:17:42] Data-Persistence-Backup, SRE-tools, Patch-For-Review: Make recover-dump show the time taken - https://phabricator.wikimedia.org/T277160 (Rohitesh-Kumar-Jain) > If there is a logging feature, it might be nice to log it into logs for data recording Hi @h.krishna, This is a different task: [[ https://...
[05:44:50] DBA: "chemical" major mime type was never added to production database - https://phabricator.wikimedia.org/T277354 (Ladsgroup) I would have been 100% in agreement with you if it were something else but I have a soft spot for science (As a certified science nerd who studied physics in university). I can pick...
[06:06:02] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1162.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202103150605_marostegui_10267.log`.
[06:32:33] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Marostegui) It looks like the host isn't rebooting via PXE - trying to force it manually
[06:42:08] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:42:13] db1162 is a nightmare
[06:42:30] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Marostegui) Resolved→Open @Cmjohnson I am not able to PXE boot the host. Neither via the normal reimage process nor forcing PXE manually with: ` [06:37:22] marostegui@cumin1001:~$ sudo ipmitool -I lanplus -H db1...
[06:55:55] DBA, SRE, Wikimedia-Incident: 14 March 2021 Wikimedia API Outage - https://phabricator.wikimedia.org/T277417 (Legoktm) >>! In T277417#6912139, @RhinosF1 wrote: >>>! In T277417#6912136, @Legoktm wrote: >>> This also brought down any third party wiki using Instant Commons.
>> >> The wikis actually wen...
[07:03:07] DBA, SRE, Wikimedia-Incident: 14 March 2021 Wikimedia API Outage - https://phabricator.wikimedia.org/T277417 (RhinosF1) See subtask @Legoktm
[07:07:07] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1162.eqiad.wmnet'] ` Of which those **FAILED**: ` ['db1162.eqiad.wmnet'] `
[08:29:32] DBA, Data-Services, cloud-services-team (Kanban): mariadb crashed on labsdb1009 - https://phabricator.wikimedia.org/T276980 (Marostegui) labsdb1009 has been replicating fine since Thursday, so I am going to enable notifications and start repooling it.
[08:35:35] DBA, Data-Services, cloud-services-team (Kanban): mariadb crashed on labsdb1009 - https://phabricator.wikimedia.org/T276980 (Marostegui) Open→Resolved a:Marostegui Host repooled. Let's see how it goes with user traffic.
[08:58:09] DBA, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Upgrade firmware on db1136 - https://phabricator.wikimedia.org/T277007 (Marostegui) @Cmjohnson db1136 is now off, you can proceed as needed
[09:07:50] DBA: "chemical" major mime type was never added to production database - https://phabricator.wikimedia.org/T277354 (Marostegui) Sure, that works @Ladsgroup! But I am not going to give this a lot of priority, as we have lots of others which I consider a lot more important!
[09:08:17] DBA: "chemical" major mime type was never added to production database - https://phabricator.wikimedia.org/T277354 (Marostegui) p:Triage→Low
[09:11:43] DBA: "chemical" major mime type was never added to production database - https://phabricator.wikimedia.org/T277354 (Ladsgroup) Sure. That's awesome. Thanks!
[09:18:14] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[09:24:23] marostegui: yeah, close to a month of being out of service
[09:24:33] yeah :(
[09:24:52] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[09:28:28] DBA, Data-Persistence-Backup, Patch-For-Review: recover-mariadb should use logging (logger) to indicate actions taken - https://phabricator.wikimedia.org/T277162 (jcrespo) @Rohitesh-Kumar-Jain Thank you for your contribution. A few comments: * The commit looks fine, but because it has not (yet) imp...
[09:30:43] Data-Persistence-Backup, SRE-tools: transfer.py argument parsing exception - https://phabricator.wikimedia.org/T268258 (jcrespo) a:rafayghafoor→None Hey, @rafayghafoor, please do not claim this task before talking to @Marostegui or me first! :-) Let us know what your plans are before working on...
[09:53:25] Data-Persistence-Backup, SRE-tools, Patch-For-Review: Make recover-dump show the time taken - https://phabricator.wikimedia.org/T277160 (jcrespo) Thanks, adding a test shows good initiative. Please check the comments that both Reedy and I have sent on the patch. One thing I thought would be interest...
[10:01:32] ACKNOWLEDGEMENT - MariaDB sustained replica lag on db1111 is CRITICAL: 4 ge 2 Kormat Already recovered https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1111&var-port=9104
[13:10:32] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat) s3 eqiad progress: [] db1112 sanitarium master [] db1123 master [] db1124 sanitarium [] db1154 sanitarium [] db1157 [] db1166 [] db1171:3313 backup source [] db1175 [] dbstore1004:3313
[13:11:30] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[13:24:10] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[13:25:47] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[13:27:33] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[13:29:46] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[13:32:48] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[13:36:49] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[14:34:07] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[14:35:36] pc2009 average lag over the last 90 days: 4.37s 👀
[14:38:30] kormat: that check is going off loads
[14:42:43] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2.4 ge 2
https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[14:43:10] DBA, SRE, ops-eqiad, Patch-For-Review: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Cmjohnson) @marostegui The mac address for the nic changed, just merged the change. The install should work now. Can you try again and resolve this task when it works please.
[14:43:44] DBA: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Cmjohnson)
[14:43:47] DBA, SRE: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Cmjohnson)
[14:43:49] DBA, SRE: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Cmjohnson)
[14:43:52] DBA, SRE, ops-eqiad, Patch-For-Review: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Marostegui) Thanks @Cmjohnson I will try today or tomorrow morning and will close when done.
[14:44:17] DBA, DC-Ops, SRE, ops-eqiad: Upgrade firmware on db1136 - https://phabricator.wikimedia.org/T277007 (Cmjohnson) Open→Resolved @marostegui updated the BIOS firmware
[14:44:52] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Kormat)
[14:46:48] RhinosF1: yeaah. parsercache is not in a good place, from both a design + operational perspective.
[14:47:01] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[14:49:50] kormat: ack, lots of things aren't
[14:50:22] ain't that the truth :)
[14:51:07] DBA: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Marostegui)
[14:51:09] DBA, SRE: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[14:51:13] DBA, SRE: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[14:51:15] I am glad we don't reimage hosts by default anymore https://phabricator.wikimedia.org/T277007#6913918
[14:51:41] DBA, DC-Ops, SRE, ops-eqiad: Upgrade firmware on db1136 - https://phabricator.wikimedia.org/T277007 (Marostegui) Resolved→Open This host booted from PXE boot, and attempted to reimage itself. Luckily the partman recipe we have didn't delete its data. Did the BIOS upgrade change the defaul...
[14:51:43] DBA, DC-Ops, SRE, ops-eqiad: Upgrade firmware on db1136 - https://phabricator.wikimedia.org/T277007 (Marostegui) @Cmjohnson can you take a look to see if that was the case?
[14:52:19] DBA, SRE, Wikimedia-Incident: 14 March 2021 Wikimedia API Outage - https://phabricator.wikimedia.org/T277417 (RhinosF1)
[14:52:44] DBA, SRE, Wikimedia-Incident: 14 March 2021 Wikimedia API Outage - https://phabricator.wikimedia.org/T277417 (RhinosF1)
[14:54:24] DBA, SRE, ops-eqiad, Patch-For-Review: db1162 crashed - https://phabricator.wikimedia.org/T275309 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1162.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202103151454_...
[15:01:06] @marostegui can you exit the console on db1136
[15:01:08] please
[15:01:10] sure
[15:01:12] one sec
[15:01:21] cmjohnson1: all yours
[15:01:29] thx
[15:16:44] DBA, SRE, ops-eqiad, Patch-For-Review: db1162 crashed - https://phabricator.wikimedia.org/T275309 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1162.eqiad.wmnet'] ` and were **ALL** successful.
[15:17:07] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[15:17:52] DBA, SRE, ops-eqiad, Patch-For-Review: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Marostegui) Open→Resolved db1162 was reimaged nicely Thank you Chris I will clone and repool this host tomorrow.
[15:19:48] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[15:20:04] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[15:20:18] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > Agree, I'd prefer to consume the binlog of a replica. >> Why not using this on c...
[15:26:04] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Kormat) I had a quick look, as mariadb & mysql's GTID implementations are different and inco...
[15:29:24] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) Relevant?
https://debezium.io/documentation/reference/connectors/mysql.html#mysql...
[15:32:52] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > Their roadmap says that they won't look at what's required to support mariadb un...
[15:33:18] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Kormat) I don't think so; our mariadb clusters would count as a "Primary and replica" setup...
[15:39:18] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Marostegui) >>! In T120242#6914026, @Ottomata wrote: >> Agree, I'd prefer to consume the bin...
[15:45:27] marostegui: I am not sure what is going on with db1136, I get it to boot to the disk but then when I try to log in, I get prompted for a password. cmjohnson@db1136.eqiad.wmnet's password:
[15:46:02] cmjohnson1: that's cause it was reimaged :(
[15:46:10] and puppet is failing due to certificates mismatch
[15:46:21] cmjohnson1: but it is no longer booting via PXE then?
[15:46:51] from what I can tell, BIOS is not seeing the disk and only wants to pxe boot
[15:47:12] :(
[15:47:22] the weird thing is everything looks correct...i am going to try and rollback and see if that helps
[15:47:29] ok!
[15:47:30] thanks
[15:48:32] wait that happened to me with another host
[15:49:07] it was because of BIOS misconfiguration - a drive was not set as bootable
[15:49:17] cmjohnson1: ^
[15:49:19] okay
[15:49:20] although if it is a db, with only one visible disk
[15:49:26] probably not the case?
[15:49:50] let me find the ticket just in case
[15:50:43] https://phabricator.wikimedia.org/T274185#6882042
[15:51:17] in particular, it was https://phabricator.wikimedia.org/T274185#6883969
[15:52:55] jynus: but in that case it was still booting from disks, no? just from the wrong RAID, no?
[15:53:24] it had only sdc as bootable, when it should be sda
[15:53:32] I guess not applicable here?
[15:53:41] unless it has sda as not bootable?
[15:54:16] apparently this bios could only have 1 drive as bootable and didn't try other disk after failing one
[15:54:25] drive here in the logical sense
[15:55:02] which is bad, because if a disk fails, even on RAID1, we will likely have to do BIOS changes to get it to boot
[15:58:47] as a followup to our meeting today: db1139 is stretch with s1 and s6
[15:59:10] but db2097 is also buster with s1 and s6
[15:59:17] nice
[15:59:27] so if we had to switch backups, we would only need to logical point it on the config file
[15:59:33] so that would be easy
[16:00:05] this was the whole point of increasing capacity - extra redundancy and easy maintenance
[16:00:15] let me see the next step
[16:00:49] we also have s5, buster
[16:01:44] s2 is not redundant, so we would have to do some moves on that step
[16:02:04] jynus it looks like /dev/sda is bootable
[16:02:09] https://www.irccloud.com/pastebin/UzOheq5u/
[16:02:19] cmjohnson1, so not the same problem
[16:02:34] this is the console output....looks wrong imo but I don't know what
[16:03:52] marostegui, I assume s1, s4, s8 will be the last sections to upgrade?
[16:04:17] jynus: definitely
[16:05:20] marostegui I think the almost re-install may have borked the OS. Hardware wise it looks fine and is booting directly to the disk.
[16:05:51] cmjohnson1: ok, I will reimage it and reboot it a few times after that to make sure it doesn't boot up from pxe again
[16:06:07] I will close the task tomorrow if that works fine, sounds good?
[16:06:20] sounds good
[16:06:35] cool, thanks for your time cmjohnson1
[16:06:39] so I can switch s1 and s5 locations and once we discard the old stretch instances, we have 1 to play with as buffer
[16:07:17] actually, I don't even need to move anything, there is space on db2101 (x1)
[16:19:58] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) Interesting, makes sense. A LooONng time ago when I did MySQL DBA work, to rest...
[16:23:12] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) Ah, if we needed to change the binlog position information used by Debezium, this...
[16:25:26] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > We are in process of simplifying things to ease our operational load. I'm inter...
[16:29:37] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Kormat) From reading through their docs a bit: * Debezium requires [[ https://debezium.io/d...
[16:31:15] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > It also grabs a global read lock on the server it connects to when making an ini...
[16:32:03] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > Debezium requires binlog_format=ROW, which means it cannot connect directly to a...
[16:34:40] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (jcrespo) @Ottomata I am going to interject here, as backups owner. I have 2 needs regarding...
[16:38:52] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) +1 @jcrespo this ticket is about solving the MW event production consistency probl...
[16:48:00] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Joe) Ok I'll try to re-summarize my argument: the problem we're trying to solve is having tr...
[16:53:49] DBA, SRE, Wikimedia-Incident: 14 March 2021 Wikimedia API Outage - https://phabricator.wikimedia.org/T277417 (CDanis)
[18:11:50] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > no analysis of the causes of such inconsistencies is provided. Hm, I guess none...
[18:16:31] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 9.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[18:30:03] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[18:53:21] Blocked-on-schema-change, DBA, AbuseFilter: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052 (Bstorm)
[19:04:52] DBA: Make DB alerts more specific - https://phabricator.wikimedia.org/T277174 (LSobanski)
[19:05:19] DBA, Icinga, observability, Sustainability (Incident Followup): Make primary DB masters page on HOST DOWN alert - https://phabricator.wikimedia.org/T233684 (LSobanski)
[19:05:40] DBA, Analytics-Clusters, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Bstorm) I went ahead and refreshed the view definitions on the host because there have been a few changes...
[19:07:13] DBA: Update DB read_only alert to represent correct state - https://phabricator.wikimedia.org/T277174 (LSobanski)
[19:13:57] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[19:16:17] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[20:20:39] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[20:21:41] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[20:22:41] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[20:25:43] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[20:32:13] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 3.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[20:36:59] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.4
https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[22:20:29] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (Jclark-ctr) db1176 A1 u6 p14 id1751 db1177 A3 u38 p29 id1931 db1178 B1 u25 p16 id4020 db1179 B5 u35 p38 id3356 db1180 C3 u20 p6 id2956 db1181 C5 u16 p17 id1846 db1182 D1 u38 p2...
[22:21:30] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (Jclark-ctr) a:Cmjohnson
[23:35:55] Data-Persistence-Backup, SRE-tools, Patch-For-Review: Make recover-dump show the time taken - https://phabricator.wikimedia.org/T277160 (h.krishna) Thank you for your feedback @jcrespo and Reedy. Happy to make the changes as suggested A few questions, * Regarding decompression, In code review you ha...