[01:28:21] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[01:38:01] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[06:11:45] DBA, decommission-hardware, Patch-For-Review: decommission db1088.eqiad.wmnet - https://phabricator.wikimedia.org/T276025 (Marostegui)
[06:25:03] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[06:28:20] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[06:46:43] DBA: Check for errors on all tables on some hosts - https://phabricator.wikimedia.org/T276742 (Marostegui)
[06:48:52] DBA: Check for errors on all tables on some hosts - https://phabricator.wikimedia.org/T276742 (Marostegui)
[06:50:57] DBA: Check for errors on all tables on some hosts - https://phabricator.wikimedia.org/T276742 (Marostegui) Starting a table check on db1146:3314 after seeing some errors.
[06:52:55] DBA: Check for errors on all tables on some hosts - https://phabricator.wikimedia.org/T276742 (Marostegui)
[06:54:01] DBA: Check for errors on all tables on some hosts - https://phabricator.wikimedia.org/T276742 (Marostegui)
[07:03:42] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[07:03:51] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[07:30:26] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui) s2 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1171 [] db1170 [] db1162 [] db1155 [] db1146 [] db1129 [] db1125 [] db112...
[07:30:28] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui) s2 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1171 [] db1170 [] db1162 [] db1155 [] db1146 [] db1129 [] db1125 [] db1122 [] db1105 [] db1076 [] db1074...
[10:32:19] DBA: Check for errors on all tables on some hosts - https://phabricator.wikimedia.org/T276742 (Marostegui)
[13:17:05] DBA, decommission-hardware, Patch-For-Review: decommission db1088.eqiad.wmnet - https://phabricator.wikimedia.org/T276025 (Marostegui)
[13:24:28] DBA, decommission-hardware, Patch-For-Review: decommission db1088.eqiad.wmnet - https://phabricator.wikimedia.org/T276025 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1088.eqiad.wmnet` - db1088.eqiad.wmnet (**PASS**) - Downtimed host on Icinga...
[13:24:41] DBA, decommission-hardware, Patch-For-Review: decommission db1088.eqiad.wmnet - https://phabricator.wikimedia.org/T276025 (Marostegui) a:Marostegui→wiki_willy This is ready for #dc-ops
[13:24:59] DBA, decommission-hardware, ops-eqiad, Patch-For-Review: decommission db1088.eqiad.wmnet - https://phabricator.wikimedia.org/T276025 (Marostegui)
[13:25:30] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[13:48:39] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[13:50:59] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[14:47:06] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[14:47:49] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) @jcrespo you can take db2151 for the media backups db testing
[14:59:55] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[15:58:01] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[15:58:23] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Cmjohnson) Open→Resolved db1162 is back online - updated netbox and resolving the task
[18:31:14] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (Marostegui) Thanks Chris - I can access the host now. I will reimage it and populate it with data on Monday.
[18:37:11] DBA, SRE, ops-eqiad: Degraded RAID on db1162 - https://phabricator.wikimedia.org/T277316 (Marostegui) Open→Resolved a:Marostegui The backplane of this host was replaced. The RAID is now in optimal status ` root@db1162:~# megacli -LDInfo -Lall -aALL Adapter 0 -- Virtual Drive Informatio...
[20:23:16] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[20:27:24] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[21:01:47] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[21:03:17] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104