[05:51:22] PROBLEM - MariaDB sustained replica lag on s8 on db2195 is CRITICAL: 17.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2195&var-port=9104
[05:51:26] PROBLEM - MariaDB sustained replica lag on s8 on db2167 is CRITICAL: 12.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2167&var-port=9104
[05:52:22] RECOVERY - MariaDB sustained replica lag on s8 on db2195 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2195&var-port=9104
[05:52:24] RECOVERY - MariaDB sustained replica lag on s8 on db2167 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2167&var-port=9104
[07:03:13] Amir1: I am going to set up pc7 today
[07:03:19] You ok with that?
[08:20:28] urbanecm: Let's move the talk here, as it may be easier :)
[08:20:35] sure!
[08:20:56] 09:20 urbanecm: Some of the tables already have that definition. Is that expected?
[08:20:58] urbanecm: This is ruwiki https://phabricator.wikimedia.org/P72150
[08:21:16] And this is the change https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1112241/4/sql/mysql/patch-modify_gemm_mentee_is_active_mwtinyint.sql
[08:21:30] So it is already in place?
[08:24:15] marostegui: i have to admit i'm not 100% sure how boolean/mwtinyint from the abstract layer propagates down to SQL statements. if the table already looks like the alter says, _maybe_ this is actually only a schema change from MediaWiki's pov?
[08:24:53] urbanecm: Looks like, in any case, there's no harm in re-running it, so I will let it go in case some others do not have it
[08:25:52] thank you
[08:26:10] any time!
[09:48:06] marostegui: thank you for pc7 🥳
[09:48:18] Let's do this
[09:48:33] :)
[12:37:47] Amir1: pc7 is live
[12:37:49] I will send an email later
[12:37:57] But for now I am monitoring
[12:56:27] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on db1245:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:41:27] RESOLVED: SystemdUnitFailed: prometheus-mysqld-exporter.service on db1245:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:00:07] :((((((((( https://phabricator.wikimedia.org/T123313#2232274
[14:09:08] oops
[14:13:48] <_joe_> I mean the "oops" is all on mediawiki's side :P
[18:27:24] PROBLEM - MariaDB sustained replica lag on s2 on db2148 is CRITICAL: 290 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2148&var-port=9104
[18:31:11] ^ I fixed this one but I won't depool and upgrade it yet since we have depooled hosts in s2 plus there is another incident ongoing and other stuff, I let things first stabilize before further actions
[18:32:24] RECOVERY - MariaDB sustained replica lag on s2 on db2148 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2148&var-port=9104