[11:18:58] arnaudb: Any objections to doing T374087 right now?
[11:18:59] T374087: Switchover s6 master (db2129 -> db2214) - https://phabricator.wikimedia.org/T374087
[11:24:57] https://phabricator.wikimedia.org/T374087#10120947 \o/
[12:12:17] \o/
[12:12:19] sorry I was AFK
[14:28:56] there is a problem with replication on db1154 (s5 only)
[14:29:11] "Could not execute Write_rows_v1 event on table srwiki.recentchanges; Index for table 'recentchanges' is corrupt; try to repair it"
[14:34:24] db1240 mswiktionary pagelinks 9/5/2024
[14:34:34] I've also had that one on s3
[14:34:44] dhinus: I'll check in a few moments
[14:35:20] arnaudb: thanks, no rush
[14:35:22] those are noted in the tracking gsheet ↑
[14:35:28] https://docs.google.com/spreadsheets/d/1uZFy9BqMUug14h899cU3-m4pnkrARl-ElV3fGbE7AXQ/edit?gid=0#gid=0
[14:37:51] dhinus: table is rebuilding
[14:37:57] replication resumed
[14:38:05] cheers :)
[14:38:36] PROBLEM - MariaDB sustained replica lag on s5 on db1154 is CRITICAL: 2.978e+04 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13315
[14:46:46] let me check
[14:47:28] fixed now already, awesome
[14:48:48] FIRING: MysqlReplicationLag: MySQL instance db1154:13315 has too large replication lag (13m 6s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1154&var-port=13315 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[14:53:36] RECOVERY - MariaDB sustained replica lag on s5 on db1154 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13315
[14:53:48] RESOLVED: MysqlReplicationLag: MySQL instance db1154:13315 has too large replication lag (13m 6s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1154&var-port=13315 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[15:13:35] I've had to disable semi_sync repl on db2225 for it to catch up on its replag after clone, first time it happens right after a clone haha
[15:19:59] annnnd same thing on db2125, obv
[15:30:13] that's weird
[15:30:24] why semi sync
[15:33:23] btw db2187:3312 has had its replication broken for days now (codfw sanitarium)
[15:33:38] checking
[15:33:45] already fixing it
[15:33:52] ack! sry I missed it!
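[Note] The repair flow described above (stop the broken s5 replica on db1154, rebuild the corrupt srwiki.recentchanges table, resume replication, and temporarily drop semi-sync so the freshly cloned db2225/db2125 could catch up on lag) maps roughly to the MariaDB statements below. This is a minimal sketch of the standard approach, not the exact commands run on these hosts; the ALTER TABLE ... FORCE rebuild and the rpl_semi_sync_slave_enabled toggle are assumptions based on stock MariaDB behaviour, and any depooling/repooling steps are omitted.

    -- On the lagging s5 instance of db1154 (port 13315 per the alerts above):
    SHOW SLAVE STATUS\G                         -- Last_SQL_Error carries the corrupt-index message
    STOP SLAVE;                                 -- halt the replication threads before repairing
    ALTER TABLE srwiki.recentchanges FORCE;     -- rebuild the InnoDB table and its indexes in place
    START SLAVE;                                -- resume replication; Seconds_Behind_Master should drain to 0

    -- On a freshly cloned replica such as db2225, temporarily disabling
    -- semi-sync (as mentioned above) so it can catch up would look like:
    SET GLOBAL rpl_semi_sync_slave_enabled = OFF;
    -- (the IO thread may need a STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;
    --  cycle to pick up the change)
    -- ...once Seconds_Behind_Master reaches 0, re-enable it:
    SET GLOBAL rpl_semi_sync_slave_enabled = ON;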
[15:37:05] and db2197 is the same too
[15:37:08] 1d
[15:37:42] checking
[15:37:50] backup source
[15:37:59] (I think, let me double check)
[15:38:03] arnaudb: already fixed
[15:38:09] ah ack
[15:38:10] thanks
[15:38:11] recentchanges corruption on nlwiki
[15:38:14] urgh
[15:38:30] yeah backup source
[16:35:25] FIRING: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@mgr.moss-be2003.vtjrnj.service on moss-be2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:36:22] That's a false positive from the host being in maintenance mode for network maintenance re T373096
[16:36:23] T373096: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096
[17:10:25] RESOLVED: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@mgr.moss-be2003.vtjrnj.service on moss-be2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
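[Note] The broken replicas flagged in this exchange (db2187:3312 and db2197) had been stopped for about a day before anyone noticed. A quick manual health check on such a host is sketched below, assuming mysql client access to each instance; the statement and column names are standard MariaDB, while the specific hosts and ports come from the log above.

    -- Connect to the suspect instance (e.g. the :3312 instance on db2187) and check:
    SHOW ALL SLAVES STATUS\G
    -- Slave_IO_Running and Slave_SQL_Running should both be "Yes";
    -- a non-empty Last_SQL_Error (such as the recentchanges index corruption above)
    -- means the SQL thread has stopped and the instance is silently falling behind.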