[06:19:30] <taavi>	 morning! I'm trying to create wiki replicas views for a new(ish) wiki on https://phabricator.wikimedia.org/T415977, but maintain-dbusers tells me 's5.urwikisource_p does not exist to create views'.. did something go wrong with the sanitization process?
[07:06:01] <federico3>	 I'll take a look in a bit
[08:05:03] <federico3>	 I'm not sure what could be missing, I tried the sanitization and it ran ok. @marostegui any idea?
[08:05:24] <marostegui>	 federico3: context?
[08:07:46] <federico3>	 @taavi's question around urwikisource
[08:09:03] <marostegui>	 federico3: have you checked if the script created the _p after running it?
[08:12:39] <federico3>	 @marostegui the sanitize wiki script? Yes, the database exists, since the script does a 'CREATE DATABASE IF NOT EXISTS urwikisource_p;' - yet I'm not seeing tables, e.g. on an-redacted
[08:13:16] <marostegui>	 federico3: the tables are created by taavi's script, the sanitization script only creates _p database
[08:13:46] <federico3>	 but I'm not seeing the database on a s5 replica host
[08:13:55] <marostegui>	 I do
[08:13:57] <marostegui>	 on both
[08:15:14] <federico3>	 oh wait, I'm seeing a database urwikisource on db1159 (without _p)
[08:15:25] <taavi>	 yeah, the database was missing when I asked, now it's there and our part ran just fine, thanks
[08:15:33] <marostegui>	 federico3: the _p database only needs to exist on wikireplicas
[08:23:10] <federico3>	 can I reboot db1215? it will give zarcillo a short downtime
[08:23:32] <marostegui>	 federico3: and orchestrator
[08:27:17] <Emperor>	 Morning folks, can I get a +1 on two changes, please? First https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298665 to remove 2 drained nodes from the rings, and then https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298666 to set them up to be reimaged into new-style storage.
[09:03:17] <Emperor>	 thanks :)
[09:32:24] <marostegui>	 federico3: https://phabricator.wikimedia.org/T427357 you aware - anything needed? I see no comment from "our" side
[09:33:15] <federico3>	 I'm aware, there's only db2241 needing depooling as listed
[09:33:34] <marostegui>	 cool
[10:29:46] <federico3>	 jynus: we have a few m* section hosts to be updated for https://phabricator.wikimedia.org/T426633 - can I help with the reboots?
[10:30:03] <federico3>	 (according to https://wikitech.wikimedia.org/wiki/MariaDB/Upgrading_a_section#Updating_Misc_(m)_sections )
[10:30:16] <jynus>	 I don't have anything to do with m sections
[10:30:40] <jynus>	 I only request not to affect the services I own, like bacula
[10:31:21] <jynus>	 I am a user, like any other service
[10:32:43] <federico3>	 if you can let me know a time when the hosts are not doing backup related activities I can reboot them
[10:33:04] <jynus>	 now, and until 0 UTC would be a good time
[10:33:09] <federico3>	 thanks
[10:34:10] <jynus>	 from 0-8h is usually when database backups run and bacula runs the most important backups
[10:35:35] <marostegui>	 federico3: you can probably avoid the reboots there, I am upgrading them to Trixie
[10:35:44] <federico3>	 ok thanks
[12:43:26] <marostegui>	 Amir1:  do you want to have x4 ready in q1?
[12:43:39] <Amir1>	 Q4 of this year :P
[12:43:45] <Amir1>	 or early Q1
[12:43:50] <marostegui>	 so in 22 days? XD
[12:43:56] <Amir1>	 gonna be fun. I know
[12:43:58] <marostegui>	 thanks for the heads up XD
[12:44:05] <marostegui>	 I will try to get it done
[12:44:10] <Amir1>	 we can go early Q1 too
[12:44:15] <marostegui>	 let me add it
[12:44:26] <Emperor>	 Hi folks - can I get a +1 to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298773 to restore the two reimaged nodes to the rings, please?
[12:58:13] <Emperor>	 thanks :)
[13:23:58] <jinxer-wm>	 FIRING: SystemdUnitFailed: swift_rclone_sync.service on ms-be1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:03] <Emperor>	 That's 宫锁珠帘.jpg and Medical_Heritage_Library_(IA_b21532552).pdf - first was losing a race with a delete, the latter with a rename.
[13:33:58] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: swift_rclone_sync.service on ms-be1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:04:15] <jynus>	 federico3: review and ok to perform a switchover on backup1-codfw? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298816
[15:08:06] <jynus>	 I've added it here, not sure if it is the right place: https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance#Today_%282026-06-08%29
[15:15:03] <jynus>	 apparently not, as it gets overwritten by the bot
[15:19:57] <jynus>	 apparently manual log doesn't work either
[15:20:49] <jynus>	 I will proceed because I haven't seen anyone saying not to, and I already gave a heads up during the meeting
[15:47:32] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[15:48:10] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on s2 on db2175 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[15:48:32] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on s2 on db1182 is CRITICAL: 12 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[15:48:46] <jynus>	 s2 issues?
[15:49:06] <marostegui>	 they look fine in orchestrator, so maybe a spike
[15:49:08] <jynus>	 orch looks good
[15:49:12] <jynus>	 yeah
[15:49:32] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on s2 on db1155 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[15:50:10] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on s2 on db2175 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[15:50:32] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on s2 on db1182 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[15:53:50] <jynus>	 there seem to have been 2 recent lag spikes: at 13:35 and at 15:42, maybe a smaller one at 15:42
[15:54:18] <jynus>	 last one I meant 15:25
[22:09:13] <Amir1>	 one could go over binlogs to see what write is making s2 choke but I'm doing the follow ups for the weekend pa.ge right now
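Note on the s2 alerts above: the check output "10 ge 10" reads as "value 10 is greater than or equal to the critical threshold 10", and the recovery text "(C)10 ge (W)5 ge 0" reads as "critical 10 >= warning 5 >= value 0". A minimal sketch of that comparison follows. The function name `classify_lag` is hypothetical and this is not the actual check implementation; only the thresholds (5 and 10) and the "ge" comparison are taken from the alert text.

```python
def classify_lag(lag, warning=5, critical=10):
    """Classify a replica lag value using 'greater or equal' thresholds.

    Thresholds default to the (W)5 and (C)10 values shown in the alert
    text; the function itself is an illustrative assumption, not the
    real monitoring check.
    """
    if lag >= critical:
        return "CRITICAL"
    if lag >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Values seen in the log: 10 and 12 fired as CRITICAL, 0 recovered as OK.
    for value in (10, 12, 0):
        print(value, classify_lag(value))
```

Because the comparison is "ge", a lag value exactly at the threshold already fires, which matches db1155 and db2175 alerting at 10.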
[22:55:58] <jinxer-wm>	 FIRING: SystemdUnitFailed: rsync.service on ms-be2063:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:58:58] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: rsync.service on ms-be2063:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed