[06:19:30] morning! I'm trying to create wiki replicas views for a new(ish) wiki on https://phabricator.wikimedia.org/T415977, but maintain-dbusers tells me 's5.urwikisource_p does not exist to create views'.. did something go wrong with the sanitization process?
[07:06:01] I'll take a look in a bit
[08:05:03] I'm not sure what could be missing, tried the sanitization and ran ok. @marostegui any idea?
[08:05:24] federico3: context?
[08:07:46] @taavi's question around urwikisource
[08:09:03] federico3: have you checked if the script created the _p after running it?
[08:12:39] @marostegui the sanitize wiki script? Yes the database exists but because the script does a 'CREATE DATABASE IF NOT EXISTS urwikisource_p;' - yet I'm not seeing tables e.g. on an-redacted
[08:13:16] federico3: the tables are created by taavi's script, the sanitization script only creates _p database
[08:13:46] but I'm not seeing the database on a s5 replica host
[08:13:55] I do
[08:13:57] on both
[08:15:14] oh wait, I'm seeing a database urwikisource on db1159 (without _p)
[08:15:25] yeah, the database was missing when I asked, now it's there and our part ran just fine, thanks
[08:15:33] federico3: the _p database only needs to exist on wikireplicas
[08:23:10] can I reboot db1215? it will give zarcillo a short downtime
[08:23:32] federico3: and orchestrator
[08:27:17] Morning folks, can I get a +1 on two changes, please? First https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298665 to remove 2 drained nodes from the rings, and then https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298666 to set them up to be reimaged into new-style storage.
[09:03:17] thanks :)
[09:32:24] federico3: https://phabricator.wikimedia.org/T427357 you aware - anything needed? I see no comment from "our" side
[09:33:15] I'm aware, there's only db2241 needing depooling as listed
[09:33:34] cool
[10:29:46] jynus: we have a few m* section hosts to be updated for https://phabricator.wikimedia.org/T426633 - can I help with the reboots?
[10:30:03] (according to https://wikitech.wikimedia.org/wiki/MariaDB/Upgrading_a_section#Updating_Misc_(m)_sections )
[10:30:16] I don't have anything to do with m sections
[10:30:40] I only request not to affect the services I own, like bacula
[10:31:21] I am a user, like any other service
[10:32:43] if you can let me know a time when the hosts are not doing backup related activities I can reboot them
[10:33:04] now, and until 0 UTC would be a good time
[10:33:09] thanks
[10:34:10] from 0-8h usually database backups and bacula runs most important backups
[10:35:35] federico3: you can probably avoid the reboots there, I am upgrading them to Trixie
[10:35:44] ok thanks
[12:43:26] Amir1: do you want to have x4 ready in q1?
[12:43:39] Q4 of this year :P
[12:43:45] or early Q1
[12:43:50] so in 22 days? XD
[12:43:56] gonna be fun. I know
[12:43:58] thanks for the heads up XD
[12:44:05] I will try to get it done
[12:44:10] we can go early Q1 too
[12:44:15] let me add it
[12:44:26] Hi folks - can I get a +1 to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298773 to restore the two reimaged nodes to the rings, please?
[12:58:13] thanks :)
[13:23:58] FIRING: SystemdUnitFailed: swift_rclone_sync.service on ms-be1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:03] That's 宫锁珠帘.jpg and Medical_Heritage_Library_(IA_b21532552).pdf - first was losing a race with a delete, the latter with a rename.
[13:33:58] RESOLVED: SystemdUnitFailed: swift_rclone_sync.service on ms-be1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:04:15] federico3: review and ok to perform a switchover on backup1-codfw? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298816
[15:08:06] I've added it here, not sure if it is the right place: https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance#Today_%282026-06-08%29
[15:15:03] apparently not, as it gets overwritten by the bot
[15:19:57] apparently manual log doesn't work either
[15:20:49] I will proceed because I haven't seen anyone saying not to, and I already gave a heads up during the meeting
[15:47:32] PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[15:48:10] PROBLEM - MariaDB sustained replica lag on s2 on db2175 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[15:48:32] PROBLEM - MariaDB sustained replica lag on s2 on db1182 is CRITICAL: 12 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[15:48:46] s2 issues?
[15:49:06] they look fine in orchestrator, so maybe a spike
[15:49:08] orch looks good
[15:49:12] yeah
[15:49:32] RECOVERY - MariaDB sustained replica lag on s2 on db1155 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[15:50:10] RECOVERY - MariaDB sustained replica lag on s2 on db2175 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[15:50:32] RECOVERY - MariaDB sustained replica lag on s2 on db1182 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[15:53:50] there seems there was 2 recent lag spikes: at 13:35 and at 15:42, maybe a smaller one at 15:42
[15:54:18] last one I meant 15:25
[22:09:13] one could go over binlogs to see what write is making s2 choke but I'm doing the follow ups for the weekend pa.ge right now
[22:55:58] FIRING: SystemdUnitFailed: rsync.service on ms-be2063:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:58:58] RESOLVED: SystemdUnitFailed: rsync.service on ms-be2063:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed