[03:05:03] RESOLVED: SystemdUnitFailed: swift_rclone_sync.service on ms-be2069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:32:07] uhm what is this error from db2185 on icinga? CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table zarcillo.section_instances: Can't find record in 'section_instances', Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the event's master log db1215-bin.028087, end_log_pos 244238252
[11:32:38] that replication is broken
[11:32:47] because there was something on the master that wasn't on the slave
[11:32:59] let me see
[11:33:15] ah it's the deletion on zarcillo.section_instances because it's replicating from 1215
[11:34:01] also an alert for es2028 and db2190 on icinga
[11:34:14] fixed db2185
[11:34:19] can you check the others?
[11:34:42] I'm chasing db-test2001
[11:34:55] probably check the others as they sound like production
[11:36:27] es2028 has been there for a while and it's acked, is it due to https://phabricator.wikimedia.org/T408407 ?
[11:40:41] yep
[11:40:46] how about the other?
[11:50:48] it's been downtimed by Amir
[11:52:21] so probably a schema change?
[11:52:35] what does the downtime say?
[11:57:31] Maintenance / This service has been scheduled for fixed downtime from 2025-12-14 21:22:29 to 2025-12-17 21:22:29. Notifications for the service will not be sent out during that time period.
[11:57:52] ok then
[13:27:24] fyi particularly for Amir1 - I intend to turn on the postproc cache (additional parsercache) for idwiki this afternoon. it shouldn't have a visible impact on anything, but if it does, you'll know who to blame :)
[13:28:25] Noted. Thanks!
[13:29:58] https://www.irccloud.com/pastebin/voboZTeV/
[13:30:03] can I drop this database in x1?
[13:30:43] https://www.irccloud.com/pastebin/hiPcOEzx/
[13:30:48] and this
[14:03:55] federico3: test-s4 topology has been changed to follow our active DC convention
[14:04:09] yay, thanks!
[15:11:05] https://www.tomshardware.com/pc-components/storage/sphotonix-pushes-5d-glass-storage-toward-data-center-pilots seriously persistent data :)
[15:57:32] Amir1: did you and joe have any discussions about hypothesis WE5.4.7 (thumbnail sizes) last week that I should know about? I got an email from him on the 5th suggesting that was likely...
[16:25:03] FIRING: SystemdUnitFailed: swift_rclone_sync.service on ms-be2069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:30:03] RESOLVED: SystemdUnitFailed: swift_rclone_sync.service on ms-be2069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed