[00:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260210T0000) [00:04:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 19.79% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:06:00] cmede: Kemayo: should be live now, sorry that it was a bit chaotic today:) [00:06:17] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1238033|Revert^2 "EditCheck: add instrumentation for checks seen during edit session" (T413419 T412334)]], [[gerrit:1224793|Add MultiTitle to extension list (T404461)]] (duration: 39m 37s) [00:06:23] T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419 [00:06:23] T412334: Add instrumentation to measure suggestion visibility within VisualEditor - https://phabricator.wikimedia.org/T412334 [00:06:24] T404461: Enable Extension:MultiTitle on tok.wikipedia.org - https://phabricator.wikimedia.org/T404461 [00:06:42] (03CR) 10Zabe: [C:03+2] Add config variable for MultiTitle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224794 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [00:07:01] (03CR) 10Zabe: [C:03+2] Reenable MostCategories on frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238025 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [00:07:27] FIRING: HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlstaging@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=kserve - 
https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [00:07:35] (03Merged) 10jenkins-bot: Add config variable for MultiTitle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224794 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [00:07:50] (03Merged) 10jenkins-bot: Reenable MostCategories on frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238025 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [00:08:30] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1224794|Add config variable for MultiTitle (T404461)]], [[gerrit:1238025|Reenable MostCategories on frwiki (T413362)]] [00:08:35] T413362: Move Mostcategories computation to Hadoop - https://phabricator.wikimedia.org/T413362 [00:10:28] thank you zabe for the help! [00:12:33] !log zabe@deploy2002 tbodt, zabe: Backport for [[gerrit:1224794|Add config variable for MultiTitle (T404461)]], [[gerrit:1238025|Reenable MostCategories on frwiki (T413362)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[00:12:37] T404461: Enable Extension:MultiTitle on tok.wikipedia.org - https://phabricator.wikimedia.org/T404461 [00:13:59] !log zabe@deploy2002 tbodt, zabe: Continuing with sync [00:19:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.61% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:19:45] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.88% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:20:11] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1224794|Add config variable for MultiTitle (T404461)]], [[gerrit:1238025|Reenable MostCategories on frwiki (T413362)]] (duration: 11m 41s) [00:20:16] T404461: Enable Extension:MultiTitle on tok.wikipedia.org - https://phabricator.wikimedia.org/T404461 [00:20:16] T413362: Move Mostcategories computation to Hadoop - https://phabricator.wikimedia.org/T413362 [00:20:45] (03CR) 10Zabe: [C:03+2] Enable MultiTitle on beta cluster testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224795 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [00:20:45] (03CR) 10Zabe: [C:03+2] Load MultiTitle on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [00:21:36] (03Merged) 10jenkins-bot: Enable MultiTitle on beta cluster testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224795 
(https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [00:21:44] (03Merged) 10jenkins-bot: Load MultiTitle on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [00:25:06] The increase of used php-fpm workers somewhat matches the deployment of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/1238033 but not sure if it is related [00:25:55] Well, I mean, the timing of the deployment more or less fits. [00:26:25] zabe: we also normally get a spike right at 00:00 UTC lately, there's a task about it someplace [00:28:03] ah right, it pretty much looks like yesterday when I look at the 2 day graph [00:28:04] https://phabricator.wikimedia.org/T416567 -> https://phabricator.wikimedia.org/T416616 [00:28:29] thank you for looking at it though :) [00:31:34] no problem:) [00:33:08] (03CR) 10Zabe: [C:04-2] Start reading from il_target_id on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238010 (https://phabricator.wikimedia.org/T413669) (owner: 10Zabe) [00:34:45] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.97% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:40:05] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1238064 [00:40:05] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1238064 (owner: 10TrainBranchBot) [00:46:24] (03CR) 10Eevans: [C:03+2] restbase: new host (refresh) restbase2039 [puppet] - 10https://gerrit.wikimedia.org/r/1237956 
(https://phabricator.wikimedia.org/T416538) (owner: 10Eevans) [00:52:14] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1238064 (owner: 10TrainBranchBot) [00:55:35] (03CR) 10Zabe: [C:03+2] "This is quite an experimental step, I tested the new file table on beta, but there are still a lot of caveats, so I might need to revert t" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1236870 (https://phabricator.wikimedia.org/T416548) (owner: 10Zabe) [00:56:23] (03Merged) 10jenkins-bot: Start reading from file table on testwikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1236870 (https://phabricator.wikimedia.org/T416548) (owner: 10Zabe) [00:57:14] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1236870|Start reading from file table on testwikis (T416548)]] [00:57:18] T416548: Start reading from file table on wmf production - https://phabricator.wikimedia.org/T416548 [00:59:08] !log zabe@deploy2002 zabe: Backport for [[gerrit:1236870|Start reading from file table on testwikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [00:59:18] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [01:00:12] !log zabe@deploy2002 Sync cancelled. [01:00:16] ... [01:00:53] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1236870|Start reading from file table on testwikis (T416548)]] [01:02:46] !log zabe@deploy2002 zabe: Backport for [[gerrit:1236870|Start reading from file table on testwikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. [01:02:50] T416548: Start reading from file table on wmf production - https://phabricator.wikimedia.org/T416548 [01:03:01] (03CR) 10Dzahn: [C:03+2] gerrit::sshkey: add gerrit-lb IPs to host_aliases ssh key (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237887 (https://phabricator.wikimedia.org/T411895) (owner: 10Jelto) [01:03:05] !log zabe@deploy2002 zabe: Continuing with sync [01:07:11] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1236870|Start reading from file table on testwikis (T416548)]] (duration: 06m 19s) [01:10:13] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1238069 [01:10:14] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1238069 (owner: 10TrainBranchBot) [01:13:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.82% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:16:14] PROBLEM - dump of analytics_meta in eqiad on backupmon1001 is CRITICAL: Last dump for analytics_meta at eqiad (db1208) taken on 2026-02-10 01:09:21 is 1.2 GiB, but the previous one was 1.7 GiB, a change of -28.2 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup [01:18:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.54% 
idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:24:27] 06SRE, 10SRE-Access-Requests: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11599937 (10Dzahn) Hello @AJaved-WMF let's start with the "WMF group" part of your request. This has actually moved to a self-service workflow, the Wikimedia Identity Managemen... [01:34:50] (03PS1) 10Dzahn: admin: add ajaved-wmf to analytics-privatedata, level 1 [puppet] - 10https://gerrit.wikimedia.org/r/1238070 (https://phabricator.wikimedia.org/T416922) [01:37:01] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1238069 (owner: 10TrainBranchBot) [01:41:56] 06SRE, 10SRE-Access-Requests, 06collaboration-services, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11600058 (10Dzahn) [01:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [02:01:09] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image [02:09:18] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:10:07] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.46.0-wmf.15 [core] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1238071 
(https://phabricator.wikimedia.org/T413806) [02:10:09] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/1.46.0-wmf.15 [core] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1238071 (https://phabricator.wikimedia.org/T413806) (owner: 10TrainBranchBot) [02:13:47] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 38s) [02:19:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.18% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [02:24:40] (03Merged) 10jenkins-bot: Branch commit for wmf/1.46.0-wmf.15 [core] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1238071 (https://phabricator.wikimedia.org/T413806) (owner: 10TrainBranchBot) [02:34:18] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:35:13] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:43:30] (03PS1) 10Kosta Harlan: Remove A/B test for hCaptcha editing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238072 (https://phabricator.wikimedia.org/T410354) [02:44:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.11% idle - https://bit.ly/wmf-fpmsat - 
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [02:52:00] (03PS1) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [02:55:38] (03PS2) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [02:59:55] (03PS3) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [03:00:05] Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260210T0300) [03:00:45] (03PS4) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [03:02:58] (03PS1) 10Bartosz Dziewoński: Configure rate limit class for local and global bots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234538 (https://phabricator.wikimedia.org/T415588) [03:03:58] (03PS5) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [03:05:10] (03PS1) 10Clare Ming: Update reference to Metrics Platform to Test Kitchen for hcaptcha experiment. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238074 (https://phabricator.wikimedia.org/T407904) [03:06:15] (03Abandoned) 10Clare Ming: Update reference to Metrics Platform to Test Kitchen for hcaptcha experiment. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238074 (https://phabricator.wikimedia.org/T407904) (owner: 10Clare Ming) [03:06:43] (03PS6) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [03:07:23] (03Restored) 10Clare Ming: Update reference to Metrics Platform to Test Kitchen for hcaptcha experiment. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238074 (https://phabricator.wikimedia.org/T407904) (owner: 10Clare Ming) [03:09:29] (03PS7) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [03:09:39] (03CR) 10Clare Ming: [C:03+1] "if it's not too much trouble, could line 2191 be updated to use the new TK variable? i.e. `$wmgUseTestKitchen`" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238072 (https://phabricator.wikimedia.org/T410354) (owner: 10Kosta Harlan) [03:11:49] (03PS2) 10Clare Ming: Update reference to Metrics Platform to Test Kitchen for hcaptcha experiment and fix comment typo. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238074 (https://phabricator.wikimedia.org/T407904) [03:11:51] (03PS8) 10Ssingh: bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 [03:13:22] (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8014/co" [puppet] - 10https://gerrit.wikimedia.org/r/1238073 (owner: 10Ssingh) [03:14:27] (03CR) 10Ssingh: [V:03+1] "Not ready for review. It works but I am not very happy with it. Needs more revisions -- that's for tomorrow." 
[puppet] - 10https://gerrit.wikimedia.org/r/1238073 (owner: 10Ssingh) [03:14:42] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [03:24:34] (03CR) 10Ssingh: [V:03+1 C:04-2] bird: support configuration of IPv6-only addresses [puppet] - 10https://gerrit.wikimedia.org/r/1238073 (owner: 10Ssingh) [03:24:44] (03PS2) 10Bartosz Dziewoński: Configure rate limit class for local and global bots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234538 (https://phabricator.wikimedia.org/T415588) [04:00:05] Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260210T0400) [04:02:17] (03PS1) 10TrainBranchBot: testwikis to 1.46.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238076 (https://phabricator.wikimedia.org/T413806) [04:02:20] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by mwpresync@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238076 (https://phabricator.wikimedia.org/T413806) (owner: 10TrainBranchBot) [04:03:09] (03Merged) 10jenkins-bot: testwikis to 1.46.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238076 (https://phabricator.wikimedia.org/T413806) (owner: 10TrainBranchBot) [04:03:40] !log mwpresync@deploy2002 Started scap sync-world: testwikis to 1.46.0-wmf.15 refs T413806 [04:03:43] T413806: 1.46.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T413806 [04:51:54] RECOVERY - MariaDB Replica Lag: s8 on db1154 is OK: OK slave_sql_lag Replication lag: 0.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [04:52:42] RECOVERY - MariaDB Replica Lag: s8 on an-redacteddb1001 is OK: OK 
slave_sql_lag Replication lag: 0.19 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [04:52:46] RECOVERY - MariaDB Replica Lag: s8 on clouddb1020 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [04:52:46] RECOVERY - MariaDB Replica Lag: s8 on clouddb1016 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [04:59:18] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [05:00:05] Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260210T0500) [05:12:55] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P88737 and previous config saved to /var/cache/conftool/dbconfig/20260210-051255-ladsgroup.json [05:13:14] 06SRE, 06Infrastructure-Foundations, 10netops: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11600138 (10Papaul) @ayounsi @cmooney please see the Wikimedia Amsterdam DCs physical layer below if all good before uploading it to Wikitech. I am still working on the IP layer diagra... 
[05:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:23:04] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T410589)', diff saved to https://phabricator.wikimedia.org/P88738 and previous config saved to /var/cache/conftool/dbconfig/20260210-052303-ladsgroup.json [05:23:07] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [05:23:19] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance [05:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [05:47:10] (03PS1) 10Kevin Bazira: ml: correct minor version of vLLM 0.14 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1238235 (https://phabricator.wikimedia.org/T415627) [05:51:16] (03CR) 10Kevin Bazira: ml: chunk torch libs in vLLM 0.14 image (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1237730 (https://phabricator.wikimedia.org/T415627) (owner: 10Kevin Bazira) [06:03:55] 06SRE, 06Data-Platform-SRE (2026.01.23 - 2026.02.13), 13Patch-For-Review: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568#11600183 (10RKemper) Bleh, turned out I'd had a typo in my cumin query, so I'd inverted the hosts: the ones I listed as ne... 
[06:04:33] (03CR) 10Ryan Kemper: "This is confirmed working, so just need code review with respect to style and implementation" [cookbooks] - 10https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568) (owner: 10Ryan Kemper)