[00:25:59] (PS4) Arlolra: Deploy PRV to 19 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1239270 (https://phabricator.wikimedia.org/T417349)
[00:39:09] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1240086
[00:39:09] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1240086 (owner: TrainBranchBot)
[00:44:56] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:49:05] jouncebot: nowandnext
[00:49:05] No deployments scheduled for the next 6 hour(s) and 10 minute(s)
[00:49:05] In 6 hour(s) and 10 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T0700)
[00:49:11] (CR) Zabe: [C:+2] Add small comment pointing to ForeignDBViaLBRepo above file migration [mediawiki-config] - https://gerrit.wikimedia.org/r/1239762 (https://phabricator.wikimedia.org/T416548) (owner: Zabe)
[00:50:05] (Merged) jenkins-bot: Add small comment pointing to ForeignDBViaLBRepo above file migration [mediawiki-config] - https://gerrit.wikimedia.org/r/1239762 (https://phabricator.wikimedia.org/T416548) (owner: Zabe)
[00:50:47] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1239762|Add small comment pointing to ForeignDBViaLBRepo above file migration (T416548)]]
[00:50:51] T416548: Start reading from file table on wmf production - https://phabricator.wikimedia.org/T416548
[00:52:58] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1240086 (owner: TrainBranchBot)
[00:55:05] !log zabe@deploy2002 zabe: Backport for [[gerrit:1239762|Add small comment pointing to ForeignDBViaLBRepo above file migration (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:55:25] !log zabe@deploy2002 zabe: Continuing with sync
[00:58:21] !log Edit Module:Date on various wikis in attempt to mitigate T416616, T416540. Details at https://phabricator.wikimedia.org/T416616#11625838.
[00:58:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:58:26] T416616: Create new cache-friendly lua/parser function for "is today before X date" and "is today after X date" - https://phabricator.wikimedia.org/T416616
[00:58:27] T416540: Mean MediaWiki backend latency increased by 60% between October and December 2025 - https://phabricator.wikimedia.org/T416540
[01:02:03] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239762|Add small comment pointing to ForeignDBViaLBRepo above file migration (T416548)]] (duration: 11m 16s)
[01:02:08] T416548: Start reading from file table on wmf production - https://phabricator.wikimedia.org/T416548
[01:07:17] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11625878 (Papaul)
[01:08:57] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240089
[01:08:57] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240089 (owner: TrainBranchBot)
[01:22:46] (CR) CI reject: [V:-1] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240089 (owner: TrainBranchBot)
[01:32:52] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11625923 (Papaul) @ayounsi I need you input here. et-0/0/1 on cr3/4-ulsfo are connected to asw2-22/23 the goal was to wait until phase 2 to move et-0/0/1 to the...
[01:36:04] SRE, SRE-Access-Requests, Data-Engineering, Data-Engineering-Radar: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11625924 (Dzahn) Open→Resolved
[01:44:55] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:46:32] (CR) Scott French: [C:+1] "Thanks Reuven!" [puppet] - https://gerrit.wikimedia.org/r/1239428 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[01:50:34] (CR) Scott French: [C:+1] "Thanks, Reuven!" [puppet] - https://gerrit.wikimedia.org/r/1239429 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[02:02:51] (CR) Scott French: [C:+1] deployment_server: Make SKIP_DIRS relative to the repo root in charlie (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1239430 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[02:08:20] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:11:28] (CR) Scott French: [C:+1] "Thanks so much for thoughtfully breaking this up the way you did!" [puppet] - https://gerrit.wikimedia.org/r/1239431 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[02:12:12] (CR) Scott French: [C:+1] deployment_server: Add services_dir arg to charlie helper functions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1239428 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[02:31:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[02:33:20] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:36:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[02:37:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88852 and previous config saved to /var/cache/conftool/dbconfig/20260218-023733-marostegui.json
[02:37:39] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[02:39:34] SRE-Sprint-Week-Sustainability-March2023, Beta-Cluster-Infrastructure, DBA, MediaWiki-libs-Rdbms, Epic: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#11626005 (Reedy)
[02:52:40] FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:52:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P88853 and previous config saved to /var/cache/conftool/dbconfig/20260218-025242-marostegui.json
[02:55:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:00:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:07:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P88854 and previous config saved to /var/cache/conftool/dbconfig/20260218-030750-marostegui.json
[03:14:39] FIRING: [2x] CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (195.200.68.147) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:19:39] RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (195.200.68.147) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:19:41] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[03:22:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88855 and previous config saved to /var/cache/conftool/dbconfig/20260218-032258-marostegui.json
[03:23:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[03:23:16] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
[03:23:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1199 (T415786)', diff saved to https://phabricator.wikimedia.org/P88856 and previous config saved to /var/cache/conftool/dbconfig/20260218-032324-marostegui.json
[04:11:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:16:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:28:20] FIRING: [3x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:29:41] FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:32:25] (PS1) Reedy: CommonSettings: Remove ORES back compat [mediawiki-config] - https://gerrit.wikimedia.org/r/1240121
[04:36:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:41:39] RESOLVED: [2x] CoreBGPDown: Core BGP session down between asw1-b3-magru and cr2-magru (2a02:ec80:700:fe08::1) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:46:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88857 and previous config saved to /var/cache/conftool/dbconfig/20260218-044639-marostegui.json
[04:46:45] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:48:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:57:03] (PS1) KartikMistry: Update cxserver to 2026-01-20-115813-production [deployment-charts] - https://gerrit.wikimedia.org/r/1240126 (https://phabricator.wikimedia.org/T415038)
[04:58:39] FIRING: [2x] CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (195.200.68.147) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:59:41] FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:01:35] Updating cxserver..
[05:01:48] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P88858 and previous config saved to /var/cache/conftool/dbconfig/20260218-050148-marostegui.json
[05:03:39] RESOLVED: [3x] CoreBGPDown: Core BGP session down between asw1-b3-magru and cr2-magru (2a02:ec80:700:fe08::1) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[05:03:53] (CR) KartikMistry: [C:+2] Update cxserver to 2026-01-20-115813-production [deployment-charts] - https://gerrit.wikimedia.org/r/1240126 (https://phabricator.wikimedia.org/T415038) (owner: KartikMistry)
[05:05:59] (Merged) jenkins-bot: Update cxserver to 2026-01-20-115813-production [deployment-charts] - https://gerrit.wikimedia.org/r/1240126 (https://phabricator.wikimedia.org/T415038) (owner: KartikMistry)
[05:13:20] FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:16:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P88859 and previous config saved to /var/cache/conftool/dbconfig/20260218-051656-marostegui.json
[05:17:41] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:18:07] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[05:23:20] FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:24:13] !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[05:24:43] !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[05:25:02] !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[05:25:36] !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[05:28:40] !log Updated cxserver to 2026-01-20-115813-production (T415038, T415046, T414558)
[05:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:48] T415038: Post-creation work for kajwiki - https://phabricator.wikimedia.org/T415038
[05:28:49] T415046: Post-creation work for pplwiki - https://phabricator.wikimedia.org/T415046
[05:28:49] T414558: Wikipedia Content Translation Tool displays blank page and never loads - https://phabricator.wikimedia.org/T414558
[05:32:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88860 and previous config saved to /var/cache/conftool/dbconfig/20260218-053204-marostegui.json
[05:32:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[05:32:21] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
[05:32:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2179 (T415786)', diff saved to https://phabricator.wikimedia.org/P88861 and previous config saved to /var/cache/conftool/dbconfig/20260218-053229-marostegui.json
[05:43:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[05:48:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[05:58:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:03:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:23:18] (PS1) Marostegui: pc202[14]: New hosts [puppet] - https://gerrit.wikimedia.org/r/1240171 (https://phabricator.wikimedia.org/T417069)
[06:27:18] (CR) Marostegui: [C:+2] pc202[14]: New hosts [puppet] - https://gerrit.wikimedia.org/r/1240171 (https://phabricator.wikimedia.org/T417069) (owner: Marostegui)
[06:28:39] FIRING: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:31:50] (PS1) Giuseppe Lavagetto: cache::haproxy: limit email addresses to reasonable lengths [puppet] - https://gerrit.wikimedia.org/r/1240174
[06:33:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-magru and asw1-b3-magru (2a02:ec80:700:fe08::2) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=magru&var-device=cr2-magru:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-b3-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:46:51] (PS1) Giuseppe Lavagetto: varnish::frontend: add requestctl filters for bots [puppet] - https://gerrit.wikimedia.org/r/1240180
[06:46:51] (PS1) Giuseppe Lavagetto: cache::varnish: include requestctl filters for bots [puppet] - https://gerrit.wikimedia.org/r/1240181
[06:52:40] FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T0700)
[07:04:19] PROBLEM - SSH on stat1010 is CRITICAL: Server answer: Exceeded MaxStartups https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:05:15] RECOVERY - SSH on stat1010 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u5 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:12:05] (CR) Brouberol: dse-k8s: Enable active/active for dse-k8s clusters (1 comment) [dns] - https://gerrit.wikimedia.org/r/1238441 (https://phabricator.wikimedia.org/T396478) (owner: Bking)
[07:15:29] (PS1) Ayounsi: Icinga: update mr1-ulsfo IP, remove eqiad old rows C/D [puppet] - https://gerrit.wikimedia.org/r/1240188
[07:17:11] (PS2) Ayounsi: Update mr1-ulsfo IP, remove eqiad old rows C/D [puppet] - https://gerrit.wikimedia.org/r/1240188 (https://phabricator.wikimedia.org/T412525)
[07:41:00] (PS1) Arnaudb: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536)
[07:42:15] (CR) CI reject: [V:-1] gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536) (owner: Arnaudb)
[07:49:39] (CR) Tiziano Fogli: "Since we’ve already implemented the “feature” to avoid requiring a flat structure in the manifests repository, I can work on moving sloth " [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: Tiziano Fogli)
[07:52:39] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[07:52:55] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[07:54:30] (Abandoned) Jelto: wikimedia: revert gerrit behind the CDN [dns] - https://gerrit.wikimedia.org/r/1239878 (https://phabricator.wikimedia.org/T417497) (owner: Jelto)
[07:59:14] (PS1) Muehlenhoff: Fix typo/copy&paste errors in comments [software/spicerack] - https://gerrit.wikimedia.org/r/1240202
[07:59:45] (PS1) Jcrespo: backups: Fix error on domain for backup hosts [puppet] - https://gerrit.wikimedia.org/r/1240203 (https://phabricator.wikimedia.org/T414727)
[08:00:04] Amir1, Urbanecm, and awight: It is that lovely time of the day again! You are hereby commanded to deploy UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T0800).
[08:00:05] Thiemo_WMDE: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:01:10] I'm here. Sorry I scheduled something just a few minutes ago. Is anyone available for a backport?
[08:02:00] (CR) Jcrespo: [C:+2] backups: Fix error on domain for backup hosts [puppet] - https://gerrit.wikimedia.org/r/1240203 (https://phabricator.wikimedia.org/T414727) (owner: Jcrespo)
[08:03:42] ops-codfw, SRE, Data-Persistence, DC-Ops, Patch-For-Review: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11626314 (jcrespo) >>! In T414727#11624536, @Jhancock.wm wrote: > @jcrespo i need an edit to the site.pp file. the backup20XX servers have eqiad in the...
[08:05:22] (PS6) Arnaudb: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536)
[08:05:22] (CR) Arnaudb: "pcc output visible here: https://puppet-compiler.wmflabs.org/output/1240197/5814/gerrit2003.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536) (owner: Arnaudb)
[08:05:44] (CR) CI reject: [V:-1] Fix typo/copy&paste errors in comments [software/spicerack] - https://gerrit.wikimedia.org/r/1240202 (owner: Muehlenhoff)
[08:10:34] (PS1) Muehlenhoff: Switch the orchestrator role to nftables [puppet] - https://gerrit.wikimedia.org/r/1240204
[08:11:43] (PS7) Arnaudb: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536)
[08:11:43] (CR) Arnaudb: "sorry for the last review, I forgot to use variables in the template. This is now fixed! here is the pcc output: https://puppet-compiler.w" [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536) (owner: Arnaudb)
[08:13:43] (PS4) Muehlenhoff: Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - https://gerrit.wikimedia.org/r/1239974
[08:14:05] (CR) Muehlenhoff: Make the pbuilder hook for apt.wikimedia.org compatible with trixie (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1239974 (owner: Muehlenhoff)
[08:18:00] ops-codfw, SRE-swift-storage, DC-Ops, decommission-hardware: decommission ms-be20[57-61].codfw.wmnet - https://phabricator.wikimedia.org/T417735 (MatthewVernon) NEW
[08:18:42] (PS1) Marostegui: dbproxy1029: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1240206 (https://phabricator.wikimedia.org/T414656)
[08:19:19] (CR) Marostegui: [C:+2] dbproxy1029: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1240206 (https://phabricator.wikimedia.org/T414656) (owner: Marostegui)
[08:19:26] (CR) Muehlenhoff: [C:+2] Remove puppetmaster::r10k [puppet] - https://gerrit.wikimedia.org/r/1239897 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[08:19:44] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host dbproxy1029.eqiad.wmnet with OS trixie
[08:24:32] (CR) Marostegui: [C:+1] "Works for me, but coordinate with Federico: https://phabricator.wikimedia.org/T416582#11626157" [puppet] - https://gerrit.wikimedia.org/r/1240204 (owner: Muehlenhoff)
[08:29:46] (CR) Muehlenhoff: [C:+2] Remove puppetmaster::monitoring and related classes [puppet] - https://gerrit.wikimedia.org/r/1239891 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[08:32:06] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
[08:33:26] (CR) Filippo Giunchedi: [C:+1] Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - https://gerrit.wikimedia.org/r/1239974 (owner: Muehlenhoff)
[08:38:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
[08:39:49] SRE, SRE-Access-Requests: Requesting update of Raymond Ndibe's SSH key to Yubikey-backed key - https://phabricator.wikimedia.org/T417594#11626420 (MatthewVernon) @Raymond_Ndibe I sent you a slack message yesterday - can you either reply to that with your new ssh public key, or upload a patch set to gerri...
[08:40:39] (CR) JMeybohm: [C:+1] docker_registry: route /v2/test prefix to s3/apus (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1239164 (https://phabricator.wikimedia.org/T394476) (owner: Elukey)
[08:41:39] (CR) Jelto: [C:+2] gerrit: remove old bots from apache config [puppet] - https://gerrit.wikimedia.org/r/1239087 (https://phabricator.wikimedia.org/T417263) (owner: Jelto)
[08:47:51] (CR) Muehlenhoff: [C:+2] Remove puppetmaster::rsync and related classes [puppet] - https://gerrit.wikimedia.org/r/1239898 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[08:53:43] (CR) JMeybohm: [C:+1] "Done" [puppet] - https://gerrit.wikimedia.org/r/1239135 (https://phabricator.wikimedia.org/T416670) (owner: Elukey)
[08:53:50] (PS1) Marostegui: Revert "dbproxy1029: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1240212
[08:55:27] SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#11626472 (Volans) a:Volans→None Un-assign myself as I'm not working on this.
[08:56:46] (CR) Muehlenhoff: [C:+2] Remove puppetmaster:ssl [puppet] - https://gerrit.wikimedia.org/r/1239908 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[08:58:18] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1029.eqiad.wmnet with OS trixie
[08:58:22] (Abandoned) Muehlenhoff: puppetmaster: use strong ciphers only [puppet] - https://gerrit.wikimedia.org/r/453126 (owner: BBlack)
[08:58:24] SRE-tools, Infrastructure-Foundations: Outdated cookbooks cleanup - https://phabricator.wikimedia.org/T379259#11626491 (Volans) Open→Resolved Resolving this old task, there was some cleanup done at the time, in case it's deemed necessary to do a new pass a new task should be created.
[09:00:05] dancy and jnuche: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T0900).
[09:01:04] (PS1) Volans: wmcs: infra-tracing-nfs update hiera keys [labs/private] - https://gerrit.wikimedia.org/r/1240214 (https://phabricator.wikimedia.org/T399313)
[09:01:32] (PS1) Arnaudb: gerrit: clarify confirmation wording on read-only [cookbooks] - https://gerrit.wikimedia.org/r/1240213 (https://phabricator.wikimedia.org/T387833)
[09:01:33] (PS1) Volans: wmcs: infra-tracing-nfs uniform hiera keys [puppet] - https://gerrit.wikimedia.org/r/1240215 (https://phabricator.wikimedia.org/T399313)
[09:01:56] (CR) Marostegui: [C:+2] Revert "dbproxy1029: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1240212 (owner: Marostegui)
[09:03:05] SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11626573 (MatthewVernon)
[09:03:36] SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11626576 (MatthewVernon) There's already a shell account in place, so the only thing needed here is approval from @KSiebert - are you OK to approve this request, please?
[09:03:58] (CR) DCausse: [C:-1] opensearch-cluster: allow the definition of custom network policies (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[09:05:01] (CR) Volans: "PCC says noop but it failed to compile on prod (due to the current typo):" [puppet] - https://gerrit.wikimedia.org/r/1240215 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:05:52] (Merged) jenkins-bot: gerrit: clarify confirmation wording on read-only [cookbooks] - https://gerrit.wikimedia.org/r/1240213 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[09:05:53] (CR) Muehlenhoff: [C:+2] Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - https://gerrit.wikimedia.org/r/1239974 (owner: Muehlenhoff)
[09:06:18] (PS2) Muehlenhoff: Switch the orchestrator role to nftables [puppet] - https://gerrit.wikimedia.org/r/1240204
[09:07:48] (CR) Fabfur: [C:+1] varnish::frontend: add requestctl filters for bots [puppet] - https://gerrit.wikimedia.org/r/1240180 (owner: Giuseppe Lavagetto)
[09:08:07] (CR) Fabfur: [C:+1] cache::haproxy: limit email addresses to reasonable lengths [puppet] - https://gerrit.wikimedia.org/r/1240174 (owner: Giuseppe Lavagetto)
[09:16:16] (PS1) Arnaudb: gerrit: remove read-only config [puppet] - https://gerrit.wikimedia.org/r/1240217 (https://phabricator.wikimedia.org/T387833)
[09:17:09] (PS1) Arnaudb: gerrit: read-only confirmation wording tweaking [cookbooks] - https://gerrit.wikimedia.org/r/1240216 (https://phabricator.wikimedia.org/T387833)
[09:19:05] (CR) Federico Ceratto: "I would suggest merging it in a week or so after we deprovision dborch1001, but if are not happy with keeping the CR on hold I can merge i" [puppet] - https://gerrit.wikimedia.org/r/1240204 (owner: Muehlenhoff)
[09:22:18] (CR) VolkerE: "That's fine, but a task reference would be helpful." [mediawiki-config] - https://gerrit.wikimedia.org/r/1240012 (owner: Bernard Wang)
[09:24:41] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:27:39] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11626669 (jcrespo) Fixed at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1240203
[09:28:25] !log arnaudb@cumin1003 START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bookworm
[09:31:43] (PS1) Arnaudb: gerrit: change daemon user for gerrit1003 [puppet] - https://gerrit.wikimedia.org/r/1240219 (https://phabricator.wikimedia.org/T417246)
[09:32:22] (CR) Arnaudb: [C:+2] gerrit: change daemon user for gerrit1003 [puppet] - https://gerrit.wikimedia.org/r/1240219 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[09:37:10] (PS1) Marostegui: core_test.pp: Remove read_only check from core_test hosts [puppet] - https://gerrit.wikimedia.org/r/1240220
[09:39:49] (PS1) Muehlenhoff: Fix pbuilder hook for trixie [puppet] - https://gerrit.wikimedia.org/r/1240221
[09:40:09] (CR) Muehlenhoff: "Either is fine with me" [puppet] - https://gerrit.wikimedia.org/r/1240204 (owner: Muehlenhoff)
[09:40:35] (CR) CI reject: [V:-1] Fix pbuilder hook for trixie [puppet] - https://gerrit.wikimedia.org/r/1240221 (owner: Muehlenhoff)
[09:42:53] (PS2) Muehlenhoff: Fix pbuilder hook for trixie [puppet] - https://gerrit.wikimedia.org/r/1240221
[09:46:32] !log arnaudb@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
[09:48:09] (CR) Muehlenhoff: [C:+2] Remove puppetmaster::passenger and related files [puppet] - https://gerrit.wikimedia.org/r/1239907 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[09:53:19] (PS2) Muehlenhoff: Remove puppetmaster::web_frontend and related classes [puppet] - https://gerrit.wikimedia.org/r/1239899 (https://phabricator.wikimedia.org/T365798)
[09:53:50] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
[09:54:22] (PS2) Effie Mouzeli: service.yaml: switch mw-parsoid to lvs_setup #2 [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246)
[09:55:23] (PS3) Effie Mouzeli: service.yaml: switch mw-parsoid to lvs_setup #2 [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246)
[09:56:43] SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11626786 (KSiebert) @MatthewVernon Sure, happy to approve!
[09:57:35] (CR) Effie Mouzeli: [C:+1] Run the Redis spec tests on Bullseye [puppet] - https://gerrit.wikimedia.org/r/1239931 (owner: Muehlenhoff)
[09:59:05] (CR) Filippo Giunchedi: [C:+1] wmcs: infra-tracing-nfs uniform hiera keys [puppet] - https://gerrit.wikimedia.org/r/1240215 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:59:08] (CR) Filippo Giunchedi: [C:+1] wmcs: infra-tracing-nfs update hiera keys [labs/private] - https://gerrit.wikimedia.org/r/1240214 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:59:26] (CR) Filippo Giunchedi: "Bummer :( but ok!" [puppet] - https://gerrit.wikimedia.org/r/1240221 (owner: Muehlenhoff)
[10:01:00] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240032 (https://phabricator.wikimedia.org/T375198) (owner: Sergio Gimeno)
[10:03:55] (PS1) MVernon: admin: add mbinder to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/1240224 (https://phabricator.wikimedia.org/T417655)
[10:04:23] (CR) Arnaudb: gerrit: read-only confirmation wording tweaking [cookbooks] - https://gerrit.wikimedia.org/r/1240216 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[10:04:24] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1239899 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[10:04:32] (CR) Arnaudb: [C:+2] gerrit: read-only confirmation wording tweaking [cookbooks] - https://gerrit.wikimedia.org/r/1240216 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[10:09:17] (CR) Effie Mouzeli: Run the Redis spec tests on Bullseye [puppet] - https://gerrit.wikimedia.org/r/1239931 (owner: Muehlenhoff)
[10:09:50] (Merged) jenkins-bot: gerrit: read-only confirmation wording tweaking [cookbooks] - https://gerrit.wikimedia.org/r/1240216 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[10:10:12] (CR) Effie Mouzeli: "As we discussed, maybe it is time we let tests go to spec test heaven." [puppet] - https://gerrit.wikimedia.org/r/1239931 (owner: Muehlenhoff)
[10:14:52] (CR) Tiziano Fogli: [C:+1] Update mr1-ulsfo IP, remove eqiad old rows C/D [puppet] - https://gerrit.wikimedia.org/r/1240188 (https://phabricator.wikimedia.org/T412525) (owner: Ayounsi)
[10:15:15] (CR) Ayounsi: [C:+2] Update mr1-ulsfo IP, remove eqiad old rows C/D [puppet] - https://gerrit.wikimedia.org/r/1240188 (https://phabricator.wikimedia.org/T412525) (owner: Ayounsi)
[10:16:00] (CR) Muehlenhoff: [C:+2] Fix pbuilder hook for trixie [puppet] - https://gerrit.wikimedia.org/r/1240221 (owner: Muehlenhoff)
[10:16:51] (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1240224 (https://phabricator.wikimedia.org/T417655) (owner: MVernon)
[10:17:02] jouncebot: nowandnext
[10:17:02] For the next 0 hour(s) and 42 minute(s): MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T0900)
[10:17:02] In 0 hour(s) and 42 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1100)
[10:17:58] Looks like the train is proceeding in the later slots this week?
[10:19:50] RECOVERY - Host mr1-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 71.66 ms
[10:20:08] ops-eqiad, SRE, DC-Ops: Alert for device ps1-e3-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T417316#11626887 (phaultfinder)
[10:20:58] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
[10:21:37] (PS8) Effie Mouzeli: kubernetes::mediawiki_experimental: add parsoid repo #3 [puppet] - https://gerrit.wikimedia.org/r/1238345 (https://phabricator.wikimedia.org/T386246)
[10:21:46] (PS1) Dreamy Jazz: Drop $wgIPReputationEnableLoginCaptchaIfIPKnown [mediawiki-config] - https://gerrit.wikimedia.org/r/1240227 (https://phabricator.wikimedia.org/T416941)
[10:21:48] RECOVERY - Host mr1-ulsfo IPv6 is UP: PING OK - Packet loss = 0%, RTA = 71.76 ms
[10:25:50] (CR) Dreamy Jazz: [C:-2] "Didn't see that the patch wasn't merged until Tuesday so it was only included in this week's train." [mediawiki-config] - https://gerrit.wikimedia.org/r/1240227 (https://phabricator.wikimedia.org/T416941) (owner: Dreamy Jazz)
[10:26:25] (CR) Muehlenhoff: [C:+2] Remove puppetmaster::web_frontend and related classes [puppet] - https://gerrit.wikimedia.org/r/1239899 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[10:26:48] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
[10:26:52] (CR) MVernon: [C:+2] admin: add mbinder to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/1240224 (https://phabricator.wikimedia.org/T417655) (owner: MVernon)
[10:29:38] (CR) Effie Mouzeli: [C:+2] kubernetes::mediawiki_experimental: add parsoid repo #3 [puppet] - https://gerrit.wikimedia.org/r/1238345 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[10:30:25] (PS4) Effie Mouzeli: deployment_server: add parsoid pinkllama release #4 [puppet] - https://gerrit.wikimedia.org/r/1238349 (https://phabricator.wikimedia.org/T386246)
[10:33:04] (PS1) Federico Ceratto: orchestrator: disable service on dborch1001 [puppet] - https://gerrit.wikimedia.org/r/1240228 (https://phabricator.wikimedia.org/T416582)
[10:38:20] FIRING: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:39:52] ops-eqiad, SRE, DC-Ops: Alert for device ps1-e3-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T417316#11626989 (phaultfinder)
[10:41:23] !log arnaudb@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit1003.wikimedia.org with OS bookworm
[10:42:07] !log joal@deploy2002 Started deploy [analytics/refinery@28fa1ea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28fa1eac]
[10:42:27] (PS1) Kevin Bazira: ml-services: scale rr-wikidata to two replicas [deployment-charts] - https://gerrit.wikimedia.org/r/1240234 (https://phabricator.wikimedia.org/T414060)
[10:42:33] (PS1) Muehlenhoff: Remove obsolete spec tests [puppet] - https://gerrit.wikimedia.org/r/1240235
[10:42:33] (PS2) Arnaudb: gerrit: change system user for gerrit1003 [puppet] - https://gerrit.wikimedia.org/r/1240230 (https://phabricator.wikimedia.org/T417246)
[10:42:54] (CR) Arnaudb: gerrit: change system user for gerrit1003 [puppet] - https://gerrit.wikimedia.org/r/1240230 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[10:43:35] (Abandoned) Muehlenhoff: Run the Redis spec tests on Bullseye [puppet] - https://gerrit.wikimedia.org/r/1239931 (owner: Muehlenhoff)
[10:44:04] !log joal@deploy2002 Finished deploy [analytics/refinery@28fa1ea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28fa1eac] (duration: 01m 57s)
[10:44:54] !log joal@deploy2002 Started deploy [analytics/refinery@28fa1ea]: Regular analytics weekly train [analytics/refinery@28fa1eac]
[10:46:01] (PS2) Muehlenhoff: Remove puppetmaster::gitclone and related classes [puppet] - https://gerrit.wikimedia.org/r/1239895 (https://phabricator.wikimedia.org/T365798)
[10:47:38] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1239895 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[10:49:00] !log joal@deploy2002 Finished deploy [analytics/refinery@28fa1ea]: Regular analytics weekly train [analytics/refinery@28fa1eac] (duration: 04m 06s)
[10:49:31] !log joal@deploy2002 Started deploy [analytics/refinery@28fa1ea] (thin): Regular analytics weekly train THIN [analytics/refinery@28fa1eac]
[10:51:27] !log joal@deploy2002 Finished deploy [analytics/refinery@28fa1ea] (thin): Regular analytics weekly train THIN [analytics/refinery@28fa1eac] (duration: 01m 56s)
[10:51:41] (CR) Gkyziridis: [C:+1] "Thnx for deploying" [deployment-charts] - https://gerrit.wikimedia.org/r/1240234 (https://phabricator.wikimedia.org/T414060) (owner: Kevin Bazira)
[10:52:23] (PS3) Arnaudb: gerrit: change system user for gerrit1003 [puppet] - https://gerrit.wikimedia.org/r/1240230 (https://phabricator.wikimedia.org/T417246)
[10:52:24] (CR) Arnaudb: [C:+2] "pcc confirms noop for gerrit2003 and gerrit2002: https://puppet-compiler.wmflabs.org/output/1240230/5818/" [puppet] - https://gerrit.wikimedia.org/r/1240230 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[10:52:30] (CR) Muehlenhoff: [C:+2] Remove puppetmaster::gitclone and related classes [puppet] - https://gerrit.wikimedia.org/r/1239895 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[10:52:40] FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:56:42] !log arnaudb@cumin1003 START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bookworm
[10:57:05] (CR) Federico Ceratto: "Few suggestion, we could also discuss future monitoring scripts on IRC or in a meeting." [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: Marostegui)
[11:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1100)
[11:00:49] (PS1) Fabfur: New release [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1240240
[11:02:04] (CR) Kevin Bazira: [C:+2] ml-services: scale rr-wikidata to two replicas [deployment-charts] - https://gerrit.wikimedia.org/r/1240234 (https://phabricator.wikimedia.org/T414060) (owner: Kevin Bazira)
[11:04:00] (CR) Fabfur: [V:+2 C:+2] New release [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1240240 (owner: Fabfur)
[11:04:21] (PS1) Muehlenhoff: Revert "Remove puppetmaster::gitclone and related classes" [puppet] - https://gerrit.wikimedia.org/r/1240242 (https://phabricator.wikimedia.org/T365798)
[11:04:34] (Merged) jenkins-bot: ml-services: scale rr-wikidata to two replicas [deployment-charts] - https://gerrit.wikimedia.org/r/1240234 (https://phabricator.wikimedia.org/T414060) (owner: Kevin Bazira)
[11:04:53] (CR) CI reject: [V:-1] Revert "Remove puppetmaster::gitclone and related classes" [puppet] - https://gerrit.wikimedia.org/r/1240242 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[11:06:13] !log fabfur@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "New scope bots - fabfur@cumin1003"
[11:06:15] !log fabfur@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: New scope bots - fabfur@cumin1003
[11:07:09] !log fabfur@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: New scope bots - fabfur@cumin1003
[11:07:10] !log fabfur@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "New scope bots - fabfur@cumin1003"
[11:09:28] (CR) Muehlenhoff: [V:+2] "Forcing CI since this is a revert and there's no need to add SPDX headers for a class soon being removed..." [puppet] - https://gerrit.wikimedia.org/r/1240242 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[11:09:35] !log kevinbazira@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[11:09:41] (CR) Muehlenhoff: [V:+2 C:+2] Revert "Remove puppetmaster::gitclone and related classes" [puppet] - https://gerrit.wikimedia.org/r/1240242 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[11:11:10] (PS2) Muehlenhoff: Revert "Remove puppetmaster::gitclone and related classes" [puppet] - https://gerrit.wikimedia.org/r/1240242 (https://phabricator.wikimedia.org/T365798)
[11:12:02] !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[11:12:17] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[11:12:25] SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11627150 (MatthewVernon)
[11:12:28] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[11:12:32] SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11627151 (MatthewVernon) Open→Resolved a:MatthewVernon All done :)
[11:13:49] (CR) Fabfur: [C:+2] varnish::frontend: add requestctl filters for bots [puppet] - https://gerrit.wikimedia.org/r/1240180 (owner: Giuseppe Lavagetto)
[11:14:21] (CR) Muehlenhoff: [C:+2] Revert "Remove puppetmaster::gitclone and related classes" [puppet] - https://gerrit.wikimedia.org/r/1240242 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[11:14:45] (PS1) Jelto: tcpproxy: add internal gerrit backend with higher timeout [puppet] - https://gerrit.wikimedia.org/r/1240243 (https://phabricator.wikimedia.org/T417497)
[11:14:47] !log arnaudb@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
[11:19:58] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
[11:21:01] (CR) Muehlenhoff: [C:+2] Run Puppetboard spec tests on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1239586 (owner: Muehlenhoff)
[11:21:35] (CR) Vgutierrez: cache::haproxy: limit email addresses to reasonable lengths (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1240174 (owner: Giuseppe Lavagetto)
[11:26:40] SRE, Infrastructure-Foundations, ServiceOps new, Traffic: Trixie switches rp_filter from strict (1) to loose (2) for all interfaces - https://phabricator.wikimedia.org/T417632#11627180 (JMeybohm) @ayounsi suggested we could remove `linux-sysctl-defaults` from our nodes and copy the useful setting...
[11:37:34] (CR) Volans: [V:+2 C:+2] wmcs: infra-tracing-nfs update hiera keys [labs/private] - https://gerrit.wikimedia.org/r/1240214 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[11:37:42] (CR) Marostegui: "disable notifications too, please" [puppet] - https://gerrit.wikimedia.org/r/1240228 (https://phabricator.wikimedia.org/T416582) (owner: Federico Ceratto)
[11:37:42] (CR) Volans: [C:+2] wmcs: infra-tracing-nfs uniform hiera keys [puppet] - https://gerrit.wikimedia.org/r/1240215 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[11:40:57] (PS3) Majavah: openstack: encapi: Store project names in database [puppet] - https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588)
[11:43:18] (CR) Effie Mouzeli: service.yaml: switch mw-parsoid to lvs_setup #2 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[11:44:12] (CR) Majavah: "tested in codfw1dev" [puppet] - https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588) (owner: Majavah)
[11:44:28] (CR) Fabfur: [C:+1] "VTC tests are good" [puppet] - https://gerrit.wikimedia.org/r/1240181 (owner: Giuseppe Lavagetto)
[11:48:03] (CR) Vgutierrez: gerrit: adapt httpd config to ATS (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536) (owner: Arnaudb)
[11:48:16] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[11:48:20] RESOLVED: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:48:24] (PS1) Majavah: O:wmcs: codfw1dev: cloudweb: Remove IDP profiles [puppet] - https://gerrit.wikimedia.org/r/1240251 (https://phabricator.wikimedia.org/T410294)
[11:49:02] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[11:49:38] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1240251 (https://phabricator.wikimedia.org/T410294) (owner: Majavah)
[11:49:46] (CR) Majavah: [C:+2] O:wmcs: codfw1dev: cloudweb: Remove IDP profiles [puppet] - https://gerrit.wikimedia.org/r/1240251 (https://phabricator.wikimedia.org/T410294) (owner: Majavah)
[11:51:03] jouncebot: nowandnext
[11:51:03] For the next 0 hour(s) and 8 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1100)
[11:51:03] In 0 hour(s) and 8 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1200)
[11:51:14] Want to deploy a security patch via scap
[11:52:43] (CR) Fabfur: [C:+2] cache::varnish: include requestctl filters for bots [puppet] - https://gerrit.wikimedia.org/r/1240181 (owner: Giuseppe Lavagetto)
[11:54:04] (CR) Majavah: Drop support for Python 3.7 and 3.8 (1 comment) [software/pywmflib] - https://gerrit.wikimedia.org/r/1239678 (owner: Volans)
[11:54:38] !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
[11:55:16] SRE, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, and 2 others: Site: codfw 1 VM request for codfw1dev CAS test/dev, hostname: cloudidp2001-dev - https://phabricator.wikimedia.org/T410294#11627269 (taavi) Open→Resolved This seems done so boldly closing
[11:57:48] !log upload golang-github-intel-go-cpuid 0.0~git20210602.5747e5c-2+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832
[11:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:52] T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[11:59:27] ops-codfw, SRE, DC-Ops, ServiceOps new: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708#11627314 (Clement_Goubert) Thanks for the hardware wrangling!
[11:59:44] !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
[12:00:05] mvolz: That opportune time for a Services – Citoid / Zotero deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1200).
[12:01:47] !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS trixie
[12:06:56] !log dreamyjazz Deployed security patch for T411366
[12:08:03] (PS1) Majavah: puppetboard: Do not load fonts from external CDNs [puppet] - https://gerrit.wikimedia.org/r/1240258 (https://phabricator.wikimedia.org/T417771)
[12:09:03] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8064/co" [puppet] - https://gerrit.wikimedia.org/r/1240258 (https://phabricator.wikimedia.org/T417771) (owner: Majavah)
[12:10:05] (CR) Vgutierrez: "looking good, please fix the commit message :)" [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[12:11:00] (PS4) Effie Mouzeli: service.yaml: remove alerts from mw-parsoid #2 [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246)
[12:11:10] (CR) Effie Mouzeli: service.yaml: remove alerts from mw-parsoid #2 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[12:12:44] !log dreamyjazz Deployed security patch for T411366
[12:16:48] !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
[12:24:01] !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
[12:29:42] (PS1) Majavah: P:wmcs::cloudgw: Set VRF before adding more addresses [puppet] - https://gerrit.wikimedia.org/r/1240263 (https://phabricator.wikimedia.org/T417075)
[12:30:35] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8065/co" [puppet] - https://gerrit.wikimedia.org/r/1240263 (https://phabricator.wikimedia.org/T417075) (owner: Majavah)
[12:31:46] (PS2) Volans: Drop support for Python 3.7 and 3.8 [software/pywmflib] - https://gerrit.wikimedia.org/r/1239678
[12:31:46] (PS2) Volans: tests: remove fixture require_caplog [software/pywmflib] - https://gerrit.wikimedia.org/r/1239679
[12:31:46] (PS2) Volans: type hints: use standard types as type hints [software/pywmflib] - https://gerrit.wikimedia.org/r/1239680
[12:31:58] (CR) Volans: Drop support for Python 3.7 and 3.8 (1 comment) [software/pywmflib] - https://gerrit.wikimedia.org/r/1239678 (owner: Volans)
[12:32:41] Deploying again
[12:33:02] (PS1) Marco Fossati: ReaderExperiments' MobileToc stream configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/1240264 (https://phabricator.wikimedia.org/T415611)
[12:43:00] !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS trixie
[12:45:35] !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
[12:45:35] !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
[12:52:36] !log bking@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1028.eqiad.wmnet with OS bookworm
[12:53:19] (PS1) Muehlenhoff: Remove puppetmaster::ca_server and related classes [puppet] - https://gerrit.wikimedia.org/r/1240268 (https://phabricator.wikimedia.org/T365798)
[12:54:19] (PS8) Arnaudb: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536)
[12:55:26] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240268 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[12:57:03] (PS1) Mszwarc: Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' [mediawiki-config] - https://gerrit.wikimedia.org/r/1240270 (https://phabricator.wikimedia.org/T415883)
[12:57:10] (PS1) Majavah: P:toolforge: mailrelay: Never expand tool alias to disabled maintainers [puppet] - https://gerrit.wikimedia.org/r/1240271
[12:59:31] PROBLEM - SSH on stat1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:00:24] RECOVERY - SSH on stat1010 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u5 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:00:43] (CR) Effie Mouzeli: [C:+1] Remove obsolete spec tests [puppet] - https://gerrit.wikimedia.org/r/1240235 (owner: Muehlenhoff)
[13:00:55] (PS5) Marostegui: mariadb: Add events checker [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738)
[13:01:27] (CR) CI reject: [V:-1] mariadb: Add events checker [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: Marostegui)
[13:02:22] (CR) Marostegui: mariadb: Add events checker (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: Marostegui)
[13:02:25] RESOLVED: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:03:02] (PS6) Marostegui: mariadb: Add events checker [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738)
[13:04:19] (CR) Arnaudb: gerrit: adapt httpd config to ATS (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417536) (owner: Arnaudb)
[13:07:50] (PS1) Muehlenhoff: Remove obsolete puppetmaster Cumin aliases [puppet] - https://gerrit.wikimedia.org/r/1240274
[13:07:59] (CR) Muehlenhoff: [C:+2] Remove obsolete spec tests [puppet] - https://gerrit.wikimedia.org/r/1240235 (owner: Muehlenhoff)
[13:08:36] (PS1) JMeybohm: k8s-staging: Switch to IPIP mode for kube-apiserver [puppet] - https://gerrit.wikimedia.org/r/1240275 (https://phabricator.wikimedia.org/T352956)
[13:08:49] (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240275 (https://phabricator.wikimedia.org/T352956) (owner: JMeybohm)
[13:10:32] (PS1) Clément Goubert: kubernetes: Add wikikube-worker23[32-56] [puppet] - https://gerrit.wikimedia.org/r/1240276 (https://phabricator.wikimedia.org/T417772)
[13:13:47] (CR) Federico Ceratto: [C:+1] "Have you tested e.g. on the testbed? If so, LGTM" [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: Marostegui)
[13:17:22] (CR) Blake: [C:+1] kubernetes: Add wikikube-worker23[32-56] [puppet] - https://gerrit.wikimedia.org/r/1240276 (https://phabricator.wikimedia.org/T417772) (owner: Clément Goubert)
[13:18:06] (CR) Marostegui: "That's the plan, testing it on core hosts 😊" [puppet] - https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: Marostegui)
[13:18:21] (CR) Tchanders: [C:+1] Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' [mediawiki-config] - https://gerrit.wikimedia.org/r/1240270 (https://phabricator.wikimedia.org/T415883) (owner: Mszwarc)
[13:20:57] (PS1) Muehlenhoff: Move the puppetmaster puppetdb client class under puppet_compiler [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798)
[13:21:37] (PS2) Anzx: ruwikisource: EnableProtectionIndicators [mediawiki-config] - https://gerrit.wikimedia.org/r/1240277 (https://phabricator.wikimedia.org/T417590)
[13:21:47] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240277 (https://phabricator.wikimedia.org/T417590) (owner: Anzx)
[13:24:03] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[13:24:20] (CR) Filippo Giunchedi: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1240263 (https://phabricator.wikimedia.org/T417075) (owner: Majavah)
[13:24:41] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[13:24:56] !log arnaudb@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit1003.wikimedia.org with reason: T417246
[13:24:58] (CR) Majavah: [V:+1 C:+2] P:wmcs::cloudgw: Set VRF before adding more addresses [puppet] - https://gerrit.wikimedia.org/r/1240263 (https://phabricator.wikimedia.org/T417075) (owner: Majavah)
[13:25:00] T417246: Reimage gerrit1003 - https://phabricator.wikimedia.org/T417246
[13:26:13] (CR) Fabfur: [C:+2] cache::upload: enable global ratelimiting (codfw) [puppet] - https://gerrit.wikimedia.org/r/1237244 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[13:27:07] (CR) Filippo Giunchedi: [C:+1] "I'm not too familiar with the code, though from a quick look it seems sane" [puppet] - https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588) (owner: Majavah)
[13:28:03] (CR) Marostegui: [C:+2] mariadb: Add events checker [puppet] - https://gerrit.wikimedia.org/r/1239969
(https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui) [13:35:43] PROBLEM - Host cloudgw1004 is DOWN: PING CRITICAL - Packet loss = 100% [13:37:33] RECOVERY - Backup freshness on backup1014 is OK: Fresh: 139 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [13:37:37] RECOVERY - Host cloudgw1004 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms [13:44:13] arnaudb: I forced a ^ backup run to return to normality [13:44:27] on gerrit2003 [13:44:59] (03CR) 10Vgutierrez: [C:03+1] service.yaml: remove alerts from mw-parsoid #2 [puppet] - 10https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli) [13:47:03] (03PS2) 10Muehlenhoff: Move the puppetmaster puppetdb client class under puppet_compiler [puppet] - 10https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) [13:49:46] ack thanks jynus sorry I forgot to give you a heads up about resuming backups [13:50:25] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff) [13:52:35] (03CR) 10Muehlenhoff: [C:03+2] Remove obsolete puppetmaster Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/1240274 (owner: 10Muehlenhoff) [13:53:24] (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1239678 (owner: 10Volans) [13:53:54] (03PS3) 10Clément Goubert: api-gateway: Add external services support [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225548 (https://phabricator.wikimedia.org/T414333) [13:54:54] (03CR) 10Majavah: [C:03+2] openstack: encapi: Store project names in database [puppet] - 10https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588) (owner: 10Majavah) [13:55:07] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240270 (https://phabricator.wikimedia.org/T415883) (owner: 10Mszwarc) [13:59:25] (03CR) 10Federico Ceratto: [C:03+1] "Done" [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui) [13:59:41] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:00:43] jouncebot: now [14:00:44] For the next 0 hour(s) and 59 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1400) [14:02:22] o/ [14:02:31] o/ [14:03:32] anzx: do you need a deployer? [14:03:37] yes [14:04:18] I can deploy my and your patch together, just a moment [14:04:24] ok [14:04:48] (03CR) 10Clément Goubert: [C:03+1] restbase::production: remove mw-parsoid listener [puppet] - 10https://gerrit.wikimedia.org/r/1239709 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli) [14:04:49] arnaudb: no worries, it is a nice thing to get a heads up, but I don't require it [14:05:26] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszwarc@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240277 (https://phabricator.wikimedia.org/T417590) (owner: 10Anzx) [14:05:26] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszwarc@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240270 (https://phabricator.wikimedia.org/T415883) (owner: 10Mszwarc) [14:05:29] I just check the alert and make sure it was failing for a reason [14:06:33] I do have a backport request as well. Hope someone has time for these. No pressure, do the config changes first. 
[14:06:49] (Merged) jenkins-bot: ruwikisource: EnableProtectionIndicators [mediawiki-config] - https://gerrit.wikimedia.org/r/1240277 (https://phabricator.wikimedia.org/T417590) (owner: Anzx)
[14:06:54] (Merged) jenkins-bot: Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' [mediawiki-config] - https://gerrit.wikimedia.org/r/1240270 (https://phabricator.wikimedia.org/T415883) (owner: Mszwarc)
[14:07:26] Thiemo_WMDE: you need deployer as well, right?
[14:07:39] !log mszwarc@deploy2002 Started scap sync-world: Backport for [[gerrit:1240277|ruwikisource: EnableProtectionIndicators (T417590)]], [[gerrit:1240270|Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' (T415883)]]
[14:07:41] Yes. I cannot deploy myself-
[14:07:45] T417590: Enable the Protection indicators for ru.wikisource - https://phabricator.wikimedia.org/T417590
[14:07:45] T415883: Create a special page to generate additional recovery keys for other users - https://phabricator.wikimedia.org/T415883
[14:07:49] (PS21) Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[14:08:20] RESOLVED: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:08:30] Okay, I can deploy your later
[14:09:52] !log mszwarc@deploy2002 anzx, mszwarc: Backport for [[gerrit:1240277|ruwikisource: EnableProtectionIndicators (T417590)]], [[gerrit:1240270|Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' (T415883)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:09:58] Msz2001: checking
[14:11:32] Msz2001: looks good ok to sync
[14:11:38] ok
[14:11:41] !log mszwarc@deploy2002 anzx, mszwarc: Continuing with sync
[14:15:48] !log mszwarc@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240277|ruwikisource: EnableProtectionIndicators (T417590)]], [[gerrit:1240270|Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' (T415883)]] (duration: 08m 10s)
[14:15:54] T417590: Enable the Protection indicators for ru.wikisource - https://phabricator.wikimedia.org/T417590
[14:15:55] T415883: Create a special page to generate additional recovery keys for other users - https://phabricator.wikimedia.org/T415883
[14:16:17] Thiemo_WMDE: I'm now ready to deploy yours
[14:16:29] Nice. I'm here.
[14:16:45] They are effectively 1 change. I just failed to squash them.
[14:17:13] (CR) TrainBranchBot: [C:+2] "Approved by mszwarc@deploy2002 using scap backport" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[14:17:14] (CR) TrainBranchBot: [C:+2] "Approved by mszwarc@deploy2002 using scap backport" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[14:17:26] (CR) Tiziano Fogli: "`sloth generate` has been moved to a systemd unit." [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: Tiziano Fogli)
[14:18:26] anzx: I'll also run the maint script shortly
[14:18:34] ok
[14:18:40] (CR) CI reject: [V:-1] Add instrument for clicks in TOC references link [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[14:18:41] (CR) CI reject: [V:-1] Add instrument for clicks in footnotes in the article [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[14:19:18] (PS22) Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[14:20:55] ^ CI failures are due to T417722 FWIW
[14:20:56] T417722: php-jwt contains weak encryption - https://phabricator.wikimedia.org/T417722
[14:21:16] (PS1) Marostegui: check_mariadb_events.sh: Fixes [puppet] - https://gerrit.wikimedia.org/r/1240286 (https://phabricator.wikimedia.org/T254738)
[14:21:26] We should backport a patch to bump jwt to wmf.16, right?
[14:21:46] probably… let’s ping Reedy though :)
[14:23:19] (PS3) Muehlenhoff: Move the puppetmaster puppetdb client class under puppet_compiler [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798)
[14:25:43] anzx: In the meantime, I emptied editor group on sqwiki
[14:26:19] (PS1) Muehlenhoff: Remove profile::puppetmaster::frontend [puppet] - https://gerrit.wikimedia.org/r/1240288 (https://phabricator.wikimedia.org/T365798)
[14:26:21] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[14:26:24] (PS2) JMeybohm: k8s-staging: Switch to IPIP mode for kube-apiserver [puppet] - https://gerrit.wikimedia.org/r/1240275 (https://phabricator.wikimedia.org/T352956)
[14:27:48] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240288 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[14:28:33] (To make things clear, I'm not deploying anything right now, if there's someone who'd like to deploy something, you're free to do so)
[14:28:46] (PS4) Muehlenhoff: Move the puppetmaster puppetdb client class under puppet_compiler [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798)
[14:29:26] SRE, Traffic: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11627887 (ssingh) Thanks for the comment! Some more thoughts, based on the previous discussions and recent observations: - In our own traffic per netflow data for the month of January 2026, we see that ns2 ge...
[14:31:09] Msz2001: fwiw I asked in the security channel if anyone has opinions either way
[14:31:22] ack
[14:31:28] if there’s no response after a while then my feeling is it’s probably okay to try backporting that version bump
[14:31:35] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[14:33:01] We would need to do that in all four repos, probably, right? https://gerrit.wikimedia.org/r/q/bug:T417722
[14:33:01] T417722: php-jwt contains weak encryption - https://phabricator.wikimedia.org/T417722
[14:34:05] depends on how many of them get pulled into the CI build, I think
[14:35:14] ok, at least CheckUser and ContentTranslation seem to be needed for https://integration.wikimedia.org/ci/job/quibble-with-gated-extensions-vendor-mysql-php83/13993/console
[14:35:20] OAuth could perhaps be skipped
[14:35:52] SRE, Traffic: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11627934 (ssingh) TL;DR: > [bblack] To me, the more-important meta-point is that we get all our public authdns IPs (IPv4 + IPv6) switched to Anycast-able (not site-specific) IPs, and that they're deployed in...
[14:36:46] I might be unable to deploy all of that, I'll need to go soon
[14:37:04] (PS3) Muehlenhoff: Puppetserver: Update hooks [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798)
[14:37:17] Msz2001: Thanks for deploying, running maintenance script
[14:37:34] yw
[14:37:46] yeah it sounds like a tall order :/
[14:39:19] !log upload golang-github-u-root-u-root 0.12.0-1 to trixie-wikimedia (apt.wm.o) - T401832
[14:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:23] T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[14:42:53] (CR) Hnowlan: [C:-1] "Mostly lgtm, but the api-gateway helmfile.d will also need the redis external service defined" [deployment-charts] - https://gerrit.wikimedia.org/r/1225548 (https://phabricator.wikimedia.org/T414333) (owner: Clément Goubert)
[14:45:01] (PS1) Muehlenhoff: proton: Bump to latest image (with latest Chromium security release) [deployment-charts] - https://gerrit.wikimedia.org/r/1240289
[14:45:12] (PS1) Jforrester: wikifunctions: Upgrade evaluators from 2026-02-11-123504 to 2026-02-12-145008 [deployment-charts] - https://gerrit.wikimedia.org/r/1240290 (https://phabricator.wikimedia.org/T382795)
[14:45:21] (PS1) Jforrester: wikifunctions: Upgrade orchestrator from 2026-02-11-121010 to 2026-02-18-140059 [deployment-charts] - https://gerrit.wikimedia.org/r/1240291 (https://phabricator.wikimedia.org/T382795)
[14:45:35] (CR) Majavah: [V:+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8066/console" [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[14:46:46] (CR) Eevans: [C:+1] "Hi, is this blocked on something? I was given to understand that serviceops had no capacity to work on it this quarter, but that we were " [puppet] - https://gerrit.wikimedia.org/r/1237258 (https://phabricator.wikimedia.org/T414112) (owner: Jelto)
[14:48:10] !log upload golang-gitlab-wikimedia-sre-qemutest-dev 0.1.0+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832
[14:48:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:15] T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[14:49:37] (PS4) Muehlenhoff: Puppetserver: Update hooks [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798)
[14:51:18] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[14:53:40] (PS2) Federico Ceratto: orchestrator: disable service on dborch1001 [puppet] - https://gerrit.wikimedia.org/r/1240228 (https://phabricator.wikimedia.org/T416582)
[14:54:02] (CR) Federico Ceratto: "Ok, updated." [puppet] - https://gerrit.wikimedia.org/r/1240228 (https://phabricator.wikimedia.org/T416582) (owner: Federico Ceratto)
[14:54:11] !log uplodaded tcp-mss-clamper 0.6+deb13u1 to trixie-wikimedia (apt-wm.o) - T401832
[14:54:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:17] T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[14:54:20] uplowhat? damn :)
[14:54:30] (PS1) Majavah: P:puppetserver: Set User-Agent on puppet-facts-upload script [puppet] - https://gerrit.wikimedia.org/r/1240295
[14:56:17] (CR) CI reject: [V:-1] P:puppetserver: Set User-Agent on puppet-facts-upload script [puppet] - https://gerrit.wikimedia.org/r/1240295 (owner: Majavah)
[14:56:33] (PS5) Muehlenhoff: Puppetserver: Update hooks [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798)
[14:58:25] (CR) Muehlenhoff: [C:+2] proton: Bump to latest image (with latest Chromium security release) [deployment-charts] - https://gerrit.wikimedia.org/r/1240289 (owner: Muehlenhoff)
[14:59:16] !log jmm@deploy2002 helmfile [staging] START helmfile.d/services/proton: apply
[14:59:51] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:00:05] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1500)
[15:01:17] !incidents
[15:01:18] No incidents occurred in the past 24 hours for team SRE
[15:01:45] (PS2) Majavah: P:puppetserver: Set User-Agent on puppet-facts-upload script [puppet] - https://gerrit.wikimedia.org/r/1240295
[15:02:58] !log jmm@deploy2002 helmfile [staging] DONE helmfile.d/services/proton: apply
[15:05:02] (CR) Jforrester: [C:+2] wikifunctions: Upgrade evaluators from 2026-02-11-123504 to 2026-02-12-145008 [deployment-charts] - https://gerrit.wikimedia.org/r/1240290 (https://phabricator.wikimedia.org/T382795) (owner: Jforrester)
[15:05:06] (PS1) Ejegg: Add new dimensions to banner_activity in Turnilo [puppet] - https://gerrit.wikimedia.org/r/1240298 (https://phabricator.wikimedia.org/T414478)
[15:07:09] (Merged) jenkins-bot: wikifunctions: Upgrade evaluators from 2026-02-11-123504 to 2026-02-12-145008 [deployment-charts] - https://gerrit.wikimedia.org/r/1240290 (https://phabricator.wikimedia.org/T382795) (owner: Jforrester)
[15:08:10] !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:08:17] (PS1) Blake: site.pp: wikikube-worker23[32-56] as kubernetes::worker [puppet] - https://gerrit.wikimedia.org/r/1240301 (https://phabricator.wikimedia.org/T417772)
[15:08:58] !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:09:13] !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:09:45] (CR) Clément Goubert: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1240301 (https://phabricator.wikimedia.org/T417772) (owner: Blake)
[15:09:53] !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:10:01] !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:10:27] (CR) Blake: [C:+2] site.pp: wikikube-worker23[32-56] as kubernetes::worker [puppet] - https://gerrit.wikimedia.org/r/1240301 (https://phabricator.wikimedia.org/T417772) (owner: Blake)
[15:10:57] !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:11:22] (CR) Jforrester: [C:+2] wikifunctions: Upgrade orchestrator from 2026-02-11-121010 to 2026-02-18-140059 [deployment-charts] - https://gerrit.wikimedia.org/r/1240291 (https://phabricator.wikimedia.org/T382795) (owner: Jforrester)
[15:12:33] (PS1) Federico Ceratto: Reflow setup.py using Black [cookbooks] - https://gerrit.wikimedia.org/r/1240302
[15:12:33] (CR) Federico Ceratto: "Just a reflow for readability." [cookbooks] - https://gerrit.wikimedia.org/r/1240302 (owner: Federico Ceratto)
[15:13:29] (Merged) jenkins-bot: wikifunctions: Upgrade orchestrator from 2026-02-11-121010 to 2026-02-18-140059 [deployment-charts] - https://gerrit.wikimedia.org/r/1240291 (https://phabricator.wikimedia.org/T382795) (owner: Jforrester)
[15:14:45] !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:15:20] !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:15:47] !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:16:19] !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:16:33] !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:16:45] (CR) Dzahn: [C:+1] gerrit: change system user for gerrit1003 [puppet] - https://gerrit.wikimedia.org/r/1240230 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[15:16:59] !log jmm@deploy2002 helmfile [codfw] START helmfile.d/services/proton: apply
[15:17:05] !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:17:11] (PS1) Arnaudb: gerrit: move gerrit1003 to insetup [puppet] - https://gerrit.wikimedia.org/r/1240306 (https://phabricator.wikimedia.org/T417246)
[15:17:32] (CR) Arnaudb: "thanks for the hint!" [puppet] - https://gerrit.wikimedia.org/r/1240306 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[15:18:13] (PS2) Arnaudb: gerrit: move gerrit1003 to insetup [puppet] - https://gerrit.wikimedia.org/r/1240306 (https://phabricator.wikimedia.org/T417246)
[15:18:43] !log jmm@deploy2002 helmfile [codfw] DONE helmfile.d/services/proton: apply
[15:19:38] (CR) Arnaudb: [C:+2] gerrit: move gerrit1003 to insetup [puppet] - https://gerrit.wikimedia.org/r/1240306 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[15:21:17] (PS1) Arnaudb: Revert "gerrit: move gerrit1003 to insetup" [puppet] - https://gerrit.wikimedia.org/r/1240311
[15:21:49] (CR) Muehlenhoff: gerrit: move gerrit1003 to insetup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1240306 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[15:22:17] !log jmm@deploy2002 helmfile [eqiad] START helmfile.d/services/proton: apply
[15:22:34] (CR) Arnaudb: [C:+2] Revert "gerrit: move gerrit1003 to insetup" [puppet] - https://gerrit.wikimedia.org/r/1240311 (owner: Arnaudb)
[15:24:05] (CR) Arnaudb: [C:+2] gerrit: move gerrit1003 to insetup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1240306 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[15:24:20] !log jmm@deploy2002 helmfile [eqiad] DONE helmfile.d/services/proton: apply
[15:24:23] !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[15:24:33] FIRING: KubernetesCalicoDown: wikikube-worker2355.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2355.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:26:29] (CR) Muehlenhoff: "The diff between the hooks is only the SPDX headers:" [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:26:50] Puppet, cloud-services-team, Cloud-VPS: Prune old reports from /var/lib/puppetserver/server_data/facts on Cloud VPS puppet servers - https://phabricator.wikimedia.org/T417795 (taavi) NEW p:Triage→High
[15:27:54] (CR) Clément Goubert: [C:+2] kubernetes: Add wikikube-worker23[32-56] [puppet] - https://gerrit.wikimedia.org/r/1240276 (https://phabricator.wikimedia.org/T417772) (owner: Clément Goubert)
[15:28:04] (PS2) Arnaudb: gerrit: move gerrit1003 to insetup [puppet] - https://gerrit.wikimedia.org/r/1240314 (https://phabricator.wikimedia.org/T417246)
[15:28:16] Puppet, cloud-services-team, Cloud-VPS: Prune old reports from /var/lib/puppetserver/server_data/facts on Cloud VPS puppet servers - https://phabricator.wikimedia.org/T417795#11628178 (taavi)
[15:29:27] (PS1) Majavah: P:puppetserver::wmcs: Prune old fact files [puppet] - https://gerrit.wikimedia.org/r/1240316 (https://phabricator.wikimedia.org/T417795)
[15:29:33] FIRING: [25x] KubernetesCalicoDown: wikikube-worker2332.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:29:59] (CR) Scott French: [C:+1] "Thanks, Effie! One last thing to be aware of:" [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[15:30:03] (PS1) Muehlenhoff: Remove profile::puppetmaster::common [puppet] - https://gerrit.wikimedia.org/r/1240317 (https://phabricator.wikimedia.org/T365798)
[15:30:03] pt1979@cumin2002 netbox (PID 2573359) is awaiting input
[15:30:05] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1500)
[15:30:05] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1530)
[15:30:26] (PS10) Federico Ceratto: mysql: update replication source [cookbooks] - https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436)
[15:31:08] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240317 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:31:56] !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for cloudgw2004 and cloudcephosd2008-dev - pt1979@cumin2002"
[15:32:01] !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for cloudgw2004 and cloudcephosd2008-dev - pt1979@cumin2002"
[15:32:02] !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:35:20] (CR) CI reject: [V:-1] mysql: update replication source [cookbooks] - https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: Federico Ceratto)
[15:35:44] !log upload liberica 0.23 to bookworm-wikimedia (apt.wm.o) - T417306
[15:35:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:48] T417306: liberica-fp doesn't error/refuse to start if the detected MAC Address for the gateway is invalid - https://phabricator.wikimedia.org/T417306
[15:36:59] !log zabe@deploy2002:~$ mwscript extensions/TimedMediaHandler/maintenance/migrateTranscodeStates.php mediawikiwiki # T415064
[15:37:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:03] T415064: Backfill new status and touched columns - https://phabricator.wikimedia.org/T415064
[15:37:07] (CR) Dzahn: [C:+1] gerrit: move gerrit1003 to insetup [puppet] - https://gerrit.wikimedia.org/r/1240314 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[15:39:33] FIRING: [25x] KubernetesCalicoDown: wikikube-worker2332.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:39:38] ops-eqiad, SRE, DC-Ops, fundraising-tech-ops, and 2 others: Eqiad: Fr-tech expansion - https://phabricator.wikimedia.org/T403035#11628216 (VRiley-WMF) Open→Resolved This has been completed
[15:40:30] All good for the KubernetesCalicoDown they're expected
[15:40:32] !log homer 'cr*codfw*' commit 'T417772'
[15:40:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:36] T417772: wikikube-worker23[32-56] implementation tracking - https://phabricator.wikimedia.org/T417772
[15:41:07] !log vgutierrez@cumin1003 START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
[15:41:18] T417306: liberica-fp doesn't error/refuse to start if the detected MAC Address for the gateway is invalid - https://phabricator.wikimedia.org/T417306
[15:41:22] (PS5) Effie Mouzeli: service.yaml: remove alerts from mw-parsoid #2 [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246)
[15:41:25] (CR) Effie Mouzeli: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[15:41:47] ops-codfw, SRE, DC-Ops: Degraded RAID on kubestage2004 - https://phabricator.wikimedia.org/T416726#11628228 (Jhancock.wm) for some reason the package still hasn't been delivered. looping back to dell to figure out why
[15:41:58] !log vgutierrez@cumin1003 END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
[15:44:09] (PS1) Ayounsi: decom cookbook: use homer on Nokia switches [cookbooks] - https://gerrit.wikimedia.org/r/1240318 (https://phabricator.wikimedia.org/T417428)
[15:44:33] FIRING: [25x] KubernetesCalicoDown: wikikube-worker2332.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:44:49] !log homer 'lsw*codfw*' commit 'T417772'
[15:44:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:24] (PS1) Zabe: Stop writing to il_to on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1240320 (https://phabricator.wikimedia.org/T415787)
[15:48:35] (CR) Muehlenhoff: [C:+1] "Looks good!" [puppet] - https://gerrit.wikimedia.org/r/1240295 (owner: Majavah)
[15:48:38] ops-eqiad, SRE, DC-Ops, Sustainability (Incident Followup): move the link from lvs1020 from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T417054#11628255 (VRiley-WMF) Open→Resolved lvs1020 is currently connected to ssw1-e1-eqiad https://netbox.wikimedia.org/dcim...
[15:48:39] (CR) Zabe: [C:-2] Stop writing to il_to on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1240320 (https://phabricator.wikimedia.org/T415787) (owner: Zabe)
[15:49:33] RESOLVED: [25x] KubernetesCalicoDown: wikikube-worker2332.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:54:37] (CR) Effie Mouzeli: "thanks for finding this, I assumed that it would go away :/" [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[15:59:46] (CR) Effie Mouzeli: [C:+2] service.yaml: remove alerts from mw-parsoid #2 [puppet] - https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[16:00:39] Puppet, cloud-services-team, Cloud-VPS, Patch-For-Review: Prune old reports from /var/lib/puppetserver/server_data/facts on Cloud VPS puppet servers - https://phabricator.wikimedia.org/T417795#11628332 (taavi) Open→Resolved
[16:02:07] (PS1) Clément Goubert: conftool-data: Remove wikikube-workers in codfw E/F [puppet] - https://gerrit.wikimedia.org/r/1240323 (https://phabricator.wikimedia.org/T417772)
[16:02:55] (CR) Blake: [C:+1] conftool-data: Remove wikikube-workers in codfw E/F [puppet] - https://gerrit.wikimedia.org/r/1240323 (https://phabricator.wikimedia.org/T417772) (owner: Clément Goubert)
[16:03:48] !log imported jenkins 2.541.2 for bullseye-wikimedia/bookworm-wikimedia
[16:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:43] (CR) Clément Goubert: [C:+2] conftool-data: Remove wikikube-workers in codfw E/F [puppet] - https://gerrit.wikimedia.org/r/1240323 (https://phabricator.wikimedia.org/T417772) (owner: Clément Goubert)
[16:05:36] (PS1) Majavah: Add toolsbeta-acme-chief private key [labs/private] - https://gerrit.wikimedia.org/r/1240325
[16:05:36] (PS1) Majavah: Add fake metricsinfra Grafana admin password [labs/private] - https://gerrit.wikimedia.org/r/1240326
[16:08:20] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:10:19] !log bking@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs1028.eqiad.wmnet with reason: broken puppet
[16:11:29] (CR) Arnaudb: [C:+2] gerrit: move gerrit1003 to insetup [puppet] - https://gerrit.wikimedia.org/r/1240314 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[16:12:13] (Abandoned) Ahmon Dancy: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240089 (owner: TrainBranchBot)
[16:12:22] !log arnaudb@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit1003.wikimedia.org with OS bookworm
[16:12:44] !log arnaudb@cumin1003 START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bookworm
[16:14:19] !log upgrade codfw1dev instances of cloudlb and cloudservices* to Bird 2.18 T413740
[16:14:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:23] T413740: Backport and test Bird 2.18 - https://phabricator.wikimedia.org/T413740
[16:15:52] ops-codfw, SRE, DC-Ops, decommission-hardware, Patch-For-Review: decommission puppetmaster2001 - https://phabricator.wikimedia.org/T416606#11628403 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[16:18:13] !log upload liberica 0.24 to bookworm-wikimedia (apt.wm.o) - T417306
[16:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:17] T417306: liberica-fp doesn't error/refuse to start if the detected MAC Address for the gateway is invalid - https://phabricator.wikimedia.org/T417306
[16:18:38] !log vgutierrez@cumin1003 START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
[16:19:29] !log vgutierrez@cumin1003 END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
[16:21:56] (PS1) BCornwall: loadbalancer.upgrade: Fix runtime msg misspelling [cookbooks] - https://gerrit.wikimedia.org/r/1240329
[16:22:30] !log vgutierrez@cumin1003 START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica AND NOT P{lvs7003.magru.wmnet} and A:liberica (T417306)
[16:22:47] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240330
[16:22:48] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240330 (owner: TrainBranchBot)
[16:22:56] (CR) BCornwall: "World's pettiest CR" [cookbooks] - https://gerrit.wikimedia.org/r/1240329 (owner: BCornwall)
[16:25:55] (PS10) Bking: opensearch-cluster: allow the definition of custom network policies [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[16:26:41] (CR) Bking: opensearch-cluster: allow the definition of custom network policies (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[16:27:58] (CR) Vgutierrez: [C:+1] "thx!" [cookbooks] - https://gerrit.wikimedia.org/r/1240329 (owner: BCornwall)
[16:29:15] !log jnuche@deploy2002 Started deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins update test on backup host
[16:30:02] !log jnuche@deploy2002 Finished deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins update test on backup host (duration: 01m 48s)
[16:30:26] !log arnaudb@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
[16:33:11] (CR) Bking: [C:+2] "I have confirmed that this change will not affect opensearch-ipoid (no network policies are displayed when I run a `helmfile diff`.) Mergi" [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[16:33:20] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:43] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
[16:35:30] (Merged) jenkins-bot: opensearch-cluster: allow the definition of custom network policies [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[16:39:13] ops-codfw, SRE, DC-Ops: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T417336#11628506 (Jhancock.wm) a:herron @herron for visibility
[16:45:15] (CR) BCornwall: [C:+2] loadbalancer.upgrade: Fix runtime msg misspelling [cookbooks] - https://gerrit.wikimedia.org/r/1240329 (owner: BCornwall)
[16:47:05] (PS1) Muehlenhoff: Obsolete airflow-analytics-product-admins POSIX group [puppet] - https://gerrit.wikimedia.org/r/1240336
[16:48:44] !log vgutierrez@cumin1003 END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica AND NOT P{lvs7003.magru.wmnet} and A:liberica (T417306)
[16:48:48] T417306: liberica-fp doesn't error/refuse to start if the detected MAC Address for the gateway is invalid - https://phabricator.wikimedia.org/T417306
[16:53:03] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit1003.wikimedia.org with OS bookworm
[16:54:58] ops-codfw, SRE, DC-Ops: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T417336#11628558 (herron) >>! In T417336#11613204, @Jhancock.wm wrote: > we have plenty of decommissioned servers we can pull from to up the ram count on these servers. i can check in the morning if t...
[16:57:12] (PS1) Ladsgroup: Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240337 (https://phabricator.wikimedia.org/T402792)
[16:57:47] (PS1) Ladsgroup: Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240338 (https://phabricator.wikimedia.org/T402792)
[16:58:02] jouncebot: nowandnext
[16:58:03] No deployments scheduled for the next 1 hour(s) and 1 minute(s)
[16:58:03] In 1 hour(s) and 1 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1800)
[16:58:08] cool cool
[16:58:40] (CR) CI reject: [V:-1] Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240337 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[16:59:00] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy2002 using scap backport" [extensions/PdfHandler] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240337 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[16:59:00] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy2002 using scap backport" [extensions/PdfHandler] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240338 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[16:59:18] (CR) CI reject: [V:-1] Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240338 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[16:59:39] :(
[17:00:35] (CR) CI reject: [V:-1] Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240337 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[17:00:46] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: FY2526 Q3 ulsfo: switch refresh - https://phabricator.wikimedia.org/T408510#11628604 (RobH)
[17:01:36] I guess you would need to backport the firebase/php-jwt upgrade
[17:01:38] (PS2) Bernard Wang: Enable personal main menu to all users in minerva [mediawiki-config] - https://gerrit.wikimedia.org/r/1240012 (https://phabricator.wikimedia.org/T413912)
[17:02:11] that has merge conflict
[17:02:21] (CR) Bking: [C:-1] "David and I discovered that these settings are not actually applied to the OpenSearch cluster, similar to what I found with the cluster dy" [deployment-charts] - https://gerrit.wikimedia.org/r/1238306 (https://phabricator.wikimedia.org/T414095) (owner: DCausse)
[17:05:53] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [vendor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240340 (https://phabricator.wikimedia.org/T417722)
[17:07:47] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240330 (owner: TrainBranchBot)
[17:10:55] !log jnuche@deploy2002 Started deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins security updates
[17:12:04] !log jnuche@deploy2002 Finished deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins security updates (duration: 01m 32s)
[17:13:01] (PS11) Federico Ceratto: mysql: update replication source [cookbooks] - https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436)
[17:14:19] (CR) Federico Ceratto: "I added better docstrings to explain what the cookbook does and unit testing for the safety checks." [cookbooks] - https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: Federico Ceratto)
[17:14:51] (PS1) Reedy: Explicitly pin wikimedia/wikipeg [vendor] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240342
[17:14:52] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [vendor] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240343 (https://phabricator.wikimedia.org/T417722)
[17:15:22] jouncebot: nowandnext
[17:15:22] No deployments scheduled for the next 0 hour(s) and 44 minute(s)
[17:15:22] In 0 hour(s) and 44 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1800)
[17:15:29] SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11628667 (MBinder_WMF) So, it happens that the group membership did not grant access to the page we're trying to get me into, which is https://superset.wikimedia.org/sup...
[17:16:29] (CR) CI reject: [V:-1] Explicitly pin wikimedia/wikipeg [vendor] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240342 (owner: Reedy)
[17:17:02] (CR) Reedy: [V:+2 C:+2] Explicitly pin wikimedia/wikipeg [vendor] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240342 (owner: Reedy)
[17:17:46] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11628686 (Papaul) Created ticket Case Order #01144222 for initial racking and wiring of the new Nokia switches.
[17:18:24] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8068/co" [puppet] - https://gerrit.wikimedia.org/r/1240278 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[17:19:02] (CR) CI reject: [V:-1] mysql: update replication source [cookbooks] - https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: Federico Ceratto)
[17:19:11] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [vendor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240340 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:24:41] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[17:25:23] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [vendor] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240343 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:26:32] (PS23) Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[17:29:37] (CR) CI reject: [V:-1] slothslos: add module to build and deploy sloth manifests [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: Tiziano Fogli)
[17:33:08] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [vendor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240340 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:33:20] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:33:39] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/CheckUser] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240345 (https://phabricator.wikimedia.org/T417722)
[17:33:46] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/CheckUser] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240345 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:33:53] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/CheckUser] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240346 (https://phabricator.wikimedia.org/T417722)
[17:34:00] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/CheckUser] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240346 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:34:25] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/OAuth] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240347 (https://phabricator.wikimedia.org/T417722)
[17:34:31] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/OAuth] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240347 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:35:11] (PS2) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/OAuth] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240348 (https://phabricator.wikimedia.org/T417722)
[17:35:24] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/OAuth] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240348 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:35:44] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/ContentTranslation] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240349 (https://phabricator.wikimedia.org/T417722)
[17:35:51] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/ContentTranslation] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240349 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:35:58] (PS1) Reedy: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/ContentTranslation] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240350 (https://phabricator.wikimedia.org/T417722)
[17:36:05] (CR) Reedy: [C:+2] Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/ContentTranslation] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240350 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:39:45] (CR) FNegri: [C:+1] "LGTM, I haven't tested the ldap query though. how can I easily find a disabled account to test it with?" [puppet] - https://gerrit.wikimedia.org/r/1240271 (owner: Majavah)
[17:39:58] ops-eqiad, SRE, DC-Ops: Alert for device ps1-e3-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T417316#11628730 (phaultfinder)
[17:40:06] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [vendor] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240343 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:40:37] (CR) Subramanya Sastry: [C:+1] Deploy PRV to 19 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1239270 (https://phabricator.wikimedia.org/T417349) (owner: Arlolra)
[17:41:23] (CR) JHathaway: [C:+1] Remove profile::puppetmaster::common [puppet] - https://gerrit.wikimedia.org/r/1240317 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[17:41:57] (CR) JHathaway: [C:+1] Puppetserver: Update hooks [puppet] - https://gerrit.wikimedia.org/r/1104627 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[17:42:43] (CR) Majavah: "No comment on the "easily" part (I notice these when I look at our mail server logs from time to time) but at the moment https://ldap.too" [puppet] - https://gerrit.wikimedia.org/r/1240271 (owner: Majavah)
[17:43:20] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:44:45] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/CheckUser] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240345 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:46:31] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/CheckUser] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240346 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:46:34] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/OAuth] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240347 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:46:36] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/OAuth] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240348 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:51:24] (CR) Majavah: [C:+2] P:toolforge: mailrelay: Never expand tool alias to disabled maintainers [puppet] - https://gerrit.wikimedia.org/r/1240271 (owner: Majavah)
[17:55:12] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/ContentTranslation] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240349 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:55:19] 1 to go...
[17:56:59] (Merged) jenkins-bot: Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) [extensions/ContentTranslation] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240350 (https://phabricator.wikimedia.org/T417722) (owner: Reedy)
[17:58:00] !log reedy@deploy2002 Started scap sync-world: Backport for [[gerrit:1240340|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240343|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240345|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240346|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240347|Upgrading firebase/php-jwt (v6.11.1 => v
[17:58:00] 7.0.2) (T417722)]], [[gerrit:1240348|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240349|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240350|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]]
[17:58:04] T417722: php-jwt contains weak encryption - https://phabricator.wikimedia.org/T417722
[18:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1800)
[18:00:19] Reedy: Thank you very much for taking care of that problem!
[18:00:22] !log reedy@deploy2002 reedy: Backport for [[gerrit:1240340|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240343|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240345|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240346|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240347|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]]
[18:00:22] , [[gerrit:1240348|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240349|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240350|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:00:51] !log reedy@deploy2002 reedy: Continuing with sync
[18:04:55] note that moderator tools is about to add ORES extension tables for a number of wikis for T411485
[18:04:56] T411485: Enable revert risk filters for first batch of wikis: < 1000 monthly edits - https://phabricator.wikimedia.org/T411485
[18:04:59] !log reedy@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240340|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240343|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240345|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240346|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240347|Upgrading firebase/php-jwt (v6.11.1 =>
[18:04:59] v7.0.2) (T417722)]], [[gerrit:1240348|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240349|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]], [[gerrit:1240350|Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)]] (duration: 06m 59s)
[18:05:03] T417722: php-jwt contains weak encryption - https://phabricator.wikimedia.org/T417722
[18:21:20] (CR) FNegri: [C:+1] "Thanks, tested and working as expected!" [puppet] - https://gerrit.wikimedia.org/r/1240271 (owner: Majavah)
[18:22:54] (CR) Bking: "Sorry, to be more clear, the egress settings are OK, just the `additionalConfig` doesn't work." [deployment-charts] - https://gerrit.wikimedia.org/r/1238306 (https://phabricator.wikimedia.org/T414095) (owner: DCausse)
[18:22:59] Reedy: Thank you!
[18:23:17] jouncebot: nowandnext
[18:23:17] For the next 0 hour(s) and 36 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1800)
[18:23:17] In 0 hour(s) and 36 minute(s): MediaWiki train - Utc-7+Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1900)
[18:24:08] JSherman: I'm going to backport a patch in the mean time, the table creation won't conflict, feel free to do it at any time
[18:24:13] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy2002 using scap backport" [extensions/PdfHandler] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240337 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[18:24:13] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy2002 using scap backport" [extensions/PdfHandler] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240338 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[18:24:48] Amir1: we just finished!
[18:24:58] awesome
[18:26:32] hmm, spiderpig says the new one is at "Error" state while it's not really
[18:26:37] SRE: Remove production data access for NDA expired user mobrovac - https://phabricator.wikimedia.org/T388030#11628900 (Dzahn) As of today, Marko still has deployment access but is not in analytics-privatedata-users anymore.
[18:29:14] (CR) Marostegui: [C:+1] orchestrator: disable service on dborch1001 [puppet] - https://gerrit.wikimedia.org/r/1240228 (https://phabricator.wikimedia.org/T416582) (owner: Federico Ceratto)
[18:30:02] SRE: Remove production data access for NDA expired user mobrovac - https://phabricator.wikimedia.org/T388030#11628921 (Ladsgroup) Somewhat unrelated. They add a lot of files to puppet that gets added to every server: ` ~/puppet$ find . | grep -i mobrovac ./modules/admin/files/home/mobrovac ./modules/admin/fi...
[18:32:53] (Abandoned) Ladsgroup: Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.15) - https://gerrit.wikimedia.org/r/1240338 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[18:37:02] (Merged) jenkins-bot: Make Pdf thumbs follow the thumb steps [extensions/PdfHandler] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240337 (https://phabricator.wikimedia.org/T402792) (owner: Ladsgroup)
[18:40:11] (PS10) Bking: dse-k8s: Enable active/active for dse-k8s clusters [dns] - https://gerrit.wikimedia.org/r/1238441 (https://phabricator.wikimedia.org/T396478)
[18:42:28] !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1240337|Make Pdf thumbs follow the thumb steps (T402792 T414805)]]
[18:42:33] T402792: Consider rate limiting non-standard thumbnail sizes - https://phabricator.wikimedia.org/T402792
[18:42:34] T414805: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805
[18:43:33] (CR) Bking: dse-k8s: Enable active/active for dse-k8s clusters (1 comment) [dns] - https://gerrit.wikimedia.org/r/1238441 (https://phabricator.wikimedia.org/T396478) (owner: Bking)
[18:44:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T415786)', diff saved to https://phabricator.wikimedia.org/P88878 and previous config saved to /var/cache/conftool/dbconfig/20260218-184405-marostegui.json
[18:44:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[18:44:36] !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:1240337|Make Pdf thumbs follow the thumb steps (T402792 T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:46:04] !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[18:48:56] (PS1) Dzahn: gerrit: increase failure_fraction for NEL to 20% [puppet] - https://gerrit.wikimedia.org/r/1240357 (https://phabricator.wikimedia.org/T303725)
[18:49:38] SRE, collaboration-services, Infrastructure-Foundations, Patch-For-Review: Extend NEL headers to sites not fronted by CDN - https://phabricator.wikimedia.org/T303725#11629187 (Dzahn) >>! In T303725#11256799, @CDanis wrote: > BTW, after looking at a few weeks of data, I suggest increasing the fail...
[18:50:11] !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240337|Make Pdf thumbs follow the thumb steps (T402792 T414805)]] (duration: 07m 43s)
[18:50:16] T402792: Consider rate limiting non-standard thumbnail sizes - https://phabricator.wikimedia.org/T402792
[18:50:16] T414805: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805
[18:52:24] (CR) CDanis: [C:-1] "I’m pretty sure misc-frontend.vcl.erb.guh overrides this unconditionally, so you’ll have to fix that too." [puppet] - https://gerrit.wikimedia.org/r/1240357 (https://phabricator.wikimedia.org/T303725) (owner: Dzahn)
[18:56:10] (CR) CDanis: [C:-1] "Actually, it’s probably better to not use NEL for this now that Gerrit is behind the CDN. NEL will never tell you about git clones failing" [puppet] - https://gerrit.wikimedia.org/r/1240357 (https://phabricator.wikimedia.org/T303725) (owner: Dzahn)
[18:57:50] (CR) Dzahn: "thank you for that explanation. glad I asked and that was useful. will abandon this patch." [puppet] - https://gerrit.wikimedia.org/r/1240357 (https://phabricator.wikimedia.org/T303725) (owner: Dzahn)
[18:57:58] (Abandoned) Dzahn: gerrit: increase failure_fraction for NEL to 20% [puppet] - https://gerrit.wikimedia.org/r/1240357 (https://phabricator.wikimedia.org/T303725) (owner: Dzahn)
[18:59:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P88879 and previous config saved to /var/cache/conftool/dbconfig/20260218-185912-marostegui.json
[19:00:05] dancy and jnuche: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - Utc-7+Utc-0 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T1900).
[19:00:39] SRE, collaboration-services, Infrastructure-Foundations, Patch-For-Review: Extend NEL headers to sites not fronted by CDN - https://phabricator.wikimedia.org/T303725#11629281 (Dzahn) Since this is about sites NOT fronted by the CDN - I think we reject doing it for Gitlab. Because it's not as easy...
[19:05:51] !log import haproxykafka 0.3.16+deb13u1 into trixie-wikimedia (T401832)
[19:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:55] T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[19:08:05] (PS1) DLynch: BaseEditCheck: fix check for blockquote [extensions/VisualEditor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240361 (https://phabricator.wikimedia.org/T417801)
[19:08:34] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/VisualEditor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240361 (https://phabricator.wikimedia.org/T417801) (owner: DLynch)
[19:09:55] SRE, collaboration-services, Infrastructure-Foundations, Patch-For-Review: Extend NEL headers to sites not fronted by CDN - https://phabricator.wikimedia.org/T303725#11629319 (CDanis) Sounds good to me.
[19:14:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P88880 and previous config saved to /var/cache/conftool/dbconfig/20260218-191420-marostegui.json
[19:21:48] (PS1) TrainBranchBot: group1 to 1.46.0-wmf.16 [mediawiki-config] - https://gerrit.wikimedia.org/r/1240364 (https://phabricator.wikimedia.org/T413807)
[19:21:50] (CR) TrainBranchBot: [C:+2] "Initiated by dancy@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240364 (https://phabricator.wikimedia.org/T413807) (owner: TrainBranchBot)
[19:22:47] (Merged) jenkins-bot: group1 to 1.46.0-wmf.16 [mediawiki-config] - https://gerrit.wikimedia.org/r/1240364 (https://phabricator.wikimedia.org/T413807) (owner: TrainBranchBot)
[19:28:53] !log dancy@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.16 refs T413807
[19:28:57] T413807: 1.46.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T413807
[19:29:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T415786)', diff saved to https://phabricator.wikimedia.org/P88881 and previous config saved to /var/cache/conftool/dbconfig/20260218-192929-marostegui.json
[19:29:33] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[19:29:47] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
[19:30:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
[19:30:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88882 and previous config saved to /var/cache/conftool/dbconfig/20260218-193017-marostegui.json
[19:30:59] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:32:38] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2016.codfw.wmnet with OS trixie
[19:32:45] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11629369 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2016.codfw.wmnet with OS trixie
[19:40:16] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:44:24] (CR) Kamila Součková: "Boop, I assume this is still needed? @hnowlan@wikimedia.org how can I help? (Would a +1 help? :D Or should I deploy it?)" [deployment-charts] - https://gerrit.wikimedia.org/r/1226286 (https://phabricator.wikimedia.org/T411076) (owner: Hnowlan)
[19:44:57] (PS2) Daniel Kinzler: rest-gateway: use MINUTE limits in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1239669
[19:45:23] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2015.codfw.wmnet with OS trixie
[19:45:35] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11629446 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2015.codfw.wmnet with OS trixie
[19:45:35] (PS7) Daniel Kinzler: rest route: support multiple rate limit policies at once [deployment-charts] - https://gerrit.wikimedia.org/r/1228218 (https://phabricator.wikimedia.org/T413186)
[19:48:11] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2016.codfw.wmnet with reason: host reimage
[19:53:09] (CR) Kamila Součková: "Sorry, I just saw I0c9f4cf16ddd86a9d5701c22335cfc4848fa6060 , is that blocking this?" [deployment-charts] - https://gerrit.wikimedia.org/r/1226286 (https://phabricator.wikimedia.org/T411076) (owner: Hnowlan)
[19:53:49] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2016.codfw.wmnet with reason: host reimage
[20:01:15] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2015.codfw.wmnet with reason: host reimage
[20:04:17] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2015.codfw.wmnet with reason: host reimage
[20:15:10] FIRING: BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[20:15:59] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:19:11] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:19:12] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2016.codfw.wmnet with OS trixie
[20:19:21] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11629605 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2016.codfw.wmnet with OS trixie completed: - backup2016 (**WA...
[20:20:10] RESOLVED: BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[20:20:23] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:20:38] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:20:39] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2015.codfw.wmnet with OS trixie
[20:20:50] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11629625 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2015.codfw.wmnet with OS trixie completed: - backup2015 (**PASS**)...
[20:21:28] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11629627 (Jhancock.wm) Open→Resolved a:Jhancock.wm @jcrespo, this one is completed. the other ticket should follow shortly.
[20:21:40] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11629632 (Jhancock.wm)
[20:21:56] (PS1) Ahmon Dancy: scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[20:23:47] (CR) CI reject: [V:-1] scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767) (owner: Ahmon Dancy)
[20:25:44] (CR) WMDE-Fisch: "unrelated CI failure" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[20:29:46] FIRING: Not accepting/receiving prefixes from anycast BGP peer: Alert for device lsw1-e1-eqiad.mgmt.eqiad.wmnet - Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[20:29:57] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:31:17] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[20:31:59] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 18 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[20:33:42] (CR) WMDE-Fisch: "recheck" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[20:33:55] (CR) WMDE-Fisch: "unrelated CI failure" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[20:34:01] (CR) WMDE-Fisch: "recheck" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[20:34:48] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2017.codfw.wmnet with OS trixie
[20:34:51] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:35:02] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11629661 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2017.codfw.wmnet with OS trixie
[20:35:32] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2018.codfw.wmnet with OS trixie
[20:35:39] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11629664 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2018.codfw.wmnet with OS trixie
[20:38:52] Hey folks (and especially dancy), the new MW version seems to have introduced a possible regression with image sizes, see e.g. https://it.wikipedia.org/wiki/Template:Bozza/man#Esempi_d'uso (those icons shouldn't be as big). I saw reports about multiple templates, but I still haven't checked the exact cause. I'll file a task once I know more (and assuming it isn't a local issue), so this is a heads-up that it might be coming.
[20:39:18] (PS2) Ahmon Dancy: scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[20:40:02] Daimona: Thanks!
[20:44:46] RESOLVED: Not accepting/receiving prefixes from anycast BGP peer: Device lsw1-e1-eqiad.mgmt.eqiad.wmnet recovered from Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[20:44:52] I filed T417828 so I have somewhere to paste my findings to
[20:44:52] T417828: Some images become disproportionately big in 1.46.0-wmf.16 - https://phabricator.wikimedia.org/T417828
[20:46:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179 (T415786)', diff saved to https://phabricator.wikimedia.org/P88883 and previous config saved to /var/cache/conftool/dbconfig/20260218-204624-marostegui.json
[20:46:28] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[20:48:05] (CR) WMDE-Fisch: [C:+1] Add instrument for clicks in TOC references link [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[20:48:13] (CR) WMDE-Fisch: [C:+1] Add instrument for clicks in footnotes in the article [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[20:48:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:50:10] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2017.codfw.wmnet with reason: host reimage
[20:50:37] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2018.codfw.wmnet with reason: host reimage
[20:51:58] (PS1) Ahmon Dancy: scap: load scap_source type in specs [puppet] - https://gerrit.wikimedia.org/r/1240377
[20:53:04] (PS3) Ahmon Dancy: scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[20:53:50] (CR) CI reject: [V:-1] scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767) (owner: Ahmon Dancy)
[20:54:08] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2017.codfw.wmnet with reason: host reimage
[20:58:21] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2018.codfw.wmnet with reason: host reimage
[21:00:05] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: That opportune time for a UTC late backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2100).
[21:00:05] Sergi0, Kemayo, and WMDE-Fisch: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:12] o/
[21:00:18] o/
[21:00:21] o/
[21:00:30] I can self-deploy
[21:00:36] Same
[21:01:16] I'll do my change and leave it to you @Kemayo
[21:01:17] sergi0: you're first in the list, and just have a config patch, so go ahead?
[21:01:24] Jinx.
[21:01:31] (PS4) Ahmon Dancy: scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[21:01:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P88884 and previous config saved to /var/cache/conftool/dbconfig/20260218-210132-marostegui.json
[21:01:39] (CR) TrainBranchBot: [C:+2] "Approved by sgimeno@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240032 (https://phabricator.wikimedia.org/T375198) (owner: Sergio Gimeno)
[21:01:45] Kemayo: Mind deploying my backports?
[21:02:14] (CR) CI reject: [V:-1] scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767) (owner: Ahmon Dancy)
[21:02:34] (Merged) jenkins-bot: [Growth] Specify notification delay as int instead of array [mediawiki-config] - https://gerrit.wikimedia.org/r/1240032 (https://phabricator.wikimedia.org/T375198) (owner: Sergio Gimeno)
[21:03:02] !log sgimeno@deploy2002 Started scap sync-world: Backport for [[gerrit:1240032|[Growth] Specify notification delay as int instead of array (T375198 T415536)]]
[21:03:08] T375198: Fully adopt TestKitchen for experiment enrollment - https://phabricator.wikimedia.org/T375198
[21:03:09] T415536: Allow running multiple experiments in GrowthExperiments at the same time - https://phabricator.wikimedia.org/T415536
[21:04:11] WMDE-Fisch: Sure, I can bundle yours in.
[21:04:12] (PS5) Ahmon Dancy: scap3 install provider: Set HOME for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[21:04:18] Thanks!
[21:05:15] !log sgimeno@deploy2002 sgimeno: Backport for [[gerrit:1240032|[Growth] Specify notification delay as int instead of array (T375198 T415536)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:06:44] !log sgimeno@deploy2002 sgimeno: Continuing with sync
[21:10:03] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:10:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:10:26] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:10:27] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2017.codfw.wmnet with OS trixie
[21:10:40] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11629782 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2017.codfw.wmnet with OS trixie completed: - backup2017 (**PA...
[21:10:50] !log sgimeno@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240032|[Growth] Specify notification delay as int instead of array (T375198 T415536)]] (duration: 07m 48s)
[21:10:56] T375198: Fully adopt TestKitchen for experiment enrollment - https://phabricator.wikimedia.org/T375198
[21:10:56] T415536: Allow running multiple experiments in GrowthExperiments at the same time - https://phabricator.wikimedia.org/T415536
[21:11:21] @Kemayo all yours
[21:11:29] sergi0: thanks!
[21:11:36] (PS6) Ahmon Dancy: scap3 install provider: Set env vars for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[21:12:02] (CR) TrainBranchBot: [C:+2] "Approved by kemayo@deploy2002 using scap backport" [extensions/VisualEditor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240361 (https://phabricator.wikimedia.org/T417801) (owner: DLynch)
[21:12:02] (CR) TrainBranchBot: [C:+2] "Approved by kemayo@deploy2002 using scap backport" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[21:12:04] (CR) TrainBranchBot: [C:+2] "Approved by kemayo@deploy2002 using scap backport" [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[21:12:21] (CR) CI reject: [V:-1] scap3 install provider: Set env vars for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767) (owner: Ahmon Dancy)
[21:14:16] (Merged) jenkins-bot: BaseEditCheck: fix check for blockquote [extensions/VisualEditor] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240361 (https://phabricator.wikimedia.org/T417801) (owner: DLynch)
[21:15:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:16:41] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P88885 and previous config saved to /var/cache/conftool/dbconfig/20260218-211640-marostegui.json
[21:17:21] (PS7) Ahmon Dancy: scap3 install provider: Set env vars for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[21:19:39] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:20:11] (PS1) SomeRandomDeveloper: Revert "Support CSS/JS thumbnail sizing in Parsoid" [core] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240384 (https://phabricator.wikimedia.org/T417828)
[21:20:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:21:23] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:21:25] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2018.codfw.wmnet with OS trixie
[21:21:39] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11629820 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2018.codfw.wmnet with OS trixie completed: - backup2018 (**WA...
[21:21:45] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:23:39] jouncebot: nowandnexr
[21:23:40] jouncebot: nowandnext
[21:23:40] For the next 0 hour(s) and 36 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2100)
[21:23:40] In 0 hour(s) and 36 minute(s): Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2200)
[21:24:13] Kemayo: let me know when you're done, I have a UBN to deploy
[21:24:26] Amir1: Sure thing.
[21:24:30] (Merged) jenkins-bot: Add instrument for clicks in TOC references link [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: Thiemo Kreuz (WMDE))
[21:24:32] Thanks!
[21:24:41] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[21:26:36] (CR) Bearloga: "Shouldn't the other `airflow-*-admins` POSIX groups also be marked as deprecated?" [puppet] - https://gerrit.wikimedia.org/r/1240336 (owner: Muehlenhoff)
[21:26:37] (Merged) jenkins-bot: Add instrument for clicks in footnotes in the article [extensions/Cite] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: Thiemo Kreuz (WMDE))
[21:26:38] (CR) Subramanya Sastry: [C:+1] Revert "Support CSS/JS thumbnail sizing in Parsoid" [core] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240384 (https://phabricator.wikimedia.org/T417828) (owner: SomeRandomDeveloper)
[21:27:09] !log kemayo@deploy2002 Started scap sync-world: Backport for [[gerrit:1240361|BaseEditCheck: fix check for blockquote (T417801)]], [[gerrit:1239873|Add instrument for clicks in TOC references link (T415910)]], [[gerrit:1239877|Add instrument for clicks in footnotes in the article (T415909)]]
[21:27:16] T417801: ToneCheck should not run on block quotes - https://phabricator.wikimedia.org/T417801
[21:27:17] T415910: Instrument clicks on TOC in reader view desktop - https://phabricator.wikimedia.org/T415910
[21:27:17] T415909: Instrumentation for footnote click in reader view - https://phabricator.wikimedia.org/T415909
[21:29:21] !log kemayo@deploy2002 kemayo, thiemowmde: Backport for [[gerrit:1240361|BaseEditCheck: fix check for blockquote (T417801)]], [[gerrit:1239873|Add instrument for clicks in TOC references link (T415910)]], [[gerrit:1239877|Add instrument for clicks in footnotes in the article (T415909)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:29:39] (CR) Ladsgroup: "deploying it right after the current deploy" [core] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240384 (https://phabricator.wikimedia.org/T417828) (owner: SomeRandomDeveloper)
[21:29:48] Kemayo: nothing to test for me
[21:29:56] Okay, I just need a second to check mine.
[21:30:04] (CR) SomeRandomDeveloper: "thanks!" [core] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240384 (https://phabricator.wikimedia.org/T417828) (owner: SomeRandomDeveloper)
[21:31:02] !log kemayo@deploy2002 kemayo, thiemowmde: Continuing with sync
[21:31:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179 (T415786)', diff saved to https://phabricator.wikimedia.org/P88887 and previous config saved to /var/cache/conftool/dbconfig/20260218-213149-marostegui.json
[21:31:53] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[21:32:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
[21:32:10] FIRING: BFDdown: BFD session down between cr2-esams and fe80::5e5e:ab00:d3d:83c7 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:35:05] !log kemayo@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240361|BaseEditCheck: fix check for blockquote (T417801)]], [[gerrit:1239873|Add instrument for clicks in TOC references link (T415910)]], [[gerrit:1239877|Add instrument for clicks in footnotes in the article (T415909)]] (duration: 07m 56s)
[21:35:12] T417801: ToneCheck should not run on block quotes - https://phabricator.wikimedia.org/T417801
[21:35:12] T415910: Instrument clicks on TOC in reader view desktop - https://phabricator.wikimedia.org/T415910
[21:35:13] T415909: Instrumentation for footnote click in reader view - https://phabricator.wikimedia.org/T415909
[21:35:15] Amir1: All yours.
[21:35:25] Thank you!
[21:35:43] (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240384 (https://phabricator.wikimedia.org/T417828) (owner: SomeRandomDeveloper)
[21:36:00] ops-eqiad, DC-Ops: Install cable managment for E15 and E16 - https://phabricator.wikimedia.org/T417832 (VRiley-WMF) NEW
[21:36:46] ops-eqiad, DC-Ops: Install cable managment for E15 and E16 - https://phabricator.wikimedia.org/T417832#11629888 (VRiley-WMF) Open→Resolved Added cable managment into the side of the rack for E15 and E16.
[21:37:10] RESOLVED: BFDdown: BFD session down between cr2-esams and fe80::5e5e:ab00:d3d:83c7 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:38:08] Thanks Kemayo !
[21:39:31] ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q3:rack/setup/install frdb1008 - https://phabricator.wikimedia.org/T414374#11629902 (VRiley-WMF) Attempted to run the script for the fundraising rack to add the cables into netbox and running into an issue (due to the move) I have notified @ayounsi ab...
[21:43:00] (PS3) RLazarus: deployment_server: Add services_dir arg to charlie helper functions [puppet] - https://gerrit.wikimedia.org/r/1239428 (https://phabricator.wikimedia.org/T417456)
[21:43:00] (PS3) RLazarus: deployment_server: Add a ServiceInventory convenience class to charlie [puppet] - https://gerrit.wikimedia.org/r/1239429 (https://phabricator.wikimedia.org/T417456)
[21:43:00] (PS3) RLazarus: deployment_server: Make SKIP_DIRS relative to the repo root in charlie [puppet] - https://gerrit.wikimedia.org/r/1239430 (https://phabricator.wikimedia.org/T417456)
[21:43:00] (PS3) RLazarus: deployment_server: Add --services_dir flag to charlie [puppet] - https://gerrit.wikimedia.org/r/1239431 (https://phabricator.wikimedia.org/T417456)
[21:43:20] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:44:41] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:48:29] (CR) RLazarus: [C:+2] deployment_server: Add services_dir arg to charlie helper functions [puppet] - https://gerrit.wikimedia.org/r/1239428 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[21:48:38] (CR) RLazarus: [C:+2] deployment_server: Add a ServiceInventory convenience class to charlie [puppet] - https://gerrit.wikimedia.org/r/1239429 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[21:48:45] (Merged) jenkins-bot: Revert "Support CSS/JS thumbnail sizing in Parsoid" [core] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1240384 (https://phabricator.wikimedia.org/T417828) (owner: SomeRandomDeveloper)
[21:48:55] (CR) RLazarus: [C:+2] deployment_server: Make SKIP_DIRS relative to the repo root in charlie (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1239430 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[21:49:02] (CR) RLazarus: [C:+2] "Thanks for all the reviews!" [puppet] - https://gerrit.wikimedia.org/r/1239431 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[21:49:17] !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1240384|Revert "Support CSS/JS thumbnail sizing in Parsoid" (T417828)]]
[21:49:21] T417828: Some images become disproportionately big in 1.46.0-wmf.16 - https://phabricator.wikimedia.org/T417828
[21:51:22] !log ladsgroup@deploy2002 ladsgroup, somerandomdeveloper: Backport for [[gerrit:1240384|Revert "Support CSS/JS thumbnail sizing in Parsoid" (T417828)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:51:31] (PS1) Krinkle: grafana: change swift query from 5m to $__rate_interval [puppet] - https://gerrit.wikimedia.org/r/1240390 (https://phabricator.wikimedia.org/T371102)
[21:52:13] fixed for me when using k8s-mwdebug and hard-refreshing the page
[21:52:14] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:52:27] SomeRandomDev: Thanks!
[21:52:30] !log ladsgroup@deploy2002 ladsgroup, somerandomdeveloper: Continuing with sync
[21:52:32] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:56:30] !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240384|Revert "Support CSS/JS thumbnail sizing in Parsoid" (T417828)]] (duration: 07m 14s)
[21:56:35] T417828: Some images become disproportionately big in 1.46.0-wmf.16 - https://phabricator.wikimedia.org/T417828
[21:57:03] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:58:33] PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100%
[21:58:42] (PS1) BCornwall: haproxy: symlink /etc/acmechief to cert tmpfs [puppet] - https://gerrit.wikimedia.org/r/1240395
[21:59:59] RECOVERY - Host titan1002 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[22:00:05] Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2200)
[22:00:37] (CR) BCornwall: [V:+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8069/co" [puppet] - https://gerrit.wikimedia.org/r/1240395 (owner: BCornwall)
[22:00:57] FIRING: [3x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:05:57] RESOLVED: [3x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:09:27] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:13:43] ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11630040 (Jhancock.wm)
[22:14:43] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2008-dev
[22:14:53] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2008-dev
[22:15:00] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
[22:15:09] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudgw2004-dev
[22:15:53] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:16:08] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:17:08] jouncebot: nowandnext
[22:17:09] For the next 0 hour(s) and 42 minute(s): Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2200)
[22:17:09] In 0 hour(s) and 42 minute(s): Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2300)
[22:26:21] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:27:37] (CR) Ryan Kemper: [C:+2] hadoop: decom an-worker1132 [puppet] - https://gerrit.wikimedia.org/r/1238826 (https://phabricator.wikimedia.org/T414948) (owner: Ryan Kemper)
[22:29:21] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:30:25] (PS2) BCornwall: haproxy: symlink /etc/acmechief to cert tmpfs [puppet] - https://gerrit.wikimedia.org/r/1240395
[22:34:01] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
[22:34:11] ops-codfw, SRE, DC-Ops: Q3:rack/setup/install cloudcephosd2008-dev - https://phabricator.wikimedia.org/T416396#11630076 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
[22:34:41] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:35:16] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cloudgw2004-dev.codfw.wmnet with OS trixie
[22:35:24] ops-codfw, SRE, DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11630077 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudgw2004-dev.codfw.wmnet with OS trixie
[22:38:20] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:44:41] FIRING: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:50:49] (PS2) Muehlenhoff: Obsolete airflow-analytics-product-admins POSIX group [puppet] - https://gerrit.wikimedia.org/r/1240336
[22:51:05] (CR) Muehlenhoff: "I'll go through all of them, but one at a time broken down by teams. Some of newer Airflow instances never had an underlying POSIX group t" [puppet] - https://gerrit.wikimedia.org/r/1240336 (owner: Muehlenhoff)
[22:51:33] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
[22:52:08] (PS8) Ahmon Dancy: scap3 install provider: Set env vars for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767)
[22:53:20] RESOLVED: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:54:56] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
[22:57:29] (PS1) RLazarus: deployment_server: Really read namespaces in charlie --dangerously_fast [puppet] - https://gerrit.wikimedia.org/r/1240401 (https://phabricator.wikimedia.org/T417456)
[23:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260218T2300)
[23:05:49] (CR) Thcipriani: [C:+1] scap3 install provider: Set env vars for deploy_user when running scap [puppet] - https://gerrit.wikimedia.org/r/1240372 (https://phabricator.wikimedia.org/T417767) (owner: Ahmon Dancy)
[23:08:58] (CR) RLazarus: "This closes out that TODO from the last patch. It felt Rube-Goldberg-y, but now that I actually look at it, I think it's fine -- intereste" [puppet] - https://gerrit.wikimedia.org/r/1240401 (https://phabricator.wikimedia.org/T417456) (owner: RLazarus)
[23:13:39] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:14:45] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:14:47] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2004-dev.codfw.wmnet with OS trixie
[23:14:59] ops-codfw, SRE, DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11630224 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudgw2004-dev.codfw.wmnet with OS trixie completed: - cloudgw2004-dev (**PASS**)...
[23:18:45] ops-codfw, SRE, DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11630243 (Jhancock.wm) Open→Resolved
[23:19:01] ops-codfw, SRE, DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11630249 (Jhancock.wm) @Andrew this one is complete
[23:19:46] ops-eqiad, SRE, DC-Ops: Q3:rack/setup/install cloudcephosd1053 - https://phabricator.wikimedia.org/T416394#11630255 (Jhancock.wm)
[23:20:24] jhancock@cumin2002 reimage (PID 2802271) is awaiting input
[23:20:36] ops-codfw, SRE, DC-Ops: Q3:rack/setup/install cloudcephosd2008-dev - https://phabricator.wikimedia.org/T416396#11630258 (Jhancock.wm)
[23:22:00] ops-codfw, SRE, DC-Ops: Q3:rack/setup/install cloudcephosd2008-dev - https://phabricator.wikimedia.org/T416396#11630263 (Jhancock.wm) @Andrew this one got mad about something in the preseed. Failed to run preseeded command │ ││ Execution of preseeded command "wget -O /tmp/part...
[23:22:26] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
[23:22:33] ops-codfw, SRE, DC-Ops: Q3:rack/setup/install cloudcephosd2008-dev - https://phabricator.wikimedia.org/T416396#11630268 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm executed with errors: - cloudcepho...
[23:38:57] SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11630302 (Krinkle) >>! In T414805#11623347, @Joe wrote: >>>! In T414805#11612457, @Krinkle wrote: >> >> @MatthewVernon Based on di...
[23:39:04] FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[23:42:41] !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[23:46:19] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
[23:46:25] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
[23:46:25] !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:47:59] (PS1) Bvibber: Disable ReaderExperiments on commonswiki due to ParserMigration dep [mediawiki-config] - https://gerrit.wikimedia.org/r/1240406
[23:50:11] !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[23:52:00] (PS2) Bvibber: Disable ReaderExperiments on beta commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1240406
[23:53:42] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
[23:53:48] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
[23:53:48] !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:54:02] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2004
[23:54:44] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2004
[23:54:46] (CR) Reedy: [C:+2] Disable ReaderExperiments on beta commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1240406 (owner: Bvibber)
[23:54:52] \o/
[23:55:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:55:29] SRE, MediaWiki-extensions-OAuth, MediaWiki-Platform-Team: Editing using OAuth 2 doesn’t work - https://phabricator.wikimedia.org/T417839#11630340 (matmarex) >>! In T417839#11630325, @matmarex wrote: > I can't even figure out where does the string "Jwt issuer is not configured" come from. I am pretty...
[23:55:33] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[23:55:42] hi SRE. i think this is an Envoy configuration problem: https://phabricator.wikimedia.org/T417839
[23:55:55] (Merged) jenkins-bot: Disable ReaderExperiments on beta commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1240406 (owner: Bvibber)
[23:55:55] (it's an unbreak-now task)
[23:57:21] sukhe: brett: hi, can you help, or find out who could help? i think this unbreak-now task is an Envoy configuration problem: https://phabricator.wikimedia.org/T417839 (see last comment)
[23:59:18] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[23:59:40] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED