[00:10:54] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 634.55 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[00:13:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1222 (T371742)', diff saved to https://phabricator.wikimedia.org/P73196 and previous config saved to /var/cache/conftool/dbconfig/20250205-001309-ladsgroup.json
[00:13:13] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[00:18:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1257:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1257 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:23:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1257:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1257 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:30:09] <wikibugs>	 (Abandoned) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1116889 (owner: TrainBranchBot)
[00:32:14] <wikibugs>	 (PS1) Scott French: mw-api-int: serve 5% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117263 (https://phabricator.wikimedia.org/T383845)
[00:32:15] <wikibugs>	 (PS1) Scott French: mw-(api-ext|web): scale next to 25% of main [deployment-charts] - https://gerrit.wikimedia.org/r/1117271 (https://phabricator.wikimedia.org/T383845)
[00:32:17] <wikibugs>	 (PS1) Scott French: Enroll 50% of client sessions in PHP 8.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1117276 (https://phabricator.wikimedia.org/T383845)
[00:38:26] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1117289
[00:38:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1117289 (owner: TrainBranchBot)
[00:49:50] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1117289 (owner: TrainBranchBot)
[01:00:02] <wikibugs>	 (Abandoned) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1116890 (owner: TrainBranchBot)
[01:08:23] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1117295
[01:08:23] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1117295 (owner: TrainBranchBot)
[01:28:29] <zabe>	 !log zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Dyolf77 /tmp/uploads # T385642
[01:28:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:32] <stashbot>	 T385642: Server side upload for Dyolf77 - https://phabricator.wikimedia.org/T385642
[01:28:49] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1117295 (owner: TrainBranchBot)
[01:40:16] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:40:58] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:43:20] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:44:10] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 09 Apr 2025 10:34:17 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:46:28] <icinga-wm>	 PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/d754a861a3040321cd1fff53ffa354ec3fc7cde0db1a7c0f0e9b908053449561/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[01:47:20] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:47:50] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53515 bytes in 1.140 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:48:06] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.196 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:48:10] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 09 Apr 2025 10:34:17 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:48:30] <mutante>	 the releases1003 "disk space" alert isn't actually a disk space issue. it's a permissions problem on the docker overlay filesystem, which has had to be fixed many times before
[01:49:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244 (T384592)', diff saved to https://phabricator.wikimedia.org/P73197 and previous config saved to /var/cache/conftool/dbconfig/20250205-014907-marostegui.json
[01:49:11] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[02:04:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P73198 and previous config saved to /var/cache/conftool/dbconfig/20250205-020414-marostegui.json
[02:06:28] <icinga-wm>	 RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:06:52] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 48.37 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:08:06] <icinga-wm>	 PROBLEM - SSH on bast4005 is CRITICAL: Server answer: Exceeded MaxStartups https://wikitech.wikimedia.org/wiki/SSH/monitoring
[02:09:06] <icinga-wm>	 RECOVERY - SSH on bast4005 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[02:19:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P73199 and previous config saved to /var/cache/conftool/dbconfig/20250205-021921-marostegui.json
[02:34:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244 (T384592)', diff saved to https://phabricator.wikimedia.org/P73200 and previous config saved to /var/cache/conftool/dbconfig/20250205-023428-marostegui.json
[02:34:32] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[02:34:44] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:09:55] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:07:16] <icinga-wm>	 PROBLEM - Disk space on ms-be2051 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sde1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ms-be2051&var-datasource=codfw+prometheus/ops
[04:11:15] <jinxer-wm>	 FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-int - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[04:16:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-int - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[04:40:43] <jinxer-wm>	 FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[05:12:41] <wikibugs>	 (CR) Ecarg: [C:+2] wikifunctions: Upgrade function-orchestrator RAM request, given heap issues [deployment-charts] - https://gerrit.wikimedia.org/r/1117243 (https://phabricator.wikimedia.org/T384883) (owner: Jforrester)
[05:12:59] <wikibugs>	 (CR) Ecarg: [C:+2] "thank youu" [deployment-charts] - https://gerrit.wikimedia.org/r/1117243 (https://phabricator.wikimedia.org/T384883) (owner: Jforrester)
[05:13:53] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Upgrade function-orchestrator RAM request, given heap issues [deployment-charts] - https://gerrit.wikimedia.org/r/1117243 (https://phabricator.wikimedia.org/T384883) (owner: Jforrester)
[05:17:04] <wikibugs>	 (PS2) KartikMistry: Update cxserver to 2025-02-03-095815-production [deployment-charts] - https://gerrit.wikimedia.org/r/1116912 (https://phabricator.wikimedia.org/T377966)
[05:17:42] <kart_>	 Updating cxserver in a few minutes..
[05:18:34] <wikibugs>	 (CR) KartikMistry: [C:+2] Update cxserver to 2025-02-03-095815-production [deployment-charts] - https://gerrit.wikimedia.org/r/1116912 (https://phabricator.wikimedia.org/T377966) (owner: KartikMistry)
[05:19:41] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2025-02-03-095815-production [deployment-charts] - https://gerrit.wikimedia.org/r/1116912 (https://phabricator.wikimedia.org/T377966) (owner: KartikMistry)
[05:19:58] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:20:48] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53513 bytes in 0.077 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:31:06] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:31:34] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[05:41:58] <wikibugs>	 (PS2) KartikMistry: Make MT limit more strict by 10 Percentage Point in Bhojpuri Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1117113 (https://phabricator.wikimedia.org/T383789)
[05:42:15] <wikibugs>	 (CR) KartikMistry: Make MT limit more strict by 10 Percentage Point in Bhojpuri Wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117113 (https://phabricator.wikimedia.org/T383789) (owner: KartikMistry)
[05:43:42] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[05:44:13] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[05:49:23] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[05:49:57] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[05:50:28] <kart_>	 !log Updated cxserver to 2025-02-03-095815-production (T377966, T385185)
[05:50:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:32] <stashbot>	 T377966: Make cxserver Logstash logs readable and reliable - https://phabricator.wikimedia.org/T377966
[05:50:33] <stashbot>	 T385185: Post-creation work for kncwiki - https://phabricator.wikimedia.org/T385185
[05:57:23] <wikibugs>	 (PS1) Kevin Bazira: ml-services: update article-country prod config [deployment-charts] - https://gerrit.wikimedia.org/r/1117318 (https://phabricator.wikimedia.org/T382295)
[06:15:43] <jinxer-wm>	 RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:23:43] <jinxer-wm>	 FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:39:05] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[06:39:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1247 (T384592)', diff saved to https://phabricator.wikimedia.org/P73201 and previous config saved to /var/cache/conftool/dbconfig/20250205-063911-marostegui.json
[06:39:15] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[06:40:42] <wikibugs>	 (PS2) Anzx: kywiki: create draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593)
[06:50:03] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 05 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T0700)
[07:09:55] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:47:14] <icinga-wm>	 PROBLEM - BFD status on cr2-magru is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:47:18] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:48:14] <icinga-wm>	 RECOVERY - BFD status on cr2-magru is OK: UP: 3 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:48:18] <icinga-wm>	 RECOVERY - BFD status on cr2-eqdfw is OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:48:28] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1013 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86343.06 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[07:49:58] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on an-redacteddb1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86217.20 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[07:49:58] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s7 on an-redacteddb1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 85268.20 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[07:50:48] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on an-redacteddb1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 75018.19 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[07:55:48] <wikibugs>	 (CR) Elukey: [C:+1] external_cloud_vendors: Added OpenAI IP lists [puppet] - https://gerrit.wikimedia.org/r/1117245 (https://phabricator.wikimedia.org/T385616) (owner: Fabfur)
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: gettimeofday() says it's time for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T0800)
[08:00:05] <jouncebot>	 Jhs and anzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:22] <Jhs>	 hiya, i'm here
[08:03:26] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti105[34].eqiad.wmnet - https://phabricator.wikimedia.org/T381576#10524037 (elukey) I double checked via Redfish and `P1_AIOMAOC_AG_i2LAN1OPROM` is set to `PXE` (as expected).
[08:09:19] <wikibugs>	 (CR) Fabfur: [C:+2] external_cloud_vendors: Added OpenAI IP lists [puppet] - https://gerrit.wikimedia.org/r/1117245 (https://phabricator.wikimedia.org/T385616) (owner: Fabfur)
[08:09:43] <anzx>	 o/
[08:12:58] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: fix o11y wmcloud idp-test access [puppet] - https://gerrit.wikimedia.org/r/1117488
[08:14:45] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Q2:rack/setup E8/F8 new leaf switches - https://phabricator.wikimedia.org/T382017#10524070 (ayounsi) Sure, as usual for power/console/mgmt. Regarding production ports : On the ssw1 side: `use `et-0/0/7` towards e8 and `et-0/0/15` tow...
[08:17:10] <wikibugs>	 (PS1) Aklapper: Phabricator: Disable weekly 2fa mail [puppet] - https://gerrit.wikimedia.org/r/1117489 (https://phabricator.wikimedia.org/T304792)
[08:23:03] <wikibugs>	 (CR) Elukey: [C:+1] hieradata: fix o11y wmcloud idp-test access [puppet] - https://gerrit.wikimedia.org/r/1117488 (owner: Filippo Giunchedi)
[08:27:03] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] hieradata: fix o11y wmcloud idp-test access [puppet] - https://gerrit.wikimedia.org/r/1117488 (owner: Filippo Giunchedi)
[08:35:34] <wikibugs>	 (PS1) Elukey: knative: backport https://github.com/knative/serving/pull/13402 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1117492 (https://phabricator.wikimedia.org/T369493)
[08:41:50] <wikibugs>	 (CR) Jelto: [C:+2] "I'll merge this and monitor the metrics for query-main and query service gui closely." [puppet] - https://gerrit.wikimedia.org/r/1115766 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[08:55:07] <wikibugs>	 (PS6) Fabfur: hiera: enable json logging for benthos [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392)
[08:55:38] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[08:59:55] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:00:05] <jouncebot>	 jnuche and jeena: gettimeofday() says it's time for MediaWiki train - Utc-0+Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T0900)
[09:00:30] <jnuche>	 hi there, rolling out the train in a few minutes
[09:01:58] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 05 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[09:02:58] <wikibugs>	 (PS7) Fabfur: hiera: enable json logging for benthos [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392)
[09:03:00] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 05 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[09:03:32] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[09:03:46] <wikibugs>	 (PS1) TrainBranchBot: group1 to 1.44.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/1117494 (https://phabricator.wikimedia.org/T382366)
[09:03:47] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group1 to 1.44.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/1117494 (https://phabricator.wikimedia.org/T382366) (owner: TrainBranchBot)
[09:04:34] <wikibugs>	 (Merged) jenkins-bot: group1 to 1.44.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/1117494 (https://phabricator.wikimedia.org/T382366) (owner: TrainBranchBot)
[09:09:28] <wikibugs>	 (PS8) Fabfur: hiera: enable json logging for benthos [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392)
[09:12:01] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[09:13:54] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.15  refs T382366
[09:13:57] <stashbot>	 T382366: 1.44.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T382366
[09:19:44] <wikibugs>	 (PS9) Fabfur: hiera: enable json logging for benthos [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392)
[09:21:11] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[09:22:36] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[09:31:32] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1155-1156].eqiad.wmnet with reason: Rebuild tables
[09:31:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1156 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73202 and previous config saved to /var/cache/conftool/dbconfig/20250205-093152-marostegui.json
[09:32:04] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.mysql.upgrade for db1156.eqiad.wmnet
[09:32:28] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: Rebuild tables
[09:32:55] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuild tables
[09:34:10] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] knative: backport https://github.com/knative/serving/pull/13402 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1117492 (https://phabricator.wikimedia.org/T369493) (owner: Elukey)
[09:34:32] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] ml-services: update article-country prod config [deployment-charts] - https://gerrit.wikimedia.org/r/1117318 (https://phabricator.wikimedia.org/T382295) (owner: Kevin Bazira)
[09:38:24] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1156.eqiad.wmnet
[09:39:02] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Index rebuild
[09:41:14] <wikibugs>	 (PS1) Marostegui: installserver: Do not format db1250 [puppet] - https://gerrit.wikimedia.org/r/1117497
[09:42:44] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on clouddb1018 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 544.98 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[09:43:37] <wikibugs>	 (CR) Marostegui: [C:+2] installserver: Do not format db1250 [puppet] - https://gerrit.wikimedia.org/r/1117497 (owner: Marostegui)
[09:46:08] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: Rebuild tables
[09:46:37] <wikibugs>	 (CR) Kevin Bazira: [C:+2] "thanks for the review :)" [deployment-charts] - https://gerrit.wikimedia.org/r/1117318 (https://phabricator.wikimedia.org/T382295) (owner: Kevin Bazira)
[09:47:11] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73203 and previous config saved to /var/cache/conftool/dbconfig/20250205-094711-fceratto.json
[09:47:46] <wikibugs>	 (Merged) jenkins-bot: ml-services: update article-country prod config [deployment-charts] - https://gerrit.wikimedia.org/r/1117318 (https://phabricator.wikimedia.org/T382295) (owner: Kevin Bazira)
[09:52:25] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
[09:54:15] <wikibugs>	 (PS2) Elukey: knative: backport https://github.com/knative/serving/pull/13402 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1117492 (https://phabricator.wikimedia.org/T369493)
[09:54:49] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] knative: backport https://github.com/knative/serving/pull/13402 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1117492 (https://phabricator.wikimedia.org/T369493) (owner: Elukey)
[09:55:59] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
[09:56:43] <logmsgbot>	 !log mvernon@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ms-be2075.codfw.wmnet with reason: hardware broken awaiting vendor action
[09:56:51] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10524282 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a9517ffa-d053-4e3b-a7d0-6b08948ed456) set by mvernon@cumin2002 for 7 days, 0:00:00 on 1 host(s) and t...
[09:58:00] <logmsgbot>	 !log mvernon@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be2051.codfw.wmnet with reason: disk failed, due decom soon
[09:58:09] <wikibugs>	 SRE, SRE-swift-storage, Patch-For-Review: ms backend hardware refresh for 24/25 - https://phabricator.wikimedia.org/T382056#10524286 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=837a92b8-0555-4a3d-bd8e-9aefd3493691) set by mvernon@cumin2002 for 2 days, 0:00:00 on 1 host(s) and th...
[09:59:43] <wikibugs>	 (PS1) Jelto: trafficserver: move /querybuilder before catch-all [puppet] - https://gerrit.wikimedia.org/r/1117498 (https://phabricator.wikimedia.org/T350793)
[09:59:52] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29349 bytes in 0.322 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[10:02:17] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73205 and previous config saved to /var/cache/conftool/dbconfig/20250205-100216-fceratto.json
[10:04:53] <wikibugs>	 (PS1) Elukey: admin_ng: update Knative docker images [deployment-charts] - https://gerrit.wikimedia.org/r/1117500
[10:06:08] <wikibugs>	 (PS1) Federico Ceratto: db1251.yaml: enable monitoring [puppet] - https://gerrit.wikimedia.org/r/1117501 (https://phabricator.wikimedia.org/T385141)
[10:12:27] <wikibugs>	 (CR) Elukey: [C:+2] admin_ng: update Knative docker images [deployment-charts] - https://gerrit.wikimedia.org/r/1117500 (owner: Elukey)
[10:13:16] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] admin_ng: update Knative docker images [deployment-charts] - https://gerrit.wikimedia.org/r/1117500 (owner: Elukey)
[10:13:30] <wikibugs>	 (CR) Marostegui: [C:+1] db1251.yaml: enable monitoring [puppet] - https://gerrit.wikimedia.org/r/1117501 (https://phabricator.wikimedia.org/T385141) (owner: Federico Ceratto)
[10:14:19] <urbanecm>	 jnuche: we have a visual regression that is fairly visible (T385542). we have a fix already, OK to deploy it?
[10:14:20] <stashbot>	 T385542: [testwiki-wmf.15] Add  link inspector elements are misaligned - https://phabricator.wikimedia.org/T385542
[10:14:40] <dcausse>	 !log restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
[10:14:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:18] <jnuche>	 urbanecm: yes please, go ahead
[10:15:37] <urbanecm>	 proceeding, thanks!
[10:16:00] <wikibugs>	 (PS1) Urbanecm: fix(AddLink): button should show after link preview [extensions/GrowthExperiments] (wmf/1.44.0-wmf.15) - https://gerrit.wikimedia.org/r/1117502 (https://phabricator.wikimedia.org/T385542)
[10:16:05] <wikibugs>	 (CR) Urbanecm: [C:+2] fix(AddLink): button should show after link preview [extensions/GrowthExperiments] (wmf/1.44.0-wmf.15) - https://gerrit.wikimedia.org/r/1117502 (https://phabricator.wikimedia.org/T385542) (owner: Urbanecm)
[10:16:21] <wikibugs>	 (CR) Federico Ceratto: [C:+2] db1251.yaml: enable monitoring [puppet] - https://gerrit.wikimedia.org/r/1117501 (https://phabricator.wikimedia.org/T385141) (owner: Federico Ceratto)
[10:17:22] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73207 and previous config saved to /var/cache/conftool/dbconfig/20250205-101721-fceratto.json
[10:17:25] <logmsgbot>	 !log elukey@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[10:18:54] <logmsgbot>	 !log elukey@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[10:20:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1202, db2221 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73208 and previous config saved to /var/cache/conftool/dbconfig/20250205-102012-marostegui.json
[10:20:19] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.mysql.upgrade for db2221.codfw.wmnet
[10:20:30] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.mysql.upgrade for db1202.eqiad.wmnet
[10:20:50] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.hosts.remove-downtime for db1251.eqiad.wmnet
[10:20:51] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1251.eqiad.wmnet
[10:21:32] <wikibugs>	 (CR) FNegri: [C:+1] "Adding a +1 after merge, this makes sense to me." [puppet] - https://gerrit.wikimedia.org/r/1116868 (https://phabricator.wikimedia.org/T383370) (owner: Andrew Bogott)
[10:23:43] <jinxer-wm>	 RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[10:25:51] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2221.codfw.wmnet
[10:26:14] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 1%: Pooling in new host', diff saved to https://phabricator.wikimedia.org/P73209 and previous config saved to /var/cache/conftool/dbconfig/20250205-102614-fceratto.json
[10:26:23] <wikibugs>	 (PS21) Clément Goubert: mediawiki: Add kubernetes periodic job support [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596)
[10:26:23] <wikibugs>	 (PS11) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[10:26:44] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[10:26:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1179 (T385645)', diff saved to https://phabricator.wikimedia.org/P73210 and previous config saved to /var/cache/conftool/dbconfig/20250205-102650-marostegui.json
[10:26:54] <stashbot>	 T385645: Drop event_variant column from echo_event - https://phabricator.wikimedia.org/T385645
[10:27:05] <klausman>	 !log pushing Changeprop patch (k8s values) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1117063
[10:27:06] <wikibugs>	 (PS1) Clément Goubert: mw-cron: Add puppet-defined periodic jobs file [deployment-charts] - https://gerrit.wikimedia.org/r/1117503 (https://phabricator.wikimedia.org/T385596)
[10:27:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:11] <wikibugs>	 (CR) Effie Mouzeli: [C:+1] mw-api-int: serve 5% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117263 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[10:27:11] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1202.eqiad.wmnet
[10:27:28] <wikibugs>	 (PS2) Jelto: trafficserver: move /querybuilder before catch-all [puppet] - https://gerrit.wikimedia.org/r/1117498 (https://phabricator.wikimedia.org/T350793)
[10:27:30] <wikibugs>	 (CR) Effie Mouzeli: [C:+1] mw-(api-ext|web): scale next to 25% of main [deployment-charts] - https://gerrit.wikimedia.org/r/1117271 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[10:27:48] <wikibugs>	 (CR) Effie Mouzeli: [C:+1] Enroll 50% of client sessions in PHP 8.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1117276 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[10:27:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T385645)', diff saved to https://phabricator.wikimedia.org/P73211 and previous config saved to /var/cache/conftool/dbconfig/20250205-102758-marostegui.json
[10:29:01] <logmsgbot>	 !log klausman@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop: apply
[10:30:22] <logmsgbot>	 !log klausman@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
[10:30:32] <icinga-wm>	 RECOVERY - Disk space on ml-lab1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-lab1001&var-datasource=eqiad+prometheus/ops
[10:31:49] <wikibugs>	 (Merged) jenkins-bot: fix(AddLink): button should show after link preview [extensions/GrowthExperiments] (wmf/1.44.0-wmf.15) - https://gerrit.wikimedia.org/r/1117502 (https://phabricator.wikimedia.org/T385542) (owner: Urbanecm)
[10:32:01] <wikibugs>	 (CR) Clément Goubert: [C:+1] "Actually yeah I completely missed that none of the `Chart.yaml` were bumped, I'm actually surprised it produced a diff in prod for `kartot" [deployment-charts] - https://gerrit.wikimedia.org/r/1105972 (https://phabricator.wikimedia.org/T359497) (owner: Cwhite)
[10:32:27] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73212 and previous config saved to /var/cache/conftool/dbconfig/20250205-103227-fceratto.json
[10:33:19] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1117502|fix(AddLink): button should show after link preview (T385542)]]
[10:33:21] <stashbot>	 T385542: [testwiki-wmf.15] Add  link inspector elements are misaligned - https://phabricator.wikimedia.org/T385542
[10:33:53] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[10:33:57] <wikibugs>	 (PS1) Effie Mouzeli: shellbox: all replicas on PHP 8.1 (score) [deployment-charts] - https://gerrit.wikimedia.org/r/1117506 (https://phabricator.wikimedia.org/T377038)
[10:34:00] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[10:35:29] <wikibugs>	 (PS1) Hnowlan: trafficserver: remove restbase from hewiki mobile-html api [puppet] - https://gerrit.wikimedia.org/r/1117508 (https://phabricator.wikimedia.org/T372746)
[10:36:21] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm: Backport for [[gerrit:1117502|fix(AddLink): button should show after link preview (T385542)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[10:37:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P73213 and previous config saved to /var/cache/conftool/dbconfig/20250205-103738-marostegui.json
[10:39:00] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm: Continuing with sync
[10:39:58] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s7 on an-redacteddb1001 is OK: OK slave_sql_lag Replication lag: 0.11 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:43:25] <wikibugs>	 (PS1) Effie Mouzeli: mw-parsoid & mw-jobrunner serve 2% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117511 (https://phabricator.wikimedia.org/T383845)
[10:43:46] <marostegui>	 !log Set x1 to SBR for a bit T385645
[10:43:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:49] <stashbot>	 T385645: Drop event_variant column from echo_event - https://phabricator.wikimedia.org/T385645
[10:44:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73214 and previous config saved to /var/cache/conftool/dbconfig/20250205-104423-root.json
[10:45:34] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1117502|fix(AddLink): button should show after link preview (T385542)]] (duration: 12m 15s)
[10:45:37] <stashbot>	 T385542: [testwiki-wmf.15] Add  link inspector elements are misaligned - https://phabricator.wikimedia.org/T385542
[10:45:43] <urbanecm>	 fix should be deployed
[10:45:44] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 5%: Pooling host to 5%', diff saved to https://phabricator.wikimedia.org/P73215 and previous config saved to /var/cache/conftool/dbconfig/20250205-104543-fceratto.json
[10:45:54] <urbanecm>	 jnuche: fyi, in case you want to do something other train related
[10:47:20] <wikibugs>	 (CR) Clément Goubert: [C:+1] mw-parsoid & mw-jobrunner serve 2% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117511 (https://phabricator.wikimedia.org/T383845) (owner: Effie Mouzeli)
[10:47:33] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73216 and previous config saved to /var/cache/conftool/dbconfig/20250205-104732-fceratto.json
[10:47:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1237', diff saved to https://phabricator.wikimedia.org/P73217 and previous config saved to /var/cache/conftool/dbconfig/20250205-104742-marostegui.json
[10:47:49] <wikibugs>	 (CR) Hnowlan: [C:+1] mw-parsoid & mw-jobrunner serve 2% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117511 (https://phabricator.wikimedia.org/T383845) (owner: Effie Mouzeli)
[10:48:04] <wikibugs>	 (CR) Clément Goubert: [C:+1] shellbox: all replicas on PHP 8.1 (score) [deployment-charts] - https://gerrit.wikimedia.org/r/1117506 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[10:48:24] <wikibugs>	 (CR) Hnowlan: [C:+1] shellbox: all replicas on PHP 8.1 (score) [deployment-charts] - https://gerrit.wikimedia.org/r/1117506 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[10:49:05] <wikibugs>	 (PS2) Effie Mouzeli: shellbox-media: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1116838 (https://phabricator.wikimedia.org/T377038)
[10:49:21] <wikibugs>	 (CR) Clément Goubert: [C:+1] shellbox-media: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1116838 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[10:51:02] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.clone of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
[10:58:42] <effie>	 urbanecm: anything outstanding train wise?
[10:59:05] <jnuche>	 urbanecm: ach, thx for the headsup
[10:59:08] <urbanecm>	 effie: not from my side, but i'm not the conductor
[10:59:17] <jnuche>	 effie: nope, nothing from my side
[10:59:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73218 and previous config saved to /var/cache/conftool/dbconfig/20250205-105928-root.json
[11:00:04] <jouncebot>	 effie and swfrench-wmf: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki infrastructure (UTC mid-day). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1100).
[11:00:42] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:01:39] <wikibugs>	 (CR) Vgutierrez: "looks good but /querybuilder currently downgrades requests to http:// for 301s even if `X-Forwarded-Proto` is set to `https`, well-known U" [puppet] - https://gerrit.wikimedia.org/r/1117498 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[11:02:12] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers titan1001.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers titan1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[11:02:57] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:03:07] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2221.codfw.wmnet with reason: Index rebuild
[11:03:09] <wikibugs>	 (CR) Vgutierrez: [C:+1] trafficserver: move /querybuilder before catch-all [puppet] - https://gerrit.wikimedia.org/r/1117498 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[11:03:12] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[11:03:16] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Index rebuild
[11:04:02] <wikibugs>	 (CR) Hnowlan: "mostly lgtm, some style nice-to-haves" [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:05:03] <wikibugs>	 (CR) Jelto: [C:+2] trafficserver: move /querybuilder before catch-all [puppet] - https://gerrit.wikimedia.org/r/1117498 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[11:05:55] <wikibugs>	 (CR) Hnowlan: [C:+1] "lgtm once the puppet change is in" [deployment-charts] - https://gerrit.wikimedia.org/r/1117503 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:06:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P73219 and previous config saved to /var/cache/conftool/dbconfig/20250205-110628-marostegui.json
[11:07:31] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 7%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73220 and previous config saved to /var/cache/conftool/dbconfig/20250205-110731-fceratto.json
[11:07:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:11:00] <wikibugs>	 (PS22) Clément Goubert: mediawiki: Add kubernetes periodic job support [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596)
[11:11:00] <wikibugs>	 (PS12) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[11:11:53] <godog>	 !log bounce thanos-query on titan1002
[11:11:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:15] <wikibugs>	 (PS23) Clément Goubert: mediawiki: Add kubernetes periodic job support [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596)
[11:13:15] <wikibugs>	 (PS13) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[11:14:04] <wikibugs>	 (CR) Clément Goubert: mediawiki: Add kubernetes periodic job support (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:14:42] <wikibugs>	 (CR) Hnowlan: [C:+1] mediawiki: Add kubernetes periodic job support (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:14:43] <wikibugs>	 (CR) Clément Goubert: mediawiki: Add kubernetes periodic job support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:14:50] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:15:42] <jinxer-wm>	 RESOLVED: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:22:37] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 10%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73221 and previous config saved to /var/cache/conftool/dbconfig/20250205-112236-fceratto.json
[11:22:41] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] shellbox: all replicas on PHP 8.1 (score) [deployment-charts] - https://gerrit.wikimedia.org/r/1117506 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:24:12] <wikibugs>	 (Merged) jenkins-bot: shellbox: all replicas on PHP 8.1 (score) [deployment-charts] - https://gerrit.wikimedia.org/r/1117506 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:24:34] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] mw-parsoid & mw-jobrunner serve 2% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117511 (https://phabricator.wikimedia.org/T383845) (owner: Effie Mouzeli)
[11:25:09] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox: apply
[11:25:50] <wikibugs>	 (CR) Clément Goubert: [C:+2] mediawiki: Add kubernetes periodic job support [puppet] - https://gerrit.wikimedia.org/r/1117222 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:25:54] <wikibugs>	 (Merged) jenkins-bot: mw-parsoid & mw-jobrunner serve 2% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117511 (https://phabricator.wikimedia.org/T383845) (owner: Effie Mouzeli)
[11:25:59] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[11:27:16] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[11:27:30] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[11:28:03] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[11:28:26] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[11:31:00] <logmsgbot>	 !log fnegri@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
[11:31:04] <logmsgbot>	 !log fnegri@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=31
[11:31:10] <logmsgbot>	 !log fnegri@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
[11:31:44] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on db1155 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 7085.06 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:31:54] <logmsgbot>	 !log fnegri@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Rebooting clouddb1017 T384946
[11:32:26] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Rebuild tables
[11:32:44] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on clouddb1014 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 7144.82 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:33:18] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuild tables
[11:33:39] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: Rebuild tables
[11:33:51] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb1018.eqiad.wmnet with reason: Rebuild tables
[11:34:06] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
[11:34:14] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb1014.eqiad.wmnet with reason: Rebuild tables
[11:34:27] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
[11:36:05] <wikibugs>	 (PS2) Clément Goubert: mw-cron: Add puppet-defined periodic jobs file [deployment-charts] - https://gerrit.wikimedia.org/r/1117503 (https://phabricator.wikimedia.org/T385596)
[11:37:42] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 15%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73222 and previous config saved to /var/cache/conftool/dbconfig/20250205-113741-fceratto.json
[11:37:56] <wikibugs>	 (PS14) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[11:37:56] <wikibugs>	 (PS1) Clément Goubert: kubernetes_periodic_job: Fix title in job template [puppet] - https://gerrit.wikimedia.org/r/1117516 (https://phabricator.wikimedia.org/T385596)
[11:38:06] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
[11:39:38] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117516 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:39:45] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Extend sre.network.configure-switch-interfaces cookbook to add sflow and qos config - https://phabricator.wikimedia.org/T379549#10524854 (cmooney) Open→Resolved
[11:39:56] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[11:41:29] <logmsgbot>	 !log fnegri@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1017.eqiad.wmnet
[11:41:42] <icinga-wm>	 PROBLEM - mysqld processes on clouddb1017 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[11:41:46] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s1 on clouddb1017 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:41:46] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s3 on clouddb1017 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:41:46] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s1 on clouddb1017 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:41:46] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s3 on clouddb1017 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:41:48] <icinga-wm>	 PROBLEM - MariaDB read only s3 on clouddb1017 is CRITICAL: Could not connect to localhost:3313 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:41:48] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s3 on clouddb1017 is CRITICAL: Could not connect to localhost:3313 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:41:48] <icinga-wm>	 PROBLEM - MariaDB read only s1 on clouddb1017 is CRITICAL: Could not connect to localhost:3311 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:41:48] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s1 on clouddb1017 is CRITICAL: Could not connect to localhost:3311 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:41:59] <marostegui>	 dhinus: that's you right ^?
[11:42:25] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
[11:42:37] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
[11:42:45] <dhinus>	 marostegui: yep
[11:42:53] <dhinus>	 I thought I silenced it though
[11:44:05] <wikibugs>	 (CR) Clément Goubert: [C:+2] kubernetes_periodic_job: Fix title in job template [puppet] - https://gerrit.wikimedia.org/r/1117516 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[11:45:44] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s3 on clouddb1017 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:46:44] <wikibugs>	 (PS1) Marostegui: x1: Change format to STATEMENT [puppet] - https://gerrit.wikimedia.org/r/1117517 (https://phabricator.wikimedia.org/T385645)
[11:46:46] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1017 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:47:17] <wikibugs>	 (CR) Marostegui: [C:+2] x1: Change format to STATEMENT [puppet] - https://gerrit.wikimedia.org/r/1117517 (https://phabricator.wikimedia.org/T385645) (owner: Marostegui)
[11:48:28] <marostegui>	 dhinus: Lately my impression is that lots of downtimes get lost
[11:49:19] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1117518
[11:49:29] <dhinus>	 marostegui: there is definitely something odd: https://sal.toolforge.org/log/4zvh1ZQBffdvpiTrhsuR
[11:49:43] <dhinus>	 it should be downtimed for 1 hour
[11:49:59] <dhinus>	 "Created silence ID 266a2b12-14a7-4728-ad13-d4309d19dfd6"
[11:50:31] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#10524944 (cmooney) Open→Resolved >>! In T381175#10520327, @ayounsi wrote: > For (1) we can have the `sre.ganeti.addnode` cookbook call...
[11:50:56] <wikibugs>	 (PS1) Filippo Giunchedi: statograph: update mw edit rate to use thanos [puppet] - https://gerrit.wikimedia.org/r/1117519 (https://phabricator.wikimedia.org/T383963)
[11:51:04] <marostegui>	 dhinus: Yeah, I've had the same for a few days
[11:51:07] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1117520
[11:52:42] <icinga-wm>	 RECOVERY - mysqld processes on clouddb1017 is OK: PROCS OK: 2 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[11:52:44] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s1 on clouddb1017 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:52:44] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s1 on clouddb1017 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:52:46] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s3 on clouddb1017 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:52:46] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s3 on clouddb1017 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:52:47] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 20%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73223 and previous config saved to /var/cache/conftool/dbconfig/20250205-115247-fceratto.json
[11:52:50] <icinga-wm>	 RECOVERY - MariaDB read only s3 on clouddb1017 is OK: Version 10.6.20-MariaDB, Uptime 59s, read_only: True, event_scheduler: False, 380.72 QPS, connection latency: 0.015028s, query latency: 0.000355s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:52:50] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s1 on clouddb1017 is OK: Version 10.6.20-MariaDB, Uptime 56s, read_only: True, event_scheduler: False, 940.20 QPS, connection latency: 0.023746s, query latency: 0.000396s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:52:50] <icinga-wm>	 RECOVERY - MariaDB read only s1 on clouddb1017 is OK: Version 10.6.20-MariaDB, Uptime 56s, read_only: True, event_scheduler: False, 976.20 QPS, connection latency: 0.020599s, query latency: 0.000470s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:52:50] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s3 on clouddb1017 is OK: Version 10.6.20-MariaDB, Uptime 59s, read_only: True, event_scheduler: False, 380.17 QPS, connection latency: 0.014857s, query latency: 0.000339s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:52:55] <wikibugs>	 (CR) Fabfur: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[11:52:57] <wikibugs>	 (PS1) Ladsgroup: Set categorylinks to write both everywhere except commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1117521 (https://phabricator.wikimedia.org/T385164)
[11:53:44] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s3 on clouddb1017 is OK: OK slave_sql_lag Replication lag: 0.34 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:53:51] <wikibugs>	 (PS15) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[11:54:06] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[11:54:10] <marostegui>	 dhinus: can I start rebuilding indexes on clouddb1017?
[11:54:38] <dhinus>	 I've just restarted mariadb there, and restarted replication, so I think yes!
[11:54:46] <marostegui>	 thank you!
[11:55:46] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1017 is OK: OK slave_sql_lag Replication lag: 0.27 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:56:16] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Rebuild tables
[11:58:12] <wikibugs>	 (PS1) Clément Goubert: mediawiki::periodic_job: Fix kubernetes conditional [puppet] - https://gerrit.wikimedia.org/r/1117522 (https://phabricator.wikimedia.org/T385596)
[11:58:14] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117522 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[12:00:05] <jouncebot>	 mvolz: Your horoscope predicts another Services – Citoid / Zotero deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1200).
[12:00:34] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/services/mw-cron: apply
[12:00:38] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
[12:00:47] <logmsgbot>	 !log fnegri@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
[12:00:54] <logmsgbot>	 !log fnegri@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
[12:01:19] <wikibugs>	 (CR) Mvolz: [C:+2] citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1117520 (owner: PipelineBot)
[12:01:51] <wikibugs>	 (CR) Clément Goubert: [C:+2] mediawiki::periodic_job: Fix kubernetes conditional [puppet] - https://gerrit.wikimedia.org/r/1117522 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[12:02:29] <wikibugs>	 (Merged) jenkins-bot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1117520 (owner: PipelineBot)
[12:03:02] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:03:41] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:04:04] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:04:04] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:05:00] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[12:06:30] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[12:07:16] <wikibugs>	 SRE, serviceops, Wikidata, Wikidata Integration in Wikimedia projects, Wikimedia-Site-requests: Increase entityAccessLimit for WikibaseClient wikis - https://phabricator.wikimedia.org/T384455#10525034 (Marostegui) I am tagging #serviceops here to see if this is something they can help with.
[12:07:25] <wikibugs>	 (PS1) Hnowlan: mediawiki: miscellaneous bits of jobrunner cleanup [puppet] - https://gerrit.wikimedia.org/r/1117525 (https://phabricator.wikimedia.org/T354791)
[12:07:53] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 25%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73224 and previous config saved to /var/cache/conftool/dbconfig/20250205-120752-fceratto.json
[12:08:38] <wikibugs>	 (CR) Clément Goubert: [C:+2] mw-cron: Add puppet-defined periodic jobs file [deployment-charts] - https://gerrit.wikimedia.org/r/1117503 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[12:09:26] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[12:09:44] <wikibugs>	 (Merged) jenkins-bot: mw-cron: Add puppet-defined periodic jobs file [deployment-charts] - https://gerrit.wikimedia.org/r/1117503 (https://phabricator.wikimedia.org/T385596) (owner: Clément Goubert)
[12:09:57] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[12:10:05] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "I should have done this earlier good shout." [puppet] - https://gerrit.wikimedia.org/r/1117154 (https://phabricator.wikimedia.org/T382518) (owner: Ayounsi)
[12:12:10] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-cron: apply
[12:12:12] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
[12:12:22] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/services/mw-cron: apply
[12:12:28] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
[12:14:48] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on an-redacteddb1001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:15:26] <wikibugs>	 (Abandoned) Mvolz: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1117518 (owner: PipelineBot)
[12:15:30] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:17:25] <wikibugs>	 (CR) Ayounsi: [C:+2] Remove eqiad and eqsin ripe atlas from monitoring [puppet] - https://gerrit.wikimedia.org/r/1117154 (https://phabricator.wikimedia.org/T382518) (owner: Ayounsi)
[12:17:42] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:20:01] <wikibugs>	 (CR) Nikerabbit: [C:+1] Make MT limit more strict by 10 Percentage Point in Bhojpuri Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1117113 (https://phabricator.wikimedia.org/T383789) (owner: KartikMistry)
[12:22:58] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 30%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73225 and previous config saved to /var/cache/conftool/dbconfig/20250205-122257-fceratto.json
[12:24:10] <wikibugs>	 (CR) Clément Goubert: [C:+1] mediawiki: miscellaneous bits of jobrunner cleanup [puppet] - https://gerrit.wikimedia.org/r/1117525 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[12:38:03] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 35%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73226 and previous config saved to /var/cache/conftool/dbconfig/20250205-123803-fceratto.json
[12:41:57] <wikibugs>	 (PS1) Arnaudb: rt: removing email configurations [puppet] - https://gerrit.wikimedia.org/r/1117528 (https://phabricator.wikimedia.org/T384595)
[12:42:10] <wikibugs>	 (PS1) Arnaudb: rt: removing informations about moscovium [puppet] - https://gerrit.wikimedia.org/r/1117529 (https://phabricator.wikimedia.org/T384595)
[12:42:46] <wikibugs>	 (Abandoned) D3r1ck01: SUL3: Allow temp users to authenticate (login/signup) via the API [extensions/CentralAuth] (wmf/1.44.0-wmf.14) - https://gerrit.wikimedia.org/r/1115106 (https://phabricator.wikimedia.org/T384523) (owner: D3r1ck01)
[12:42:50] <wikibugs>	 (CR) Arnaudb: "this should be merged after the decommission cookbook is run on moscovium" [puppet] - https://gerrit.wikimedia.org/r/1117529 (https://phabricator.wikimedia.org/T384595) (owner: Arnaudb)
[12:46:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
[12:48:04] <wikibugs>	 (PS1) Marostegui: db1179: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1117532 (https://phabricator.wikimedia.org/T385645)
[12:48:44] <wikibugs>	 (CR) Marostegui: [C:+2] db1179: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1117532 (https://phabricator.wikimedia.org/T385645) (owner: Marostegui)
[12:50:30] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.clone of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
[12:51:32] <wikibugs>	 (PS1) Marostegui: db1237: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1117533 (https://phabricator.wikimedia.org/T385645)
[12:52:07] <wikibugs>	 (CR) Marostegui: [C:+2] db1237: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1117533 (https://phabricator.wikimedia.org/T385645) (owner: Marostegui)
[12:52:47] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 on clouddb1014 is OK: OK slave_sql_lag Replication lag: 0.24 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:52:47] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 on clouddb1018 is OK: OK slave_sql_lag Replication lag: 0.25 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:52:47] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 on db1155 is OK: OK slave_sql_lag Replication lag: 0.43 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:52:59] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 on an-redacteddb1001 is OK: OK slave_sql_lag Replication lag: 0.48 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:53:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73227 and previous config saved to /var/cache/conftool/dbconfig/20250205-125259-root.json
[12:53:09] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 50%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73228 and previous config saved to /var/cache/conftool/dbconfig/20250205-125308-fceratto.json
[12:54:00] <wikibugs>	 SRE, Infrastructure-Foundations, netops, observability, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10525215 (cmooney) >>! In T384731#10516013, @ayounsi wrote: > An alternative (or short term solution until the ab...
[12:56:10] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Observability-Alerting: Migrate port utilisation alert from LibreNMS to alertmanager - https://phabricator.wikimedia.org/T384052#10525220 (cmooney) >>! In T384052#10516521, @ayounsi wrote: > I'm wondering if we could re-write the "instance" in Prometheus t...
[12:57:08] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service ml-staging-ctrl2002:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2002:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:59:32] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service ml-staging-ctrl2002:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2002:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:59:55] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:08:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73230 and previous config saved to /var/cache/conftool/dbconfig/20250205-130804-root.json
[13:08:14] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 75%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73231 and previous config saved to /var/cache/conftool/dbconfig/20250205-130813-fceratto.json
[13:09:16] <wikibugs>	 (PS1) MVernon: swift: remove drained codfw nodes from the rings [puppet] - https://gerrit.wikimedia.org/r/1117535 (https://phabricator.wikimedia.org/T382056)
[13:09:18] <wikibugs>	 (PS1) MVernon: swift: remove ms-be205[1-6] from profile::swift::storagehosts [puppet] - https://gerrit.wikimedia.org/r/1117536 (https://phabricator.wikimedia.org/T382056)
[13:10:51] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Modifications to CR BGP policy for eqiad cloud-private IPv6 aggregate (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/1112268 (https://phabricator.wikimedia.org/T37947) (owner: Cathal Mooney)
[13:10:54] <wikibugs>	 (CR) Marostegui: [C:+1] swift: remove ms-be205[1-6] from profile::swift::storagehosts [puppet] - https://gerrit.wikimedia.org/r/1117536 (https://phabricator.wikimedia.org/T382056) (owner: MVernon)
[13:11:07] <wikibugs>	 (CR) Marostegui: [C:+1] swift: remove drained codfw nodes from the rings [puppet] - https://gerrit.wikimedia.org/r/1117535 (https://phabricator.wikimedia.org/T382056) (owner: MVernon)
[13:11:15] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:11:37] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:13:07] <wikibugs>	 (PS1) Cathal Mooney: Add semicolon to end of prefix in cloud6 prefix list [homer/public] - https://gerrit.wikimedia.org/r/1117538 (https://phabricator.wikimedia.org/T37947)
[13:13:29] <wikibugs>	 (PS16) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[13:14:53] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Use FIDO2 ssh keys for production access - https://phabricator.wikimedia.org/T385229#10525301 (cmooney) >>! In T385229#10520528, @taavi wrote: > FWIW, this is possible as of today, my account for example is exclusively using them for Bullseye+ hosts....
[13:14:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T384592)', diff saved to https://phabricator.wikimedia.org/P73232 and previous config saved to /var/cache/conftool/dbconfig/20250205-131456-marostegui.json
[13:15:00] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[13:15:43] <wikibugs>	 (CR) CI reject: [V:-1] mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[13:17:29] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1013 is OK: OK slave_sql_lag Replication lag: 0.18 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:17:49] <wikibugs>	 (PS17) Clément Goubert: mediawiki: Migrate one dry-run job to kubernetes [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963)
[13:18:12] <wikibugs>	 (CR) MVernon: [C:+2] swift: remove drained codfw nodes from the rings [puppet] - https://gerrit.wikimedia.org/r/1117535 (https://phabricator.wikimedia.org/T382056) (owner: MVernon)
[13:21:16] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[13:21:29] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: add per user breakdown to mw edit rates [puppet] - https://gerrit.wikimedia.org/r/1117539 (https://phabricator.wikimedia.org/T383963)
[13:22:19] <wikibugs>	 (CR) Ladsgroup: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1116846 (https://phabricator.wikimedia.org/T383902) (owner: Jcrespo)
[13:23:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73233 and previous config saved to /var/cache/conftool/dbconfig/20250205-132309-root.json
[13:23:19] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'db1251 (re)pooling @ 100%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73234 and previous config saved to /var/cache/conftool/dbconfig/20250205-132319-fceratto.json
[13:24:23] <logmsgbot>	 !log klausman@deploy2002 helmfile [staging] START helmfile.d/services/changeprop: apply
[13:24:27] <logmsgbot>	 !log klausman@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop: apply
[13:24:47] <logmsgbot>	 !log klausman@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop: apply
[13:25:25] <logmsgbot>	 !log klausman@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop: apply
[13:27:54] <wikibugs>	 (CR) Clément Goubert: [C:+1] prometheus: add per user breakdown to mw edit rates [puppet] - https://gerrit.wikimedia.org/r/1117539 (https://phabricator.wikimedia.org/T383963) (owner: Filippo Giunchedi)
[13:29:59] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] prometheus: add per user breakdown to mw edit rates [puppet] - https://gerrit.wikimedia.org/r/1117539 (https://phabricator.wikimedia.org/T383963) (owner: Filippo Giunchedi)
[13:30:03] <wikibugs>	 (PS2) Filippo Giunchedi: prometheus: add per user breakdown to mw edit rates [puppet] - https://gerrit.wikimedia.org/r/1117539 (https://phabricator.wikimedia.org/T383963)
[13:30:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P73235 and previous config saved to /var/cache/conftool/dbconfig/20250205-133003-marostegui.json
[13:30:15] <wikibugs>	 (CR) Filippo Giunchedi: [V:+2 C:+2] prometheus: add per user breakdown to mw edit rates [puppet] - https://gerrit.wikimedia.org/r/1117539 (https://phabricator.wikimedia.org/T383963) (owner: Filippo Giunchedi)
[13:31:29] <wikibugs>	 (CR) Ayounsi: [C:+1] Add semicolon to end of prefix in cloud6 prefix list [homer/public] - https://gerrit.wikimedia.org/r/1117538 (https://phabricator.wikimedia.org/T37947) (owner: Cathal Mooney)
[13:32:08] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-jobrunner_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:32:39] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10525385 (Andrew) This is flapping like crazy, I ack'd it before bed last night but have another 15 alert messages this morning.
[13:34:27] <wikibugs>	 (CR) Elukey: Add interative.ask_yesno (2 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/1115767 (owner: JMeybohm)
[13:34:37] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-jobrunner_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-jobrunner_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:37:08] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:38:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73236 and previous config saved to /var/cache/conftool/dbconfig/20250205-133815-root.json
[13:39:32] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:45:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P73237 and previous config saved to /var/cache/conftool/dbconfig/20250205-134510-marostegui.json
[13:45:48] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): Add sourceswiki to $wgImportSources for all Wikisources (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[13:47:50] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "Yesterday’s change for a draft namespace, I4ebe6927ae, also added the namespace to `wmgExemptFromUserRobotsControlExtra` – would that make" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[13:48:55] <wikibugs>	 (CR) Kamila Součková: "I have questions!" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[13:49:37] <jynus>	 !log deploy removal of old hosts for the m1 dbbackups backup user T383871
[13:49:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:40] <stashbot>	 T383871: decommission dbprov1001, dbprov1002 - https://phabricator.wikimedia.org/T383871
[13:49:44] <wikibugs>	 (CR) Kamila Součková: "[marking as not resolved]" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[13:52:54] <wikibugs>	 (PS5) Jcrespo: dbbackups: Remove last references to dbprov[12]00[12] [puppet] - https://gerrit.wikimedia.org/r/1116846 (https://phabricator.wikimedia.org/T383902)
[13:53:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73238 and previous config saved to /var/cache/conftool/dbconfig/20250205-135320-root.json
[13:57:16] <wikibugs>	 (CR) Anzx: "i think it should be done if community request, good to it as it is for now" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[13:57:39] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Add semicolon to end of prefix in cloud6 prefix list [homer/public] - https://gerrit.wikimedia.org/r/1117538 (https://phabricator.wikimedia.org/T37947) (owner: Cathal Mooney)
[13:58:16] <wikibugs>	 (Merged) jenkins-bot: Add semicolon to end of prefix in cloud6 prefix list [homer/public] - https://gerrit.wikimedia.org/r/1117538 (https://phabricator.wikimedia.org/T37947) (owner: Cathal Mooney)
[14:00:04] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1400).
[14:00:05] <jouncebot>	 Jhs and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T384592)', diff saved to https://phabricator.wikimedia.org/P73240 and previous config saved to /var/cache/conftool/dbconfig/20250205-140017-marostegui.json
[14:00:20] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[14:00:32] <Lucas_WMDE>	 o/
[14:00:32] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[14:00:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1248 (T384592)', diff saved to https://phabricator.wikimedia.org/P73241 and previous config saved to /var/cache/conftool/dbconfig/20250205-140039-marostegui.json
[14:02:31] <anzx>	 o/
[14:02:54] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "Well, the community did request “web indexing: not indexed” according to the task." [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[14:04:12] <wikibugs>	 (PS2) Jon Harald Søby: Add sourceswiki to $wgImportSources for all Wikisources [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591)
[14:05:01] <anzx>	 Lucas_WMDE: i will update my patch
[14:05:46] <Lucas_WMDE>	 ok
[14:05:51] * Lucas_WMDE looks at Jhs PS2
[14:05:51] <wikibugs>	 (CR) Jon Harald Søby: Add sourceswiki to $wgImportSources for all Wikisources (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[14:07:24] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] Add sourceswiki to $wgImportSources for all Wikisources (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[14:08:33] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops, Infrastructure-Foundations: Perform fake disk swap on ms-be2088 as test - https://phabricator.wikimedia.org/T384003#10525488 (elukey) @Neobeta61 Hi! I just followed up on the email threads, I didn't get any response so far, I tried to summarize my un...
[14:08:49] <Lucas_WMDE>	 Jhs: are you ready for the deployment window?
[14:08:56] <Jhs>	 Lucas_WMDE, yup
[14:09:00] <Lucas_WMDE>	 ok, then let’s start
[14:09:11] <wikibugs>	 (CR) Jon Harald Søby: Add sourceswiki to $wgImportSources for all Wikisources (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[14:09:47] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[14:10:33] <wikibugs>	 (Merged) jenkins-bot: Add sourceswiki to $wgImportSources for all Wikisources [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[14:11:00] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1117204|Add sourceswiki to $wgImportSources for all Wikisources (T385591)]]
[14:11:02] <stashbot>	 T385591: $wgImportSources for Wikisources should include the multilingual Wikisource by default - https://phabricator.wikimedia.org/T385591
[14:11:04] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): Add sourceswiki to $wgImportSources for all Wikisources (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117204 (https://phabricator.wikimedia.org/T385591) (owner: Jon Harald Søby)
[14:11:12] <wikibugs>	 (PS3) Anzx: kywiki: create draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593)
[14:11:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73243 and previous config saved to /var/cache/conftool/dbconfig/20250205-141131-root.json
[14:12:56] <wikibugs>	 (CR) Anzx: "Done" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[14:14:22] <Lucas_WMDE>	 one of the checks failed, https://movementroles.wikimedia.org/wiki/Main_Page gave 503
[14:14:27] <Lucas_WMDE>	 retrying
[14:14:59] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 jhsoby, lucaswerkmeister-wmde: Backport for [[gerrit:1117204|Add sourceswiki to $wgImportSources for all Wikisources (T385591)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:15:08] <Lucas_WMDE>	 now it worked 🤷
[14:15:10] <wikibugs>	 (PS4) Giuseppe Lavagetto: mediawiki: introduce feature flags [deployment-charts] - https://gerrit.wikimedia.org/r/1116639
[14:15:10] <wikibugs>	 (PS2) Giuseppe Lavagetto: Add the networkpolicy feature flag [deployment-charts] - https://gerrit.wikimedia.org/r/1117225
[14:15:11] <wikibugs>	 (PS1) Giuseppe Lavagetto: mediawiki-common: introduce chart [deployment-charts] - https://gerrit.wikimedia.org/r/1117547
[14:15:11] <wikibugs>	 (PS1) Giuseppe Lavagetto: Add a mediawiki-common release to mw-script [deployment-charts] - https://gerrit.wikimedia.org/r/1117548
[14:15:41] <Amir1>	 why everything had a private wiki
[14:16:08] <James_F>	 It was 2008, it was the cool thing.
[14:16:13] <Jhs>	 Amir1, cause when you were in a private wiki, you were *the shit*
[14:16:32] <Amir1>	 :D
[14:16:33] <Jhs>	 i remember getting access to internalwiki back in 2006, and i was indeed the shit
[14:16:41] <Amir1>	 lol
[14:17:50] <Lucas_WMDE>	 can’t find the error in logstash
[14:17:58] <Lucas_WMDE>	 do private wikis not send errors to logstash?
[14:18:11] <Amir1>	 they should, I've seen some from officewiki
[14:18:16] <Lucas_WMDE>	 (it’s probably safe to ignore but I’d like to know what’s going on)
[14:18:20] <Lucas_WMDE>	 Jhs: please test, by the way ^^
[14:18:41] <Jhs>	 Lucas_WMDE, already on it, works like a charm so far
[14:18:49] <Lucas_WMDE>	 \o/
[14:20:35] <Lucas_WMDE>	 would be nice if httpbb dropped the test output somewhere in /tmp, I think
[14:20:37] <Lucas_WMDE>	 “Body: expected to contain 'Movement Roles', got '<!DOCTYPE html>\n<html lang="en">\n<meta charset="ut'... (1953 characters total).”
[14:20:50] <Lucas_WMDE>	 there’s probably a semi-useful error message or request ID somewhere in that “...” :S
[14:23:14] <wikibugs>	 (PS2) Cathal Mooney: Add FIDO2-based ssh keys for user cmooney [puppet] - https://gerrit.wikimedia.org/r/1115495 (https://phabricator.wikimedia.org/T385229)
[14:26:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73244 and previous config saved to /var/cache/conftool/dbconfig/20250205-142636-root.json
[14:29:18] <wikibugs>	 (CR) Jcrespo: [C:+2] dbbackups: Remove last references to dbprov[12]00[12] [puppet] - https://gerrit.wikimedia.org/r/1116846 (https://phabricator.wikimedia.org/T383902) (owner: Jcrespo)
[14:29:32] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-jobrunner_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:31:04] <Lucas_WMDE>	 noooo stashbot
[14:31:14] <Lucas_WMDE>	 Jhs: just to confirm, are you still testing? 😅
[14:31:29] <Jhs>	 Lucas_WMDE, sorry, i'm done
[14:31:35] <Lucas_WMDE>	 ok ^^
[14:31:40] <Lucas_WMDE>	 but lemme resurrect stashbot before I continue
[14:31:52] <Jhs>	 i moved on to continue writing the JavaScript I needed that change for 😁
[14:31:54] <Lucas_WMDE>	 nothing of note in its kubectl log, as usual…
[14:31:56] <Lucas_WMDE>	 :D
[14:33:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 jhsoby, lucaswerkmeister-wmde: Continuing with sync
[14:34:37] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-jobrunner_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-jobrunner_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:36:37] <wikibugs>	 (PS1) CDanis: webrequest-live: new X-Analytics Authorization subkey [puppet] - https://gerrit.wikimedia.org/r/1117550
[14:37:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:38:34] <wikibugs>	 (CR) Elukey: [C:+1] spicerack: extend run_cookbook() accessor [software/spicerack] - https://gerrit.wikimedia.org/r/1116818 (owner: Volans)
[14:39:16] <logmsgbot>	 !log klausman@deploy2002 helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
[14:39:46] <logmsgbot>	 !log klausman@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
[14:40:01] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1117204|Add sourceswiki to $wgImportSources for all Wikisources (T385591)]] (duration: 29m 00s)
[14:40:03] <stashbot>	 T385591: $wgImportSources for Wikisources should include the multilingual Wikisource by default - https://phabricator.wikimedia.org/T385591
[14:40:17] <Lucas_WMDE>	 alright, let’s continue with anzx :)
[14:40:31] <anzx>	 ok
[14:40:45] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[14:40:50] <wikibugs>	 (PS5) Andrew Bogott: sysctl: Introduce base::sysctl::inotify helper [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:41:06] <wikibugs>	 (CR) Elukey: [C:+1] webrequest-live: new X-Analytics Authorization subkey [puppet] - https://gerrit.wikimedia.org/r/1117550 (owner: CDanis)
[14:41:28] <wikibugs>	 (Merged) jenkins-bot: kywiki: create draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1117321 (https://phabricator.wikimedia.org/T385593) (owner: Anzx)
[14:41:39] <wikibugs>	 (CR) Andrew Bogott: "done" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:41:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73245 and previous config saved to /var/cache/conftool/dbconfig/20250205-144141-root.json
[14:41:43] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:41:57] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1117321|kywiki: create draft namespace (T385593)]]
[14:42:00] <stashbot>	 T385593: New namespace ("Макала долбоору") for the Kyrgyz Wikipedia - https://phabricator.wikimedia.org/T385593
[14:43:15] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
[14:43:50] <jynus>	 !log deploy new grants to analytics_meta T385565
[14:43:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:53] <stashbot>	 T385565: Some analytics_meta databases are not being backed up - https://phabricator.wikimedia.org/T385565
[14:44:04] <wikibugs>	 (CR) Elukey: "Can you run pcc again to confirm :) ?" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:44:59] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 anzx, lucaswerkmeister-wmde: Backport for [[gerrit:1117321|kywiki: create draft namespace (T385593)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:45:14] <anzx>	 Lucas_WMDE: checking
[14:45:51] <wikibugs>	 (PS6) Andrew Bogott: sysctl: Introduce base::sysctl::inotify helper [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:45:59] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:46:01] <Lucas_WMDE>	 thanks!
[14:46:06] <anzx>	 Lucas_WMDE: looks good 
[14:46:21] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 anzx, lucaswerkmeister-wmde: Continuing with sync
[14:46:22] <Lucas_WMDE>	 \o/
[14:46:23] <wikibugs>	 (CR) CDanis: [C:+2] webrequest-live: new X-Analytics Authorization subkey [puppet] - https://gerrit.wikimedia.org/r/1117550 (owner: CDanis)
[14:46:24] <logmsgbot>	 !log cmooney@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cr2-magru with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002
[14:46:51] <wikibugs>	 (CR) Elukey: sysctl: Introduce base::sysctl::inotify helper (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:48:56] <wikibugs>	 (CR) Fabfur: [C:+1] webrequest-live: new X-Analytics Authorization subkey [puppet] - https://gerrit.wikimedia.org/r/1117550 (owner: CDanis)
[14:49:11] <wikibugs>	 ops-magru, Infrastructure-Foundations, netops: Jan 2025 - Magru core router connectivity blips - https://phabricator.wikimedia.org/T384774#10525590 (cmooney) I've added BFD to this particular session now.  Not that it will fix things but it should give us more datapoints for the (likely) case with Ju...
[14:49:14] <wikibugs>	 (PS7) Andrew Bogott: sysctl: Introduce base::sysctl::inotify helper [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:52:20] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:52:52] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1117321|kywiki: create draft namespace (T385593)]] (duration: 10m 54s)
[14:52:55] <stashbot>	 T385593: New namespace ("Макала долбоору") for the Kyrgyz Wikipedia - https://phabricator.wikimedia.org/T385593
[14:53:33] <Lucas_WMDE>	 jouncebot: nowandnext
[14:53:33] <jouncebot>	 For the next 0 hour(s) and 6 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1400)
[14:53:33] <jouncebot>	 In 0 hour(s) and 6 minute(s): Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1500)
[14:53:36] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[14:53:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:38] <wikibugs>	 (CR) Elukey: sysctl: Introduce base::sysctl::inotify helper (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[14:53:55] <Lucas_WMDE>	 I’d still love to deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1116812 at some point if I can get a +1, but no need to overrun into the wikifunctions window for that ^^
[14:54:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2221 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73246 and previous config saved to /var/cache/conftool/dbconfig/20250205-145434-root.json
[14:54:55] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:56:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73247 and previous config saved to /var/cache/conftool/dbconfig/20250205-145647-root.json
[15:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1500)
[15:00:08] <wikibugs>	 (PS1) Jforrester: wikifunctions: Upgrade orchestrator from 2025-01-28-144249 to 2025-02-03-215824 [deployment-charts] - https://gerrit.wikimedia.org/r/1117551 (https://phabricator.wikimedia.org/T379977)
[15:00:15] <wikibugs>	 (PS1) Jforrester: wikifunctions: Upgrade evaluators from 2025-01-29-140344 to 2025-01-30-011236 [deployment-charts] - https://gerrit.wikimedia.org/r/1117552
[15:01:24] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:01:54] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:04:27] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:05:13] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:05:24] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:06:15] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:06:21] <wikibugs>	 (CR) Elukey: [C:+1] sysctl: Introduce base::sysctl::inotify helper [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[15:06:39] <wikibugs>	 (CR) Elukey: sysctl: Introduce base::sysctl::inotify helper [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[15:06:41] <wikibugs>	 (PS2) Giuseppe Lavagetto: Add a mediawiki-common release to mw-script [deployment-charts] - https://gerrit.wikimedia.org/r/1117548
[15:06:51] <wikibugs>	 (CR) Elukey: "need to check one thing first :)" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[15:07:14] <wikibugs>	 (CR) Jforrester: [C:+2] wikifunctions: Upgrade orchestrator from 2025-01-28-144249 to 2025-02-03-215824 [deployment-charts] - https://gerrit.wikimedia.org/r/1117551 (https://phabricator.wikimedia.org/T379977) (owner: Jforrester)
[15:07:57] <wikibugs>	 (PS1) Ayounsi: ganeti.addnode: run ImportPuppetDB script after node addition [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175)
[15:08:35] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Upgrade orchestrator from 2025-01-28-144249 to 2025-02-03-215824 [deployment-charts] - https://gerrit.wikimedia.org/r/1117551 (https://phabricator.wikimedia.org/T379977) (owner: Jforrester)
[15:08:48] <wikibugs>	 (CR) Elukey: "+1 for the kubernetes part but https://puppet-compiler.wmflabs.org/output/1116888/2921/prometheus1005.eqiad.wmnet/index.html shows a chang" [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[15:09:00] <wikibugs>	 (CR) CI reject: [V:-1] Add a mediawiki-common release to mw-script [deployment-charts] - https://gerrit.wikimedia.org/r/1117548 (owner: Giuseppe Lavagetto)
[15:09:22] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:09:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2221 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73248 and previous config saved to /var/cache/conftool/dbconfig/20250205-150940-root.json
[15:09:43] <swfrench-wmf>	 !log reprepro included conftool 5.0.1-1 - T383324
[15:09:51] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:11:10] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:11:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:11:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73249 and previous config saved to /var/cache/conftool/dbconfig/20250205-151152-root.json
[15:11:57] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:12:00] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:12:48] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:13:33] <wikibugs>	 (CR) Jforrester: [C:+2] wikifunctions: Upgrade evaluators from 2025-01-29-140344 to 2025-01-30-011236 [deployment-charts] - https://gerrit.wikimedia.org/r/1117552 (owner: Jforrester)
[15:14:28] <wikibugs>	 (CR) CI reject: [V:-1] ganeti.addnode: run ImportPuppetDB script after node addition [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175) (owner: Ayounsi)
[15:14:51] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Upgrade evaluators from 2025-01-29-140344 to 2025-01-30-011236 [deployment-charts] - https://gerrit.wikimedia.org/r/1117552 (owner: Jforrester)
[15:15:15] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:15:56] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:18:15] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:19:01] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:19:05] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:20:06] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:21:23] <wikibugs>	 (PS2) Ayounsi: ganeti.addnode: run ImportPuppetDB script after node addition [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175)
[15:24:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2221 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73250 and previous config saved to /var/cache/conftool/dbconfig/20250205-152445-root.json
[15:27:17] <Dreamy_Jazz>	 jouncebot: nowandnext
[15:27:17] <jouncebot>	 For the next 0 hour(s) and 32 minute(s): Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1500)
[15:27:17] <jouncebot>	 In 2 hour(s) and 32 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1800)
[15:28:33] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10525803 (Jhancock.wm) I reset what they asked me to inside the server yesterday. When you get a chance, @MatthewVernon can you see if that fixed the errors? Thanks
[15:29:57] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10525807 (MatthewVernon) Hi, I'm afraid the answer is "no": ` Feb  5 15:23:01 ms-be2075 kernel: [71988.739632] sd 0:0:25:0: Power-on or device reset occurred Feb  5 15:23:02 ms...
[15:33:47] <wikibugs>	 (CR) Ayounsi: "Moritz, is there a host I can test that change with ?" [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175) (owner: Ayounsi)
[15:39:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2221 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73251 and previous config saved to /var/cache/conftool/dbconfig/20250205-153951-root.json
[15:41:02] <wikibugs>	 ops-magru, Infrastructure-Foundations, netops: Jan 2025 - Magru core router connectivity blips - https://phabricator.wikimedia.org/T384774#10525894 (ayounsi) Good idea regarding BFD. From https://supportportal.juniper.net/s/article/Observing-BGP-IO-ERROR-CLOSE-SESSION-error-logs-when-BGP-protocolgoes...
[15:51:19] <swfrench-wmf>	 !log finished deploying conftool 5.0.1-1 - T383324
[15:51:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:22] <stashbot>	 T383324: Prevent too many parsercache sections from being depooled - https://phabricator.wikimedia.org/T383324
[15:51:31] <wikibugs>	 (PS1) Hnowlan: mobileapps: use correct port for eventgate [deployment-charts] - https://gerrit.wikimedia.org/r/1117560 (https://phabricator.wikimedia.org/T385718)
[15:54:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2221 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73252 and previous config saved to /var/cache/conftool/dbconfig/20250205-155456-root.json
[15:56:25] <wikibugs>	 (CR) Jgiannelos: [C:+1] mobileapps: use correct port for eventgate [deployment-charts] - https://gerrit.wikimedia.org/r/1117560 (https://phabricator.wikimedia.org/T385718) (owner: Hnowlan)
[15:59:52] <logmsgbot>	 !log klausman@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-main: sync
[16:00:14] <logmsgbot>	 !log klausman@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
[16:03:28] <wikibugs>	 (CR) Hnowlan: [C:+2] mobileapps: use correct port for eventgate [deployment-charts] - https://gerrit.wikimedia.org/r/1117560 (https://phabricator.wikimedia.org/T385718) (owner: Hnowlan)
[16:04:35] <wikibugs>	 (Merged) jenkins-bot: mobileapps: use correct port for eventgate [deployment-charts] - https://gerrit.wikimedia.org/r/1117560 (https://phabricator.wikimedia.org/T385718) (owner: Hnowlan)
[16:05:39] <hnowlan>	 jouncebot: nowandnext
[16:05:39] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 54 minute(s)
[16:05:39] <jouncebot>	 In 1 hour(s) and 54 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1800)
[16:07:27] <wikibugs>	 (PS3) Giuseppe Lavagetto: Add the networkpolicy feature flag [deployment-charts] - https://gerrit.wikimedia.org/r/1117225
[16:07:27] <wikibugs>	 (PS2) Giuseppe Lavagetto: mediawiki-common: introduce chart [deployment-charts] - https://gerrit.wikimedia.org/r/1117547
[16:07:27] <wikibugs>	 (PS3) Giuseppe Lavagetto: Add a mediawiki-common release to mw-script [deployment-charts] - https://gerrit.wikimedia.org/r/1117548
[16:10:09] <wikibugs>	 (CR) CI reject: [V:-1] Add a mediawiki-common release to mw-script [deployment-charts] - https://gerrit.wikimedia.org/r/1117548 (owner: Giuseppe Lavagetto)
[16:10:40] <wikibugs>	 SRE, SRE Observability (FY2024/2025-Q3): etcd: adapt etcd-backup.py for etcd 3.4 - https://phabricator.wikimedia.org/T385727 (herron) NEW
[16:11:31] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 48113784 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:12:31] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 32848 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:15:54] <wikibugs>	 (CR) Clément Goubert: "1. It will get applied on an `helmfile apply`, we can work around the potential gap on a case-by-case basis." [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[16:17:39] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[16:19:55] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:21:22] <wikibugs>	 SRE, SRE Observability (FY2024/2025-Q3): etcd: adapt etcd-backup.py for etcd 3.4 - https://phabricator.wikimedia.org/T385727#10526002 (herron) setting environment `ETCDCTL_API=2` for the backup script may be an option as well
[16:22:13] <wikibugs>	 (CR) CDanis: [C:+1] hiera: enable json logging for benthos [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[16:22:24] <wikibugs>	 (CR) Clément Goubert: "To be completely accurate, it will get applied on a subsequent `puppet` run, then an `helmfile apply`" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[16:22:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:23:58] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: enable json logging for benthos [puppet] - https://gerrit.wikimedia.org/r/1116763 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[16:30:29] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] START helmfile.d/services/mobileapps: apply
[16:31:00] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
[16:32:42] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/mobileapps: apply
[16:33:08] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
[16:38:47] <wikibugs>	 (PS1) Elukey: conftool-data: add wikikube workers to kartotherian-k8s-ssl [puppet] - https://gerrit.wikimedia.org/r/1117568 (https://phabricator.wikimedia.org/T216826)
[16:39:13] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10526052 (Jhancock.wm) big sigh. can i get another smartctl report to send to dell?
[16:41:35] <wikibugs>	 (CR) Hnowlan: [C:+1] conftool-data: add wikikube workers to kartotherian-k8s-ssl [puppet] - https://gerrit.wikimedia.org/r/1117568 (https://phabricator.wikimedia.org/T216826) (owner: Elukey)
[16:45:05] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10526061 (MatthewVernon) OK; same commands as before: ` mvernon@ms-be2075:~$ sudo smartctl --scan /dev/sda -d scsi # /dev/sda, SCSI device /dev/sdb -d scsi # /dev/sdb, SCSI dev...
[16:50:30] <wikibugs>	 (CR) Dzahn: "removing from preseed.yaml and hierdata/requesttracker can go right away, but please keep the host in site.pp for now so we can apply the " [puppet] - https://gerrit.wikimedia.org/r/1117529 (https://phabricator.wikimedia.org/T384595) (owner: Arnaudb)
[16:50:46] <wikibugs>	 (CR) Dzahn: [C:+1] rt: removing email configurations [puppet] - https://gerrit.wikimedia.org/r/1117528 (https://phabricator.wikimedia.org/T384595) (owner: Arnaudb)
[16:53:39] <wikibugs>	 (CR) JHathaway: [C:+1] rt: removing email configurations [puppet] - https://gerrit.wikimedia.org/r/1117528 (https://phabricator.wikimedia.org/T384595) (owner: Arnaudb)
[16:58:35] <wikibugs>	 (PS1) Dzahn: installserver: remove moscovium [puppet] - https://gerrit.wikimedia.org/r/1117572 (https://phabricator.wikimedia.org/T384595)
[17:01:21] <wikibugs>	 (CR) Dzahn: [C:+2] "a part of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1117529 but keeping site/role alive for now" [puppet] - https://gerrit.wikimedia.org/r/1117572 (https://phabricator.wikimedia.org/T384595) (owner: Dzahn)
[17:02:12] <wikibugs>	 (CR) Dzahn: [C:+2] rt: removing email configurations [puppet] - https://gerrit.wikimedia.org/r/1117528 (https://phabricator.wikimedia.org/T384595) (owner: Arnaudb)
[17:02:54] <wikibugs>	 (CR) Dzahn: [C:+2] Phabricator: Disable weekly 2fa mail [puppet] - https://gerrit.wikimedia.org/r/1117489 (https://phabricator.wikimedia.org/T304792) (owner: Aklapper)
[17:06:45] <wikibugs>	 (CR) Andrew Bogott: sysctl: Introduce base::sysctl::inotify helper (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1116888 (https://phabricator.wikimedia.org/T385530) (owner: BryanDavis)
[17:14:55] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:15:58] <wikibugs>	 (PS1) Kamila Součková: kube-state-metrics: export extra jobs labels [deployment-charts] - https://gerrit.wikimedia.org/r/1117574 (https://phabricator.wikimedia.org/T385709)
[17:16:50] <wikibugs>	 (CR) Dzahn: [C:+2] "Well, this will not actually remove the timer and mails. But I can do that manually." [puppet] - https://gerrit.wikimedia.org/r/1117489 (https://phabricator.wikimedia.org/T304792) (owner: Aklapper)
[17:17:02] <wikibugs>	 (CR) Kamila Součková: kube-state-metrics: export extra jobs labels (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1117574 (https://phabricator.wikimedia.org/T385709) (owner: Kamila Součková)
[17:20:02] <wikibugs>	 (CR) CI reject: [V:-1] kube-state-metrics: export extra jobs labels [deployment-charts] - https://gerrit.wikimedia.org/r/1117574 (https://phabricator.wikimedia.org/T385709) (owner: Kamila Součková)
[17:22:52] <wikibugs>	 (PS2) Kamila Součková: kube-state-metrics: export extra jobs labels [deployment-charts] - https://gerrit.wikimedia.org/r/1117574 (https://phabricator.wikimedia.org/T385709)
[17:29:40] <wikibugs>	 (CR) Dzahn: [C:+2] "on phab1004: rm /lib/systemd/system/phabricator_stats_job_mfa_check.*" [puppet] - https://gerrit.wikimedia.org/r/1117489 (https://phabricator.wikimedia.org/T304792) (owner: Aklapper)
[17:30:06] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10526218 (VRiley-WMF) So, I believe I have figured out what the issue may be. Currently there is a server (ganeti1044) right underneath it. The server is...
[17:30:13] <mutante>	 !log phab1004 - rm /lib/systemd/system/phabricator_stats_job_mfa_check.*  for gerrit:1117489 T299403 
[17:30:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:17] <wikibugs>	 (PS1) FNegri: icinga_exporter: don't route wikitech-static to wmcs [puppet] - https://gerrit.wikimedia.org/r/1117575 (https://phabricator.wikimedia.org/T376400)
[17:32:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73254 and previous config saved to /var/cache/conftool/dbconfig/20250205-173245-root.json
[17:32:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73255 and previous config saved to /var/cache/conftool/dbconfig/20250205-173257-root.json
[17:43:06] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:45:39] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:46:35] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:47:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73256 and previous config saved to /var/cache/conftool/dbconfig/20250205-174750-root.json
[17:48:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73257 and previous config saved to /var/cache/conftool/dbconfig/20250205-174802-root.json
[17:48:41] <wikibugs>	 (CR) Kamila Součková: [C:+1] "Ack, thanks! Should we document this somewhere or is it obvious enough?" [puppet] - https://gerrit.wikimedia.org/r/1117234 (https://phabricator.wikimedia.org/T377963) (owner: Clément Goubert)
[17:51:50] <swfrench-wmf>	 jouncebot: nowandnext
[17:51:50] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 8 minute(s)
[17:51:51] <jouncebot>	 In 0 hour(s) and 8 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1800)
[17:53:23] <swfrench-wmf>	 since things are quiet deployment-wise, I'm going to proceed with some prep work for the upcoming infra window
[17:53:49] <wikibugs>	 (CR) Scott French: [C:+2] mw-(api-ext|web): scale next to 25% of main [deployment-charts] - https://gerrit.wikimedia.org/r/1117271 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[17:55:12] <wikibugs>	 (Merged) jenkins-bot: mw-(api-ext|web): scale next to 25% of main [deployment-charts] - https://gerrit.wikimedia.org/r/1117271 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[17:56:55] <wikibugs>	 (CR) FNegri: [C:-1] "after discussing with Andrew in IRC, we decided it's best to keep the alerts going to WMCS until the wikitech-static redesign is complete." [puppet] - https://gerrit.wikimedia.org/r/1117575 (https://phabricator.wikimedia.org/T376400) (owner: FNegri)
[17:57:30] <wikibugs>	 (CR) Michael Große: [C:+1] Babel: Do not use a wmg variable for BabelDefaultLevel [mediawiki-config] - https://gerrit.wikimedia.org/r/1115520 (https://phabricator.wikimedia.org/T119117) (owner: Urbanecm)
[17:57:38] <wikibugs>	 (CR) Scott French: [C:+2] mw-api-int: serve 5% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117263 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[17:58:53] <wikibugs>	 (Merged) jenkins-bot: mw-api-int: serve 5% of traffic on PHP 8.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1117263 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[17:59:04] <wikibugs>	 (PS1) Dzahn: httpbb: remove tests for rt.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1117580 (https://phabricator.wikimedia.org/T384595)
[18:00:05] <jouncebot>	 swfrench-wmf: How many deployers does it take to do MediaWiki infrastructure (UTC late) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1800).
[18:00:13] <icinga-wm>	 PROBLEM - Host mr1-magru.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[18:00:23] <swfrench-wmf>	 o/
[18:00:44] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[18:01:04] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[18:01:20] <wikibugs>	 (CR) Michael Große: [C:+1] Babel: Remove config that is now in community configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/1115518 (https://phabricator.wikimedia.org/T385239) (owner: Urbanecm)
[18:01:24] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[18:01:47] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
[18:02:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73258 and previous config saved to /var/cache/conftool/dbconfig/20250205-180256-root.json
[18:03:06] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[18:03:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73259 and previous config saved to /var/cache/conftool/dbconfig/20250205-180307-root.json
[18:03:23] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[18:03:53] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[18:04:07] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[18:04:34] <wikibugs>	 (PS1) AikoChou: ml-services: increase cpu and memory for reference-quality [deployment-charts] - https://gerrit.wikimedia.org/r/1117585 (https://phabricator.wikimedia.org/T384172)
[18:04:54] <swfrench-wmf>	 !log scaled mw-api-ext and mw-web next releases to 25% of main - T383845
[18:04:55] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:04:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:57] <stashbot>	 T383845: MediaWiki on PHP 8.1 production traffic ramp-up - https://phabricator.wikimedia.org/T383845
[18:06:08] <wikibugs>	 (CR) Michael Große: Babel: Merge back into IS.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117105 (https://phabricator.wikimedia.org/T385239) (owner: Urbanecm)
[18:06:21] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[18:06:37] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[18:07:09] <wikibugs>	 (PS3) Urbanecm: Babel: Merge back into InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/1117105 (https://phabricator.wikimedia.org/T385239)
[18:07:11] <wikibugs>	 (CR) Urbanecm: Babel: Merge back into InitialiseSettings.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117105 (https://phabricator.wikimedia.org/T385239) (owner: Urbanecm)
[18:07:18] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[18:07:23] <icinga-wm>	 RECOVERY - Check unit status of etcd-backup on aux-k8s-etcd2004 is OK: OK: Status of the systemd unit etcd-backup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:07:25] <wikibugs>	 (PS4) Urbanecm: Babel: Merge back into InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/1117105 (https://phabricator.wikimedia.org/T385239)
[18:07:34] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[18:09:21] <wikibugs>	 (CR) Michael Große: [C:+1] Babel: Merge back into InitialiseSettings.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117105 (https://phabricator.wikimedia.org/T385239) (owner: Urbanecm)
[18:10:17] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[18:10:30] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[18:10:54] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[18:11:03] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[18:11:44] <swfrench-wmf>	 !log mw-api-int to ~ 5% of traffic on PHP 8.1 - T383845
[18:11:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:47] <stashbot>	 T383845: MediaWiki on PHP 8.1 production traffic ramp-up - https://phabricator.wikimedia.org/T383845
[18:14:13] <wikibugs>	 (PS1) Herron: wip [puppet] - https://gerrit.wikimedia.org/r/1117588
[18:16:47] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by swfrench@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117276 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[18:17:34] <wikibugs>	 (Merged) jenkins-bot: Enroll 50% of client sessions in PHP 8.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1117276 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[18:18:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73261 and previous config saved to /var/cache/conftool/dbconfig/20250205-181801-root.json
[18:18:02] <logmsgbot>	 !log swfrench@deploy2002 Started scap sync-world: Backport for [[gerrit:1117276|Enroll 50% of client sessions in PHP 8.1 (T383845)]]
[18:18:06] <stashbot>	 T383845: MediaWiki on PHP 8.1 production traffic ramp-up - https://phabricator.wikimedia.org/T383845
[18:18:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73262 and previous config saved to /var/cache/conftool/dbconfig/20250205-181813-root.json
[18:22:38] <wikibugs>	 (PS1) Jgiannelos: mobileapps: Fix typo in event stream name [deployment-charts] - https://gerrit.wikimedia.org/r/1117591 (https://phabricator.wikimedia.org/T385718)
[18:22:58] <logmsgbot>	 !log swfrench@deploy2002 swfrench: Backport for [[gerrit:1117276|Enroll 50% of client sessions in PHP 8.1 (T383845)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[18:23:41] <wikibugs>	 (CR) Jgiannelos: "`" [deployment-charts] - https://gerrit.wikimedia.org/r/1117591 (https://phabricator.wikimedia.org/T385718) (owner: Jgiannelos)
[18:24:22] <logmsgbot>	 !log swfrench@deploy2002 swfrench: Continuing with sync
[18:24:54] <wikibugs>	 (PS2) Herron: etcd-backup: ensure api v2 is used in newer etcd versions [puppet] - https://gerrit.wikimedia.org/r/1117588 (https://phabricator.wikimedia.org/T385727)
[18:31:00] <logmsgbot>	 !log swfrench@deploy2002 Finished scap sync-world: Backport for [[gerrit:1117276|Enroll 50% of client sessions in PHP 8.1 (T383845)]] (duration: 12m 57s)
[18:31:03] <stashbot>	 T383845: MediaWiki on PHP 8.1 production traffic ramp-up - https://phabricator.wikimedia.org/T383845
[18:31:27] <wikibugs>	 (PS1) Pppery: Enable section translation on Kanuri Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1117594 (https://phabricator.wikimedia.org/T385185)
[18:33:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73263 and previous config saved to /var/cache/conftool/dbconfig/20250205-183306-root.json
[18:33:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73264 and previous config saved to /var/cache/conftool/dbconfig/20250205-183318-root.json
[18:34:17] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-jobrunner_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-jobrunner_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:34:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-jobrunner_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:39:18] <swfrench-wmf>	 ^ `X-Powered-By header: expected to match /^PHP/7\./, got 'PHP/8.1.31'.`
[18:39:55] <swfrench-wmf>	 I had no idea that was there! I'll follow up with a patch later today
[18:40:21] <swfrench-wmf>	 (note: mw-jobrunner is serving 2% of requests on 8.1 as of earlier today)
[18:41:39] <icinga-wm>	 RECOVERY - Host mr1-magru.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 123.77 ms
[18:42:20] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops, Infrastructure-Foundations: Perform fake disk swap on ms-be2088 as test - https://phabricator.wikimedia.org/T384003#10526728 (Neobeta61) So i think I am understanding 2 issues here -  The card does not move into JBOD mode easily, and are looking for...
[18:43:14] <wikibugs>	 ops-codfw, SRE, DC-Ops, procurement: codfw:expansion: Network devices/patch panel wiring - https://phabricator.wikimedia.org/T382219#10526729 (cmooney) >>! In T382219#10521197, @Papaul wrote: > On the other hand I have a question for @ayounsi and @cmooney  > so we  haven't decided on the network...
[18:43:53] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#10526733 (cmooney) Resolved→Open
[18:43:55] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops, Infrastructure-Foundations: Perform fake disk swap on ms-be2088 as test - https://phabricator.wikimedia.org/T384003#10526734 (Neobeta61) >>! In T384003#10525488, @elukey wrote: > @Neobeta61 Hi! I just followed up on the email threads, I didn't get an...
[18:44:18] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "LGTM!" [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175) (owner: Ayounsi)
[18:47:26] <wikibugs>	 (PS3) Pmiazga: Disable new WebAuthn credentials creation [mediawiki-config] - https://gerrit.wikimedia.org/r/1113141 (https://phabricator.wikimedia.org/T378402)
[18:47:26] <wikibugs>	 (CR) Pmiazga: Disable new WebAuthn credentials creation (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1113141 (https://phabricator.wikimedia.org/T378402) (owner: Pmiazga)
[18:48:03] <icinga-wm>	 PROBLEM - Host mr1-magru.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[18:52:07] <wikibugs>	 (CR) Pmiazga: "Ok, I have better understanding, the `wmg` is used only when there is some logic in CommonSettings that depends on this variable. Here we " [mediawiki-config] - https://gerrit.wikimedia.org/r/1113141 (https://phabricator.wikimedia.org/T378402) (owner: Pmiazga)
[18:52:37] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, February 06 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1113141 (https://phabricator.wikimedia.org/T378402) (owner: Pmiazga)
[18:53:12] <swfrench-wmf>	 enrollment on 8.1 seems to have stabilized. I'll continue monitoring, but I am otherwise done with the infra window.
[18:58:23] <icinga-wm>	 PROBLEM - Check unit status of etcd-backup on aux-k8s-etcd2004 is CRITICAL: CRITICAL: Status of the systemd unit etcd-backup https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:59:58] <wikibugs>	 (PS1) Dzahn: site: remove requesttracker role from host moscovium [puppet] - https://gerrit.wikimedia.org/r/1117598 (https://phabricator.wikimedia.org/T384595)
[19:00:03] <wikibugs>	 (PS1) Jdlrobson: Deploy dark mode to anonymous users for certain projects (February 2025) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117599 (https://phabricator.wikimedia.org/T383451)
[19:00:04] <jouncebot>	 jnuche and jeena: OwO what's this, a deployment window?? MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T1900). nyaa~
[19:00:51] <wikibugs>	 (CR) Dzahn: "I did the installserver part separately and also see https://gerrit.wikimedia.org/r/c/operations/puppet/+/1117598" [puppet] - https://gerrit.wikimedia.org/r/1117529 (https://phabricator.wikimedia.org/T384595) (owner: Arnaudb)
[19:04:55] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:11:21] <wikibugs>	 (PS1) Scott French: jobrunner: remove PHP major version httpbb assertion [puppet] - https://gerrit.wikimedia.org/r/1117603 (https://phabricator.wikimedia.org/T383845)
[19:11:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:14:42] <wikibugs>	 (CR) Andrew Bogott: [C:+2] openstack: keystone: Do not create Wikitech pages for service projects [puppet] - https://gerrit.wikimedia.org/r/1114111 (owner: Majavah)
[19:21:10] <wikibugs>	 (CR) RLazarus: [C:+1] "I had questions and then your commit message answered all of them. :)" [puppet] - https://gerrit.wikimedia.org/r/1117603 (https://phabricator.wikimedia.org/T383845) (owner: Scott French)
[19:29:29] <icinga-wm>	 RECOVERY - Host mr1-magru.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 123.77 ms
[19:32:08] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-jobrunner_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:33:39] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 05 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117599 (https://phabricator.wikimedia.org/T383451) (owner: Jdlrobson)
[19:34:17] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-jobrunner_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-jobrunner_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:38:15] <wikibugs>	 (PS3) BCornwall: varnish: Enable single_backend by default [puppet] - https://gerrit.wikimedia.org/r/1115086
[19:38:15] <wikibugs>	 (PS16) BCornwall: conftool: rm ats-be services cache nodes [puppet] - https://gerrit.wikimedia.org/r/1114074
[19:43:44] <wikibugs>	 (CR) BCornwall: [V:+1] "PCC SUCCESS (NOOP 8): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4931/console" [puppet] - https://gerrit.wikimedia.org/r/1115086 (owner: BCornwall)
[19:54:38] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] ml-services: increase cpu and memory for reference-quality [deployment-charts] - https://gerrit.wikimedia.org/r/1117585 (https://phabricator.wikimedia.org/T384172) (owner: AikoChou)
[19:58:12] <wikibugs>	 (CR) Bartosz Dziewoński: [C:+1] Disable new WebAuthn credentials creation [mediawiki-config] - https://gerrit.wikimedia.org/r/1113141 (https://phabricator.wikimedia.org/T378402) (owner: Pmiazga)
[19:59:22] <wikibugs>	 (PS1) CDanis: aptrepo: conftool: add buster as well [puppet] - https://gerrit.wikimedia.org/r/1117611
[19:59:32] <wikibugs>	 (PS2) CDanis: aptrepo: conftool: add buster as well [puppet] - https://gerrit.wikimedia.org/r/1117611
[20:00:34] <wikibugs>	 (CR) Ssingh: [C:+1] "Looks good! I would recommend a manual test on beta to make sure things are working since the configuration there tends to be a bit "not s" [puppet] - https://gerrit.wikimedia.org/r/1115086 (owner: BCornwall)
[20:05:10] <wikibugs>	 (CR) Ssingh: "I think this still needs to be updated or am I missing a related CR?" [puppet] - https://gerrit.wikimedia.org/r/1114074 (owner: BCornwall)
[20:06:12] <wikibugs>	 (PS3) CDanis: aptrepo: conftool: add buster as well [puppet] - https://gerrit.wikimedia.org/r/1117611
[20:10:09] <wikibugs>	 (PS1) Jdlrobson: Speed tests: Add HTML files for touch action [mediawiki-config] - https://gerrit.wikimedia.org/r/1117612 (https://phabricator.wikimedia.org/T118509)
[20:10:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10526968 (phaultfinder)
[20:10:46] <sukhe>	 !log granting brett member,reader role on beta
[20:10:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:12:40] <wikibugs>	 (PS2) Jdlrobson: Speed tests: Add HTML files for touch action [mediawiki-config] - https://gerrit.wikimedia.org/r/1117612 (https://phabricator.wikimedia.org/T118509)
[20:12:53] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 05 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1117612 (https://phabricator.wikimedia.org/T118509) (owner: Jdlrobson)
[20:12:56] <wikibugs>	 (CR) BCornwall: [V:+1 C:+2] varnish: Enable single_backend by default [puppet] - https://gerrit.wikimedia.org/r/1115086 (owner: BCornwall)
[20:28:29] <wikibugs>	 (PS4) CDanis: aptrepo: conftool: auto import from apt-staging [puppet] - https://gerrit.wikimedia.org/r/1117611
[20:30:07] <wikibugs>	 (PS5) CDanis: aptrepo: conftool: auto import from apt-staging [puppet] - https://gerrit.wikimedia.org/r/1117611
[20:30:42] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10527003 (Jhancock.wm) they're sending a new backplane and controller card to try and fix this. i'll update when these parts have been replaced.
[20:42:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T384592)', diff saved to https://phabricator.wikimedia.org/P73266 and previous config saved to /var/cache/conftool/dbconfig/20250205-204208-marostegui.json
[20:42:12] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[20:47:37] <wikibugs>	 (CR) JHathaway: [C:+1] aptrepo: conftool: auto import from apt-staging [puppet] - https://gerrit.wikimedia.org/r/1117611 (owner: CDanis)
[20:48:27] <wikibugs>	 (CR) CDanis: [C:+2] aptrepo: conftool: auto import from apt-staging [puppet] - https://gerrit.wikimedia.org/r/1117611 (owner: CDanis)
[20:49:38] <wikibugs>	 (CR) Lucas Werkmeister: "I just realized something… I’m not gonna be able to test this on WikimediaDebug, am I? Because the `X-Wikimedia-Debug` header isn’t allowl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1116795 (https://phabricator.wikimedia.org/T322944) (owner: Lucas Werkmeister)
[20:53:21] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db2243 - https://phabricator.wikimedia.org/T382425#10527055 (Jhancock.wm)
[20:55:25] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[20:57:06] <wikibugs>	 (PS1) CDanis: aptrepo: conftool: fix distributions locations [puppet] - https://gerrit.wikimedia.org/r/1117619
[20:57:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P73267 and previous config saved to /var/cache/conftool/dbconfig/20250205-205715-marostegui.json
[20:57:53] <wikibugs>	 (CR) CDanis: [C:+2] aptrepo: conftool: fix distributions locations [puppet] - https://gerrit.wikimedia.org/r/1117619 (owner: CDanis)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor My software never has bugs. It just develops random features. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T2100).
[21:00:05] <jouncebot>	 lucaswerkmeister and Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:08] <lucaswerkmeister>	 o/
[21:01:36] <Jdlrobson>	 o/
[21:01:50] <wikibugs>	 (PS1) LorenMora: Deploy Vector 2022 skin to next set of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1117620 (https://phabricator.wikimedia.org/T384824)
[21:05:43] <lucaswerkmeister>	 any deployers around tonight? ^^
[21:12:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P73268 and previous config saved to /var/cache/conftool/dbconfig/20250205-211222-marostegui.json
[21:14:53] <cdanis>	 !log released new conftool 5.0.2 for all distros to apt.wm.o
[21:14:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:07] <lucaswerkmeister>	 …no deployers tonight?
[21:16:38] <Jdlrobson>	 lucaswerkmeister: I'm seeing if I can find someone but no luck so far
[21:21:30] <cdanis>	 !log upgraded python3-conftool-requestctl and friends on puppetservers/puppetmasters
[21:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:35] <wikibugs>	 (PS2) Jdlrobson: Deploy dark mode to anonymous users for certain projects (February 2025) [mediawiki-config] - https://gerrit.wikimedia.org/r/1117599 (https://phabricator.wikimedia.org/T383451)
[21:25:21] <jan_drewniak>	 lucaswerkmeister: hey, Jdlrobson mentioned you're looking for a deployer, I can help out. Is it just a config patch? 
[21:25:30] <lucaswerkmeister>	 mine is, I haven’t looked at Jon’s ^^
[21:25:35] <wikibugs>	 (PS1) CDanis: aptrepo: comment out some old update lines [puppet] - https://gerrit.wikimedia.org/r/1117624
[21:25:55] <lucaswerkmeister>	 ok looks like it’s all config changes
[21:26:38] <jan_drewniak>	 lucaswerkmeister: Ok I can go ahead and deploy. I'll start with yours
[21:26:43] <lucaswerkmeister>	 thanks \o/
[21:27:09] <lucaswerkmeister>	 note, I’ll try to test my change but it’s possible that it’s only testable once it’s out of WikimediaDebug (because I suspect WikimediaDebug doesn’t work with CORS requests)
[21:27:13] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1116795 (https://phabricator.wikimedia.org/T322944) (owner: Lucas Werkmeister)
[21:27:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T384592)', diff saved to https://phabricator.wikimedia.org/P73269 and previous config saved to /var/cache/conftool/dbconfig/20250205-212729-marostegui.json
[21:27:33] <stashbot>	 T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[21:27:37] <lucaswerkmeister>	 but we can at least test that nothing obvious breaks
[21:27:44] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[21:27:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1249 (T384592)', diff saved to https://phabricator.wikimedia.org/P73270 and previous config saved to /var/cache/conftool/dbconfig/20250205-212751-marostegui.json
[21:27:51] <jan_drewniak>	 yeah I noticed the comment :P 
[21:27:58] <wikibugs>	 (Merged) jenkins-bot: Enable $wgAllowAuthenticatedCrossOrigin on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1116795 (https://phabricator.wikimedia.org/T322944) (owner: Lucas Werkmeister)
[21:28:27] <logmsgbot>	 !log jdrewniak@deploy2002 Started scap sync-world: Backport for [[gerrit:1116795|Enable $wgAllowAuthenticatedCrossOrigin on testwiki (T322944)]]
[21:28:30] <stashbot>	 T322944: Allow authenticated requests via OAuth to the Action API from any origin - https://phabricator.wikimedia.org/T322944
[21:28:43] <lucaswerkmeister>	 ^^
[21:31:31] <logmsgbot>	 !log jdrewniak@deploy2002 lucaswerkmeister, jdrewniak: Backport for [[gerrit:1116795|Enable $wgAllowAuthenticatedCrossOrigin on testwiki (T322944)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:31:37] <lucaswerkmeister>	 testing…
[21:32:00] <lucaswerkmeister>	 yup, CORS request gets blocked
[21:32:59] <lucaswerkmeister>	 but if I put it in curl, it looks pretty good
[21:34:46] <lucaswerkmeister>	 jan_drewniak: everything’s working as far as I can tell
[21:35:16] <lucaswerkmeister>	 including previous functionality (normal anonymous CORS with origin=*)
[21:35:17] <jan_drewniak>	 lucaswerkmeister: ok, I'll sync now
[21:35:27] <logmsgbot>	 !log jdrewniak@deploy2002 lucaswerkmeister, jdrewniak: Continuing with sync
[21:41:55] <thcipriani>	 sorry was in a meeting, thanks for deploying jan_drewniak 
[21:42:18] <logmsgbot>	 !log jdrewniak@deploy2002 Finished scap sync-world: Backport for [[gerrit:1116795|Enable $wgAllowAuthenticatedCrossOrigin on testwiki (T322944)]] (duration: 13m 50s)
[21:42:21] <stashbot>	 T322944: Allow authenticated requests via OAuth to the Action API from any origin - https://phabricator.wikimedia.org/T322944
[21:42:25] <jan_drewniak>	 thcipriani: np!
[21:42:30] <thcipriani>	 <3
[21:42:52] <lucaswerkmeister>	 trying without wikimediadebug now ^^
[21:43:18] <lucaswerkmeister>	 WOOOOOOOOOOOH https://test.wikipedia.org/w/index.php?title=M3api-examples_guestbook&diff=prev&oldid=643174
[21:43:35] <lucaswerkmeister>	 thanks jan_drewniak \o/ \o/
[21:49:43] <jan_drewniak>	 lucaswerkmeister: np! glad to help out :) 
[21:53:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2085-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[22:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T2200)
[22:31:24] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash2029.codfw.wmnet
[22:34:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-jobrunner_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:35:17] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-jobrunner_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-jobrunner_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:36:19] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.dns.netbox
[22:40:02] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[22:40:20] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[22:40:20] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:40:21] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2029.codfw.wmnet
[22:41:16] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash2028.codfw.wmnet
[22:45:53] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.dns.netbox
[22:51:17] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2028.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[22:54:15] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2028.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[22:54:16] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:54:17] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2028.codfw.wmnet
[22:55:56] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash2027.codfw.wmnet
[23:00:04] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250205T2300)
[23:00:53] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.dns.netbox
[23:04:55] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:06:13] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[23:07:03] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[23:07:03] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:07:05] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2027.codfw.wmnet
[23:09:10] <jan_drewniak>	 Jdlrobson: alright, I'm starting with 1117599, then 1117612
[23:10:00] <Jdlrobson>	 cool
[23:10:22] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1117599 (https://phabricator.wikimedia.org/T383451) (owner: 10Jdlrobson)
[23:10:23] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1117612 (https://phabricator.wikimedia.org/T118509) (owner: 10Jdlrobson)
[23:10:47] <wikibugs>	 (03CR) 10Jdlrobson: [C:03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1117620 (https://phabricator.wikimedia.org/T384824) (owner: 10LorenMora)
[23:11:01] <wikibugs>	 (03Merged) 10jenkins-bot: Deploy dark mode to anonymous users for certain projects (February 2025) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1117599 (https://phabricator.wikimedia.org/T383451) (owner: 10Jdlrobson)
[23:11:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:11:34] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-dnsleaks: make faster by specifying deployment [puppet] - 10https://gerrit.wikimedia.org/r/1117630 (https://phabricator.wikimedia.org/T384118)
[23:11:36] <wikibugs>	 (03PS1) 10Andrew Bogott: Designate: unset legacy_domain_id [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118)
[23:12:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10527296 (10cmooney) @Jhancock.wm no rush but putting down while I remember.  Whenever you've a chance it'd be good to do a revi...
[23:13:47] <wikibugs>	 (03CR) 10CI reject: [V:04-1] wmcs-dnsleaks: make faster by specifying deployment [puppet] - 10https://gerrit.wikimedia.org/r/1117630 (https://phabricator.wikimedia.org/T384118) (owner: 10Andrew Bogott)
[23:13:51] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118) (owner: 10Andrew Bogott)
[23:15:36] <wikibugs>	 (03PS2) 10Andrew Bogott: wmcs-dnsleaks: make faster by specifying deployment [puppet] - 10https://gerrit.wikimedia.org/r/1117630 (https://phabricator.wikimedia.org/T384118)
[23:15:36] <wikibugs>	 (03PS2) 10Andrew Bogott: Designate: unset legacy_domain_id [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118)
[23:16:59] <wikibugs>	 (03CR) 10Jdrewniak: [C:03+2] Speed tests: Add HTML files for touch action [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1117612 (https://phabricator.wikimedia.org/T118509) (owner: 10Jdlrobson)
[23:17:39] <jan_drewniak>	 Jdlrobson: just waiting for this one to merge https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1117612?tab=checks
[23:18:42] <jan_drewniak>	 Oh, is there a relation chain on those patches? I'll do the dark-mode one first.
[23:18:48] <wikibugs>	 (03PS3) 10Andrew Bogott: Designate: unset legacy_domain_id [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118)
[23:19:01] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118) (owner: 10Andrew Bogott)
[23:19:03] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash2026.codfw.wmnet
[23:19:12] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Designate: unset legacy_domain_id [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118) (owner: 10Andrew Bogott)
[23:19:30] <logmsgbot>	 !log jdrewniak@deploy2002 Started scap sync-world: Backport for [[gerrit:1117599|Deploy dark mode to anonymous users for certain projects (February 2025) (T383451)]]
[23:19:33] <stashbot>	 T383451: Deploy dark mode to anonymous users for certain projects (February 2025) - https://phabricator.wikimedia.org/T383451
[23:20:29] <jan_drewniak>	 oh I guess I could have also pressed the 'submit' button... 
[23:22:31] <logmsgbot>	 !log jdrewniak@deploy2002 jdrewniak, jdlrobson: Backport for [[gerrit:1117599|Deploy dark mode to anonymous users for certain projects (February 2025) (T383451)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[23:23:04] <wikibugs>	 (03PS4) 10Andrew Bogott: Designate: unset legacy_domain_id [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118)
[23:25:14] <Jdlrobson>	 jan_drewniak: sorry, they didn't need to be chained.
[23:25:19] <logmsgbot>	 !log jdrewniak@deploy2002 jdrewniak, jdlrobson: Continuing with sync
[23:26:02] <jan_drewniak>	 Jdlrobson: np, just checked the dark-mode patch, lgtm. syncing
[23:29:02] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1117631 (https://phabricator.wikimedia.org/T384118) (owner: 10Andrew Bogott)
[23:30:33] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.dns.netbox
[23:31:42] <Jdlrobson>	 jan_drewniak: thanks
[23:31:58] <logmsgbot>	 !log jdrewniak@deploy2002 Finished scap sync-world: Backport for [[gerrit:1117599|Deploy dark mode to anonymous users for certain projects (February 2025) (T383451)]] (duration: 12m 27s)
[23:32:01] <stashbot>	 T383451: Deploy dark mode to anonymous users for certain projects (February 2025) - https://phabricator.wikimedia.org/T383451
[23:32:08] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-jobrunner_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:33:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1117612 (https://phabricator.wikimedia.org/T118509) (owner: 10Jdlrobson)
[23:35:11] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[23:35:17] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-jobrunner_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-jobrunner_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:35:30] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
[23:35:30] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:35:31] <logmsgbot>	 !log cwhite@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2026.codfw.wmnet
[23:39:27] <logmsgbot>	 !log jdrewniak@deploy2002 Started scap sync-world: Backport for [[gerrit:1117612|Speed tests: Add HTML files for touch action (T118509)]]
[23:39:30] <stashbot>	 T118509: Evaluate using `touch-action: manipulation;` - https://phabricator.wikimedia.org/T118509
[23:42:26] <logmsgbot>	 !log jdrewniak@deploy2002 jdlrobson, jdrewniak: Backport for [[gerrit:1117612|Speed tests: Add HTML files for touch action (T118509)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[23:42:26] <wikibugs>	 (03PS1) 10Cwhite: site: clean up logstash202[6789] configs [puppet] - 10https://gerrit.wikimedia.org/r/1117634 (https://phabricator.wikimedia.org/T383288)
[23:44:00] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] wmcs-dnsleaks: make faster by specifying deployment [puppet] - 10https://gerrit.wikimedia.org/r/1117630 (https://phabricator.wikimedia.org/T384118) (owner: 10Andrew Bogott)
[23:44:02] <logmsgbot>	 !log jdrewniak@deploy2002 jdlrobson, jdrewniak: Continuing with sync
[23:46:02] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] site: clean up logstash202[6789] configs [puppet] - 10https://gerrit.wikimedia.org/r/1117634 (https://phabricator.wikimedia.org/T383288) (owner: 10Cwhite)
[23:48:31] <wikibugs>	 (03PS2) 10Scott French: jobrunner: remove PHP major version httpbb assertion [puppet] - 10https://gerrit.wikimedia.org/r/1117603 (https://phabricator.wikimedia.org/T383845)
[23:50:38] <logmsgbot>	 !log jdrewniak@deploy2002 Finished scap sync-world: Backport for [[gerrit:1117612|Speed tests: Add HTML files for touch action (T118509)]] (duration: 11m 10s)
[23:50:41] <stashbot>	 T118509: Evaluate using `touch-action: manipulation;` - https://phabricator.wikimedia.org/T118509
[23:51:23] <wikibugs>	 (03CR) 10Scott French: "Thanks, Reuven! Ah, yeah adding a comment pointing this out seems like a solid idea. I've added one just above the 405 case, since in prac" [puppet] - 10https://gerrit.wikimedia.org/r/1117603 (https://phabricator.wikimedia.org/T383845) (owner: 10Scott French)
[23:51:46] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10527341 (10Jhancock.wm) we received two each of these.   SFP-GIG BASE-T RJ45 SFP28-25GE-LR SFP+ 10GE-LR QSFP28-100GB-CWDM4 QSFP...