[00:31:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:33:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.54% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:38:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.54% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:38:55] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1239799
[00:38:55] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1239799 (owner: 10TrainBranchBot)
[00:50:27] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1239799 (owner: 10TrainBranchBot)
[01:01:19] <icinga-wm>	 PROBLEM - dump of db_inventory in codfw on backupmon1001 is CRITICAL: Last dump for db_inventory at codfw (db2185) taken on 2026-02-17 00:36:52 is 132 KiB, but the previous one was 112 KiB, a change of +18.2 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[01:08:57] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1239802
[01:08:58] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1239802 (owner: 10TrainBranchBot)
[01:13:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:16:17] <icinga-wm>	 RECOVERY - dump of analytics_meta in eqiad on backupmon1001 is OK: Last dump for analytics_meta at eqiad (db1208) taken on 2026-02-17 01:09:28 (1.2 GiB, +0.7 %) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[01:16:29] <icinga-wm>	 PROBLEM - dump of db_inventory in eqiad on backupmon1001 is CRITICAL: Last dump for db_inventory at eqiad (db1215) taken on 2026-02-17 00:40:03 is 131 KiB, but the previous one was 111 KiB, a change of +18.0 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[01:31:13] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1239802 (owner: 10TrainBranchBot)
[01:33:12] <icinga-wm>	 RECOVERY - dump of matomo in eqiad on backupmon1001 is OK: Last dump for matomo at eqiad (db1208) taken on 2026-02-17 01:08:37 (818 MiB, +3.9 %) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[02:00:41] <logmsgbot>	 !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[02:08:52] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/1.46.0-wmf.16 [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239804 (https://phabricator.wikimedia.org/T413807)
[02:08:54] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/1.46.0-wmf.16 [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239804 (https://phabricator.wikimedia.org/T413807) (owner: 10TrainBranchBot)
[02:09:19] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:10:15] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:13:42] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 13m 01s)
[02:14:19] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:19:40] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/1.46.0-wmf.16 [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239804 (https://phabricator.wikimedia.org/T413807) (owner: 10TrainBranchBot)
[02:31:59] <wikibugs>	 (03CR) 10ArielGlenn: python tests: use type hints (1 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239529 (owner: 10Daniel Kinzler)
[02:34:19] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T0300)
[03:15:15] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:18:13] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[03:19:20] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:44:05] <wikibugs>	 (03CR) 10ArielGlenn: "Except for the one typo I think this is good to go. But I'd like to understand better your thoughts about test coverage and so on, later." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085 (owner: 10Daniel Kinzler)
[04:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T0400)
[04:02:05] <wikibugs>	 (03PS1) 10TrainBranchBot: testwikis to 1.46.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239813 (https://phabricator.wikimedia.org/T413807)
[04:02:08] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Initiated by mwpresync@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239813 (https://phabricator.wikimedia.org/T413807) (owner: 10TrainBranchBot)
[04:03:01] <wikibugs>	 (03Merged) 10jenkins-bot: testwikis to 1.46.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239813 (https://phabricator.wikimedia.org/T413807) (owner: 10TrainBranchBot)
[04:03:30] <logmsgbot>	 !log mwpresync@deploy2002 Started scap sync-world: testwikis to 1.46.0-wmf.16  refs T413807
[04:03:34] <stashbot>	 T413807: 1.46.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T413807
[04:15:15] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:19:19] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:19:28] <wikibugs>	 (03CR) 10ArielGlenn: "The limits look ok in the current version of the patch; I have tested nothing yet, just one careful reading." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183) (owner: 10Daniel Kinzler)
[04:19:32] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:24:05] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, February 19 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238432 (https://phabricator.wikimedia.org/T415588) (owner: 10Bartosz Dziewoński)
[04:29:17] <jinxer-wm>	 FIRING: [4x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:31:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:32:21] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: Configure rate limit class for local bots (and local-bot global group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238432 (https://phabricator.wikimedia.org/T415588)
[04:32:52] <wikibugs>	 (03PS3) 10Bartosz Dziewoński: Configure rate limit class for local bots (and local-bot global group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238432 (https://phabricator.wikimedia.org/T415588)
[04:47:39] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap sync-world: testwikis to 1.46.0-wmf.16  refs T413807 (duration: 44m 09s)
[04:47:44] <stashbot>	 T413807: 1.46.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T413807
[05:00:04] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T0500)
[05:01:13] <logmsgbot>	 !log mwpresync@deploy2002 Pruned MediaWiki: 1.46.0-wmf.13 (duration: 01m 11s)
[05:02:27] <wikibugs>	 (03PS1) 10C. Scott Ananian: Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236)
[05:02:54] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236) (owner: 10C. Scott Ananian)
[05:13:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:14:25] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236) (owner: 10C. Scott Ananian)
[05:19:19] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:24:19] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:16:57] <wikibugs>	 (03CR) 10Marostegui: "I still don't see a dry-run or a confirmation before running this, did I miss it?" [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto)
[06:21:18] <wikibugs>	 (03CR) 10Marostegui: mysql: update replication source (3 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto)
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T0700)
[07:00:04] <jouncebot>	 marostegui, Amir1, and federico3: Time to do the Primary database switchover deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T0700).
[07:18:13] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:20:15] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:24:19] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:38:23] <wikibugs>	 (03PS1) 10Arnaudb: gerrit: bump Jetty threads [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536)
[07:52:23] <wikibugs>	 (03CR) 10Elukey: [C:03+1] AM: only send critical I/F alerts to the I/F IRC chan [puppet] - 10https://gerrit.wikimedia.org/r/1239674 (owner: 10Ayounsi)
[07:53:13] <wikibugs>	 (03CR) 10Elukey: [C:03+1] profile::puppet::agent: Remove support for Buster [puppet] - 10https://gerrit.wikimedia.org/r/1239749 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[07:53:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Let's give it a shot" [puppet] - 10https://gerrit.wikimedia.org/r/1239674 (owner: 10Ayounsi)
[07:58:12] <wikibugs>	 (03PS1) 10Thiemo Kreuz (WMDE): Add instrument for clicks in TOC references link [extensions/Cite] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910)
[07:59:08] <wikibugs>	 (03PS1) 10Thiemo Kreuz (WMDE): Add instrument for clicks in footnotes in the article [extensions/Cite] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909)
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: #bothumor My software never has bugs. It just develops random features. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T0800).
[08:00:05] <jouncebot>	 Thiemo_WMDE: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:03:24] <wikibugs>	 (03PS1) 10Jelto: wikimedia: revert gerrit behind the CDN [dns] - 10https://gerrit.wikimedia.org/r/1239878 (https://phabricator.wikimedia.org/T417497)
[08:04:25] <wikibugs>	 (03CR) 10Jelto: [C:04-1] "I'd like to try other options first (increasing timeouts, reducing concurrency etc.)" [dns] - 10https://gerrit.wikimedia.org/r/1239878 (https://phabricator.wikimedia.org/T417497) (owner: 10Jelto)
[08:06:24] <wikibugs>	 (03PS26) 10Daniel Kinzler: rest gateway: add tests for chart rendering [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085
[08:06:33] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "let's try that, I assume this needs a gerrit restart to use the new config?" [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[08:07:04] <wikibugs>	 (03CR) 10Daniel Kinzler: rest gateway: add tests for chart rendering (6 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085 (owner: 10Daniel Kinzler)
[08:09:24] <wikibugs>	 (03CR) 10Hashar: [C:04-1] "The Jetty metrics are exposed to Prometheus and they can be seen at https://grafana.wikimedia.org/d/fe848RoMz/http-jetty?var-instance=gerr" [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[08:12:02] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] "we can adopt those 😊" [alerts] - 10https://gerrit.wikimedia.org/r/1238361 (https://phabricator.wikimedia.org/T416985) (owner: 10Blake)
[08:12:59] <wikibugs>	 (03CR) 10Daniel Kinzler: rest gateway: implement per-policy shadow mode (2 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183) (owner: 10Daniel Kinzler)
[08:12:59] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] validating-admission-policies: add /srv/parsoid-testing (vanilla) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239169 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[08:14:13] <wikibugs>	 (03PS13) 10Daniel Kinzler: rest gateway: implement per-policy shadow mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183)
[08:14:46] <wikibugs>	 (03Merged) 10jenkins-bot: validating-admission-policies: add /srv/parsoid-testing (vanilla) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239169 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[08:14:49] <wikibugs>	 (03PS7) 10Effie Mouzeli: kubernetes::mediawiki_experimental: add parsoid repo #3 [puppet] - 10https://gerrit.wikimedia.org/r/1238345 (https://phabricator.wikimedia.org/T386246)
[08:14:53] <wikibugs>	 (03CR) 10Effie Mouzeli: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1238345 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[08:17:37] <wikibugs>	 (03CR) 10Arnaudb: "checking on the explore version of the threadpool graph: https://grafana.wikimedia.org/goto/ONyXiUDvg?orgId=1 there is fairly frequently a" [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[08:19:39] <WMDE-Fisch>	 \o I will deploy that patch from the backport window
[08:21:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by wmde-fisch@deploy2002 using scap backport" [extensions/Cite] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1239573 (https://phabricator.wikimedia.org/T416630) (owner: 10WMDE-Fisch)
[08:24:53] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Parsoid: Add safeguard when checking for reflist template [extensions/Cite] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1239573 (https://phabricator.wikimedia.org/T416630) (owner: 10WMDE-Fisch)
[08:25:50] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by wmde-fisch@deploy2002 using scap backport" [extensions/Cite] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1239573 (https://phabricator.wikimedia.org/T416630) (owner: 10WMDE-Fisch)
[08:27:25] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] AM: only send critical I/F alerts to the I/F IRC chan [puppet] - 10https://gerrit.wikimedia.org/r/1239674 (owner: 10Ayounsi)
[08:29:32] <jinxer-wm>	 FIRING: [4x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:29:59] <wikibugs>	 (03PS1) 10Kevin Bazira: ml-services: reduce rr-wikidata workers to fix resource contention [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239880 (https://phabricator.wikimedia.org/T414060)
[08:30:55] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
[08:30:57] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
[08:31:36] <wikibugs>	 (03PS3) 10Effie Mouzeli: validating-admission-policies: add /srv/parsoid-testing #0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239695 (https://phabricator.wikimedia.org/T386246)
[08:31:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:32:07] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Cite] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239873 (https://phabricator.wikimedia.org/T415910) (owner: 10Thiemo Kreuz (WMDE))
[08:32:18] <wikibugs>	 (03PS2) 10Muehlenhoff: standard_packages: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/1239688
[08:32:20] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Cite] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239877 (https://phabricator.wikimedia.org/T415909) (owner: 10Thiemo Kreuz (WMDE))
[08:32:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove now obsolete Cumin aliases for Buster and Puppet 5 [puppet] - 10https://gerrit.wikimedia.org/r/1239648 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[08:34:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/1239688 (owner: 10Muehlenhoff)
[08:34:53] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS trixie
[08:36:25] <wikibugs>	 (03Merged) 10jenkins-bot: Parsoid: Add safeguard when checking for reflist template [extensions/Cite] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1239573 (https://phabricator.wikimedia.org/T416630) (owner: 10WMDE-Fisch)
[08:37:36] <logmsgbot>	 !log wmde-fisch@deploy2002 Started scap sync-world: Backport for [[gerrit:1239573|Parsoid: Add safeguard when checking for reflist template (T416630)]]
[08:37:40] <stashbot>	 T416630: TypeError: Wikimedia\Parsoid\Core\DOMCompat::getPreviousElementSibling(): Argument #1 ($node) must be of type DOMElement|DOMCharacterData|Wikimedia\Parsoid\DOM\Element|Wikimedia\Parsoid\DOM\CharacterData, null given - https://phabricator.wikimedia.org/T416630
[08:41:19] <wikibugs>	 (03PS3) 10Muehlenhoff: standard_packages: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/1239688
[08:41:51] <logmsgbot>	 !log wmde-fisch@deploy2002 wmde-fisch: Backport for [[gerrit:1239573|Parsoid: Add safeguard when checking for reflist template (T416630)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:42:20] <logmsgbot>	 !log wmde-fisch@deploy2002 wmde-fisch: Continuing with sync
[08:43:33] <wikibugs>	 (03PS1) 10Jelto: gerrit: disable nftables throttling [puppet] - 10https://gerrit.wikimedia.org/r/1239881 (https://phabricator.wikimedia.org/T417536)
[08:43:42] <wikibugs>	 (03CR) 10Majavah: [C:03+1] Run cloudlb spec tests on Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239603 (owner: 10Muehlenhoff)
[08:45:30] <wikibugs>	 (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8044/co" [puppet] - 10https://gerrit.wikimedia.org/r/1239881 (https://phabricator.wikimedia.org/T417536) (owner: 10Jelto)
[08:47:36] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239688 (owner: 10Muehlenhoff)
[08:48:26] <logmsgbot>	 !log wmde-fisch@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239573|Parsoid: Add safeguard when checking for reflist template (T416630)]] (duration: 10m 51s)
[08:48:31] <stashbot>	 T416630: TypeError: Wikimedia\Parsoid\Core\DOMCompat::getPreviousElementSibling(): Argument #1 ($node) must be of type DOMElement|DOMCharacterData|Wikimedia\Parsoid\DOM\Element|Wikimedia\Parsoid\DOM\CharacterData, null given - https://phabricator.wikimedia.org/T416630
[08:50:41] <WMDE-Fisch>	 I'm done!
[08:51:08] <wikibugs>	 (03CR) 10Jelto: [V:03+1 C:03+2] gerrit: disable nftables throttling [puppet] - 10https://gerrit.wikimedia.org/r/1239881 (https://phabricator.wikimedia.org/T417536) (owner: 10Jelto)
[08:54:15] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
[08:56:15] <logmsgbot>	 !log trueg@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[08:56:22] <logmsgbot>	 !log trueg@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[08:57:04] <wikibugs>	 (03CR) 10Hashar: [C:04-1] "If we had an issue in the Jetty pool I would expect it to log a warning/error and I don't see any." [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[08:58:53] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
[09:01:44] <wikibugs>	 (03PS1) 10Muehlenhoff: Run the Bacula spec tests on Bookworm/Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239882
[09:04:16] <wikibugs>	 (03PS1) 10Jelto: gerrit: fix nftables exporter [puppet] - 10https://gerrit.wikimedia.org/r/1239883 (https://phabricator.wikimedia.org/T417618)
[09:08:06] <wikibugs>	 (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8045/co" [puppet] - 10https://gerrit.wikimedia.org/r/1239883 (https://phabricator.wikimedia.org/T417618) (owner: 10Jelto)
[09:09:03] <wikibugs>	 (03CR) 10Jelto: [V:03+1 C:03+2] gerrit: fix nftables exporter [puppet] - 10https://gerrit.wikimedia.org/r/1239883 (https://phabricator.wikimedia.org/T417618) (owner: 10Jelto)
[09:09:43] <wikibugs>	 (03CR) 10Muehlenhoff: "Oh, I missed that, will amend the patch." [puppet] - 10https://gerrit.wikimedia.org/r/1239603 (owner: 10Muehlenhoff)
[09:09:46] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Run cloudlb spec tests on Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239603 (owner: 10Muehlenhoff)
[09:10:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "\o/ \o/ \o/ \o/" [puppet] - 10https://gerrit.wikimedia.org/r/1239688 (owner: 10Muehlenhoff)
[09:14:36] <wikibugs>	 (03PS1) 10Trueg: rdf-streaming-updater: Update to flink-1.20.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239884 (https://phabricator.wikimedia.org/T414430)
[09:14:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:14:50] <wikibugs>	 (03CR) 10Vgutierrez: "> This change is attached to T417536 and I am convinced that is due to the TCP proxy in between Gerrit and the CDN rather than Apache/Jett" [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[09:16:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] standard_packages: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/1239688 (owner: 10Muehlenhoff)
[09:17:18] <wikibugs>	 06SRE, 06Infrastructure-Foundations: offboarding Alex Kosiaris - https://phabricator.wikimedia.org/T417465#11621816 (10Peachey88)
[09:17:41] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "Can go through anytime" [puppet] - 10https://gerrit.wikimedia.org/r/1239882 (owner: 10Muehlenhoff)
[09:18:00] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS trixie
[09:18:19] <dcausse>	 jouncebot: nowandnext
[09:18:19] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 41 minute(s)
[09:18:19] <jouncebot>	 In 1 hour(s) and 41 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1100)
[09:18:46] <wikibugs>	 (03CR) 10DCausse: [C:03+2] rdf-streaming-updater: Update to flink-1.20.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239884 (https://phabricator.wikimedia.org/T414430) (owner: 10Trueg)
[09:19:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Enable Bird 2.18 on codfw1dev hosts for cloudlb/cloudservices [puppet] - 10https://gerrit.wikimedia.org/r/1239322 (https://phabricator.wikimedia.org/T413740) (owner: 10Muehlenhoff)
[09:20:45] <wikibugs>	 (03Merged) 10jenkins-bot: rdf-streaming-updater: Update to flink-1.20.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239884 (https://phabricator.wikimedia.org/T414430) (owner: 10Trueg)
[09:21:14] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] bullseye tracking: phab references [puppet] - 10https://gerrit.wikimedia.org/r/1239334 (owner: 10Muehlenhoff)
[09:22:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] production-m2.sql.erb: Update comment [puppet] - 10https://gerrit.wikimedia.org/r/1239690 (owner: 10Muehlenhoff)
[09:22:26] <wikibugs>	 (03PS1) 10Marostegui: pc102[14]: New hosts [puppet] - 10https://gerrit.wikimedia.org/r/1239885 (https://phabricator.wikimedia.org/T417070)
[09:22:40] <wikibugs>	 (03CR) 10Marostegui: "This is a noop" [puppet] - 10https://gerrit.wikimedia.org/r/1239885 (https://phabricator.wikimedia.org/T417070) (owner: 10Marostegui)
[09:23:07] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] pc102[14]: New hosts [puppet] - 10https://gerrit.wikimedia.org/r/1239885 (https://phabricator.wikimedia.org/T417070) (owner: 10Marostegui)
[09:24:09] <logmsgbot>	 !log trueg@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[09:24:20] <logmsgbot>	 !log trueg@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[09:25:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] profile::puppet::agent: Remove support for Buster [puppet] - 10https://gerrit.wikimedia.org/r/1239749 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[09:25:49] <logmsgbot>	 !log trueg@deploy2002 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[09:26:54] <logmsgbot>	 !log trueg@deploy2002 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[09:27:07] <wikibugs>	 (03PS1) 10Brouberol: spark3.4: update debian build dependencies [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455)
[09:28:20] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:28:36] <wikibugs>	 (03PS1) 10Brouberol: flink-operator: upgrade to 1.14 based on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455)
[09:33:53] <logmsgbot>	 !log trueg@deploy2002 helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
[09:34:14] <logmsgbot>	 !log trueg@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
[09:34:42] <wikibugs>	 (03PS2) 10Brouberol: flink-operator: upgrade to 1.14 based on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455)
[09:34:42] <wikibugs>	 (03PS2) 10Brouberol: spark3.4: update debian build dependencies [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455)
[09:39:54] <wikibugs>	 (03PS1) 10Vgutierrez: trafficserver: Disable connection re-use for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/1239888 (https://phabricator.wikimedia.org/T417536)
[09:41:02] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239888 (https://phabricator.wikimedia.org/T417536) (owner: 10Vgutierrez)
[09:41:45] <wikibugs>	 (03PS1) 10Muehlenhoff: Unconditionally  install puppet-module-puppetlabs-augeas-core [puppet] - 10https://gerrit.wikimedia.org/r/1239889
[09:41:49] <wikibugs>	 (03PS1) 10Brouberol: flink-operator-crds: upgrade CRDs to release 1.14.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239890 (https://phabricator.wikimedia.org/T416455)
[09:42:04] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm thanks for finding this settings. Let's try this" [puppet] - 10https://gerrit.wikimedia.org/r/1239888 (https://phabricator.wikimedia.org/T417536) (owner: 10Vgutierrez)
[09:44:34] <wikibugs>	 (03PS3) 10Brouberol: spark3.4: update debian build dependencies [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455)
[09:44:44] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
[09:44:47] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
[09:44:56] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] wmcs: infra-tracing-nfs support non-k8s nodes [puppet] - 10https://gerrit.wikimedia.org/r/1239689 (https://phabricator.wikimedia.org/T415199) (owner: 10Volans)
[09:46:24] <wikibugs>	 (03CR) 10DCausse: [C:03+1] flink-operator-crds: upgrade CRDs to release 1.14.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239890 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[09:47:37] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove puppetmaster::monitoring and related classes [puppet] - 10https://gerrit.wikimedia.org/r/1239891 (https://phabricator.wikimedia.org/T365798)
[09:49:10] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] trafficserver: Disable connection re-use for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/1239888 (https://phabricator.wikimedia.org/T417536) (owner: 10Vgutierrez)
[09:52:37] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239891 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[09:57:13] <icinga-wm>	 PROBLEM - orchestrator process on dborch1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args orchestrator http https://wikitech.wikimedia.org/wiki/Orchestrator
[09:57:33] <icinga-wm>	 PROBLEM - orchestrator TCP port on dborch1001 is CRITICAL: connect to address 127.0.0.1 and port 3000: Connection refused https://wikitech.wikimedia.org/wiki/Orchestrator
[09:57:37] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] hiera: remove ms-be20[57-61] for decom [puppet] - 10https://gerrit.wikimedia.org/r/1239735 (https://phabricator.wikimedia.org/T404771) (owner: 10MVernon)
[09:57:53] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] codfw swift: remove drained ms-be20[57-61] for decom [puppet] - 10https://gerrit.wikimedia.org/r/1239734 (https://phabricator.wikimedia.org/T404771) (owner: 10MVernon)
[09:58:24] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] trafficserver: Disable connection re-use for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/1239888 (https://phabricator.wikimedia.org/T417536) (owner: 10Vgutierrez)
[09:59:16] <wikibugs>	 (03PS2) 10Blake: kubernetes-prod: Target serviceops, rather than sre. [alerts] - 10https://gerrit.wikimedia.org/r/1238361 (https://phabricator.wikimedia.org/T416985)
[09:59:51] <wikibugs>	 (03CR) 10Blake: "Done" [alerts] - 10https://gerrit.wikimedia.org/r/1238361 (https://phabricator.wikimedia.org/T416985) (owner: 10Blake)
[10:00:13] <icinga-wm>	 RECOVERY - orchestrator process on dborch1001 is OK: PROCS OK: 1 process with regex args orchestrator http https://wikitech.wikimedia.org/wiki/Orchestrator
[10:00:38] <icinga-wm>	 RECOVERY - orchestrator TCP port on dborch1001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 3000 https://wikitech.wikimedia.org/wiki/Orchestrator
[10:01:50] <wikibugs>	 (03CR) 10Arnaudb: "> This change is attached to T417536 and I am convinced that is due to the TCP proxy in between Gerrit and the CDN rather than Apache/Jett" [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[10:01:51] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:04-1] "Make sure this works with python 3.9" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239529 (owner: 10Daniel Kinzler)
[10:06:22] <wikibugs>	 (03CR) 10MVernon: [C:03+2] codfw swift: remove drained ms-be20[57-61] for decom [puppet] - 10https://gerrit.wikimedia.org/r/1239734 (https://phabricator.wikimedia.org/T404771) (owner: 10MVernon)
[10:06:25] <wikibugs>	 (03CR) 10Blake: [C:03+2] kubernetes-prod: Target serviceops, rather than sre. [alerts] - 10https://gerrit.wikimedia.org/r/1238361 (https://phabricator.wikimedia.org/T416985) (owner: 10Blake)
[10:07:40] <wikibugs>	 (03Merged) 10jenkins-bot: kubernetes-prod: Target serviceops, rather than sre. [alerts] - 10https://gerrit.wikimedia.org/r/1238361 (https://phabricator.wikimedia.org/T416985) (owner: 10Blake)
[10:09:03] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove puppetmaster::gitclone and related classes [puppet] - 10https://gerrit.wikimedia.org/r/1239895 (https://phabricator.wikimedia.org/T365798)
[10:09:46] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] admin_ng: add ValidatingAdmissionPolicy for mw-parsoid #1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239174 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[10:12:26] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+1] "LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239880 (https://phabricator.wikimedia.org/T414060) (owner: 10Kevin Bazira)
[10:12:43] <wikibugs>	 (03PS1) 10Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
[10:13:14] <wikibugs>	 (03PS2) 10Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
[10:14:09] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239895 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[10:16:07] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove puppetmaster::r10k [puppet] - 10https://gerrit.wikimedia.org/r/1239897 (https://phabricator.wikimedia.org/T365798)
[10:17:46] <wikibugs>	 (03Merged) 10jenkins-bot: admin_ng: add ValidatingAdmissionPolicy for mw-parsoid #1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239174 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[10:18:40] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove puppetmaster::rsync and related classes [puppet] - 10https://gerrit.wikimedia.org/r/1239898 (https://phabricator.wikimedia.org/T365798)
[10:19:08] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WIP: create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300) (owner: 10Ayounsi)
[10:19:49] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+2] ml-services: reduce rr-wikidata workers to fix resource contention [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239880 (https://phabricator.wikimedia.org/T414060) (owner: 10Kevin Bazira)
[10:20:37] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting update of Raymond Ndibe's SSH key to Yubikey-backed key - https://phabricator.wikimedia.org/T417594#11622170 (10MatthewVernon)
[10:21:07] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting update of Raymond Ndibe's SSH key to Yubikey-backed key - https://phabricator.wikimedia.org/T417594#11622172 (10MatthewVernon) I've reached out to seek out-of-band confirmation of the new pubkey, then this request is good to go.
[10:21:31] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove puppetmaster::web_frontend and related classes [puppet] - 10https://gerrit.wikimedia.org/r/1239899 (https://phabricator.wikimedia.org/T365798)
[10:21:32] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: reduce rr-wikidata workers to fix resource contention [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239880 (https://phabricator.wikimedia.org/T414060) (owner: 10Kevin Bazira)
[10:24:13] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[10:26:09] <wikibugs>	 (03PS2) 10Muehlenhoff: Unconditionally  install puppet-module-puppetlabs-augeas-core [puppet] - 10https://gerrit.wikimedia.org/r/1239889
[10:26:33] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Run the Bacula spec tests on Bookworm/Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239882 (owner: 10Muehlenhoff)
[10:26:36] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[10:30:53] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1160 (T415786)', diff saved to https://phabricator.wikimedia.org/P88836 and previous config saved to /var/cache/conftool/dbconfig/20260217-103053-marostegui.json
[10:30:57] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[10:34:23] <wikibugs>	 (03CR) 10Elukey: flink-operator: upgrade to 1.14 based on bookworm (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[10:35:08] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] validating-admission-policies: add /srv/parsoid-testing #0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239695 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[10:36:24] <wikibugs>	 (03PS1) 10Dreamy Jazz: Take locked users into account for case auto-closure [extensions/CheckUser] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239901 (https://phabricator.wikimedia.org/T417013)
[10:36:35] <Dreamy_Jazz>	 jouncebot: nowandnext
[10:36:35] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 23 minute(s)
[10:36:35] <jouncebot>	 In 0 hour(s) and 23 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1100)
[10:37:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239901 (https://phabricator.wikimedia.org/T417013) (owner: 10Dreamy Jazz)
[10:37:17] <wikibugs>	 (03Merged) 10jenkins-bot: validating-admission-policies: add /srv/parsoid-testing #0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239695 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[10:38:52] <wikibugs>	 (03PS3) 10Brouberol: flink-operator: upgrade to 1.14 based on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455)
[10:38:53] <wikibugs>	 (03PS4) 10Brouberol: spark3.4: update debian build dependencies [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455)
[10:39:12] <wikibugs>	 (03CR) 10Brouberol: flink-operator: upgrade to 1.14 based on bookworm (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[10:39:58] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239897 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[10:40:30] <wikibugs>	 (03CR) 10Volans: [C:03+2] wmcs: infra-tracing-nfs support non-k8s nodes [puppet] - 10https://gerrit.wikimedia.org/r/1239689 (https://phabricator.wikimedia.org/T415199) (owner: 10Volans)
[10:44:58] <moritzm>	 !log upgrading clamav on vrts2002 to 1.4.3
[10:45:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:02] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P88837 and previous config saved to /var/cache/conftool/dbconfig/20260217-104601-marostegui.json
[10:46:29] <wikibugs>	 (03PS1) 10Brouberol: flink-operator: upgrade to 1.14 based on bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239903 (https://phabricator.wikimedia.org/T416455)
[10:50:25] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239898 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[10:52:08] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1239901|Take locked users into account for case auto-closure (T417013)]]
[10:52:13] <stashbot>	 T417013: Suggested Investigations: Take locked users into account for case auto-closure - https://phabricator.wikimedia.org/T417013
[10:56:01] <Dreamy_Jazz>	 jouncebot: nowandnext
[10:56:01] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 3 minute(s)
[10:56:01] <jouncebot>	 In 0 hour(s) and 3 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1100)
[10:56:29] <Dreamy_Jazz>	 My backport will probably take longer than usual as it includes i18n changes
[10:59:13] <wikibugs>	 (03CR) 10Elukey: [C:03+1] flink-operator: upgrade to 1.14 based on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[10:59:43] <wikibugs>	 (03CR) 10Elukey: [C:03+1] "I guess it failed to build or similar so we don't need to bump the version." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1100)
[11:01:10] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P88838 and previous config saved to /var/cache/conftool/dbconfig/20260217-110109-marostegui.json
[11:05:08] <wikibugs>	 (03PS1) 10Volans: wmcs: infra-tracing-nfs fix variable [puppet] - 10https://gerrit.wikimedia.org/r/1239905 (https://phabricator.wikimedia.org/T415199)
[11:06:16] <moritzm>	 !log upgrading clamav on vrts1003 to 1.4.3
[11:06:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:06] <wikibugs>	 (03CR) 10Volans: "PCC: https://puppet-compiler.wmflabs.org/output/1239905/8046/toolsbeta-test-k8s-worker-nfs-10.toolsbeta.eqiad1.wikimedia.cloud/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1239905 (https://phabricator.wikimedia.org/T415199) (owner: 10Volans)
[11:09:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] wmcs: infra-tracing-nfs fix variable [puppet] - 10https://gerrit.wikimedia.org/r/1239905 (https://phabricator.wikimedia.org/T415199) (owner: 10Volans)
[11:10:12] <wikibugs>	 (03CR) 10Volans: [C:03+2] wmcs: infra-tracing-nfs fix variable [puppet] - 10https://gerrit.wikimedia.org/r/1239905 (https://phabricator.wikimedia.org/T415199) (owner: 10Volans)
[11:13:43] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove puppetmaster::passenger and related files [puppet] - 10https://gerrit.wikimedia.org/r/1239907 (https://phabricator.wikimedia.org/T365798)
[11:15:20] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] cache::upload: increase global request limit on upload (browser) [puppet] - 10https://gerrit.wikimedia.org/r/1239703 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur)
[11:15:58] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] cache::upload: increase global request limit on upload (browser) [puppet] - 10https://gerrit.wikimedia.org/r/1239703 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur)
[11:16:06] <wikibugs>	 (03CR) 10Cory Massaro: [C:03+2] wikifunctions: [WIP] Specify the Rust-based evaluator releases too [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238385 (https://phabricator.wikimedia.org/T402957) (owner: 10Jforrester)
[11:16:19] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1160 (T415786)', diff saved to https://phabricator.wikimedia.org/P88839 and previous config saved to /var/cache/conftool/dbconfig/20260217-111618-marostegui.json
[11:16:23] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[11:16:35] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
[11:16:44] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88840 and previous config saved to /var/cache/conftool/dbconfig/20260217-111643-marostegui.json
[11:16:51] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1239901|Take locked users into account for case auto-closure (T417013)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[11:16:54] <stashbot>	 T417013: Suggested Investigations: Take locked users into account for case auto-closure - https://phabricator.wikimedia.org/T417013
[11:17:09] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
[11:18:20] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[11:18:24] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: [WIP] Specify the Rust-based evaluator releases too [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238385 (https://phabricator.wikimedia.org/T402957) (owner: 10Jforrester)
[11:18:38] <wikibugs>	 (03PS1) 10Muehlenhoff: This was only used with Puppet 5. [puppet] - 10https://gerrit.wikimedia.org/r/1239908 (https://phabricator.wikimedia.org/T365798)
[11:18:51] <wikibugs>	 (03PS1) 10Fabfur: Revert^6 "cache::upload: enable global ratelimiting (magru)" [puppet] - 10https://gerrit.wikimedia.org/r/1239909
[11:19:52] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] Revert^6 "cache::upload: enable global ratelimiting (magru)" [puppet] - 10https://gerrit.wikimedia.org/r/1239909 (owner: 10Fabfur)
[11:21:03] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] Revert^6 "cache::upload: enable global ratelimiting (magru)" [puppet] - 10https://gerrit.wikimedia.org/r/1239909 (owner: 10Fabfur)
[11:23:19] <Dreamy_Jazz>	 On sync-prod-k8s. If anyone needs to use scap, I can ping you when I'm done
[11:25:18] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove puppetmaster:ssl [puppet] - 10https://gerrit.wikimedia.org/r/1239908 (https://phabricator.wikimedia.org/T365798)
[11:25:26] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239907 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[11:28:41] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 07Puppet (Puppet 7.0): Review/cleanup content of /srv/git/private/modules/secret/secrets/ssl in the private repo - https://phabricator.wikimedia.org/T364622#11622373 (10MoritzMuehlenhoff) 05Open→03Resolved All done. We still keep two old...
[11:29:33] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239901|Take locked users into account for case auto-closure (T417013)]] (duration: 37m 25s)
[11:29:37] <stashbot>	 T417013: Suggested Investigations: Take locked users into account for case auto-closure - https://phabricator.wikimedia.org/T417013
[11:29:40] <Dreamy_Jazz>	 I'm done
[11:30:21] <wikibugs>	 (03PS3) 10Fabfur: cache::upload: enable global ratelimiting (ulsfo) [puppet] - 10https://gerrit.wikimedia.org/r/1237242 (https://phabricator.wikimedia.org/T406545)
[11:32:02] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] k8s-staging: Switch to IPIP mode [puppet] - 10https://gerrit.wikimedia.org/r/1237277 (https://phabricator.wikimedia.org/T352956) (owner: 10Alexandros Kosiaris)
[11:32:03] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239912 (https://phabricator.wikimedia.org/T357750)
[11:40:46] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[11:42:38] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239908 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[11:42:39] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[11:46:06] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[11:49:44] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239912 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[11:50:01] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[11:51:56] <wikibugs>	 (03CR) 10Brouberol: "It did build but I had to rebuild the jre image manually. This is to ensure it's all done automatically." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[11:52:02] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] flink-operator: upgrade to 1.14 based on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[11:52:09] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] spark3.4: update debian build dependencies [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[11:52:13] <wikibugs>	 (03CR) 10Brouberol: [V:03+2 C:03+2] flink-operator: upgrade to 1.14 based on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239887 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[11:52:17] <wikibugs>	 (03CR) 10Brouberol: [V:03+2 C:03+2] spark3.4: update debian build dependencies [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1239886 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[11:55:09] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[11:55:24] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.13 point update - https://phabricator.wikimedia.org/T414205#11622439 (10MoritzMuehlenhoff)
[11:55:45] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove build hooks once needed to be build cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239915 (https://phabricator.wikimedia.org/T357750)
[11:58:53] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 06ServiceOps new: Trixie switches rp_filter from strict (1) to loose (2) for all interfaces - https://phabricator.wikimedia.org/T417632 (10JMeybohm) 03NEW
[11:59:02] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[12:00:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 06ServiceOps new, 06Traffic: Trixie switches rp_filter from strict (1) to loose (2) for all interfaces - https://phabricator.wikimedia.org/T417632#11622455 (10JMeybohm) /link {T352956}
[12:00:16] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] flink-operator-crds: upgrade CRDs to release 1.14.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239890 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[12:00:20] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] flink-operator: upgrade to 1.14 based on bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239903 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[12:04:46] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove build hooks once needed to be build cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239915 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[12:05:02] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[12:06:41] <wikibugs>	 (03Merged) 10jenkins-bot: flink-operator-crds: upgrade CRDs to release 1.14.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239890 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[12:06:56] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10Prod-Kubernetes, 06ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11622476 (10Clement_Goubert) >>! In T397618#11607310, @MatthewVernon wrote: > Swift calls out to thumbor via our own...
[12:08:32] <wikibugs>	 (03Merged) 10jenkins-bot: flink-operator: upgrade to 1.14 based on bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239903 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[12:10:26] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 06ServiceOps new, 06Traffic: Trixie switches rp_filter from strict (1) to loose (2) for all interfaces - https://phabricator.wikimedia.org/T417632#11622481 (10JMeybohm) I seem to lack caffeine, the revert was already reverted at: https://gerrit.wikimedia.org/r/c/operati...
[12:12:45] <wikibugs>	 (03PS1) 10Marostegui: monitor_eventscheduler.pp: Monitor event_scheduler on core hosts [puppet] - 10https://gerrit.wikimedia.org/r/1239919 (https://phabricator.wikimedia.org/T254738)
[12:13:08] <wikibugs>	 (03PS2) 10Marostegui: core.pp: Monitor event_scheduler on core hosts [puppet] - 10https://gerrit.wikimedia.org/r/1239919 (https://phabricator.wikimedia.org/T254738)
[12:15:34] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[12:17:58] <wikibugs>	 (03PS1) 10Muehlenhoff: varnishkafka: Remove some obsolete references to cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239921 (https://phabricator.wikimedia.org/T357750)
[12:19:32] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 06ServiceOps new, 06Traffic: Trixie switches rp_filter from strict (1) to loose (2) for all interfaces - https://phabricator.wikimedia.org/T417632#11622496 (10JMeybohm)
[12:21:11] <wikibugs>	 (03PS1) 10Brouberol: scaffold: ensure the required chart metadata gets scaffolded [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239922 (https://phabricator.wikimedia.org/T412693)
[12:22:57] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Remove cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239912 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[12:23:05] <wikibugs>	 (03CR) 10CI reject: [V:04-1] scaffold: ensure the required chart metadata gets scaffolded [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239922 (https://phabricator.wikimedia.org/T412693) (owner: 10Brouberol)
[12:23:30] <wikibugs>	 (03PS1) 10Muehlenhoff: Record LDAP access for catherinegroves [puppet] - 10https://gerrit.wikimedia.org/r/1239923
[12:24:31] <Reedy>	 jouncebot: nowandnext
[12:24:31] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 35 minute(s)
[12:24:31] <jouncebot>	 In 0 hour(s) and 35 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1300)
[12:25:59] <wikibugs>	 (03PS4) 10Mstyles: CommonSettings.php: Stop loading WebAuthn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233679 (https://phabricator.wikimedia.org/T303495) (owner: 10Reedy)
[12:26:09] <wikibugs>	 (03CR) 10Reedy: [C:03+2] CommonSettings.php: Stop loading WebAuthn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233679 (https://phabricator.wikimedia.org/T303495) (owner: 10Reedy)
[12:27:06] <wikibugs>	 (03Merged) 10jenkins-bot: CommonSettings.php: Stop loading WebAuthn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233679 (https://phabricator.wikimedia.org/T303495) (owner: 10Reedy)
[12:27:09] <wikibugs>	 (03PS2) 10Reedy: wmf-config: Remove $wmgUseWebAuthn and extension from extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233680 (https://phabricator.wikimedia.org/T303495)
[12:27:15] <wikibugs>	 (03CR) 10Reedy: [C:03+2] wmf-config: Remove $wmgUseWebAuthn and extension from extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233680 (https://phabricator.wikimedia.org/T303495) (owner: 10Reedy)
[12:28:08] <wikibugs>	 (03Merged) 10jenkins-bot: wmf-config: Remove $wmgUseWebAuthn and extension from extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233680 (https://phabricator.wikimedia.org/T303495) (owner: 10Reedy)
[12:28:21] <wikibugs>	 (03CR) 10Reedy: [C:04-2] "Ib0f3cad82143fe43e59f67ad03172d80d3554501 needs to be everywhere. So when .16 is stable, we should be good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239026 (https://phabricator.wikimedia.org/T416544) (owner: 10Reedy)
[12:29:23] <logmsgbot>	 !log reedy@deploy2002 Started scap sync-world: Backport for [[gerrit:1233679|CommonSettings.php: Stop loading WebAuthn (T303495)]], [[gerrit:1233680|wmf-config: Remove $wmgUseWebAuthn and extension from extension-list (T303495)]]
[12:29:29] <stashbot>	 T303495: Merge WebAuthn extension into OATHAuth - https://phabricator.wikimedia.org/T303495
[12:29:56] <wikibugs>	 (03PS2) 10Brouberol: scaffold: ensure the required chart metadata gets scaffolded [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239922 (https://phabricator.wikimedia.org/T412693)
[12:30:25] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Record LDAP access for catherinegroves [puppet] - 10https://gerrit.wikimedia.org/r/1239923 (owner: 10Muehlenhoff)
[12:33:20] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:33:54] <logmsgbot>	 !log reedy@deploy2002 reedy: Backport for [[gerrit:1233679|CommonSettings.php: Stop loading WebAuthn (T303495)]], [[gerrit:1233680|wmf-config: Remove $wmgUseWebAuthn and extension from extension-list (T303495)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[12:34:13] <logmsgbot>	 !log reedy@deploy2002 reedy: Continuing with sync
[12:38:32] <wikibugs>	 (03CR) 10Brouberol: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239922 (https://phabricator.wikimedia.org/T412693) (owner: 10Brouberol)
[12:39:19] <icinga-wm>	 PROBLEM - orchestrator process on dborch1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args orchestrator http https://wikitech.wikimedia.org/wiki/Orchestrator
[12:39:37] <icinga-wm>	 PROBLEM - orchestrator TCP port on dborch1001 is CRITICAL: connect to address 127.0.0.1 and port 3000: Connection refused https://wikitech.wikimedia.org/wiki/Orchestrator
[12:40:21] <logmsgbot>	 !log reedy@deploy2002 Finished scap sync-world: Backport for [[gerrit:1233679|CommonSettings.php: Stop loading WebAuthn (T303495)]], [[gerrit:1233680|wmf-config: Remove $wmgUseWebAuthn and extension from extension-list (T303495)]] (duration: 10m 58s)
[12:40:25] <stashbot>	 T303495: Merge WebAuthn extension into OATHAuth - https://phabricator.wikimedia.org/T303495
[12:40:52] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] "Thank you!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239922 (https://phabricator.wikimedia.org/T412693) (owner: 10Brouberol)
[12:43:29] <wikibugs>	 (03PS1) 10Marostegui: orchestrator.sql.erb: Add replication master admin to orchestrator [puppet] - 10https://gerrit.wikimedia.org/r/1239927 (https://phabricator.wikimedia.org/T416582)
[12:43:52] <wikibugs>	 (03PS1) 10Jelto: tcpproxy: raise connection limit from 200 to 400 [puppet] - 10https://gerrit.wikimedia.org/r/1239928 (https://phabricator.wikimedia.org/T417497)
[12:43:59] <wikibugs>	 (03CR) 10Marostegui: "@fceratto@wikimedia.org can you merge this and deploy the new grant across the board?" [puppet] - 10https://gerrit.wikimedia.org/r/1239927 (https://phabricator.wikimedia.org/T416582) (owner: 10Marostegui)
[12:45:40] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] scaffold: ensure the required chart metadata gets scaffolded [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239922 (https://phabricator.wikimedia.org/T412693) (owner: 10Brouberol)
[12:46:05] <wikibugs>	 (03CR) 10Jelto: "@paladox as discussed in IRC, this bumps the connection limit. However I'd like to get a bit more feedback in T417497 before trying more c" [puppet] - 10https://gerrit.wikimedia.org/r/1239928 (https://phabricator.wikimedia.org/T417497) (owner: 10Jelto)
[12:50:34] <wikibugs>	 (03CR) 10Marostegui: "These were the previous patches:" [puppet] - 10https://gerrit.wikimedia.org/r/1239919 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[12:53:14] <wikibugs>	 (03PS1) 10Muehlenhoff: Run the Redis spec tests on Bullseye [puppet] - 10https://gerrit.wikimedia.org/r/1239931
[12:53:57] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Run the Redis spec tests on Bullseye [puppet] - 10https://gerrit.wikimedia.org/r/1239931 (owner: 10Muehlenhoff)
[13:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1300)
[13:00:19] <icinga-wm>	 RECOVERY - orchestrator process on dborch1001 is OK: PROCS OK: 1 process with regex args orchestrator http https://wikitech.wikimedia.org/wiki/Orchestrator
[13:00:34] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: no reason specified, T416442]
[13:00:41] <icinga-wm>	 RECOVERY - orchestrator TCP port on dborch1001 is OK: TCP OK - 0.001 second response time on 127.0.0.1 port 3000 https://wikitech.wikimedia.org/wiki/Orchestrator
[13:00:43] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[13:00:46] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: no reason specified, T416442]
[13:01:43] <logmsgbot>	 !log ayounsi@cumin1003 conftool action : set/pooled=no; selector: cluster=dnsbox,dc=magru [reason: magru maintenance]
[13:02:18] <logmsgbot>	 !log ayounsi@cumin1003 DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-magru,cr1-magru IPv6,cr1-magru.mgmt with reason: router upgrade
[13:03:14] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-magru,cr1-magru IPv6 with reason: router upgrade
[13:03:20] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:04:54] <logmsgbot>	 !log cgoubert@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1030.eqiad.wmnet
[13:05:26] <logmsgbot>	 !log cgoubert@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1030.eqiad.wmnet
[13:08:31] <logmsgbot>	 !log cgoubert@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1030.eqiad.wmnet
[13:08:32] <logmsgbot>	 !log cgoubert@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1030.eqiad.wmnet
[13:11:13] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-magru,cr2-magru IPv6 with reason: router upgrade
[13:12:56] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "recheck (T417536)" [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236) (owner: 10C. Scott Ananian)
[13:13:52] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 40 hosts with reason: Switches upgrade
[13:14:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:18:30] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] "LGTM, good to go from my viewpoint." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183) (owner: 10Daniel Kinzler)
[13:19:55] <wikibugs>	 (03PS1) 10JMeybohm: ipip: Disable prometheus_lvs_realserver_mss if no clamping [puppet] - 10https://gerrit.wikimedia.org/r/1239936 (https://phabricator.wikimedia.org/T352956)
[13:20:00] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239672 (owner: 10Phuedx)
[13:20:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239912 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[13:20:08] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239936 (https://phabricator.wikimedia.org/T352956) (owner: 10JMeybohm)
[13:23:19] <XioNoX>	 !log cr1-magru> request vmhost reboot - T416442
[13:23:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:23] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[13:25:18] <logmsgbot>	 !log ladsgroup@cumin1003 START - Cookbook sre.wikireplicas.update-views
[13:26:17] <wikibugs>	 (03CR) 10Elukey: [C:03+1] varnishkafka: Remove some obsolete references to cergen (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1239921 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[13:26:33] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:26:51] <jinxer-wm>	 FIRING: SwitchCoreInterfaceDown: Switch core interface down - asw1-b4-magru:et-0/0/48 (Core: cr1-magru:et-0/0/2 {#70128}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=asw1-b4-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[13:27:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 195.200.68.151 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:28:31] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] ipip: Disable prometheus_lvs_realserver_mss if no clamping [puppet] - 10https://gerrit.wikimedia.org/r/1239936 (https://phabricator.wikimedia.org/T352956) (owner: 10JMeybohm)
[13:28:39] <jinxer-wm>	 FIRING: [6x] CoreBGPDown: Core BGP session down between asw1-b3-magru and cr1-magru (195.200.68.142) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[13:29:33] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T415786)', diff saved to https://phabricator.wikimedia.org/P88841 and previous config saved to /var/cache/conftool/dbconfig/20260217-132932-marostegui.json
[13:29:37] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[13:29:40] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:29:57] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "recheck (T417636, wtf)" [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236) (owner: 10C. Scott Ananian)
[13:30:29] <logmsgbot>	 !log ladsgroup@cumin1003 END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
[13:31:51] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - asw1-b3-magru:et-0/0/48 (Core: cr1-magru:et-0/0/1 {#70094}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[13:32:01] <wikibugs>	 (03PS2) 10JMeybohm: ipip: Disable prometheus_lvs_realserver_mss if no clamping [puppet] - 10https://gerrit.wikimedia.org/r/1239936 (https://phabricator.wikimedia.org/T352956)
[13:32:27] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239936 (https://phabricator.wikimedia.org/T352956) (owner: 10JMeybohm)
[13:33:33] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:34:48] <wikibugs>	 (03CR) 10Hashar: [C:04-1] "Valentin wrote:" [puppet] - 10https://gerrit.wikimedia.org/r/1239872 (https://phabricator.wikimedia.org/T417536) (owner: 10Arnaudb)
[13:36:51] <jinxer-wm>	 RESOLVED: [2x] SwitchCoreInterfaceDown: Switch core interface down - asw1-b3-magru:et-0/0/48 (Core: cr1-magru:et-0/0/1 {#70094}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[13:36:59] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] cache::upload: enable global ratelimiting (ulsfo) [puppet] - 10https://gerrit.wikimedia.org/r/1237242 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur)
[13:37:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr2-eqiad and 195.200.68.151 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:37:58] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] cache::upload: enable global ratelimiting (ulsfo) [puppet] - 10https://gerrit.wikimedia.org/r/1237242 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur)
[13:38:39] <jinxer-wm>	 RESOLVED: [6x] CoreBGPDown: Core BGP session down between asw1-b3-magru and cr1-magru (195.200.68.142) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[13:39:13] <wikibugs>	 (03PS15) 10Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[13:39:13] <wikibugs>	 (03CR) 10Tiziano Fogli: "The patch no longer relies on the systemd::path class, as systemd::service appears to handle resource dependencies more effectively." [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[13:39:29] <wikibugs>	 (03PS3) 10Anzx: sqwiki: remove editor usergroup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239329 (https://phabricator.wikimedia.org/T415196)
[13:39:44] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239329 (https://phabricator.wikimedia.org/T415196) (owner: 10Anzx)
[13:40:03] <wikibugs>	 (03PS2) 10Anzx: lift IP cap for event at Tshwane University of Technology [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239935 (https://phabricator.wikimedia.org/T417578)
[13:42:05] <wikibugs>	 (03PS3) 10Anzx: lift IP cap for event at Tshwane University of Technology [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239935 (https://phabricator.wikimedia.org/T417578)
[13:42:17] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239935 (https://phabricator.wikimedia.org/T417578) (owner: 10Anzx)
[13:42:42] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239329 (https://phabricator.wikimedia.org/T415196) (owner: 10Anzx)
[13:43:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] etcd: Remove obsolete check [puppet] - 10https://gerrit.wikimedia.org/r/1223676 (owner: 10Muehlenhoff)
[13:44:41] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P88843 and previous config saved to /var/cache/conftool/dbconfig/20260217-134440-marostegui.json
[13:46:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-eqiad and fe80::b6f9:5dff:fe30:cd38 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:47:03] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission puppetmaster1001 - https://phabricator.wikimedia.org/T417580#11622805 (10Jclark-ctr) a:03Jclark-ctr
[13:47:58] <XioNoX>	 !log cr2-magru> request vmhost reboot - T416442
[13:48:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:02] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[13:49:10] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: disk in slot 10 for an-worker1194 - https://phabricator.wikimedia.org/T389065#11622816 (10Jclark-ctr) @RKemper  i did swap drive this morning
[13:49:15] <icinga-wm>	 PROBLEM - Host asw1-b3-magru.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:49:15] <icinga-wm>	 PROBLEM - Host asw1-b4-magru.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:49:35] <icinga-wm>	 RECOVERY - Host asw1-b4-magru.mgmt is UP: PING OK - Packet loss = 0%, RTA = 111.16 ms
[13:49:47] <icinga-wm>	 RECOVERY - Host asw1-b3-magru.mgmt is UP: PING OK - Packet loss = 0%, RTA = 111.71 ms
[13:50:45] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:51:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-eqiad and fe80::b6f9:5dff:fe30:cd38 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:51:20] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission puppetmaster1001 - https://phabricator.wikimedia.org/T417580#11622820 (10Jclark-ctr) 05Open→03Resolved
[13:51:44] <jinxer-wm>	 FIRING: [2x] RipeAtlasAnchorUnreachable: ipv4 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95140314 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[13:51:51] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - asw1-b3-magru:et-0/0/50 (Core: cr2-magru:et-0/0/1 {#70130}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[13:52:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqdfw and 195.200.68.153 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:52:29] <Amir1>	 !log ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 echo-subscriptions-web-article-linked
[13:52:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:39] <jinxer-wm>	 FIRING: [2x] CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-magru (195.200.68.153) - group Confed_magru - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=cr2-eqdfw:9804&var-bgp_group=Confed_magru&var-bgp_neighbor=cr2-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[13:52:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11622821 (10Jclark-ctr) 05Open→03Resolved
[13:53:20] <wikibugs>	 (03PS6) 10Muehlenhoff: etcd: Remove the use_pki_certs flag [puppet] - 10https://gerrit.wikimedia.org/r/978615
[13:53:27] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11622828 (10Jclark-ctr) 05Open→03Resolved a:03Jclark-ctr All sub task have been resolved
[13:54:13] <wikibugs>	 (03CR) 10Hnowlan: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8049/co" [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[13:55:01] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/978615 (owner: 10Muehlenhoff)
[13:55:37] <wikibugs>	 (03PS2) 10Muehlenhoff: varnishkafka: Remove some obsolete references to cergen [puppet] - 10https://gerrit.wikimedia.org/r/1239921 (https://phabricator.wikimedia.org/T357750)
[13:56:44] <jinxer-wm>	 FIRING: [4x] RipeAtlasAnchorUnreachable: ipv4 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133212 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[13:57:10] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr2-eqdfw and 195.200.68.153 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:57:39] <jinxer-wm>	 FIRING: [6x] CoreBGPDown: Core BGP session down between asw1-b3-magru and cr2-magru (195.200.68.146) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[13:58:45] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:59:49] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P88844 and previous config saved to /var/cache/conftool/dbconfig/20260217-135949-marostegui.json
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor My software never has bugs. It just develops random features. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1400).
[14:00:05] <jouncebot>	 cscott, Thiemo_WMDE, phuedx, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:15] <Lucas_WMDE>	 I’m in a meeting, might be able to deploy later if needed
[14:00:16] <wikibugs>	 (03PS1) 10Hnowlan: thanos::rule: add ExecReload to the service unit [puppet] - 10https://gerrit.wikimedia.org/r/1239906 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[14:00:50] <anzx>	 o/
[14:01:04] <cscott>	 o/
[14:01:07] <Amir1>	 let me see
[14:01:17] <cscott>	 I can spiderpig mine
[14:01:42] <Amir1>	 oh already started it
[14:01:44] <jinxer-wm>	 RESOLVED: [4x] RipeAtlasAnchorUnreachable: ipv4 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133212 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[14:01:46] <phuedx>	 o/
[14:01:51] <jinxer-wm>	 RESOLVED: [2x] SwitchCoreInterfaceDown: Switch core interface down - asw1-b3-magru:et-0/0/50 (Core: cr2-magru:et-0/0/1 {#70130}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[14:01:54] <Amir1>	 cscott: let me know once you're done so I do the rest
[14:02:02] <cscott>	 Ok! 
[14:02:10] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr2-eqdfw and 195.200.68.153 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:02:39] <jinxer-wm>	 RESOLVED: [6x] CoreBGPDown: Core BGP session down between asw1-b3-magru and cr2-magru (195.200.68.146) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[14:02:45] <wikibugs>	 (03CR) 10MVernon: [C:03+2] hiera: remove ms-be20[57-61] for decom [puppet] - 10https://gerrit.wikimedia.org/r/1239735 (https://phabricator.wikimedia.org/T404771) (owner: 10MVernon)
[14:03:10] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236) (owner: 10C. Scott Ananian)
[14:03:49] <wikibugs>	 (03CR) 10Hnowlan: [V:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8050/console" [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[14:03:56] <XioNoX>	 !log asw1-b3-magru> request system reboot - T416442
[14:03:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:00] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[14:06:59] <icinga-wm>	 PROBLEM - Host asw1-b3-magru is DOWN: PING CRITICAL - Packet loss = 100%
[14:06:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] varnishkafka: Remove some obsolete references to cergen (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1239921 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[14:07:03] <vgutierrez>	 !log upload golang-github-florianl-go-tc_0.4.7 to trixie-wikimedia (apt.wm.o) - T401832
[14:07:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:07] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[14:07:15] <icinga-wm>	 PROBLEM - Host asw1-b3-magru.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:07:21] <wikibugs>	 (03CR) 10Hnowlan: [V:03+1] "This mostly lgtm, the parent patch needs to move out of WIP though" [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[14:07:27] <wikibugs>	 (03Merged) 10jenkins-bot: Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE [core] (wmf/1.46.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1239855 (https://phabricator.wikimedia.org/T348236) (owner: 10C. Scott Ananian)
[14:07:58] <jinxer-wm>	 FIRING: [7x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:08:00] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1239855|Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE (T348236)]]
[14:08:03] <icinga-wm>	 PROBLEM - Host asw1-b3-magru IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:08:04] <stashbot>	 T348236: Introduce parser limit for # of images - https://phabricator.wikimedia.org/T348236
[14:08:11] <icinga-wm>	 PROBLEM - Router interfaces on mr1-magru is CRITICAL: CRITICAL: host 195.200.68.132, interfaces up: 34, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:08:37] <_joe_>	 I assume the failing probes are for magru, right?
[14:08:37] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw1-b3-magru,asw1-b3-magru IPv6,asw1-b3-magru.mgmt with reason: router upgrade
[14:08:48] <XioNoX>	 jinxer-wm: yeah you can ignore
[14:08:52] <XioNoX>	 er _joe_ ^ :)
[14:09:06] <XioNoX>	 vgutierrez: any idea why this paging alert triggers ? 50% of the backends should be online
[14:09:47] <vgutierrez>	 XioNoX: cause we don't have a depool threshold of 50% in the CDN
[14:09:54] <vgutierrez>	 XioNoX: it's a 66%
[14:10:04] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] etcd: Remove the use_pki_certs flag [puppet] - 10https://gerrit.wikimedia.org/r/978615 (owner: 10Muehlenhoff)
[14:10:08] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, cscott: Backport for [[gerrit:1239855|Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE (T348236)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:11:17] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1239600 (https://phabricator.wikimedia.org/T335765) (owner: 10Muehlenhoff)
[14:11:23] <XioNoX>	 vgutierrez: so what happens when we go past the 66%?
[14:11:30] <XioNoX>	 it's just alerting?
[14:11:46] <vgutierrez>	 and it's sending traffic to hosts that are down
[14:11:47] <wikibugs>	 (03CR) 10Federico Ceratto: [C:03+1] "Compared with previous changes, LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1239919 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[14:11:54] <XioNoX>	 iirc in drmrs it paged for rack #1 but not for rack #2
[14:11:56] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] core.pp: Monitor event_scheduler on core hosts [puppet] - 10https://gerrit.wikimedia.org/r/1239919 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[14:12:10] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, cscott: Continuing with sync
[14:12:14] <XioNoX>	 vgutierrez: not sure I understand why?
[14:12:15] <vgutierrez>	 assuming that magru is depooled at the CDN level, it shouldn't be a big deal
[14:12:20] <logmsgbot>	 ayounsi@cumin1003 downtime (PID 1785246) is awaiting input
[14:12:22] <XioNoX>	 yeah it's depooled
[14:12:48] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw1-b4-magru,asw1-b4-magru IPv6,asw1-b4-magru.mgmt with reason: router upgrade
[14:12:49] <XioNoX>	 it's not used facing but I'm surprised by the paging alert
[14:12:51] <XioNoX>	 user*
[14:12:57] <jinxer-wm>	 FIRING: [7x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:13:09] <sukhe>	 I will silence the alert in the meantime
[14:13:10] <sukhe>	 !ack
[14:13:10] <sirenbot>	 no value provided for parameter incident and no default available
[14:13:10] <sirenbot>	 All incidents are already acked.
[14:13:17] <vgutierrez>	 XioNoX: yeah.. sending traffic to hosts that are flagged as down has user impact
[14:13:20] <jinxer-wm>	 FIRING: [25x] JobUnavailable: Reduced availability for job benthos in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:13:29] <XioNoX>	 vgutierrez: but why are we doing that?
[14:14:10] <vgutierrez>	 that's the depool threshold mechanism, at least for the CDN it's better to do that after a certain point (66%) than send all the traffic to a few cp servers that would melt under that load
[14:14:37] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] Run Bird spec tests on Bookworm/Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239599 (https://phabricator.wikimedia.org/T335765) (owner: 10Muehlenhoff)
[14:14:50] <vgutierrez>	 so it's better to lose a % of the traffic but keep the cluster alive
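The depool threshold described above can be sketched roughly as follows. This is a minimal illustration, not the production load balancer's code: the function name `select_pooled`, the host names, and the reading of 66% as "the minimum fraction of hosts that must stay pooled" are all assumptions made for the example.

```python
import math


def select_pooled(health, threshold=0.66):
    """Return the set of hosts to keep pooled (hypothetical helper).

    health: dict mapping host name -> True (healthy) or False (failing checks).
    threshold: assumed minimum fraction of all hosts that must stay pooled.
    """
    min_pooled = math.ceil(len(health) * threshold)
    # Healthy hosts always stay pooled.
    pooled = {host for host, ok in health.items() if ok}
    # If too few hosts are healthy, keep some failing hosts pooled anyway,
    # so the remaining healthy hosts are not handed the entire load.
    for host in sorted(host for host, ok in health.items() if not ok):
        if len(pooled) >= min_pooled:
            break
        pooled.add(host)
    return pooled
```

With six hosts and four of them failing, two failing hosts stay pooled: some requests are lost, but the two healthy hosts do not receive all of the traffic.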
[14:14:58] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T415786)', diff saved to https://phabricator.wikimedia.org/P88845 and previous config saved to /var/cache/conftool/dbconfig/20260217-141457-marostegui.json
[14:15:01] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[14:15:03] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
[14:15:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88846 and previous config saved to /var/cache/conftool/dbconfig/20260217-141510-marostegui.json
[14:15:11] <XioNoX>	 vgutierrez: ah, ok I see, thx!
[14:15:13] <claime>	 The confusion comes from the fact that depooling the site at the DNS level does not inhibit the alert for failed probes I guess?
[14:15:21] <wikibugs>	 (03CR) 10Federico Ceratto: [C:03+1] "LGTM, want me to merge the change in puppet?" [puppet] - 10https://gerrit.wikimedia.org/r/1239927 (https://phabricator.wikimedia.org/T416582) (owner: 10Marostegui)
[14:15:34] <icinga-wm>	 RECOVERY - Host asw1-b3-magru.mgmt is UP: PING OK - Packet loss = 0%, RTA = 111.24 ms
[14:16:14] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239855|Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE (T348236)]] (duration: 08m 13s)
[14:16:18] <stashbot>	 T348236: Introduce parser limit for # of images - https://phabricator.wikimedia.org/T348236
[14:16:21] <sukhe>	 claime: yeah but that's expected in the sense that unless explicitly silenced, we are still doing the healthchecking for the service itself
[14:16:26] <wikibugs>	 (03CR) 10Marostegui: "Yeah, go for it, so you can merge and deploy the grants in prod too." [puppet] - 10https://gerrit.wikimedia.org/r/1239927 (https://phabricator.wikimedia.org/T416582) (owner: 10Marostegui)
[14:16:31] <cscott>	 Amir1: ok, done!
[14:16:36] <sukhe>	 whether we should add it to the DNS admin depool cookbook, that's a point for discussion
[14:17:02] <icinga-wm>	 RECOVERY - Host asw1-b3-magru is UP: PING OK - Packet loss = 0%, RTA = 111.83 ms
[14:17:10] <icinga-wm>	 RECOVERY - Router interfaces on mr1-magru is OK: OK: host 195.200.68.132, interfaces up: 35, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:17:21] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] ipip: Disable prometheus_lvs_realserver_mss if no clamping [puppet] - 10https://gerrit.wikimedia.org/r/1239936 (https://phabricator.wikimedia.org/T352956) (owner: 10JMeybohm)
[14:17:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:17:58] <jinxer-wm>	 RESOLVED: ProbeDown: Service upload:80 has failed probes (http_upload_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#upload:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:17:58] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:18:13] <claime>	 sukhe: yeah, I'm just trying to think of how we can make it so that we don't have the "is that because of X" question that we always do :D
[14:18:16] <icinga-wm>	 RECOVERY - Host asw1-b3-magru IPv6 is UP: PING OK - Packet loss = 0%, RTA = 111.29 ms
[14:18:20] <jinxer-wm>	 FIRING: [25x] JobUnavailable: Reduced availability for job benthos in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:19:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: netbox_ganeti_magru03_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:19:40] <jinxer-wm>	 RESOLVED: [25x] JobUnavailable: Reduced availability for job benthos in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:19:42] <sukhe>	 claime: yeah. IMO, the simplest thing in this case would be to allow an optional argument to sre.dns.admin that says geoDNS depool (as we are doing) and then put a silence for that specific site
[14:20:00] <claime>	 sukhe: yep
[14:21:33] <XioNoX>	 sukhe: I have to reboot asw1-b4 now, how can I downtime that specific paging alert?
[14:22:31] <sukhe>	 XioNoX: on alerts.wm.org, creating a silence for site=magru for this duration should be enough.
[14:22:41] <sukhe>	 I already did that now but for future we can do that
[14:22:48] <sukhe>	 or do it via the cookbook if people agree :>
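A site-wide silence like the one described above could be expressed roughly as follows. This sketch assumes the alerting backend accepts silences in the Alertmanager v2 API format (the body of a POST to `/api/v2/silences`); whether a cookbook would do it this way is not stated in the discussion, and `build_silence` is a hypothetical helper that only builds the payload and sends nothing.

```python
from datetime import datetime, timedelta, timezone


def build_silence(site, hours, author, comment, start=None):
    """Build an Alertmanager-v2-style silence body for one site (hypothetical helper)."""
    start = start or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        # Match every alert carrying the label site=<site>.
        "matchers": [
            {"name": "site", "value": site, "isRegex": False, "isEqual": True}
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": author,
        "comment": comment,
    }
```

For example, `build_silence("magru", 2, "example-user", "router upgrade")` yields a two-hour silence for every alert labelled `site=magru`; the author and comment values here are placeholders.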
[14:22:52] <cscott>	 Amir1: next on the list is Thiemo, but I don't know if he's around.
[14:23:01] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[14:23:20] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:23:28] <XioNoX>	 !log asw1-b4-magru> request system reboot - T416442
[14:23:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:32] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[14:23:56] <Amir1>	 thanks
[14:23:58] <Amir1>	 let me check
[14:24:40] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:25:15] <Amir1>	 phuedx: Are you around for your patch?
[14:25:20] <phuedx>	 Amir1: o/
[14:25:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Run Bird spec tests on Bookworm/Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239599 (https://phabricator.wikimedia.org/T335765) (owner: 10Muehlenhoff)
[14:25:31] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239935 (https://phabricator.wikimedia.org/T417578) (owner: 10Anzx)
[14:26:18] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:26:25] <wikibugs>	 (03PS1) 10Urbanecm: Growth: Enable on all open Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023)
[14:26:29] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[14:26:30] <Amir1>	 awesome. I'll deploy yours after this
[14:26:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] dnsdist: Run spec tests on Bookworm/Trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239600 (https://phabricator.wikimedia.org/T335765) (owner: 10Muehlenhoff)
[14:26:52] <wikibugs>	 (03Merged) 10jenkins-bot: lift IP cap for event at Tshwane University of Technology [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239935 (https://phabricator.wikimedia.org/T417578) (owner: 10Anzx)
[14:27:15] <icinga-wm>	 PROBLEM - Router interfaces on mr1-magru is CRITICAL: CRITICAL: host 195.200.68.132, interfaces up: 34, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:27:20] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1239935|lift IP cap for event at Tshwane University of Technology (T417578)]]
[14:27:25] <stashbot>	 T417578: Request for IP exemption for event with Tshwane University of Technology  2026-02-23 - https://phabricator.wikimedia.org/T417578
[14:27:50] <wikibugs>	 (03CR) 10Majavah: [C:03+2] openstack: Fix puppetleaks script for openstack authentication changes [puppet] - 10https://gerrit.wikimedia.org/r/1237208 (owner: 10Majavah)
[14:28:54] <wikibugs>	 (03PS1) 10Brouberol: admin/rbac: give permissions to the flink operators on flinkbluegreendeployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239950 (https://phabricator.wikimedia.org/T416455)
[14:29:35] <logmsgbot>	 !log ladsgroup@deploy2002 anzx, ladsgroup: Backport for [[gerrit:1239935|lift IP cap for event at Tshwane University of Technology (T417578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:29:39] <anzx>	 Amir1: nothing to test 
[14:29:53] <Amir1>	 yeah, pushing
[14:30:01] <logmsgbot>	 !log ladsgroup@deploy2002 anzx, ladsgroup: Continuing with sync
[14:31:23] <wikibugs>	 (03CR) 10DCausse: admin/rbac: give permissions to the flink operators on flinkbluegreendeployments (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239950 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[14:32:08] <wikibugs>	 (03PS2) 10Brouberol: admin/rbac: give permissions to the flink operators on flinkbluegreendeployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239950 (https://phabricator.wikimedia.org/T416455)
[14:32:12] <wikibugs>	 (03CR) 10Brouberol: admin/rbac: give permissions to the flink operators on flinkbluegreendeployments (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239950 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[14:32:28] <wikibugs>	 (03PS1) 10Ssingh: hiera: common.yaml: remove redundant comments for authdns_addrs [puppet] - 10https://gerrit.wikimedia.org/r/1239952
[14:32:30] <Lucas_WMDE>	 o/ meeting done
[14:33:32] <Amir1>	 We got some pushed forward
[14:33:43] <Amir1>	 Next is Sam's
[14:33:53] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8051/console" [puppet] - 10https://gerrit.wikimedia.org/r/1239952 (owner: 10Ssingh)
[14:34:05] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239935|lift IP cap for event at Tshwane University of Technology (T417578)]] (duration: 06m 45s)
[14:34:10] <stashbot>	 T417578: Request for IP exemption for event with Tshwane University of Technology  2026-02-23 - https://phabricator.wikimedia.org/T417578
[14:34:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: netbox_ganeti_magru03_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:34:28] <wikibugs>	 (03Restored) 10Ladsgroup: Undeploy InterwikiSorting - I: Disable everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599064 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[14:35:22] <Lucas_WMDE>	 Amir1: do you want to keep deploying the window?
[14:35:35] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:35:37] <wikibugs>	 (03CR) 10Herron: [C:03+1] thanos::rule: add ExecReload to the service unit [puppet] - 10https://gerrit.wikimedia.org/r/1239906 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[14:35:54] <Amir1>	 Lucas_WMDE: I need a break. It'd be great if you take over
[14:36:14] <icinga-wm>	 RECOVERY - Router interfaces on mr1-magru is OK: OK: host 195.200.68.132, interfaces up: 35, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:36:24] <wikibugs>	 (03CR) 10Ssingh: [V:03+1 C:03+2] hiera: common.yaml: remove redundant comments for authdns_addrs [puppet] - 10https://gerrit.wikimedia.org/r/1239952 (owner: 10Ssingh)
[14:36:32] <wikibugs>	 (03CR) 10Ssingh: [V:03+1 C:03+2] "Removing old comments, no code change" [puppet] - 10https://gerrit.wikimedia.org/r/1239952 (owner: 10Ssingh)
[14:36:36] <icinga-wm>	 PROBLEM - Host 2a02:ec80:700:2:195:200:68:37 is DOWN: PING CRITICAL - Packet loss = 100%
[14:36:40] <Lucas_WMDE>	 alright, can do
[14:36:46] <sukhe>	 dns7002
[14:36:48] <sukhe>	 no worries, depooled
[14:36:55] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:37:00] <icinga-wm>	 RECOVERY - Host 2a02:ec80:700:2:195:200:68:37 is UP: PING OK - Packet loss = 0%, RTA = 110.87 ms
[14:39:45] <XioNoX>	 sukhe: all the prod side is back up and healthy, last reboot is for mr1 as soon as the software install is done
[14:39:58] <logmsgbot>	 jclark@cumin1003 provision (PID 1827528) is awaiting input
[14:40:04] <sukhe>	 thanks! I guess we can wait for that as well before repooling
[14:40:11] <sukhe>	 can check on the prod side in the meantime
[14:40:17] <icinga-wm>	 PROBLEM - Postfix SMTP on crm2001 is CRITICAL: CRITICAL - Certificate crm2001.codfw.wmnet expires in 15 day(s) (Thu 05 Mar 2026 02:40:00 PM GMT +0000). https://wikitech.wikimedia.org/wiki/Mail%23Troubleshooting
[14:41:44] <jinxer-wm>	 FIRING: [4x] RipeAtlasAnchorUnreachable: ipv4 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133212 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[14:42:35] <wikibugs>	 (03PS1) 10Urbanecm: [Growth] Enable on every new Wikipedia by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239954 (https://phabricator.wikimedia.org/T304052)
[14:43:16] <wikibugs>	 (03CR) 10Urbanecm: [C:04-2] "not yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239954 (https://phabricator.wikimedia.org/T304052) (owner: 10Urbanecm)
[14:43:23] <wikibugs>	 (03PS2) 10Urbanecm: [Growth] Enable on all open Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023)
[14:43:28] <wikibugs>	 (03PS2) 10Urbanecm: [Growth] Enable on every new Wikipedia by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239954 (https://phabricator.wikimedia.org/T304052)
[14:43:30] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Remove lucaswerkmeister-wmde SSH key [puppet] - 10https://gerrit.wikimedia.org/r/1239955
[14:43:41] <Lucas_WMDE>	 re ^, I’m afraid someone else will have to continue the window after all :(
[14:44:15] <Lucas_WMDE>	 (I theoretically still have a SpiderPig session but I don’t want to start jobs without the ability to SSH in to fix stuff if needed)
[14:44:18] <Lucas_WMDE>	 cc Amir1
[14:44:50] <XioNoX>	 !log mr1-magru> request system reboot - T416442
[14:44:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:54] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[14:45:05] <wikibugs>	 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: Replace Spamassassin with Rspam for VRTS on Postfix - https://phabricator.wikimedia.org/T402260#11623148 (10ABran-WMF) 05Open→03Stalled
[14:45:19] <wikibugs>	 10SRE-SLO, 06collaboration-services: Implement service level indicator measurement for Gerrit - https://phabricator.wikimedia.org/T396979#11623159 (10ABran-WMF) 05Open→03Stalled
[14:46:36] <icinga-wm>	 PROBLEM - Host mr1-magru is DOWN: PING CRITICAL - Packet loss = 100%
[14:46:43] <jinxer-wm>	 RESOLVED: [4x] RipeAtlasAnchorUnreachable: ipv4 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133212 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[14:47:05] <phuedx>	 Lucas_WMDE: I can self service. Has everything but mine been deployed?
[14:47:05] <Amir1>	 ack do you want me to merge that puppet patch too Lucas_WMDE ?
[14:47:11] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:47:20] <Amir1>	 there are a couple more left
[14:47:25] <Amir1>	 I can take care of them after yours
[14:47:37] <Amir1>	 Thiemo is not around so not gonna deploy his patches
[14:47:41] <phuedx>	 OK
[14:48:24] <Lucas_WMDE>	 phuedx: you’re up next, go ahead I think
[14:48:28] <Lucas_WMDE>	 Amir1: please do, thanks
[14:48:49] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by phuedx@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239672 (owner: 10Phuedx)
[14:49:07] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Remove lucaswerkmeister-wmde SSH key [puppet] - 10https://gerrit.wikimedia.org/r/1239955 (owner: 10Lucas Werkmeister (WMDE))
[14:49:24] <icinga-wm>	 PROBLEM - Host mr1-magru IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:39] <wikibugs>	 (03Merged) 10jenkins-bot: Test Kitchen: Set event intake service name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239672 (owner: 10Phuedx)
[14:49:56] <icinga-wm>	 PROBLEM - Host mr1-magru.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:50:06] <logmsgbot>	 !log phuedx@deploy2002 Started scap sync-world: Backport for [[gerrit:1239672|Test Kitchen: Set event intake service name]]
[14:50:20] <jhathaway>	 XioNoX: expected?
[14:50:34] <sukhe>	 jhathaway: yep
[14:50:38] <XioNoX>	 jhathaway: yep, that's the last device to reboot
[14:50:40] <sukhe>	 ongoing magru work
[14:50:40] <jhathaway>	 sukhe: thanks
[14:51:38] <wikibugs>	 (03CR) 10Federico Ceratto: [C:03+2] orchestrator.sql.erb: Add replication master admin to orchestrator [puppet] - 10https://gerrit.wikimedia.org/r/1239927 (https://phabricator.wikimedia.org/T416582) (owner: 10Marostegui)
[14:52:23] <logmsgbot>	 !log phuedx@deploy2002 phuedx: Backport for [[gerrit:1239672|Test Kitchen: Set event intake service name]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:52:53] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] admin/rbac: give permissions to the flink operators on flinkbluegreendeployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239950 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[14:53:24] <wikibugs>	 (03PS1) 10Blake: statsd-exporter: increase the number of replicas by 1. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239953
[14:53:29] <wikibugs>	 (03CR) 10Ebernhardson: [C:03+1] cirrus: enable default_sort for completion on a set of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207758 (https://phabricator.wikimedia.org/T404858) (owner: 10DCausse)
[14:53:44] <icinga-wm>	 RECOVERY - Host mr1-magru is UP: PING OK - Packet loss = 0%, RTA = 111.19 ms
[14:53:56] <icinga-wm>	 PROBLEM - Host mr1-magru.oob is DOWN: PING CRITICAL - Packet loss = 100%
[14:54:03] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[14:54:26] <icinga-wm>	 RECOVERY - Host mr1-magru IPv6 is UP: PING OK - Packet loss = 0%, RTA = 111.15 ms
[14:54:57] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11623233 (10VRiley-WMF) Hey @herron sorry about yesterday, I forgot it was a holiday. Would we be able to do this upgrade today?
[14:54:58] <icinga-wm>	 RECOVERY - Host mr1-magru.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 124.36 ms
[14:55:55] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[14:56:39] <phuedx>	 No errors in the logs or in the browser console. LGTM
[14:57:02] <icinga-wm>	 RECOVERY - Host mr1-magru.oob is UP: PING OK - Packet loss = 0%, RTA = 123.26 ms
[14:57:58] <logmsgbot>	 !log phuedx@deploy2002 phuedx: Continuing with sync
[14:58:11] <wikibugs>	 (03PS1) 10Jforrester: Revert "wikifunctions: [WIP] Specify the Rust-based evaluator releases too" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239958
[14:58:17] <wikibugs>	 (03PS2) 10Jforrester: Revert "wikifunctions: [WIP] Specify the Rust-based evaluator releases too" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239958
[14:58:20] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] Revert "wikifunctions: [WIP] Specify the Rust-based evaluator releases too" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239958 (owner: 10Jforrester)
[14:58:25] <vgutierrez>	 !log upload golang-github-mmatczuk-anyflag-dev 0.0~git20240709.eb9e24c-1 to trixie-wikimedia (apt.wm.o) - T401832
[14:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:29] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[14:58:31] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=magru [reason: magru maintenance done]
[14:58:45] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[14:58:52] <sukhe>	 !log running authdns-update after magru depool
[14:58:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:22] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10Prod-Kubernetes, 06ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11623253 (10MatthewVernon) >>! In T397618#11622476, @Clement_Goubert wrote: > I'm still unsure of the actual flow of...
[14:59:57] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[15:00:05] <jouncebot>	 Deploy window Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1500)
[15:00:31] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "wikifunctions: [WIP] Specify the Rust-based evaluator releases too" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239958 (owner: 10Jforrester)
[15:00:41] <wikibugs>	 (03PS1) 10Brouberol: flink-operator: grant permissions on Flinkbluegreenddeployment in tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239960 (https://phabricator.wikimedia.org/T416455)
[15:00:57] <wikibugs>	 (03CR) 10Clément Goubert: [C:04-1] "This would change the default chart value for a deployment, but our deployments override this in `helmfile.d/services/{mw-jobrunner,mw-api" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239953 (owner: 10Blake)
[15:01:21] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.hosts.remove-downtime for 40 hosts
[15:01:45] <logmsgbot>	 !log sukhe@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 40 hosts
[15:02:02] <logmsgbot>	 !log phuedx@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239672|Test Kitchen: Set event intake service name]] (duration: 11m 56s)
[15:02:43] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10Prod-Kubernetes, 06ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11623277 (10Clement_Goubert) >>! In T397618#11623253, @MatthewVernon wrote: >>>! In T397618#11622476, @Clement_Gouber...
[15:02:58] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: [WIP] Specify the Rust-based evaluator releases too [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239961 (https://phabricator.wikimedia.org/T402957)
[15:02:58] <phuedx>	 Amir1: Done
[15:03:08] <Amir1>	 Awesome
[15:03:09] <Amir1>	 Thanks
[15:03:24] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11623280 (10herron) Hi @VRiley-WMF sure, by the way were you able to reclaim any more RAM?
[15:04:03] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10Prod-Kubernetes, 06ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11623282 (10MatthewVernon) >>! In T397618#11623277, @Clement_Goubert wrote: > And only originals are replicated, righ...
[15:04:27] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239329 (https://phabricator.wikimedia.org/T415196) (owner: 10Anzx)
[15:05:04] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] k8s-staging: Set ipip_encapsulation in service::catalog [puppet] - 10https://gerrit.wikimedia.org/r/1237280 (https://phabricator.wikimedia.org/T352956) (owner: 10Alexandros Kosiaris)
[15:05:25] <wikibugs>	 (03Merged) 10jenkins-bot: sqwiki: remove editor usergroup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239329 (https://phabricator.wikimedia.org/T415196) (owner: 10Anzx)
[15:05:31] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11623286 (10VRiley-WMF) I have! I am able to double the RAM. So, we can go from each unit have 128 gigs to 256 gigs (adding 4 sticks of 32 gig)
[15:05:56] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1239329|sqwiki: remove editor usergroup (T415196)]]
[15:06:00] <stashbot>	 T415196: Remove 'editor' user right from sq.wikipedia - https://phabricator.wikimedia.org/T415196
[15:06:27] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1018.eqiad.wmnet, wdqs1015.eqiad.wmnet, wdqs1020.eqiad.wmnet, wdqs1012.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:06:32] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-eqiad@eqiad
[15:06:40] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: Xionix maint work done, T416442]
[15:06:44] <stashbot>	 T416442: magru: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416442
[15:06:57] <logmsgbot>	 !log sukhe@cumin1003 END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: pool site magru [reason: Xionix maint work done, T416442]
[15:07:03] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: XioNoX: maint work done, T416442]
[15:07:04] <logmsgbot>	 !log sukhe@cumin1003 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: XioNoX: maint work done, T416442]
[15:07:27] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:07:53] <wikibugs>	 (03PS1) 10Muehlenhoff: Also install the pbuilder hooks for trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239963 (https://phabricator.wikimedia.org/T401832)
[15:08:14] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, anzx: Backport for [[gerrit:1239329|sqwiki: remove editor usergroup (T415196)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:08:16] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10Prod-Kubernetes, 06ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11623297 (10Clement_Goubert) >>! In T397618#11623282, @MatthewVernon wrote: >>>! In T397618#11623277, @Clement_Gouber...
[15:08:18] <wikibugs>	 (03PS2) 10Blake: statsd-exporter: increment replicas by 1 for several deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239953
[15:08:18] <anzx>	 Amir1: checking 
[15:08:27] <wikibugs>	 (03PS1) 10Majavah: openstack: encapi: Store project names in database [puppet] - 10https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588)
[15:08:45] <Amir1>	 noted
[15:09:07] <wikibugs>	 (03CR) 10CI reject: [V:04-1] openstack: encapi: Store project names in database [puppet] - 10https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588) (owner: 10Majavah)
[15:09:09] <anzx>	 Amir1: looks good, ok to sync 
[15:09:34] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, anzx: Continuing with sync
[15:09:56] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
[15:10:20] <wikibugs>	 (03PS2) 10Majavah: openstack: encapi: Store project names in database [puppet] - 10https://gerrit.wikimedia.org/r/1239965 (https://phabricator.wikimedia.org/T416588)
[15:11:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Also install the pbuilder hooks for trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239963 (https://phabricator.wikimedia.org/T401832) (owner: 10Muehlenhoff)
[15:11:05] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
[15:11:05] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-worker-eqiad@eqiad
[15:11:06] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] statsd-exporter: increment replicas by 1 for several deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239953 (owner: 10Blake)
[15:11:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11623318 (10herron) Excellent!  What is a good start time for you today?  I can depool titan1001 ahead of that.   I should also mention that the titan hosts can take about...
[15:11:40] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-codfw@codfw
[15:11:59] <wikibugs>	 (03CR) 10Blake: [C:03+2] statsd-exporter: increment replicas by 1 for several deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239953 (owner: 10Blake)
[15:12:55] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11623320 (10VRiley-WMF) I'm available all day for this activity. Just let us know when we can start on the first one. Then once it's done I'll update it here.
[15:13:41] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239329|sqwiki: remove editor usergroup (T415196)]] (duration: 07m 45s)
[15:13:45] <stashbot>	 T415196: Remove 'editor' user right from sq.wikipedia - https://phabricator.wikimedia.org/T415196
[15:14:18] <wikibugs>	 (03Merged) 10jenkins-bot: statsd-exporter: increment replicas by 1 for several deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239953 (owner: 10Blake)
[15:14:29] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
[15:16:13] <logmsgbot>	 !log sfaci@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
[15:16:45] <logmsgbot>	 !log sfaci@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
[15:17:44] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11623347 (10Joe) >>! In T414805#11612457, @Krinkle wrote: >>>! In T414805#11612140, @gerritbot wrote: >> %%%[mediawiki/extensions/Wik...
[15:18:19] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11623348 (10Joe) >>! In T414805#11620283, @Ladsgroup wrote: > FWIW: Out of 9K results in https://global-search.toolforge.org/?q=%5C%...
[15:18:34] <wikibugs>	 (03CR) 10DCausse: [C:03+1] flink-operator: grant permissions on Flinkbluegreenddeployment in tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239960 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[15:19:05] <wikibugs>	 (03PS3) 10Urbanecm: [Growth] Enable on all open Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023)
[15:19:08] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] flink-operator: grant permissions on Flinkbluegreenddeployment in tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239960 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[15:19:32] <wikibugs>	 (03PS4) 10Urbanecm: [Growth] Enable on all open Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023)
[15:19:40] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[15:22:13] <wikibugs>	 (03PS3) 10Urbanecm: [Growth] Enable on every new Wikipedia by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239954 (https://phabricator.wikimedia.org/T304052)
[15:22:14] <wikibugs>	 (03CR) 10Michael Große: [C:03+1] [Growth] Enable on all open Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023) (owner: 10Urbanecm)
[15:22:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:23:19] <logmsgbot>	 jayme@cumin1003 migrate-service-ipip (PID 1889437) is awaiting input
[15:23:20] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:23:47] <wikibugs>	 (03CR) 10Ssingh: "We should rebase it and revisit this patch to merge it, I think?" [puppet] - 10https://gerrit.wikimedia.org/r/1214531 (https://phabricator.wikimedia.org/T411584) (owner: 10Slyngshede)
[15:23:59] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738)
[15:24:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[15:26:14] <anzx>	 Amir1: Thanks for deploying, could you run the maintenance script to remove users from the editor usergroup on sqwiki?
[15:26:16] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[15:26:28] <Amir1>	 I will soon-ish
[15:26:31] <Amir1>	 hope that's fine
[15:26:41] <anzx>	 ok, thanks 
[15:26:43] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10Prod-Kubernetes, 06ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11623388 (10MatthewVernon) Yes, it wouldn't be good for anything other than short periods of time.  The two swift clu...
[15:27:07] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[15:27:37] <wikibugs>	 (03PS2) 10Marostegui: mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738)
[15:28:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[15:28:56] <wikibugs>	 (03PS3) 10Marostegui: mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738)
[15:29:26] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[15:29:37] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[15:29:44] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[15:29:52] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[15:30:04] <jouncebot>	 Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1530)
[15:30:36] <wikibugs>	 (03PS1) 10Daniel Kinzler: rest-gateway: improve readability of tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239972
[15:30:43] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[15:30:44] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[15:30:52] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[15:30:55] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
[15:30:55] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-worker-codfw@codfw
[15:30:57] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[15:31:04] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[15:31:14] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
[15:31:19] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
[15:31:23] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
[15:31:29] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
[15:31:38] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[15:31:45] <logmsgbot>	 !log blake@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[15:31:49] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[15:31:56] <logmsgbot>	 !log blake@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
[15:32:42] <wikibugs>	 (03PS1) 10Brouberol: flink-operator: upgrade appVersion to 1.14.0 in the chart metadata [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239973 (https://phabricator.wikimedia.org/T416455)
[15:33:06] <bjensen>	 (that was updating the statsd-exporter replica count for those mw deployments)
[15:33:23] <wikibugs>	 (03PS1) 10Muehlenhoff: Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239974
[15:33:40] <wikibugs>	 (03CR) 10DCausse: [C:03+1] flink-operator: upgrade appVersion to 1.14.0 in the chart metadata [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239973 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[15:34:12] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239974 (owner: 10Muehlenhoff)
[15:35:17] <jayme>	 dpogorzelski || klausman: lvs201[34] have failing checks for ml-staging PYBAL CRITICAL - CRITICAL - k8s-ingress-ml-staging_31443: Servers ml-staging2001.codfw.wmnet, ml-staging2002.codfw.wmnet are marked down but pooled
[15:35:20] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:35:41] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:35:41] <wikibugs>	 (03PS4) 10Marostegui: mariadb: Add events checker [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738)
[15:35:48] <wikibugs>	 (03CR) 10Ssingh: "We will be discussing this in the Traffic meeting and will follow up after that. Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1237194 (https://phabricator.wikimedia.org/T306550) (owner: 10Majavah)
[15:35:58] <icinga-wm>	 PROBLEM - Host titan1001 is DOWN: PING CRITICAL - Packet loss = 100%
[15:38:03] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service titan1001:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:38:04] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:03+2] rest gateway: add tests for chart rendering [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085 (owner: 10Daniel Kinzler)
[15:38:20] <wikibugs>	 (03CR) 10Marostegui: "PCC: https://integration.wikimedia.org/ci/view/Ops/job/operations-puppet-catalog-compiler/8052/console" [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[15:38:28] <wikibugs>	 (03PS2) 10Muehlenhoff: Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239974
[15:38:29] <wikibugs>	 (03CR) 10Daniel Kinzler: rest gateway: implement per-policy shadow mode (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183) (owner: 10Daniel Kinzler)
[15:38:34] <wikibugs>	 (03PS14) 10Daniel Kinzler: rest gateway: implement per-policy shadow mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183)
[15:38:46] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:03+2] rest gateway: implement per-policy shadow mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183) (owner: 10Daniel Kinzler)
[15:38:51] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655 (10MBinder_WMF) 03NEW
[15:39:37] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cloudcephosd2008-dev - https://phabricator.wikimedia.org/T416396#11623439 (10Jhancock.wm)
[15:40:26] <wikibugs>	 (03Merged) 10jenkins-bot: rest gateway: add tests for chart rendering [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085 (owner: 10Daniel Kinzler)
[15:40:48] <wikibugs>	 (03Merged) 10jenkins-bot: rest gateway: implement per-policy shadow mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225699 (https://phabricator.wikimedia.org/T413183) (owner: 10Daniel Kinzler)
[15:41:04] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239974 (owner: 10Muehlenhoff)
[15:43:12] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:43:20] <jinxer-wm>	 FIRING: [6x] JobUnavailable: Reduced availability for job pint in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:43:26] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:43:45] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:43:51] <logmsgbot>	 !log daniel@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[15:43:54] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:45:53] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:46:01] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:46:08] <logmsgbot>	 !log daniel@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[15:46:14] <icinga-wm>	 RECOVERY - Host titan1001 is UP: PING WARNING - Packet loss = 33%, RTA = 0.39 ms
[15:46:36] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] cache::upload: enable global ratelimiting (eqsin) [puppet] - 10https://gerrit.wikimedia.org/r/1237243 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur)
[15:47:01] <logmsgbot>	 jhancock@cumin2002 provision (PID 1862644) is awaiting input
[15:47:13] <wikibugs>	 (03CR) 10Marostegui: "PCC https://puppet-compiler.wmflabs.org/output/1239969/8052/" [puppet] - 10https://gerrit.wikimedia.org/r/1239969 (https://phabricator.wikimedia.org/T254738) (owner: 10Marostegui)
[15:47:17] <wikibugs>	 (03CR) 10Scott French: service.yaml: switch mw-parsoid to lvs_setup #2 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1239651 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[15:47:44] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11623488 (10VRiley-WMF) titan1001 has been fully upgraded to 256 gig. iDRAC verifies it is able to see the memory.
[15:48:03] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service titan1001:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:48:05] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:48:10] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] cache::upload: enable global ratelimiting (eqsin) [puppet] - 10https://gerrit.wikimedia.org/r/1237243 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur)
[15:48:12] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:48:20] <jinxer-wm>	 FIRING: [6x] JobUnavailable: Reduced availability for job pint in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:48:20] <wikibugs>	 (03CR) 10Scott French: "Great, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1238349 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[15:48:52] <logmsgbot>	 !log daniel@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[15:49:19] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:49:33] <logmsgbot>	 !log daniel@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[15:49:34] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thanks, effie!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238355 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[15:49:40] <jinxer-wm>	 RESOLVED: [6x] JobUnavailable: Reduced availability for job pint in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:49:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[15:51:42] <wikibugs>	 (03CR) 10Scott French: [C:03+1] mw-parsoid: repurpose for parsoidtest use #6 (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237472 (https://phabricator.wikimedia.org/T386246) (owner: 10Effie Mouzeli)
[15:53:31] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for HMonroy - https://phabricator.wikimedia.org/T417459#11623499 (10Ottomata)
[15:54:22] <logmsgbot>	 !log daniel@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[15:54:52] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: titan2001: expand ssd storage - https://phabricator.wikimedia.org/T417313#11623503 (10Jhancock.wm) @herron i have two 960GB SSDs we can install or one 1.92TB SSD i can install. do you have a preference?
[15:55:30] <logmsgbot>	 !log daniel@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[15:55:33] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:56:01] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudgw2004-dev to codfw - jhancock@cumin2002"
[15:56:07] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudgw2004-dev to codfw - jhancock@cumin2002"
[15:56:07] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:56:15] <wikibugs>	 (03CR) 10SomeRandomDeveloper: Escape the unescaped i18n messages (031 comment) [extensions/WP25EasterEggs] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1239419 (https://phabricator.wikimedia.org/T410091) (owner: 10Jdrewniak)
[15:56:27] <wikibugs>	 (03CR) 10Filippo Giunchedi: "IIRC system keyrings should work even pre-trixie, I think we should be okay to ditch apt-key altogether and always do system keyrings (i.e" [puppet] - 10https://gerrit.wikimedia.org/r/1239974 (owner: 10Muehlenhoff)
[15:56:50] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
[15:57:20] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudgw2004-dev
[15:57:23] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
[15:57:32] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudgw2004-dev
[15:58:03] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[16:00:05] <jouncebot>	 jelto, arnoldokoth, mutante, and arnaudb: I, the Bot under the Fountain, call upon thee, The Deployer, to do SRE Collaboration Services office hours deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1600).
[16:01:19] <wikibugs>	 (03CR) 10SomeRandomDeveloper: Escape the unescaped i18n messages (031 comment) [extensions/WP25EasterEggs] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1239419 (https://phabricator.wikimedia.org/T410091) (owner: 10Jdrewniak)
[16:01:42] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating cloudceph2008-dev in codfw - jhancock@cumin2002"
[16:01:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating cloudceph2008-dev in codfw - jhancock@cumin2002"
[16:01:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:06:50] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[16:07:26] <wikibugs>	 (03CR) 10Scott French: [C:03+1] mw-on-k8s: do not alert for mw-experimental and mw-parsoid [alerts] - 10https://gerrit.wikimedia.org/r/1239724 (owner: 10Effie Mouzeli)
[16:07:37] <logmsgbot>	 !log brouberol@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[16:08:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-drmrs and fe80::ee38:7300:1ae8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:08:20] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:13:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-drmrs and fe80::ee38:7300:1ae8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:13:46] <logmsgbot>	 !log brouberol@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[16:14:51] <logmsgbot>	 !log brouberol@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[16:15:41] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.decommission for hosts ms-be[2057-2061].codfw.wmnet
[16:19:02] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.dns.netbox
[16:20:23] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] AM: only send critical I/F alerts to the I/F IRC chan [puppet] - 10https://gerrit.wikimedia.org/r/1239674 (owner: 10Ayounsi)
[16:20:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
[16:20:53] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudgw2004-dev
[16:21:28] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:21:35] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:21:41] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:21:43] <logmsgbot>	 !log mvernon@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be[2057-2061].codfw.wmnet
[16:21:58] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11623687 (10Jhancock.wm)
[16:25:03] <logmsgbot>	 !log brouberol@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[16:25:16] <wikibugs>	 (03PS2) 10JHathaway: postfix: remove localhost filtering [puppet] - 10https://gerrit.wikimedia.org/r/1239415
[16:25:18] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239415 (owner: 10JHathaway)
[16:25:36] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11623711 (10Jhancock.wm) @Andrew I'm having trouble getting this server and the server in T416396 to provision. it fails at updating the switch. I used the netbox script to assign the ports...
[16:25:57] <logmsgbot>	 !log brouberol@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[16:26:55] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[16:27:58] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[16:28:46] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.decommission for hosts ms-be2057.codfw.wmnet
[16:28:48] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[16:29:40] <wikibugs>	 (03PS20) 10Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[16:29:40] <wikibugs>	 (03CR) 10Tiziano Fogli: "`sloth generate` preserves the original filesystem structure, and Thanos Ruler’s --rule-file flag is not able to load rule files recursive" [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[16:30:38] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2346.codfw.wmnet with OS bookworm
[16:30:46] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708#11623739 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2346.codfw.wmnet with OS bookworm
[16:30:49] <wikibugs>	 (03PS2) 10Ladsgroup: Undeploy InterwikiSorting - I: Disable everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599064 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[16:30:56] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] flink-operator: upgrade appVersion to 1.14.0 in the chart metadata [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239973 (https://phabricator.wikimedia.org/T416455) (owner: 10Brouberol)
[16:32:24] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cumin2003 to codfw - jhancock@cumin2002"
[16:32:30] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cumin2003 to codfw - jhancock@cumin2002"
[16:32:30] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:32:48] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.dns.netbox
[16:33:06] <logmsgbot>	 !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: deployment
[16:33:20] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:33:26] <logmsgbot>	 !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: deployment
[16:33:42] <logmsgbot>	 !log brennen@deploy2002 Started deploy [phabricator/deployment@aad109e]: deploy phab2002 for T417657
[16:33:46] <stashbot>	 T417657: Deploy Phab/Phorge 2026-02-17 - https://phabricator.wikimedia.org/T417657
[16:33:54] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] postfix: remove localhost filtering [puppet] - 10https://gerrit.wikimedia.org/r/1239415 (owner: 10JHathaway)
[16:34:14] <logmsgbot>	 !log brennen@deploy2002 Finished deploy [phabricator/deployment@aad109e]: deploy phab2002 for T417657 (duration: 00m 31s)
[16:34:40] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:55] <logmsgbot>	 !log brennen@deploy2002 Started deploy [phabricator/deployment@aad109e]: deploy phab1004 for T417657
[16:35:27] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:35:29] <logmsgbot>	 !log mvernon@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be2057.codfw.wmnet
[16:36:04] <logmsgbot>	 !log brennen@deploy2002 Finished deploy [phabricator/deployment@aad109e]: deploy phab1004 for T417657 (duration: 01m 08s)
[16:38:20] <jinxer-wm>	 FIRING: [8x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:40:24] <wikibugs>	 (03PS1) 10Elukey: sre.hosts.decommission: remove puppetmaster1001 leftovers [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986
[16:41:21] <wikibugs>	 (03CR) 10Jforrester: Undeploy InterwikiSorting - I: Disable everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599064 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[16:42:28] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2346.codfw.wmnet with reason: host reimage
[16:43:20] <jinxer-wm>	 FIRING: [9x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:44:40] <jinxer-wm>	 FIRING: [10x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:45:12] <wikibugs>	 (03CR) 10Herron: "Could we simply use a flat filesystem structure on the input?" [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[16:45:26] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2015.codfw.wmnet with OS trixie
[16:45:32] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup2015 - https://phabricator.wikimedia.org/T414724#11623885 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2015.codfw.wmnet with OS trixie
[16:45:39] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thank you!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1239692 (https://phabricator.wikimedia.org/T386246) (owner: 10Jgiannelos)
[16:45:53] <icinga-wm>	 PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100%
[16:46:01] <wikibugs>	 06SRE, 06Infrastructure-Foundations: sre.hosts.decommission fails with >1 host, leaves hosts impossible to decommission - https://phabricator.wikimedia.org/T417670 (10MatthewVernon) 03NEW
[16:46:21] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thanks, Moritz!" [puppet] - 10https://gerrit.wikimedia.org/r/1239673 (owner: 10Muehlenhoff)
[16:46:33] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cumin2003 - https://phabricator.wikimedia.org/T416385#11623896 (10Jhancock.wm)
[16:47:03] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:47:23] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2346.codfw.wmnet with reason: host reimage
[16:48:14] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.decommission for hosts ms-be[2057-2061].codfw.wmnet
[16:49:40] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:52:49] <wikibugs>	 (03CR) 10Tiziano Fogli: "In theory, we could — but the original repository structure helps keep the SLOs organized." [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[16:53:20] <icinga-wm>	 RECOVERY - Host titan1002 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[16:53:20] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:54:40] <jinxer-wm>	 RESOLVED: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:56:14] <wikibugs>	 (03CR) 10Elukey: "Tested with test-cookbook -c 1239986 sre.hosts.decommission ms-be[2057-2061].codfw.wmnet -t T404771" [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986 (owner: 10Elukey)
[16:57:03] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:59:45] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/6 UP : OSPFv3: 4/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[16:59:53] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 5/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[16:59:55] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 1/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:00:05] <jouncebot>	 jhathaway and rzl: May I have your attention please! Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1700)
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:00:39] <jinxer-wm>	 FIRING: CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-esams (208.80.153.216) - group Confed_esams - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=cr2-eqdfw:9804&var-bgp_group=Confed_esams&var-bgp_neighbor=cr2-esams - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[17:00:46] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:00:54] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:00:56] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:01:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:01:49] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986 (owner: 10Elukey)
[17:02:12] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.dns.netbox
[17:02:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations: offboarding Alex Kosiaris - https://phabricator.wikimedia.org/T417465#11623998 (10Dzahn)
[17:02:30] <wikibugs>	 06SRE, 06Infrastructure-Foundations: offboarding Alex Kosiaris - https://phabricator.wikimedia.org/T417465#11624001 (10Dzahn) removed from ops/ops-private:   ` [lists1004:~] $ sudo mailman-wrapper delmembers -m akosiaris@wikimedia.org -l ops@lists.wikimedia.org [lists1004:~] $ sudo mailman-wrapper delmembers -...
[17:04:29] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:05:09] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[17:05:39] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-esams (208.80.153.216) - group Confed_esams - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[17:06:10] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:06:10] <wikibugs>	 (03PS2) 10Elukey: sre.hosts.decommission: remove puppetmaster1001 leftovers [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986
[17:06:12] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[17:06:13] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2346.codfw.wmnet with OS bookworm
[17:06:20] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708#11624020 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2346.codfw.wmnet with OS bookworm completed: - wikikube-worker2346 (...
[17:06:21] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2057-2061].codfw.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin2002"
[17:06:26] <logmsgbot>	 !log elukey@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2057-2061].codfw.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin2002"
[17:06:26] <logmsgbot>	 !log elukey@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:06:28] <logmsgbot>	 !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2057-2061].codfw.wmnet
[17:06:34] <wikibugs>	 (03CR) 10Elukey: [C:03+2] "thanks :)" [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986 (owner: 10Elukey)
[17:07:26] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new: wikikube-worker2346 DOA - https://phabricator.wikimedia.org/T414708#11624039 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm @Clement_Goubert finally got this wayward server fixed up and it's ready for you to do what you need to do.
[17:08:21] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:09:05] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:09:28] <wikibugs>	 06SRE, 06Infrastructure-Foundations: sre.hosts.decommission fails with >1 host, leaves hosts impossible to decommission - https://phabricator.wikimedia.org/T417670#11624060 (10elukey) 05Open→03Resolved a:03elukey Fixed with https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1239986
[17:10:18] <wikibugs>	 (03CR) 10Dzahn: gerrit: limit access to http/https/ssh in firewall (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1238400 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn)
[17:11:10] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:11:49] <wikibugs>	 (03CR) 10Dzahn: [C:04-1] "let's put this on hold for a bit - too many moving parts right now" [puppet] - 10https://gerrit.wikimedia.org/r/1238400 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn)
[17:12:31] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hosts.decommission: remove puppetmaster1001 leftovers [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986 (owner: 10Elukey)
[17:12:39] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "ready to ship, imho" [puppet] - 10https://gerrit.wikimedia.org/r/1239087 (https://phabricator.wikimedia.org/T417263) (owner: 10Jelto)
[17:13:17] <wikibugs>	 (03CR) 10Dzahn: [C:04-1] "no 5xx's anymore now." [dns] - 10https://gerrit.wikimedia.org/r/1239878 (https://phabricator.wikimedia.org/T417497) (owner: 10Jelto)
[17:13:20] <jinxer-wm>	 FIRING: [11x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:13:44] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "cool !:) thanks!" [dns] - 10https://gerrit.wikimedia.org/r/1238708 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb)
[17:14:26] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:14:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:14:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1239986 (owner: 10Elukey)
[17:15:18] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624120 (10Dzahn)
[17:17:09] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2016']
[17:17:17] <wikibugs>	 10ops-eqsin, 06SRE, 06DC-Ops, 06Traffic: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#11624162 (10RobH)
[17:17:26] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup2016']
[17:18:06] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2016.codfw.wmnet with OS trixie
[17:18:13] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11624182 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2016.codfw.wmnet with OS trixie
[17:18:20] <jinxer-wm>	 FIRING: [12x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:18:43] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:20:11] <wikibugs>	 (03PS1) 10JHathaway: postfix: dkim sign mailer-daemon messages [puppet] - 10https://gerrit.wikimedia.org/r/1239996
[17:20:42] <wikibugs>	 (03Restored) 10Ladsgroup: Undeploy InterwikiSorting - II: Drop loading ability [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599065 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:20:50] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1239996 (owner: 10JHathaway)
[17:21:04] <wikibugs>	 (03Restored) 10Ladsgroup: Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599066 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:21:59] <wikibugs>	 (03Restored) 10Ladsgroup: Undeploy InterwikiSorting - IV: Drop all config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599067 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:22:11] <wikibugs>	 (03Restored) 10Ladsgroup: Undeploy InterwikiSorting - V: Stop loading i18n [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599068 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:22:30] <wikibugs>	 (03PS3) 10Muehlenhoff: Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239974
[17:22:38] <Amir1>	 jouncebot: nowandnext
[17:22:38] <jouncebot>	 For the next 0 hour(s) and 37 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1700)
[17:22:38] <jouncebot>	 In 0 hour(s) and 37 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1800)
[17:23:41] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] Make the pbuilder hook for apt.wikimedia.org compatible with trixie [puppet] - 10https://gerrit.wikimedia.org/r/1239974 (owner: 10Muehlenhoff)
[17:24:00] <wikibugs>	 (03CR) 10BCornwall: [C:04-1] Make the pbuilder hook for apt.wikimedia.org compatible with trixie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1239974 (owner: 10Muehlenhoff)
[17:24:06] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624224 (10Dzahn) [x] L3 has been signed [x] group approver not needed for WMF staff (confirmed staff sta...
[17:24:45] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624225 (10Dzahn) @ccasilli Do you approve of this access request for Aiman? thanks!
[17:24:45] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:25:12] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] postfix: dkim sign mailer-daemon messages [puppet] - 10https://gerrit.wikimedia.org/r/1239996 (owner: 10JHathaway)
[17:25:43] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624231 (10Dzahn) a:03ccasilli
[17:26:04] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] Version 0.4.6~deb13u1 [debs/python-logstash] - 10https://gerrit.wikimedia.org/r/1239460 (https://phabricator.wikimedia.org/T401832) (owner: 10BCornwall)
[17:26:33] <wikibugs>	 (03CR) 10Herron: "I think this approach makes a ton of sense! it is concise and intuitive" [puppet] - 10https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: 10Tiziano Fogli)
[17:29:44] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06SRE Observability: RAM upgrade availability for Titan hosts - https://phabricator.wikimedia.org/T416741#11624244 (10VRiley-WMF) 05Open→03Resolved titan1002 has been fully upgraded to 256 gig. Verified it and was given the green light to close this. Thank you!
[17:32:43] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599064 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:33:20] <wikibugs>	 (03PS2) 10Daniel Kinzler: rest-gateway: improve readability of tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1239972
[17:33:53] <wikibugs>	 (03Merged) 10jenkins-bot: Undeploy InterwikiSorting - I: Disable everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599064 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:34:23] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:599064|Undeploy InterwikiSorting - I: Disable everywhere (T253764)]]
[17:34:27] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[17:34:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2017.codfw.wmnet with OS trixie
[17:35:02] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11624283 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2017.codfw.wmnet with OS trixie
[17:36:33] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, jforrester: Backport for [[gerrit:599064|Undeploy InterwikiSorting - I: Disable everywhere (T253764)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:36:34] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624292 (10AJaved-WMF)
[17:37:06] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] Version 0.4.6~deb13u1 [debs/python-logstash] - 10https://gerrit.wikimedia.org/r/1239460 (https://phabricator.wikimedia.org/T401832) (owner: 10BCornwall)
[17:37:15] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, jforrester: Continuing with sync
[17:37:39] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624298 (10AJaved-WMF) Hi @Dzahn apologies for not making the edits earlier, I've done so now. I also che...
[17:39:29] <wikibugs>	 (03CR) 10BCornwall: [C:04-1] Make the pbuilder hook for apt.wikimedia.org compatible with trixie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1239974 (owner: 10Muehlenhoff)
[17:41:21] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:599064|Undeploy InterwikiSorting - I: Disable everywhere (T253764)]] (duration: 06m 58s)
[17:41:25] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[17:41:58] <Amir1>	 !log ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 compact-language-links
[17:42:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:15] <wikibugs>	 (03PS2) 10Ladsgroup: Undeploy InterwikiSorting - II: Drop loading ability [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599065 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:45:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599065 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:46:11] <wikibugs>	 (03Merged) 10jenkins-bot: Undeploy InterwikiSorting - II: Drop loading ability [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599065 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:46:43] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:599065|Undeploy InterwikiSorting - II: Drop loading ability (T253764)]]
[17:46:47] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[17:48:55] <logmsgbot>	 !log ladsgroup@deploy2002 jforrester, ladsgroup: Backport for [[gerrit:599065|Undeploy InterwikiSorting - II: Drop loading ability (T253764)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:48:59] <wikibugs>	 (03PS2) 10Ladsgroup: Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599066 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:49:46] <wikibugs>	 (03PS1) 10JHathaway: postfix: dkim sign mx-in outbound mail [puppet] - 10https://gerrit.wikimedia.org/r/1240000
[17:50:10] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1240000 (owner: 10JHathaway)
[17:50:26] <logmsgbot>	 !log ladsgroup@deploy2002 jforrester, ladsgroup: Continuing with sync
[17:52:30] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624378 (10Dzahn) Hi! Could you make her leave a quick comment here on the ticket? Thanks
[17:53:46] <wikibugs>	 (03CR) 10C. Scott Ananian: "nice round patch number, whoo!" [puppet] - 10https://gerrit.wikimedia.org/r/1240000 (owner: 10JHathaway)
[17:53:55] <wikibugs>	 (03Abandoned) 10BCornwall: Version 0.4.6~deb13u1 [debs/python-logstash] - 10https://gerrit.wikimedia.org/r/1239460 (https://phabricator.wikimedia.org/T401832) (owner: 10BCornwall)
[17:54:35] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:599065|Undeploy InterwikiSorting - II: Drop loading ability (T253764)]] (duration: 07m 52s)
[17:54:39] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[17:56:06] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599066 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:56:58] <wikibugs>	 (03Merged) 10jenkins-bot: Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599066 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[17:57:00] <brett>	 !log Import python-logstash (python3-logstash) 0.4.6~deb13u1 to trixie-wikimedia (T401832)
[17:57:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:04] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[17:57:28] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:599066|Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php (T253764)]]
[17:58:41] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] "indeed!" [puppet] - 10https://gerrit.wikimedia.org/r/1240000 (owner: 10JHathaway)
[17:59:39] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, jforrester: Backport for [[gerrit:599066|Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php (T253764)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:59:43] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1800)
[18:00:23] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, jforrester: Continuing with sync
[18:04:39] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:599066|Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php (T253764)]] (duration: 07m 10s)
[18:05:37] <wikibugs>	 (03PS3) 10Ladsgroup: Undeploy InterwikiSorting - IV: Drop all config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599067 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:09:47] <wikibugs>	 (03PS4) 10Ladsgroup: Undeploy InterwikiSorting - IV: Drop all config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599067 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:10:29] <wikibugs>	 (03CR) 10Ladsgroup: "I brought back InterwikiSortingSortPrepend since it's used in ULS: https://codesearch.wmcloud.org/search/?q=InterwikiSortingSort&files=&e" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599067 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:11:55] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599067 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:12:51] <wikibugs>	 (03Merged) 10jenkins-bot: Undeploy InterwikiSorting - IV: Drop all config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599067 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:13:02] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624507 (10ccasilli) Hi there, thanks, all. Aiman you should have access now!
[18:13:23] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:599067|Undeploy InterwikiSorting - IV: Drop all config (T253764)]]
[18:13:27] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[18:14:22] <wikibugs>	 (03CR) 10BCornwall: prometheus: add depooled cp* host check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: 10CDobbins)
[18:15:34] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, jforrester: Backport for [[gerrit:599067|Undeploy InterwikiSorting - IV: Drop all config (T253764)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:15:56] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11624536 (10Jhancock.wm)
[18:17:17] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, jforrester: Continuing with sync
[18:17:18] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11624546 (10Jhancock.wm) @jcrespo i need an edit to the site.pp file. the backup20XX servers have eqiad in the name. they should be codfw. Thank you!
[18:17:33] <wikibugs>	 (03CR) 10BCornwall: prometheus: add depooled cp* host check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: 10CDobbins)
[18:20:32] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624570 (10Aklapper) @AJaved-WMF Please also [link your LDAP account to your Phabricator account](https:/...
[18:21:20] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:599067|Undeploy InterwikiSorting - IV: Drop all config (T253764)]] (duration: 07m 57s)
[18:21:24] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[18:21:44] <wikibugs>	 (03CR) 10BCornwall: "This is looking great! You'll need to also make it so that this is actually installed/configured on an appropriate host." [puppet] - 10https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: 10CDobbins)
[18:23:14] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11624577 (10AJaved-WMF) Done
[18:26:24] <wikibugs>	 (03PS3) 10Ladsgroup: Undeploy InterwikiSorting - V: Stop loading i18n [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599068 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:30:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599068 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:31:08] <wikibugs>	 (03Merged) 10jenkins-bot: Undeploy InterwikiSorting - V: Stop loading i18n [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599068 (https://phabricator.wikimedia.org/T253764) (owner: 10Jforrester)
[18:31:39] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:599068|Undeploy InterwikiSorting - V: Stop loading i18n (T253764)]]
[18:31:43] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[18:44:06] <wikibugs>	 (03PS1) 10Bernard Wang: Enable personal main menu to all users in minerva [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240012
[18:44:41] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2016.codfw.wmnet with OS trixie
[18:44:50] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11624706 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2016.codfw.wmnet with OS trixie executed with errors: - backu...
[18:52:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:57:19] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2017.codfw.wmnet with OS trixie
[18:57:28] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup20[16-20] - https://phabricator.wikimedia.org/T414727#11624733 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2017.codfw.wmnet with OS trixie executed with errors: - backu...
[18:57:46] <logmsgbot>	 !log ladsgroup@deploy2002 jforrester, ladsgroup: Backport for [[gerrit:599068|Undeploy InterwikiSorting - V: Stop loading i18n (T253764)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:57:50] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[18:59:04] <logmsgbot>	 !log ladsgroup@deploy2002 jforrester, ladsgroup: Continuing with sync
[19:00:00] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] "Looks... not good, but good enough to me, given that all options are bad :D (though see suggestion inline). Would also like Ariel or Piotr" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1228218 (https://phabricator.wikimedia.org/T413186) (owner: 10Daniel Kinzler)
[19:00:04] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cumin2003
[19:00:05] <jouncebot>	 dancy and jnuche: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-7+Utc-0 Version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T1900).
[19:00:11] <dancy>	 o/
[19:00:15] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cumin2003
[19:01:05] * dancy waits for ladsgroup's deployment.
[19:01:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cumin2003 - https://phabricator.wikimedia.org/T416385#11624754 (10Jhancock.wm)
[19:02:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host cumin2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:12:55] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:599068|Undeploy InterwikiSorting - V: Stop loading i18n (T253764)]] (duration: 41m 15s)
[19:12:59] <stashbot>	 T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764
[19:15:48] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cumin2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:16:24] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cumin2003']
[19:16:46] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cumin2003']
[19:17:17] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cumin2003.codfw.wmnet with OS trixie
[19:17:27] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cumin2003 - https://phabricator.wikimedia.org/T416385#11624783 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cumin2003.codfw.wmnet with OS trixie
[19:18:38] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for HMonroy - https://phabricator.wikimedia.org/T417459#11624798 (10HMonroy) >>! In T417459#11619232, @MatthewVernon wrote: > All done. @HMonroy you should have had an email with a temporary kerberos password and instructions on...
[19:19:40] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[19:20:00] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 to 1.46.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240024 (https://phabricator.wikimedia.org/T413807)
[19:20:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Initiated by dancy@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240024 (https://phabricator.wikimedia.org/T413807) (owner: 10TrainBranchBot)
[19:21:01] <wikibugs>	 (03Merged) 10jenkins-bot: group0 to 1.46.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240024 (https://phabricator.wikimedia.org/T413807) (owner: 10TrainBranchBot)
[19:31:28] <logmsgbot>	 !log dancy@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.16  refs T413807
[19:31:32] <stashbot>	 T413807: 1.46.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T413807
[19:33:56] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2003.codfw.wmnet with reason: host reimage
[19:36:00] <cjd91>	 !log cdobbins@apt1002 import fifo-log-demux 0.7.5+deb13u1 into trixie-wikimedia
[19:36:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:28] <wikibugs>	 06SRE, 06Infrastructure-Foundations: offboarding Alex Kosiaris - https://phabricator.wikimedia.org/T417465#11624878 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff All done
[19:38:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2003.codfw.wmnet with reason: host reimage
[19:42:04] <wikibugs>	 (03PS1) 10Sergio Gimeno: [Growth] Specify notification delay as int instead of array [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240032 (https://phabricator.wikimedia.org/T375198)
[19:44:14] <wikibugs>	 (03PS2) 10Sergio Gimeno: [Growth] Specify notification delay as int instead of array [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240032 (https://phabricator.wikimedia.org/T375198)
[19:56:36] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[19:56:57] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[19:56:58] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin2003.codfw.wmnet with OS trixie
[19:57:07] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cumin2003 - https://phabricator.wikimedia.org/T416385#11624974 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cumin2003.codfw.wmnet with OS trixie completed: - cumin2003 (**PASS**)   - Removed from Puppet...
[19:57:16] <icinga-wm>	 PROBLEM - Host an-worker1132 is DOWN: PING CRITICAL - Packet loss = 100%
[19:57:41] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cumin2003 - https://phabricator.wikimedia.org/T416385#11624977 (10Jhancock.wm) 05Open→03Resolved
[19:57:58] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install cumin2003 - https://phabricator.wikimedia.org/T416385#11624980 (10Jhancock.wm) @MoritzMuehlenhoff this one is complete
[20:00:14] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387#11624985 (10Jhancock.wm)
[20:03:08] <wikibugs>	 06SRE, 06Traffic: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11625005 (10ssingh)
[20:05:16] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[20:08:56] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
[20:09:01] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
[20:09:01] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:09:08] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2004
[20:09:18] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2004
[20:09:24] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2005
[20:09:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2005
[20:10:07] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:10:22] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host apus-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:10:44] <wikibugs>	 (03PS1) 10Medelius: EditCheck: update shown stats on initial page load [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240041 (https://phabricator.wikimedia.org/T417452)
[20:11:12] <wikibugs>	 (03PS1) 10Medelius: EditCheck: adjust editsuggestion-seen tag [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240043 (https://phabricator.wikimedia.org/T413419)
[20:16:00] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240041 (https://phabricator.wikimedia.org/T417452) (owner: 10Medelius)
[20:16:19] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Q3:rack/setup/install frdata1003, frmx1002, frqueue100[5-6] - https://phabricator.wikimedia.org/T416249#11625048 (10VRiley-WMF) frqueue1005  1st CableID: 230304500128 Port 26  2nd CableID: 239394599180 Port 26   frqueue1006 1st CableID: 230304500102 port 27  2nd CableID: 2303045...
[20:16:23] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 17 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240043 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius)
[20:22:41] <wikibugs>	 (03PS1) 10Urbanecm: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1240045 (https://phabricator.wikimedia.org/T416877)
[20:23:08] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1240045 (https://phabricator.wikimedia.org/T416877) (owner: 10Urbanecm)
[20:23:22] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:23:39] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:25:29] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-fe2004']
[20:25:42] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['apus-fe2004']
[20:25:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q3:rack/setup/install frdb1008 - https://phabricator.wikimedia.org/T414374#11625101 (10VRiley-WMF) CableID 230304500126 Port 28  CableID 230304500118 Port 28
[20:26:13] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10Wikidata, 10Wikidata-Query-Service, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): wdqs1028: corrupted filesystem for /srv mount - https://phabricator.wikimedia.org/T417398#11625102 (10bking) a:05bking→03None
[20:26:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host apus-2004.codfw.wmnet with OS bookworm
[20:26:26] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387#11625106 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host apus-2004.codfw.wmnet with OS bookworm
[20:27:50] <wikibugs>	 (03Merged) 10jenkins-bot: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1240045 (https://phabricator.wikimedia.org/T416877) (owner: 10Urbanecm)
[20:28:25] <wikibugs>	 (03CR) 10Urbanecm: [C:03+1] [Growth] Specify notification delay as int instead of array [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1240032 (https://phabricator.wikimedia.org/T375198) (owner: 10Sergio Gimeno)
[20:28:27] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host apus-fe2005.codfw.wmnet with OS bookworm
[20:28:40] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387#11625113 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host apus-fe2005.codfw.wmnet with OS bookworm
[20:29:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[20:29:26] <logmsgbot>	 !log urbanecm@deploy2002 helmfile [staging] START helmfile.d/services/linkrecommendation: apply
[20:30:20] <wikibugs>	 (03PS1) 10Dwisehaupt: Add digicert validation TXT record for payments [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785)
[20:30:50] <logmsgbot>	 !log urbanecm@deploy2002 helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
[20:31:10] <wikibugs>	 (03CR) 10Dwisehaupt: "Adding @jgreen@wikimedia.org although he may be unable to review for a few days." [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785) (owner: 10Dwisehaupt)
[20:31:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add digicert validation TXT record for payments [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785) (owner: 10Dwisehaupt)
[20:31:51] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:31:54] <logmsgbot>	 !log urbanecm@deploy2002 helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
[20:33:16] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2004
[20:33:25] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2004
[20:33:49] <logmsgbot>	 !log urbanecm@deploy2002 helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
[20:33:59] <logmsgbot>	 !log urbanecm@deploy2002 helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
[20:34:16] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:35:00] <logmsgbot>	 !log vriley@cumin1003 START - Cookbook sre.dns.netbox
[20:35:44] <logmsgbot>	 !log urbanecm@deploy2002 helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
[20:38:46] <logmsgbot>	 !log vriley@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  [frdb1008] - vriley@cumin1003"
[20:38:50] <logmsgbot>	 !log vriley@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  [frdb1008] - vriley@cumin1003"
[20:38:50] <logmsgbot>	 !log vriley@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:39:46] <logmsgbot>	 !log vriley@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host frdb1008
[20:39:47] <logmsgbot>	 !log vriley@cumin1003 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host frdb1008
[20:40:01] <logmsgbot>	 !log vriley@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host frdb1008
[20:40:01] <logmsgbot>	 !log vriley@cumin1003 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host frdb1008
[20:40:46] <wikibugs>	 (03PS2) 10Dwisehaupt: Add digicert validation CNAME record for payments [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785)
[20:42:26] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:49:17] <logmsgbot>	 jhancock@cumin2002 reimage (PID 2011482) is awaiting input
[20:52:44] <wikibugs>	 (03PS8) 10Bking: dse-k8s: Enable active/active for dse-k8s clusters [dns] - 10https://gerrit.wikimedia.org/r/1238441 (https://phabricator.wikimedia.org/T396478)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T2100).
[21:00:05] <jouncebot>	 cmede: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:02:06] <cmede>	 o/
[21:02:52] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] admin: add ajaved-wmf to analytics-privatedata, level 1 [puppet] - 10https://gerrit.wikimedia.org/r/1238070 (https://phabricator.wikimedia.org/T416922) (owner: 10Dzahn)
[21:02:56] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 07Sustainability (Incident Followup): move the link from lvs1020 from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T417054#11625227 (10wiki_willy) a:03VRiley-WMF
[21:03:03] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on wdqs1028 - https://phabricator.wikimedia.org/T416736#11625229 (10Jclark-ctr)
[21:03:13] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Wikidata, and 2 others: wdqs1028: corrupted filesystem for /srv mount - https://phabricator.wikimedia.org/T417398#11625232 (10Jclark-ctr) →14Duplicate dup:03T416736
[21:08:43] <wikibugs>	 (03PS3) 10Dzahn: admin: add ajaved-wmf to analytics-privatedata, level 1 [puppet] - 10https://gerrit.wikimedia.org/r/1238070 (https://phabricator.wikimedia.org/T416922)
[21:10:36] <Kemayo>	 I am popping in to do the deploy for cmede's patches.
[21:10:43] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] admin: add ajaved-wmf to analytics-privatedata, level 1 [puppet] - 10https://gerrit.wikimedia.org/r/1238070 (https://phabricator.wikimedia.org/T416922) (owner: 10Dzahn)
[21:10:47] <Kemayo>	 cmede: is it okay to deploy them both together?
[21:10:56] <cmede>	 It should be
[21:11:32] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kemayo@deploy2002 using scap backport" [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240041 (https://phabricator.wikimedia.org/T417452) (owner: 10Medelius)
[21:11:32] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kemayo@deploy2002 using scap backport" [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240043 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius)
[21:12:53] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on wdqs1028 - https://phabricator.wikimedia.org/T416736#11625251 (10bking) a:05Jclark-ctr→03bking Thanks @Jclark-ctr ! I'll take this ticket from you and will re-image, which should fix the SW RAID issue.
[21:13:17] <wikibugs>	 (03Merged) 10jenkins-bot: EditCheck: update shown stats on initial page load [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240041 (https://phabricator.wikimedia.org/T417452) (owner: 10Medelius)
[21:13:20] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:13:29] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11625255 (10Dzahn)
[21:13:53] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-product-users, airflow-analytics-product-admins for akhatun - https://phabricator.wikimedia.org/T416703#11625259 (10AKhatun_WMF) Confirming I can access admin panels in airflow-analytics-product.wikimedia.org. I see the following in statbox: ` akhatu...
[21:14:10] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11625262 (10Dzahn)
[21:15:04] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar, 13Patch-For-Review: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11625266 (10Dzahn) @AJaved-WMF You have been added to the requested group. Please give it about 30 minutes...
[21:15:23] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host wdqs1028.eqiad.wmnet with OS bookworm
[21:16:04] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host wdqs1028
[21:16:15] <wikibugs>	 (03CR) 10Ssingh: Add digicert validation CNAME record for payments (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785) (owner: 10Dwisehaupt)
[21:16:30] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[21:18:15] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2005.codfw.wmnet with OS bookworm
[21:18:26] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387#11625285 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host apus-fe2005.codfw.wmnet with OS bookworm executed with e...
[21:18:39] <wikibugs>	 06SRE, 06Traffic: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11625286 (10BBlack) Re: anycast catchments, diversity, resilience, etc (some of this is re-treading things said above, but bear with me):  The ideal state for anycast authdns is that you have multiple distinct (...
[21:21:22] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1028 - bking@cumin2002"
[21:21:27] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1028 - bking@cumin2002"
[21:21:28] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:21:28] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache wdqs1028.eqiad.wmnet 6.48.64.10.in-addr.arpa 6.0.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[21:21:31] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1028.eqiad.wmnet 6.48.64.10.in-addr.arpa 6.0.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[21:21:32] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wdqs1028
[21:22:05] <wikibugs>	 (03Merged) 10jenkins-bot: EditCheck: adjust editsuggestion-seen tag [extensions/VisualEditor] (wmf/1.46.0-wmf.15) - 10https://gerrit.wikimedia.org/r/1240043 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius)
[21:22:38] <logmsgbot>	 !log kemayo@deploy2002 Started scap sync-world: Backport for [[gerrit:1240041|EditCheck: update shown stats on initial page load (T417452)]], [[gerrit:1240043|EditCheck: adjust editsuggestion-seen tag (T413419)]]
[21:22:44] <stashbot>	 T417452: EditCheck VEFU instrumentation not correctly logging initial "shown" events - https://phabricator.wikimedia.org/T417452
[21:22:44] <stashbot>	 T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419
[21:23:26] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1028
[21:23:27] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1028
[21:24:25] <wikibugs>	 06SRE, 06Traffic: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11625305 (10BBlack) While I'm on these esoteric subjects - another bonus thing that some operators do, is place their nameserver *hostnames* in distinct TLDs operated by distinct operators.  For example, having...
[21:26:39] <jinxer-wm>	 FIRING: CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-drmrs (2620:0:860:fe0a::2) - group Confed_drmrs - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=cr2-eqdfw:9804&var-bgp_group=Confed_drmrs&var-bgp_neighbor=cr2-drmrs - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[21:27:39] <wikibugs>	 (03PS1) 10JHathaway: spf: create records for mx boxes [dns] - 10https://gerrit.wikimedia.org/r/1240060
[21:29:31] <wikibugs>	 (03PS9) 10Bking: dse-k8s: Enable active/active for dse-k8s clusters [dns] - 10https://gerrit.wikimedia.org/r/1238441 (https://phabricator.wikimedia.org/T396478)
[21:30:47] <icinga-wm>	 RECOVERY - Host sretest1002 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[21:31:39] <jinxer-wm>	 RESOLVED: CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-drmrs (2620:0:860:fe0a::2) - group Confed_drmrs - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=cr2-eqdfw:9804&var-bgp_group=Confed_drmrs&var-bgp_neighbor=cr2-drmrs - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[21:32:20] <wikibugs>	 (03PS2) 10JHathaway: spf: create records for mx boxes [dns] - 10https://gerrit.wikimedia.org/r/1240060
[21:35:39] <wikibugs>	 (03PS3) 10Dwisehaupt: Add digicert validation CNAME record for payments [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785)
[21:37:04] <wikibugs>	 (03CR) 10Dwisehaupt: Add digicert validation CNAME record for payments (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785) (owner: 10Dwisehaupt)
[21:37:54] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] "Good catch!" [dns] - 10https://gerrit.wikimedia.org/r/1240060 (owner: 10JHathaway)
[21:40:13] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] Add digicert validation CNAME record for payments [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785) (owner: 10Dwisehaupt)
[21:40:37] <icinga-wm>	 PROBLEM - Host sretest1002 is DOWN: PING CRITICAL - Packet loss = 0%, RTA = 2384.80 ms
[21:41:11] <icinga-wm>	 RECOVERY - Host sretest1002 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[21:43:03] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1028.eqiad.wmnet with reason: host reimage
[21:43:22] <wikibugs>	 (03PS1) 10BCornwall: hieradata: Set HAProxy version to 3 for cp204[34] [puppet] - 10https://gerrit.wikimedia.org/r/1240064 (https://phabricator.wikimedia.org/T401832)
[21:43:52] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] spf: create records for mx boxes [dns] - 10https://gerrit.wikimedia.org/r/1240060 (owner: 10JHathaway)
[21:44:23] <logmsgbot>	 !log jhathaway@dns1004 START - running authdns-update
[21:45:43] <logmsgbot>	 !log jhathaway@dns1004 END - running authdns-update
[21:46:07] <wikibugs>	 (03CR) 10RobH: [C:03+2] "This looks good to me and follows the directions outlined on the parent task from Digicert to allow for dns cname level domain validation." [dns] - 10https://gerrit.wikimedia.org/r/1240047 (https://phabricator.wikimedia.org/T411785) (owner: 10Dwisehaupt)
[21:46:45] <wikibugs>	 (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (NOOP 1 CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/1240064 (https://phabricator.wikimedia.org/T401832) (owner: 10BCornwall)
[21:46:59] <logmsgbot>	 !log kemayo@deploy2002 caro, kemayo: Backport for [[gerrit:1240041|EditCheck: update shown stats on initial page load (T417452)]], [[gerrit:1240043|EditCheck: adjust editsuggestion-seen tag (T413419)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:47:04] <stashbot>	 T417452: EditCheck VEFU instrumentation not correctly logging initial "shown" events - https://phabricator.wikimedia.org/T417452
[21:47:04] <stashbot>	 T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419
[21:47:11] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] Run Puppetboard spec tests on Bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1239586 (owner: 10Muehlenhoff)
[21:47:22] <Kemayo>	 cmede: Want to verify on the testservers?
[21:47:24] <cmede>	 Testing now
[21:47:53] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] puppetdb: Drop firewall rule for access to Puppet 5 servers [puppet] - 10https://gerrit.wikimedia.org/r/1239647 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[21:48:55] <logmsgbot>	 !log robh@dns1004 START - running authdns-update
[21:49:15] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1028.eqiad.wmnet with reason: host reimage
[21:49:19] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] base::kernel: Unconditionally use the autoremove logic [puppet] - 10https://gerrit.wikimedia.org/r/1239696 (owner: 10Muehlenhoff)
[21:49:58] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] hieradata: Set HAProxy version to 3 for cp204[34] [puppet] - 10https://gerrit.wikimedia.org/r/1240064 (https://phabricator.wikimedia.org/T401832) (owner: 10BCornwall)
[21:50:14] <logmsgbot>	 !log robh@dns1004 END - running authdns-update
[21:50:21] <cmede>	 Ok, looks good to me - both of them
[21:50:45] <logmsgbot>	 !log kemayo@deploy2002 caro, kemayo: Continuing with sync
[21:51:19] <wikibugs>	 (CR) BCornwall: [V:+1 C:+2] hieradata: Set HAProxy version to 3 for cp204[34] [puppet] - https://gerrit.wikimedia.org/r/1240064 (https://phabricator.wikimedia.org/T401832) (owner: BCornwall)
[21:54:06] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster::monitoring and related classes [puppet] - https://gerrit.wikimedia.org/r/1239891 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[21:54:21] <wikibugs>	 (CR) JHathaway: [C:+1] Unconditionally install puppet-module-puppetlabs-augeas-core [puppet] - https://gerrit.wikimedia.org/r/1239889 (owner: Muehlenhoff)
[21:54:46] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster::r10k [puppet] - https://gerrit.wikimedia.org/r/1239897 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[21:55:34] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster::gitclone and related classes [puppet] - https://gerrit.wikimedia.org/r/1239895 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[21:55:56] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster::rsync and related classes [puppet] - https://gerrit.wikimedia.org/r/1239898 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[21:56:19] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster:ssl [puppet] - https://gerrit.wikimedia.org/r/1239908 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[22:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260217T2200)
[22:03:05] <logmsgbot>	 !log kemayo@deploy2002 Finished scap sync-world: Backport for [[gerrit:1240041|EditCheck: update shown stats on initial page load (T417452)]], [[gerrit:1240043|EditCheck: adjust editsuggestion-seen tag (T413419)]] (duration: 40m 26s)
[22:03:14] <stashbot>	 T417452: EditCheck VEFU instrumentation not correctly logging initial "shown" events - https://phabricator.wikimedia.org/T417452
[22:03:14] <stashbot>	 T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419
[22:03:32] <Kemayo>	 Well, that was quite slow -- we spent 50 minutes on the one pair of patches. But, all done!
[22:03:39] <cmede>	 Woohoo!
[22:04:43] <icinga-wm>	 PROBLEM - Host sretest1002 is DOWN: PING CRITICAL - Packet loss = 100%
[22:04:47] <dancy>	 Kemayo: FYI the slowness is due to the change to en.json in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/1240043   l10n changes == long deployment
[22:05:11] <icinga-wm>	 RECOVERY - Host sretest1002 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[22:05:16] <Kemayo>	 It felt particularly bad even for one of those.
[22:07:16] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering, Data-Engineering-Radar: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922#11625413 (Dzahn) a:ccasilli→AJaved-WMF
[22:09:20] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-product-users, airflow-analytics-product-admins for akhatun - https://phabricator.wikimedia.org/T416703#11625420 (Dzahn) Open→Resolved a:Dzahn Thanks for confirming. Assuming this is all resolved now. If anyone disagrees feel free t...
[22:09:43] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-product-users, airflow-analytics-product-admins for akhatun - https://phabricator.wikimedia.org/T416703#11625423 (Dzahn) a:Dzahn→None
[22:10:18] <wikibugs>	 (PS9) Bking: opensearch-cluster: allow the definition of custom network policies [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[22:10:45] <wikibugs>	 (CR) Bking: opensearch-cluster: allow the definition of custom network policies (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[22:13:35] <wikibugs>	 (Abandoned) Bking: opensearch-semantic-search-test: allow NS outbound access to liftwing [deployment-charts] - https://gerrit.wikimedia.org/r/1239384 (https://phabricator.wikimedia.org/T414095) (owner: Bking)
[22:14:11] <wikibugs>	 (CR) Bking: [C:+1] opensearch-cluster: allow the definition of custom network policies [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[22:15:08] <wikibugs>	 (CR) Ryan Kemper: "We just fixed this in latest PS" [deployment-charts] - https://gerrit.wikimedia.org/r/1238298 (https://phabricator.wikimedia.org/T414095) (owner: Brouberol)
[22:24:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[22:29:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[22:52:40] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:19:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[23:26:00] <wikibugs>	 (PS3) Arlolra: Deploy PRV to 20 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1239270 (https://phabricator.wikimedia.org/T417349)