[00:04:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 19.62% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:39:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.4% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:40:13] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1237725
[00:40:13] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1237725 (owner: TrainBranchBot)
[00:40:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.51% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:54:02] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1237725 (owner: TrainBranchBot)
[00:54:30] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.79% idle - https://bit.ly/wmf-fpmsat -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:56:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 21.9% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[00:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[01:10:45] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1237728
[01:10:45] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1237728 (owner: TrainBranchBot)
[01:34:50] ops-codfw, SRE, DC-Ops: codfw:expansion: Network devices/patch panel wiring - https://phabricator.wikimedia.org/T382219#11594791 (Papaul) @cmooney thanks for bringing this up and finding the issue. I didn't really think about the mgmt vlan until now reading your comment. What you say about having the...
[01:35:39] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1237728 (owner: TrainBranchBot)
[01:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[02:00:41] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[02:06:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:13:34] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 52s)
[02:14:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 21.55% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:14:32] FIRING: [2x] ProbeDown: Service wdqs1020:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1020:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:19:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.72% idle - https://bit.ly/wmf-fpmsat -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:56:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[03:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[03:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[03:46:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.12% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[03:47:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 21.83% idle - https://bit.ly/wmf-fpmsat -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[03:51:30] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[03:52:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.04% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:01:30] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.68% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:03:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.9% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:08:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via
main at codfw: 21.65% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:11:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:16:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.93% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:18:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.96% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:23:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.22% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:25:08] (PS1) Kevin Bazira: ml: chunk torch
libs in vLLM 0.14 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1237730 (https://phabricator.wikimedia.org/T415627)
[04:28:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.93% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:38:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.4% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:51:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.65% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:56:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.51% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:56:45] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.11% idle - https://bit.ly/wmf-fpmsat -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[04:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[05:06:30] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.11% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[05:08:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.9% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[05:09:17] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:28:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.36% idle - https://bit.ly/wmf-fpmsat -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[05:30:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.61% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[05:34:17] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:35:13] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:35:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[05:36:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.82% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main -
https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[05:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[06:11:02] SRE, Infrastructure-Foundations, netops: cr2-codfw alarm: FPC 5 power is unstable - https://phabricator.wikimedia.org/T416691#11594857 (ayounsi) Actually, that linecard is 13 years old and out of warranty, we keep it there just in case we needed it, but if it shows signs of failing we should just rem...
[06:11:46] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T416554
[06:11:50] T416554: Switchover s1 master (db2203 -> db2212) - https://phabricator.wikimedia.org/T416554
[06:12:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db2212 with weight 0 T416554', diff saved to https://phabricator.wikimedia.org/P88722 and previous config saved to /var/cache/conftool/dbconfig/20260209-061218-marostegui.json
[06:12:45] (CR) Marostegui: [C:+2] mariadb: Promote db2212 to s1 master [puppet] - https://gerrit.wikimedia.org/r/1237136 (https://phabricator.wikimedia.org/T416554) (owner: Gerrit maintenance bot)
[06:13:05] !log Starting s1 codfw failover from db2203 to db2212 - T416554
[06:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:32] FIRING: [2x] ProbeDown: Service wdqs1020:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1020:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:17:15] (PS1) Marostegui: db2203: Disable notifications [puppet] -
https://gerrit.wikimedia.org/r/1237733
[06:17:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set s1 codfw as read-only for maintenance - T416554', diff saved to https://phabricator.wikimedia.org/P88723 and previous config saved to /var/cache/conftool/dbconfig/20260209-061732-marostegui.json
[06:17:36] T416554: Switchover s1 master (db2203 -> db2212) - https://phabricator.wikimedia.org/T416554
[06:17:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db2212 to s1 primary and set section read-write T416554', diff saved to https://phabricator.wikimedia.org/P88724 and previous config saved to /var/cache/conftool/dbconfig/20260209-061756-marostegui.json
[06:18:19] (CR) Marostegui: [C:+2] wmnet: Update s1-master alias [dns] - https://gerrit.wikimedia.org/r/1237137 (https://phabricator.wikimedia.org/T416554) (owner: Gerrit maintenance bot)
[06:18:22] !log marostegui@dns1006 START - running authdns-update
[06:19:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db2203 T416554', diff saved to https://phabricator.wikimedia.org/P88725 and previous config saved to /var/cache/conftool/dbconfig/20260209-061904-marostegui.json
[06:19:31] !log marostegui@dns1006 END - running authdns-update
[06:20:12] (CR) Marostegui: [C:+2] db2203: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1237733 (owner: Marostegui)
[06:22:42] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2203.codfw.wmnet with reason: Schema change
[06:23:14] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Maintenance
[06:30:30] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.68% idle - https://bit.ly/wmf-fpmsat -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[06:31:29] ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11594896 (ayounsi) a:cmooney→VRiley-WMF
[06:31:45] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 20.83% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[06:33:39] (CR) Ayounsi: [C:+2] Remove LibreNMS as syslog target [homer/public] - https://gerrit.wikimedia.org/r/1237465 (https://phabricator.wikimedia.org/T415270) (owner: Ayounsi)
[06:34:12] (CR) Ayounsi: [C:+2] Switch bblack ssh key to ed25519 [homer/public] - https://gerrit.wikimedia.org/r/1237509 (owner: BBlack)
[06:35:00] (Merged) jenkins-bot: Remove LibreNMS as syslog target [homer/public] - https://gerrit.wikimedia.org/r/1237465 (https://phabricator.wikimedia.org/T415270) (owner: Ayounsi)
[06:35:34] (Merged) jenkins-bot: Switch bblack ssh key to ed25519 [homer/public] - https://gerrit.wikimedia.org/r/1237509 (owner: BBlack)
[06:41:45] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.79% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[06:42:46] SRE,
Infrastructure-Foundations, netops: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769#11594905 (ayounsi) Open→Resolved All good through https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1237509
[07:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[07:19:17] FIRING: [4x] ProbeDown: Service wdqs1020:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:25:56] !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[07:26:06] !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host aux-k8s-worker1006
[07:26:06] !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
[07:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[07:31:41] (PS1) Muehlenhoff: Updates records for aramilferaxa [puppet] - https://gerrit.wikimedia.org/r/1237734
[07:33:41] (CR) Muehlenhoff: [C:+2] Updates records for aramilferaxa [puppet] - https://gerrit.wikimedia.org/r/1237734 (owner: Muehlenhoff)
[07:36:06] (PS1) Muehlenhoff: Remove access for nettrom [puppet] - https://gerrit.wikimedia.org/r/1237736
[07:40:16] !log jmm@cumin2002 DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nettrom out of
all services on: 2497 hosts
[07:43:11] (CR) Muehlenhoff: [C:+2] Remove access for nettrom [puppet] - https://gerrit.wikimedia.org/r/1237736 (owner: Muehlenhoff)
[07:53:38] (PS1) Marostegui: Revert "db2203: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1237737
[07:54:20] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db2203: After schema change
[07:54:24] (CR) Marostegui: [C:+2] Revert "db2203: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1237737 (owner: Marostegui)
[07:55:49] (CR) Muehlenhoff: [V:+2 C:+2] Remove the Puppet 5 CA cert from the cert bundle [debs/wmf-certificates] - https://gerrit.wikimedia.org/r/1237476 (https://phabricator.wikimedia.org/T415255) (owner: Muehlenhoff)
[07:59:28] (PS3) Daphne Smit: [wikifunctions] Grant sysops permission to edit function of attached implementation and tester [mediawiki-config] - https://gerrit.wikimedia.org/r/1227748 (https://phabricator.wikimedia.org/T399934)
[08:00:05] Amir1, Urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T0800).
[08:00:05] James_F: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:20] (Hey.)
[08:00:44] (PS4) Jforrester: [wikifunctions] Grant sysops permission to edit function of attached implementation and tester [mediawiki-config] - https://gerrit.wikimedia.org/r/1227748 (https://phabricator.wikimedia.org/T399934) (owner: Daphne Smit)
[08:00:49] (CR) TrainBranchBot: [C:+2] "Copied votes on follow-up patch sets have been updated:" [mediawiki-config] - https://gerrit.wikimedia.org/r/1227748 (https://phabricator.wikimedia.org/T399934) (owner: Daphne Smit)
[08:01:18] (PS1) Muehlenhoff: Bump changelog [debs/wmf-certificates] - https://gerrit.wikimedia.org/r/1237739 (https://phabricator.wikimedia.org/T415255)
[08:02:25] (Merged) jenkins-bot: [wikifunctions] Grant sysops permission to edit function of attached implementation and tester [mediawiki-config] - https://gerrit.wikimedia.org/r/1227748 (https://phabricator.wikimedia.org/T399934) (owner: Daphne Smit)
[08:03:50] (CR) Jforrester: "@xcollazo@wikimedia.org, @dandreescu@wikimedia.org: This was merged over the weekend to a production config repo but not deployed. Did you" [mediawiki-config] - https://gerrit.wikimedia.org/r/1237540 (https://phabricator.wikimedia.org/T416719) (owner: Xcollazo)
[08:04:13] (PS1) Brouberol: airflow: Add a script fetching the content of an XCOM from s3 [deployment-charts] - https://gerrit.wikimedia.org/r/1237740 (https://phabricator.wikimedia.org/T416821)
[08:04:28] (PS1) Jforrester: Revert "EventStreamConfig: Bump product_metrics.web_base* streams to large size" [mediawiki-config] - https://gerrit.wikimedia.org/r/1237741
[08:04:34] (CR) Jforrester: [C:+2] Revert "EventStreamConfig: Bump product_metrics.web_base* streams to large size" [mediawiki-config] - https://gerrit.wikimedia.org/r/1237741 (owner: Jforrester)
[08:04:57] (CR) Federico Ceratto: "As discussed on IRC, the change affects all SREs and we want to announce it internally.
The pool/depool process is critical and I did not " [cookbooks] - https://gerrit.wikimedia.org/r/1236726 (https://phabricator.wikimedia.org/T383674) (owner: Federico Ceratto)
[08:04:57] (PS2) Brouberol: airflow: Add a script fetching the content of an XCOM from s3 [deployment-charts] - https://gerrit.wikimedia.org/r/1237740 (https://phabricator.wikimedia.org/T416821)
[08:04:59] (CR) Federico Ceratto: [C:+2] mysql: rename newpool cookbook to pool [cookbooks] - https://gerrit.wikimedia.org/r/1236726 (https://phabricator.wikimedia.org/T383674) (owner: Federico Ceratto)
[08:05:24] (Merged) jenkins-bot: Revert "EventStreamConfig: Bump product_metrics.web_base* streams to large size" [mediawiki-config] - https://gerrit.wikimedia.org/r/1237741 (owner: Jforrester)
[08:06:16] (CR) Slyngshede: [C:+2] Permissions: Format HTML emails [software/bitu] - https://gerrit.wikimedia.org/r/1237473 (https://phabricator.wikimedia.org/T416565) (owner: Slyngshede)
[08:06:45] !log jforrester@deploy2002 Started scap sync-world: Backport for [[gerrit:1227748|[wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)]]
[08:06:49] T399934: tests moved to a different function still show on implementations of the original - https://phabricator.wikimedia.org/T399934
[08:07:01] (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1237518 (https://phabricator.wikimedia.org/T416358) (owner: Dzahn)
[08:07:21] (PS1) Jforrester: Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" [mediawiki-config] - https://gerrit.wikimedia.org/r/1237743 (https://phabricator.wikimedia.org/T416719)
[08:08:53] (Merged) jenkins-bot: Permissions: Format HTML emails [software/bitu] - https://gerrit.wikimedia.org/r/1237473 (https://phabricator.wikimedia.org/T416565) (owner: Slyngshede)
[08:10:00] (CR) Muehlenhoff: [V:+2 C:+2] Bump changelog
[debs/wmf-certificates] - https://gerrit.wikimedia.org/r/1237739 (https://phabricator.wikimedia.org/T415255) (owner: Muehlenhoff)
[08:10:13] FIRING: ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:14:17] RESOLVED: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:15:48] !log brouberol@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-launcher1003.eqiad.wmnet
[08:19:14] !log brouberol@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1003.eqiad.wmnet
[08:21:07] (PS6) Elukey: confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17 [puppet] - https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674)
[08:21:07] (PS2) Elukey: profile::kafka::broker: allow to force openjdk-17 [puppet] - https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674)
[08:21:07] (PS3) Elukey: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version [puppet] - https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674)
[08:21:59] !log brouberol@cumin1003 START - Cookbook sre.ganeti.reboot-vm for VM an-launcher1003.eqiad.wmnet
[08:24:15] (PS7) Elukey: confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17 [puppet] - https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674)
[08:24:15] (PS3) Elukey: profile::kafka::broker: allow to force openjdk-17 [puppet] -
10https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674) [08:24:15] (03PS4) 10Elukey: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) [08:24:30] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [08:24:45] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [08:24:54] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [08:25:19] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for trueg - https://phabricator.wikimedia.org/T415632#11595003 (10trueg) @elukey sorry for the late reply. Yes, indeed, there is no need for a new account. In fact I thought I did attach the existing ssh key for completeness sake. Maybe I mis-co... [08:25:44] (03CR) 10Jelto: [C:03+1] "lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237524 (https://phabricator.wikimedia.org/T414098) (owner: 10Dzahn) [08:25:53] !log brouberol@cumin1003 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-launcher1003.eqiad.wmnet [08:26:07] James_F: [08:26:14] Ugh. Monday [08:26:18] phuedx: Hey. [08:26:52] Thanks for reverting the ext-EventStreamConfig commit. I'll follow up on the thread with xcollazo and milimetric about its status [08:27:18] (03CR) 10Elukey: "In the previous patch I forgot to add an option to move the default jvm opts to something supported by jvm-17+." [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [08:27:32] phuedx: Of course. I imagine it's fine, I just wasn't confident in deploying it without knowing whether it broke things. 
[08:27:50] (03CR) 10Jelto: [V:03+1 C:03+2] gitlab: move mid-day backup out of maintenance window [puppet] - 10https://gerrit.wikimedia.org/r/1237475 (https://phabricator.wikimedia.org/T416687) (owner: 10Jelto) [08:30:09] !log jforrester@deploy2002 daphnesmit, jforrester: Backport for [[gerrit:1227748|[wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [08:30:12] T399934: tests moved to a different function still show on implementations of the original - https://phabricator.wikimedia.org/T399934 [08:30:57] I can see if I can run a quick test on testwiki [08:31:10] !log jforrester@deploy2002 daphnesmit, jforrester: Continuing with sync [08:32:20] wait, I think I got to the too early conversation, nevermind [08:34:55] (03PS1) 10Phuedx: metrics(ReviseTone): Use Experiment::send to send metrics [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237851 (https://phabricator.wikimedia.org/T416612) [08:35:02] (03PS1) 10Phuedx: metrics(ReviseTone): send consistent experiment exposure event [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237852 (https://phabricator.wikimedia.org/T416199) [08:35:58] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237851 (https://phabricator.wikimedia.org/T416612) (owner: 10Phuedx) [08:36:08] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237852 (https://phabricator.wikimedia.org/T416199) 
(owner: 10Phuedx) [08:39:00] (03CR) 10Dpogorzelski: [C:03+2] ml: chunk torch libs in vLLM 0.14 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1237730 (https://phabricator.wikimedia.org/T415627) (owner: 10Kevin Bazira) [08:39:45] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2203: After schema change [08:43:21] * James_F twiddles thumbs. [08:44:01] !log jforrester@deploy2002 Finished scap sync-world: Backport for [[gerrit:1227748|[wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)]] (duration: 37m 15s) [08:44:02] Finally. Over to phuedx or whomever. [08:44:04] T399934: tests moved to a different function still show on implementations of the original - https://phabricator.wikimedia.org/T399934 [08:44:27] (03CR) 10Brouberol: [C:03+1] analytics::cluster::client: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/1237492 (owner: 10Muehlenhoff) [08:44:50] (03CR) 10Brouberol: [C:03+1] analytics::cluster::packages::common: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/1237493 (owner: 10Muehlenhoff) [08:44:54] Ta [08:44:56] * phuedx waits for CI [08:47:12] (03CR) 10CI reject: [V:04-1] metrics(ReviseTone): send consistent experiment exposure event [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237852 (https://phabricator.wikimedia.org/T416199) (owner: 10Phuedx) [08:47:41] ^ MichaelG_WMF [08:48:08] (03CR) 10Dpogorzelski: [V:03+2 C:03+2] ml: chunk torch libs in vLLM 0.14 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1237730 (https://phabricator.wikimedia.org/T415627) (owner: 10Kevin Bazira) [08:48:37] @phuedx: flaky core api-test, unrelated [08:48:55] (though very annoying, happens a lot) [08:49:02] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1237851 looks good [08:49:10] (03CR) 10Phuedx: "Recheck" 
[extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237852 (https://phabricator.wikimedia.org/T416199) (owner: 10Phuedx) [08:55:06] !log ayounsi@cumin1003 START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm [08:55:51] !log ayounsi@cumin1003 START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1006 [08:56:31] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox [08:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [08:59:24] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [08:59:24] !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache aux-k8s-worker1006.eqiad.wmnet 132.48.64.10.in-addr.arpa 2.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [08:59:28] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1006.eqiad.wmnet 132.48.64.10.in-addr.arpa 2.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [08:59:29] !log ayounsi@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1006 [09:00:08] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1006 [09:00:08] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aux-k8s-worker1006 [09:05:49] (03CR) 10Elukey: ml: chunk torch libs in vLLM 0.14 image (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1237730 (https://phabricator.wikimedia.org/T415627) (owner: 10Kevin Bazira) [09:06:57] MichaelG_WMF: OK. 
The flaky test appears to have… err… flaked [09:07:33] Let's get the changes deployed. Can you smoke test the Revise Tone experience? [09:08:25] yes, I hope so. [09:08:38] (03CR) 10TrainBranchBot: [C:03+2] "Approved by phuedx@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237851 (https://phabricator.wikimedia.org/T416612) (owner: 10Phuedx) [09:08:39] (03CR) 10TrainBranchBot: [C:03+2] "Approved by phuedx@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237852 (https://phabricator.wikimedia.org/T416199) (owner: 10Phuedx) [09:09:13] parts of it should be testable on testwiki (the saving of an edit), and other parts on enwiki (an event being emitted when visiting the homepage) [09:09:35] I will also monitor EventGate validation errors [09:10:19] (03CR) 10Hashar: gerrit: allow `replication` when in readonly mode (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237259 (owner: 10Hashar) [09:10:29] !log ayounsi@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage [09:11:00] (03Merged) 10jenkins-bot: metrics(ReviseTone): Use Experiment::send to send metrics [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237851 (https://phabricator.wikimedia.org/T416612) (owner: 10Phuedx) [09:11:27] (03CR) 10Elukey: "of course it is not right, I modified the wrong class. Fixing.." 
[puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:12:57] (03PS1) 10Muehlenhoff: Properly bump version to reflect last update [debs/wmf-certificates] - 10https://gerrit.wikimedia.org/r/1237853 [09:13:49] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Properly bump version to reflect last update [debs/wmf-certificates] - 10https://gerrit.wikimedia.org/r/1237853 (owner: 10Muehlenhoff) [09:13:51] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage [09:15:47] (03PS1) 10Trueg: admin: bash cfg for trueg home dir [puppet] - 10https://gerrit.wikimedia.org/r/1237855 [09:20:43] (03Merged) 10jenkins-bot: metrics(ReviseTone): send consistent experiment exposure event [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237852 (https://phabricator.wikimedia.org/T416199) (owner: 10Phuedx) [09:21:03] !log phuedx@deploy2002 Started scap sync-world: Backport for [[gerrit:1237851|metrics(ReviseTone): Use Experiment::send to send metrics (T416612)]], [[gerrit:1237852|metrics(ReviseTone): send consistent experiment exposure event (T416199)]] [09:21:08] T416612: ReviseToneExperimentInteractionLogger should use Experiment#send() - https://phabricator.wikimedia.org/T416612 [09:21:08] T416199: [Revise Tone] Investigate observed impact on constructive activation - https://phabricator.wikimedia.org/T416199 [09:23:51] let me know when it is ready to test 👍 [09:25:05] !log phuedx@deploy2002 phuedx: Backport for [[gerrit:1237851|metrics(ReviseTone): Use Experiment::send to send metrics (T416612)]], [[gerrit:1237852|metrics(ReviseTone): send consistent experiment exposure event (T416199)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[09:25:38] ^ MichaelG_WMF [09:26:32] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet [09:29:46] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm [09:31:24] (03PS8) 10Elukey: confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17 [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) [09:31:24] (03PS4) 10Elukey: profile::kafka::broker: allow to force openjdk-17 [puppet] - 10https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674) [09:31:24] (03PS5) 10Elukey: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) [09:31:24] (03PS1) 10Elukey: confluent::kafka::broker: fix default options for jvm-17+ [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) [09:32:02] 06SRE, 06Infrastructure-Foundations, 10netops: cr2-codfw alarm: FPC 5 power is unstable - https://phabricator.wikimedia.org/T416691#11595302 (10ayounsi) I also see that we have 2 old SCBE2-MX in the router: https://netbox.wikimedia.org/dcim/devices/1271/inventory/ (from 2014), as we've replaced them with SCB... [09:32:02] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet [09:32:11] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet [09:34:35] MichaelG_WMF: How's it going? 
[09:34:54] the save went fine, now looking at enwiki [09:34:56] (03PS2) 10Elukey: confluent::kafka::broker: fix default options for jvm-17+ [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) [09:34:56] (03PS9) 10Elukey: confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17 [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) [09:34:56] (03PS5) 10Elukey: profile::kafka::broker: allow to force openjdk-17 [puppet] - 10https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674) [09:34:56] (03PS6) 10Elukey: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) [09:35:39] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:35:44] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:35:49] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:35:53] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:36:29] (03CR) 10Arnaudb: [C:03+2] gerrit: allow `replication` when in readonly mode [puppet] - 10https://gerrit.wikimedia.org/r/1237259 (owner: 10Hashar) [09:38:22] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet [09:38:52] (03CR) 10Elukey: "This bit can be merged anytime, independently from the rest, since we'll want to have it as we transition to JDK-17." 
[puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:39:26] (03CR) 10Elukey: "Added a patch before this one in the chain to take care of what I mentioned before :)" [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:40:16] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet [09:41:14] !log kubectl delete node wikikube-worker2019.codfw.wmnet - T409102 [09:41:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:17] T409102: decommission wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet - https://phabricator.wikimedia.org/T409102 [09:41:27] @phuedx looking good on my side on enwiki too! [09:41:27] (03PS1) 10Arnaudb: gerrit: temporarily disable replication on gerrit2003 [puppet] - 10https://gerrit.wikimedia.org/r/1237450 (https://phabricator.wikimedia.org/T387833) [09:42:00] (had to create a new account to finally get natively into the treatment group and thus send events) [09:42:16] MichaelG_WMF: ACK. 
Syncing [09:42:23] !log phuedx@deploy2002 phuedx: Continuing with sync [09:42:43] (03CR) 10Muehlenhoff: [C:03+1] "Looks good, nit inline (but feel free to ignore)" [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:44:18] RESOLVED: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [09:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [09:46:39] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet [09:46:41] (03PS3) 10Elukey: confluent::kafka::broker: fix default options for jvm-17+ [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) [09:46:42] (03PS10) 10Elukey: confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17 [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) [09:46:42] (03PS6) 10Elukey: profile::kafka::broker: allow to force openjdk-17 [puppet] - 10https://gerrit.wikimedia.org/r/1237507 (https://phabricator.wikimedia.org/T416674) [09:46:42] (03PS7) 10Elukey: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) [09:47:02] 10ops-codfw, 06SRE, 06DC-Ops, 10decommission-hardware, and 3 others: decommission wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet - 
https://phabricator.wikimedia.org/T409102#11595448 (10MLechvien-WMF) @jasmine_ can we make sure there's guardrail to not forget this? Either bet... [09:47:03] (03CR) 10Elukey: confluent::kafka::broker: fix default options for jvm-17+ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [09:47:25] FIRING: SystemdUnitFailed: netbox_ganeti_codfw_test_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:48:08] (03PS3) 10JavierMonton: component: mediawiki.page_html_content_change.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237451 (https://phabricator.wikimedia.org/T360794) [09:48:37] !log phuedx@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237851|metrics(ReviseTone): Use Experiment::send to send metrics (T416612)]], [[gerrit:1237852|metrics(ReviseTone): send consistent experiment exposure event (T416199)]] (duration: 27m 34s) [09:48:41] T416612: ReviseToneExperimentInteractionLogger should use Experiment#send() - https://phabricator.wikimedia.org/T416612 [09:48:42] T416199: [Revise Tone] Investigate observed impact on constructive activation - https://phabricator.wikimedia.org/T416199 [09:49:47] !log End of UTC morning backport window [09:49:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:05] Thank you for deploying <3 [09:52:23] 06SRE, 10Charts, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#11595523 (10DPogorzelski-WMF) It seems that the major difference is the fact that we have a calico network policy but the chart doesn't (unsurprisingly). Perhaps we can supply that out of band. Our images expect `/us... 
[09:52:27] MichaelG_WMF: I'm seeing events flowing on the .contributors.experiments event stream [09:52:45] and a corresponding drop in event validation errors from EventGate [09:52:50] \o/ [09:55:00] !log ayounsi@cumin1003 START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm [09:55:28] !log ayounsi@cumin1003 START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1007 [09:55:39] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox [09:58:11] (03CR) 10JavierMonton: [C:03+2] component: mediawiki.page_html_content_change.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237451 (https://phabricator.wikimedia.org/T360794) (owner: 10JavierMonton) [09:59:06] (03Merged) 10jenkins-bot: component: mediawiki.page_html_content_change.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237451 (https://phabricator.wikimedia.org/T360794) (owner: 10JavierMonton) [10:00:27] !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aux-k8s-worker1007 - ayounsi@cumin1003" [10:00:33] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aux-k8s-worker1007 - ayounsi@cumin1003" [10:00:33] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [10:00:33] !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache aux-k8s-worker1007.eqiad.wmnet 131.48.64.10.in-addr.arpa 1.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [10:00:36] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1007.eqiad.wmnet 131.48.64.10.in-addr.arpa 1.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [10:00:36] !log ayounsi@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for 
host aux-k8s-worker1007 [10:00:52] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1007 [10:00:52] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aux-k8s-worker1007 [10:02:25] RESOLVED: SystemdUnitFailed: netbox_ganeti_codfw_test_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:11:26] 06SRE, 06Infrastructure-Foundations, 10netops: Offline script - adjust to work with fundraising - https://phabricator.wikimedia.org/T414321#11595653 (10ayounsi) 05Open→03Invalid Please reopen when anyone have more data for that one. [10:12:40] 06SRE, 10Charts, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#11595659 (10elukey) @DPogorzelski-WMF please check https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/kserve/README, there is a little more :) Not sure what it is s... [10:17:24] !log ayounsi@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm [10:17:49] !log ayounsi@cumin1003 START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm [10:20:08] 10SRE-swift-storage, 10Ceph, 06ServiceOps new, 07Epic, and 3 others: Move the docker registry's /restricted prefix to Docker Distribution backed up by Ceph - https://phabricator.wikimedia.org/T412951#11595680 (10elukey) To keep archives happy - me and Matthew will tentatively schedule the move for Wed 11 a... 
[10:25:39] (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [10:28:56] 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: cr2-codfw alarm: FPC 5 power is unstable - https://phabricator.wikimedia.org/T416691#11595732 (10ayounsi) a:05cmooney→03None >>! In T416691#11594857, @ayounsi wrote: > Actually, that linecard is 13 years old and out of warranty, we... [10:29:35] (03CR) 10Brouberol: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [10:32:13] !log ayounsi@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm [10:33:46] !log ayounsi@cumin1003 START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm [10:37:45] (03PS1) 10Muehlenhoff: Enable Bird 2.18 for centrallog [puppet] - 10https://gerrit.wikimedia.org/r/1237862 (https://phabricator.wikimedia.org/T413740) [10:39:40] (03CR) 10Elukey: role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [10:42:16] (03CR) 10Hnowlan: "I think the most decisive thing we can do is add ThumbnailRender to `excluded_jobs` in the jobqueue config to ensure that we're completely" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 (https://phabricator.wikimedia.org/T415282) (owner: 10Ladsgroup) [10:47:00] (03PS1) 10Vgutierrez: haproxy: Validate host header [puppet] - 10https://gerrit.wikimedia.org/r/1237863 [10:47:31] (03CR) 10CI reject: [V:04-1] haproxy: Validate host header [puppet] - 10https://gerrit.wikimedia.org/r/1237863 (owner: 10Vgutierrez) [10:47:31] (03CR) 10Vgutierrez: "check 
experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237863 (owner: 10Vgutierrez) [10:47:47] 06SRE, 06Infrastructure-Foundations, 10netops: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11595973 (10cmooney) 05Resolved→03Open Well of course this has occurred again as soon as I made the decisions to close. @ayounsi hit it today on //lsw1-d7-eqiad// trying to reimage au... [10:49:26] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1033: Will be depooled [10:49:54] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1033: Will be depooled [10:50:08] (03PS1) 10Marostegui: es1033: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1237864 (https://phabricator.wikimedia.org/T408772) [10:50:54] (03CR) 10Marostegui: [C:03+2] es1033: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1237864 (https://phabricator.wikimedia.org/T408772) (owner: 10Marostegui) [10:52:01] (03CR) 10Brouberol: [C:03+1] role::kafka::test: force JDK7 and apply missing inter_broker_protocol_version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [10:57:05] (03PS2) 10Vgutierrez: haproxy: Validate host header [puppet] - 10https://gerrit.wikimedia.org/r/1237863 [10:59:02] (03PS2) 10JMeybohm: lvs: Allow disabling TCP MSS clamping for IPIP realservers [puppet] - 10https://gerrit.wikimedia.org/r/1237467 (https://phabricator.wikimedia.org/T352956) (owner: 10Vgutierrez) [10:59:02] (03PS5) 10JMeybohm: k8s-staging: Switch to IPIP mode [puppet] - 10https://gerrit.wikimedia.org/r/1237277 (https://phabricator.wikimedia.org/T352956) (owner: 10Alexandros Kosiaris) [10:59:02] (03PS2) 10JMeybohm: k8s-staging: Set ipip_encapsulation in service::catalog [puppet] - 10https://gerrit.wikimedia.org/r/1237280 (https://phabricator.wikimedia.org/T352956) (owner: 10Alexandros Kosiaris) [10:59:19] (03PS8) 10Elukey: 
role::kafka::test: force JDK17 and apply missing inter_broker_protocol_version [puppet] - 10https://gerrit.wikimedia.org/r/1237508 (https://phabricator.wikimedia.org/T416674) [10:59:44] (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237277 (https://phabricator.wikimedia.org/T352956) (owner: 10Alexandros Kosiaris) [11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1100) [11:00:31] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237863 (owner: 10Vgutierrez) [11:01:23] (03CR) 10JMeybohm: [C:03+1] "LGTM, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1237467 (https://phabricator.wikimedia.org/T352956) (owner: 10Vgutierrez) [11:01:54] (03CR) 10Vgutierrez: [C:03+2] lvs: Allow disabling TCP MSS clamping for IPIP realservers [puppet] - 10https://gerrit.wikimedia.org/r/1237467 (https://phabricator.wikimedia.org/T352956) (owner: 10Vgutierrez) [11:04:09] Hi folks, I'm looking for some guidance: I wanted to schedule a mediawiki-config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1237451) for backport during the next deployment window, but after having +1, I accidentally clicked "+2", so it's merged now. The backport schedule tool doesn't allow to schedule it now. Should I [11:04:09] modify the wikipage manually to schedule it? or should I revert it or create a new patch? It is a new stream config that shouldn't affect anything else, and it isn't urgent. I see there are other mediawiki-config changes already scheduled, maybe it will be backported automatically with them? 
[11:07:20] !log ayounsi@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm [11:07:54] !log jayme@cumin1003 START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet [11:08:09] !log jayme@cumin1003 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2001.codfw.wmnet [11:08:33] !log jayme@cumin1003 START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet [11:10:32] !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host kubestagemaster2003.codfw.wmnet [11:10:34] !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestagemaster2003.codfw.wmnet [11:10:44] JavierMonton: I'd ask releng, maybe hashar is around and could help [11:10:59] !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet [11:11:18] (03CR) 10Slyngshede: [C:03+1] "Looks good. Tested in a local setup." 
[puppet] - 10https://gerrit.wikimedia.org/r/1237863 (owner: 10Vgutierrez) [11:11:50] thanks marostegui [11:13:33] !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet [11:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [11:15:14] 06SRE, 06Infrastructure-Foundations, 10netops: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11596132 (10ayounsi) Ticket 05430684 created with Nokia [11:16:20] !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet [11:17:22] (03PS1) 10JavierMonton: Revert "component: mediawiki.page_html_content_change.dev0" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237868 [11:17:52] (03CR) 10JavierMonton: [C:04-2] Revert "component: mediawiki.page_html_content_change.dev0" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237868 (owner: 10JavierMonton) [11:18:04] (03CR) 10JavierMonton: [C:03+2] Revert "component: mediawiki.page_html_content_change.dev0" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237868 (owner: 10JavierMonton) [11:19:04] (03Merged) 10jenkins-bot: Revert "component: mediawiki.page_html_content_change.dev0" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237868 (owner: 10JavierMonton) [11:19:32] FIRING: [4x] ProbeDown: Service wdqs1020:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [11:22:05] (03PS1) 10JavierMonton: component: mediawiki.page_html_content_change.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237870 
(https://phabricator.wikimedia.org/T360794) [11:23:19] !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet [11:23:21] !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet [11:23:24] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237863 (owner: 10Vgutierrez) [11:23:34] !log jayme@cumin1003 START - Cookbook sre.hosts.reboot-single for host kubestagemaster2003.codfw.wmnet [11:28:34] !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2003.codfw.wmnet [11:29:15] !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host kubestagemaster2003.codfw.wmnet [11:29:17] !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestagemaster2003.codfw.wmnet [11:30:36] (03CR) 10Vgutierrez: [C:03+2] haproxy: Validate host header [puppet] - 10https://gerrit.wikimedia.org/r/1237863 (owner: 10Vgutierrez) [11:33:26] (03Abandoned) 10Daniel Kinzler: rest-gateway: make staging override ratelimit policies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237335 (owner: 10Daniel Kinzler) [11:33:43] (03PS2) 10Daniel Kinzler: rest-gateway: remove support for insecure user ID cookies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237295 (https://phabricator.wikimedia.org/T405578) [11:33:52] (03CR) 10CI reject: [V:04-1] rest-gateway: remove support for insecure user ID cookies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237295 (https://phabricator.wikimedia.org/T405578) (owner: 10Daniel Kinzler) [11:35:41] 06SRE, 10SRE-swift-storage, 07Upstream: Container dbs for wikipedia-commons-local-thumb.f8 AWOL in codfw due to corruption - https://phabricator.wikimedia.org/T383053#11596180 (10MatthewVernon) 05Open→03Stalled Is there a problem with it being left stalled? 
The Debian bug I opened hasn't seen any activit... [11:35:49] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215) - https://phabricator.wikimedia.org/T411203#11596183 (10ayounsi) 05Open→03Declined Not actionable on our side. [11:38:07] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host es1033.eqiad.wmnet [11:38:52] 06SRE, 06Infrastructure-Foundations, 10netops: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11596195 (10cmooney) @papaul is this something you might have already drawn out in EVE-NG? [11:39:30] 06SRE, 06Infrastructure-Foundations, 10netops: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11596197 (10ayounsi) a:05cmooney→03Papaul Hey @Papaul would you be interested in working on that ? [11:40:10] (03PS1) 10Vgutierrez: haproxy: Fix allowed-hosts filename [puppet] - 10https://gerrit.wikimedia.org/r/1237872 [11:40:25] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237872 (owner: 10Vgutierrez) [11:40:41] FIRING: ConfdResourceFailed: confd resource _etc_haproxy_conf.d_tls.cfg.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [11:40:49] ^^ that's me [11:40:55] (03CR) 10Slyngshede: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1237872 (owner: 10Vgutierrez) [11:43:18] (03CR) 10Vgutierrez: [C:03+2] haproxy: Fix allowed-hosts filename [puppet] - 10https://gerrit.wikimedia.org/r/1237872 (owner: 10Vgutierrez) [11:43:54] (03PS3) 10Daniel Kinzler: rest-gateway: remove support for insecure user ID cookies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237295 (https://phabricator.wikimedia.org/T405578) [11:44:07] 06SRE, 10Charts, 07Kubernetes: Kserve helm chart - 
https://phabricator.wikimedia.org/T416580#11596215 (10DPogorzelski-WMF) Will do but I would argue it's much better to deploy it, test it, see what's broken, fix and iterate until it's working as intended. Quick, small iterations. It's a big chart and plannin... [11:44:21] (03PS19) 10Daniel Kinzler: rest gateway: add tests for chart rendering [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085 [11:44:45] (03PS3) 10Daniel Kinzler: re-apply: rest gateway: define new limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237464 [11:44:52] (03PS4) 10Daniel Kinzler: rest-gateway: remove support for insecure user ID cookies [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237295 (https://phabricator.wikimedia.org/T405578) [11:46:02] !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . [11:49:45] 06SRE, 10Charts, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#11596235 (10elukey) >>! In T416580#11596215, @DPogorzelski-WMF wrote: > Will do but I would argue it's much better to deploy it, test it, see what's broken, fix and iterate until it's working as intended. Quick, smal... 
[11:49:54] 10ops-codfw, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387#11596237 (10MatthewVernon) [11:50:41] RESOLVED: ConfdResourceFailed: confd resource _etc_haproxy_conf.d_tls.cfg.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [11:51:04] !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host es1033.eqiad.wmnet [11:51:18] (03CR) 10Muehlenhoff: [C:03+2] Enable Bird 2.18 for centrallog [puppet] - 10https://gerrit.wikimedia.org/r/1237862 (https://phabricator.wikimedia.org/T413740) (owner: 10Muehlenhoff) [11:52:02] 06SRE, 13Patch-For-Review: Test and upgrade Kafka clusters to Openjdk 17 - https://phabricator.wikimedia.org/T416674#11596242 (10elukey) I did some research about running Kafka 1.1 on JDK17, and I didn't find anybody reporting that they have done it successfully (as I expected). At a high level, the major prob... 
[11:52:11] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243#11596244 (10MatthewVernon) [11:54:11] (03CR) 10DCausse: [C:03+1] component: mediawiki.page_html_content_change.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237870 (https://phabricator.wikimedia.org/T360794) (owner: 10JavierMonton) [11:54:14] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Q3:rack/setup/install ms-fe102[14] - https://phabricator.wikimedia.org/T416245#11596248 (10MatthewVernon) [11:59:46] (03PS1) 10Vgutierrez: haproxy: Enable host validation in magru [puppet] - 10https://gerrit.wikimedia.org/r/1237876 [11:59:57] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237876 (owner: 10Vgutierrez) [12:03:22] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Q3:rack/setup/install apus-fe100[4-5] - https://phabricator.wikimedia.org/T416386#11596291 (10MatthewVernon) [12:05:04] !log rolling out host header validation in haproxy on magru, revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/1237876 if needed [12:05:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:11] (03CR) 10Vgutierrez: [C:03+2] haproxy: Enable host validation in magru [puppet] - 10https://gerrit.wikimedia.org/r/1237876 (owner: 10Vgutierrez) [12:05:38] 06SRE, 10Infrastructure Security, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware, and 5 others: October 2025 Bullseye reboots (ServiceOps hosts) - https://phabricator.wikimedia.org/T416451#11596300 (10MatthewVernon) None of these hosts are swift hosts, so removing the swift tag. 
[12:08:22] (03PS1) 10Jgiannelos: wikifeeds: Use flags for memory compatibility with node18 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237880 (https://phabricator.wikimedia.org/T410296) [12:14:26] (03CR) 10Hnowlan: [C:03+1] Cleanup redundant lint-related rest gateway routing config [puppet] - 10https://gerrit.wikimedia.org/r/1210631 (owner: 10Aaron Schulz) [12:18:10] !log upgrade centrallog2002 to Bird 2.18 T413740 [12:18:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:14] T413740: Backport and test Bird 2.18 - https://phabricator.wikimedia.org/T413740 [12:19:42] (03PS1) 10PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237884 [12:23:05] (03PS1) 10Jelto: gerrit::sshkey: add gerrit-lb IPs to host_aliases ssh key [puppet] - 10https://gerrit.wikimedia.org/r/1237887 (https://phabricator.wikimedia.org/T411895) [12:24:33] (03CR) 10Jelto: gerrit::sshkey: add gerrit-lb IPs to host_aliases ssh key (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1237887 (https://phabricator.wikimedia.org/T411895) (owner: 10Jelto) [12:24:36] (03PS2) 10Jgiannelos: wikifeeds: Use flags for memory compatibility with node18 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237880 (https://phabricator.wikimedia.org/T410296) [12:25:25] (03Abandoned) 10Jgiannelos: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237884 (owner: 10PipelineBot) [12:26:42] (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8003/console" [puppet] - 10https://gerrit.wikimedia.org/r/1237887 (https://phabricator.wikimedia.org/T411895) (owner: 10Jelto) [12:27:41] !log upgrade centrallog1002 to Bird 2.18 T413740 [12:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:44] T413740: Backport and test Bird 2.18 - 
https://phabricator.wikimedia.org/T413740 [12:31:44] 06SRE, 10Charts, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#11596370 (10DPogorzelski-WMF) > The README contains things that need to be added to avoid the issues that you want to fix with small iterations, so since we know it beforehand I am not 100% sure why you want to redis... [12:33:13] (03PS1) 10Dreamy Jazz: Stop writing old for CheckUser user agent table migration on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237897 (https://phabricator.wikimedia.org/T361206) [12:34:18] (03PS1) 10Dreamy Jazz: Stop writing old for CheckUser user agent table migration on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237898 (https://phabricator.wikimedia.org/T361206) [12:34:20] (03PS1) 10Dreamy Jazz: Stop writing old for CheckUser user agent table migration everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237899 (https://phabricator.wikimedia.org/T361206) [12:36:00] (03PS4) 10Daniel Kinzler: re-apply: rest gateway: define new limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237464 [12:39:34] 06SRE, 10Charts, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#11596391 (10DPogorzelski-WMF) Btw the chart does work fine locally for what's worth it. Bartosz also tested it. 
[12:41:45] (03CR) 10Kamila Součková: [C:03+1] re-apply: rest gateway: define new limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237464 (owner: 10Daniel Kinzler) [12:42:49] (03CR) 10Daniel Kinzler: [C:03+2] re-apply: rest gateway: define new limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237464 (owner: 10Daniel Kinzler) [12:43:36] (03PS5) 10Daniel Kinzler: rediscope: lower cpu and memoy limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1233161 [12:43:42] (03PS9) 10Daniel Kinzler: redioscope: enable time bucket [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230444 [12:43:51] jouncebot: nowandnext [12:43:51] No deployments scheduled for the next 1 hour(s) and 16 minute(s) [12:43:51] In 1 hour(s) and 16 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1400) [12:44:27] (03Merged) 10jenkins-bot: re-apply: rest gateway: define new limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237464 (owner: 10Daniel Kinzler) [12:46:35] !log daniel@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply [12:48:30] !log daniel@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [12:48:41] 06SRE, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#11596447 (10taavi) [12:52:40] !log daniel@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [12:53:36] !log daniel@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [12:54:30] (03PS1) 10GergesShamon: Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237902 (https://phabricator.wikimedia.org/T416779) [12:57:05] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237902 (https://phabricator.wikimedia.org/T416779) (owner: 10GergesShamon) [12:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [12:59:34] (03PS1) 10Jforrester: wikifunctions: Add required metadata fields to chart definitions [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237904 (https://phabricator.wikimedia.org/T412693) [13:00:11] (03PS1) 10Muehlenhoff: Backfill missing roles for bullseye tracking [puppet] - 10https://gerrit.wikimedia.org/r/1237905 [13:00:40] !log daniel@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply [13:01:14] !log daniel@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [13:02:55] (03CR) 10Muehlenhoff: [C:03+2] Backfill missing roles for bullseye tracking [puppet] - 10https://gerrit.wikimedia.org/r/1237905 (owner: 10Muehlenhoff) [13:17:32] (03CR) 10Daniel Kinzler: [C:03+2] rediscope: lower cpu and memoy limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1233161 (owner: 10Daniel Kinzler) [13:17:35] (03CR) 10Daniel Kinzler: [C:03+2] redioscope: enable time bucket [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230444 (owner: 10Daniel Kinzler) [13:17:51] (03PS1) 10STran: Fix IP reveal buttons on contributions page when there are extra user links [extensions/CheckUser] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237908 
(https://phabricator.wikimedia.org/T416758) [13:18:42] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/CheckUser] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237908 (https://phabricator.wikimedia.org/T416758) (owner: 10STran) [13:19:32] (03CR) 10STran: "Being optimistic about this backport scheduling, as the primary patch is still going through CI and the window is somewhat busy today." [extensions/CheckUser] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237908 (https://phabricator.wikimedia.org/T416758) (owner: 10STran) [13:19:48] (03Merged) 10jenkins-bot: rediscope: lower cpu and memoy limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1233161 (owner: 10Daniel Kinzler) [13:19:49] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad: move row-wide vlan gateways to Nokia switches - https://phabricator.wikimedia.org/T416872 (10cmooney) 03NEW p:05Triage→03Medium [13:19:49] (03Merged) 10jenkins-bot: redioscope: enable time bucket [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230444 (owner: 10Daniel Kinzler) [13:20:05] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad C/D refresh: move legacy switch uplinks to Nokias and migrate Vlan GWs - https://phabricator.wikimedia.org/T405562#11596553 (10cmooney) [13:20:07] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad: move row-wide vlan gateways to Nokia switches - https://phabricator.wikimedia.org/T416872#11596554 (10cmooney) [13:20:11] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596555 (10cmooney) [13:20:21] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Eqiad row C/D switch refresh: LVS changes to support migration - https://phabricator.wikimedia.org/T405602#11596556 (10cmooney) [13:20:27] 06SRE, 06Infrastructure-Foundations, 10netops: 
Eqiad: move row-wide vlan gateways to Nokia switches - https://phabricator.wikimedia.org/T416872#11596557 (10cmooney) [13:20:30] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596558 (10cmooney) [13:21:44] !log daniel@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/redioscope: apply [13:21:46] 06SRE, 06Infrastructure-Foundations, 10netops: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11596563 (10cmooney) [13:21:47] 06SRE, 06Infrastructure-Foundations: Nokia L3 bugs [Oct 2025] - https://phabricator.wikimedia.org/T409286#11596564 (10cmooney) [13:22:11] 06SRE, 06Infrastructure-Foundations, 10netops: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11596568 (10cmooney) I've removed parent task T409286 to track this independently but commenting for the record. [13:22:49] !log daniel@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/redioscope: apply [13:23:03] !log daniel@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/redioscope: apply [13:23:32] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad C/D refresh: move asw2-c-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T405579#11596570 (10cmooney) [13:23:33] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad C/D refresh: move legacy switch uplinks to Nokias and migrate Vlan GWs - https://phabricator.wikimedia.org/T405562#11596571 (10cmooney) [13:23:34] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596572 (10cmooney) [13:23:40] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596575 (10cmooney) [13:23:41] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad C/D refresh: move legacy switch uplinks to Nokias 
and migrate Vlan GWs - https://phabricator.wikimedia.org/T405562#11596574 (10cmooney) [13:23:53] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596579 (10cmooney) [13:23:57] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11596577 (10cmooney) [13:24:05] 06SRE, 06Infrastructure-Foundations, 10netops: Eqiad C/D refresh: move legacy switch uplinks to Nokias and migrate Vlan GWs - https://phabricator.wikimedia.org/T405562#11596578 (10cmooney) [13:24:13] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596582 (10cmooney) [13:24:17] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad - https://phabricator.wikimedia.org/T405609#11596580 (10cmooney) [13:24:25] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Eqiad row C/D switch refresh: LVS changes to support migration - https://phabricator.wikimedia.org/T405602#11596581 (10cmooney) [13:24:34] !log daniel@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/redioscope: apply [13:24:46] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596585 (10cmooney) [13:24:50] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Remove lvs1018 L2 link to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T405499#11596587 (10cmooney) [13:24:58] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Eqiad row C/D switch refresh: LVS changes to support migration - https://phabricator.wikimedia.org/T405602#11596588 (10cmooney) [13:25:06] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D 
switch refresh - https://phabricator.wikimedia.org/T396063#11596591 (10cmooney) [13:25:10] 10ops-eqiad, 06SRE, 06DC-Ops, 06Traffic: lvs1018: decom links to asw2-c2-eqiad and asw2-d7-eqiad - https://phabricator.wikimedia.org/T410661#11596589 (10cmooney) [13:25:16] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Eqiad row C/D switch refresh: LVS changes to support migration - https://phabricator.wikimedia.org/T405602#11596590 (10cmooney) [13:25:26] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: lvs1018: remove cross-rack links to rows A, C and D - https://phabricator.wikimedia.org/T411781#11596592 (10cmooney) [13:25:46] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Eqiad row C/D switch refresh: LVS changes to support migration - https://phabricator.wikimedia.org/T405602#11596593 (10cmooney) [13:26:14] 10ops-eqiad, 06SRE, 06DC-Ops: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11596594 (10cmooney) [13:27:43] !log daniel@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply [13:28:09] !log daniel@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply [13:30:18] (03CR) 10Elukey: [C:03+2] confluent::kafka::broker: fix default options for jvm-17+ [puppet] - 10https://gerrit.wikimedia.org/r/1237858 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [13:34:45] !log arnaudb@cumin1003 START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit2003.wikimedia.org [13:36:34] (03CR) 10Jelto: [C:03+1] gerrit: temporarily disable replication on gerrit2003 [puppet] - 10https://gerrit.wikimedia.org/r/1237450 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [13:40:04] (03CR) 10Arnaudb: [C:03+2] gerrit: temporarily disable replication on gerrit2003 [puppet] - 10https://gerrit.wikimedia.org/r/1237450 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [13:42:10] !log arnaudb@cumin1003 END (PASS) - 
Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit2003.wikimedia.org [13:43:08] !log arnaudb@cumin1003 START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit1003.wikimedia.org [13:44:17] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host sretest2010.codfw.wmnet [13:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [13:47:41] !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit1003.wikimedia.org [13:49:59] (03CR) 10Tchanders: [C:03+1] Fix IP reveal buttons on contributions page when there are extra user links [extensions/CheckUser] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237908 (https://phabricator.wikimedia.org/T416758) (owner: 10STran) [13:50:24] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2010.codfw.wmnet [13:56:59] !log ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 visualeditor-autodisable [13:57:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:04] Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor My software never has bugs. It just develops random features. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1400). [14:00:05] Jhs and Tran: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:09] o/ [14:00:13] o/ [14:06:08] Is a deployer around or is it a self-deploy day? 
[14:06:39] I can’t deploy yet, sorry [14:06:43] might be around in ~30 minutes [14:06:55] if a self-deployer is around, go ahead imho [14:07:26] jouncebot: nowandnext [14:07:27] For the next 0 hour(s) and 52 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1400) [14:07:27] In 1 hour(s) and 22 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1530) [14:07:27] I can do Tran's [14:07:47] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237897 (https://phabricator.wikimedia.org/T361206) (owner: 10Dreamy Jazz) [14:08:06] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tchanders@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237908 (https://phabricator.wikimedia.org/T416758) (owner: 10STran) [14:08:16] Jhs? [14:08:37] They are there [14:08:41] i'm here [14:08:50] but can't deploy anything myself [14:09:58] I'm here but my internet is unstable atm so Tchanders has kindly offered to deploy mine but I can't help with anyone else's due to that. I was mostly asking if someone could deploy Jhs', as those are the first in the queue. [14:10:15] (03Merged) 10jenkins-bot: Fix IP reveal buttons on contributions page when there are extra user links [extensions/CheckUser] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237908 (https://phabricator.wikimedia.org/T416758) (owner: 10STran) [14:10:57] Sorry, I got started on Tran's already. 
We're both multi-tasking in a meeting atm so it's difficult to take on more this window [14:11:19] I'm also in that same meeting, so can't help either [14:12:26] I just abandoned my attempt because there were unexpected commits, and I don't have time to reason through that [14:14:01] Interesting, do you want me to try mine and see what happens? [14:14:06] Hmm, the patch I was doing is now +2 (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1237908) but I said no to continuing with sync. What should I do next? [14:14:11] Yes, please go for it [14:14:32] (03CR) 10Muehlenhoff: [C:03+2] Make bast1004 a bastion [puppet] - 10https://gerrit.wikimedia.org/r/1237128 (owner: 10Muehlenhoff) [14:14:35] It will need reverting unless you want me to deploy it at the same time [14:14:57] (03CR) 10Jelto: [C:03+1] switch gerrit service IP to CDN [dns] - 10https://gerrit.wikimedia.org/r/1215709 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn) [14:15:41] is it free to try and deploy it at the same time? It would be good to get this in today if possible. [14:15:54] (03CR) 10Xcollazo: [C:03+1] "LGTM, thanks @brouberol@wikimedia.org!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237740 (https://phabricator.wikimedia.org/T416821) (owner: 10Brouberol) [14:16:12] Looked at the diff. 
It seems the change was self-reverted [14:16:16] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237897 (https://phabricator.wikimedia.org/T361206) (owner: 10Dreamy Jazz) [14:16:31] Proceeding to deploy both mine and the CheckUser backport [14:16:54] (03CR) 10CDanis: [C:03+1] switch gerrit service IP to CDN [dns] - 10https://gerrit.wikimedia.org/r/1215709 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn) [14:17:56] (03Merged) 10jenkins-bot: Stop writing old for CheckUser user agent table migration on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237897 (https://phabricator.wikimedia.org/T361206) (owner: 10Dreamy Jazz) [14:19:07] o/ I’m out of my meeting and could probably deploy if needed [14:19:30] !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1237897|Stop writing old for CheckUser user agent table migration on group0 (T361206)]] [14:19:35] T361206: Stop writing old for user agent schema migration on WMF wikis - https://phabricator.wikimedia.org/T361206 [14:19:44] Lucas_WMDE: Would be good if you could look at Jhs changes? [14:19:53] I'll ping you once mine is done [14:20:14] Not sure I have time to review the first one (it's quite a large change in terms of diffs) [14:20:19] sure, I can take a look [14:20:40] Thanks [14:20:48] 06SRE, 10SRE-swift-storage, 07SRE-Unowned, 06Data-Persistence, and 2 others: Create a new bucket for Tegola's tile cache and duplicate its data - https://phabricator.wikimedia.org/T396584#11596806 (10MatthewVernon) Thanks for doing this and documenting the approach :) [14:21:31] !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1237897|Stop writing old for CheckUser user agent table migration on group0 (T361206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[14:22:01] waow [14:22:05] that’s a lot of changes [14:22:13] diffConfig might be more helpful though [14:22:14] Tran: Tchanders: Want to test yours? [14:22:25] Having a look... [14:22:27] Thanks [14:22:30] Testing mine [14:22:36] (03PS1) 10Vgutierrez: haproxy: Enable host header validation globally [puppet] - 10https://gerrit.wikimedia.org/r/1237915 [14:23:04] (03PS1) 10Muehlenhoff: Make bast1004 a bastion [puppet] - 10https://gerrit.wikimedia.org/r/1237916 (https://phabricator.wikimedia.org/T416254) [14:23:23] yeah, sorry about the big change [14:23:31] Lucas_WMDE, Dreamy_Jazz: Is there room for another config change? [14:23:38] (After everything else obvs) [14:23:40] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237915 (owner: 10Vgutierrez) [14:24:00] phuedx: maybe. feel free to put it on the calendar [14:24:00] it's not time-sensitive though, so i'd be fine with postponing it if you need more time to review [14:24:18] Dreamy_jazz: Looks good [14:24:28] 10SRE-swift-storage, 06Commons: File disappeared from server - https://phabricator.wikimedia.org/T416617#11596817 (10MatthewVernon) @Yann (I've been OoO, sorry for delay) your links all look like they work now. Is there anything outstanding here? [14:24:37] urandom: thanks for this https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1237866 maybe I backport and deploy it? [14:24:51] (especially to isolate the impact) [14:25:18] !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync [14:25:22] Jhs: the diffConfig is… empty? (I tried it out locally as well) [14:25:24] Testing complete, proceeding... [14:25:30] or is InterwikiSortOrders just not included in that?
[14:25:47] not sure what diffConfig is 😅 [14:26:19] it builds the effective config for each wiki on both master and the change, then diffs them, so you can see what changes [14:26:21] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237743 (https://phabricator.wikimedia.org/T416719) (owner: 10Jforrester) [14:26:36] but it only includes some files (e.g InitialiseSettings.php) and not others (e.g. CommonSettings.php) [14:26:37] 06SRE, 06ServiceOps new, 10Continuous-Integration-Config, 06Release-Engineering-Team (Seen): operations/docker-images/production-images has no CI - https://phabricator.wikimedia.org/T283855#11596821 (10bking) Thanks @hashar , I must have had my brain dripping out of my ears when I wrote that last update. A... [14:26:48] so maybe interwikiSortOrders.php is part of the files that it doesn’t include [14:26:53] (03CR) 10Xcollazo: [C:03+2] "Yes, this was my bad, sorry for the fallout @jforrester@wikimedia.org." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237540 (https://phabricator.wikimedia.org/T416719) (owner: 10Xcollazo) [14:26:56] I guess InterwikiSortOrders having no diffConfig is expected? AFAICS it was just spacing changes [14:26:57] since the commit message sounds like *some* changes are expected [14:27:09] Oh yeah, good point [14:27:10] (“Also add all languages that have been added to Wikimedia wikis since the file was last updated”) [14:27:20] InterwikiSortOrders is very niche nowadays. 
Only used by a few wikis (those with non-default wgInterwikiSortingSort in InitialiseSettings), and only used in non-V22 and non-Minerva skins when the user has opted out of the compact language links [14:27:27] (unrelated rant: InterwikiSorting extension should just die: T253764) [14:27:28] T253764: Undeploy the InterwikiSorting extension from Wikipedia production - https://phabricator.wikimedia.org/T253764 [14:27:49] Amir1, it would be about time, yeah... [14:27:52] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1237915 (owner: 10Vgutierrez) [14:28:02] I’ll just massage the arrays into a diffable form with sed and friends and compare manually ^^ [14:28:04] Jhs: and for those users, they can live with the default sorting order [14:28:14] Amir1, agreed [14:28:28] but as long as it's there, new languages all end up at the bottom of the list [14:28:46] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops, 06Infrastructure-Foundations: DHCP failing for at least 2 ms-be servers in codfw - https://phabricator.wikimedia.org/T415189#11596823 (10MatthewVernon) >>! In T415189#11589253, @ayounsi wrote: > I think that was due to the bug fixed in {T416401}. It shou... [14:28:56] 10ops-eqiad, 06SRE, 06DC-Ops: Q2:rack/setup/install mc1055-72 - https://phabricator.wikimedia.org/T412255#11596825 (10MLechvien-WMF) @Jclark-ctr Can I confirm what is the next step on this? 
[14:29:33] !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237897|Stop writing old for CheckUser user agent table migration on group0 (T361206)]] (duration: 10m 03s) [14:29:36] T361206: Stop writing old for user agent schema migration on WMF wikis - https://phabricator.wikimedia.org/T361206 [14:29:43] Lucas_WMDE: I'm done [14:29:44] 06SRE, 06Traffic, 13Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11596826 (10ssingh) ` sukhe@cumin1003:~$ sudo cumin "A:bastion" "dig +nsid en.wikipedia.org @2a02:ec80:53::1| grep NSID" 8 hosts will be targeted: bast[1003-1004,2003,3007,4005,5004,6003,7002].wik... [14:29:52] ack, I’m still reviewing [14:30:10] (03CR) 10Slyngshede: [C:03+1] "Very good" [puppet] - 10https://gerrit.wikimedia.org/r/1237915 (owner: 10Vgutierrez) [14:30:22] Jhs: would you mind commenting on the ticket and saying that it's adding work when introducing a new language? Some people are arguing it's not much work to maintain it [14:30:42] Amir1, sure [14:31:04] (03CR) 10Vgutierrez: [C:03+2] haproxy: Enable host header validation globally [puppet] - 10https://gerrit.wikimedia.org/r/1237915 (owner: 10Vgutierrez) [14:31:18] (03PS2) 10Ladsgroup: changeprop-jobqueue: Remove thumbnail render job concurrency [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 (https://phabricator.wikimedia.org/T415282) [14:31:26] Thanks <3 [14:31:40] (03CR) 10Ladsgroup: "Great idea. Added." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 (https://phabricator.wikimedia.org/T415282) (owner: 10Ladsgroup) [14:32:38] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "I locally used some search+replace to make both this and master more diffable and AFAICT the changes look reasonable in quantity (though I" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235359 (owner: 10Jon Harald Søby) [14:32:58] Jhs: can that change be deployed on its own? 
[14:32:59] (03CR) 10Phuedx: [C:03+1] Test Kitchen renaming: Updated references to old names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235798 (https://phabricator.wikimedia.org/T415843) (owner: 10Santiago Faci) [14:33:05] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235798 (https://phabricator.wikimedia.org/T415843) (owner: 10Santiago Faci) [14:33:11] or do I need to review + deploy hewikisource PageImages at the same time? [14:33:32] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235359 (owner: 10Jon Harald Søby) [14:33:51] (going ahead with it for now) [14:34:31] (03Merged) 10jenkins-bot: Rework InterwikiSortOrders.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235359 (owner: 10Jon Harald Søby) [14:34:49] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1235359|Rework InterwikiSortOrders.php]] [14:35:05] (03CR) 10Ladsgroup: "Thank you!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1236361 (https://phabricator.wikimedia.org/T416174) (owner: 10Seawolf35gerrit) [14:35:53] Lucas_WMDE, sorry, which one? [14:36:01] 10ops-eqiad, 06SRE, 06DC-Ops: Q2:rack/setup/install mc1055-72 - https://phabricator.wikimedia.org/T412255#11596856 (10Jclark-ctr) @MLechvien-WMF onsite we are waiting for the order to be completed and delivered. T405292 is ticket for procurement this is what we are waiting on. 
[14:36:10] Jhs: I’m deploying InterwikiSortOrders right now [14:36:16] sweet [14:36:23] unless you tell me I should abort because the other change needs to happen at the same time [14:36:30] (but now that I looked at it, that seems unlikely ^^) [14:36:42] !log lucaswerkmeister-wmde@deploy2002 jhsoby, lucaswerkmeister-wmde: Backport for [[gerrit:1235359|Rework InterwikiSortOrders.php]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:36:44] you can see diffConfig in action there btw: https://integration.wikimedia.org/ci/job/operations-mw-config-php83-composer-diffConfig/634/console [14:36:55] Jhs: okay, is there anything to test for the InterwikiSortOrders change? :) [14:37:46] Lucas_WMDE, mainly going to a page with lots of interwikis (like the main page) with the necessary settings (Vector, no compact links) and checking that no languages appear unexpectedly at the end) [14:37:59] okay, can you check on mwdebug? [14:38:27] ok I think I can see the issue at https://en.wikipedia.org/wiki/Main_Page?useskin=vector [14:38:41] and with mwdebug it seems to look better, yay [14:39:02] lgtm on enwiki on mwdebug, yeah [14:39:10] !log lucaswerkmeister-wmde@deploy2002 jhsoby, lucaswerkmeister-wmde: Continuing with sync [14:39:13] alright, thanks! [14:39:17] thank you! 
[14:39:46] back to diffConfig for a moment – you can see it linked in the CI output on https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1237691, or directly click the blue “test” check below the commit message to jump to it [14:40:10] in this case we can see that it has the expected effect, turn on a setting on hewikisource and nowhere else \o/ [14:40:30] (and apparently there’s no beta hewikisource, otherwise that would be listed as a second affected file ^^) [14:43:13] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235359|Rework InterwikiSortOrders.php]] (duration: 08m 24s) [14:43:17] ah, right. nice [14:43:34] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237691 (https://phabricator.wikimedia.org/T362851) (owner: 10Jon Harald Søby) [14:44:23] (03Merged) 10jenkins-bot: Enable PageImages for hewikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237691 (https://phabricator.wikimedia.org/T362851) (owner: 10Jon Harald Søby) [14:44:42] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1237691|Enable PageImages for hewikisource (T362851)]] [14:44:46] T362851: Enable PageImages extension in hewikisource - https://phabricator.wikimedia.org/T362851 [14:45:55] phuedx: I assume both your changes are okay to deploy together? (as the second one only touches comments AFAICT) [14:46:03] Lucas_WMDE: That's correct [14:46:25] ok then we might have time [14:46:42] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, jhsoby: Backport for [[gerrit:1237691|Enable PageImages for hewikisource (T362851)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:47:18] Jhs: can you test the change on mwdebug? 
[14:47:25] Lucas_WMDE, already done :D [14:47:29] works as expected [14:47:49] I did a null edit on https://he.wikisource.org/wiki/%D7%A8%D7%99%22%D7%A3_%D7%A2%D7%9C_%D7%94%D7%A9%22%D7%A1 , and that page now gets an image when you search for it in Vector-22 [14:47:50] yay! [14:48:16] hm, not for me [14:48:19] but I’ll take your word for it ^^ [14:48:22] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, jhsoby: Continuing with sync [14:48:28] maybe the cache already got reset or something [14:48:49] should be a harmless enough config change after all [14:49:09] do you see a page image in its action=info ? https://he.wikisource.org/wiki/%D7%A8%D7%99%22%D7%A3_%D7%A2%D7%9C_%D7%94%D7%A9%22%D7%A1?action=info [14:49:20] 06SRE, 06Data-Platform-SRE, 06Infrastructure-Foundations, 07Epic: Migrate Docker images running in Production away from Bullseye - https://phabricator.wikimedia.org/T416452#11596907 (10JMeybohm) The cert-manager 1.10.x images are probably okay to ignore since the remaining clusters will have to upgrade to... 
[14:50:28] I do, yes [14:50:35] maybe the search results themselves were cached on my end [14:50:48] (03CR) 10Ayounsi: [C:03+1] Make bast1004 a bastion [puppet] - 10https://gerrit.wikimedia.org/r/1237916 (https://phabricator.wikimedia.org/T416254) (owner: 10Muehlenhoff) [14:51:47] RESOLVED: [2x] ProbeDown: Service wdqs1021:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1021:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:52:23] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237691|Enable PageImages for hewikisource (T362851)]] (duration: 07m 41s) [14:52:26] T362851: Enable PageImages extension in hewikisource - https://phabricator.wikimedia.org/T362851 [14:52:50] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237743 (https://phabricator.wikimedia.org/T416719) (owner: 10Jforrester) [14:52:50] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235798 (https://phabricator.wikimedia.org/T415843) (owner: 10Santiago Faci) [14:53:14] phuedx: I guess I should’ve asked if you want to deploy yourself [14:53:15] 06SRE, 13Patch-For-Review: Test and upgrade Kafka clusters to Openjdk 17 - https://phabricator.wikimedia.org/T416674#11596922 (10MoritzMuehlenhoff) >>! In T416674#11596242, @elukey wrote: > On the other hand, IIUC Kafka 3.5 can run on JDK 8 (namely it is supported, not sure how well) and we have it backported... 
[14:53:18] you can take over https://spiderpig.wikimedia.org/jobs/1304 if you like ^^ [14:53:35] (03Merged) 10jenkins-bot: Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237743 (https://phabricator.wikimedia.org/T416719) (owner: 10Jforrester) [14:53:43] (03Merged) 10jenkins-bot: Test Kitchen renaming: Updated references to old names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235798 (https://phabricator.wikimedia.org/T415843) (owner: 10Santiago Faci) [14:54:04] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1237743|Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" (T416719)]], [[gerrit:1235798|Test Kitchen renaming: Updated references to old names (T415843)]] [14:54:05] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-product-users, airflow-analytics-product-admins for akhatun - https://phabricator.wikimedia.org/T416703#11596927 (10tappof) [14:54:08] T416719: OpsWeek: Bump memory of refine job for product_metrics.web_base_with_ip to avoid recent OOMs - https://phabricator.wikimedia.org/T416719 [14:54:09] T415843: Update old links to Test Kitchen UI with the new domain - https://phabricator.wikimedia.org/T415843 [14:54:27] Lucas_WMDE: No worries ^^ [14:55:37] !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . [14:56:00] !log lucaswerkmeister-wmde@deploy2002 jforrester, lucaswerkmeister-wmde, sfaci: Backport for [[gerrit:1237743|Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" (T416719)]], [[gerrit:1235798|Test Kitchen renaming: Updated references to old names (T415843)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:56:32] phuedx: anything to test on mwdebug? [14:56:41] (I guess EventStreamConfig should be testable?) 
[14:57:25] I'm here supporting that too, Lucas_WMDE, not testable yet but I agree [14:57:56] not sure what you’re agreeing to 😅 that it is or isn’t testable? [14:58:05] (the config looks good - but the test would be running the airflow job that depends on that value) [14:58:10] ok [14:58:13] so should I just go ahead with the deploy? [14:58:16] yep [14:58:20] !log lucaswerkmeister-wmde@deploy2002 jforrester, lucaswerkmeister-wmde, sfaci: Continuing with sync [14:58:21] alright, thanks! [14:58:28] thank you Lucas [15:02:19] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237743|Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" (T416719)]], [[gerrit:1235798|Test Kitchen renaming: Updated references to old names (T415843)]] (duration: 08m 16s) [15:02:24] T416719: OpsWeek: Bump memory of refine job for product_metrics.web_base_with_ip to avoid recent OOMs - https://phabricator.wikimedia.org/T416719 [15:02:25] T415843: Update old links to Test Kitchen UI with the new domain - https://phabricator.wikimedia.org/T415843 [15:02:45] !log UTC afternoon backport+config window done [15:02:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:00] (03PS1) 10Tiziano Fogli: admin: add akhatun to "analytics" groups [puppet] - 10https://gerrit.wikimedia.org/r/1237920 (https://phabricator.wikimedia.org/T416703) [15:04:28] (03CR) 10Brouberol: [C:03+2] airflow: Add a script fetching the content of an XCOM from s3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237740 (https://phabricator.wikimedia.org/T416821) (owner: 10Brouberol) [15:04:32] (03PS1) 10Vgutierrez: haproxy: Make host header validation mandatory [puppet] - 10https://gerrit.wikimedia.org/r/1237922 [15:05:36] (03PS1) 10Dpogorzelski: ml-staging-codfw: restore kserve 0.12 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237924 [15:05:56] (03CR) 10Vgutierrez: "check experimental" [puppet] - 
10https://gerrit.wikimedia.org/r/1237922 (owner: 10Vgutierrez) [15:07:40] 06SRE, 10SRE-Access-Requests: Requesting access to deployment, analytics-privatedata-users for ASanford-WMF - https://phabricator.wikimedia.org/T416710#11597015 (10tappof) @thcipriani could you please approve the deployment group request? Thanks [15:08:11] jouncebot: nowandnext [15:08:11] No deployments scheduled for the next 0 hour(s) and 21 minute(s) [15:08:11] In 0 hour(s) and 21 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1530) [15:08:23] I push something really quickly then [15:08:43] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237517 (https://phabricator.wikimedia.org/T412971) (owner: 10Ladsgroup) [15:09:33] (03Merged) 10jenkins-bot: MediaViewer: Adjust bucket sizes with the new thumb standard sizes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237517 (https://phabricator.wikimedia.org/T412971) (owner: 10Ladsgroup) [15:09:51] !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1237517|MediaViewer: Adjust bucket sizes with the new thumb standard sizes (T412971)]] [15:10:13] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [15:10:57] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [15:11:04] (03CR) 10Hnowlan: [C:03+1] changeprop-jobqueue: Remove thumbnail render job concurrency [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 (https://phabricator.wikimedia.org/T415282) (owner: 10Ladsgroup) [15:11:15] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply [15:11:47] T412971: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971 [15:11:50] 
!log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:1237517|MediaViewer: Adjust bucket sizes with the new thumb standard sizes (T412971)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [15:11:55] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply [15:13:03] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply [15:13:41] !log ladsgroup@deploy2002 ladsgroup: Continuing with sync [15:13:46] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply [15:14:42] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [15:17:46] !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237517|MediaViewer: Adjust bucket sizes with the new thumb standard sizes (T412971)]] (duration: 07m 54s) [15:17:49] T412971: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971 [15:17:55] (03CR) 10Elukey: [C:03+1] ml-staging-codfw: restore kserve 0.12 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237924 (owner: 10Dpogorzelski) [15:20:25] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply [15:21:14] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply [15:23:13] !log root@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None [15:23:24] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply [15:23:57] 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 
10Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11597066 (10Ladsgroup) >>! In T414805#11591828, @Ladsgroup wrote: > I‌ think this should fix it: https://gerrit.wikimedia.org/r/c/ope... [15:24:12] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply [15:26:59] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply [15:27:38] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply [15:28:05] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply [15:28:44] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply [15:29:41] 06SRE, 06Privacy Engineering, 06Traffic: Create and document Wikidough's privacy policy - https://phabricator.wikimedia.org/T275409#11597081 (10ssingh) a:03ssingh [15:30:05] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1530) [15:30:40] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply [15:31:28] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply [15:32:27] (03CR) 10Dpogorzelski: [C:03+2] ml-staging-codfw: restore kserve 0.12 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237924 (owner: 10Dpogorzelski) [15:32:58] !log fceratto@cumin1003 START - Cookbook sre.ganeti.makevm for new host dborch1002.eqiad.wmnet [15:33:00] !log fceratto@cumin1003 START - Cookbook sre.dns.netbox [15:33:19] (03CR) 10Scott French: [C:03+1] "Indeed, yeah - setting `enabled: false` only makes sense as a temporary measure while we delete the topics (after which there's no risk of" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 
(https://phabricator.wikimedia.org/T415282) (owner: 10Ladsgroup) [15:35:12] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [15:35:53] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [15:36:27] !log fceratto@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [15:36:41] !log fceratto@cumin1003 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.eqiad.wmnet [15:36:42] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply [15:37:06] 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Add Jacob Thwaites WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T416358#11597097 (10tappof) Ok, thank you @Dzahn for helping me figure it out. Could someone from the following please approve the request? @conny-kawohl_WMDE @W... [15:37:30] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply [15:38:07] !log fceratto@cumin1003 START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org [15:38:09] !log fceratto@cumin1003 START - Cookbook sre.dns.netbox [15:38:24] 06SRE, 06Infrastructure-Foundations: Migrate diffscan VM to Trixie - https://phabricator.wikimedia.org/T415347#11597108 (10ayounsi) p:05Triage→03Low a:03ayounsi [15:38:36] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply [15:39:28] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply [15:39:28] 10SRE-tools, 10homer, 06Infrastructure-Foundations: Decom cookbook: run Homer when needed - https://phabricator.wikimedia.org/T416313#11597114 (10ayounsi) p:05Triage→03Low [15:39:42] (03CR) 10JHathaway: [C:03+1] "I have a bit of a masochistic soft spot for HP hardware, I especially liked their OOB serial console bios support, but 
I think we can easi" [puppet] - 10https://gerrit.wikimedia.org/r/1237499 (owner: 10Muehlenhoff) [15:39:47] nothing is happening in test kitchen, deploying something [15:41:48] (03PS1) 10Ladsgroup: GrowthExperimentsUserImpactUpdater: Do not compute data on every edit [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237931 (https://phabricator.wikimedia.org/T416171) [15:41:51] 06SRE, 06Data-Platform-SRE, 06Infrastructure-Foundations, 07Epic: Migrate Docker images running in Production away from Bullseye - https://phabricator.wikimedia.org/T416452#11597131 (10elukey) p:05Triage→03Medium [15:41:52] !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - fceratto@cumin1003" [15:41:56] !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - fceratto@cumin1003" [15:41:56] !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [15:41:57] !log fceratto@cumin1003 START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors [15:42:00] !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors [15:42:10] (03CR) 10Ladsgroup: [C:03+2] changeprop-jobqueue: Remove thumbnail render job concurrency [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 (https://phabricator.wikimedia.org/T415282) (owner: 10Ladsgroup) [15:42:21] 10ops-esams, 06SRE, 06DC-Ops, 10netops, and 2 others: ESAMS 502 broken pipe connection issues - https://phabricator.wikimedia.org/T415473#11597140 (10LSobanski) [15:42:24] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. 
[15:42:28] !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1002.wikimedia.org - fceratto@cumin1003" [15:42:32] !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1002.wikimedia.org - fceratto@cumin1003" [15:42:32] !log fceratto@cumin1003 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host dborch1002.wikimedia.org [15:42:44] 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure: Remove Puppet 5 CA cert from wmf-certificates cert bundle - https://phabricator.wikimedia.org/T415255#11597157 (10MoritzMuehlenhoff) p:05Triage→03High [15:43:22] (03PS1) 10Volans: .wmfconfig: remove bullseye build [software/spicerack] - 10https://gerrit.wikimedia.org/r/1237932 [15:43:42] (03CR) 10Elukey: [C:03+1] .wmfconfig: remove bullseye build [software/spicerack] - 10https://gerrit.wikimedia.org/r/1237932 (owner: 10Volans) [15:44:03] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [15:44:09] (03Merged) 10jenkins-bot: changeprop-jobqueue: Remove thumbnail render job concurrency [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237504 (https://phabricator.wikimedia.org/T415282) (owner: 10Ladsgroup) [15:44:19] !log ladsgroup@deploy2002 helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply [15:44:26] !log ladsgroup@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply [15:44:26] (03CR) 10Scott French: [C:03+1] "Thanks, Yiannis!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1237880 (https://phabricator.wikimedia.org/T410296) (owner: 10Jgiannelos) [15:44:46] !log ladsgroup@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply [15:44:51] !log ladsgroup@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply [15:45:12] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [15:45:33] (03Abandoned) 10Jgiannelos: wikifeeds: Use flags for memory compatibility with node18 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237880 (https://phabricator.wikimedia.org/T410296) (owner: 10Jgiannelos) [15:45:44] !log ladsgroup@deploy2002 helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply [15:46:19] !log ladsgroup@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply [15:46:22] (03Restored) 10Jgiannelos: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237884 (owner: 10PipelineBot) [15:46:31] !log ladsgroup@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply [15:46:58] (03CR) 10Jgiannelos: [C:03+2] wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237884 (owner: 10PipelineBot) [15:47:43] !log ladsgroup@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply [15:48:07] 10ops-esams, 06SRE, 06DC-Ops, 10netops, and 2 others: ESAMS 502 broken pipe connection issues - https://phabricator.wikimedia.org/T415473#11597191 (10ssingh) 05Open→03Resolved a:03ssingh Marking this as resolved since the failure was transient and has since resolved. The monitoring not being fire... 
[15:48:36] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237931 (https://phabricator.wikimedia.org/T416171) (owner: 10Ladsgroup) [15:49:02] (03Merged) 10jenkins-bot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237884 (owner: 10PipelineBot) [15:49:11] !log ladsgroup@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply [15:49:26] Thanks urandom ! [15:49:32] (03CR) 10Vgutierrez: [C:03+1] k8s-staging: Switch to IPIP mode [puppet] - 10https://gerrit.wikimedia.org/r/1237277 (https://phabricator.wikimedia.org/T352956) (owner: 10Alexandros Kosiaris) [15:49:38] sorry, wrong ping urbanecm :D [15:50:09] (03CR) 10CI reject: [V:04-1] .wmfconfig: remove bullseye build [software/spicerack] - 10https://gerrit.wikimedia.org/r/1237932 (owner: 10Volans) [15:50:14] Amir1: happens to all of us! [15:50:30] xD [15:50:58] let's hope it'll do the trick [15:51:03] !log ladsgroup@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply [15:51:14] !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org [reason: bird2 upgrade] [15:52:34] (03CR) 10Jgiannelos: "Good, call. I thought it was reverted to node 18. I will just bump the image then." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237880 (https://phabricator.wikimedia.org/T410296) (owner: 10Jgiannelos) [15:52:39] (03CR) 10Muehlenhoff: .wmfconfig: remove bullseye build (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1237932 (owner: 10Volans) [15:53:03] !log jgiannelos@deploy2002 helmfile [staging] START helmfile.d/services/wikifeeds: apply [15:53:13] 06SRE, 06Infrastructure-Foundations, 10netops: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11597211 (10Papaul) Yes i can take it . 
thanks [15:53:28] !log jgiannelos@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifeeds: apply [15:53:34] !log jgiannelos@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply [15:53:44] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [15:54:00] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [15:54:05] !log trueg@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply [15:54:07] !log jgiannelos@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply [15:54:13] !log trueg@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply [15:54:16] !log jgiannelos@deploy2002 helmfile [codfw] START helmfile.d/services/wikifeeds: apply [15:54:16] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [15:54:43] !log jgiannelos@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply [15:55:11] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [15:56:40] !log sukhe@puppetserver1001 conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: [end] bird2 upgrade] [15:57:01] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [15:57:58] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [15:58:07] (03Merged) 10jenkins-bot: GrowthExperimentsUserImpactUpdater: Do not compute data on every edit [extensions/GrowthExperiments] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237931 (https://phabricator.wikimedia.org/T416171) (owner: 10Ladsgroup) [15:58:27] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1237931|GrowthExperimentsUserImpactUpdater: Do not compute data on every edit (T416171)]] [15:58:34] T416171: s2 primary master getting reads? 
- https://phabricator.wikimedia.org/T416171 [16:00:22] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [16:00:30] !log urbanecm@deploy2002 ladsgroup, urbanecm: Backport for [[gerrit:1237931|GrowthExperimentsUserImpactUpdater: Do not compute data on every edit (T416171)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [16:01:39] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [16:04:26] !log urbanecm@deploy2002 ladsgroup, urbanecm: Continuing with sync [16:05:45] PROBLEM - Host cloudnet2006-dev is DOWN: PING CRITICAL - Packet loss = 100% [16:06:46] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [16:07:12] FIRING: HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlstaging@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=kserve - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [16:08:13] RECOVERY - Host cloudnet2006-dev is UP: PING OK - Packet loss = 0%, RTA = 30.26 ms [16:08:30] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237931|GrowthExperimentsUserImpactUpdater: Do not compute data on every edit (T416171)]] (duration: 10m 02s) [16:08:34] T416171: s2 primary master getting reads? - https://phabricator.wikimedia.org/T416171 [16:08:56] !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. 
[16:09:18] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:10:45] PROBLEM - Host cloudnet2005-dev is DOWN: PING CRITICAL - Packet loss = 100% [16:13:13] RECOVERY - Host cloudnet2005-dev is UP: PING OK - Packet loss = 0%, RTA = 30.22 ms [16:16:04] FIRING: MediaWikiElevatedUnknownLogins: Elevated number of login successes (source unknown) via mw-web - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins [16:17:42] 06SRE, 06Infrastructure-Foundations, 10netops, 06Data-Platform-SRE (2026.01.23 - 2026.02.13), 07Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11597410 (10cmooney) p:05Triage→03Medium >>! In T414460#11582569, @Gehel wrote: > With the various... [16:17:51] (03PS1) 10Muehlenhoff: sre.ganeti.makevm: Stop passing the puppetversion [cookbooks] - 10https://gerrit.wikimedia.org/r/1237942 (https://phabricator.wikimedia.org/T365798) [16:20:44] (03CR) 10Ottomata: [C:03+1] "<3" [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [16:21:04] RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of login successes (source unknown) via mw-web - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins [16:22:24] 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Add Jacob Thwaites WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T416358#11597453 (10karapayneWMDE) >>! 
In T416358#11579161, @conny-kawohl_WMDE wrote: > Hi my name is Conny Kawohl, and I am the Engineering Manager of @Jacob_WMD... [16:22:28] 10ops-codfw, 06SRE, 06DC-Ops: Q3:rack/setup/install frqueue2004 - https://phabricator.wikimedia.org/T416251#11597458 (10Jhancock.wm) i'll rack it in the old rack then. no need to add a layer of confusion. [16:23:12] (03CR) 10Muehlenhoff: [C:03+2] Make bast1004 a bastion [puppet] - 10https://gerrit.wikimedia.org/r/1237916 (https://phabricator.wikimedia.org/T416254) (owner: 10Muehlenhoff) [16:23:56] 10ops-esams, 06SRE, 06DC-Ops, 10netops, and 2 others: ESAMS 502 broken pipe connection issues - https://phabricator.wikimedia.org/T415473#11597481 (10cmooney) >>! In T415473#11552172, @ssingh wrote: > We had a transient link failure between eqiad and esams that resulted in this issue. Correct this cor... [16:25:59] (03PS1) 10Dpogorzelski: istio ml-serve: update istio operator manifest [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237944 [16:26:23] !log fceratto@cumin1003 START - Cookbook sre.hosts.reimage for host dborch1002.wikimedia.org with OS trixie [16:27:13] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance [16:27:21] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1167 (T410589)', diff saved to https://phabricator.wikimedia.org/P88732 and previous config saved to /var/cache/conftool/dbconfig/20260209-162720-ladsgroup.json [16:27:24] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [16:28:17] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-product-users, airflow-analytics-product-admins for akhatun - https://phabricator.wikimedia.org/T416703#11597523 (10MoritzMuehlenhoff) @mpopov Can you clarify, it's my understanding that access to https://wikitech.wikimedia.org/... 
[16:28:35] (03CR) 10Muehlenhoff: "Let's wait for a reply on https://phabricator.wikimedia.org/T416703#11597523" [puppet] - 10https://gerrit.wikimedia.org/r/1237920 (https://phabricator.wikimedia.org/T416703) (owner: 10Tiziano Fogli) [16:30:05] jan_drewniak: Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1630). Please do the needful. [16:34:18] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:35:13] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:37:20] (03CR) 10Muehlenhoff: [C:03+1] "That's a terrible hack, but should do the trick" [puppet] - 10https://gerrit.wikimedia.org/r/1237502 (https://phabricator.wikimedia.org/T416674) (owner: 10Elukey) [16:38:43] PROBLEM - MariaDB Replica Lag: s8 on an-redacteddb1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 619.75 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:38:47] PROBLEM - MariaDB Replica Lag: s8 on clouddb1020 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 624.03 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:38:47] PROBLEM - MariaDB Replica Lag: s8 on clouddb1016 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 624.10 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:38:53] PROBLEM - MariaDB Replica Lag: s8 on db1154 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 631.32 seconds 
https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:39:52] !log installing intel-microcode bugfix updates from Trixie point release [16:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:11] 06SRE, 10SRE-Access-Requests: Remove Morten Warncke-Wang’s production shell access - https://phabricator.wikimedia.org/T416754#11597670 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff @Nettrom Your previous work-related production access has already been removed via the central WMF staf... [16:50:16] !log root@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None [16:51:05] (03CR) 10Elukey: [C:03+1] "Left a comment but you are free to test it, some extra tuning may be needed." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237944 (owner: 10Dpogorzelski) [16:52:32] 06SRE: (Potential) Pod Distribution Tracking and Rebalancing Strategy - https://phabricator.wikimedia.org/T413070#11597709 (10tappof) p:05Triage→03Medium [16:54:45] (03PS2) 10Dpogorzelski: istio ml-serve: update istio operator manifest [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237944 [16:54:55] (03CR) 10Dpogorzelski: istio ml-serve: update istio operator manifest (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237944 (owner: 10Dpogorzelski) [16:57:50] 10ops-eqiad, 06SRE, 06DC-Ops: Q2:rack/setup/install mc1055-72 - https://phabricator.wikimedia.org/T412255#11597766 (10jijiki) [16:59:18] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [17:00:20] (03CR) 10JHathaway: [C:03+1] sre.ganeti.makevm: Stop passing the puppetversion [cookbooks] - 
10https://gerrit.wikimedia.org/r/1237942 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff) [17:01:20] 10ops-eqiad, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Q2:rack/setup/install mc1055-72 - https://phabricator.wikimedia.org/T412255#11597800 (10MLechvien-WMF) [17:03:18] (03PS1) 10Jgreen: Return fundraising traffic to eqiad now that rack migration is complete. [dns] - 10https://gerrit.wikimedia.org/r/1237952 (https://phabricator.wikimedia.org/T403035) [17:06:21] 06SRE, 06Traffic, 13Patch-For-Review: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11597848 (10Xqt) [17:06:34] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237373 (https://phabricator.wikimedia.org/T145604) (owner: 10C. Scott Ananian) [17:07:33] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/999060 (https://phabricator.wikimedia.org/T357054) (owner: 10C. 
Scott Ananian) [17:07:33] (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237954 [17:08:39] (03PS3) 10Dzahn: switch gerrit service IP to CDN [dns] - 10https://gerrit.wikimedia.org/r/1215709 (https://phabricator.wikimedia.org/T411895) [17:09:04] (03CR) 10Dzahn: [C:03+1] "the plan is to merge this 23 hours from now, Tue Feb 10, 1600 UTC" [dns] - 10https://gerrit.wikimedia.org/r/1215709 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn) [17:09:11] (03CR) 10Pmiazga: rest gateway: add tests for chart rendering (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1225085 (owner: 10Daniel Kinzler) [17:10:35] (03CR) 10TrainBranchBot: [C:03+2] "Approved by otto@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237870 (https://phabricator.wikimedia.org/T360794) (owner: 10JavierMonton) [17:11:30] (03Merged) 10jenkins-bot: component: mediawiki.page_html_content_change.dev0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237870 (https://phabricator.wikimedia.org/T360794) (owner: 10JavierMonton) [17:11:47] !log otto@deploy2002 Started scap sync-world: Backport for [[gerrit:1237870|component: mediawiki.page_html_content_change.dev0 (T360794)]] [17:11:51] T360794: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794 [17:13:45] !log otto@deploy2002 otto, javiermonton: Backport for [[gerrit:1237870|component: mediawiki.page_html_content_change.dev0 (T360794)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[17:14:47] (03PS11) 10Pppery: Handle languages with nonstandard plural rules [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1217845 (https://phabricator.wikimedia.org/T412422) [17:14:52] (03CR) 10Dzahn: [C:03+2] gerrit::sshkey: add gerrit-lb IPs to host_aliases ssh key [puppet] - 10https://gerrit.wikimedia.org/r/1237887 (https://phabricator.wikimedia.org/T411895) (owner: 10Jelto) [17:15:14] !log otto@deploy2002 otto, javiermonton: Continuing with sync [17:16:08] (03PS1) 10Eevans: restbase: new host (refresh) restbase2039 [puppet] - 10https://gerrit.wikimedia.org/r/1237956 (https://phabricator.wikimedia.org/T416538) [17:16:30] (03CR) 10Dwisehaupt: [C:03+1] "This looks right for the switch back. shipit." [dns] - 10https://gerrit.wikimedia.org/r/1237952 (https://phabricator.wikimedia.org/T403035) (owner: 10Jgreen) [17:18:06] (03CR) 10Jgreen: [C:03+2] Return fundraising traffic to eqiad now that rack migration is complete. [dns] - 10https://gerrit.wikimedia.org/r/1237952 (https://phabricator.wikimedia.org/T403035) (owner: 10Jgreen) [17:18:14] 06SRE, 06Infrastructure-Foundations: Integrate Trixie 13.3 point update - https://phabricator.wikimedia.org/T414179#11597939 (10MoritzMuehlenhoff) [17:18:34] !log jgreen@dns1004 START - running authdns-update [17:19:22] !log otto@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237870|component: mediawiki.page_html_content_change.dev0 (T360794)]] (duration: 07m 34s) [17:19:25] T360794: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794 [17:19:45] !log jgreen@dns1004 END - running authdns-update [17:19:47] 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops, 13Patch-For-Review: FY2526 Q3:rack/setup/install restbase2039 - https://phabricator.wikimedia.org/T416538#11597951 (10Eevans) a:05Eevans→03None [17:19:57] (03PS5) 10Ssingh: wikimedia.org: add IPv6 AAAA record for ns1 [dns] - 10https://gerrit.wikimedia.org/r/1236354 
(https://phabricator.wikimedia.org/T81605) [17:21:36] (03CR) 10Ssingh: [C:03+2] wikimedia.org: add IPv6 AAAA record for ns1 [dns] - 10https://gerrit.wikimedia.org/r/1236354 (https://phabricator.wikimedia.org/T81605) (owner: 10Ssingh) [17:21:51] FIRING: SwitchCoreInterfaceDown: Switch core interface down - fasw1-f5a-codfw:ge-0/0/0 (Core: fmsw-f5-codfw) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=fasw1-f5a-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown [17:21:51] PROBLEM - Host fasw1-f5b-codfw is DOWN: PING CRITICAL - Packet loss = 100% [17:21:52] !log sukhe@dns1004 START - running authdns-update [17:22:23] RECOVERY - Host fasw1-f5b-codfw is UP: PING OK - Packet loss = 0%, RTA = 30.75 ms [17:23:02] !log sukhe@dns1004 END - running authdns-update [17:23:12] (03CR) 10Jforrester: "Yup, agreed that this is a very scary area of the code to be touching. This is part of the three-month-long UBN fixes agreed with Alex as " [puppet] - 10https://gerrit.wikimedia.org/r/1229229 (https://phabricator.wikimedia.org/T411807) (owner: 10Jforrester) [17:24:05] (03CR) 10C. 
Scott Ananian: [C:03+1] Add MultiTitle to extension list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224793 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:24:13] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224793 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:24:38] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224794 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:25:03] (03CR) 10Jgiannelos: [C:03+2] mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237954 (owner: 10PipelineBot) [17:26:24] (03PS2) 10Dzahn: admin: add Jacon Thwaites to ldap_only admins [puppet] - 10https://gerrit.wikimedia.org/r/1237518 (https://phabricator.wikimedia.org/T416358) [17:27:07] (03Merged) 10jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237954 (owner: 10PipelineBot) [17:29:09] (03PS2) 10C. Scott Ananian: Add MultiTitle to extension list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224793 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:29:09] (03PS3) 10C. Scott Ananian: Add config variable for MultiTitle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224794 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:29:10] (03PS3) 10C. Scott Ananian: Enable MultiTitle on beta cluster testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224795 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:29:10] (03PS3) 10C. 
Scott Ananian: Load MultiTitle on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:29:26] (03CR) 10C. Scott Ananian: [C:03+1] Load MultiTitle on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:29:31] (03CR) 10C. Scott Ananian: [C:03+1] Enable MultiTitle on beta cluster testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224795 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:29:36] (03CR) 10C. Scott Ananian: [C:03+1] Add config variable for MultiTitle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224794 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:31:22] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224795 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:31:44] !log jasmine@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet [17:32:10] !log jasmine@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet [17:32:54] (03CR) 10Dzahn: [C:03+2] admin: add Jacon Thwaites to ldap_only admins [puppet] - 10https://gerrit.wikimedia.org/r/1237518 (https://phabricator.wikimedia.org/T416358) (owner: 10Dzahn) [17:33:17] (03CR) 10Jasmine: [C:03+2] wikikube: decommission worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/1227431 (https://phabricator.wikimedia.org/T409103) (owner: 10Jasmine) [17:34:39] !log LDAP - added jacobthwaites to groups wmde and nda - T416358 [17:34:41] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:34:42] T416358: Add Jacob Thwaites WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T416358 [17:35:26] 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Add Jacob Thwaites WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T416358#11598108 (10Dzahn) [17:35:53] (03CR) 10C. Scott Ananian: [C:03+1] Load MultiTitle on beta cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:36:14] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [17:36:33] 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Add Jacob Thwaites WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T416358#11598116 (10Dzahn) Hi @Jacob_WMDE you have been added to the requested LDAP groups "wmde" and "nda". This is like what other WMDE staff have. So you sh... [17:36:54] (03CR) 10MVernon: [C:03+1] restbase: new host (refresh) restbase2039 [puppet] - 10https://gerrit.wikimedia.org/r/1237956 (https://phabricator.wikimedia.org/T416538) (owner: 10Eevans) [17:37:40] 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Add Jacob Thwaites WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T416358#11598119 (10Dzahn) 05Open→03Resolved a:03Dzahn you have the groups. feel free to reopen if you run into any issues. 
[17:38:46] !log fceratto@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dborch1002.wikimedia.org with OS trixie [17:41:51] RESOLVED: SwitchCoreInterfaceDown: Switch core interface down - fasw1-f5a-codfw:ge-0/0/0 (Core: fmsw-f5-codfw) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=fasw1-f5a-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown [17:42:10] !log jasmine@cumin2002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2052-2054].codfw.wmnet [17:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [17:49:19] (03CR) 10Dzahn: [C:03+2] admin_ng: add status.wikimedia.org to miscweb TLS extra SANs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237524 (https://phabricator.wikimedia.org/T414098) (owner: 10Dzahn) [17:50:16] !log jasmine@cumin2002 START - Cookbook sre.dns.netbox [17:53:57] !log jasmine@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2052-2054].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [17:54:25] !log jasmine@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2052-2054].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [17:54:25] !log jasmine@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [17:54:26] !log jasmine@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts 
wikikube-worker[2052-2054].codfw.wmnet [17:57:34] (03Merged) 10jenkins-bot: admin_ng: add status.wikimedia.org to miscweb TLS extra SANs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237524 (https://phabricator.wikimedia.org/T414098) (owner: 10Dzahn) [18:00:04] (03CR) 10Aklapper: [V:03+2 C:03+2] "After a rebase this applies cleanly locally, the code makes sense to me (after refreshing details of some Slavic grammars), the generate/e" [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1217845 (https://phabricator.wikimedia.org/T412422) (owner: 10Pppery) [18:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1800) [18:00:05] ryankemper: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T1800). [18:01:16] !log jasmine@cumin2002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2063,2079-2082].codfw.wmnet [18:01:45] jouncebot: Groundhog day was last week [18:04:01] (03CR) 10Aklapper: [V:03+2 C:03+2] "Makes sense per the code in upstream Phorge's /src/infrastructure/internationalization/management/PhorgeInternationalizationValidator.php " [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1233259 (owner: 10Pppery) [18:14:02] !log jasmine@cumin2002 START - Cookbook sre.dns.netbox [18:17:17] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224793 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [18:17:40] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/1224794 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [18:17:55] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224795 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [18:18:09] !log jasmine@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2063,2079-2082].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [18:18:10] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224796 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [18:18:27] !log jasmine@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2063,2079-2082].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [18:18:28] !log jasmine@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:18:29] !log jasmine@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2063,2079-2082].codfw.wmnet [18:18:58] !log jasmine@cumin2002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2083-2084].codfw.wmnet [18:25:28] !log jasmine@cumin2002 START - Cookbook sre.dns.netbox [18:27:02] (03CR) 10BCornwall: varnish: Restrict unauth sitemap access to verified crawlers (cat B) (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1233188 (https://phabricator.wikimedia.org/T407122) (owner: 10Krinkle) [18:29:44] (03PS1) 10Jdlrobson: Enable site notices on Minerva 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237348 (https://phabricator.wikimedia.org/T416644) [18:30:07] (03PS2) 10Jdlrobson: Enable site notices on Minerva [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237348 (https://phabricator.wikimedia.org/T416644) [18:31:12] !log jasmine@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2083-2084].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [18:31:42] !log jasmine@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2083-2084].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [18:31:42] !log jasmine@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:31:43] !log jasmine@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2083-2084].codfw.wmnet [18:32:45] !log dzahn@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
[18:32:48] !log jasmine@cumin2002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2096-2100].codfw.wmnet [18:39:20] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237348 (https://phabricator.wikimedia.org/T416644) (owner: 10Jdlrobson) [18:42:07] !log fceratto@cumin1003 START - Cookbook sre.hosts.reimage for host dborch1002.wikimedia.org with OS trixie [18:45:18] !log jasmine@cumin2002 START - Cookbook sre.dns.netbox [18:48:51] !log jasmine@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2096-2100].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [18:49:16] !log jasmine@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2096-2100].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [18:49:16] !log jasmine@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:49:17] !log jasmine@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2096-2100].codfw.wmnet [18:49:49] !log jasmine@cumin2002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker2101.codfw.wmnet [18:52:00] !log dzahn@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [18:53:09] (03CR) 10Dzahn: "deployed - I wanted to deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1237524 and got this in the diff for admin_n" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230984 (owner: 10Clément Goubert) [18:53:32] !log dzahn@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
[18:53:41] (03CR) 10BCornwall: prometheus: add depooled cp* host check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: 10CDobbins) [18:54:03] !log dzahn@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [18:54:10] !log dzahn@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'. [18:54:33] !log jasmine@cumin2002 START - Cookbook sre.dns.netbox [18:55:41] (03CR) 10BCornwall: "I would rename `node_depooled_cp_hosts.pp` to `node_pooled_cp_hosts.pp` to reflect the affirmative nature of the check now" [puppet] - 10https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: 10CDobbins) [18:55:45] !log dzahn@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'. [18:56:18] (03CR) 10BCornwall: "(unresolved)" [puppet] - 10https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: 10CDobbins) [18:56:21] !log dzahn@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'. [18:57:17] !log dzahn@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
[19:00:06] (03CR) 10BCornwall: [C:03+1] Remove puppet/config-master records pointint to puppetmaster2001 [dns] - 10https://gerrit.wikimedia.org/r/1237463 (https://phabricator.wikimedia.org/T416606) (owner: 10Muehlenhoff) [19:00:51] !log jasmine@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker2101.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [19:01:10] !log jasmine@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker2101.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002" [19:01:11] !log jasmine@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [19:01:12] !log jasmine@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker2101.codfw.wmnet [19:01:38] (03CR) 10Dzahn: "deployed per https://wikitech.wikimedia.org/wiki/Kubernetes/Remove_a_service#Deploy_changes_to_helmfile.d/admin_ng" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1237524 (https://phabricator.wikimedia.org/T414098) (owner: 10Dzahn) [19:02:22] !log “homer following T409103” [19:02:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:26] T409103: decommission wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet - https://phabricator.wikimedia.org/T409103 [19:12:06] 10ops-codfw, 06DC-Ops, 10decommission-hardware, 10Prod-Kubernetes, and 2 others: decommission wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet - https://phabricator.wikimedia.org/T409103#11598616 (10jasmine_) a:05jasmine_→03None [19:13:20] (03PS1) 10Dzahn: zuul::executor: add Hiera key for tls_config_dir [puppet] - 10https://gerrit.wikimedia.org/r/1237983 (https://phabricator.wikimedia.org/T395938) [19:13:26] 06SRE, 06Traffic, 13Patch-For-Review: Offer 
AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11598623 (10ssingh) `ns1` v6 went live today. We will wait for a bit more data to come in but we can see an uptick in connections to the v6 already. @BBlack suggested that instead of doing unicas... [19:14:42] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [19:25:20] (03CR) 10Dzahn: [C:03+2] zuul::executor: add Hiera key for tls_config_dir [puppet] - 10https://gerrit.wikimedia.org/r/1237983 (https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [19:27:15] (03CR) 10Dzahn: [C:03+2] "https://gerrit.wikimedia.org/r/c/operations/puppet/+/1237983" [puppet] - 10https://gerrit.wikimedia.org/r/1237543 (https://phabricator.wikimedia.org/T395938) (owner: 10Dzahn) [19:32:11] jouncebot: now [19:32:11] No deployments scheduled for the next 1 hour(s) and 27 minute(s) [19:32:29] I am restarting Gerrit because of a broken replication to github [19:34:15] !log restarting Gerrit to fix broken replication to GitHub # T416912 [19:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:34:18] T416912: Replication to GitHub seems to have stalled - https://phabricator.wikimedia.org/T416912 [19:35:46] FIRING: [5x] GerritHAProxyBackendUnavailable: Gerrit backend is unavilable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable [19:36:46] FIRING: GerritHAProxyServiceUnavailable: Gerrit tcp-proxy (HAProxy) service gerrit_ssh is DOWN in codfw - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyServiceUnavailable - 
grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyServiceUnavailable [19:37:51] new monitoring firing because gerrit was restarted [19:38:14] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-product-users, airflow-analytics-product-admins for akhatun - https://phabricator.wikimedia.org/T416703#11598713 (10mpopov) @MoritzMuehlenhoff: Oh that's a great point. Yes, the //outcome// is that @AKhatun_WMF would have admin... [19:40:46] RESOLVED: [9x] GerritHAProxyBackendUnavailable: Gerrit backend is unavilable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable [19:41:46] RESOLVED: GerritHAProxyServiceUnavailable: Gerrit tcp-proxy (HAProxy) service gerrit_ssh is DOWN in codfw - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyServiceUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyServiceUnavailable [19:42:11] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2002.codfw.wmnet with reason: WIP [19:42:40] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: WIP [19:45:45] 06SRE, 07SRE-Unowned, 06serviceops-radar, 10wikitech.wikimedia.org, 13Patch-For-Review: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#11598756 (10Andrew) I'm pretty sure I have images getting rolled into the static site now. Going to wait a few days for @taavi to find another issue... 
[19:47:54] (03PS1) 10Ssingh: bird: add return type for function (bool) [puppet] - 10https://gerrit.wikimedia.org/r/1238007 [19:49:00] (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8004/co" [puppet] - 10https://gerrit.wikimedia.org/r/1238007 (owner: 10Ssingh) [19:49:56] (03PS2) 10Ssingh: bird: add return type for function (bool) [puppet] - 10https://gerrit.wikimedia.org/r/1238007 [19:50:59] (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8005/co" [puppet] - 10https://gerrit.wikimedia.org/r/1238007 (owner: 10Ssingh) [19:51:30] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q3:rack/setup/install frdb1008 - https://phabricator.wikimedia.org/T414374#11598767 (10VRiley-WMF) [19:53:40] !log fceratto@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dborch1002.wikimedia.org with OS trixie [19:53:47] cccccbukvgbchhkektngucrhdhedkuvubvjdrbgidtnn [19:54:00] (03CR) 10Ssingh: [C:04-2] "We will merge this when everything is on bird2 2.14+. We can conditional on routed_ganeti but it's probably not worth it right now, since " [puppet] - 10https://gerrit.wikimedia.org/r/1238007 (owner: 10Ssingh) [19:54:19] (03CR) 10Dzahn: "https://phabricator.wikimedia.org/T416912#11598660" [puppet] - 10https://gerrit.wikimedia.org/r/1237450 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [19:54:38] yubikey-like typing detected [19:54:47] (03PS2) 10Cathal Mooney: wikimedia.org: add IPv6 AAAA record for ns0 [dns] - 10https://gerrit.wikimedia.org/r/1236798 (https://phabricator.wikimedia.org/T81605) [19:56:59] rzl: reference to http://www.bitboost.com/pawsense/ detected?:) [19:57:29] '"cat-like typing detected"-like typing detected'-like typing detected! 
[19:57:40] hehehe [20:00:19] (03PS1) 10Zabe: Start reading from il_target_id on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238010 (https://phabricator.wikimedia.org/T413669) [20:01:44] 06SRE, 10envoy, 06ServiceOps new, 10ServiceOps-Services-Oids: Upgrade Envoy to v1.35.7 - https://phabricator.wikimedia.org/T410975#11598807 (10MLechvien-WMF) @RLazarus can we close this? If not can you update the description to what is exactly needed here? [20:07:01] jouncebot: nowandnext [20:07:02] No deployments scheduled for the next 0 hour(s) and 52 minute(s) [20:07:02] In 0 hour(s) and 52 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T2100) [20:07:27] FIRING: HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlstaging@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=kserve - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [20:09:03] (03PS1) 10Ssingh: hiera: common: authdns_addrs: remove redundant comments [puppet] - 10https://gerrit.wikimedia.org/r/1238011 [20:09:03] (03PS1) 10Ssingh: hiera: dnsbox: announce ns0 v6 IP as anycast [puppet] - 10https://gerrit.wikimedia.org/r/1238012 (https://phabricator.wikimedia.org/T81605) [20:10:02] (03PS2) 10DLynch: EditCheck: add instrumentation for checks seen during edit session [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237606 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius) [20:10:06] (03CR) 10Zabe: [C:03+2] Configure Hadoop source for Mostcategories computations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237596 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [20:12:02] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC late backport 
window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237606 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius) [20:12:06] (03PS2) 10Cathal Mooney: wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1236803 (https://phabricator.wikimedia.org/T81605) [20:12:09] (03Merged) 10jenkins-bot: Configure Hadoop source for Mostcategories computations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237596 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [20:12:44] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1237596|Configure Hadoop source for Mostcategories computations (T413362)]] [20:12:47] T413362: Move Mostcategories computation to Hadoop - https://phabricator.wikimedia.org/T413362 [20:14:36] !log zabe@deploy2002 zabe: Backport for [[gerrit:1237596|Configure Hadoop source for Mostcategories computations (T413362)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:14:59] 06SRE, 06Traffic, 13Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11598835 (10ssingh) There is unfortunately one more problem here. Our Puppetization for `bird` means that we can't set up v6 addresses without v4 ones, in the same configuration. This stems from... 
[20:15:23] !log zabe@deploy2002 zabe: Continuing with sync [20:17:51] 06SRE, 06collaboration-services, 06Traffic, 06Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11598842 (10Dzahn) [20:17:59] 10SRE-Access-Requests: Grant Access to analytics-privatedata-users for AJAVED-WMF - https://phabricator.wikimedia.org/T416922 (10AJaved-WMF) 03NEW [20:18:20] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q3:rack/setup/install frdb1008 - https://phabricator.wikimedia.org/T414374#11598853 (10VRiley-WMF) [20:19:24] (03PS1) 10Cathal Mooney: DNS IPv6 anycast: change router config to support new ranges [homer/public] - 10https://gerrit.wikimedia.org/r/1238015 (https://phabricator.wikimedia.org/T81605) [20:19:35] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237596|Configure Hadoop source for Mostcategories computations (T413362)]] (duration: 06m 51s) [20:19:38] T413362: Move Mostcategories computation to Hadoop - https://phabricator.wikimedia.org/T413362 [20:21:12] (03Abandoned) 10Cathal Mooney: wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1236803 (https://phabricator.wikimedia.org/T81605) (owner: 10Cathal Mooney) [20:25:22] (03PS3) 10Cathal Mooney: wikimedia.org: add IPv6 AAAA record for ns0 [dns] - 10https://gerrit.wikimedia.org/r/1236798 (https://phabricator.wikimedia.org/T81605) [20:25:22] (03PS1) 10Cathal Mooney: wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1238019 (https://phabricator.wikimedia.org/T81605) [20:26:11] (03CR) 10CI reject: [V:04-1] wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1238019 (https://phabricator.wikimedia.org/T81605) (owner: 10Cathal Mooney) [20:27:34] (03PS2) 10Zabe: Use Hadoop for Mostcategories on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237597 (https://phabricator.wikimedia.org/T413362) [20:27:52] (03PS2) 10Cathal 
Mooney: wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1238019 (https://phabricator.wikimedia.org/T81605) [20:28:42] (03CR) 10CI reject: [V:04-1] wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1238019 (https://phabricator.wikimedia.org/T81605) (owner: 10Cathal Mooney) [20:29:05] (03PS3) 10Zabe: Use Hadoop for Mostcategories on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237597 (https://phabricator.wikimedia.org/T413362) [20:29:40] (03PS3) 10Cathal Mooney: wikimedia.org: add IPv6 AAAA record for ns2 [dns] - 10https://gerrit.wikimedia.org/r/1238019 (https://phabricator.wikimedia.org/T81605) [20:31:38] 06SRE, 06ServiceOps new, 06Traffic: Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200#11598920 (10MLechvien-WMF) a:03Clement_Goubert Hi Clement, would you be able to triage this task? is it still relevant given the plans for API Gateway? [20:32:06] 06SRE, 06ServiceOps new, 10ServiceOps-SharedInfra, 06Traffic: Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200#11598924 (10MLechvien-WMF) [20:32:29] (03CR) 10Zabe: [C:03+2] Use Hadoop for Mostcategories on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237597 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [20:34:08] (03Merged) 10jenkins-bot: Use Hadoop for Mostcategories on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237597 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [20:34:30] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1237597|Use Hadoop for Mostcategories on testwiki (T413362)]] [20:34:33] T413362: Move Mostcategories computation to Hadoop - https://phabricator.wikimedia.org/T413362 [20:36:25] !log zabe@deploy2002 zabe: Backport for [[gerrit:1237597|Use Hadoop for Mostcategories on testwiki (T413362)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:36:38] (03PS1) 10Cathal Mooney: Change ns1.wikimedia.org IPv6 address to anycast [dns] - 10https://gerrit.wikimedia.org/r/1238020 (https://phabricator.wikimedia.org/T81605) [20:36:47] (03PS1) 10Dzahn: cloudgw: add gerrit-lb IPs to Cloud VPS egress NAT exemption list [puppet] - 10https://gerrit.wikimedia.org/r/1238021 (https://phabricator.wikimedia.org/T411895) [20:36:50] !log zabe@deploy2002 zabe: Continuing with sync [20:37:27] (03PS2) 10Cathal Mooney: wikimedia.org: change IPv6 AAAA records for ns1 to anycast [dns] - 10https://gerrit.wikimedia.org/r/1238020 (https://phabricator.wikimedia.org/T81605) [20:40:52] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237597|Use Hadoop for Mostcategories on testwiki (T413362)]] (duration: 06m 23s) [20:40:56] T413362: Move Mostcategories computation to Hadoop - https://phabricator.wikimedia.org/T413362 [20:42:06] (03PS1) 10Cathal Mooney: anycast prefixes: remove old ns1.wikimedia.org IPv6 address [homer/public] - 10https://gerrit.wikimedia.org/r/1238022 (https://phabricator.wikimedia.org/T81605) [20:42:40] (03CR) 10Cathal Mooney: [C:04-1] "Not to be merged before https://gerrit.wikimedia.org/r/c/operations/dns/+/1238020/2" [homer/public] - 10https://gerrit.wikimedia.org/r/1238022 (https://phabricator.wikimedia.org/T81605) (owner: 10Cathal Mooney) [20:45:29] 06SRE, 06ServiceOps new: Separate mediawiki latency metrics by endpoint - https://phabricator.wikimedia.org/T263727#11598970 (10MLechvien-WMF) p:05High→03Low a:03hnowlan Changing the priority to Low as this was not actioned for quite some time. @hnowlan do you recall what this was about, and is this sti... 
[20:47:25] (03PS1) 10Bking: opensearch-on-k8s: Align resources with docs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) [20:51:17] (03PS2) 10Bking: opensearch-on-k8s: Align resources with docs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) [20:51:52] (03PS1) 10Zabe: Reenable MostCategories on frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238025 (https://phabricator.wikimedia.org/T413362) [20:54:52] (03CR) 10Andrew Bogott: [C:03+1] "looks right. I'm not 100% sure that this is the only change we need for that checkbox." [puppet] - 10https://gerrit.wikimedia.org/r/1238021 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn) [20:55:08] 06SRE, 06Traffic, 13Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11599012 (10cmooney) Yeah that seems fine to me. I've allocated some [[ https://netbox.wikimedia.org/ipam/prefixes/1398/prefixes/ | new /48s ]] from our RIPE /29 range not for these, and created... [20:59:18] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [21:00:05] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T2100). [21:00:05] _Gerges, cscott, jan_drewniak, and cmede: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[21:01:35] \o/ [21:02:52] (03PS1) 10Zabe: Using Hadoop for MostTranscludedPages on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238028 (https://phabricator.wikimedia.org/T416927) [21:03:46] (03CR) 10Zabe: [C:04-2] Using Hadoop for MostTranscludedPages on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238028 (https://phabricator.wikimedia.org/T416927) (owner: 10Zabe) [21:04:19] <_Gerges_> here [21:04:27] I can deploy if no one else is around [21:07:13] (03CR) 10Zabe: [C:03+2] Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237902 (https://phabricator.wikimedia.org/T416779) (owner: 10GergesShamon) [21:08:04] (03Merged) 10jenkins-bot: Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237902 (https://phabricator.wikimedia.org/T416779) (owner: 10GergesShamon) [21:08:23] (03CR) 10Ryan Kemper: [C:03+1] "Looks good (defaulting to smallest size)." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) (owner: 10Bking) [21:08:28] _Gerges_: should there be an alias for the old meta namespace? [21:09:11] o/ (sorry, a bit late) [21:09:13] <_Gerges_> Yes, i forget :/ [21:09:17] i can spiderpig [21:09:38] _Gerges_: could you upload a follow-up, we can deploy them together then? [21:10:29] <_Gerges_> Yes [21:10:42] jan_drewniak: i can deploy your config change with mine, if you like. [21:11:11] (03CR) 10Zabe: [C:03+2] EditCheck: add instrumentation for checks seen during edit session [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237606 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius) [21:11:14] hey cscott, yes please! 
[21:12:51] (03Merged) 10jenkins-bot: EditCheck: add instrumentation for checks seen during edit session [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1237606 (https://phabricator.wikimedia.org/T413419) (owner: 10Medelius) [21:13:25] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:13:57] (03PS1) 10GergesShamon: Add alias for arwikibooks namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238031 [21:14:05] zabe/_Gerges_ we're currently waiting for your patch to deploy, is that right? [21:14:14] (03CR) 10Bking: "Good catch, I think we should add an override for`opensearch-ipoid-test` to make it match `opensearch-ipoid`, that would be less confusing" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) (owner: 10Bking) [21:14:40] 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on wdqs1028 - https://phabricator.wikimedia.org/T416736#11599161 (10Jclark-ctr) a:03Jclark-ctr This server is out of warranty. @wiki_willy this server is 6 years and 5 months are they due to be replaced i did not see them on upcoming procurement doc. [21:14:44] <_Gerges_> @cscott done [21:14:57] <_Gerges_> https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1238031 this patch [21:14:58] shall i get started then? 
i can spiderpig [21:15:23] (03PS3) 10Bking: opensearch-on-k8s: Align resources with docs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) [21:15:26] <_Gerges_> @zabe [21:15:36] yes, sorry [21:15:53] cscott: yes, I waited for gerges to upload a follow-up [21:16:06] <_Gerges_> i uploaded [21:16:14] <_Gerges_> https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1238031 [21:16:19] (03CR) 10Zabe: [C:03+2] Add alias for arwikibooks namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238031 (owner: 10GergesShamon) [21:16:28] 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on wdqs1028 - https://phabricator.wikimedia.org/T416736#11599167 (10Jclark-ctr) I did find T405276 but it is listed as expansion [21:16:41] zabe: you want to finish 1238031 before i start then? [21:17:00] yes please, since I already merged https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1237902 [21:17:10] (03Merged) 10jenkins-bot: Add alias for arwikibooks namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238031 (owner: 10GergesShamon) [21:17:14] and also since your i18n rebuild will take like 50 min [21:17:34] ok, just let me know when you are done. [21:17:43] will do [21:18:27] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1238031|Add alias for arwikibooks namespace]], [[gerrit:1237902|Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). 
(T416779)]], [[gerrit:1237606|EditCheck: add instrumentation for checks seen during edit session (T413419 T412334)]] [21:18:33] (i was going to deploy 1237373, 999060, and jan_drewniak's 1237348 first, that shouldn't trigger i18n i don't think) [21:18:34] T416779: Change wgSiteName and wgMetaNamespace for arwikibooks - https://phabricator.wikimedia.org/T416779 [21:18:34] T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419 [21:18:35] T412334: Add instrumentation to measure suggestion visibility within VisualEditor - https://phabricator.wikimedia.org/T412334 [21:19:04] right [21:19:38] ehh [21:19:49] actually its probably me now who needs to rebuild i18n [21:20:07] cmedes patch does add messages [21:20:14] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/1237606 [21:21:06] !log zabe@deploy2002 sync-world aborted: Backport for [[gerrit:1238031|Add alias for arwikibooks namespace]], [[gerrit:1237902|Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). 
(T416779)]], [[gerrit:1237606|EditCheck: add instrumentation for checks seen during edit session (T413419 T412334)]] (duration: 02m 38s) [21:21:14] lets not rebuild i18n two times [21:21:47] (03CR) 10Ryan Kemper: [C:03+1] opensearch-on-k8s: Align resources with docs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) (owner: 10Bking) [21:22:05] (03CR) 10Ryan Kemper: [C:03+1] "Done" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) (owner: 10Bking) [21:22:23] !log zabe@deploy2002 Started scap sync-world: T416779 [21:23:06] !log zabe@deploy2002 sync-world aborted: T416779 (duration: 00m 43s) [21:24:21] cscott: ok, before we end up doing two i18n rebuilds, would you prefer me to include your change in the deploy or would you prefer to include cmede's patch while you are deploying yours? [21:25:52] i was going to do the multititle extension patches (four of them) at once, but in a separate batch from the other two config patches (magic links, parsoid read views) and jan's site notices patch. [21:25:56] 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on wdqs1028 - https://phabricator.wikimedia.org/T416736#11599198 (10bking) @Jclark-ctr sorry for the confusion, we recycled some old Elastic hosts into WDQS hosts in T409769 and T409769. We did this **after** asking for an expansion in T405276 , which threw off the... [21:26:14] (03CR) 10Bking: [C:03+2] opensearch-on-k8s: Align resources with docs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1238024 (https://phabricator.wikimedia.org/T409501) (owner: 10Bking) [21:26:31] i'm happy to throw in the editcheck patch along with the multititle group, or you can do that group, either one is fine by me. multititle shouldn't have any effect on production, just on beta. 
[21:26:32] i can do mine tomorrow morning instead if needed [21:26:57] it actually saves a little time to do it now, since we need to rebuild the i18n for the new extension anyway [21:27:03] ok sweet, thank you [21:27:17] (03PS1) 10Zabe: Revert "EditCheck: add instrumentation for checks seen during edit session" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1238032 [21:27:21] (03CR) 10Zabe: [V:03+2 C:03+2] Revert "EditCheck: add instrumentation for checks seen during edit session" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1238032 (owner: 10Zabe) [21:27:32] but i was going to do all the non-i18n config changes in a batch, just to get them done quickly. [21:28:05] <_Gerges_> @zabe Do you need me right now? I'm currently busy. Could you test the patch instead of me? [21:28:06] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1238031|Add alias for arwikibooks namespace]], [[gerrit:1237902|Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779)]] [21:28:10] T416779: Change wgSiteName and wgMetaNamespace for arwikibooks - https://phabricator.wikimedia.org/T416779 [21:28:13] sure, I can test it [21:28:21] cscott: thanks [21:28:21] 1238031 can go in the same batch as 1237373, 999060, and 1237348. 
[21:28:32] (03PS1) 10Zabe: Revert^2 "EditCheck: add instrumentation for checks seen during edit session" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1238033 [21:29:01] <_Gerges_> bye [21:29:47] 07Puppet, 10Gerrit: Puppet should restart Gerrit when changing it's replication config - https://phabricator.wikimedia.org/T416929 (10bd808) 03NEW [21:29:50] (03PS2) 10Zabe: Revert^2 "EditCheck: add instrumentation for checks seen during edit session" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1238033 (https://phabricator.wikimedia.org/T413419) [21:39:12] (sorry, i've lost track of our status. are we waiting for something?) [21:39:58] (03CR) 10Ladsgroup: [C:03+1] Reenable MostCategories on frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238025 (https://phabricator.wikimedia.org/T413362) (owner: 10Zabe) [21:40:04] i'm not sure to be honest [21:40:21] on scap, even though I reverted cmede's patch to avoid the i18n rebuild, it is still very slow [21:40:41] ah, i'm sorry [21:40:51] this is scapping the 1238031 patch, right? [21:41:01] yes [21:41:16] i've been spoiled by how spiderpig lets me watch an ongoing scap :) [21:41:36] (03CR) 10Andrew Bogott: [C:04-1] "After (lengthy!) discussion we are going to try leaving gerrit out of the dmz and letting wmcs access it via the normal nat for public ser" [puppet] - 10https://gerrit.wikimedia.org/r/1238021 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn) [21:43:21] "Waiting 300 seconds for swift after full mediawiki image build (T390251)" [21:43:21] T390251: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251 [21:43:22] ... 
[21:43:35] I'm so sorry [21:45:26] (03PS1) 10Andrew Bogott: cloudgw: remove gerrit dmz entries [puppet] - 10https://gerrit.wikimedia.org/r/1238036 [21:46:02] FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate _etcd-server-ssl._tcp.ml_etcd.codfw.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [21:46:54] (03CR) 10Cathal Mooney: [C:03+1] "Bear in mind this might cause some momentary connection resets after the rule change, but all new connections after should be ok" [puppet] - 10https://gerrit.wikimedia.org/r/1238036 (owner: 10Andrew Bogott) [21:48:23] (03CR) 10Andrew Bogott: [C:03+2] cloudgw: remove gerrit dmz entries [puppet] - 10https://gerrit.wikimedia.org/r/1238036 (owner: 10Andrew Bogott) [21:51:57] !log zabe@deploy2002 gergesshamon, zabe: Backport for [[gerrit:1238031|Add alias for arwikibooks namespace]], [[gerrit:1237902|Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:52:00] T416779: Change wgSiteName and wgMetaNamespace for arwikibooks - https://phabricator.wikimedia.org/T416779 [21:52:57] !log zabe@deploy2002 gergesshamon, zabe: Continuing with sync [21:57:14] 07Puppet, 10Gerrit: Puppet should restart Gerrit when changing it's replication config - https://phabricator.wikimedia.org/T416929#11599302 (10bd808) Now that I understand what I'm reading, https://wikitech.wikimedia.org/wiki/Gerrit/Administration#Troubleshooting documents this problem as well: > Any changes... [21:58:31] 07Puppet, 10Gerrit: Puppet should restart Gerrit when changing it's replication config - https://phabricator.wikimedia.org/T416929#11599305 (10Reedy) [22:00:05] Reedy, sbassett, Maryum, and manfredi: #bothumor My software never has bugs. It just develops random features. 
Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260209T2200). [22:01:59] (03PS1) 10Cathal Mooney: Cloud-vrf-in: remove exception to allow Cloud VPS private IPs reach gerrit [homer/public] - 10https://gerrit.wikimedia.org/r/1238042 (https://phabricator.wikimedia.org/T411895) [22:03:08] (03PS1) 10Hashar: Revert^2 "Gerrit: Disable auto reloading replication config" [puppet] - 10https://gerrit.wikimedia.org/r/1238043 (https://phabricator.wikimedia.org/T416929) [22:03:20] Is the VE instrumentation patch still in the general queue of stuff that's going to get deployed, or do we need to come back to it later? [22:05:14] 07Puppet, 10Gerrit, 13Patch-For-Review: Puppet should restart Gerrit when changing it's replication config - https://phabricator.wikimedia.org/T416929#11599335 (10hashar) `git log -GautoReload modules/gerrit` has the history: That got disable by Paladox in October 2019 referencing "a bug". I went to enabl... [22:05:28] Depends on if zabe is ok with the window running long [22:05:35] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1238031|Add alias for arwikibooks namespace]], [[gerrit:1237902|Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779)]] (duration: 37m 29s) [22:05:38] T416779: Change wgSiteName and wgMetaNamespace for arwikibooks - https://phabricator.wikimedia.org/T416779 [22:05:41] cscott: ok, the config patch scap finished, sorry about this, no idea why it still decided to rebuild i18n [22:06:32] (03CR) 10Hashar: "See T416929. When removing the replication config for the spare Gerrit, the plugin was no more able to replicate to GitHub until Gerrit g" [puppet] - 10https://gerrit.wikimedia.org/r/1238043 (https://phabricator.wikimedia.org/T416929) (owner: 10Hashar) [22:06:33] Should we press on or call it a day? 
[22:07:46] 06SRE, 10SRE-Access-Requests: Requesting access to deployment, analytics-privatedata-users for ASanford-WMF - https://phabricator.wikimedia.org/T416710#11599345 (10thcipriani) >>! In T416710#11597014, @tappof wrote: > @thcipriani could you please approve the deployment group request? Thanks Approved for `depl... [22:07:50] 07Puppet, 10Gerrit, 13Patch-For-Review: Puppet should restart Gerrit when changing it's replication config - https://phabricator.wikimedia.org/T416929#11599347 (10Dzahn) > Or we should have some other mechanism to ensure that Gerrit is restarted following a replication config change. Or we should do restart... [22:07:59] If you still want to do your config patches, feel free to do them, I can do the two i18n patches otherwise, so that it will be faster tomorrow [22:08:27] I would like to get mine out today, but I can deploy myself if others need to go [22:09:35] I messed this up a bit, so I will at least do the ve instrumentation patch (and if you like yours) this evening, but I am flexible regarding the order in which stuff happens [22:09:38] I also don't mind stepping in to handle the VE patch myself, if need be -- we've currently got three hours of probably-empty window (security, then "web team"-that-doesn't-exist). [22:10:37] If we've got time, let's start with the config patches? [22:10:42] I can do those [22:10:55] sounds good [22:13:01] (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237373 (https://phabricator.wikimedia.org/T145604) (owner: 10C. Scott Ananian) [22:13:01] (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/999060 (https://phabricator.wikimedia.org/T357054) (owner: 10C. 
Scott Ananian) [22:13:02] (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237348 (https://phabricator.wikimedia.org/T416644) (owner: 10Jdlrobson) [22:13:26] ok, these should be fast i expect [22:14:00] (03Merged) 10jenkins-bot: Disable magic links on nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237373 (https://phabricator.wikimedia.org/T145604) (owner: 10C. Scott Ananian) [22:14:02] (03Merged) 10jenkins-bot: Turn on Parsoid read views by default on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/999060 (https://phabricator.wikimedia.org/T357054) (owner: 10C. Scott Ananian) [22:14:05] (03Merged) 10jenkins-bot: Enable site notices on Minerva [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237348 (https://phabricator.wikimedia.org/T416644) (owner: 10Jdlrobson) [22:14:27] !log cscott@deploy2002 Started scap sync-world: Backport for [[gerrit:1237373|Disable magic links on nlwiki (T145604)]], [[gerrit:999060|Turn on Parsoid read views by default on labs (T357054)]], [[gerrit:1237348|Enable site notices on Minerva (T416644)]] [22:14:35] T145604: RFC: Future of magic links - https://phabricator.wikimedia.org/T145604 [22:14:35] T357054: Use Parsoid HTML for all page views on beta cluster - https://phabricator.wikimedia.org/T357054 [22:14:35] T416644: Baby globe will not show up on Minerva on English Wikipedia in current state - https://phabricator.wikimedia.org/T416644 [22:15:19] "0 languages rebuilt out of 545" :) [22:16:22] !log cscott@deploy2002 jdlrobson, cscott: Backport for [[gerrit:1237373|Disable magic links on nlwiki (T145604)]], [[gerrit:999060|Turn on Parsoid read views by default on labs (T357054)]], [[gerrit:1237348|Enable site notices on Minerva (T416644)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [22:17:54] jan_drewniak: can you test? 
[22:18:09] i've verified magic links on nlwiki, checking PRV on beta now... [22:19:35] cscott: thanks, not really testable to be honest (unless I want to create a sitenotice on enwiki :P ) but debug servers are not broken, so it's good to sync :) [22:19:41] (03Abandoned) 10Dzahn: cloudgw: add gerrit-lb IPs to Cloud VPS egress NAT exemption list [puppet] - 10https://gerrit.wikimedia.org/r/1238021 (https://phabricator.wikimedia.org/T411895) (owner: 10Dzahn) [22:21:39] ok, my patches look good, continuing sync [22:21:43] !log cscott@deploy2002 jdlrobson, cscott: Continuing with sync [22:22:58] (03Abandoned) 10Ryan Kemper: WDQS: separate avail SLOs per service [puppet] - 10https://gerrit.wikimedia.org/r/1230672 (https://phabricator.wikimedia.org/T393966) (owner: 10Ryan Kemper) [22:25:48] !log cscott@deploy2002 Finished scap sync-world: Backport for [[gerrit:1237373|Disable magic links on nlwiki (T145604)]], [[gerrit:999060|Turn on Parsoid read views by default on labs (T357054)]], [[gerrit:1237348|Enable site notices on Minerva (T416644)]] (duration: 11m 20s) [22:25:55] T145604: RFC: Future of magic links - https://phabricator.wikimedia.org/T145604 [22:25:56] T357054: Use Parsoid HTML for all page views on beta cluster - https://phabricator.wikimedia.org/T357054 [22:25:56] zabe: next are the four (!) config patches for Extension:MultiTitle plus the cmede/Kemayo patch, which we're going to do together because we expect it will trigger an i18n rebuild? [22:25:56] T416644: Baby globe will not show up on Minerva on English Wikipedia in current state - https://phabricator.wikimedia.org/T416644 [22:27:29] Yes [22:27:31] If you prefer to reduce the risk, we could combine only https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1224793 with cmede's patch, since that is the only one of yours that should trigger an i18n rebuild.
[22:30:30] the only patch which "should" do anything is https://gerrit.wikimedia.org/r/c/1224796/ because the rest just set various config variables [22:30:30] but i'm fine with doing just the two "rebuild i18n" patches for now, and i can do the quick config stuff later. [22:31:36] I am also fine with doing all 5 together, do it however you feel more comfortable [22:31:55] since it's late let's play it safe and do just the 2 for now. [22:33:19] alright [22:33:19] 1224793+1237606 [22:33:57] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/1238033 [22:34:26] (03CR) 10Zabe: [C:03+2] Revert^2 "EditCheck: add instrumentation for checks seen during edit session" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1238033 (https://phabricator.wikimedia.org/T413419) (owner: 10Zabe) [22:34:49] (+2'ed to speed it up a bit, not intending to interfere with your deploy) [22:36:02] (03Merged) 10jenkins-bot: Revert^2 "EditCheck: add instrumentation for checks seen during edit session" [extensions/VisualEditor] (wmf/1.46.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1238033 (https://phabricator.wikimedia.org/T413419) (owner: 10Zabe) [22:38:57] (03PS1) 10C. Scott Ananian: Turn on Parsoid read views by default on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238049 (https://phabricator.wikimedia.org/T357054) [22:39:42] (03PS2) 10C.
Scott Ananian: Turn on Parsoid read views by default on labs (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238049 (https://phabricator.wikimedia.org/T357054) [22:58:33] (03PS1) 10Kosta Harlan: IPReputation: Switch to OpenSearch backend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1238051 (https://phabricator.wikimedia.org/T416164) [23:14:42] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [23:21:25] 10ops-eqiad, 06SRE, 06DC-Ops: C1 post move cleanup - https://phabricator.wikimedia.org/T416747#11599628 (10VRiley-WMF) 05Open→03Resolved Removed extra cables and rails from cabinet. [23:23:22] zabe: Is that i18n rebuild still going on in the background somewhere? [23:23:57] I am currently not deploying anything [23:24:29] Maybe I misunderstood this, but I thought cscott wanted to deploy their patches? [23:24:36] If not, I can do yours now [23:24:50] (cmede's) [23:24:54] if you could, that would be great [23:25:03] Sure [23:25:18] (03CR) 10Zabe: [C:03+2] Add MultiTitle to extension list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224793 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [23:25:37] if that's okay with cscott?
[23:26:10] (03Merged) 10jenkins-bot: Add MultiTitle to extension list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224793 (https://phabricator.wikimedia.org/T404461) (owner: 10Tbodt) [23:26:27] I will just boldly deploy their i18n patch as well for now (hope that's fine) [23:26:40] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1238033|Revert^2 "EditCheck: add instrumentation for checks seen during edit session" (T413419 T412334)]], [[gerrit:1224793|Add MultiTitle to extension list (T404461)]] [23:26:46] T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419 [23:26:46] T412334: Add instrumentation to measure suggestion visibility within VisualEditor - https://phabricator.wikimedia.org/T412334 [23:26:47] T404461: Enable Extension:MultiTitle on tok.wikipedia.org - https://phabricator.wikimedia.org/T404461 [23:51:10] is it still building? [23:51:27] !log zabe@deploy2002 tbodt, zabe: Backport for [[gerrit:1238033|Revert^2 "EditCheck: add instrumentation for checks seen during edit session" (T413419 T412334)]], [[gerrit:1224793|Add MultiTitle to extension list (T404461)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [23:51:33] T413419: Append a tag to edits in which ≥1 Edit Suggestion was visible in the browser viewport - https://phabricator.wikimedia.org/T413419 [23:51:33] T412334: Add instrumentation to measure suggestion visibility within VisualEditor - https://phabricator.wikimedia.org/T412334 [23:51:33] T404461: Enable Extension:MultiTitle on tok.wikipedia.org - https://phabricator.wikimedia.org/T404461 [23:51:35] cmede: done :) [23:51:40] woohoo! let me test [23:52:36] oh yeah, sorry, i tuned out but I thought you were doing it. thanks for doing it anyway! [23:53:29] It's all looking good for me [23:53:33] nice! [23:53:35] !log zabe@deploy2002 tbodt, zabe: Continuing with sync