[00:14:26] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:42:52] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/7 (Transit: Liberty Global (BB00088) {#021468}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[00:43:40] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-esams and LibertyGlobal (2001:730:2209:1::d52e:ba09) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[00:44:25] <wikibugs>	 (PS2) CDanis: cli: add --sort-groups and --reverse-sort options [software/cumin] - https://gerrit.wikimedia.org/r/1294990
[00:45:26] <wikibugs>	 (CR) CDanis: cli: add --sort-groups and --reverse-sort options (9 comments) [software/cumin] - https://gerrit.wikimedia.org/r/1294990 (owner: CDanis)
[01:00:03] <wikibugs>	 (Abandoned) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1295788 (owner: TrainBranchBot)
[01:09:07] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.47.0-wmf.5 [core] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296056 (https://phabricator.wikimedia.org/T423914)
[01:09:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.47.0-wmf.5 [core] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296056 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[01:09:14] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1296057
[01:09:14] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1296057 (owner: TrainBranchBot)
[01:09:14] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service lsw1-f1-codfw.mgmt.codfw.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-f1-codfw.mgmt.codfw.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[01:10:18] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=main-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[01:16:15] <wikibugs>	 (CR) C. Scott Ananian: [C:+1] Deploy PRV to 5 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1296015 (https://phabricator.wikimedia.org/T427851) (owner: Arlolra)
[01:21:04] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.47.0-wmf.5 [core] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296056 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[01:22:36] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1296057 (owner: TrainBranchBot)
[01:32:05] <wikibugs>	 (PS1) RLazarus: Copy mesh.networkpolicy 1.2.1 -> 1.2.2 [deployment-charts] - https://gerrit.wikimedia.org/r/1296063
[01:32:05] <wikibugs>	 (PS1) RLazarus: Copy mesh.configuration 1.15.2 -> 1.15.3 [deployment-charts] - https://gerrit.wikimedia.org/r/1296064
[01:32:06] <wikibugs>	 (PS1) RLazarus: mesh.networkpolicy: Handle a services_proxy entry with no upstream.ips [deployment-charts] - https://gerrit.wikimedia.org/r/1296065 (https://phabricator.wikimedia.org/T427863)
[01:32:07] <wikibugs>	 (PS1) RLazarus: Copy mesh.service 1.2.0 -> 1.2.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1296066
[01:32:07] <wikibugs>	 (PS1) RLazarus: mesh.configuration: Add restricted_listeners [deployment-charts] - https://gerrit.wikimedia.org/r/1296067 (https://phabricator.wikimedia.org/T427863)
[01:32:12] <wikibugs>	 (PS1) RLazarus: mesh.service: Add TLS service ports for restricted_listeners [deployment-charts] - https://gerrit.wikimedia.org/r/1296068 (https://phabricator.wikimedia.org/T427863)
[01:32:16] <wikibugs>	 (PS1) RLazarus: function-{evaluator,orchestrator}: sextant update mesh modules [deployment-charts] - https://gerrit.wikimedia.org/r/1296069 (https://phabricator.wikimedia.org/T427863)
[01:32:20] <wikibugs>	 (PS1) RLazarus: orchestrator: Add restricted_listeners ports to network egress policy [deployment-charts] - https://gerrit.wikimedia.org/r/1296070 (https://phabricator.wikimedia.org/T427863)
[01:32:24] <wikibugs>	 (PS1) RLazarus: wikifunctions: Add mesh.restricted_listeners port to orchestrator [deployment-charts] - https://gerrit.wikimedia.org/r/1296071 (https://phabricator.wikimedia.org/T427863)
[01:32:28] <wikibugs>	 (PS1) RLazarus: function-evaluator: Add outgoing Envoy config and egress policy for callbacks [deployment-charts] - https://gerrit.wikimedia.org/r/1296072 (https://phabricator.wikimedia.org/T427863)
[01:33:34] <wikibugs>	 (PS1) RLazarus: services_proxy: "Reserve" local port 6520 for wikifunctions orchestrator [puppet] - https://gerrit.wikimedia.org/r/1296073 (https://phabricator.wikimedia.org/T427863)
[01:38:03] <wikibugs>	 (CR) RLazarus: [C:+2] services_proxy: "Reserve" local port 6520 for wikifunctions orchestrator [puppet] - https://gerrit.wikimedia.org/r/1296073 (https://phabricator.wikimedia.org/T427863) (owner: RLazarus)
[01:42:40] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:43:03] <wikibugs>	 (CR) Krinkle: P:cache:haproxy add image generator information (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1295921 (https://phabricator.wikimedia.org/T414338) (owner: Slyngshede)
[01:47:52] <jinxer-wm>	 RESOLVED: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/7 (Transit: Liberty Global (BB00088) {#021468}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[01:48:40] <jinxer-wm>	 RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr2-esams and LibertyGlobal (2001:730:2209:1::d52e:ba09) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[02:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T0200)
[02:08:56] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:24:19] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2011.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[02:26:19] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[02:28:34] <wikibugs>	 (PS2) RLazarus: mesh.configuration: Add restricted_listeners [deployment-charts] - https://gerrit.wikimedia.org/r/1296067 (https://phabricator.wikimedia.org/T427863)
[02:28:34] <wikibugs>	 (PS2) RLazarus: mesh.service: Add TLS service ports for restricted_listeners [deployment-charts] - https://gerrit.wikimedia.org/r/1296068 (https://phabricator.wikimedia.org/T427863)
[02:28:34] <wikibugs>	 (PS2) RLazarus: function-{evaluator,orchestrator}: sextant update mesh modules [deployment-charts] - https://gerrit.wikimedia.org/r/1296069 (https://phabricator.wikimedia.org/T427863)
[02:28:35] <wikibugs>	 (PS2) RLazarus: orchestrator: Add restricted_listeners ports to network egress policy [deployment-charts] - https://gerrit.wikimedia.org/r/1296070 (https://phabricator.wikimedia.org/T427863)
[02:28:36] <wikibugs>	 (PS2) RLazarus: wikifunctions: Add mesh.restricted_listeners port to orchestrator [deployment-charts] - https://gerrit.wikimedia.org/r/1296071 (https://phabricator.wikimedia.org/T427863)
[02:28:37] <wikibugs>	 (PS2) RLazarus: function-evaluator: Add outgoing Envoy config and egress policy for callbacks [deployment-charts] - https://gerrit.wikimedia.org/r/1296072 (https://phabricator.wikimedia.org/T427863)
[02:29:19] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[02:29:19] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2014.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[02:31:19] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[02:33:07] <wikibugs>	 (CR) RLazarus: "See https://gerrit.wikimedia.org/r/1296072 for why this is needed." [deployment-charts] - https://gerrit.wikimedia.org/r/1296065 (https://phabricator.wikimedia.org/T427863) (owner: RLazarus)
[02:33:09] <wikibugs>	 (CR) RLazarus: mesh.configuration: Add restricted_listeners (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1296067 (https://phabricator.wikimedia.org/T427863) (owner: RLazarus)
[02:34:19] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[02:35:56] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:37:19] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[02:37:19] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[03:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T0300)
[03:14:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:49:26] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:00:05] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T0400)
[04:05:40] <logmsgbot>	 !log mwpresync@deploy1003 Pruned MediaWiki: 1.47.0-wmf.2 (duration: 05m 33s)
[04:13:56] <wikibugs>	 (CR) Ryan Kemper: [C:+2] dse-k8s-codfw: Add wdqs namespaces for the new deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1295465 (https://phabricator.wikimedia.org/T425007) (owner: Trueg)
[04:22:05] <wikibugs>	 (Merged) jenkins-bot: dse-k8s-codfw: Add wdqs namespaces for the new deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1295465 (https://phabricator.wikimedia.org/T425007) (owner: Trueg)
[04:35:13] <wikibugs>	 SRE, Traffic, Patch-For-Review: Move contact info detection at the edge to a lua module - https://phabricator.wikimedia.org/T414300#11974879 (Joe) Open→Resolved
[04:36:26] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[04:40:43] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[04:46:39] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[04:49:51] <ryankemper>	 !log T425007 (k8s) created 4 wdqs namespaces on `dse-k8s-codfw`'s `admin_ng` ns: `wdqs-[internal,external]` & `wdqs-[internal,external]-next`; certs issued
[04:49:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:54] <stashbot>	 T425007: Helm chart for wdqs-qlever and wdqs-streaming-consumer - https://phabricator.wikimedia.org/T425007
[04:56:13] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[04:59:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:00:32] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[05:02:08] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[05:05:52] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[05:06:13] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1052: Upgrading es1052.eqiad.wmnet
[05:06:43] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1052: Upgrading es1052.eqiad.wmnet
[05:07:26] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1052.eqiad.wmnet with OS trixie
[05:09:14] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service lsw1-f1-codfw.mgmt.codfw.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-f1-codfw.mgmt.codfw.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[05:10:18] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=main-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[05:14:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:19:04] <wikibugs>	 (PS14) Trueg: wdqs-backend: Deployment chart for the WDQS triple-store [deployment-charts] - https://gerrit.wikimedia.org/r/1286374 (https://phabricator.wikimedia.org/T425007)
[05:21:01] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[05:22:43] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1052.eqiad.wmnet with reason: host reimage
[05:25:12] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[05:26:13] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[05:28:49] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[05:29:18] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1052.eqiad.wmnet with reason: host reimage
[05:29:44] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[05:30:34] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[05:33:20] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[05:36:40] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[05:42:40] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:45:21] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1052.eqiad.wmnet with OS trixie
[05:46:22] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[05:47:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-eqdfw and fe80::b6f9:5dff:fe30:e538 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:47:21] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[05:48:44] <logmsgbot>	 marostegui@cumin1003 major-upgrade (PID 3861657) is awaiting input
[05:50:47] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[05:50:55] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1052: repool after upgrade
[05:51:39] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T426088
[05:51:42] <stashbot>	 T426088: Switchover s7 master (db1181 -> db1236) - https://phabricator.wikimedia.org/T426088
[05:51:54] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1236 with weight 0 T426088', diff saved to https://phabricator.wikimedia.org/P93470 and previous config saved to /var/cache/conftool/dbconfig/20260602-055153-marostegui.json
[05:52:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-eqdfw and fe80::b6f9:5dff:fe30:e538 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:52:17] <wikibugs>	 (PS2) Gerrit maintenance bot: mariadb: Promote db1236 to s7 master [puppet] - https://gerrit.wikimedia.org/r/1286416 (https://phabricator.wikimedia.org/T426088)
[05:53:07] <wikibugs>	 (PS1) Marostegui: wmnet: Update s7 CNAME [dns] - https://gerrit.wikimedia.org/r/1296248 (https://phabricator.wikimedia.org/T426088)
[05:53:17] <wikibugs>	 (Abandoned) Marostegui: wmnet: Update s7-master alias [dns] - https://gerrit.wikimedia.org/r/1286417 (https://phabricator.wikimedia.org/T426088) (owner: Gerrit maintenance bot)
[05:54:36] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote db1236 to s7 master [puppet] - https://gerrit.wikimedia.org/r/1286416 (https://phabricator.wikimedia.org/T426088) (owner: Gerrit maintenance bot)
[06:00:02] <marostegui>	 !log Starting s7 eqiad failover from db1181 to db1236 - T426088
[06:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T0600)
[06:00:05] <jouncebot>	 marostegui, Amir1, and federico3: How many deployers does it take to do Primary database switchover deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T0600).
[06:00:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:06] <stashbot>	 T426088: Switchover s7 master (db1181 -> db1236) - https://phabricator.wikimedia.org/T426088
[06:00:19] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T426088', diff saved to https://phabricator.wikimedia.org/P93471 and previous config saved to /var/cache/conftool/dbconfig/20260602-060018-marostegui.json
[06:00:42] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1236 to s7 primary and set section read-write T426088', diff saved to https://phabricator.wikimedia.org/P93472 and previous config saved to /var/cache/conftool/dbconfig/20260602-060041-marostegui.json
[06:01:11] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update s7 CNAME [dns] - https://gerrit.wikimedia.org/r/1296248 (https://phabricator.wikimedia.org/T426088) (owner: Marostegui)
[06:01:23] <logmsgbot>	 !log marostegui@dns1004 START - running authdns-update
[06:01:58] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1181 T426088', diff saved to https://phabricator.wikimedia.org/P93473 and previous config saved to /var/cache/conftool/dbconfig/20260602-060157-marostegui.json
[06:02:49] <logmsgbot>	 !log marostegui@dns1004 END - running authdns-update
[06:04:09] <wikibugs>	 (PS1) Marostegui: db1181: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1296249 (https://phabricator.wikimedia.org/T425388)
[06:04:41] <icinga-wm>	 PROBLEM - orchestrator resolve cache non-FQDNs on dborch1002 is CRITICAL: CRITICAL: 2 non-FQDN entries in orchestrator resolve cache: https://wikitech.wikimedia.org/wiki/Orchestrator
[06:04:43] <wikibugs>	 (CR) Marostegui: [C:+2] db1181: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1296249 (https://phabricator.wikimedia.org/T425388) (owner: Marostegui)
[06:04:59] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[06:05:08] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1181: Upgrading db1181.eqiad.wmnet
[06:05:48] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1181: Upgrading db1181.eqiad.wmnet
[06:06:41] <icinga-wm>	 RECOVERY - orchestrator resolve cache non-FQDNs on dborch1002 is OK: OK: all orchestrator resolve cache entries are FQDNs https://wikitech.wikimedia.org/wiki/Orchestrator
[06:07:26] <wikibugs>	 (PS1) Jelto: miscweb: update wmf-navigator images [deployment-charts] - https://gerrit.wikimedia.org/r/1296250 (https://phabricator.wikimedia.org/T414405)
[06:08:33] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db1181.eqiad.wmnet with OS trixie
[06:10:24] <wikibugs>	 (CR) Jelto: [C:+2] miscweb: update wmf-navigator images [deployment-charts] - https://gerrit.wikimedia.org/r/1296250 (https://phabricator.wikimedia.org/T414405) (owner: Jelto)
[06:12:46] <wikibugs>	 (Merged) jenkins-bot: miscweb: update wmf-navigator images [deployment-charts] - https://gerrit.wikimedia.org/r/1296250 (https://phabricator.wikimedia.org/T414405) (owner: Jelto)
[06:15:46] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
[06:16:21] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
[06:21:20] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
[06:22:12] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
[06:24:31] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: host reimage
[06:29:51] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: host reimage
[06:30:30] <wikibugs>	 (PS1) Muehlenhoff: profile::firewall: Allow to provide more fine-grained access from monitoring [puppet] - https://gerrit.wikimedia.org/r/1296251
[06:36:19] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1052: repool after upgrade
[06:36:45] <icinga-wm>	 PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100%
[06:36:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2045.codfw.wmnet
[06:36:57] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:37:22] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2045.codfw.wmnet
[06:37:23] <icinga-wm>	 RECOVERY - Host titan1002 is UP: PING WARNING - Packet loss = 90%, RTA = 739.19 ms
[06:37:33] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2045.codfw.wmnet
[06:38:15] <wikibugs>	 (CR) Arnaudb: [C:+1] "question inline, looks good to me!" [puppet] - https://gerrit.wikimedia.org/r/1295967 (https://phabricator.wikimedia.org/T412780) (owner: Dzahn)
[06:40:36] <logmsgbot>	 jmm@cumin2002 drain-node (PID 3495911) is awaiting input
[06:41:38] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[06:41:56] <wikibugs>	 (PS1) Marostegui: Revert "db1181: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1296252
[06:41:57] <jinxer-wm>	 RESOLVED: [3x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:41:59] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2053: Upgrading es2053.codfw.wmnet
[06:42:21] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2053: Upgrading es2053.codfw.wmnet
[06:42:59] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db1181: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1296252 (owner: Marostegui)
[06:43:21] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2053.codfw.wmnet with OS trixie
[06:46:44] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1181.eqiad.wmnet with OS trixie
[06:50:48] <wikibugs>	 (CR) DCausse: [C:+1] translate: adding separate read/write endpoints [mediawiki-config] - https://gerrit.wikimedia.org/r/1294949 (https://phabricator.wikimedia.org/T425377) (owner: Atsuko)
[06:55:26] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[06:55:32] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1181: Migration of db1181.eqiad.wmnet completed
[06:55:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1180 (T426633)', diff saved to https://phabricator.wikimedia.org/P93478 and previous config saved to /var/cache/conftool/dbconfig/20260602-065533-fceratto.json
[06:57:05] <wikibugs>	 (CR) Marostegui: "Just a typo and a question, can you run this PCC for also the following hosts, just in case:" [puppet] - https://gerrit.wikimedia.org/r/1296251 (owner: Muehlenhoff)
[06:59:27] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2053.codfw.wmnet with reason: host reimage
[07:00:05] <jouncebot>	 Amir1, urbanecm, and awight: Your horoscope predicts another UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T0700).
[07:00:05] <jouncebot>	 atsukoito: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:01:27] <atsukoito>	 hi!
[07:01:33] <dcausse>	 o/
[07:02:37] <wikibugs>	 (PS2) Muehlenhoff: profile::firewall: Allow to provide more fine-grained access from monitoring [puppet] - https://gerrit.wikimedia.org/r/1296251
[07:02:47] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Apply cluster::management role to cumin2003 [puppet] - https://gerrit.wikimedia.org/r/1289272 (owner: Muehlenhoff)
[07:04:18] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2053.codfw.wmnet with reason: host reimage
[07:05:10] <wikibugs>	 (CR) Slyngshede: [C:+1] admin: upgrade Mahmoud Abdelsattar from ldap_only to shell user [puppet] - https://gerrit.wikimedia.org/r/1295952 (https://phabricator.wikimedia.org/T427597) (owner: Dzahn)
[07:12:20] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.depool depool db2241: Depool for rack maintenance
[07:12:23] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2186.codfw.wmnet with reason: upgrade
[07:12:52] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2241: Depool for rack maintenance
[07:12:54] <wikibugs>	 (CR) Dpogorzelski: [C:+1] ml-services: Bump llm ns memory quota to 256Gi. [deployment-charts] - https://gerrit.wikimedia.org/r/1295958 (owner: Bartosz Wójtowicz)
[07:13:04] <wikibugs>	 (CR) Dpogorzelski: [C:+2] ml-services: Bump llm ns memory quota to 256Gi. [deployment-charts] - https://gerrit.wikimedia.org/r/1295958 (owner: Bartosz Wójtowicz)
[07:14:47] <marostegui>	 !log Install mariadb 10.11.17 on db2186 T427345
[07:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:50] <stashbot>	 T427345: Compile and package MariaDB 10.11.17 - https://phabricator.wikimedia.org/T427345
[07:15:02] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2241.codfw.wmnet with reason: Depool for rack maintenance
[07:16:04] <wikibugs>	 (PS1) Muehlenhoff: mariadb::wmf_root_client: Add cumin2003 [puppet] - https://gerrit.wikimedia.org/r/1296255
[07:16:15] <wikibugs>	 (PS2) Muehlenhoff: mariadb::wmf_root_client: Add cumin2003 [puppet] - https://gerrit.wikimedia.org/r/1296255
[07:16:24] <atsukoito>	 dcausse: i'm ready to backport 1294949: translate: adding separate read/write endpoints | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1294949
[07:16:54] <dcausse>	 atsukoito: sounds good
[07:17:45] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by atsuko@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294949 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:18:38] <wikibugs>	 (03Merged) 10jenkins-bot: translate: adding separate read/write endpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294949 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:19:21] <logmsgbot>	 !log atsuko@deploy1003 Started scap sync-world: Backport for [[gerrit:1294949|translate: adding separate read/write endpoints (T425377)]]
[07:19:25] <stashbot>	 T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377
[07:19:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:20:47] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2053.codfw.wmnet with OS trixie
[07:21:04] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Bump llm ns memory quota to 256Gi. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295958 (owner: 10Bartosz Wójtowicz)
[07:21:17] <atsukoito>	 dcausse: after the debug is done, i'll run mwscript https://phabricator.wikimedia.org/T425377#11915906 to test the config before proceeding
[07:21:17] <logmsgbot>	 !log atsuko@deploy1003 atsuko: Backport for [[gerrit:1294949|translate: adding separate read/write endpoints (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:22:12] <dcausse>	 atsukoito: ok, will test some special pages in the meantime
[07:23:06] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2241.codfw.wmnet
[07:23:07] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2241.codfw.wmnet
[07:23:39] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db2241: Depool for rack maintenance
[07:24:12] <logmsgbot>	 marostegui@cumin1003 major-upgrade (PID 3876624) is awaiting input
[07:25:27] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11975162 (10ayounsi)
[07:25:50] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.depool depool pc2021: rack A3 maintenance
[07:25:50] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.parsercache
[07:26:05] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[07:26:05] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2021: rack A3 maintenance
[07:26:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11975168 (10ayounsi)
[07:26:49] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on pc2021.codfw.wmnet with reason: rack A3 maintenance
[07:27:39] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[07:28:02] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1296251 (owner: 10Muehlenhoff)
[07:28:33] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.depool depool db2158: rack A3 maintenance
[07:28:54] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2158: rack A3 maintenance
[07:28:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T426633)', diff saved to https://phabricator.wikimedia.org/P93487 and previous config saved to /var/cache/conftool/dbconfig/20260602-072856-fceratto.json
[07:29:35] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2158.codfw.wmnet with reason: rack A3 maintenance
[07:30:39] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2053: repool after upgrade
[07:32:37] <XioNoX>	 !log pfw1-eqiad# delete protocols bgp group Production family inet6 - T423384
[07:32:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:41] <stashbot>	 T423384: Investigate internal rejected prefixes - https://phabricator.wikimedia.org/T423384
[07:36:44] <atsukoito>	 dcausse and I decided to revert 1294949; we won't proceed with promoting testing to prod
[07:39:04] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P93490 and previous config saved to /var/cache/conftool/dbconfig/20260602-073904-fceratto.json
[07:39:34] <logmsgbot>	 !log atsuko@deploy1003 atsuko: Rolling back deployment
[07:40:23] <logmsgbot>	 !log atsuko@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294949|translate: adding separate read/write endpoints (T425377)]] (duration: 21m 01s)
[07:40:26] <stashbot>	 T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377
[07:40:42] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] "We also need to add the database grants for this host." [puppet] - 10https://gerrit.wikimedia.org/r/1296255 (owner: 10Muehlenhoff)
[07:41:01] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1181: Migration of db1181.eqiad.wmnet completed
[07:41:02] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[07:41:54] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db1180: Pooling
[07:42:26] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Reboot
[07:43:09] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.depool depool db1181: Reboot
[07:43:14] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] "https://phabricator.wikimedia.org/T427884" [puppet] - 10https://gerrit.wikimedia.org/r/1296255 (owner: 10Muehlenhoff)
[07:43:34] <wikibugs>	 (03CR) 10Muehlenhoff: "Yes, that's for followup later. Initially we first need to get all packages properly installed on Trixie-compatible versions etc." [puppet] - 10https://gerrit.wikimedia.org/r/1296255 (owner: 10Muehlenhoff)
[07:43:36] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] mariadb::wmf_root_client: Add cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1296255 (owner: 10Muehlenhoff)
[07:44:01] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1181: Reboot
[07:44:28] <wikibugs>	 (03PS1) 10Atsuko: translate: fixing missed variable in credentials formatting closure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296262 (https://phabricator.wikimedia.org/T425377)
[07:45:19] <wikibugs>	 (03CR) 10DCausse: [C:03+1] translate: fixing missed variable in credentials formatting closure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296262 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:45:31] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296262 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:45:34] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296262 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:47:19] <atsukoito>	 dcausse: applying forward fix, 1296262: translate: fixing missed variable in credentials formatting closure | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1296262
[07:47:30] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db1181: Pooling
[07:47:52] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by atsuko@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296262 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:48:27] <logmsgbot>	 !log fceratto@cumin1003 END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db1181: Pooling
[07:48:46] <wikibugs>	 (03Merged) 10jenkins-bot: translate: fixing missed variable in credentials formatting closure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296262 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[07:49:02] <logmsgbot>	 !log atsuko@deploy1003 Started scap sync-world: Backport for [[gerrit:1296262|translate: fixing missed variable in credentials formatting closure (T425377)]]
[07:49:05] <stashbot>	 T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377
[07:50:46] <logmsgbot>	 !log atsuko@deploy1003 atsuko: Backport for [[gerrit:1296262|translate: fixing missed variable in credentials formatting closure (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:54:00] <wikibugs>	 (03PS1) 10Muehlenhoff: Add dbbackups profile for cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1296371
[07:54:11] <wikibugs>	 (03PS2) 10Muehlenhoff: Add dbbackups profile for cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1296371
[07:57:09] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1180: Pooling
[07:57:52] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:57:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1181 (T419635)', diff saved to https://phabricator.wikimedia.org/P93498 and previous config saved to /var/cache/conftool/dbconfig/20260602-075759-fceratto.json
[07:58:03] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[07:58:49] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[07:59:39] <logmsgbot>	 !log atsuko@deploy1003 atsuko: Rolling back deployment
[07:59:47] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[07:59:52] <atsukoito>	 rolling back
[08:00:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T419635)', diff saved to https://phabricator.wikimedia.org/P93499 and previous config saved to /var/cache/conftool/dbconfig/20260602-080011-fceratto.json
[08:02:06] <wikibugs>	 (03PS1) 10Atsuko: Revert "translate: fixing missed variable in credentials formatting closure" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296488
[08:03:31] <wikibugs>	 (03PS20) 10Ayounsi: Create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
[08:03:42] <wikibugs>	 (03PS1) 10Atsuko: Revert "translate: adding separate read/write endpoints" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296489
[08:03:49] <logmsgbot>	 !log atsuko@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296262|translate: fixing missed variable in credentials formatting closure (T425377)]] (duration: 14m 47s)
[08:03:53] <stashbot>	 T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377
[08:07:35] <wikibugs>	 (03CR) 10CWilliams: sre.mysql.global-read-only Set all sections as RO/RW (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto)
[08:08:44] <wikibugs>	 (03PS2) 10Atsuko: Revert "translate: adding separate read/write endpoints" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296488 (https://phabricator.wikimedia.org/T425377)
[08:09:09] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2241: Depool for rack maintenance
[08:09:11] <wikibugs>	 (03Abandoned) 10Atsuko: Revert "translate: adding separate read/write endpoints" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296489 (owner: 10Atsuko)
[08:09:34] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
[08:10:19] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P93502 and previous config saved to /var/cache/conftool/dbconfig/20260602-081018-fceratto.json
[08:10:29] <marostegui>	 !log Install mariadb 10.11.17 on es2053 T427345
[08:10:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:32] <stashbot>	 T427345: Compile and package MariaDB 10.11.17 - https://phabricator.wikimedia.org/T427345
[08:11:17] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
[08:11:59] <wikibugs>	 (03CR) 10DCausse: [C:03+1] Revert "translate: adding separate read/write endpoints" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296488 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[08:12:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by atsuko@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296488 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[08:12:29] <atsukoito>	 backporting revert
[08:13:15] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "translate: adding separate read/write endpoints" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296488 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko)
[08:13:30] <logmsgbot>	 !log atsuko@deploy1003 Started scap sync-world: Backport for [[gerrit:1296488|Revert "translate: adding separate read/write endpoints" (T425377)]]
[08:13:34] <wikibugs>	 (03CR) 10Jcrespo: [C:04-1] Add dbbackups profile for cumin2003 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296371 (owner: 10Muehlenhoff)
[08:13:34] <stashbot>	 T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377
[08:13:58] <wikibugs>	 (03PS3) 10Muehlenhoff: sre.puppet.disable-merges: New cookbook to disable Puppet merges temporarily [cookbooks] - 10https://gerrit.wikimedia.org/r/1295425 (https://phabricator.wikimedia.org/T248872)
[08:15:15] <logmsgbot>	 !log atsuko@deploy1003 atsuko: Backport for [[gerrit:1296488|Revert "translate: adding separate read/write endpoints" (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:16:03] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2053: repool after upgrade
[08:16:33] <logmsgbot>	 !log atsuko@deploy1003 atsuko: Rolling back deployment
[08:17:03] <logmsgbot>	 !log atsuko@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296488|Revert "translate: adding separate read/write endpoints" (T425377)]] (duration: 03m 33s)
[08:17:13] <atsukoito>	 rolling back deployment is a normal operation, we didn't want to leave the testing hanging
[08:17:26] <atsukoito>	 that concludes the backport window, thanks dcausse 
[08:17:29] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] P:idp webauthn, with database backend [puppet] - 10https://gerrit.wikimedia.org/r/1282286 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede)
[08:17:35] <dcausse>	 atsukoito: thanks!
[08:17:38] <wikibugs>	 (03PS3) 10Muehlenhoff: Add dbbackups profile for cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1296371
[08:18:26] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
[08:18:27] <wikibugs>	 (03CR) 10Muehlenhoff: Add dbbackups profile for cumin2003 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296371 (owner: 10Muehlenhoff)
[08:18:43] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[08:19:40] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[08:19:40] <wikibugs>	 (03CR) 10Muehlenhoff: sre.puppet.disable-merges: New cookbook to disable Puppet merges temporarily (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1295425 (https://phabricator.wikimedia.org/T248872) (owner: 10Muehlenhoff)
[08:20:26] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P93504 and previous config saved to /var/cache/conftool/dbconfig/20260602-082026-fceratto.json
[08:20:43] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
[08:21:16] <wikibugs>	 (03CR) 10Jcrespo: "I won't -1 this, but I suggest to keep the general parts with sections: [], I cannot guarantee this will not cause systemd alerts because " [puppet] - 10https://gerrit.wikimedia.org/r/1296371 (owner: 10Muehlenhoff)
[08:22:33] <wikibugs>	 (03PS4) 10Muehlenhoff: Add dbbackups profile for cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1296371
[08:25:26] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] Add dbbackups profile for cumin2003 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296371 (owner: 10Muehlenhoff)
[08:29:35] <slyngs>	 !log IDP, new configuration in preparation for webauthn
[08:29:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:07] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[08:30:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T419635)', diff saved to https://phabricator.wikimedia.org/P93505 and previous config saved to /var/cache/conftool/dbconfig/20260602-083033-fceratto.json
[08:30:38] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[08:30:50] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
[08:33:20] <wikibugs>	 (03PS1) 10Arnaudb: puppetserver: pull puppet via discovery record [puppet] - 10https://gerrit.wikimedia.org/r/1296495 (https://phabricator.wikimedia.org/T420184)
[08:33:20] <wikibugs>	 (03CR) 10Arnaudb: "pcc output visible here: https://puppet-compiler.wmflabs.org/output/1296495/6927/puppetserver1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1296495 (https://phabricator.wikimedia.org/T420184) (owner: 10Arnaudb)
[08:33:45] <wikibugs>	 (03CR) 10Marostegui: sre.mysql.global-read-only Set all sections as RO/RW (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto)
[08:37:15] <urbanecm>	 !log Reset user email of Barras@votewiki to the one of Barras@SUL
[08:37:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:23] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[08:39:32] <claime>	 jouncebot: nowandnext
[08:39:32] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 20 minute(s)
[08:39:32] <jouncebot>	 In 1 hour(s) and 20 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1000)
[08:39:46] <wikibugs>	 07Puppet, 06collaboration-services, 10Gerrit, 06Infrastructure-Foundations, 13Patch-For-Review: Change puppet-merge git origin to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org - https://phabricator.wikimedia.org/T420184#11975411 (10ABran-WMF) >>! In T420184#11968357, @Dzahn wrote: > The stri...
[08:41:23] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[08:43:26] <wikibugs>	 (03PS1) 10Bartosz Wójtowicz: ml-services: Bump experimental ns memory quota to 256Gi. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296500
[08:44:15] <wikibugs>	 (03CR) 10Dpogorzelski: [C:03+2] ml-services: Bump experimental ns memory quota to 256Gi. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296500 (owner: 10Bartosz Wójtowicz)
[08:46:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2045.codfw.wmnet
[08:47:46] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[08:49:00] <wikibugs>	 (03PS1) 10Slyngshede: Enable WebAuthN support [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1296505 (https://phabricator.wikimedia.org/T372892)
[08:50:23] <wikibugs>	 (03CR) 10Dpogorzelski: [V:03+2 C:03+2] ml-services: Bump experimental ns memory quota to 256Gi. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296500 (owner: 10Bartosz Wójtowicz)
[08:50:46] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[08:50:57] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[08:51:52] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[08:52:19] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[08:52:50] <wikibugs>	 (03PS13) 10Federico Ceratto: sre.mysql.global-read-only Set all sections as RO/RW [cookbooks] - 10https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874)
[08:53:16] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[08:53:19] <wikibugs>	 (03CR) 10Federico Ceratto: "I added comments in the code and removed an unnecessary sleep" [cookbooks] - 10https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto)
[08:54:24] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[08:54:36] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[08:55:07] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
[08:56:32] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie
[08:56:48] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[08:56:57] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "Please let us know when you plan to decom 2002, I would like to test backups before migration thoroughly." [puppet] - 10https://gerrit.wikimedia.org/r/1296371 (owner: 10Muehlenhoff)
[08:59:16] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2250.codfw.wmnet with reason: rack A3 maintenance
[09:01:09] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2165 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1296507 (https://phabricator.wikimedia.org/T427892)
[09:01:22] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: cr2-drmrs unexpected reboot - https://phabricator.wikimedia.org/T427600#11975473 (10cmooney) {F86180058}
[09:01:23] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2165 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1296508 (https://phabricator.wikimedia.org/T427893)
[09:04:25] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[09:04:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1187 (T426633)', diff saved to https://phabricator.wikimedia.org/P93506 and previous config saved to /var/cache/conftool/dbconfig/20260602-090432-fceratto.json
[09:06:26] <wikibugs>	 (03Abandoned) 10CWilliams: mariadb: Promote db2165 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1296508 (https://phabricator.wikimedia.org/T427893) (owner: 10Gerrit maintenance bot)
[09:08:50] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM!" [alerts] - 10https://gerrit.wikimedia.org/r/1295805 (https://phabricator.wikimedia.org/T423384) (owner: 10Ayounsi)
[09:09:14] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service lsw1-f1-codfw.mgmt.codfw.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-f1-codfw.mgmt.codfw.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[09:09:25] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] Add RejectingBGPPrefixes alert [alerts] - 10https://gerrit.wikimedia.org/r/1295805 (https://phabricator.wikimedia.org/T423384) (owner: 10Ayounsi)
[09:09:29] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage
[09:09:49] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1258 to x3 master [puppet] - 10https://gerrit.wikimedia.org/r/1296510 (https://phabricator.wikimedia.org/T427895)
[09:09:54] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: wmnet: Update x3-master alias [dns] - 10https://gerrit.wikimedia.org/r/1296511 (https://phabricator.wikimedia.org/T427895)
[09:10:18] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=main-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[09:11:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T426633)', diff saved to https://phabricator.wikimedia.org/P93508 and previous config saved to /var/cache/conftool/dbconfig/20260602-091126-fceratto.json
[09:11:32] <wikibugs>	 (03Merged) 10jenkins-bot: Add RejectingBGPPrefixes alert [alerts] - 10https://gerrit.wikimedia.org/r/1295805 (https://phabricator.wikimedia.org/T423384) (owner: 10Ayounsi)
[09:13:06] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM, good thinking!" [alerts] - 10https://gerrit.wikimedia.org/r/1295919 (https://phabricator.wikimedia.org/T419298) (owner: 10Ayounsi)
[09:13:56] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:14:07] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage
[09:15:21] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db1187: Pooling
[09:15:56] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:21:00] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Add dbbackups profile for cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1296371 (owner: 10Muehlenhoff)
[09:22:39] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Configure nginx to log requests in ECS format to syslog [puppet] - 10https://gerrit.wikimedia.org/r/1287407 (https://phabricator.wikimedia.org/T425087) (owner: 10Btullis)
[09:24:39] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] Add InterfaceNoDescription alert [alerts] - 10https://gerrit.wikimedia.org/r/1295919 (https://phabricator.wikimedia.org/T419298) (owner: 10Ayounsi)
[09:26:43] <wikibugs>	 (03Merged) 10jenkins-bot: Add InterfaceNoDescription alert [alerts] - 10https://gerrit.wikimedia.org/r/1295919 (https://phabricator.wikimedia.org/T419298) (owner: 10Ayounsi)
[09:28:28] <jinxer-wm>	 FIRING: KeyholderUnarmed: 2 unarmed Keyholder key(s) on cumin2003:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[09:30:21] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS trixie
[09:30:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus-node-textfile-export_service_type.service on cumin2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:32:55] <moritzm>	 !log temporarily remove ganeti2045 from the codfw cluster T427357
[09:32:58] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] trafficserver: Default most APIs to rest-gateway [puppet] - 10https://gerrit.wikimedia.org/r/1293699 (https://phabricator.wikimedia.org/T422937) (owner: 10Clément Goubert)
[09:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:00] <stashbot>	 T427357: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357
[09:33:14] <wikibugs>	 (03PS15) 10Trueg: wdqs-backend: Deployment chart for the WDQS triple-store [deployment-charts] - 10https://gerrit.wikimedia.org/r/1286374 (https://phabricator.wikimedia.org/T425007)
[09:33:29] <wikibugs>	 (03CR) 10Kosta Harlan: hCaptcha: Load self-hosted secure-api.js on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295909 (https://phabricator.wikimedia.org/T403829) (owner: 10Kosta Harlan)
[09:33:56] <wikibugs>	 (03PS1) 10Blake: mcrouter_wancache: swap mc1054 for mc1055 to enable decom [puppet] - 10https://gerrit.wikimedia.org/r/1296513 (https://phabricator.wikimedia.org/T426044)
[09:34:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of rpki2003.codfw.wmnet to plain
[09:34:23] <claime>	 !log Disabling puppet on A:cp-text for ATS rest-gateway cleanup - T422937
[09:34:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:26] <stashbot>	 T422937: Cleanup ATS configuration for API paths - https://phabricator.wikimedia.org/T422937
[09:34:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of rpki2003.codfw.wmnet to plain
[09:35:01] <wikibugs>	 (03PS1) 10Urbanecm: [Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386)
[09:35:05] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti2045 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[09:35:07] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti2045 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[09:35:12] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of netflow2004.codfw.wmnet to plain
[09:35:50] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti2045:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:37:09] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[09:37:17] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1159 (T426633)', diff saved to https://phabricator.wikimedia.org/P93511 and previous config saved to /var/cache/conftool/dbconfig/20260602-093716-fceratto.json
[09:37:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow2004.codfw.wmnet to plain
[09:37:39] <claime>	 !log Running puppet on cp6010 and cp6011 - T422937
[09:37:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:00] <wikibugs>	 (03PS1) 10STran: Add a reply-to to Direct Reporting emails [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296516 (https://phabricator.wikimedia.org/T427788)
[09:40:17] <wikibugs>	 (03PS1) 10STran: Add a reply-to to Direct Reporting emails [extensions/ReportIncident] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296517 (https://phabricator.wikimedia.org/T427788)
[09:40:56] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job routinator in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:42:40] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:43:40] <wikibugs>	 (03PS1) 10Urbanecm: growthexperiments.pp: Run cleanMentorList every 3 days [puppet] - 10https://gerrit.wikimedia.org/r/1296519 (https://phabricator.wikimedia.org/T427386)
[09:43:43] <wikibugs>	 (03CR) 10Mszwarc: [C:03+1] Add a reply-to to Direct Reporting emails [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296516 (https://phabricator.wikimedia.org/T427788) (owner: 10STran)
[09:43:47] <wikibugs>	 (03CR) 10Mszwarc: [C:03+1] Add a reply-to to Direct Reporting emails [extensions/ReportIncident] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296517 (https://phabricator.wikimedia.org/T427788) (owner: 10STran)
[09:44:04] <wikibugs>	 (03PS1) 10Cathal Mooney: netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052)
[09:45:42] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1187: Pooling
[09:45:46] <wikibugs>	 (03CR) 10CI reject: [V:04-1] netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[09:45:56] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job routinator in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:46:13] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cumin2003.codfw.wmnet with reason: in setup
[09:46:14] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296516 (https://phabricator.wikimedia.org/T427788) (owner: 10STran)
[09:46:26] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/ReportIncident] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296517 (https://phabricator.wikimedia.org/T427788) (owner: 10STran)
[09:49:00] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add a reply-to to Direct Reporting emails [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296516 (https://phabricator.wikimedia.org/T427788) (owner: 10STran)
[09:50:50] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti2045:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:56:11] <claime>	 !log Enabling puppet on A:cp-text for ATS rest-gateway cleanup - T422937
[09:56:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:15] <stashbot>	 T422937: Cleanup ATS configuration for API paths - https://phabricator.wikimedia.org/T422937
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1000)
[10:00:12] <wikibugs>	 (03PS2) 10Cathal Mooney: netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052)
[10:03:21] <wikibugs>	 (03CR) 10CI reject: [V:04-1] netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[10:03:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1296505 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede)
[10:03:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] autoinstall: Switch to deb.debian.org [puppet] - 10https://gerrit.wikimedia.org/r/1295956 (https://phabricator.wikimedia.org/T416707) (owner: 10Muehlenhoff)
[10:04:41] <wikibugs>	 (03PS2) 10Jcrespo: dbbackups: Reenable read-only ES backups [puppet] - 10https://gerrit.wikimedia.org/r/1295925 (https://phabricator.wikimedia.org/T424661)
[10:05:16] <wikibugs>	 (03PS3) 10Cathal Mooney: netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052)
[10:05:19] <wikibugs>	 (03PS3) 10Jcrespo: dbbackups: Reenable read-only ES backups [puppet] - 10https://gerrit.wikimedia.org/r/1295925 (https://phabricator.wikimedia.org/T424661)
[10:05:37] <wikibugs>	 (03CR) 10Mszwarc: [C:03+1] "recheck" [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296516 (https://phabricator.wikimedia.org/T427788) (owner: 10STran)
[10:06:26] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/eventstreams-internal: apply
[10:06:53] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/eventstreams-internal: apply
[10:06:58] <wikibugs>	 (03CR) 10CI reject: [V:04-1] netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[10:08:03] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[10:08:24] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2056: Upgrading es2056.codfw.wmnet
[10:08:45] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2056: Upgrading es2056.codfw.wmnet
[10:09:03] <wikibugs>	 (03PS4) 10Cathal Mooney: netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052)
[10:09:09] <wikibugs>	 (03PS4) 10Jcrespo: dbbackups: Reenable read-only ES backups [puppet] - 10https://gerrit.wikimedia.org/r/1295925 (https://phabricator.wikimedia.org/T424661)
[10:09:32] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2056.codfw.wmnet with OS trixie
[10:10:44] <wikibugs>	 (03CR) 10CI reject: [V:04-1] netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[10:12:13] <wikibugs>	 (03PS1) 10Muehlenhoff: Inline profile::mail::smarthost into profile::mail::smarthost::wmcs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1296528
[10:12:46] <wikibugs>	 (03PS5) 10Cathal Mooney: netops: set CR packet drop alert to paging and up timer on saturation [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052)
[10:12:50] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: alertmanager: Add Slack alerts to public slack channel for ML team [puppet] - 10https://gerrit.wikimedia.org/r/1296529
[10:13:53] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Inline profile::mail::smarthost into profile::mail::smarthost::wmcs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1296528 (owner: 10Muehlenhoff)
[10:20:22] <wikibugs>	 (03CR) 10Cathal Mooney: "I think this makes sense but admittedly it's tricky to get thresholds like this right so happy to discuss." [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[10:21:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T426633)', diff saved to https://phabricator.wikimedia.org/P93515 and previous config saved to /var/cache/conftool/dbconfig/20260602-102139-fceratto.json
[10:24:58] <wikibugs>	 (03PS2) 10Muehlenhoff: Inline profile::mail::smarthost into profile::mail::smarthost::wmcs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1296528
[10:25:34] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2056.codfw.wmnet with reason: host reimage
[10:27:14] <claime>	 !log Disabling puppet on A:cp-text for ATS rest-gateway cleanup - T422937
[10:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:17] <stashbot>	 T422937: Cleanup ATS configuration for API paths - https://phabricator.wikimedia.org/T422937
[10:28:45] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2056.codfw.wmnet with reason: host reimage
[10:31:47] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P93516 and previous config saved to /var/cache/conftool/dbconfig/20260602-103146-fceratto.json
[10:31:54] <wikibugs>	 (03CR) 10Ayounsi: "Some phrasing comments inline, overall lgtm." [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[10:33:52] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1296528 (owner: 10Muehlenhoff)
[10:33:55] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] trafficserver: Route /media/math directly to restbase [puppet] - 10https://gerrit.wikimedia.org/r/1293703 (https://phabricator.wikimedia.org/T422937) (owner: 10Clément Goubert)
[10:34:04] <wikibugs>	 (03PS2) 10Clément Goubert: trafficserver: Route /media/math directly to restbase [puppet] - 10https://gerrit.wikimedia.org/r/1293703 (https://phabricator.wikimedia.org/T422937)
[10:36:07] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] trafficserver: Route /media/math directly to restbase [puppet] - 10https://gerrit.wikimedia.org/r/1293703 (https://phabricator.wikimedia.org/T422937) (owner: 10Clément Goubert)
[10:36:28] <wikibugs>	 (03PS1) 10Dreamy Jazz: hCaptcha: Deduplicate edit API detection code [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296532 (https://phabricator.wikimedia.org/T427887)
[10:36:31] <wikibugs>	 (03PS1) 10Dreamy Jazz: hCaptcha: Disable hCaptcha for DiscussionTools for the apps [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296533 (https://phabricator.wikimedia.org/T427887)
[10:36:47] <wikibugs>	 (03PS1) 10Daniel Kinzler: rest-gateway: cost limits for action=parse (shadow mode) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296534 (https://phabricator.wikimedia.org/T405472)
[10:37:09] <wikibugs>	 (03PS1) 10Btullis: dumps: web: Fix nginx ECS access log config so nginx can start [puppet] - 10https://gerrit.wikimedia.org/r/1296535 (https://phabricator.wikimedia.org/T291645)
[10:39:27] <wikibugs>	 (03CR) 10Cathal Mooney: netops: set CR packet drop alert to paging and up timer on saturation (2 comments) [alerts] - 10https://gerrit.wikimedia.org/r/1296520 (https://phabricator.wikimedia.org/T384052) (owner: 10Cathal Mooney)
[10:41:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P93517 and previous config saved to /var/cache/conftool/dbconfig/20260602-104154-fceratto.json
[10:42:12] <claime>	 !log Enabling puppet on A:cp-text for ATS rest-gateway cleanup - T422937
[10:42:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:15] <stashbot>	 T422937: Cleanup ATS configuration for API paths - https://phabricator.wikimedia.org/T422937
[10:42:28] <moritzm>	 !log installing busybox security updates
[10:42:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:09] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2056.codfw.wmnet with OS trixie
[10:45:10] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] mcrouter_wancache: swap mc1054 for mc1055 to enable decom [puppet] - 10https://gerrit.wikimedia.org/r/1296513 (https://phabricator.wikimedia.org/T426044) (owner: 10Blake)
[10:45:18] <wikibugs>	 (03CR) 10Btullis: "Note that the use of 'geo' is definitely a hacky workaround to obtain a literal dollar sign from an nginx config, but it is the closest th" [puppet] - 10https://gerrit.wikimedia.org/r/1296535 (https://phabricator.wikimedia.org/T291645) (owner: 10Btullis)
[10:45:37] <wikibugs>	 (03CR) 10Blake: [C:03+2] mcrouter_wancache: swap mc1054 for mc1055 to enable decom [puppet] - 10https://gerrit.wikimedia.org/r/1296513 (https://phabricator.wikimedia.org/T426044) (owner: 10Blake)
[10:45:52] <wikibugs>	 (03PS1) 10Majavah: confd: Replace deprecated fact [puppet] - 10https://gerrit.wikimedia.org/r/1296536
[10:45:52] <wikibugs>	 (03PS1) 10Majavah: confd: Add condition to prevent starting without configs [puppet] - 10https://gerrit.wikimedia.org/r/1296537 (https://phabricator.wikimedia.org/T356296)
[10:45:54] <wikibugs>	 (03PS6) 10Btullis: Configure rsyslog to forward 'dumps_http' messages to Kafka [puppet] - 10https://gerrit.wikimedia.org/r/1287374 (https://phabricator.wikimedia.org/T425087)
[10:45:54] <wikibugs>	 (03PS6) 10Btullis: logstash: Consume the ECS dumps webrequest stream from Kafka [puppet] - 10https://gerrit.wikimedia.org/r/1295917 (https://phabricator.wikimedia.org/T291645)
[10:48:28] <logmsgbot>	 marostegui@cumin1003 major-upgrade (PID 4060565) is awaiting input
[10:49:21] <wikibugs>	 (03PS15) 10Daniel Kinzler: rest gateway: implement cost-based rate limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1228535 (https://phabricator.wikimedia.org/T412586)
[10:50:16] <wikibugs>	 (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8628/co" [puppet] - 10https://gerrit.wikimedia.org/r/1296537 (https://phabricator.wikimedia.org/T356296) (owner: 10Majavah)
[10:51:24] <wikibugs>	 (03CR) 10Slyngshede: [V:03+2 C:03+2] Enable WebAuthN support [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1296505 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede)
[10:52:02] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T426633)', diff saved to https://phabricator.wikimedia.org/P93518 and previous config saved to /var/cache/conftool/dbconfig/20260602-105202-fceratto.json
[10:52:22] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[10:52:32] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[10:52:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1161 (T426633)', diff saved to https://phabricator.wikimedia.org/P93519 and previous config saved to /var/cache/conftool/dbconfig/20260602-105239-fceratto.json
[10:55:24] <wikibugs>	 (03PS2) 10Btullis: Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038)
[10:55:36] <wikibugs>	 (03PS3) 10Btullis: Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038)
[10:56:31] <wikibugs>	 (03PS1) 10Atsuko: opensearch-cluster: anonymous access for ttmsearch and toolhub [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296539 (https://phabricator.wikimedia.org/T424248)
[10:57:22] <wikibugs>	 (03CR) 10Michael Große: [C:03+1] [Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386) (owner: 10Urbanecm)
[10:58:12] <wikibugs>	 (03CR) 10Atsuko: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1296535 (https://phabricator.wikimedia.org/T291645) (owner: 10Btullis)
[10:58:18] <wikibugs>	 (03CR) 10Atsuko: [C:03+1] dumps: web: Fix nginx ECS access log config so nginx can start [puppet] - 10https://gerrit.wikimedia.org/r/1296535 (https://phabricator.wikimedia.org/T291645) (owner: 10Btullis)
[10:59:40] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T426633)', diff saved to https://phabricator.wikimedia.org/P93520 and previous config saved to /var/cache/conftool/dbconfig/20260602-105939-fceratto.json
[11:00:29] <federico3>	 !incidents
[11:00:29] <sirenbot>	 8038 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet eqsin)
[11:00:43] <wikibugs>	 (03CR) 10Btullis: [C:03+2] dumps: web: Fix nginx ECS access log config so nginx can start [puppet] - 10https://gerrit.wikimedia.org/r/1296535 (https://phabricator.wikimedia.org/T291645) (owner: 10Btullis)
[11:01:45] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[11:01:59] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2056: repool after upgrade
[11:02:41] <wikibugs>	 (03CR) 10Btullis: [C:03+1] opensearch-cluster: anonymous access for ttmsearch and toolhub [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296539 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[11:03:02] <logmsgbot>	 !log cwilliams@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s8 T427892
[11:03:06] <stashbot>	 T427892: Switchover s8 master (db2161 -> db2165) - https://phabricator.wikimedia.org/T427892
[11:03:33] <wikibugs>	 (03CR) 10Atsuko: Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038) (owner: 10Btullis)
[11:04:21] <logmsgbot>	 !log cwilliams@cumin1003 dbctl commit (dc=all): 'Set db2165 with weight 0 T427892', diff saved to https://phabricator.wikimedia.org/P93522 and previous config saved to /var/cache/conftool/dbconfig/20260602-110420-cwilliams.json
[11:05:10] <wikibugs>	 (03PS1) 10Dreamy Jazz: hCaptcha: Don't show AbuseFilter CAPTCHA for unsupported APIs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608)
[11:05:51] <wikibugs>	 (03CR) 10CI reject: [V:04-1] hCaptcha: Don't show AbuseFilter CAPTCHA for unsupported APIs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz)
[11:06:36] <wikibugs>	 (03PS4) 10Btullis: Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038)
[11:06:44] <wikibugs>	 (03CR) 10Atsuko: [C:03+2] opensearch-cluster: anonymous access for ttmsearch and toolhub [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296539 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[11:07:11] <bjensen>	 i'll be reimaging memcached servers to Trixie, ~2 at a time or so, no impact expected; i'll be keeping an eye out for errors (feel free to holler at me if i don't notice)
[11:07:21] <wikibugs>	 (03CR) 10Btullis: Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038) (owner: 10Btullis)
[11:08:15] <wikibugs>	 (03CR) 10Michael Große: [C:03+1] growthexperiments.pp: Run cleanMentorList every 3 days [puppet] - 10https://gerrit.wikimedia.org/r/1296519 (https://phabricator.wikimedia.org/T427386) (owner: 10Urbanecm)
[11:08:56] <wikibugs>	 (03CR) 10CWilliams: [C:03+2] mariadb: Promote db2165 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1296507 (https://phabricator.wikimedia.org/T427892) (owner: 10Gerrit maintenance bot)
[11:08:56] <wikibugs>	 (03Merged) 10jenkins-bot: opensearch-cluster: anonymous access for ttmsearch and toolhub [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296539 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[11:09:02] <wikibugs>	 (03CR) 10Dreamy Jazz: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz)
[11:09:05] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS trixie
[11:09:24] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS trixie
[11:09:47] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P93523 and previous config saved to /var/cache/conftool/dbconfig/20260602-110947-fceratto.json
[11:10:49] <cezmunsta>	 !log Starting s8 codfw failover from db2161 to db2165 - T427892
[11:10:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:53] <stashbot>	 T427892: Switchover s8 master (db2161 -> db2165) - https://phabricator.wikimedia.org/T427892
[11:12:01] <logmsgbot>	 !log cwilliams@cumin1003 dbctl commit (dc=all): 'Promote db2165 to s8 primary T427892', diff saved to https://phabricator.wikimedia.org/P93524 and previous config saved to /var/cache/conftool/dbconfig/20260602-111200-cwilliams.json
[11:12:53] <wikibugs>	 (03CR) 10Atsuko: [C:03+1] Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038) (owner: 10Btullis)
[11:14:31] <wikibugs>	 (03CR) 10Majavah: [C:03+1] designate: remove leftover mcrouter code (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1278528 (https://phabricator.wikimedia.org/T427189) (owner: 10Andrew Bogott)
[11:15:12] <logmsgbot>	 !log cwilliams@cumin1003 dbctl commit (dc=all): 'Depool db2161 T427892', diff saved to https://phabricator.wikimedia.org/P93525 and previous config saved to /var/cache/conftool/dbconfig/20260602-111511-cwilliams.json
[11:15:56] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:16:59] <icinga-wm>	 PROBLEM - orchestrator resolve cache non-FQDNs on dborch1002 is CRITICAL: CRITICAL: 2 non-FQDN entries in orchestrator resolve cache: https://wikitech.wikimedia.org/wiki/Orchestrator
[11:17:37] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] profile::firewall: Allow to provide more fine-grained access from monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1296251 (owner: 10Muehlenhoff)
[11:17:59] <icinga-wm>	 RECOVERY - orchestrator resolve cache non-FQDNs on dborch1002 is OK: OK: all orchestrator resolve cache entries are FQDNs https://wikitech.wikimedia.org/wiki/Orchestrator
[11:18:56] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:19:09] <wikibugs>	 (03PS3) 10Muehlenhoff: Inline profile::mail::smarthost into profile::mail::smarthost::wmcs [puppet] - 10https://gerrit.wikimedia.org/r/1296528
[11:19:27] <wikibugs>	 (03PS4) 10Muehlenhoff: Inline profile::mail::smarthost into profile::mail::smarthost::wmcs [puppet] - 10https://gerrit.wikimedia.org/r/1296528
[11:19:34] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Add the new dse-k8s-wdqs nodes to site.pp and preseed.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1292045 (https://phabricator.wikimedia.org/T422038) (owner: 10Btullis)
[11:19:34] <wikibugs>	 (03PS5) 10Muehlenhoff: Inline profile::mail::smarthost into profile::mail::smarthost::wmcs [puppet] - 10https://gerrit.wikimedia.org/r/1296528
[11:19:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1014:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:19:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P93527 and previous config saved to /var/cache/conftool/dbconfig/20260602-111954-fceratto.json
[11:21:23] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[11:21:26] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage
[11:21:33] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2161: Upgrading db2161.codfw.wmnet
[11:21:43] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2161: Upgrading db2161.codfw.wmnet
[11:22:12] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage
[11:22:42] <wikibugs>	 (03PS1) 10Dreamy Jazz: hCaptcha: Enable for badlogin on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875)
[11:23:21] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2161.codfw.wmnet with OS trixie
[11:23:21] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1296528 (owner: 10Muehlenhoff)
[11:23:45] <wikibugs>	 (03PS1) 10JMeybohm: partman/reuse-raid10-6dev.cfg: Use linux-swap as fs identifier [puppet] - 10https://gerrit.wikimedia.org/r/1296553 (https://phabricator.wikimedia.org/T427088)
[11:25:18] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:04-1] "Until group0 is wmf.5, this is blocked" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875) (owner: 10Dreamy Jazz)
[11:26:05] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db1161: Repooling
[11:26:15] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1161: Repooling
[11:29:17] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage
[11:30:09] <wikibugs>	 (03PS1) 10Jelto: miscweb: fix sleep command in data-sync [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296555 (https://phabricator.wikimedia.org/T414405)
[11:30:10] <wikibugs>	 (03CR) 10Btullis: "This has now been validated as per: https://phabricator.wikimedia.org/T425087#11975964" [puppet] - 10https://gerrit.wikimedia.org/r/1287374 (https://phabricator.wikimedia.org/T425087) (owner: 10Btullis)
[11:30:12] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
[11:30:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1185 (T426633)', diff saved to https://phabricator.wikimedia.org/P93529 and previous config saved to /var/cache/conftool/dbconfig/20260602-113019-fceratto.json
[11:30:56] <wikibugs>	 (03CR) 10Effie Mouzeli: site.pp: add rdb2013 and rdb2014 (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: 10Effie Mouzeli)
[11:31:55] <icinga-wm>	 PROBLEM - Memcached on mc1057 is CRITICAL: connect to address 10.64.0.197 and port 11214: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[11:32:46] <wikibugs>	 (03CR) 10Muehlenhoff: "(The PCC diff across the moved file is a bit confusing to read)" [puppet] - 10https://gerrit.wikimedia.org/r/1296528 (owner: 10Muehlenhoff)
[11:33:03] <effie>	 host is being reimaged ^, something didn't go well with downtiming
[11:33:06] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage
[11:33:36] <wikibugs>	 (03PS2) 10Dreamy Jazz: hCaptcha: Don't show AbuseFilter CAPTCHA for wbsetclaim API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608)
[11:37:06] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T426633)', diff saved to https://phabricator.wikimedia.org/P93531 and previous config saved to /var/cache/conftool/dbconfig/20260602-113705-fceratto.json
[11:38:32] <wikibugs>	 (03CR) 10Atsuko: [C:03+1] "Reviewed 5->6" [puppet] - 10https://gerrit.wikimedia.org/r/1287374 (https://phabricator.wikimedia.org/T425087) (owner: 10Btullis)
[11:39:28] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#11975998 (10MoritzMuehlenhoff)
[11:39:46] <wikibugs>	 (03CR) 10Btullis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1287374 (https://phabricator.wikimedia.org/T425087) (owner: 10Btullis)
[11:40:38] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
[11:40:55] <icinga-wm>	 RECOVERY - Memcached on mc1057 is OK: TCP OK - 0.000 second response time on 10.64.0.197 port 11214 https://wikitech.wikimedia.org/wiki/Memcached
[11:41:29] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rakefile: Run chart specific tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282965 (https://phabricator.wikimedia.org/T424824) (owner: 10Daniel Kinzler)
[11:42:26] <wikibugs>	 (03CR) 10Jelto: [C:03+2] miscweb: fix sleep command in data-sync [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296555 (https://phabricator.wikimedia.org/T414405) (owner: 10Jelto)
[11:44:07] <wikibugs>	 (03PS2) 10JMeybohm: partman/reuse-raid10-6dev.cfg: Use linux-swap as fs identifier [puppet] - 10https://gerrit.wikimedia.org/r/1296553 (https://phabricator.wikimedia.org/T427088)
[11:44:47] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
[11:45:00] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb: fix sleep command in data-sync [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296555 (https://phabricator.wikimedia.org/T414405) (owner: 10Jelto)
[11:45:09] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] ratelimite: update homepage [deployment-charts] - 10https://gerrit.wikimedia.org/r/1294314 (https://phabricator.wikimedia.org/T426951) (owner: 10Effie Mouzeli)
[11:45:30] <wikibugs>	 (03PS1) 10Kosta Harlan: hCaptcha: Remove apiUrl health check and APCu layer from health checker [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296558 (https://phabricator.wikimedia.org/T421464)
[11:45:52] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS trixie
[11:47:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P93532 and previous config saved to /var/cache/conftool/dbconfig/20260602-114713-fceratto.json
[11:47:20] <wikibugs>	 (03Merged) 10jenkins-bot: ratelimite: update homepage [deployment-charts] - 10https://gerrit.wikimedia.org/r/1294314 (https://phabricator.wikimedia.org/T426951) (owner: 10Effie Mouzeli)
[11:47:25] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2056: repool after upgrade
[11:47:38] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS trixie
[11:47:44] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[11:48:05] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2049: Upgrading es2049.codfw.wmnet
[11:48:37] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2049: Upgrading es2049.codfw.wmnet
[11:49:07] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2049.codfw.wmnet with OS trixie
[11:49:31] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS trixie
[11:50:11] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS trixie
[11:51:17] <wikibugs>	 (03CR) 10Kosta Harlan: [C:03+1] hCaptcha: Disable hCaptcha for DiscussionTools for the apps [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296533 (https://phabricator.wikimedia.org/T427887) (owner: 10Dreamy Jazz)
[11:51:46] <wikibugs>	 (03CR) 10Kosta Harlan: hCaptcha: Enable for badlogin on group0 wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875) (owner: 10Dreamy Jazz)
[11:51:53] <wikibugs>	 (03PS2) 10Dreamy Jazz: hCaptcha: Enable for badlogin on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875)
[11:52:13] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:04-1] hCaptcha: Enable for badlogin on group0 wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875) (owner: 10Dreamy Jazz)
[11:53:05] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[11:53:13] <wikibugs>	 (03CR) 10Muehlenhoff: partman/reuse-raid10-6dev.cfg: Use linux-swap as fs identifier (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296553 (https://phabricator.wikimedia.org/T427088) (owner: 10JMeybohm)
[11:53:29] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
[11:53:48] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
[11:54:01] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] dbbackups: Reenable read-only ES backups [puppet] - 10https://gerrit.wikimedia.org/r/1295925 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo)
[11:54:04] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1295925 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo)
[11:55:00] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
[11:55:29] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[11:55:34] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[11:55:42] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
[11:57:21] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P93535 and previous config saved to /var/cache/conftool/dbconfig/20260602-115721-fceratto.json
[11:58:24] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[12:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1200)
[12:00:21] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage
[12:01:48] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2161.codfw.wmnet with OS trixie
[12:02:17] <Dreamy_Jazz>	 jouncebot: nowandnext
[12:02:17] <jouncebot>	 For the next 0 hour(s) and 57 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1200)
[12:02:17] <jouncebot>	 In 0 hour(s) and 57 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1300)
[12:02:26] <Dreamy_Jazz>	 Anyone using scap in this window?
[12:02:30] <wikibugs>	 (03PS1) 10Mpostoronca: wmf-config: Disable hCaptcha for action=mcrundo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296557 (https://phabricator.wikimedia.org/T427612)
[12:02:42] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage
[12:03:42] <wikibugs>	 (03CR) 10Muehlenhoff: site.pp: add rdb2013 and rdb2014 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: 10Effie Mouzeli)
[12:04:39] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] partman/reuse-raid10-6dev.cfg: Use linux-swap as fs identifier [puppet] - 10https://gerrit.wikimedia.org/r/1296553 (https://phabricator.wikimedia.org/T427088) (owner: 10JMeybohm)
[12:04:41] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage
[12:04:49] <wikibugs>	 (03CR) 10Dreamy Jazz: wmf-config: Disable hCaptcha for action=mcrundo (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296557 (https://phabricator.wikimedia.org/T427612) (owner: 10Mpostoronca)
[12:05:19] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296532 (https://phabricator.wikimedia.org/T427887) (owner: 10Dreamy Jazz)
[12:05:19] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2049.codfw.wmnet with reason: host reimage
[12:05:20] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296533 (https://phabricator.wikimedia.org/T427887) (owner: 10Dreamy Jazz)
[12:06:40] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] dbbackups: Reenable read-only ES backups [puppet] - 10https://gerrit.wikimedia.org/r/1295925 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo)
[12:06:45] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Deduplicate edit API detection code [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296532 (https://phabricator.wikimedia.org/T427887) (owner: 10Dreamy Jazz)
[12:06:54] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Disable hCaptcha for DiscussionTools for the apps [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296533 (https://phabricator.wikimedia.org/T427887) (owner: 10Dreamy Jazz)
[12:07:09] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1296532|hCaptcha: Deduplicate edit API detection code (T427887)]], [[gerrit:1296533|hCaptcha: Disable hCaptcha for DiscussionTools for the apps (T427887)]]
[12:07:11] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2011,2033-2034,2050,2055-2062,2068-2071,2107-2113].codfw.wmnet
[12:07:16] <stashbot>	 T427887: Cannot publish DiscussionTools reply on Android App - https://phabricator.wikimedia.org/T427887
[12:07:29] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T426633)', diff saved to https://phabricator.wikimedia.org/P93536 and previous config saved to /var/cache/conftool/dbconfig/20260602-120728-fceratto.json
[12:07:48] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
[12:07:51] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage
[12:07:56] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1200 (T426633)', diff saved to https://phabricator.wikimedia.org/P93537 and previous config saved to /var/cache/conftool/dbconfig/20260602-120755-fceratto.json
[12:08:09] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976085 (10Jclark-ctr) # Dedicated dse-k8s workers for production WDQS in codfw - See #T425653 node /^dse-k8s-wdqs200[1-4]\.c...
[12:08:18] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] partman/reuse-raid10-6dev.cfg: Use linux-swap as fs identifier (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296553 (https://phabricator.wikimedia.org/T427088) (owner: 10JMeybohm)
[12:09:00] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1296532|hCaptcha: Deduplicate edit API detection code (T427887)]], [[gerrit:1296533|hCaptcha: Disable hCaptcha for DiscussionTools for the apps (T427887)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[12:09:56] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Switch maintenance
[12:09:59] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2161: Migration of db2161.codfw.wmnet completed
[12:11:05] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lsw1-a3-codfw,lsw1-a3-codfw IPv6,lsw1-a3-codfw.mgmt with reason: Switch maintenance
[12:11:44] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2049.codfw.wmnet with reason: host reimage
[12:11:54] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[12:13:25] <wikibugs>	 (03PS1) 10Reedy: Add a maintenance script to delete old files [extensions/timeline] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296560
[12:13:42] <wikibugs>	 (03PS1) 10Reedy: Add a maintenance script to delete old files [extensions/timeline] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296561
[12:14:52] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T426633)', diff saved to https://phabricator.wikimedia.org/P93539 and previous config saved to /var/cache/conftool/dbconfig/20260602-121451-fceratto.json
[12:16:11] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296532|hCaptcha: Deduplicate edit API detection code (T427887)]], [[gerrit:1296533|hCaptcha: Disable hCaptcha for DiscussionTools for the apps (T427887)]] (duration: 09m 02s)
[12:16:18] <stashbot>	 T427887: Cannot publish DiscussionTools reply on Android App - https://phabricator.wikimedia.org/T427887
[12:17:34] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS trixie
[12:18:39] <wikibugs>	 (03PS3) 10Slyngshede: P:cache:haproxy add image generator information [puppet] - 10https://gerrit.wikimedia.org/r/1295921 (https://phabricator.wikimedia.org/T414338)
[12:19:06] <wikibugs>	 (03CR) 10Slyngshede: "Documentation created: https://wikitech.wikimedia.org/wiki/X-Image-Generator" [puppet] - 10https://gerrit.wikimedia.org/r/1295921 (https://phabricator.wikimedia.org/T414338) (owner: 10Slyngshede)
[12:20:06] <wikibugs>	 (03PS1) 10Bartosz Wójtowicz: ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296562
[12:20:06] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS trixie
[12:20:47] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Remove workaround for stuck session cookies on Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433)
[12:20:51] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2011,2033-2034,2050,2055-2062,2068-2071,2107-2113].codfw.wmnet
[12:20:51] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS trixie
[12:21:06] <wikibugs>	 (03PS1) 10Btullis: Fix the hostnames for dse-k8s-wdqs100[1-3] [puppet] - 10https://gerrit.wikimedia.org/r/1296564 (https://phabricator.wikimedia.org/T423314)
[12:21:28] <XioNoX>	 !log reboot lsw1-a3-codfw for software upgrade - T427301
[12:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:32] <stashbot>	 T427301: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301
[12:21:46] <XioNoX>	 Shutdown at Tue Jun  2 12:22:36 2026
[12:22:35] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Fix the hostnames for dse-k8s-wdqs100[1-3] [puppet] - 10https://gerrit.wikimedia.org/r/1296564 (https://phabricator.wikimedia.org/T423314) (owner: 10Btullis)
[12:23:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[12:24:38] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS trixie
[12:24:42] <icinga-wm>	 PROBLEM - Router interfaces on mr1-codfw is CRITICAL: CRITICAL: host 208.80.153.196, interfaces up: 32, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:24:52] <icinga-wm>	 PROBLEM - BFD status on ssw1-a1-codfw.mgmt is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:24:52] <icinga-wm>	 PROBLEM - BFD status on ssw1-a8-codfw.mgmt is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:25:00] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P93542 and previous config saved to /var/cache/conftool/dbconfig/20260602-122459-fceratto.json
[12:25:27] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Clean up bot password configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296566
[12:25:35] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: 10Bartosz Dziewoński)
[12:25:53] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296566 (owner: 10Bartosz Dziewoński)
[12:26:28] <wikibugs>	 (03CR) 10Majavah: "there's a minor diff if you scroll to the file resource all the way down" [puppet] - 10https://gerrit.wikimedia.org/r/1296528 (owner: 10Muehlenhoff)
[12:26:32] <wikibugs>	 (03CR) 10Majavah: [C:04-1] Inline profile::mail::smarthost into profile::mail::smarthost::wmcs [puppet] - 10https://gerrit.wikimedia.org/r/1296528 (owner: 10Muehlenhoff)
[12:26:39] <jinxer-wm>	 FIRING: [2x] CoreBGPDown: Core BGP session down between ssw1-a1-codfw and lsw1-a3-codfw (10.192.252.5) - group EVPN_IBGP - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[12:26:51] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-a1-codfw:et-0/0/2 (Core: lsw1-a3-codfw:et-0/0/55 {#230403800027}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[12:27:19] <wikibugs>	 (03CR) 10AikoChou: [C:03+1] ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296562 (owner: 10Bartosz Wójtowicz)
[12:27:43] <wikibugs>	 (03CR) 10Bartosz Wójtowicz: [C:03+2] ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296562 (owner: 10Bartosz Wójtowicz)
[12:28:14] <kostajh>	 jouncebot: nowandnext
[12:28:14] <jouncebot>	 For the next 0 hour(s) and 31 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1200)
[12:28:14] <jouncebot>	 In 0 hour(s) and 31 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1300)
[12:28:41] <wikibugs>	 (03PS1) 10Kosta Harlan: hCaptcha: Remove apiUrl health check and APCu layer from health checker [extensions/ConfirmEdit] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296568 (https://phabricator.wikimedia.org/T421464)
[12:28:44] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2049.codfw.wmnet with OS trixie
[12:28:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[12:28:56] <wikibugs>	 (03PS1) 10Slyngshede: Geo-maps: Update Meta mapping for June 2026 [dns] - 10https://gerrit.wikimedia.org/r/1296569
[12:28:56] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:29:29] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS trixie
[12:29:46] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296562 (owner: 10Bartosz Wójtowicz)
[12:31:41] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[12:31:53] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2049: repool after upgrade
[12:33:26] <XioNoX>	 switch is back up
[12:33:34] <XioNoX>	 11min downtime
[12:33:40] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage
[12:33:51] <logmsgbot>	 !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[12:35:06] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P93545 and previous config saved to /var/cache/conftool/dbconfig/20260602-123505-fceratto.json
[12:35:35] <wikibugs>	 (03PS7) 10Arnaudb: trafficserver: add a map for gitlab as a backend [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441)
[12:35:44] <logmsgbot>	 !log blake@cumin1003 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage
[12:35:44] <icinga-wm>	 RECOVERY - Router interfaces on mr1-codfw is OK: OK: host 208.80.153.196, interfaces up: 33, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:35:54] <icinga-wm>	 RECOVERY - BFD status on ssw1-a8-codfw.mgmt is OK: UP: 17 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:35:54] <icinga-wm>	 RECOVERY - BFD status on ssw1-a1-codfw.mgmt is OK: UP: 17 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:36:51] <jinxer-wm>	 RESOLVED: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-a1-codfw:et-0/0/2 (Core: lsw1-a3-codfw:et-0/0/55 {#230403800027}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[12:38:32] <wikibugs>	 (03CR) 10Arnaudb: trafficserver: add a map for gitlab as a backend (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb)
[12:38:43] <wikibugs>	 (03PS1) 10Arnaudb: cache_text: add gitlab-https to realservers [puppet] - 10https://gerrit.wikimedia.org/r/1296572 (https://phabricator.wikimedia.org/T425441)
[12:38:56] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:39:27] <wikibugs>	 (03PS3) 10Anzx: cswiki: lift IP cap for workshop on 08-June-2026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678)
[12:39:28] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Configure rsyslog to forward 'dumps_http' messages to Kafka [puppet] - 10https://gerrit.wikimedia.org/r/1287374 (https://phabricator.wikimedia.org/T425087) (owner: 10Btullis)
[12:39:49] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: 10Anzx)
[12:40:44] <icinga-wm>	 PROBLEM - Router interfaces on mr1-codfw is CRITICAL: CRITICAL: host 208.80.153.196, interfaces up: 32, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:41:13] <topranks>	 !log enable bgp graceful-shutdown in underlay on ssw1-a1-codfw T427301
[12:41:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:17] <stashbot>	 T427301: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301
[12:41:39] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between ssw1-a1-codfw and lsw1-a3-codfw (10.192.252.5) - group EVPN_IBGP - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[12:42:10] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage
[12:42:13] <logmsgbot>	 !log blake@cumin1003 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage
[12:42:52] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[12:43:26] <icinga-wm>	 PROBLEM - Backup freshness on backup1014 is CRITICAL: All failures: 1 (backup2013), Fresh: 138 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[12:43:29] <logmsgbot>	 !log blake@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1060.eqiad.wmnet with OS trixie
[12:43:32] <wikibugs>	 (03PS1) 10Dpogorzelski: ml-serve: add node labels [puppet] - 10https://gerrit.wikimedia.org/r/1296574
[12:44:14] <wikibugs>	 (03PS1) 10Majavah: P:syslog: centralserver: Migrate to firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/1296577
[12:44:22] <wikibugs>	 (03PS2) 10Dpogorzelski: ml-serve: add node labels [puppet] - 10https://gerrit.wikimedia.org/r/1296574
[12:45:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T426633)', diff saved to https://phabricator.wikimedia.org/P93547 and previous config saved to /var/cache/conftool/dbconfig/20260602-124512-fceratto.json
[12:45:34] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[12:45:39] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review: Review of firewall services without srange - https://phabricator.wikimedia.org/T149804#11976265 (10MoritzMuehlenhoff)
[12:45:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1207 (T426633)', diff saved to https://phabricator.wikimedia.org/P93548 and previous config saved to /var/cache/conftool/dbconfig/20260602-124541-fceratto.json
[12:46:06] <wikibugs>	 (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8629/co" [puppet] - 10https://gerrit.wikimedia.org/r/1296577 (owner: 10Majavah)
[12:46:37] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: CPU1 thermal fault for wdqs1015.eqiad.wmnet - https://phabricator.wikimedia.org/T427852#11976268 (10Jclark-ctr) @RKemper @wiki_willy  I have gone through all decommissioned servers and do not have a matching Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz availabl...
[12:46:40] <wikibugs>	 (03PS21) 10Ayounsi: Create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
[12:47:18] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS trixie
[12:48:15] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.hosts.remove-downtime for lsw1-a3-codfw,lsw1-a3-codfw IPv6,lsw1-a3-codfw.mgmt
[12:48:17] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-a3-codfw,lsw1-a3-codfw IPv6,lsw1-a3-codfw.mgmt
[12:49:55] <logmsgbot>	 !log blake@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1061.eqiad.wmnet with OS trixie
[12:50:08] <topranks>	 !log enable bgp graceful-shutdown in overlay on ssw1-a1-codfw T427301
[12:50:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:12] <stashbot>	 T427301: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301
[12:51:59] <wikibugs>	 (03CR) 10Bartosz Wójtowicz: [C:03+1] ml-serve: add node labels [puppet] - 10https://gerrit.wikimedia.org/r/1296574 (owner: 10Dpogorzelski)
[12:52:23] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T426633)', diff saved to https://phabricator.wikimedia.org/P93550 and previous config saved to /var/cache/conftool/dbconfig/20260602-125223-fceratto.json
[12:52:55] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[12:53:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1295023 (owner: 10Elukey)
[12:54:55] <topranks>	 !log shutdown sub-interfaces on cr1-codfw et-1/1/5 for row A/B vlans T427301
[12:54:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:29] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1296577 (owner: 10Majavah)
[12:55:32] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2161: Migration of db2161.codfw.wmnet completed
[12:55:33] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[12:55:59] <wikibugs>	 (03CR) 10Dpogorzelski: [C:03+2] ml-serve: add node labels [puppet] - 10https://gerrit.wikimedia.org/r/1296574 (owner: 10Dpogorzelski)
[12:57:36] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
[12:57:40] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
[12:57:44] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
[12:57:51] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
[12:58:19] <Msz2001>	 I have a patch to deploy this window, I'll be back at my computer in 10-15 mins :)
[12:59:02] <wikibugs>	 (03CR) 10Arnaudb: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb)
[12:59:48] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage
[13:00:05] <jouncebot>	 Lucas_WMDE, urbanecm, and TheresNoTime: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1300).
[13:00:05] <jouncebot>	 MatmaRex, Msz2001, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:09] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] "Noting down to remember to revert it. Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/1296036 (https://phabricator.wikimedia.org/T418200) (owner: 10Scott French)
[13:00:13] <MatmaRex>	 hi
[13:00:31] <anzx>	 o/
[13:00:31] <MatmaRex>	 my config changes are all no-ops / cleanups, i need someone to deploy for me :)
[13:00:45] <wikibugs>	 (03PS1) 10Dreamy Jazz: Use the globalblock-local-status right over globalblock-whitelist [extensions/GlobalBlocking] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296582 (https://phabricator.wikimedia.org/T277942)
[13:01:10] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: CPU1 thermal fault for wdqs1015.eqiad.wmnet - https://phabricator.wikimedia.org/T427852#11976331 (10Jclark-ctr) I did attempt the firmware updates, but after rebooting, the server became unresponsive and will not boot.  At this point, I would need a compatibl...
[13:01:49] <wikibugs>	 (03PS10) 10Daniel Kinzler: Rakefile: Run chart specific tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282965 (https://phabricator.wikimedia.org/T424824)
[13:02:25] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:02:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P93553 and previous config saved to /var/cache/conftool/dbconfig/20260602-130230-fceratto.json
[13:02:39] <logmsgbot>	 !log cwilliams@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Reimaging upstream servers
[13:02:50] <wikibugs>	 (03PS3) 10Anzx: Add kha to wmgExtraLanguageNames [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296580 (https://phabricator.wikimedia.org/T427917)
[13:03:06] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296580 (https://phabricator.wikimedia.org/T427917) (owner: 10Anzx)
[13:03:12] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage
[13:03:12] <topranks>	 !log increase OSPF cost on ssw1-a1-codfw et-0/0/2 towards lsw1-a3-codfw T427301
[13:03:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:17] <stashbot>	 T427301: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301
[13:03:18] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs1001.eqiad.wmnet with OS trixie
[13:03:32] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976336 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host dse-k8s-wdqs1001...
[13:03:38] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rakefile: Run chart specific tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282965 (https://phabricator.wikimedia.org/T424824) (owner: 10Daniel Kinzler)
[13:03:43] <Dreamy_Jazz>	 I can deploy
[13:03:44] <logmsgbot>	 !log cwilliams@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on clouddb[1022-1023].eqiad.wmnet with reason: Reimaging upstream servers
[13:03:46] <wikibugs>	 (03PS2) 10Mpostoronca: wmf-config: Skip CAPTCHA for action=mcrundo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296557 (https://phabricator.wikimedia.org/T427612)
[13:04:19] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[13:04:21] <wikibugs>	 (03CR) 10Mpostoronca: wmf-config: Skip CAPTCHA for action=mcrundo (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296557 (https://phabricator.wikimedia.org/T427612) (owner: 10Mpostoronca)
[13:04:22] <logmsgbot>	 !log cwilliams@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[13:04:41] <wikibugs>	 (03PS11) 10Daniel Kinzler: Rakefile: Run chart specific tests [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282965 (https://phabricator.wikimedia.org/T424824)
[13:04:56] <logmsgbot>	 !log jayme@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2006.codfw.wmnet with OS trixie
[13:06:04] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS trixie
[13:06:24] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] Clean up bot password configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296566 (owner: 10Bartosz Dziewoński)
[13:06:35] <Dreamy_Jazz>	 Still looking over the changes
[13:07:25] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs1002.eqiad.wmnet with OS trixie
[13:07:27] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs1003.eqiad.wmnet with OS trixie
[13:07:44] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host dse-k8s-wdqs1002...
[13:07:45] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976363 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host dse-k8s-wdqs1003...
[13:08:05] <Dreamy_Jazz>	 anzx: For your IP rate limit change is the start time as expected?
[13:08:14] <Dreamy_Jazz>	 On the task I see 07:00 UTC+2
[13:08:30] <Dreamy_Jazz>	 But the throttle start time appears to be an hour earlier?
[13:08:37] <Dreamy_Jazz>	 06:00 +2:00
[13:09:00] <anzx>	 yeah, just to be safe, I set it 1 hour early
[13:09:13] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] "Going off the comment saying it's safe to remove, this looks fine" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: 10Bartosz Dziewoński)
[13:09:14] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service lsw1-f1-codfw.mgmt.codfw.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-f1-codfw.mgmt.codfw.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[13:09:21] <Dreamy_Jazz>	 Sure, thanks
[13:10:18] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=main-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[13:10:34] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+2] .fixtures: remove erroneously committed file [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295949 (owner: 10Kamila Součková)
[13:11:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295502 (owner: 10Bartosz Dziewoński)
[13:11:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1283106 (owner: 10Bartosz Dziewoński)
[13:11:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296566 (owner: 10Bartosz Dziewoński)
[13:11:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: 10Bartosz Dziewoński)
[13:11:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: 10Anzx)
[13:11:09] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/GlobalBlocking] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296582 (https://phabricator.wikimedia.org/T277942) (owner: 10Dreamy Jazz)
[13:11:42] <logmsgbot>	 !log atsuko@deploy1003 helmfile [eqiad] START helmfile.d/services/eventstreams: apply
[13:11:46] <Dreamy_Jazz>	 Doing all but Msz2001's changes
[13:11:54] <Dreamy_Jazz>	 (Msz2001 should be able to self deploy)
[13:12:13] <logmsgbot>	 !log atsuko@deploy1003 helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
[13:12:16] <wikibugs>	 (03CR) 10Majavah: [V:03+1 C:03+2] P:syslog: centralserver: Migrate to firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/1296577 (owner: 10Majavah)
[13:12:21] <logmsgbot>	 !log atsuko@deploy1003 helmfile [codfw] START helmfile.d/services/eventstreams: apply
[13:12:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P93554 and previous config saved to /var/cache/conftool/dbconfig/20260602-131238-fceratto.json
[13:12:48] <logmsgbot>	 !log atsuko@deploy1003 helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
[13:13:03] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[13:13:23] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1167: Upgrading db1167.eqiad.wmnet
[13:13:28] <Dreamy_Jazz>	 gate-and-submit is slow today, so may be a while before it gets started
[13:13:53] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1167: Upgrading db1167.eqiad.wmnet
[13:14:16] <MatmaRex>	 thanks Dreamy_Jazz
[13:14:22] <wikibugs>	 (03PS2) 10Kamila Součková: CI: Fix CI pass on template render fail [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295947 (https://phabricator.wikimedia.org/T427307)
[13:14:51] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] Geo-maps: Update Meta mapping for June 2026 [dns] - 10https://gerrit.wikimedia.org/r/1296569 (owner: 10Slyngshede)
[13:14:58] <wikibugs>	 (03CR) 10Kamila Součková: CI: Fix CI pass on template render fail (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295947 (https://phabricator.wikimedia.org/T427307) (owner: 10Kamila Součková)
[13:15:06] <Dreamy_Jazz>	 Actually seems zuul has stopped processing
[13:15:31] <logmsgbot>	 !log bwojtowicz@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[13:15:34] <Reedy>	 castor being castor
[13:15:57] <Reedy>	 globalblocking I guess is just backed up
[13:16:19] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS trixie
[13:16:28] <wikibugs>	 (03PS1) 10Slyngshede: P:cumin:master remove liberica alias for eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1296587
[13:16:35] <Dreamy_Jazz>	 https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php83-selenium/57912/console is complete, but isn't being reflected as such in zuul
[13:16:45] <Msz2001>	 I'm back
[13:17:00] <logmsgbot>	 !log bwojtowicz@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[13:17:06] <Dreamy_Jazz>	 Hi Msz2001, seems like zuul / CI is having a bad day
[13:17:18] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2049: repool after upgrade
[13:17:20] <Dreamy_Jazz>	 Still waiting for any gate-and-submit jobs to start
[13:17:31] <Msz2001>	 ouch...
[13:17:52] <Msz2001>	 An hour ago it worked
[13:18:44] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to <Superset> for <APDube-WMF> - https://phabricator.wikimedia.org/T427553#11976402 (10Raine)
[13:18:53] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
[13:19:46] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS trixie
[13:19:51] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS trixie
[13:20:19] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS trixie
[13:22:33] <wikibugs>	 (03PS2) 10Kamila Součková: admin: add apdube-wmf user [puppet] - 10https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553)
[13:22:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T426633)', diff saved to https://phabricator.wikimedia.org/P93557 and previous config saved to /var/cache/conftool/dbconfig/20260602-132246-fceratto.json
[13:23:06] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
[13:23:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1210 (T426633)', diff saved to https://phabricator.wikimedia.org/P93558 and previous config saved to /var/cache/conftool/dbconfig/20260602-132314-fceratto.json
[13:23:49] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
[13:24:40] <topranks>	 !log increase OSPF cost on ssw1-a1-codfw et-0/0/4 towards lsw1-a5-codfw T427301
[13:24:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:43] <stashbot>	 T427301: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301
[13:25:05] <Dreamy_Jazz>	 Looks like zuul is completely stopped, I've posted in #wikimedia-releng and will see about getting it working again
[13:25:29] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Geo-maps: Update Meta mapping for June 2026 [dns] - 10https://gerrit.wikimedia.org/r/1296569 (owner: 10Slyngshede)
[13:25:38] <logmsgbot>	 !log slyngshede@dns1004 START - running authdns-update
[13:26:16] <icinga-wm>	 PROBLEM - Host db2175 #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:18] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976434 (10Jclark-ctr) {F86226390} These are Failing to image for preseed file
[13:26:30] <icinga-wm>	 PROBLEM - Host backup2013 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:31] <federico3>	 !ack
[13:26:32] <sirenbot>	 8039 (ACKED)  Host db2175 (paged)
[13:26:38] <icinga-wm>	 PROBLEM - Host wikikube-worker2242 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:38] <wikibugs>	 (03CR) 10Aqu: "Nice" [puppet] - 10https://gerrit.wikimedia.org/r/1295045 (https://phabricator.wikimedia.org/T427532) (owner: 10Dr0ptp4kt)
[13:26:38] <icinga-wm>	 PROBLEM - Host wikikube-worker2243 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:38] <icinga-wm>	 PROBLEM - Host wikikube-worker2254 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:38] <icinga-wm>	 PROBLEM - Host wikikube-worker2255 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:42] <wikibugs>	 (03CR) 10Dreamy Jazz: "While this is kinda hacky, it's not intended as a long term fix and will be removed once the interface supports it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296557 (https://phabricator.wikimedia.org/T427612) (owner: 10Mpostoronca)
[13:26:44] <icinga-wm>	 PROBLEM - Host thanos-be2006 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:44] <icinga-wm>	 PROBLEM - Host puppetserver2002 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:46] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] wmf-config: Skip CAPTCHA for action=mcrundo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296557 (https://phabricator.wikimedia.org/T427612) (owner: 10Mpostoronca)
[13:26:49] <icinga-wm>	 PROBLEM - Host es2050 #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:50] <icinga-wm>	 PROBLEM - Host rdb2007 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:55] <icinga-wm>	 PROBLEM - Host db2154 #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:55] <icinga-wm>	 PROBLEM - Host db2153 #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:56] <icinga-wm>	 PROBLEM - Host db2157 #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:57] <sukhe>	 uh?
[13:27:04] <federico3>	 wow
[13:27:05] <federico3>	 !ack
[13:27:06] <sirenbot>	 8040 (ACKED)  Host es2050 (paged)
[13:27:06] <sirenbot>	 8041 (ACKED)  Host db2154 (paged)
[13:27:07] <sirenbot>	 8042 (ACKED)  Host db2157 (paged)
[13:27:09] <icinga-wm>	 PROBLEM - Host db2176 #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:14] <icinga-wm>	 PROBLEM - Host wikikube-worker2017 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:14] <icinga-wm>	 PROBLEM - Host wikikube-worker2018 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2041 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2013 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2014 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2051 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2044 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2012 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:16] <icinga-wm>	 PROBLEM - Host wikikube-worker2074 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:17] <icinga-wm>	 PROBLEM - Host wikikube-worker2075 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:17] <icinga-wm>	 PROBLEM - Host wikikube-worker2092 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:18] <icinga-wm>	 PROBLEM - Host wikikube-worker2076 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:18] <icinga-wm>	 PROBLEM - Host wikikube-worker2091 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:19] <icinga-wm>	 PROBLEM - Host wikikube-worker2077 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:19] <icinga-wm>	 PROBLEM - Host wikikube-worker2078 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:20] <icinga-wm>	 PROBLEM - Host ml-serve2001 is DOWN: PING CRITICAL - Packet loss = 100%
[13:27:21] <logmsgbot>	 !log slyngshede@dns1004 END - running authdns-update
[13:27:24] <federico3>	 is the rack going down?
[13:27:35] <icinga-wm>	 RECOVERY - Host db2176 #page is UP: PING OK - Packet loss = 0%, RTA = 35.04 ms
[13:27:35] <XioNoX>	 that's us
[13:27:41] <icinga-wm>	 RECOVERY - Host db2154 #page is UP: PING OK - Packet loss = 0%, RTA = 32.95 ms
[13:27:41] <icinga-wm>	 RECOVERY - Host db2153 #page is UP: PING OK - Packet loss = 0%, RTA = 32.87 ms
[13:27:42] <icinga-wm>	 RECOVERY - Host rdb2007 is UP: PING OK - Packet loss = 0%, RTA = 33.00 ms
[13:27:42] <sukhe>	 A5?
[13:27:42] <icinga-wm>	 RECOVERY - Host db2157 #page is UP: PING OK - Packet loss = 0%, RTA = 33.50 ms
[13:27:42] <icinga-wm>	 RECOVERY - Host wikikube-worker2017 is UP: PING OK - Packet loss = 0%, RTA = 32.93 ms
[13:27:42] <icinga-wm>	 RECOVERY - Host wikikube-worker2018 is UP: PING OK - Packet loss = 0%, RTA = 32.90 ms
[13:27:43] <XioNoX>	 yeah, spine maintenance issue
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2014 is UP: PING OK - Packet loss = 0%, RTA = 32.93 ms
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2044 is UP: PING OK - Packet loss = 0%, RTA = 32.88 ms
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2051 is UP: PING OK - Packet loss = 0%, RTA = 32.83 ms
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2012 is UP: PING OK - Packet loss = 0%, RTA = 36.28 ms
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2013 is UP: PING OK - Packet loss = 0%, RTA = 32.88 ms
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2041 is UP: PING OK - Packet loss = 0%, RTA = 32.92 ms
[13:27:46] <icinga-wm>	 RECOVERY - Host wikikube-worker2075 is UP: PING OK - Packet loss = 0%, RTA = 32.93 ms
[13:27:47] <sukhe>	 ohphew
[13:27:47] <icinga-wm>	 RECOVERY - Host wikikube-worker2076 is UP: PING OK - Packet loss = 0%, RTA = 32.99 ms
[13:27:47] <icinga-wm>	 RECOVERY - Host wikikube-worker2091 is UP: PING OK - Packet loss = 0%, RTA = 32.83 ms
[13:27:48] <icinga-wm>	 RECOVERY - Host backup2013 is UP: PING OK - Packet loss = 0%, RTA = 32.92 ms
[13:27:48] <icinga-wm>	 RECOVERY - Host wikikube-worker2092 is UP: PING OK - Packet loss = 0%, RTA = 32.85 ms
[13:27:49] <icinga-wm>	 RECOVERY - Host wikikube-worker2078 is UP: PING OK - Packet loss = 0%, RTA = 33.54 ms
[13:27:49] <icinga-wm>	 RECOVERY - Host wikikube-worker2074 is UP: PING OK - Packet loss = 0%, RTA = 32.84 ms
[13:27:50] <icinga-wm>	 RECOVERY - Host db2175 #page is UP: PING OK - Packet loss = 0%, RTA = 33.52 ms
[13:27:50] <icinga-wm>	 RECOVERY - Host wikikube-worker2077 is UP: PING OK - Packet loss = 0%, RTA = 37.13 ms
[13:27:51] <icinga-wm>	 RECOVERY - Host ml-serve2001 is UP: PING OK - Packet loss = 0%, RTA = 33.01 ms
[13:27:51] <logmsgbot>	 jclark@cumin1003 reimage (PID 16585) is awaiting input
[13:27:52] <icinga-wm>	 RECOVERY - Host es2050 #page is UP: PING OK - Packet loss = 0%, RTA = 33.01 ms
[13:27:55] <XioNoX>	 yeah, change applied to one rack went fine, but not the other
[13:27:55] <claime>	 oof
[13:28:01] <federico3>	 !ack
[13:28:01] <sirenbot>	 All incidents are already acked.
[13:28:06] <icinga-wm>	 RECOVERY - Host wikikube-worker2255 is UP: PING OK - Packet loss = 0%, RTA = 32.84 ms
[13:28:06] <icinga-wm>	 RECOVERY - Host wikikube-worker2243 is UP: PING OK - Packet loss = 0%, RTA = 32.83 ms
[13:28:06] <icinga-wm>	 RECOVERY - Host wikikube-worker2254 is UP: PING OK - Packet loss = 0%, RTA = 33.28 ms
[13:28:06] <icinga-wm>	 RECOVERY - Host wikikube-worker2242 is UP: PING OK - Packet loss = 0%, RTA = 32.86 ms
[13:28:11] <sukhe>	 free cardio in the morning, sitting. nice :D
[13:28:14] <icinga-wm>	 RECOVERY - Host puppetserver2002 is UP: PING OK - Packet loss = 0%, RTA = 32.94 ms
[13:28:14] <icinga-wm>	 RECOVERY - Host thanos-be2006 is UP: PING OK - Packet loss = 0%, RTA = 32.88 ms
[13:28:58] <XioNoX>	 for what it's worth it's not a full failure, but most likely the monitoring host to that rack lost connectivity
[13:29:13] <wikibugs>	 (03CR) 10Ladsgroup: profile::firewall: Allow to provide more fine-grained access from monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296251 (owner: 10Muehlenhoff)
[13:29:19] <federico3>	 I'm looking at the DB metrics
[13:29:20] <XioNoX>	 some other pings to/from that rack were still fine, we're investigating
[13:29:41] <logmsgbot>	 jclark@cumin1003 reimage (PID 16602) is awaiting input
[13:30:00] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1210 (T426633)', diff saved to https://phabricator.wikimedia.org/P93559 and previous config saved to /var/cache/conftool/dbconfig/20260602-132959-fceratto.json
[13:31:13] <federico3>	 indeed on the DB side they don't seem to have lost connectivity for a significant amount of time
[13:31:24] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
[13:31:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in codfw - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[13:32:45] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage
[13:33:09] <jynus>	 I got a backup timeout at :27
[13:33:14] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply
[13:33:18] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply
[13:34:04] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply
[13:34:10] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply
[13:35:13] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs1003.eqiad.wmnet with OS trixie
[13:35:18] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs1002.eqiad.wmnet with OS trixie
[13:35:26] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976478 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host dse-k8s-wdqs1003.eqi...
[13:35:30] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11976479 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host dse-k8s-wdqs1002.eqi...
[13:35:57] <XioNoX>	 federico3: all good for the DBs?
[13:36:54] <jynus>	 is it related to the maintenance or unrelated?
[13:37:27] <federico3>	 yes, they don't show drops in traffic
[13:37:40] <XioNoX>	 jynus: related
[13:37:42] <topranks>	 it's related to the maintenance yes, occurred after we de-preffed the link from ssw1-a1-codfw to lsw1-a5-codfw 
[13:37:58] <topranks>	 though we did not expect that to interrupt things 
[13:38:07] <jynus>	 yeah, np
[13:38:12] <topranks>	 the change was rolled back after which we got the recoveries 
[13:38:14] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Observability-Logging: Degraded RAID on centrallog1002 - https://phabricator.wikimedia.org/T427748#11976487 (10Jclark-ctr) @colewhite  can this be swapped at any time would you be able to rebuild after swapping?
[13:38:14] <jynus>	 was the work itself finished?
[13:38:15] <topranks>	 sorry folks <3 
[13:38:18] <jynus>	 ah, I got my answer
[13:38:21] <topranks>	 yes 
[13:38:27] <topranks>	 we have to re-think this 
[13:38:37] <jynus>	 I just want to know whether to wait a bit before retrying the long running backups
[13:38:39] * urbanecm would like to do a (no-op) config change, waiting for things to settle down
[13:38:43] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
[13:39:15] <jynus>	 I am not affected by the interruption a lot service-wise, just for retries on ongoing maintenance
[13:39:29] <jynus>	 so waiting for a green light to retry
[13:40:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P93560 and previous config saved to /var/cache/conftool/dbconfig/20260602-134007-fceratto.json
[13:40:35] <wikibugs>	 (03CR) 10Kamila Součková: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: 10Kamila Součková)
[13:40:37] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS trixie
[13:42:52] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage
[13:43:48] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS trixie
[13:44:50] <Dreamy_Jazz>	 MatmaRex: anzx: I need to go shortly, so as CI is still blocked I won't be able to do your backports
[13:44:54] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11976504 (10Raine)
[13:44:56] <wikibugs>	 (03CR) 10Muehlenhoff: profile::firewall: Allow to provide more fine-grained access from monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296251 (owner: 10Muehlenhoff)
[13:45:06] <wikibugs>	 (03PS4) 10Effie Mouzeli: site.pp: add rdb2013 and rdb2014 [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924)
[13:45:14] <icinga-wm>	 RECOVERY - Postfix SMTP on crm2001 is OK: OK - Certificate crm2001.codfw.wmnet will expire on Tue 30 Jun 2026 01:10:00 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mail%23Troubleshooting
[13:45:25] <wikibugs>	 (03PS1) 10JMeybohm: partman/reuse-raid10-6dev.cfg: Apply workaround to swap handling affecting trixie installations [puppet] - 10https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088)
[13:45:35] <wikibugs>	 (03CR) 10CI reject: [V:04-1] site.pp: add rdb2013 and rdb2014 [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: 10Effie Mouzeli)
[13:45:40] <Dreamy_Jazz>	 Maybe Msz2001 can you handle these changes?
[13:45:45] <MatmaRex>	 Dreamy_Jazz: no problem
[13:45:46] <wikibugs>	 (03CR) 10CI reject: [V:04-1] partman/reuse-raid10-6dev.cfg: Apply workaround to swap handling affecting trixie installations [puppet] - 10https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088) (owner: 10JMeybohm)
[13:45:48] <wikibugs>	 (03CR) 10Effie Mouzeli: "I removed the preseed addition, by public demand" [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: 10Effie Mouzeli)
[13:45:51] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+1] profile::firewall: Allow to provide more fine-grained access from monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1296251 (owner: 10Muehlenhoff)
[13:45:52] <Msz2001>	 I can
[13:45:57] <MatmaRex>	 i can reschedule if we don't have time. i didn't have anything important
[13:46:09] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11976521 (10Raine) @Ahoelzl can you please approve? Thanks!
[13:46:13] <Dreamy_Jazz>	 Thanks, see you all around o/
[13:46:19] <wikibugs>	 (CR) JMeybohm: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088) (owner: JMeybohm)
[13:46:37] <wikibugs>	 (CR) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[13:46:45] <wikibugs>	 (PS5) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924)
[13:47:06] <wikibugs>	 (CR) CI reject: [V:-1] site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[13:47:18] <topranks>	 !log revert all config to normal on cr1-codfw and ssw1-a1-codfw 
[13:47:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:48] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2014.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:50:08] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11976541 (Raine) @KOfori can you please approve access as group approver? Thank you!
[13:50:15] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P93561 and previous config saved to /var/cache/conftool/dbconfig/20260602-135015-fceratto.json
[13:50:52] <wikibugs>	 (CR) Ladsgroup: [C:+1] profile::firewall: Allow to provide more fine-grained access from monitoring (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1296251 (owner: Muehlenhoff)
[13:51:06] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[13:51:48] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:51:51] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[13:52:39] <wikibugs>	 (PS6) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924)
[13:52:58] <wikibugs>	 (CR) CI reject: [V:-1] site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[13:54:31] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Observability-Logging: Degraded RAID on centrallog1002 - https://phabricator.wikimedia.org/T427748#11976559 (colewhite) >>! In T427748#11976487, @Jclark-ctr wrote: > @colewhite  can this be swapped at any time would you be able to rebuild after swapping?  Yes, I can do the...
[13:54:47] <wikibugs>	 (CR) JMeybohm: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088) (owner: JMeybohm)
[13:54:48] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:55:04] <logmsgbot>	 jclark@cumin1003 reimage (PID 13706) is awaiting input
[13:55:11] <wikibugs>	 (CR) Muehlenhoff: profile::firewall: Allow to provide more fine-grained access from monitoring (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1296251 (owner: Muehlenhoff)
[13:55:38] <wikibugs>	 (CR) Ladsgroup: [C:+1] profile::firewall: Allow to provide more fine-grained access from monitoring (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1296251 (owner: Muehlenhoff)
[13:55:54] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS trixie
[13:56:27] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
[13:56:38] <wikibugs>	 (CR) Ladsgroup: [C:+1] profile::firewall: Allow to provide more fine-grained access from monitoring (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1296251 (owner: Muehlenhoff)
[13:56:48] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:59:21] <wikibugs>	 (CR) Aqu: Add commonswiki globalimagelinks monthly sqoop (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1295045 (https://phabricator.wikimedia.org/T427532) (owner: Dr0ptp4kt)
[14:00:00] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS trixie
[14:00:05] <jouncebot>	 Deploy window Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1400)
[14:00:23] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1210 (T426633)', diff saved to https://phabricator.wikimedia.org/P93562 and previous config saved to /var/cache/conftool/dbconfig/20260602-140022-fceratto.json
[14:00:24] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2011,2033-2034,2050,2055-2062,2068-2071,2107-2113].codfw.wmnet
[14:00:38] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2011,2033-2034,2050,2055-2062,2068-2071,2107-2113].codfw.wmnet
[14:00:45] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
[14:01:11] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS trixie
[14:01:13] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[14:01:32] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[14:01:40] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1156 (T426633)', diff saved to https://phabricator.wikimedia.org/P93563 and previous config saved to /var/cache/conftool/dbconfig/20260602-140140-fceratto.json
[14:01:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in codfw - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:02:33] <XioNoX>	 federico3: you can repool servers for https://phabricator.wikimedia.org/T427301
[14:02:40] <federico3>	 thanks
[14:03:37] <wikibugs>	 ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11976599 (ayounsi)
[14:04:53] <wikibugs>	 (CR) Arnaudb: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[14:05:26] <logmsgbot>	 !log cwilliams@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[14:06:48] <wikibugs>	 (PS1) Jcrespo: dbbackups: Testing x1 backups on new cumin2003 trixie host [puppet] - https://gerrit.wikimedia.org/r/1296602 (https://phabricator.wikimedia.org/T427897)
[14:06:48] <wikibugs>	 (CR) JMeybohm: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088) (owner: JMeybohm)
[14:07:08] <wikibugs>	 (CR) CI reject: [V:-1] dbbackups: Testing x1 backups on new cumin2003 trixie host [puppet] - https://gerrit.wikimedia.org/r/1296602 (https://phabricator.wikimedia.org/T427897) (owner: Jcrespo)
[14:07:23] <wikibugs>	 (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1296602 (https://phabricator.wikimedia.org/T427897) (owner: Jcrespo)
[14:08:42] <logmsgbot>	 !log urbanecm@deploy1003 mwscript-k8s job started: foreachwikiindblist growthexperiments userOptions.php --delete growthexperiments-homepage-variant  # T417621
[14:08:45] <stashbot>	 T417621: Remove 'growthexperiments-homepage-variant' user property from all wikis where it's present - https://phabricator.wikimedia.org/T417621
[14:08:46] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088) (owner: JMeybohm)
[14:09:04] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1048.eqiad.wmnet
[14:09:05] <logmsgbot>	 !log urbanecm@deploy1003 mwscript-k8s job started: foreachwikiindblist growthexperiments userOptions.php --delete --nowarn growthexperiments-homepage-variant  # T417621
[14:09:59] <wikibugs>	 (CR) JMeybohm: [V:+2 C:+2] partman/reuse-raid10-6dev.cfg: Apply workaround to swap handling affecting trixie installations [puppet] - https://gerrit.wikimedia.org/r/1296597 (https://phabricator.wikimedia.org/T427088) (owner: JMeybohm)
[14:10:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T426633)', diff saved to https://phabricator.wikimedia.org/P93564 and previous config saved to /var/cache/conftool/dbconfig/20260602-141019-fceratto.json
[14:11:08] <wikibugs>	 ops-codfw, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: re-rack mc2055 (before Jun 9th) - https://phabricator.wikimedia.org/T427373#11976649 (Jhancock.wm) @jijiki i'm ready whenever you are to do the move. should only take about 20-30 minutes. I do have a meeting this morning bu...
[14:13:55] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
[14:14:28] <Msz2001>	 MatmaRex, anzx: I'll reschedule my patches from today's window for tomorrow UTC morning. I can deploy yours as well at that time if you're okay with that (they seem trivial enough that they need no verification or that I can verify them myself)
[14:14:31] <logmsgbot>	 jayme@cumin2002 reimage (PID 3580405) is awaiting input
[14:14:43] <wikibugs>	 (CR) Jcrespo: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296602 (https://phabricator.wikimedia.org/T427897) (owner: Jcrespo)
[14:14:51] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[14:15:08] <MatmaRex>	 Msz2001: sure, that's cool with me
[14:15:11] <anzx>	 Msz2001: ok, thanks
[14:15:28] <MatmaRex>	 if anything turns out to not be trivial, i'll reschedule it :)
[14:15:36] <MatmaRex>	 thanks
[14:15:48] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS trixie
[14:15:56] <Msz2001>	 yw
[14:16:46] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS trixie
[14:17:03] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.remove-downtime for db1167.eqiad.wmnet
[14:17:04] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1167.eqiad.wmnet
[14:17:20] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS trixie
[14:17:22] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS trixie
[14:18:36] <wikibugs>	 (CR) Mforns: Add filerevision to the mediawiki not-history sqoop (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1295047 (https://phabricator.wikimedia.org/T427532) (owner: Dr0ptp4kt)
[14:19:30] <wikibugs>	 (CR) Mforns: "I think if we use sqoopable_dblist in the other change, then we don't need this change at all, no?" [puppet] - https://gerrit.wikimedia.org/r/1295045 (https://phabricator.wikimedia.org/T427532) (owner: Dr0ptp4kt)
[14:20:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P93566 and previous config saved to /var/cache/conftool/dbconfig/20260602-142027-fceratto.json
[14:20:29] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
[14:20:41] <logmsgbot>	 jiji@cumin1003 decommission (PID 74967) is awaiting input
[14:21:18] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1167: Repooling after Icinga wait-for-green timeout
[14:23:32] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "(removing CR+2 so this doesn’t get merged accidentally without being deployed; AFAICT from a glance at the IRC backscroll, Zuul / gate-and" [mediawiki-config] - https://gerrit.wikimedia.org/r/1283106 (owner: Bartosz Dziewoński)
[14:23:41] <wikibugs>	 (CR) CI reject: [V:-1] Remove unused 'writeapi' right [mediawiki-config] - https://gerrit.wikimedia.org/r/1283106 (owner: Bartosz Dziewoński)
[14:23:42] <Lucas_WMDE>	 Dreamy_Jazz: ^ fyi, I hope that’s okay
[14:23:51] <icinga-wm>	 RECOVERY - Router interfaces on mr1-codfw is OK: OK: host 208.80.153.196, interfaces up: 33, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:25:00] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1048.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[14:26:33] <wikibugs>	 (CR) Mszwarc: [C:+1] "Removed CR+2 as it didn't get deployed due to CI problems. Let's not have dangling +2" [mediawiki-config] - https://gerrit.wikimedia.org/r/1295502 (owner: Bartosz Dziewoński)
[14:26:42] <wikibugs>	 (CR) CI reject: [V:-1] Revert "labswiki: Disallow account autocreation" [mediawiki-config] - https://gerrit.wikimedia.org/r/1295502 (owner: Bartosz Dziewoński)
[14:26:46] <wikibugs>	 (CR) Mszwarc: [C:+1] "Removed CR+2 as it didn't get deployed due to CI problems. Let's not have dangling +2" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: Bartosz Dziewoński)
[14:26:55] <wikibugs>	 (CR) CI reject: [V:-1] Remove workaround for stuck session cookies on Wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: Bartosz Dziewoński)
[14:26:59] <wikibugs>	 (CR) Mszwarc: [C:+1] "Removed CR+2 as it didn't get deployed due to CI problems. Let's not have dangling +2" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296566 (owner: Bartosz Dziewoński)
[14:27:02] <wikibugs>	 (PS1) Btullis: dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087)
[14:27:07] <wikibugs>	 (CR) CI reject: [V:-1] Clean up bot password configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/1296566 (owner: Bartosz Dziewoński)
[14:27:21] <wikibugs>	 (CR) CI reject: [V:-1] dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[14:27:50] <wikibugs>	 (CR) Mszwarc: [C:+1] "Removed CR+2 as it didn't get deployed due to CI problems. Let's not have dangling +2" [mediawiki-config] - https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: Anzx)
[14:27:58] <wikibugs>	 (CR) CI reject: [V:-1] cswiki: lift IP cap for workshop on 08-June-2026 [mediawiki-config] - https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: Anzx)
[14:28:05] <logmsgbot>	 jiji@cumin1003 decommission (PID 74967) is awaiting input
[14:28:24] <wikibugs>	 (CR) Anzx: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: Anzx)
[14:30:05] <jouncebot>	 Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1430)
[14:30:19] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
[14:30:27] <wikibugs>	 (PS2) Btullis: dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087)
[14:30:36] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P93569 and previous config saved to /var/cache/conftool/dbconfig/20260602-143035-fceratto.json
[14:30:46] <wikibugs>	 (CR) CI reject: [V:-1] dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[14:31:53] <wikibugs>	 (PS3) Btullis: dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087)
[14:32:17] <wikibugs>	 (CR) CI reject: [V:-1] dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[14:32:18] <wikibugs>	 (CR) Ssingh: [C:+1] P:cumin:master remove liberica alias for eqiad [puppet] - https://gerrit.wikimedia.org/r/1296587 (owner: Slyngshede)
[14:32:44] <wikibugs>	 (CR) CI reject: [V:-1] P:cumin:master remove liberica alias for eqiad [puppet] - https://gerrit.wikimedia.org/r/1296587 (owner: Slyngshede)
[14:33:57] <wikibugs>	 (PS4) Btullis: dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087)
[14:34:16] <wikibugs>	 (CR) CI reject: [V:-1] dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[14:34:49] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:34:51] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
[14:35:37] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1017.eqiad.wmnet, wdqs1021.eqiad.wmnet, wdqs1014.eqiad.wmnet, wdqs1016.eqiad.wmnet, wdqs1020.eqiad.wmnet, wdqs1012.eqiad.wmnet, wdqs1022.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:35:37] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1021.eqiad.wmnet, wdqs1020.eqiad.wmnet, wdqs1022.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:36:49] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS trixie
[14:37:24] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS trixie
[14:37:31] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1048.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[14:37:32] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:37:33] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1048.eqiad.wmnet
[14:37:37] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:37:37] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:37:58] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[14:38:38] <wikibugs>	 (PS2) Urbanecm: [Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386)
[14:38:42] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver: apply
[14:38:45] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver: apply
[14:38:48] <wikibugs>	 (CR) CI reject: [V:-1] [Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[14:38:49] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2008.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:38:57] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[14:39:31] <wikibugs>	 (CR) Ssingh: [C:+1] "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296587 (owner: Slyngshede)
[14:40:40] <logmsgbot>	 jiji@cumin1003 decommission (PID 99366) is awaiting input
[14:40:44] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T426633)', diff saved to https://phabricator.wikimedia.org/P93571 and previous config saved to /var/cache/conftool/dbconfig/20260602-144043-fceratto.json
[14:40:49] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply
[14:41:02] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db2158: Repooling
[14:41:03] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[14:41:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1182 (T426633)', diff saved to https://phabricator.wikimedia.org/P93573 and previous config saved to /var/cache/conftool/dbconfig/20260602-144110-fceratto.json
[14:41:20] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool pc2021: Repooling
[14:41:20] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.parsercache
[14:41:35] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[14:41:35] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc2021: Repooling
[14:41:45] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2158.codfw.wmnet
[14:41:45] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2158.codfw.wmnet
[14:41:55] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2250.codfw.wmnet
[14:41:55] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2250.codfw.wmnet
[14:42:04] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for pc2021.codfw.wmnet
[14:42:04] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for pc2021.codfw.wmnet
[14:42:38] <dancy>	 jouncebot nowandnext
[14:42:38] <jouncebot>	 For the next 0 hour(s) and 17 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1430)
[14:42:38] <jouncebot>	 In 0 hour(s) and 17 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1500)
[14:42:49] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:42:49] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:43:24] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "I think we should split this up… first add the new `'msg'` keys, deploy that, make the Wikibase changes depend on that and merge those, th" [mediawiki-config] - https://gerrit.wikimedia.org/r/1295978 (https://phabricator.wikimedia.org/T427804) (owner: Audrey Penven)
[14:43:27] <icinga-wm>	 PROBLEM - Backup freshness on backup1014 is CRITICAL: All failures: 1 (backup2013), Fresh: 138 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[14:44:30] <wikibugs>	 (PS1) TrainBranchBot: testwikis to 1.47.0-wmf.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/1296606 (https://phabricator.wikimedia.org/T423914)
[14:44:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Initiated by dancy@deploy1003" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296606 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[14:44:44] <wikibugs>	 (CR) CI reject: [V:-1] testwikis to 1.47.0-wmf.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/1296606 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[14:44:50] <wikibugs>	 (CR) EMcFarland: [C:+1] "Looks good, but even if I could +2 this, I'd want someone with more Puppet experience to do the final +2." [puppet] - https://gerrit.wikimedia.org/r/1296519 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[14:45:30] <wikibugs>	 (Abandoned) Ahmon Dancy: testwikis to 1.47.0-wmf.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/1296606 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[14:45:44] <wikibugs>	 (CR) EMcFarland: [C:+1] "Looking good." [mediawiki-config] - https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[14:48:26] <wikibugs>	 (CR) Jcrespo: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296602 (https://phabricator.wikimedia.org/T427897) (owner: Jcrespo)
[14:49:33] <wikibugs>	 (PS1) Btullis: kafka event platform logs - Strip the stray $!msg field [puppet] - https://gerrit.wikimedia.org/r/1296607 (https://phabricator.wikimedia.org/T291645)
[14:49:36] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T426633)', diff saved to https://phabricator.wikimedia.org/P93575 and previous config saved to /var/cache/conftool/dbconfig/20260602-144935-fceratto.json
[14:50:17] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
[14:50:23] <wikibugs>	 SRE, Commons, DBA, Traffic: Unable to save edits or delete pages on Commons – database lag - https://phabricator.wikimedia.org/T402749#11976839 (Ademola) Implementation is ready and tested. For batches under 200 files, maxSimultaneousReq is now set to 4; larger batches remain at 2. Source cod...
[14:50:52] <logmsgbot>	 !log trueg@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply
[14:51:04] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS trixie
[14:51:12] <wikibugs>	 (CR) Jcrespo: [C:+2] dbbackups: Testing x1 backups on new cumin2003 trixie host [puppet] - https://gerrit.wikimedia.org/r/1296602 (https://phabricator.wikimedia.org/T427897) (owner: Jcrespo)
[14:51:35] <wikibugs>	 (CR) Btullis: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[14:51:41] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1296607 (https://phabricator.wikimedia.org/T291645) (owner: Btullis)
[14:51:45] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[14:52:01] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1049.eqiad.wmnet
[14:52:22] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver: apply
[14:52:28] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver: apply
[14:53:44] <wikibugs>	 (CR) Slyngshede: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1296587 (owner: Slyngshede)
[14:54:31] <wikibugs>	 (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[14:54:32] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
[14:56:22] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#11976878 (MoritzMuehlenhoff)
[14:57:52] <wikibugs>	 (CR) Slyngshede: [C:+2] P:cumin:master remove liberica alias for eqiad [puppet] - https://gerrit.wikimedia.org/r/1296587 (owner: Slyngshede)
[14:58:36] <wikibugs>	 (CR) Slyngshede: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: Kamila Součková)
[14:59:06] <wikibugs>	 (CR) Effie Mouzeli: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[14:59:09] <wikibugs>	 (CR) Hashar: "recheck CI had some issue." [mediawiki-config] - https://gerrit.wikimedia.org/r/1295502 (owner: Bartosz Dziewoński)
[14:59:13] <wikibugs>	 (CR) Hashar: "recheck CI had some issue." [mediawiki-config] - https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: Bartosz Dziewoński)
[14:59:16] <wikibugs>	 (CR) Hashar: "recheck CI had some issue." [mediawiki-config] - https://gerrit.wikimedia.org/r/1296566 (owner: Bartosz Dziewoński)
[14:59:20] <wikibugs>	 (CR) Hashar: "recheck CI had some issue." [mediawiki-config] - https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: Anzx)
[14:59:25] <wikibugs>	 (CR) CI reject: [V:-1] admin: add apdube-wmf user [puppet] - https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: Kamila Součková)
[14:59:32] <Dreamy_Jazz>	 jouncebot: nowandnext
[14:59:32] <jouncebot>	 For the next 0 hour(s) and 0 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1430)
[14:59:32] <jouncebot>	 In 0 hour(s) and 0 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1500)
[14:59:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P93578 and previous config saved to /var/cache/conftool/dbconfig/20260602-145943-fceratto.json
[15:00:05] <jouncebot>	 jelto, arnoldokoth, mutante, and arnaudb: May I have your attention please! SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1500)
[15:00:21] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] [Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386) (owner: 10Urbanecm)
[15:00:44] <Dreamy_Jazz>	 (Spiderpig isn't working for me)
[15:01:15] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[15:01:27] <urbanecm>	 Dreamy_Jazz: tbh i switched back to deploy once i got it to the "it broke when i opened my job"  state
[15:01:59] <wikibugs>	 (03Merged) 10jenkins-bot: [Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296514 (https://phabricator.wikimedia.org/T427386) (owner: 10Urbanecm)
[15:02:01] <wikibugs>	 (03PS1) 10Atsuko: services_proxy: switch to prod opensearch-on-k8s services [puppet] - 10https://gerrit.wikimedia.org/r/1296608 (https://phabricator.wikimedia.org/T424248)
[15:02:05] <Dreamy_Jazz>	 I assume you are deploying now?
[15:02:20] <Dreamy_Jazz>	 If so I'll go in the queue behind you
[15:02:20] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS trixie
[15:02:43] <urbanecm>	 Dreamy_Jazz: yep, it should be quick
[15:02:46] <logmsgbot>	 !log urbanecm@deploy1003 Started scap sync-world: Backport for [[gerrit:1296514|[Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis (T427386)]]
[15:02:53] <stashbot>	 T427386: Deploy automated mentor list cleanup to Wikimedia wikis - https://phabricator.wikimedia.org/T427386
[15:04:22] <wikibugs>	 (03CR) 10Blake: [C:03+1] site.pp: add rdb2013 and rdb2014 [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: 10Effie Mouzeli)
[15:05:04] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#11976933 (10jcrespo) I tested remote backups, and packages seem to be in a working state, but cumin (a dependency) seems not to be working well or to be lacking extra setup. No worries,...
[15:05:33] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] site.pp: add rdb2013 and rdb2014 [puppet] - 10https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: 10Effie Mouzeli)
[15:05:47] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1049.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[15:05:57] <wikibugs>	 (03PS1) 10Scott French: shellbox: Pick up newly rebuilt images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296585
[15:06:06] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1049.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[15:06:06] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:06:07] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1049.eqiad.wmnet
[15:06:14] <wikibugs>	 (03PS1) 10Jcrespo: Revert "dbbackups: Testing x1 backups on new cumin2003 trixie host" [puppet] - 10https://gerrit.wikimedia.org/r/1296611
[15:06:43] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1050.eqiad.wmnet
[15:06:47] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1167: Repooling after Icing wait-for-green timeout
[15:07:13] <wikibugs>	 (03PS1) 10Ottomata: mw-content-history-reconcile-enrich-* [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296612 (https://phabricator.wikimedia.org/T421237)
[15:08:10] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] shellbox: Pick up newly rebuilt images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296585 (owner: 10Scott French)
[15:08:14] <wikibugs>	 (03CR) 10A-pizzata: [C:03+1] "LGTM, thx a lot!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296612 (https://phabricator.wikimedia.org/T421237) (owner: 10Ottomata)
[15:08:55] <wikibugs>	 (03CR) 10Jcrespo: "CCing Moritz for awareness (I commented on the ticket too), although I am sure he is aware." [puppet] - 10https://gerrit.wikimedia.org/r/1296611 (owner: 10Jcrespo)
[15:08:59] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] Revert "dbbackups: Testing x1 backups on new cumin2003 trixie host" [puppet] - 10https://gerrit.wikimedia.org/r/1296611 (owner: 10Jcrespo)
[15:09:08] <logmsgbot>	 !log urbanecm@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296514|[Growth] Set wgGEMentorshipCleanupEnabled to false on all wikis (T427386)]] (duration: 06m 22s)
[15:09:12] <stashbot>	 T427386: Deploy automated mentor list cleanup to Wikimedia wikis - https://phabricator.wikimedia.org/T427386
[15:09:16] <urbanecm>	 Dreamy_Jazz: i'm done, over to you
[15:09:28] <Dreamy_Jazz>	 Thanks
[15:09:51] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P93580 and previous config saved to /var/cache/conftool/dbconfig/20260602-150951-fceratto.json
[15:09:54] <Dreamy_Jazz>	 Going to retry the spiderpig deploy I was doing in the window including the changes in said window
[15:10:10] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295502 (owner: 10Bartosz Dziewoński)
[15:10:10] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1283106 (owner: 10Bartosz Dziewoński)
[15:10:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296566 (owner: 10Bartosz Dziewoński)
[15:10:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: 10Bartosz Dziewoński)
[15:10:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: 10Anzx)
[15:10:13] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/GlobalBlocking] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296582 (https://phabricator.wikimedia.org/T277942) (owner: 10Dreamy Jazz)
[15:10:25] <wikibugs>	 (03CR) 10Urbanecm: [C:04-1] "Needs the MW patch to be both merged and deployed" [puppet] - 10https://gerrit.wikimedia.org/r/1296519 (https://phabricator.wikimedia.org/T427386) (owner: 10Urbanecm)
[15:10:48] <wikibugs>	 06SRE, 06ServiceOps new: Build httpbb for Trixie - https://phabricator.wikimedia.org/T427899#11977012 (10MLechvien-WMF) p:05Triage→03Medium a:03RLazarus
[15:11:27] <wikibugs>	 (03PS1) 10JMeybohm: partman/reuse-raid10-6dev.cfg: Apply swap workaround [puppet] - 10https://gerrit.wikimedia.org/r/1296613 (https://phabricator.wikimedia.org/T427088)
[15:11:39] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "labswiki: Disallow account autocreation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295502 (owner: 10Bartosz Dziewoński)
[15:11:42] <wikibugs>	 (03Merged) 10jenkins-bot: Remove unused 'writeapi' right [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1283106 (owner: 10Bartosz Dziewoński)
[15:11:44] <wikibugs>	 (03Merged) 10jenkins-bot: Clean up bot password configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296566 (owner: 10Bartosz Dziewoński)
[15:11:52] <wikibugs>	 (03Merged) 10jenkins-bot: Use the globalblock-local-status right over globalblock-whitelist [extensions/GlobalBlocking] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296582 (https://phabricator.wikimedia.org/T277942) (owner: 10Dreamy Jazz)
[15:12:01] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS trixie
[15:12:26] <logmsgbot>	 jayme@cumin2002 reimage (PID 3592644) is awaiting input
[15:12:27] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] partman/reuse-raid10-6dev.cfg: Apply swap workaround [puppet] - 10https://gerrit.wikimedia.org/r/1296613 (https://phabricator.wikimedia.org/T427088) (owner: 10JMeybohm)
[15:12:45] <logmsgbot>	 !log jayme@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2006.codfw.wmnet with OS trixie
[15:12:47] <wikibugs>	 (03Merged) 10jenkins-bot: Remove workaround for stuck session cookies on Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296563 (https://phabricator.wikimedia.org/T389433) (owner: 10Bartosz Dziewoński)
[15:12:51] <wikibugs>	 (03Merged) 10jenkins-bot: cswiki: lift IP cap for workshop on 08-June-2026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295574 (https://phabricator.wikimedia.org/T427678) (owner: 10Anzx)
[15:12:57] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] mw-content-history-reconcile-enrich-* [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296612 (https://phabricator.wikimedia.org/T421237) (owner: 10Ottomata)
[15:13:23] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1295502|Revert "labswiki: Disallow account autocreation"]], [[gerrit:1283106|Remove unused 'writeapi' right]], [[gerrit:1296566|Clean up bot password configuration]], [[gerrit:1296563|Remove workaround for stuck session cookies on Wikitech (T389433)]], [[gerrit:1295574|cswiki: lift IP cap for workshop on 08-June-2026 (T427678)]], [[gerrit:1296582|Use the globalblock-local-status right over globalblock-whitelist (T277942)]]
[15:13:27] <stashbot>	 T389433: Fix stuck old cookies on Wikitech - https://phabricator.wikimedia.org/T389433
[15:13:28] <stashbot>	 T427678: Lift IP cap on 2026-06-08 for Czech Wikipedia workshop - cs.wikipedia - https://phabricator.wikimedia.org/T427678
[15:13:28] <stashbot>	 T277942: Address Voice and Tone issues in GlobalBlocking - https://phabricator.wikimedia.org/T277942
[15:14:24] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[15:14:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1014:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:15:10] <logmsgbot>	 !log dreamyjazz@deploy1003 matmarex, anzx, dreamyjazz: Backport for [[gerrit:1295502|Revert "labswiki: Disallow account autocreation"]], [[gerrit:1283106|Remove unused 'writeapi' right]], [[gerrit:1296566|Clean up bot password configuration]], [[gerrit:1296563|Remove workaround for stuck session cookies on Wikitech (T389433)]], [[gerrit:1295574|cswiki: lift IP cap for workshop on 08-June-2026 (T427678)]], [[gerrit:1296582|Use the globalblock-local-status right over globalblock-whitelist (T277942)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:15:19] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
[15:15:38] <anzx>	 Dreamy_Jazz: nothing to test for throttle patch
[15:15:44] <wikibugs>	 (03PS1) 10Urbanecm: feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386)
[15:15:47] <Dreamy_Jazz>	 Thanks
[15:15:57] <wikibugs>	 (03Merged) 10jenkins-bot: mw-content-history-reconcile-enrich-* [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296612 (https://phabricator.wikimedia.org/T421237) (owner: 10Ottomata)
[15:15:59] <wikibugs>	 (03PS1) 10Urbanecm: feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296615 (https://phabricator.wikimedia.org/T427386)
[15:17:48] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "I have tested it ( T422258#11977056 ), though not with a new instance, but that got the job done.  Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[15:17:52] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[15:17:52] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] designate: remove leftover mcrouter code (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1278528 (https://phabricator.wikimedia.org/T427189) (owner: 10Andrew Bogott)
[15:17:56] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[15:18:06] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS trixie
[15:18:12] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "Verified and Puppet agent still pass with the profile removed ( T422258#11977087 ). Thx!" [puppet] - 10https://gerrit.wikimedia.org/r/1282007 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[15:18:25] <logmsgbot>	 !log dreamyjazz@deploy1003 matmarex, anzx, dreamyjazz: Continuing with deployment
[15:18:55] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1070.eqiad.wmnet with OS trixie
[15:19:15] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[15:19:20] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[15:19:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T426633)', diff saved to https://phabricator.wikimedia.org/P93582 and previous config saved to /var/cache/conftool/dbconfig/20260602-151958-fceratto.json
[15:20:14] <logmsgbot>	 jiji@cumin1003 decommission (PID 124164) is awaiting input
[15:20:19] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
[15:20:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1188 (T426633)', diff saved to https://phabricator.wikimedia.org/P93583 and previous config saved to /var/cache/conftool/dbconfig/20260602-152026-fceratto.json
[15:20:34] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
[15:20:53] <Dreamy_Jazz>	 zabe: I'm seeing a lot of error logs relating to a script you are running
[15:21:06] <Dreamy_Jazz>	 "InvalidArgumentException: No server with index '0'"
[15:21:32] <wikibugs>	 (03PS7) 10Clément Goubert: api-gateway: Pre-teardown deprecation [deployment-charts] - 10https://gerrit.wikimedia.org/r/1294957 (https://phabricator.wikimedia.org/T426881)
[15:22:09] <Dreamy_Jazz>	 e.g. https://logstash.wikimedia.org/app/dashboards#/doc/logstash-*/logstash-deploy-1-7.0.0-1-2026.06.02?id=a7PriJ4BwaIJ3BXy_YCr
[15:22:37] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1295502|Revert "labswiki: Disallow account autocreation"]], [[gerrit:1283106|Remove unused 'writeapi' right]], [[gerrit:1296566|Clean up bot password configuration]], [[gerrit:1296563|Remove workaround for stuck session cookies on Wikitech (T389433)]], [[gerrit:1295574|cswiki: lift IP cap for workshop on 08-June-2026 (T427678)]], [[gerrit:1296582|Use the globalblock-local-status right over globalblock-whitelist (T277942)]] (duration: 09m 14s)
[15:22:38] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] Add new class, labs_lvm_ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[15:22:41] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] Remove profile::wmcs::lvm [puppet] - 10https://gerrit.wikimedia.org/r/1282007 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[15:22:43] <stashbot>	 T389433: Fix stuck old cookies on Wikitech - https://phabricator.wikimedia.org/T389433
[15:22:43] <stashbot>	 T427678: Lift IP cap on 2026-06-08 for Czech Wikipedia workshop - cs.wikipedia - https://phabricator.wikimedia.org/T427678
[15:22:44] <stashbot>	 T277942: Address Voice and Tone issues in GlobalBlocking - https://phabricator.wikimedia.org/T277942
[15:22:50] <hashar>	 (Jenkins had some issue earlier and that got handled & fixed by j.nuche) 
[15:22:54] <wikibugs>	 (03PS10) 10Andrew Bogott: Add new class, labs_lvm_ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258)
[15:23:03] <wikibugs>	 (03PS8) 10Andrew Bogott: Remove profile::wmcs::lvm [puppet] - 10https://gerrit.wikimedia.org/r/1282007 (https://phabricator.wikimedia.org/T422258)
[15:25:09] <kostajh>	 jouncebot: nowandnext
[15:25:09] <jouncebot>	 For the next 0 hour(s) and 34 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1500)
[15:25:10] <jouncebot>	 In 0 hour(s) and 34 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1600)
[15:25:21] <kostajh>	 Dreamy_Jazz: I may deploy something once you’re done
[15:25:24] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] Add new class, labs_lvm_ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[15:25:26] <Dreamy_Jazz>	 I am done
[15:25:27] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] Remove profile::wmcs::lvm [puppet] - 10https://gerrit.wikimedia.org/r/1282007 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[15:25:31] <Dreamy_Jazz>	 You can go ahead
[15:26:04] <kostajh>	 ok
[15:26:23] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops: ULSFO: Unrack old switches (asw2-22/23-ulsfo) - https://phabricator.wikimedia.org/T427283#11977212 (10ayounsi)
[15:26:28] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296558 (https://phabricator.wikimedia.org/T421464) (owner: 10Kosta Harlan)
[15:26:28] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296568 (https://phabricator.wikimedia.org/T421464) (owner: 10Kosta Harlan)
[15:26:29] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2158: Repooling
[15:27:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T426633)', diff saved to https://phabricator.wikimedia.org/P93585 and previous config saved to /var/cache/conftool/dbconfig/20260602-152726-fceratto.json
[15:29:56] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[15:30:01] <wikibugs>	 06SRE, 10SRE-tools, 06Infrastructure-Foundations, 10netops: Netbox - PuppetDB audit 2021-11 - https://phabricator.wikimedia.org/T295762#11977272 (10ayounsi) 05Open→03Resolved a:03ayounsi No need to keep that old parent task open.
[15:30:01] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[15:31:44] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1070.eqiad.wmnet with reason: host reimage
[15:32:11] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[15:32:16] <logmsgbot>	 !log otto@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[15:32:54] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] Update name and address for bvibber, drop dead blog from planet [puppet] - 10https://gerrit.wikimedia.org/r/1296038 (owner: 10Bvibber)
[15:33:16] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "checked with Brooke" [puppet] - 10https://gerrit.wikimedia.org/r/1296038 (owner: 10Bvibber)
[15:33:26] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Occasional high ICMP probe response from codfw to cr1-drmrs - https://phabricator.wikimedia.org/T315645#11977334 (10cmooney) 05Stalled→03Declined
[15:33:43] <wikibugs>	 06SRE, 06Data-Engineering: Automate ingestion of netflow event stream - https://phabricator.wikimedia.org/T248865#11977336 (10ayounsi)
[15:34:56] <zabe>	 Dreamy_Jazz: Thanks for the heads-up! Restarted it, which seems to have fixed it.
[15:35:03] <Dreamy_Jazz>	 Thanks!
[15:35:18] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] admin: upgrade Mahmoud Abdelsattar from ldap_only to shell user [puppet] - 10https://gerrit.wikimedia.org/r/1295952 (https://phabricator.wikimedia.org/T427597) (owner: 10Dzahn)
[15:35:31] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1070.eqiad.wmnet with reason: host reimage
[15:36:10] <wikibugs>	 (03PS1) 10Ottomata: mw-content-history-reconcile-enrich - allow fetching schemas from schema service [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296619 (https://phabricator.wikimedia.org/T421237)
[15:36:34] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS trixie
[15:37:34] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1071.eqiad.wmnet with OS trixie
[15:37:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P93586 and previous config saved to /var/cache/conftool/dbconfig/20260602-153734-fceratto.json
[15:38:54] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Remove apiUrl health check and APCu layer from health checker [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1296558 (https://phabricator.wikimedia.org/T421464) (owner: 10Kosta Harlan)
[15:38:57] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Remove apiUrl health check and APCu layer from health checker [extensions/ConfirmEdit] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1296568 (https://phabricator.wikimedia.org/T421464) (owner: 10Kosta Harlan)
[15:40:00] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] mw-content-history-reconcile-enrich - allow fetching schemas from schema service [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296619 (https://phabricator.wikimedia.org/T421237) (owner: 10Ottomata)
[15:40:17] <logmsgbot>	 !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1296558|hCaptcha: Remove apiUrl health check and APCu layer from health checker (T421464)]], [[gerrit:1296568|hCaptcha: Remove apiUrl health check and APCu layer from health checker (T421464)]]
[15:40:21] <stashbot>	 T421464: hCaptcha: Stop using urldownloader for health checks of the secure-api.js file - https://phabricator.wikimedia.org/T421464
[15:40:41] <wikibugs>	 (03CR) 10A-pizzata: [C:03+1] mw-content-history-reconcile-enrich - allow fetching schemas from schema service [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296619 (https://phabricator.wikimedia.org/T421237) (owner: 10Ottomata)
[15:42:01] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1296558|hCaptcha: Remove apiUrl health check and APCu layer from health checker (T421464)]], [[gerrit:1296568|hCaptcha: Remove apiUrl health check and APCu layer from health checker (T421464)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:42:06] <wikibugs>	 (03Merged) 10jenkins-bot: mw-content-history-reconcile-enrich - allow fetching schemas from schema service [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296619 (https://phabricator.wikimedia.org/T421237) (owner: 10Ottomata)
[15:43:31] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Continuing with deployment
[15:47:41] <logmsgbot>	 !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296558|hCaptcha: Remove apiUrl health check and APCu layer from health checker (T421464)]], [[gerrit:1296568|hCaptcha: Remove apiUrl health check and APCu layer from health checker (T421464)]] (duration: 07m 24s)
[15:47:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P93587 and previous config saved to /var/cache/conftool/dbconfig/20260602-154742-fceratto.json
[15:47:45] <stashbot>	 T421464: hCaptcha: Stop using urldownloader for health checks of the secure-api.js file - https://phabricator.wikimedia.org/T421464
[15:48:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295909 (https://phabricator.wikimedia.org/T403829) (owner: 10Kosta Harlan)
[15:49:35] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Load self-hosted secure-api.js on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1295909 (https://phabricator.wikimedia.org/T403829) (owner: 10Kosta Harlan)
[15:49:51] <logmsgbot>	 !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1295909|hCaptcha: Load self-hosted secure-api.js on group0 wikis (T403829)]]
[15:49:55] <stashbot>	 T403829: hCaptcha: Self-host secure-api.js code in /static directory - https://phabricator.wikimedia.org/T403829
[15:50:17] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1071.eqiad.wmnet with reason: host reimage
[15:51:39] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1295909|hCaptcha: Load self-hosted secure-api.js on group0 wikis (T403829)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:51:52] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1070.eqiad.wmnet with OS trixie
[15:53:03] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host mc1072.eqiad.wmnet with OS trixie
[15:53:04] <wikibugs>	 (03PS1) 10Dreamy Jazz: core-Permissions: Stop assigning unused globalblock-whitelist right [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296620 (https://phabricator.wikimedia.org/T277942)
[15:53:36] <wikibugs>	 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netbox, 10netops: Avoid ghost hosts on the network - https://phabricator.wikimedia.org/T306007#11977548 (10ayounsi) 05Open→03Resolved a:03ayounsi Looks like the current provisioning process with the `Port with no description on access switch` ale...
[15:54:01] <wikibugs>	 (03PS1) 10Ottomata: mw_page_html_content_change_enrich_next - remove temporary kafka cluster override [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296623 (https://phabricator.wikimedia.org/T423920)
[15:54:06] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1071.eqiad.wmnet with reason: host reimage
[15:56:33] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to [restricted] for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T427597#11977586 (10Dzahn) Hi @mahmoud.abdelsattar.wmde   give it max. ~ 30 minutes for the changes to deploy and you should have the access as requested.  Ch...
[15:56:39] <logmsgbot>	 jayme@cumin2002 reimage (PID 3605330) is awaiting input
[15:56:59] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to [restricted] for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T427597#11977590 (10Dzahn) 05In progress→03Resolved a:03Dzahn ` [deploy1003:~] $ id mahmoud-abdelsattar uid=100472(mahmoud-abdelsattar) gid=500(wiki...
[15:57:28] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] "this staging, staging is not running.  Merging for future uses of staging." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296623 (https://phabricator.wikimedia.org/T423920) (owner: 10Ottomata)
[15:57:35] <kostajh>	 Still verifying the above config patch
[15:57:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T426633)', diff saved to https://phabricator.wikimedia.org/P93588 and previous config saved to /var/cache/conftool/dbconfig/20260602-155749-fceratto.json
[15:58:10] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
[15:58:10] <wikibugs>	 (03CR) 10Komla Sapaty: "This is an attempt to determine the activity levels(using SSH login activity) of Toolforge users. It looks at the last successful login re" [puppet] - 10https://gerrit.wikimedia.org/r/1294864 (owner: 10Komla Sapaty)
[15:58:18] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1197 (T426633)', diff saved to https://phabricator.wikimedia.org/P93589 and previous config saved to /var/cache/conftool/dbconfig/20260602-155817-fceratto.json
[15:58:35] <wikibugs>	 (03CR) 10Dzahn: "thanks for this one" [puppet] - 10https://gerrit.wikimedia.org/r/1296537 (https://phabricator.wikimedia.org/T356296) (owner: 10Majavah)
[15:58:48] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good." [puppet] - 10https://gerrit.wikimedia.org/r/1296608 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[15:59:04] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Rolling back deployment
[15:59:27] <wikibugs>	 (03PS1) 10Kosta Harlan: Revert "hCaptcha: Load self-hosted secure-api.js on group0 wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296624 (https://phabricator.wikimedia.org/T403829)
[15:59:30] <wikibugs>	 (03Merged) 10jenkins-bot: mw_page_html_content_change_enrich_next - remove temporary kafka cluster override [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296623 (https://phabricator.wikimedia.org/T423920) (owner: 10Ottomata)
[15:59:40] <logmsgbot>	 !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1295909|hCaptcha: Load self-hosted secure-api.js on group0 wikis (T403829)]] (duration: 09m 48s)
[15:59:45] <stashbot>	 T403829: hCaptcha: Self-host secure-api.js code in /static directory - https://phabricator.wikimedia.org/T403829
[15:59:58] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296624 (https://phabricator.wikimedia.org/T403829) (owner: 10Kosta Harlan)
[16:00:04] <jouncebot>	 jhathaway and rzl: Time to do the Puppet request window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1600).
[16:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:00:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus-puppet-ca-exporter.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:01:08] <wikibugs>	 (CR) Dzahn: [C:+1] "lgtm, but I think this needs review from infra foundations?" [puppet] - https://gerrit.wikimedia.org/r/1296495 (https://phabricator.wikimedia.org/T420184) (owner: Arnaudb)
[16:01:48] <urbanecm>	 kostajh: i take it you're still deploying?
[16:02:01] <kostajh>	 urbanecm: yes, will be done soon
[16:02:06] <urbanecm>	 perf
[16:02:06] <kostajh>	 Rolling back a config patch
[16:02:16] <urbanecm>	 can i start CI on a MW backport? or should i wait
[16:02:54] <wikibugs>	 (CR) Btullis: [C:+2] dumps: http: Stop prepending the hostname to the syslog events [puppet] - https://gerrit.wikimedia.org/r/1296605 (https://phabricator.wikimedia.org/T425087) (owner: Btullis)
[16:03:07] <wikibugs>	 (Merged) jenkins-bot: Revert "hCaptcha: Load self-hosted secure-api.js on group0 wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296624 (https://phabricator.wikimedia.org/T403829) (owner: Kosta Harlan)
[16:03:20] <logmsgbot>	 !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1296624|Revert "hCaptcha: Load self-hosted secure-api.js on group0 wikis" (T403829)]]
[16:04:37] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 03 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1295968 (https://phabricator.wikimedia.org/T426799) (owner: Marco Fossati)
[16:04:38] <kostajh>	 urbanecm: I think you can start +2’ing things
[16:04:42] <wikibugs>	 (CR) Dr0ptp4kt: Add filerevision to the mediawiki not-history sqoop (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1295047 (https://phabricator.wikimedia.org/T427532) (owner: Dr0ptp4kt)
[16:04:45] <wikibugs>	 (CR) Urbanecm: [C:+2] feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296615 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:04:47] <wikibugs>	 (CR) Urbanecm: [C:+2] feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:05:09] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1296624|Revert "hCaptcha: Load self-hosted secure-api.js on group0 wikis" (T403829)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[16:05:13] <stashbot>	 T403829: hCaptcha: Self-host secure-api.js code in /static directory - https://phabricator.wikimedia.org/T403829
[16:05:24] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage
[16:05:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus-puppet-ca-exporter.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:05:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T426633)', diff saved to https://phabricator.wikimedia.org/P93590 and previous config saved to /var/cache/conftool/dbconfig/20260602-160527-fceratto.json
[16:05:43] <wikibugs>	 Puppet, collaboration-services, Gerrit, Infrastructure-Foundations, Patch-For-Review: Change puppet-merge git origin to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org - https://phabricator.wikimedia.org/T420184#11977655 (Dzahn) Ah, it's in the puppetserver module. Thanks!  The pat...
[16:05:47] <logmsgbot>	 !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1072.eqiad.wmnet with reason: host reimage
[16:05:49] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Continuing with deployment
[16:07:32] <wikibugs>	 (CR) Atsuko: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1296608 (https://phabricator.wikimedia.org/T424248) (owner: Atsuko)
[16:08:15] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364#11977663 (ayounsi) Open→Resolved I've renamed it to `Interface UP for 7 days with no description` when migrating the alert to AlertManager. Please...
[16:08:47] <wikibugs>	 (PS1) Ahmon Dancy: Bump buildkit to v0.30.0 [puppet] - https://gerrit.wikimedia.org/r/1296626 (https://phabricator.wikimedia.org/T426212)
[16:08:56] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:09:54] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage
[16:10:01] <logmsgbot>	 !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296624|Revert "hCaptcha: Load self-hosted secure-api.js on group0 wikis" (T403829)]] (duration: 06m 40s)
[16:10:04] <kostajh>	 urbanecm: over to you
[16:10:50] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1071.eqiad.wmnet with OS trixie
[16:11:05] <wikibugs>	 (CR) Dr0ptp4kt: Add filerevision to the mediawiki not-history sqoop (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1295047 (https://phabricator.wikimedia.org/T427532) (owner: Dr0ptp4kt)
[16:12:35] <wikibugs>	 SRE, Infrastructure-Foundations: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#11977704 (CWilliams-WMF) > AttributeError: module 'urllib3.exceptions' has no attribute 'SubjectAltNameWarning'  This looks to be an outdated PuppetDB config attempting to disable a warning that was...
[16:13:00] <wikibugs>	 (PS1) Matthias Mullie: Add missing lazy img to carousel [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296627 (https://phabricator.wikimedia.org/T427821)
[16:13:12] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 03 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296627 (https://phabricator.wikimedia.org/T427821) (owner: Matthias Mullie)
[16:14:08] <wikibugs>	 (PS3) Kamila Součková: admin: add apdube-wmf user [puppet] - https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553)
[16:14:17] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1072.eqiad.wmnet with reason: host reimage
[16:14:55] <wikibugs>	 (CR) CI reject: [V:-1] admin: add apdube-wmf user [puppet] - https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: Kamila Součková)
[16:15:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P93591 and previous config saved to /var/cache/conftool/dbconfig/20260602-161534-fceratto.json
[16:16:48] <wikibugs>	 (CR) Atsuko: [C:+2] services_proxy: switch to prod opensearch-on-k8s services [puppet] - https://gerrit.wikimedia.org/r/1296608 (https://phabricator.wikimedia.org/T424248) (owner: Atsuko)
[16:17:11] <wikibugs>	 (PS1) JavierMonton: stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1296628 (https://phabricator.wikimedia.org/T425624)
[16:17:52] <wikibugs>	 (Merged) jenkins-bot: feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296615 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:17:55] <wikibugs>	 (CR) CI reject: [V:-1] feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:18:01] <wikibugs>	 (PS1) Atsuko: translate: adding separate read/write endpoints [mediawiki-config] - https://gerrit.wikimedia.org/r/1296631 (https://phabricator.wikimedia.org/T425377)
[16:18:12] <wikibugs>	 (CR) Urbanecm: feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:18:27] <wikibugs>	 (CR) Urbanecm: [C:+2] feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:18:52] <wikibugs>	 (PS1) Matthias Mullie: Image Browsing: add accessible labels to carousel elements [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296632 (https://phabricator.wikimedia.org/T407793)
[16:19:26] <wikibugs>	 (CR) Kosta Harlan: [C:-2] hCaptcha: Roll out self-hosted secure-api.js to all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1295910 (https://phabricator.wikimedia.org/T403829) (owner: Kosta Harlan)
[16:19:49] <wikibugs>	 (PS1) Kosta Harlan: Revert^2 "hCaptcha: Load self-hosted secure-api.js on group0 wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296635 (https://phabricator.wikimedia.org/T403829)
[16:19:51] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by urbanecm@deploy1003 using scap backport" [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:19:57] <jinxer-wm>	 FIRING: [8x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:20:15] <federico3>	 !ack
[16:20:16] <sirenbot>	 8045 (ACKED)  [8x] ProbeDown sre (probes/service)
[16:20:21] <wikibugs>	 (CR) Kosta Harlan: [C:-2] "Needs https://gitlab.wikimedia.org/repos/product-safety-and-integrity/hcaptcha-secure-api-vendor/-/merge_requests/3 to be merged and synce" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296635 (https://phabricator.wikimedia.org/T403829) (owner: Kosta Harlan)
[16:20:42] <jhathaway>	 o/
[16:20:57] <jinxer-wm>	 FIRING: [3x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:21:13] <cdanis>	 urbanecm: what's train status?
[16:21:13] <federico3>	 looks pretty noisy, is it real? https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes%2Fservice&var-module=%24__all&orgId=1&from=now-15m&to=now&timezone=utc&var-site=%24__all&var-Filters
[16:21:15] <jinxer-wm>	 FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:21:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 2.74% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:21:19] <cdanis>	 federico3: yes
[16:21:31] <jinxer-wm>	 FIRING: [2x] RedisReplicaDown: Redis replica down rdb2014:16378 redis_misc - https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_misc  - https://alerts.wikimedia.org/?q=alertname%3DRedisReplicaDown
[16:21:34] <federico3>	 cdanis: anything I can help with?
[16:21:45] <urbanecm>	 cdanis: not a train conductor. i only +2'ed some patches, but i stopped scap, so at most they'll merge
[16:21:55] <urbanecm>	 according to https://versions.toolforge.org/, we are still on wmf.4
[16:22:22] <wikibugs>	 SRE-swift-storage, cloud-services-team, Commons: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949 (Andrew) NEW
[16:22:37] <urbanecm>	 the errors are "503 Service Unavailable" with a bunch of services
[16:23:13] <wikibugs>	 SRE-swift-storage, cloud-services-team, Commons: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11977818 (Andrew)
[16:23:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-web releases routed via main (k8s) 1.273s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:24:06] <wikibugs>	 SRE-swift-storage, cloud-services-team, Commons: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11977825 (Andrew)
[16:24:14] <wikibugs>	 (PS1) Btullis: Add the wdqs::alternative nodes to the S3/Ceph envoy firewall [puppet] - https://gerrit.wikimedia.org/r/1296636 (https://phabricator.wikimedia.org/T427319)
[16:24:17] <wikibugs>	 (CR) JavierMonton: [C:+2] stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1296628 (https://phabricator.wikimedia.org/T425624) (owner: JavierMonton)
[16:24:45] <dancy>	 There are over 2 million log records in the last 15 minutes from kartotherian
[16:24:48] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1296636 (https://phabricator.wikimedia.org/T427319) (owner: Btullis)
[16:24:57] <jinxer-wm>	 RESOLVED: [8x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:25:37] <dancy>	 Ah, not all kartotherian.  Many different services
[16:25:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P93593 and previous config saved to /var/cache/conftool/dbconfig/20260602-162542-fceratto.json
[16:25:57] <jinxer-wm>	 RESOLVED: [8x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:26:05] <federico3>	 !incidents
[16:26:06] <sirenbot>	 8046 (UNACKED)  PHPFPMTooBusy sre (mw-web main codfw)
[16:26:06] <sirenbot>	 8045 (RESOLVED)  [8x] ProbeDown sre (probes/service)
[16:26:06] <sirenbot>	 8040 (RESOLVED)  Host es2050 (paged)
[16:26:06] <sirenbot>	 8039 (RESOLVED)  Host db2175 (paged)
[16:26:07] <sirenbot>	 8042 (RESOLVED)  Host db2157 (paged)
[16:26:07] <sirenbot>	 8043 (RESOLVED)  Host db2153 (paged)
[16:26:07] <sirenbot>	 8041 (RESOLVED)  Host db2154 (paged)
[16:26:08] <sirenbot>	 8044 (RESOLVED)  Host db2176 (paged)
[16:26:08] <sirenbot>	 8038 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet eqsin)
[16:26:11] <federico3>	 !ack
[16:26:12] <sirenbot>	 8046 (ACKED)  PHPFPMTooBusy sre (mw-web main codfw)
[16:26:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:26:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 2.74% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:26:31] <jinxer-wm>	 FIRING: [5x] RedisReplicaDown: Redis replica down rdb2014:16378 redis_misc - https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_misc  - https://alerts.wikimedia.org/?q=alertname%3DRedisReplicaDown
[16:26:51] <wikibugs>	 SRE-swift-storage, cloud-services-team, Commons: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11977884 (MatthewVernon) TIFF compression is fairly easy via [[ https://manpages.debian.org/trixie/libtiff-tools/tiffcp.1.en.html | tiffcp ]] (I'm not a compression specialist, b...
[16:26:52] <wikibugs>	 (Merged) jenkins-bot: stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1296628 (https://phabricator.wikimedia.org/T425624) (owner: JavierMonton)
[16:27:37] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2006.codfw.wmnet with OS trixie
[16:27:51] <wikibugs>	 SRE-swift-storage, Commons: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11977905 (Andrew)
[16:28:15] <jinxer-wm>	 FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:28:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 0% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:28:15] <wikibugs>	 SRE-swift-storage, Commons: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11977910 (MatthewVernon) TIFF compression can be done losslessly, so I see no reason to accept uncompressed TIFFs.
[16:28:21] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-web releases routed via main (k8s) 874.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:29:14] <wikibugs>	 SRE, SRE-swift-storage, Commons, MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11977917 (MatthewVernon)
[16:29:45] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[16:30:08] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[16:30:12] <jinxer-wm>	 FIRING: [8x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:30:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-web releases routed via main (k8s) 2.5s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:30:16] <MatmaRex>	 Dreamy_Jazz: i just noticed that you shipped my config patches earlier, thanks!
[16:30:24] <federico3>	 !ack
[16:30:25] <sirenbot>	 8047 (ACKED)  PHPFPMTooBusy sre (mw-web main codfw)
[16:30:25] <sirenbot>	 8048 (ACKED)  [6x] ProbeDown sre (probes/service)
[16:30:27] <jinxer-wm>	 FIRING: [8x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:30:51] <logmsgbot>	 !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1072.eqiad.wmnet with OS trixie
[16:31:10] <wikibugs>	 (CR) JMeybohm: [C:+1] CI: Fix CI pass on template render fail [deployment-charts] - https://gerrit.wikimedia.org/r/1295947 (https://phabricator.wikimedia.org/T427307) (owner: Kamila Součková)
[16:31:52] <wikibugs>	 (CR) Dzahn: [C:+2] Bump buildkit to v0.30.0 [puppet] - https://gerrit.wikimedia.org/r/1296626 (https://phabricator.wikimedia.org/T426212) (owner: Ahmon Dancy)
[16:32:14] <Dreamy_Jazz>	 No problem, I thought I might as well ship them with mine
[16:32:48] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296631 (https://phabricator.wikimedia.org/T425377) (owner: Atsuko)
[16:33:13] <wikibugs>	 (Merged) jenkins-bot: feat(cleanMentorList): Add a feature flag [extensions/GrowthExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1296614 (https://phabricator.wikimedia.org/T427386) (owner: Urbanecm)
[16:33:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:33:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 0% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:33:40] <wikibugs>	 (CR) Dzahn: "No, I meant to say it should change nothing. The same procedure as before, just different code to get the same result to be able to rsync " [puppet] - https://gerrit.wikimedia.org/r/1295967 (https://phabricator.wikimedia.org/T412780) (owner: Dzahn)
[16:33:56] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:00] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[16:34:04] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[16:35:03] <jinxer-wm>	 RESOLVED: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=main-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[16:35:12] <jinxer-wm>	 RESOLVED: [8x] ProbeDown: Service mw-web:4450 has failed probes (http_mw-web_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:35:13] <wikibugs>	 (PS1) TrainBranchBot: testwikis to 1.47.0-wmf.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/1296645 (https://phabricator.wikimedia.org/T423914)
[16:35:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-web releases routed via main (k8s) 1.028s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:35:16] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Initiated by dancy@deploy1003" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296645 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[16:35:51] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T426633)', diff saved to https://phabricator.wikimedia.org/P93594 and previous config saved to /var/cache/conftool/dbconfig/20260602-163550-fceratto.json
[16:35:56] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:35:58] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[16:36:04] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[16:36:15] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
[16:36:23] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1229 (T426633)', diff saved to https://phabricator.wikimedia.org/P93595 and previous config saved to /var/cache/conftool/dbconfig/20260602-163622-fceratto.json
[16:36:42] <wikibugs>	 (CR) Dr0ptp4kt: "Except maybe the logging piece. It would then inherit 64 mappers (instead of the 10 suggested in @aqu comment - which I'm gathering was ju" [puppet] - https://gerrit.wikimedia.org/r/1295045 (https://phabricator.wikimedia.org/T427532) (owner: Dr0ptp4kt)
[16:41:17] <wikibugs>	 (Merged) jenkins-bot: testwikis to 1.47.0-wmf.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/1296645 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[16:43:15] <wikibugs>	 (PS1) TrainBranchBot: testwikis to 1.47.0-wmf.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/1296646 (https://phabricator.wikimedia.org/T423914)
[16:43:18] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Initiated by dancy@deploy1003" [mediawiki-config] - https://gerrit.wikimedia.org/r/1296646 (https://phabricator.wikimedia.org/T423914) (owner: TrainBranchBot)
[16:43:29] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T426633)', diff saved to https://phabricator.wikimedia.org/P93596 and previous config saved to /var/cache/conftool/dbconfig/20260602-164328-fceratto.json
[16:46:09] <wikibugs>	 SRE, SRE-swift-storage, Commons, media-backups, MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11978047 (jcrespo)
[16:46:56] <wikibugs>	 SRE, SRE-swift-storage, Commons, media-backups, MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11978050 (Andrew)
[16:47:31] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1050.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[16:48:10] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1050.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[16:48:10] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:48:11] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1050.eqiad.wmnet
[16:49:20] <wikibugs>	 (CR) Dzahn: trafficserver: add a map for gitlab as a backend (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[16:49:46] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 03 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - https://gerrit.wikimedia.org/r/1296632 (https://phabricator.wikimedia.org/T407793) (owner: Matthias Mullie)
[16:49:59] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1051.eqiad.wmnet
[16:53:37] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P93597 and previous config saved to /var/cache/conftool/dbconfig/20260602-165336-fceratto.json
[16:53:39] <wikibugs>	 (PS1) Dbrant: hCaptcha: Roll out to all except enwiki for mobile apps. [mediawiki-config] - https://gerrit.wikimedia.org/r/1296649 (https://phabricator.wikimedia.org/T426048)
[17:00:05] <jouncebot>	 swfrench-wmf: May I have your attention please! MediaWiki infrastructure (UTC late). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1700)
[17:01:29] <swfrench-wmf>	 o/
[17:03:47] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P93598 and previous config saved to /var/cache/conftool/dbconfig/20260602-170344-fceratto.json
[17:03:53] <swfrench-wmf>	 dancy: it looks like you might be re-running presync? no worries if that's the case, as I should be able to work around it
[17:04:26] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[17:05:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1071:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1071 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[17:05:54] <dancy>	 swfrench-wmf: I'm blocked by CI issues at the moment so it's all yours. I'll try again later
[17:08:20] <swfrench-wmf>	 dancy: hmmm ... it looks like https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1296645 merged, so a sync-world now would pick that up, right?
[17:09:14] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service lsw1-f1-codfw.mgmt.codfw.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-f1-codfw.mgmt.codfw.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[17:10:26] <logmsgbot>	 jiji@cumin1003 decommission (PID 203985) is awaiting input
[17:10:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1071:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1071 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[17:13:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T426633)', diff saved to https://phabricator.wikimedia.org/P93599 and previous config saved to /var/cache/conftool/dbconfig/20260602-171354-fceratto.json
[17:14:00] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+2] CI: Fix CI pass on template render fail [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295947 (https://phabricator.wikimedia.org/T427307) (owner: 10Kamila Součková)
[17:14:15] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
[17:14:17] <wikibugs>	 (03CR) 10Scott French: [C:03+2] shellbox: Pick up newly rebuilt images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296585 (owner: 10Scott French)
[17:14:23] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1233 (T426633)', diff saved to https://phabricator.wikimedia.org/P93600 and previous config saved to /var/cache/conftool/dbconfig/20260602-171422-fceratto.json
[17:17:28] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] sre.puppet.disable-merges: New cookbook to disable Puppet merges temporarily [cookbooks] - 10https://gerrit.wikimedia.org/r/1295425 (https://phabricator.wikimedia.org/T248872) (owner: 10Muehlenhoff)
[17:21:36] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T426633)', diff saved to https://phabricator.wikimedia.org/P93601 and previous config saved to /var/cache/conftool/dbconfig/20260602-172135-fceratto.json
[17:26:16] <wikibugs>	 (03CR) 10Kosta Harlan: [C:03+1] hCaptcha: Roll out to all except enwiki for mobile apps. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296649 (https://phabricator.wikimedia.org/T426048) (owner: 10Dbrant)
[17:31:23] <urbanecm>	 swfrench-wmf: dancy: note that i probably have undeployed code merged, as it merged during the incident
[17:31:33] <urbanecm>	 happy to finalise the deployment, but i see we're now in MW infra
[17:31:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P93602 and previous config saved to /var/cache/conftool/dbconfig/20260602-173143-fceratto.json
[17:32:43] <swfrench-wmf>	 urbanecm: thanks! yeah, I've not touched MediaWiki, as I don't know the state of /srv/mediawiki-staging. which is to say, if you'd like to pick up your deployment, please go ahead (I'm trying to understand a CI issue blocking other work I have planned).
[17:32:58] <urbanecm>	 okay, let me finish it then
[17:33:18] <swfrench-wmf>	 unless, dancy has any concerns that is
[17:33:37] <dancy>	 swfrench-wmf: sorry for the late reply. Yes, that wikiversions change will be picked up. That works for me if it works for you
[17:34:17] <swfrench-wmf>	 ah, cool - no objections on my end. FYI, urbanecm ^ it looks like testwikis will also get .5 in the same deployment.
[17:34:39] <urbanecm>	 yeah, just saw that in scap
[17:34:43] <urbanecm>	 so, that's expected i guess?
[17:34:54] <swfrench-wmf>	 yes, expected it seems
[17:35:38] <dancy>	 That does mean it will be a long deployment. If that's a pain, I can revert the wiki etsi
[17:35:50] <dancy>	 *the wikiversions change
[17:36:24] <urbanecm>	 fine with me
[17:36:31] <dancy>	 Great
[17:37:01] <urbanecm>	 kostajh: scap says you have some undeployed backports (hCaptcha: Remove apiUrl health check and APCu layer from health checker)
[17:37:04] <urbanecm>	 is that expected?
[17:38:35] <urbanecm>	 or Dreamy_Jazz
[17:38:39] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296649 (https://phabricator.wikimedia.org/T426048) (owner: 10Dbrant)
[17:40:19] <wikibugs>	 (03Merged) 10jenkins-bot: CI: Fix CI pass on template render fail [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295947 (https://phabricator.wikimedia.org/T427307) (owner: 10Kamila Součková)
[17:40:22] <wikibugs>	 (03Merged) 10jenkins-bot: shellbox: Pick up newly rebuilt images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296585 (owner: 10Scott French)
[17:41:50] <logmsgbot>	 !log urbanecm@deploy1003 Started scap sync-world: Backport for [[gerrit:1296615|feat(cleanMentorList): Add a feature flag (T427386)]], [[gerrit:1296614|feat(cleanMentorList): Add a feature flag (T427386)]]
[17:41:51] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P93603 and previous config saved to /var/cache/conftool/dbconfig/20260602-174150-fceratto.json
[17:41:54] <stashbot>	 T427386: Deploy automated mentor list cleanup to Wikimedia wikis - https://phabricator.wikimedia.org/T427386
[17:42:25] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply
[17:42:41] <wikibugs>	 (03PS1) 10SBassett: varnish: Add CSP report-only directives for all of upload.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/1296654 (https://phabricator.wikimedia.org/T117618)
[17:42:51] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply
[17:42:52] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
[17:43:05] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
[17:43:06] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[17:43:20] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[17:43:21] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
[17:43:36] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[17:43:37] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
[17:43:54] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
[17:43:56] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-video: apply
[17:44:18] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
[17:44:50] <Daimona>	 Hi folks, I would need to run a query on x1.wikishared to recover accidentally deleted data. It's a trivial query affecting ~60 records, which I pasted in https://phabricator.wikimedia.org/T427962#11978299. May I go ahead and run that?
[17:45:25] <urbanecm>	 Daimona: it might be helpful to have a +1 on the query beforehand, just in case (TM)
[17:46:12] <Daimona>	 Does a self +1 count? :D (I tried it locally but you aren't wrong)
[17:47:04] <urbanecm>	 otherwise, as long as you wrap it in a transaction (just in case more records are affected) and it's not a regular thing, makes sense. but i'd recommend the review anyway :))
[17:47:19] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply
[17:48:02] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[17:48:33] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[17:49:10] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[17:49:41] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[17:49:59] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[17:50:18] <Daimona>	 I just double-checked, it's 61 records affected. The review should be coming soon :)
[17:50:31] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[17:50:55] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[17:51:27] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[17:51:51] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[17:51:58] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T426633)', diff saved to https://phabricator.wikimedia.org/P93604 and previous config saved to /var/cache/conftool/dbconfig/20260602-175157-fceratto.json
[17:52:20] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1254.eqiad.wmnet with reason: Maintenance
[17:52:22] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[17:52:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1254 (T426633)', diff saved to https://phabricator.wikimedia.org/P93605 and previous config saved to /var/cache/conftool/dbconfig/20260602-175227-fceratto.json
[17:53:23] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[17:53:32] <icinga-wm>	 PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy1003 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[17:55:53] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1051.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[17:56:29] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1051.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[17:56:29] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:56:30] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1051.eqiad.wmnet
[17:57:58] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[17:58:22] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[17:59:32] <logmsgbot>	 jiji@cumin1003 decommission (PID 263264) is awaiting input
[17:59:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T426633)', diff saved to https://phabricator.wikimedia.org/P93607 and previous config saved to /var/cache/conftool/dbconfig/20260602-175933-fceratto.json
[18:00:05] <jouncebot>	 dancy and jnuche: That opportune time for a MediaWiki train - Utc-7+Utc-0 Version deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T1800).
[18:00:32] <urbanecm>	 (deployment finishing)
[18:00:51] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1052.eqiad.wmnet
[18:01:31] <logmsgbot>	 !log urbanecm@deploy1003 urbanecm: Backport for [[gerrit:1296615|feat(cleanMentorList): Add a feature flag (T427386)]], [[gerrit:1296614|feat(cleanMentorList): Add a feature flag (T427386)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:01:34] <stashbot>	 T427386: Deploy automated mentor list cleanup to Wikimedia wikis - https://phabricator.wikimedia.org/T427386
[18:01:50] <logmsgbot>	 !log urbanecm@deploy1003 urbanecm: Continuing with deployment
[18:01:50] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply
[18:02:23] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[18:02:38] <swfrench-wmf>	 !log reverting shellbox to 2026-05-20-192555 due to errors in shellbox-syntaxhighlight
[18:02:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:02:54] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[18:04:29] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[18:05:00] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[18:05:15] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[18:05:46] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[18:05:49] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[18:06:20] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[18:06:51] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[18:07:22] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[18:08:37] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[18:09:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P93608 and previous config saved to /var/cache/conftool/dbconfig/20260602-180941-fceratto.json
[18:10:48] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[18:12:16] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply
[18:12:32] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply
[18:12:34] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
[18:13:00] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
[18:13:01] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[18:13:10] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[18:13:11] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
[18:13:20] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[18:13:21] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
[18:13:30] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
[18:13:31] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-video: apply
[18:13:39] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
[18:13:56] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job rsyslog-receiver in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:14:05] <Daimona>	 Alright I got a +1 :) May I run the query from https://phabricator.wikimedia.org/T427962#11978299 in x1.wikishared now?
[18:15:41] <wikibugs>	 (03PS1) 10Scott French: shellbox: Revert to 2026-05-20-192555 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296659
[18:16:00] <logmsgbot>	 !log urbanecm@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296615|feat(cleanMentorList): Add a feature flag (T427386)]], [[gerrit:1296614|feat(cleanMentorList): Add a feature flag (T427386)]] (duration: 34m 09s)
[18:16:04] <stashbot>	 T427386: Deploy automated mentor list cleanup to Wikimedia wikis - https://phabricator.wikimedia.org/T427386
[18:16:49] <logmsgbot>	 jiji@cumin1003 decommission (PID 263264) is awaiting input
[18:18:23] <wikibugs>	 (03Abandoned) 10Ahmon Dancy: testwikis to 1.47.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296646 (https://phabricator.wikimedia.org/T423914) (owner: 10TrainBranchBot)
[18:18:48] <urbanecm>	 Daimona: 👍 from me
[18:18:54] <urbanecm>	 Also, done with deployment
[18:19:08] <dancy>	 urbanecm: Thanks!
[18:19:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P93609 and previous config saved to /var/cache/conftool/dbconfig/20260602-181949-fceratto.json
[18:20:51] <Daimona>	 Going ahead then, ty :)
[18:21:39] <Daimona>	 !log Running query from T427962#11978299 in x1.wikishared
[18:21:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:44] <stashbot>	 T427962: Accidentally removed (unregistered) all 61 participants from an event on Meta-Wiki: is the data recoverable? - https://phabricator.wikimedia.org/T427962
[18:24:59] <dancy>	 !log Train is blocked at testwikis on https://phabricator.wikimedia.org/T427935
[18:25:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:23] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1052.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[18:25:45] <wikibugs>	 (03CR) 10Scott French: [C:03+2] shellbox: Revert to 2026-05-20-192555 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296659 (owner: 10Scott French)
[18:26:03] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1052.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[18:26:04] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:26:04] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1052.eqiad.wmnet
[18:26:13] <wikibugs>	 (03PS1) 10Santiago Faci: Test Kitchen UI: Deploy v1.3.9 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296660 (https://phabricator.wikimedia.org/T427543)
[18:27:49] <mutante>	 !log gerrit delete unused plugin projects: barricade, WikimediaBlocks and WikimediaWebSessions
[18:27:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:10] <wikibugs>	 (03Merged) 10jenkins-bot: shellbox: Revert to 2026-05-20-192555 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296659 (owner: 10Scott French)
[18:29:07] <logmsgbot>	 jiji@cumin1003 decommission (PID 284379) is awaiting input
[18:29:56] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T426633)', diff saved to https://phabricator.wikimedia.org/P93610 and previous config saved to /var/cache/conftool/dbconfig/20260602-182956-fceratto.json
[18:30:16] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1259.eqiad.wmnet with reason: Maintenance
[18:30:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1259 (T426633)', diff saved to https://phabricator.wikimedia.org/P93611 and previous config saved to /var/cache/conftool/dbconfig/20260602-183023-fceratto.json
[18:32:51] <swfrench-wmf>	 dancy: FYI, you may see me making some changes to shellbox in the background. should not conflict with your work (trying to debug something).
[18:33:54] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1053.eqiad.wmnet
[18:35:14] <dancy>	 swfrench-wmf: thx. Train is currently blocked so nothing happening there right now
[18:35:39] <swfrench-wmf>	 ah, got it. best of luck unblocking.
[18:37:04] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[18:37:22] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[18:37:23] <wikibugs>	 (03CR) 10Komla Sapaty: "I will go ahead and redact the usernames so that they are not logged, either in the CSV file or in the DB" [puppet] - 10https://gerrit.wikimedia.org/r/1294864 (owner: 10Komla Sapaty)
[18:37:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T426633)', diff saved to https://phabricator.wikimedia.org/P93612 and previous config saved to /var/cache/conftool/dbconfig/20260602-183749-fceratto.json
[18:38:32] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[18:38:38] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[18:38:49] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[18:38:56] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job rsyslog-receiver in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:44:54] <logmsgbot>	 jiji@cumin1003 decommission (PID 284379) is awaiting input
[18:47:01] <wikibugs>	 (03Abandoned) 10Andrew Bogott: rabbitmq: add haproxy in front of codfw1dev endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1260100 (https://phabricator.wikimedia.org/T420937) (owner: 10Andrew Bogott)
[18:47:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P93614 and previous config saved to /var/cache/conftool/dbconfig/20260602-184757-fceratto.json
[18:52:00] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 to 1.47.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296662 (https://phabricator.wikimedia.org/T423914)
[18:52:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Initiated by dancy@deploy1003" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296662 (https://phabricator.wikimedia.org/T423914) (owner: 10TrainBranchBot)
[18:53:03] <wikibugs>	 (03Merged) 10jenkins-bot: group0 to 1.47.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296662 (https://phabricator.wikimedia.org/T423914) (owner: 10TrainBranchBot)
[18:58:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P93615 and previous config saved to /var/cache/conftool/dbconfig/20260602-185804-fceratto.json
[19:01:20] <wikibugs>	 (03CR) 10Dzahn: [V:03+1 C:03+2] "https://puppet-compiler.wmflabs.org/output/1295967/8630/gerrit1003.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1295967 (https://phabricator.wikimedia.org/T412780) (owner: 10Dzahn)
[19:04:26] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:05:19] <logmsgbot>	 !log dancy@deploy1003 rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.5  refs T423914
[19:05:23] <stashbot>	 T423914: 1.47.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T423914
[19:06:59] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: 'purge_temporary_accounts' job is owned by PSI, not MWP [puppet] - 10https://gerrit.wikimedia.org/r/1296664
[19:08:12] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T426633)', diff saved to https://phabricator.wikimedia.org/P93616 and previous config saved to /var/cache/conftool/dbconfig/20260602-190811-fceratto.json
[19:08:37] <wikibugs>	 (03CR) 10Dzahn: [V:03+1 C:03+2] "/usr/local/sbin/sync-gerrit* files have been created - but no timers to do anything automatically - as intended" [puppet] - 10https://gerrit.wikimedia.org/r/1295967 (https://phabricator.wikimedia.org/T412780) (owner: 10Dzahn)
[19:09:00] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[19:09:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1157 (T426633)', diff saved to https://phabricator.wikimedia.org/P93617 and previous config saved to /var/cache/conftool/dbconfig/20260602-190907-fceratto.json
[19:13:02] <wikibugs>	 (03CR) 10Clare Ming: [C:03+2] Test Kitchen UI: Deploy v1.3.9 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296660 (https://phabricator.wikimedia.org/T427543) (owner: 10Santiago Faci)
[19:15:08] <wikibugs>	 (03Merged) 10jenkins-bot: Test Kitchen UI: Deploy v1.3.9 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296660 (https://phabricator.wikimedia.org/T427543) (owner: 10Santiago Faci)
[19:37:32] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1053.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[19:40:37] <logmsgbot>	 jiji@cumin1003 decommission (PID 284379) is awaiting input
[19:48:25] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1053.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[19:48:25] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:48:26] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1053.eqiad.wmnet
[19:49:44] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] Add the wdqs::alternative nodes to the S3/Ceph envoy firewall [puppet] - 10https://gerrit.wikimedia.org/r/1296636 (https://phabricator.wikimedia.org/T427319) (owner: 10Btullis)
[19:51:28] <logmsgbot>	 jiji@cumin1003 decommission (PID 341305) is awaiting input
[19:55:04] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] "PSI like their alerting to go to slack, but I assume this is set up per team so should be handled by this" [puppet] - 10https://gerrit.wikimedia.org/r/1296664 (owner: 10Bartosz Dziewoński)
[20:00:02] <wikibugs>	 (03PS4) 10Kamila Součková: admin: add apdube-wmf user [puppet] - 10https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553)
[20:00:05] <jouncebot>	 RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T2000).
[20:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[20:00:31] <TheresNoTime>	 (indeed)
[20:03:30] <wikibugs>	 (03CR) 10Kamila Součková: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: 10Kamila Součková)
[20:03:45] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.hosts.decommission for hosts mc1054.eqiad.wmnet
[20:07:16] <wikibugs>	 (03PS1) 10Effie Mouzeli: site.pp: mc1054 is being decommed [puppet] - 10https://gerrit.wikimedia.org/r/1296672 (https://phabricator.wikimedia.org/T426303)
[20:09:22] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T426633)', diff saved to https://phabricator.wikimedia.org/P93618 and previous config saved to /var/cache/conftool/dbconfig/20260602-200922-fceratto.json
[20:11:25] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] site.pp: mc1054 is being decommed [puppet] - 10https://gerrit.wikimedia.org/r/1296672 (https://phabricator.wikimedia.org/T426303) (owner: 10Effie Mouzeli)
[20:12:10] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] site.pp: mc1054 is being decommed [puppet] - 10https://gerrit.wikimedia.org/r/1296672 (https://phabricator.wikimedia.org/T426303) (owner: 10Effie Mouzeli)
[20:18:56] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:18:58] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: re-rack mc2055 (before Jun 9th) - https://phabricator.wikimedia.org/T427373#11979006 (10jijiki) >>! In T427373#11976649, @Jhancock.wm wrote: > @jijiki i'm ready whenever you are to do the move. should only take about 20-30 mi...
[20:19:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P93619 and previous config saved to /var/cache/conftool/dbconfig/20260602-201929-fceratto.json
[20:20:14] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.dns.netbox
[20:20:56] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:21:18] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] kafka event platform logs - Strip the stray $!msg field [puppet] - 10https://gerrit.wikimedia.org/r/1296607 (https://phabricator.wikimedia.org/T291645) (owner: 10Btullis)
[20:21:21] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#11979010 (10wiki_willy) a:03VRiley-WMF
[20:22:43] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware, 06ServiceOps new, and 2 others: decommission mc10[37-54] - https://phabricator.wikimedia.org/T426303#11979017 (10jijiki)
[20:23:13] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware, 06ServiceOps new, and 2 others: decommission mc10[37-54] - https://phabricator.wikimedia.org/T426303#11979019 (10jijiki) @Jclark-ctr  over to you folks!
[20:23:16] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware, 06ServiceOps new, and 2 others: decommission mc10[37-54] - https://phabricator.wikimedia.org/T426303#11979020 (10Jclark-ctr) a:03Jclark-ctr
[20:26:06] <logmsgbot>	 !log jiji@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1054.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[20:26:31] <jinxer-wm>	 FIRING: [5x] RedisReplicaDown: Redis replica down rdb2014:16378 redis_misc - https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_misc  - https://alerts.wikimedia.org/?q=alertname%3DRedisReplicaDown
[20:27:28] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1054.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
[20:27:28] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:27:29] <logmsgbot>	 !log jiji@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1054.eqiad.wmnet
[20:27:39] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware, 06ServiceOps new, and 2 others: decommission mc10[37-54] - https://phabricator.wikimedia.org/T426303#11979032 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1003 for hosts: `mc1054.eqiad.wmnet` - mc1054.eqiad.wmnet (**PASS**)...
[20:29:38] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P93620 and previous config saved to /var/cache/conftool/dbconfig/20260602-202937-fceratto.json
[20:39:45] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T426633)', diff saved to https://phabricator.wikimedia.org/P93621 and previous config saved to /var/cache/conftool/dbconfig/20260602-203945-fceratto.json
[20:45:20] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] hCaptcha: Roll out to all except enwiki for mobile apps. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296649 (https://phabricator.wikimedia.org/T426048) (owner: 10Dbrant)
[21:00:05] <jouncebot>	 Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260602T2100)
[21:01:55] <wikibugs>	 (03CR) 10JHathaway: "I'm a little wary of copying over the existing privileges, I would" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293593 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey)
[21:07:30] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] wmnet: Update x3-master alias [dns] - 10https://gerrit.wikimedia.org/r/1296511 (https://phabricator.wikimedia.org/T427895) (owner: 10Gerrit maintenance bot)
[21:08:18] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Observability-Logging: Degraded RAID on centrallog1002 - https://phabricator.wikimedia.org/T427748#11979241 (10Jclark-ctr) I was double checking and i was looking at model not serial.. verified again it is actually slot 5 .    Disk 5 on Embedded AHCI Controller 2   Available...
[21:09:14] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service lsw1-f1-codfw.mgmt.codfw.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-f1-codfw.mgmt.codfw.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[21:09:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Observability-Logging: Degraded RAID on centrallog1002 - https://phabricator.wikimedia.org/T427748#11979250 (10Jclark-ctr) Removed Failed drive Verified sdb has been removed    ` jclark@centrallog1002:~$ cat /proc/mdstat Personalities : [raid10] [linear] [multipath] [raid0]...
[21:11:56] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Observability-Logging: Degraded RAID on centrallog1002 - https://phabricator.wikimedia.org/T427748#11979264 (10Jclark-ctr) New drive has been Attached  @colewhite  ready to be rebuilt   ` [Tue Jun  2 21:09:44 2026] sd 7:0:0:0: Attached scsi generic sg5 type 0 [Tue Jun  2 21:...
[21:13:22] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware, and 2 others: decommission mc10[37-54] - https://phabricator.wikimedia.org/T426303#11979267 (10jijiki)
[21:18:57] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] P:cache:haproxy add image generator information (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1295921 (https://phabricator.wikimedia.org/T414338) (owner: 10Slyngshede)
[21:19:27] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11979309 (10Ladsgroup) As some data: `  mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select actor_name, sum(fr_size) from filerevision jo...
[21:23:23] <wikibugs>	 (03PS1) 10Jforrester: Drop the abstractwiki-rust-web images, no longer used [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1296681 (https://phabricator.wikimedia.org/T425340)
[21:23:26] <wikibugs>	 (03PS1) 10Jforrester: abstractwiki-rust: Bake in semgrep, cargo-chef, clang, and clippy [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1296682 (https://phabricator.wikimedia.org/T427989)
[21:27:43] <wikibugs>	 (03PS1) 10Eevans: linked-artifacts: update for production deploy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296683 (https://phabricator.wikimedia.org/T414140)
[21:43:28] <icinga-wm>	 RECOVERY - Backup freshness on backup1014 is OK: Fresh: 139 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[21:59:00] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Observability-Logging: Degraded RAID on centrallog1002 - https://phabricator.wikimedia.org/T427748#11979474 (10colewhite) 05Open→03In progress
[21:59:13] <wikibugs>	 (03PS1) 10Dzahn: site: add releases[12]004 with collab insetup role [puppet] - 10https://gerrit.wikimedia.org/r/1296687 (https://phabricator.wikimedia.org/T418299)
[22:00:19] <wikibugs>	 (03PS1) 10Dzahn: docker_registry: add next releases hosts (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1296688
[22:01:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] docker_registry: add next releases hosts (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1296688 (owner: 10Dzahn)
[22:02:21] <wikibugs>	 (03CR) 10Dreamy Jazz: hCaptcha: Enable for badlogin on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875) (owner: 10Dreamy Jazz)
[22:02:27] <Dreamy_Jazz>	 jouncebot: nowandnext
[22:02:28] <jouncebot>	 No deployments scheduled for the next 7 hour(s) and 57 minute(s)
[22:02:28] <jouncebot>	 In 7 hour(s) and 57 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260603T0600)
[22:02:51] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11979495 (10Ladsgroup) or in other words, total storage of our originals looks like this: https://grafana.wikimedia.org/d/75a174f3-44b6-4416-a8b8-...
[22:02:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875) (owner: 10Dreamy Jazz)
[22:04:42] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Enable for badlogin on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296551 (https://phabricator.wikimedia.org/T426875) (owner: 10Dreamy Jazz)
[22:05:16] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1296551|hCaptcha: Enable for badlogin on group0 wikis (T426875)]]
[22:05:21] <stashbot>	 T426875: hCaptcha: Support usage in "always challenge" SiteKey for badlogin - https://phabricator.wikimedia.org/T426875
[22:05:44] <wikibugs>	 (03PS1) 10Dreamy Jazz: hCaptcha: Correct inaccurate comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296689
[22:05:54] <wikibugs>	 (03CR) 10CI reject: [V:04-1] hCaptcha: Correct inaccurate comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296689 (owner: 10Dreamy Jazz)
[22:06:08] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11979500 (10Ladsgroup) Notified the uploader: https://commons.wikimedia.org/wiki/User_talk:PantheraLeo1359531#Compression_of_TIFF_files
[22:06:13] <wikibugs>	 (03PS2) 10Dreamy Jazz: hCaptcha: Correct inaccurate comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296689
[22:07:15] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1296551|hCaptcha: Enable for badlogin on group0 wikis (T426875)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:09:31] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[22:10:19] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
[22:10:39] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
[22:13:47] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296551|hCaptcha: Enable for badlogin on group0 wikis (T426875)]] (duration: 08m 31s)
[22:13:51] <stashbot>	 T426875: hCaptcha: Support usage in "always challenge" SiteKey for badlogin - https://phabricator.wikimedia.org/T426875
[22:14:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296689 (owner: 10Dreamy Jazz)
[22:14:28] <wikibugs>	 (03PS1) 10Santiago Faci: Revert "Test Kitchen UI: Deploy v1.3.9 release to production" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296690
[22:15:00] <wikibugs>	 (03CR) 10Clare Ming: [C:03+2] Revert "Test Kitchen UI: Deploy v1.3.9 release to production" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296690 (owner: 10Santiago Faci)
[22:15:05] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Correct inaccurate comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296689 (owner: 10Dreamy Jazz)
[22:15:30] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1296689|hCaptcha: Correct inaccurate comment]]
[22:17:13] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Test Kitchen UI: Deploy v1.3.9 release to production" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1296690 (owner: 10Santiago Faci)
[22:17:29] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1296689|hCaptcha: Correct inaccurate comment]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:17:48] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[22:18:28] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
[22:18:46] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
[22:21:58] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296689|hCaptcha: Correct inaccurate comment]] (duration: 06m 27s)
[22:26:49] <wikibugs>	 (03CR) 10Cwhite: [C:03+1] kafka event platform logs - Strip the stray $!msg field [puppet] - 10https://gerrit.wikimedia.org/r/1296607 (https://phabricator.wikimedia.org/T291645) (owner: 10Btullis)
[22:40:36] <wikibugs>	 (03PS2) 10Arlolra: Deploy PRV to 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296015 (https://phabricator.wikimedia.org/T427851)
[23:04:41] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:15:01] <wikibugs>	 (03CR) 10Scott French: "Alas, I did not get a chance to deploy this today, and will aim for tomorrow instead." [puppet] - 10https://gerrit.wikimedia.org/r/1296036 (https://phabricator.wikimedia.org/T418200) (owner: 10Scott French)
[23:39:52] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1296697
[23:39:52] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1296697 (owner: 10TrainBranchBot)
[23:40:28] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11979680 (10Ladsgroup) I wrote a script to proactively compress tiff files, and it works pretty nice so far: ` Processing: File:LVGL-SL - DOP20IR...
[23:45:42] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11979686 (10Ladsgroup) Example: https://commons.wikimedia.org/wiki/File:LVGL-SL_-_DOP20IR_-_346000_5490000_(2025).tif
[23:47:51] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#11979687 (10Ladsgroup) And it can't even upload the new files: ` ERROR: An error occurred for uri https://commons.wikimedia.org/w/api.php ERROR: T...
[23:51:45] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1296697 (owner: 10TrainBranchBot)