[00:38:30] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1101960
[00:38:30] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1101960 (owner: 10TrainBranchBot)
[00:50:52] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.data-transfer (T376150, xfer wdqs scholarly 2023(public)->2026(internal)) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2027.codfw.wmnet w/ force delete existing files, repooling both afterwards
[00:50:56] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[00:56:23] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1101960 (owner: 10TrainBranchBot)
[01:08:36] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1101966
[01:08:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1101966 (owner: 10TrainBranchBot)
[01:19:43] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[01:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[01:20:58] <wikibugs>	 06SRE, 10fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10395667 (10Dwisehaupt) @Papaul @Jhancock.wm I'm getting to building these hosts now (so many other things were pre-reqs) and they are starting out ok. Except fransw2002 is not reac...
[01:29:52] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1101966 (owner: 10TrainBranchBot)
[01:31:27] <wikibugs>	 06SRE, 10fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10395679 (10Papaul) @Dwisehaupt yes the host is still racked in C7 we are waiting for civic2001 to be decom so we can move it into C8/U16. For the issue about the host is not reacha...
[01:36:28] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, xfer wdqs scholarly 2023(public)->2026(internal)) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2027.codfw.wmnet w/ force delete existing files, repooling both afterwards
[01:36:32] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[01:40:08] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 206636680 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:41:10] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 42592 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:49:33] <wikibugs>	 06SRE, 10fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10395689 (10Dwisehaupt) @Papaul Thanks. I'll take a look at civi2001. I believe we need to keep it in place through the end of the month (just in case for big english) and then we c...
[01:52:32] <icinga-wm>	 PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/d486464159cce853466b996ebd3d3e2d81d20cbb42a6103376b28d0acc67c450/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:04:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[02:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:12:32] <icinga-wm>	 RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:06] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 8406 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[03:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:18:10] <wikibugs>	 06SRE, 10fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10395739 (10Papaul) @Dwisehaupt there is no rush on our end. You can take your time on that.
[04:07:35] <icinga-wm>	 RECOVERY - Host ripe-atlas-eqsin IPv6 is UP: PING OK - Packet loss = 0%, RTA = 36.76 ms
[04:13:46] <icinga-wm>	 PROBLEM - Host ripe-atlas-eqsin IPv6 is DOWN: CRITICAL - Host Unreachable (2001:df2:e500:201:103:102:166:20)
[05:19:43] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[05:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[06:04:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:33:05] <wikibugs>	 (03PS1) 10Func: ve.ui.CodeMirror.v6: Use plugin callback to load the actual module [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102141 (https://phabricator.wikimedia.org/T374072)
[06:35:36] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102141 (https://phabricator.wikimedia.org/T374072) (owner: 10Func)
[06:37:31] <wikibugs>	 (03PS1) 10Func: styles: Avoid misalignments when line numbering is disabled [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102142 (https://phabricator.wikimedia.org/T381714)
[06:38:00] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102142 (https://phabricator.wikimedia.org/T381714) (owner: 10Func)
[06:50:29] <wikibugs>	 (03PS1) 10Kevin Bazira: APIGW: Add configuration to expose LW isvc article-country [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102150 (https://phabricator.wikimedia.org/T371897)
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T0700)
[07:13:38] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10395916 (10Ammarpad) >>! In T381851#10394860, @Scott_French wrote: > Thanks, @Ammarpad - It would be great if you could please confirm your SSH public key via a second authenticated channel....
[07:36:46] <wikibugs>	 (03PS1) 10Kevin Bazira: httpbb: add post deployment tests for the article-country endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1102201 (https://phabricator.wikimedia.org/T371897)
[07:45:03] <wikibugs>	 (03CR) 10Gmodena: data-engineering: add alerts for dumps2 flink app. (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: 10Gmodena)
[07:47:10] <wikibugs>	 (03PS7) 10Gmodena: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[07:59:27] <wikibugs>	 (03PS1) 10Novem Linguae: Follow-up I9df39fdcc: Convert missed 'this' to 'el' [extensions/PageTriage] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102205 (https://phabricator.wikimedia.org/T381741)
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T0800). nyaa~
[08:00:05] <jouncebot>	 Func: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:10] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/PageTriage] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102205 (https://phabricator.wikimedia.org/T381741) (owner: 10Novem Linguae)
[08:00:15] <Func>	 o/
[08:01:09] <NovemLinguae>	 if I'm not too late, I'm going to add one right now
[08:08:00] <wikibugs>	 (03PS1) 10Jelto: Rename kubernetes[2011-2014] to wikikube-worker[2180-2183] [puppet] - 10https://gerrit.wikimedia.org/r/1102206 (https://phabricator.wikimedia.org/T377877)
[08:14:36] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/archiva 8812 MB (3% inode=80%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[08:33:18] <NovemLinguae>	 think we'll still have the backport?
[08:33:58] <wikibugs>	 (03PS1) 10Slyngshede: Release v0.1.4 [software/bitu] - 10https://gerrit.wikimedia.org/r/1102213
[08:47:55] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:04-1] "LGTM, but put something (even if it is commented) to showcase the structure of the option in modules/mesh/values.yaml" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[08:49:20] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "Same comment as the parent change. Something in values.yaml, even if an example stanza commented to allow a reader to quickly reason about" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[08:58:49] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] "Looks good!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[08:59:40] <wikibugs>	 (03PS1) 10Marostegui: control-mariadb-client-10.6-bookworm: Added to repo [software] - 10https://gerrit.wikimedia.org/r/1102215 (https://phabricator.wikimedia.org/T380073)
[09:00:55] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "this looks good to me from the GitLab side. But I have little knowledge what data the blunderbuss service provides. Please keep in mind th" [puppet] - 10https://gerrit.wikimedia.org/r/1101925 (https://phabricator.wikimedia.org/T371994) (owner: 10Aleksandar Mastilovic)
[09:01:13] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rename kubernetes[2011-2014] to wikikube-worker[2180-2183] [puppet] - 10https://gerrit.wikimedia.org/r/1102206 (https://phabricator.wikimedia.org/T377877) (owner: 10Jelto)
[09:01:54] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] control-mariadb-client-10.6-bookworm: Added to repo [software] - 10https://gerrit.wikimedia.org/r/1102215 (https://phabricator.wikimedia.org/T380073) (owner: 10Marostegui)
[09:02:21] <wikibugs>	 (03Merged) 10jenkins-bot: control-mariadb-client-10.6-bookworm: Added to repo [software] - 10https://gerrit.wikimedia.org/r/1102215 (https://phabricator.wikimedia.org/T380073) (owner: 10Marostegui)
[09:02:22] <wikibugs>	 07sre-alert-triage, 06Infrastructure-Foundations: Alert in need of triage: SystemdUnitFailed (instance idm-test1001:9100) - https://phabricator.wikimedia.org/T381947 (10LSobanski) 03NEW
[09:04:27] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2011-2014].codfw.wmnet
[09:04:44] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on db2136.codfw.wmnet with reason: maintenance
[09:04:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2136.codfw.wmnet with reason: maintenance
[09:05:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2136 to upgrade MariaDB 10.11 T378940', diff saved to https://phabricator.wikimedia.org/P71694 and previous config saved to /var/cache/conftool/dbconfig/20241211-090538-marostegui.json
[09:05:42] <stashbot>	 T378940: Compile and package MariaDB 10.11.10 and 10.6.20 - https://phabricator.wikimedia.org/T378940
[09:06:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2011-2014].codfw.wmnet
[09:08:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2011-2014].codfw.wmnet
[09:08:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2011-2014].codfw.wmnet
[09:09:52] <wikibugs>	 (03CR) 10Jelto: [C:03+2] Rename kubernetes[2011-2014] to wikikube-worker[2180-2183] [puppet] - 10https://gerrit.wikimedia.org/r/1102206 (https://phabricator.wikimedia.org/T377877) (owner: 10Jelto)
[09:10:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P71695 and previous config saved to /var/cache/conftool/dbconfig/20241211-091029-root.json
[09:11:44] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10396114 (10Marostegui) a:05ABran-WMF→03Jhancock.wm
[09:13:32] <wikibugs>	 (03PS1) 10Brouberol: ceph-csi: remove un-necessary network policies allowing kube api egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102221 (https://phabricator.wikimedia.org/T381264)
[09:14:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2011 to wikikube-worker2180
[09:14:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:16:15] <wikibugs>	 (03PS1) 10Marostegui: installserver: Do not reimage es2046 [puppet] - 10https://gerrit.wikimedia.org/r/1102222
[09:17:12] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[09:17:12] <icinga-wm>	 status
[09:17:22] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[09:17:22] <icinga-wm>	 status
[09:18:27] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2011 to wikikube-worker2180 - jelto@cumin1002"
[09:18:59] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] installserver: Do not reimage es2046 [puppet] - 10https://gerrit.wikimedia.org/r/1102222 (owner: 10Marostegui)
[09:19:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2011 to wikikube-worker2180 - jelto@cumin1002"
[09:19:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:19:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2180
[09:19:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2180
[09:19:44] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[09:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[09:20:01] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2011 to wikikube-worker2180
[09:20:36] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2012 to wikikube-worker2181
[09:20:56] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:21:02] <wikibugs>	 (03CR) 10Gmodena: [C:03+2] dse-k8s-services: rename mw-dumps helmfiles. (034 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[09:22:07] <wikibugs>	 (03Merged) 10jenkins-bot: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[09:24:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2012 to wikikube-worker2181 - jelto@cumin1002"
[09:24:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubernetes2014:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes2014 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[09:25:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2012 to wikikube-worker2181 - jelto@cumin1002"
[09:25:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:25:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2181
[09:25:28] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2181
[09:25:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P71696 and previous config saved to /var/cache/conftool/dbconfig/20241211-092535-root.json
[09:26:07] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2012 to wikikube-worker2181
[09:26:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2013 to wikikube-worker2182
[09:26:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:30:39] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2013 to wikikube-worker2182 - jelto@cumin1002"
[09:30:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2013 to wikikube-worker2182 - jelto@cumin1002"
[09:30:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:30:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2182
[09:30:59] <wikibugs>	 (03PS1) 10Marostegui: production-m1.sql.erb: Upgrade grants [puppet] - 10https://gerrit.wikimedia.org/r/1102226 (https://phabricator.wikimedia.org/T367380)
[09:31:18] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2182
[09:31:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2013 to wikikube-worker2182
[09:32:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2014 to wikikube-worker2183
[09:32:27] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
[09:32:27] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1002.eqiad.wmnet with OS bookworm
[09:32:34] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Great! Thanks for looking into this." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102221 (https://phabricator.wikimedia.org/T381264) (owner: 10Brouberol)
[09:32:39] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:33:13] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] ceph-csi: remove un-necessary network policies allowing kube api egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102221 (https://phabricator.wikimedia.org/T381264) (owner: 10Brouberol)
[09:35:00] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[09:35:34] <wikibugs>	 (03PS1) 10Marostegui: report_users.sh: Add dbproxy2005 IP [software] - 10https://gerrit.wikimedia.org/r/1102228 (https://phabricator.wikimedia.org/T367380)
[09:36:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2014 to wikikube-worker2183 - jelto@cumin1002"
[09:36:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2014 to wikikube-worker2183 - jelto@cumin1002"
[09:36:32] <wikibugs>	 (03CR) 10Marostegui: "This is a NOOP - grants added to the DB" [puppet] - 10https://gerrit.wikimedia.org/r/1102226 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[09:36:32] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:36:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2183
[09:36:33] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] production-m1.sql.erb: Upgrade grants [puppet] - 10https://gerrit.wikimedia.org/r/1102226 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[09:36:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2183
[09:36:49] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] report_users.sh: Add dbproxy2005 IP [software] - 10https://gerrit.wikimedia.org/r/1102228 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[09:37:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2014 to wikikube-worker2183
[09:37:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2180.codfw.wmnet wikikube-worker2181.codfw.wmnet wikikube-worker2182.codfw.wmnet wikikube-worker2183.codfw.wmnet on all recursors
[09:37:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2180.codfw.wmnet wikikube-worker2181.codfw.wmnet wikikube-worker2182.codfw.wmnet wikikube-worker2183.codfw.wmnet on all recursors
[09:39:52] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2180.codfw.wmnet with OS bookworm
[09:40:02] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2180
[09:40:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2181.codfw.wmnet with OS bookworm
[09:40:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2182.codfw.wmnet with OS bookworm
[09:40:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2182
[09:40:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2183.codfw.wmnet with OS bookworm
[09:40:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:40:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P71697 and previous config saved to /var/cache/conftool/dbconfig/20241211-094040-root.json
[09:42:07] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Update m1-master.codfw.wmnet CNAME [dns] - 10https://gerrit.wikimedia.org/r/1102233 (https://phabricator.wikimedia.org/T367380)
[09:44:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2180 - jelto@cumin1002"
[09:44:21] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2180 - jelto@cumin1002"
[09:44:21] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:44:21] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2180.codfw.wmnet 109.32.192.10.in-addr.arpa 9.0.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:44:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2180.codfw.wmnet 109.32.192.10.in-addr.arpa 9.0.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:44:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2180
[09:44:27] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:44:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] wmnet: Update m1-master.codfw.wmnet CNAME [dns] - 10https://gerrit.wikimedia.org/r/1102233 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[09:44:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2180
[09:44:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2180
[09:44:51] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2181
[09:46:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:46:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2182.codfw.wmnet 28.48.192.10.in-addr.arpa 8.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:46:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2182.codfw.wmnet 28.48.192.10.in-addr.arpa 8.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:46:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2182
[09:47:21] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:47:46] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2182
[09:47:46] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2182
[09:48:15] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2183
[09:49:13] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1098045 (https://phabricator.wikimedia.org/T377809) (owner: 10Joely Rooke WMDE)
[09:50:55] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Update dbproxy200(1,5) notes [puppet] - 10https://gerrit.wikimedia.org/r/1102237
[09:51:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2181 - jelto@cumin1002"
[09:51:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2181 - jelto@cumin1002"
[09:51:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:51:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2181.codfw.wmnet 110.32.192.10.in-addr.arpa 0.1.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:51:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2181.codfw.wmnet 110.32.192.10.in-addr.arpa 0.1.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:51:09] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2181
[09:51:27] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2181
[09:51:27] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2181
[09:51:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:51:50] <wikibugs>	 (03PS1) 10Marostegui: report_users.sh: Change variable [software] - 10https://gerrit.wikimedia.org/r/1102238
[09:52:23] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Prod-Kubernetes, 06serviceops: wikikube-ctrl1002 and wikikube-ctrl1003: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379717#10396280 (10JMeybohm) >>! In T379717#10395266, @VRiley-WMF wrote: > Can we proceed with swapping th...
[09:52:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Update dbproxy200(1,5) notes [puppet] - 10https://gerrit.wikimedia.org/r/1102237 (owner: 10Marostegui)
[09:52:46] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] report_users.sh: Change variable [software] - 10https://gerrit.wikimedia.org/r/1102238 (owner: 10Marostegui)
[09:55:21] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2183 - jelto@cumin1002"
[09:55:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2183 - jelto@cumin1002"
[09:55:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:55:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2183.codfw.wmnet 29.48.192.10.in-addr.arpa 9.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:55:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2183.codfw.wmnet 29.48.192.10.in-addr.arpa 9.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:55:30] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2183
[09:55:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P71698 and previous config saved to /var/cache/conftool/dbconfig/20241211-095546-root.json
[09:56:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2183
[09:56:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2183
[09:58:46] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@416a3c0]: Backfill webrequest actor metrics rollup hourly 2024 12
[09:59:49] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@416a3c0]: Backfill webrequest actor metrics rollup hourly 2024 12 (duration: 01m 02s)
[10:01:23] <wikibugs>	 (CR) JMeybohm: charts: Add kartotherian (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826) (owner: Elukey)
[10:02:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2180.codfw.wmnet with reason: host reimage
[10:03:58] <wikibugs>	 (03PS1) 10Elukey: dockerfile: fix upstream_version filter [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/1102240
[10:04:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[10:04:57] <wikibugs>	 (03PS1) 10Marostegui: report_users.sh: Use cumin2024 [software] - 10https://gerrit.wikimedia.org/r/1102241
[10:06:14] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2180.codfw.wmnet with reason: host reimage
[10:06:42] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] report_users.sh: Use cumin2024 [software] - 10https://gerrit.wikimedia.org/r/1102241 (owner: 10Marostegui)
[10:08:06] <wikibugs>	 (03PS1) 10Marostegui: production-m2.sql.erb: Replaced dbproxy2002 with dbproxy2006 [puppet] - 10https://gerrit.wikimedia.org/r/1102242 (https://phabricator.wikimedia.org/T367380)
[10:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:10:48] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] production-m2.sql.erb: Replaced dbproxy2002 with dbproxy2006 [puppet] - 10https://gerrit.wikimedia.org/r/1102242 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:10:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P71699 and previous config saved to /var/cache/conftool/dbconfig/20241211-101051-root.json
[10:12:43] <wikibugs>	 (03PS1) 10Marostegui: report_users.sh: Add dbproxy2006 IP [software] - 10https://gerrit.wikimedia.org/r/1102244 (https://phabricator.wikimedia.org/T367380)
[10:12:49] <wikibugs>	 (03PS9) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[10:12:49] <wikibugs>	 (03PS5) 10Elukey: admin_ng: add the kartotherian namespace on Wikikube [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826)
[10:12:50] <wikibugs>	 (03PS5) 10Elukey: services: add helmfile config for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826)
[10:12:55] <wikibugs>	 (CR) Elukey: charts: Add kartotherian (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826) (owner: Elukey)
[10:13:28] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] report_users.sh: Add dbproxy2006 IP [software] - 10https://gerrit.wikimedia.org/r/1102244 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:14:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2183.codfw.wmnet with reason: host reimage
[10:15:21] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Promote dbproxy2006 to m2 master [dns] - 10https://gerrit.wikimedia.org/r/1102245 (https://phabricator.wikimedia.org/T367380)
[10:16:22] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Update dbproxy200(2,6) notes [puppet] - 10https://gerrit.wikimedia.org/r/1102246
[10:17:05] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Update dbproxy200(2,6) notes [puppet] - 10https://gerrit.wikimedia.org/r/1102246 (owner: 10Marostegui)
[10:17:32] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] wmnet: Promote dbproxy2006 to m2 master [dns] - 10https://gerrit.wikimedia.org/r/1102245 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:17:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2183.codfw.wmnet with reason: host reimage
[10:17:52] <wikibugs>	 06SRE, 06Traffic-Icebox, 10MobileFrontend (Tracking): RFC: Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998#10396354 (10Krinkle)
[10:25:26] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: define DNS records [dns] - 10https://gerrit.wikimedia.org/r/1102249 (https://phabricator.wikimedia.org/T380258)
[10:25:33] <Dreamy_Jazz>	 jouncebot: nowandnext
[10:25:34] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 34 minute(s)
[10:25:34] <jouncebot>	 In 0 hour(s) and 34 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1100)
[10:26:19] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+2] Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101913 (owner: 10Cwhite)
[10:26:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2180.codfw.wmnet with OS bookworm
[10:27:28] <wikibugs>	 (03PS1) 10Marostegui: production-m3.sql.erb: Replace dbproxy2003 with dbproxy2007 [puppet] - 10https://gerrit.wikimedia.org/r/1102250 (https://phabricator.wikimedia.org/T367380)
[10:29:25] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Promote dbproxy2007 to m3-codfw master [dns] - 10https://gerrit.wikimedia.org/r/1102251 (https://phabricator.wikimedia.org/T367380)
[10:30:03] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, 10decommission-hardware: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10396433 (10Andrew) These hosts have a somewhat unusual vlan setup, so my guess is something is tripping on that -- paging @cmooney for m...
[10:30:29] <wikibugs>	 (03PS1) 10Marostegui: report_users.sh: Add dbproxy2007 IP [software] - 10https://gerrit.wikimedia.org/r/1102253 (https://phabricator.wikimedia.org/T367380)
[10:31:08] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: define helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258)
[10:31:13] <wikibugs>	 (03PS1) 10Brouberol: deployment_server: define airflow-ml users [puppet] - 10https://gerrit.wikimedia.org/r/1102255 (https://phabricator.wikimedia.org/T380258)
[10:31:15] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: define ATS mapping rules and cache settings [puppet] - 10https://gerrit.wikimedia.org/r/1102256 (https://phabricator.wikimedia.org/T380258)
[10:31:17] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: define CAS config [puppet] - 10https://gerrit.wikimedia.org/r/1102257 (https://phabricator.wikimedia.org/T380258)
[10:31:19] <wikibugs>	 (03PS1) 10Brouberol: openldap: define new offloaded airflow-ml-ops group [puppet] - 10https://gerrit.wikimedia.org/r/1102258 (https://phabricator.wikimedia.org/T380258)
[10:32:31] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2182.codfw.wmnet with OS bookworm
[10:33:20] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2182.codfw.wmnet with OS bookworm
[10:33:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2182
[10:33:23] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2182
[10:33:55] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] production-m3.sql.erb: Replace dbproxy2003 with dbproxy2007 [puppet] - 10https://gerrit.wikimedia.org/r/1102250 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:34:08] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] report_users.sh: Add dbproxy2007 IP [software] - 10https://gerrit.wikimedia.org/r/1102253 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:34:12] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] wmnet: Promote dbproxy2007 to m3-codfw master [dns] - 10https://gerrit.wikimedia.org/r/1102251 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:34:34] <wikibugs>	 (03Merged) 10jenkins-bot: report_users.sh: Add dbproxy2007 IP [software] - 10https://gerrit.wikimedia.org/r/1102253 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:37:23] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Update dbproxy200(3,7) notes [puppet] - 10https://gerrit.wikimedia.org/r/1102259
[10:39:49] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2183.codfw.wmnet with OS bookworm
[10:43:40] <wikibugs>	 (03PS1) 10Marostegui: production-m5.sql.erb: Upgrade dbproxy grants [puppet] - 10https://gerrit.wikimedia.org/r/1102260 (https://phabricator.wikimedia.org/T367380)
[10:45:20] <wikibugs>	 (03CR) 10Fabfur: Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141) (owner: 10Fabfur)
[10:45:27] <wikibugs>	 (03PS5) 10Fabfur: Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141)
[10:46:02] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Update dbproxy200(3,7) notes [puppet] - 10https://gerrit.wikimedia.org/r/1102259 (owner: 10Marostegui)
[10:46:10] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] production-m5.sql.erb: Upgrade dbproxy grants [puppet] - 10https://gerrit.wikimedia.org/r/1102260 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:46:23] <wikibugs>	 (03Merged) 10jenkins-bot: Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101913 (owner: 10Cwhite)
[10:49:44] <wikibugs>	 (03PS1) 10Marostegui: report_users.sh: Add dbproxy2008 IP [software] - 10https://gerrit.wikimedia.org/r/1102261 (https://phabricator.wikimedia.org/T367380)
[10:49:47] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Promote dbproxy2008 to m3-codfw master [dns] - 10https://gerrit.wikimedia.org/r/1102262 (https://phabricator.wikimedia.org/T367380)
[10:51:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2182.codfw.wmnet with reason: host reimage
[10:53:09] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] report_users.sh: Add dbproxy2008 IP [software] - 10https://gerrit.wikimedia.org/r/1102261 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:53:22] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] wmnet: Promote dbproxy2008 to m3-codfw master [dns] - 10https://gerrit.wikimedia.org/r/1102262 (https://phabricator.wikimedia.org/T367380) (owner: 10Marostegui)
[10:54:36] <icinga-wm>	 RECOVERY - Disk space on archiva1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[10:54:44] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1101913|Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats"]]
[10:55:04] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Update dbproxy2004,dbproxy2008 notes [puppet] - 10https://gerrit.wikimedia.org/r/1102263
[10:55:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2182.codfw.wmnet with reason: host reimage
[10:55:46] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Update dbproxy2004,dbproxy2008 notes [puppet] - 10https://gerrit.wikimedia.org/r/1102263 (owner: 10Marostegui)
[10:58:46] <fabfur>	 !log merging https://gerrit.wikimedia.org/r/c/operations/dns/+/1100084 to direct Argentina, Chile, Uruguay to magru (T359054)
[10:58:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:25] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz, cwhite: Backport for [[gerrit:1101913|Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[10:59:57] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141) (owner: 10Fabfur)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1100)
[11:00:08] <wikibugs>	 (03PS6) 10Fabfur: Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141)
[11:00:19] <wikibugs>	 (03CR) 10Fabfur: [V:03+2 C:03+2] Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141) (owner: 10Fabfur)
[11:03:21] <mszabo>	 jouncebot: now
[11:03:21] <jouncebot>	 For the next 0 hour(s) and 56 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1100)
[11:03:51] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz, cwhite: Continuing with sync
[11:09:06] <wikibugs>	 (03CR) 10Klausman: [C:03+1] APIGW: Add configuration to expose LW isvc article-country [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102150 (https://phabricator.wikimedia.org/T371897) (owner: 10Kevin Bazira)
[11:09:06] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101913|Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats"]] (duration: 14m 22s)
[11:11:39] <mszabo>	 jouncebot: nowandnext
[11:11:40] <jouncebot>	 For the next 0 hour(s) and 48 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1100)
[11:11:40] <jouncebot>	 In 0 hour(s) and 48 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1200)
[11:11:48] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2181.codfw.wmnet with OS bookworm
[11:12:16] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2181.codfw.wmnet with OS bookworm
[11:12:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2181
[11:12:19] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2181
[11:13:26] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszabo@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1099213 (https://phabricator.wikimedia.org/T374105) (owner: 10Máté Szabó)
[11:13:26] <wikibugs>	 (03PS2) 10Brouberol: airflow-ml: define helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258)
[11:13:26] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: register namespaces in cloudnative/ceph operator tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102268 (https://phabricator.wikimedia.org/T380258)
[11:13:56] <wikibugs>	 (03PS2) 10Brouberol: airflow-ml: define DNS records [dns] - 10https://gerrit.wikimedia.org/r/1102249 (https://phabricator.wikimedia.org/T380258)
[11:14:20] <wikibugs>	 (03Merged) 10jenkins-bot: Prep pilot wiki config for IRS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1099213 (https://phabricator.wikimedia.org/T374105) (owner: 10Máté Szabó)
[11:14:37] <logmsgbot>	 !log mszabo@deploy2002 Started scap sync-world: Backport for [[gerrit:1099213|Prep pilot wiki config for IRS (T374105)]]
[11:14:42] <stashbot>	 T374105: Incident Reporting System - MVP - https://phabricator.wikimedia.org/T374105
[11:15:38] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2182.codfw.wmnet with OS bookworm
[11:17:27] <logmsgbot>	 !log mszabo@deploy2002 mszabo: Backport for [[gerrit:1099213|Prep pilot wiki config for IRS (T374105)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:17:45] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 06Traffic: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps - https://phabricator.wikimedia.org/T359054#10396579 (10Fabfur) Argentina, Chile and Uruguay now lands on magru by default
[11:20:22] <logmsgbot>	 !log mszabo@deploy2002 mszabo: Continuing with sync
[11:25:02] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me." [dns] - 10https://gerrit.wikimedia.org/r/1102249 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:25:40] <wikibugs>	 (03CR) 10Btullis: [C:03+1] deployment_server: define airflow-ml users [puppet] - 10https://gerrit.wikimedia.org/r/1102255 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:25:41] <logmsgbot>	 !log mszabo@deploy2002 Finished scap sync-world: Backport for [[gerrit:1099213|Prep pilot wiki config for IRS (T374105)]] (duration: 11m 04s)
[11:25:45] <stashbot>	 T374105: Incident Reporting System - MVP - https://phabricator.wikimedia.org/T374105
[11:26:13] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow-ml: define ATS mapping rules and cache settings [puppet] - 10https://gerrit.wikimedia.org/r/1102256 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:27:08] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me, but let's get someone on I/F to check that they're happy with it, too." [puppet] - 10https://gerrit.wikimedia.org/r/1102257 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:28:33] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/1102257 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:28:48] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] "lgtm" [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/1102240 (owner: 10Elukey)
[11:28:58] <wikibugs>	 (03CR) 10Btullis: [C:03+1] openldap: define new offloaded airflow-ml-ops group [puppet] - 10https://gerrit.wikimedia.org/r/1102258 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:29:25] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow-ml: register namespaces in cloudnative/ceph operator tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102268 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:29:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2181.codfw.wmnet with reason: host reimage
[11:31:18] <wikibugs>	 (CR) Btullis: airflow-ml: define helmfile and values (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258) (owner: Brouberol)
[11:32:28] <wikibugs>	 (03PS1) 10Hnowlan: kubernetes: include idle_timeout and tcp_keepalive in service mesh data [puppet] - 10https://gerrit.wikimedia.org/r/1102272 (https://phabricator.wikimedia.org/T371701)
[11:33:38] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2181.codfw.wmnet with reason: host reimage
[11:35:26] <wikibugs>	 (03CR) 10Elukey: [C:03+2] dockerfile: fix upstream_version filter [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/1102240 (owner: 10Elukey)
[11:37:32] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[11:41:14] <wikibugs>	 (03Merged) 10jenkins-bot: dockerfile: fix upstream_version filter [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/1102240 (owner: 10Elukey)
[11:43:48] <wikibugs>	 (03CR) 10JMeybohm: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4664/co" [puppet] - 10https://gerrit.wikimedia.org/r/1102272 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[11:44:07] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubernetes: include idle_timeout and tcp_keepalive in service mesh data [puppet] - 10https://gerrit.wikimedia.org/r/1102272 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[11:44:09] <wikibugs>	 (03PS1) 10Elukey: Release version 4.0.3 [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/1102276
[11:44:17] <wikibugs>	 (03CR) 10JMeybohm: [V:03+1 C:03+1] kubernetes: include idle_timeout and tcp_keepalive in service mesh data [puppet] - 10https://gerrit.wikimedia.org/r/1102272 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[11:51:53] <wikibugs>	 (CR) Brouberol: airflow-ml: define helmfile and values (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258) (owner: Brouberol)
[11:52:56] <wikibugs>	 (CR) Brouberol: airflow-ml: define helmfile and values (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258) (owner: Brouberol)
[11:53:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2181.codfw.wmnet with OS bookworm
[11:54:44] <jelto>	 !log homer 'lsw1-d6-codfw*' commit 'T377877'
[11:54:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:47] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[11:55:21] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-ml: define DNS records [dns] - 10https://gerrit.wikimedia.org/r/1102249 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:56:14] <jelto>	 !log homer 'lsw1-c1-codfw*' commit 'T377877'
[11:56:15] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] deployment_server: define airflow-ml users [puppet] - 10https://gerrit.wikimedia.org/r/1102255 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[11:56:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:53] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2180-2183].codfw.wmnet
[11:57:56] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2180-2183].codfw.wmnet
[11:58:08] <wikibugs>	 (03CR) 10Bartosz Dziewoński: [C:03+1] Fix protocol for .well-known/change-password Apache rule [puppet] - 10https://gerrit.wikimedia.org/r/1101462 (https://phabricator.wikimedia.org/T381625) (owner: 10Gergő Tisza)
[11:59:17] <wikibugs>	 ops-codfw, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381967 (Jelto) NEW
[12:00:05] <jouncebot>	 mvolz: Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1200). Please do the needful.
[12:00:52] <wikibugs>	 (03PS3) 10Hnowlan: mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700)
[12:01:45] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-ml: register namespaces in cloudnative/ceph operator tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102268 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[12:02:07] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2024-12-10-132417-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102278 (https://phabricator.wikimedia.org/T369815)
[12:02:20] <wikibugs>	 (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101839 (owner: 10PipelineBot)
[12:04:02] <wikibugs>	 (03PS2) 10Brouberol: airflow-ml: register namespaces in cloudnative/ceph operator tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102268 (https://phabricator.wikimedia.org/T380258)
[12:04:02] <wikibugs>	 (03PS3) 10Brouberol: airflow-ml: define helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258)
[12:04:02] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: define kubernetes namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102280 (https://phabricator.wikimedia.org/T380258)
[12:04:03] <wikibugs>	 (03CR) 10Hnowlan: "Given that this comes from the puppet data for the listeners, does it really belong in values.yaml? Most other mesh listener options aren'" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[12:04:35] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] kubernetes: include idle_timeout and tcp_keepalive in service mesh data [puppet] - 10https://gerrit.wikimedia.org/r/1102272 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[12:04:38] <wikibugs>	 (03PS1) 10Jelto: Rename kubernetes20(17|21|22|24) to wikikube-worker[2184-2187] [puppet] - 10https://gerrit.wikimedia.org/r/1102281 (https://phabricator.wikimedia.org/T377877)
[12:04:59] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:05:02] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:05:19] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow-ml: define helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[12:06:28] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow-ml: define kubernetes namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102280 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[12:06:33] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101839 (owner: 10PipelineBot)
[12:08:11] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:08:37] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:11:03] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[12:11:06] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[12:11:29] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[12:11:35] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[12:11:56] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[12:12:42] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[12:12:57] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[12:13:02] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[12:14:39] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[12:15:13] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[12:18:07] <wikibugs>	 (03PS2) 10Abijeet Patro: Translate: Enable message group subscription for 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1102283 (https://phabricator.wikimedia.org/T372386)
[12:18:28] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 12 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1102283 (https://phabricator.wikimedia.org/T372386) (owner: 10Abijeet Patro)
[12:18:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Translate: Enable message group subscription for 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1102283 (https://phabricator.wikimedia.org/T372386) (owner: 10Abijeet Patro)
[12:21:31] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons, 10Thumbor, 06Traffic: Unable to render file from upload.wikimedia.org "Error 349 ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION" - https://phabricator.wikimedia.org/T170605#10396801 (10TheDJ) 05Open→03Declined Most likely a device/browser level issue. No...
[12:22:20] <wikibugs>	 (03PS1) 10Btullis: dse-k8s: Add a namespace for llm-inference work by the ML team [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102284 (https://phabricator.wikimedia.org/T377266)
[12:23:30] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10396810 (10MatthewVernon) 05Resolved→03Open @elukey ms-be2085 is still missing its spinning drives, I'm afraid. I tried setting them to JBOD via the...
[12:23:32] <wikibugs>	 06SRE, 06Traffic: Webrequests live data shows traffic without TLS on varnish for upload.w.o - https://phabricator.wikimedia.org/T340097#10396814 (10TheDJ) @BCornwall is this still an issue ?
[12:23:43] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10396815 (10MatthewVernon) p:05Medium→03High
[12:27:39] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Traffic-Icebox, 07affects-Kiwix-and-openZIM, 07Wikimedia-Performance-recommendation: Swift sends ETAG without double-quotes - https://phabricator.wikimedia.org/T256217#10396817 (10TheDJ) @MatthewVernon This still needs to happen right ?
[12:39:43] <wikibugs>	 (03PS4) 10Hnowlan: mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700)
[12:41:15] <wikibugs>	 (03PS1) 10Btullis: dse-k8s: Add token for the llm-inference namespace [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266)
[12:43:36] <wikibugs>	 (03CR) 10Btullis: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4665/co" [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[12:45:09] <wikibugs>	 (03PS2) 10Btullis: dse-k8s: Add tokens for the llm-inference namespace [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266)
[12:47:24] <wikibugs>	 (03CR) 10Btullis: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4666/co" [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[12:47:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
[12:47:41] <wikibugs>	 (03CR) 10Hnowlan: mediawiki: get mercurius label from mediawiki image version (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700) (owner: 10Hnowlan)
[12:47:44] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700) (owner: 10Hnowlan)
[12:48:37] <kart_>	 Doing quick cxserver deployment..
[12:48:58] <wikibugs>	 (03CR) 10KartikMistry: [C:03+2] Update cxserver to 2024-12-10-132417-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102278 (https://phabricator.wikimedia.org/T369815) (owner: 10KartikMistry)
[12:49:54] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700) (owner: 10Hnowlan)
[12:50:53] <wikibugs>	 (03Merged) 10jenkins-bot: Update cxserver to 2024-12-10-132417-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102278 (https://phabricator.wikimedia.org/T369815) (owner: 10KartikMistry)
[12:52:10] <wikibugs>	 (03PS3) 10Hnowlan: mesh.configuration: dummy commit for 1.11.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101917
[12:54:21] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[12:54:36] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
[12:54:44] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[12:54:48] <wikibugs>	 (03PS6) 10Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[12:54:48] <wikibugs>	 (03PS3) 10Hnowlan: mediawiki: use mesh.configuration 1.11 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701)
[12:56:01] <wikibugs>	 (03CR) 10Hnowlan: "This is coming from the puppet mesh configuration (where it is documented) and can't be configured at the chart level, so I don't think it" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[12:57:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
[12:59:45] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[13:00:11] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[13:00:38] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[13:01:10] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[13:02:29] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rename kubernetes20(17|21|22|24) to wikikube-worker[2184-2187] [puppet] - 10https://gerrit.wikimedia.org/r/1102281 (https://phabricator.wikimedia.org/T377877) (owner: 10Jelto)
[13:02:40] <wikibugs>	 (03CR) 10Brouberol: [V:03+2 C:03+2] airflow-ml: register namespaces in cloudnative/ceph operator tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102268 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:02:44] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-ml: define kubernetes namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102280 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:02:51] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-ml: define helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:03:07] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] dse-k8s: Add tokens for the llm-inference namespace [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[13:03:32] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] dse-k8s: Add a namespace for llm-inference work by the ML team [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102284 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[13:03:48] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] dse-k8s: Add tokens for the llm-inference namespace [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[13:04:15] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
[13:04:23] <kart_>	 !log Updated cxserver to 2024-12-10-132417-production (T369815)
[13:04:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:27] <stashbot>	 T369815: Enable in content Translation the new languages Google Translate supports in June 2024 - https://phabricator.wikimedia.org/T369815
[13:05:33] <wikibugs>	 (03PS1) 10Hnowlan: base: fix typo in CHANGELOG [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102307
[13:06:28] <wikibugs>	 (03Merged) 10jenkins-bot: airflow-ml: define kubernetes namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102280 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:06:42] <wikibugs>	 (03Merged) 10jenkins-bot: airflow-ml: register namespaces in cloudnative/ceph operator tenant namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102268 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:07:05] <wikibugs>	 (03Merged) 10jenkins-bot: airflow-ml: define helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102254 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:08:03] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Traffic-Icebox, 07affects-Kiwix-and-openZIM, 07Wikimedia-Performance-recommendation: Swift sends ETAG without double-quotes - https://phabricator.wikimedia.org/T256217#10396995 (10MatthewVernon) @TheDJ we're still emitting old-style ETags.
[13:08:03] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[13:09:19] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[13:11:13] <wikibugs>	 (03PS2) 10Brouberol: airflow-ml: define CAS config [puppet] - 10https://gerrit.wikimedia.org/r/1102257 (https://phabricator.wikimedia.org/T380258)
[13:11:13] <wikibugs>	 (03PS2) 10Brouberol: openldap: define new offloaded airflow-ml-ops group [puppet] - 10https://gerrit.wikimedia.org/r/1102258 (https://phabricator.wikimedia.org/T380258)
[13:11:13] <wikibugs>	 (03PS2) 10Brouberol: airflow-ml: define ATS mapping rules and cache settings [puppet] - 10https://gerrit.wikimedia.org/r/1102256 (https://phabricator.wikimedia.org/T380258)
[13:11:55] <wikibugs>	 (03CR) 10Klausman: [V:03+2 C:03+2] httpbb: add post deployment tests for the article-country endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1102201 (https://phabricator.wikimedia.org/T371897) (owner: 10Kevin Bazira)
[13:13:05] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-ml: define CAS config [puppet] - 10https://gerrit.wikimedia.org/r/1102257 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:13:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2017,2021-2022,2024].codfw.wmnet
[13:17:04] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: hw troubleshooting: Stuck/bugged BMC on ml-lab1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381902#10397020 (10klausman) The management interface works now, for unclear reasons. Maybe it just took forever to recover from reset(s)? It's all ver...
[13:17:55] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
[13:18:01] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
[13:18:37] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2017,2021-2022,2024].codfw.wmnet
[13:19:27] <wikibugs>	 (03CR) 10Jelto: [C:03+2] Rename kubernetes20(17|21|22|24) to wikikube-worker[2184-2187] [puppet] - 10https://gerrit.wikimedia.org/r/1102281 (https://phabricator.wikimedia.org/T377877) (owner: 10Jelto)
[13:19:29] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
[13:19:44] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[13:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[13:21:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2017 to wikikube-worker2184
[13:21:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:21:36] <wikibugs>	 (03PS1) 10Brouberol: airflow-ml: fix typo [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102308 (https://phabricator.wikimedia.org/T380258)
[13:25:11] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-ml: fix typo [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102308 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:25:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2017 to wikikube-worker2184 - jelto@cumin1002"
[13:25:41] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2017 to wikikube-worker2184 - jelto@cumin1002"
[13:25:41] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:25:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2184
[13:25:59] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2184
[13:26:15] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
[13:26:37] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2017 to wikikube-worker2184
[13:27:16] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
[13:27:55] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2021 to wikikube-worker2185
[13:28:16] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:28:48] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] openldap: define new offloaded airflow-ml-ops group [puppet] - 10https://gerrit.wikimedia.org/r/1102258 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:29:10] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] "The underlying application was deployed:" [puppet] - 10https://gerrit.wikimedia.org/r/1102256 (https://phabricator.wikimedia.org/T380258) (owner: 10Brouberol)
[13:31:40] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on kubernetes2022:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[13:31:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2021 to wikikube-worker2185 - jelto@cumin1002"
[13:32:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2021 to wikikube-worker2185 - jelto@cumin1002"
[13:32:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:32:44] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2185
[13:32:55] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2185
[13:33:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2021 to wikikube-worker2185
[13:34:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2022 to wikikube-worker2186
[13:34:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:39:16] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180#10397071 (10SLyngshede-WMF) To trigger webauthn for select users, we'll just reuse the groovy script from u2f and set the mfa-method field in LDAP to mfa-webauthn  ` cas.authn.mfa...
[13:39:30] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2022 to wikikube-worker2186 - jelto@cumin1002"
[13:40:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2022 to wikikube-worker2186 - jelto@cumin1002"
[13:40:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:40:44] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2186
[13:40:55] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2186
[13:41:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2022 to wikikube-worker2186
[13:42:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2024 to wikikube-worker2187
[13:42:44] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:45:42] <wikibugs>	 (03CR) 10Btullis: [V:03+1 C:03+2] dse-k8s: Add tokens for the llm-inference namespace [puppet] - 10https://gerrit.wikimedia.org/r/1102287 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[13:45:54] <wikibugs>	 (03PS1) 10Brouberol: airflow: enable the support of multiple executors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102312 (https://phabricator.wikimedia.org/T362788)
[13:46:44] <wikibugs>	 (03PS2) 10Brouberol: airflow: enable the support of multiple executors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102312 (https://phabricator.wikimedia.org/T362788)
[13:47:16] <wikibugs>	 (03PS2) 10Btullis: dse-k8s: Add a namespace for llm-inference work by the ML team [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102284 (https://phabricator.wikimedia.org/T377266)
[13:53:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2024 to wikikube-worker2187 - jelto@cumin1002"
[13:53:35] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2024 to wikikube-worker2187 - jelto@cumin1002"
[13:53:35] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:53:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2187
[13:53:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2187
[13:54:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2024 to wikikube-worker2187
[13:57:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2184.codfw.wmnet wikikube-worker2185.codfw.wmnet wikikube-worker2186.codfw.wmnet wikikube-worker2187.codfw.wmnet on all recursors
[13:57:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2184.codfw.wmnet wikikube-worker2185.codfw.wmnet wikikube-worker2186.codfw.wmnet wikikube-worker2187.codfw.wmnet on all recursors
[13:59:32] <wikibugs>	 (03PS1) 10Btullis: dse-k8s: Add tokens for mw-content-history-reconcile-enrich namespaces [puppet] - 10https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322)
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1400).
[14:00:05] <jouncebot>	 Func, arlolra, and joelyrookewmde: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:10] <Func>	 o/
[14:00:13] <joelyrookewmde>	 hi
[14:00:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2184.codfw.wmnet with OS bookworm
[14:00:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2185.codfw.wmnet with OS bookworm
[14:00:30] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2186.codfw.wmnet with OS bookworm
[14:00:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2187.codfw.wmnet with OS bookworm
[14:00:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2184
[14:00:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2186
[14:00:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2187
[14:00:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:01:42] <wikibugs>	 (03CR) 10Btullis: [C:03+2] dse-k8s: Add a namespace for llm-inference work by the ML team [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102284 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[14:02:20] <wikibugs>	 (03PS2) 10Btullis: dse-k8s: Add tokens for mw-content-history-reconcile-enrich namespaces [puppet] - 10https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322)
[14:02:46] * TheresNoTime can deploy
[14:03:33] <wikibugs>	 (03CR) 10Samtar: [C:03+2] "start deploy" [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102141 (https://phabricator.wikimedia.org/T374072) (owner: 10Func)
[14:03:50] <wikibugs>	 (03CR) 10Samtar: [C:03+2] "start deploy" [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102142 (https://phabricator.wikimedia.org/T381714) (owner: 10Func)
[14:04:09] <TheresNoTime>	 while they're merging, joelyrookewmde I'll do yours first
[14:04:22] <joelyrookewmde>	 okie dokie
[14:04:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1098045 (https://phabricator.wikimedia.org/T377809) (owner: 10Joely Rooke WMDE)
[14:04:39] <wikibugs>	 07sre-alert-triage, 10Data-Platform-SRE (2024.11.30 - 2024.12.20): Alert in need of triage: SmartNotHealthy (instance stat1011:9100) - https://phabricator.wikimedia.org/T380835#10397141 (10BTullis) p:05Triage→03Medium a:03BTullis
[14:04:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[14:04:41] <wikibugs>	 (03CR) 10Btullis: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4668/co" [puppet] - 10https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322) (owner: 10Btullis)
[14:04:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2186 - jelto@cumin1002"
[14:04:54] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2186 - jelto@cumin1002"
[14:04:54] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:04:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2186.codfw.wmnet 180.48.192.10.in-addr.arpa 0.8.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:04:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2186.codfw.wmnet 180.48.192.10.in-addr.arpa 0.8.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:04:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2186
[14:04:59] <wikibugs>	 (03Merged) 10jenkins-bot: dse-k8s: Add a namespace for llm-inference work by the ML team [deployment-charts] - 10https://gerrit.wikimedia.org/r/1102284 (https://phabricator.wikimedia.org/T377266) (owner: 10Btullis)
[14:05:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2186
[14:05:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2186
[14:05:19] <wikibugs>	 (03Merged) 10jenkins-bot: Remove feature flag which controls wikibase item link location [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1098045 (https://phabricator.wikimedia.org/T377809) (owner: 10Joely Rooke WMDE)
[14:05:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:05:39] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1098045|Remove feature flag which controls wikibase item link location (T377809)]]
[14:05:43] <stashbot>	 T377809: Cleanup "Move wikidata item link into Other Projects sidebar" - https://phabricator.wikimedia.org/T377809
[14:06:11] <logmsgbot>	 !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[14:06:36] <logmsgbot>	 !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[14:07:48] * Lucas_WMDE also around if needed
[14:07:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:07:52] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2184.codfw.wmnet 41.32.192.10.in-addr.arpa 1.4.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:07:55] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2184.codfw.wmnet 41.32.192.10.in-addr.arpa 1.4.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:07:55] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2184
[14:08:05] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] dse-k8s: Add tokens for mw-content-history-reconcile-enrich namespaces [puppet] - 10https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322) (owner: 10Btullis)
[14:08:25] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash rolling restart_daemons on A:apifeatureusage
[14:08:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:08:59] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2184
[14:09:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2184
[14:09:22] <subbu>	 o/ arlo is around watching this irc channel from my laptop.
[14:09:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2185
[14:09:26] <logmsgbot>	 !log samtar@deploy2002 samtar, joelyrookewmde: Backport for [[gerrit:1098045|Remove feature flag which controls wikibase item link location (T377809)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:09:29] <TheresNoTime>	 joelyrookewmde: ready for testing ^
[14:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:09:33] <subbu>	 he has a backport scheduled in this window if anyone is around.
[14:10:02] <joelyrookewmde>	 *looking*
[14:10:10] <TheresNoTime>	 subbu: ack, will be doing that one next probably
[14:10:19] <subbu>	 ty
[14:10:56] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:10:56] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2187.codfw.wmnet 87.48.192.10.in-addr.arpa 7.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:10:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:11:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2187.codfw.wmnet 87.48.192.10.in-addr.arpa 7.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:11:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2187
[14:11:05] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash (exit_code=0) rolling restart_daemons on A:apifeatureusage
[14:11:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2187
[14:11:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2187
[14:11:46] <joelyrookewmde>	 looks goof to me
[14:11:51] <joelyrookewmde>	 good*
[14:12:00] <logmsgbot>	 !log samtar@deploy2002 samtar, joelyrookewmde: Continuing with sync
[14:12:56] <icinga-wm>	 PROBLEM - Host ms-be2085 is DOWN: PING CRITICAL - Packet loss = 100%
[14:14:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2185 - jelto@cumin1002"
[14:14:31] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es1043.eqiad.wmnet with OS bookworm
[14:14:32] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2185 - jelto@cumin1002"
[14:14:32] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:14:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2185.codfw.wmnet 89.32.192.10.in-addr.arpa 9.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:14:35] <wikibugs>	 (Merged) jenkins-bot: ve.ui.CodeMirror.v6: Use plugin callback to load the actual module [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1102141 (https://phabricator.wikimedia.org/T374072) (owner: Func)
[14:14:35] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2185.codfw.wmnet 89.32.192.10.in-addr.arpa 9.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:14:36] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2185
[14:14:38] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10397163 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es1043.eqiad.wmnet with OS bookworm
[14:14:39] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on archiva1002.wikimedia.org with reason: Adding new disk
[14:14:54] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on archiva1002.wikimedia.org with reason: Adding new disk
[14:14:59] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+2] Release version 4.0.3 [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1102276 (owner: Elukey)
[14:15:25] <wikibugs>	 (Merged) jenkins-bot: styles: Avoid misalignments when line numbering is disabled [extensions/CodeMirror] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1102142 (https://phabricator.wikimedia.org/T381714) (owner: Func)
[14:15:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2185
[14:15:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2185
[14:17:54] <icinga-wm>	 RECOVERY - Host ms-be2085 is UP: PING OK - Packet loss = 0%, RTA = 30.46 ms
[14:18:11] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1098045|Remove feature flag which controls wikibase item link location (T377809)]] (duration: 12m 32s)
[14:18:15] <stashbot>	 T377809: Cleanup "Move wikidata item link into Other Projects sidebar" - https://phabricator.wikimedia.org/T377809
[14:18:23] <TheresNoTime>	 joelyrookewmde: live on prod
[14:18:32] <wikibugs>	 (Merged) jenkins-bot: Release version 4.0.3 [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1102276 (owner: Elukey)
[14:19:04] <TheresNoTime>	 Func: will do your two backports now
[14:19:10] <Func>	 ok
[14:19:38] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
[14:19:43] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1102141|ve.ui.CodeMirror.v6: Use plugin callback to load the actual module (T374072)]], [[gerrit:1102142|styles: Avoid misalignments when line numbering is disabled (T381714)]]
[14:19:48] <stashbot>	 T374072: CodeMirror 6 + 2017 wikitext editor race conditions - https://phabricator.wikimedia.org/T374072
[14:19:49] <stashbot>	 T381714: Width of the cm-content element not set when line numbering is disabled in the 2017 wikitext editor - https://phabricator.wikimedia.org/T381714
[14:22:02] <joelyrookewmde>	 thanks!!
[14:22:16] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2186.codfw.wmnet with reason: host reimage
[14:22:59] <logmsgbot>	 !log samtar@deploy2002 samtar, func: Backport for [[gerrit:1102141|ve.ui.CodeMirror.v6: Use plugin callback to load the actual module (T374072)]], [[gerrit:1102142|styles: Avoid misalignments when line numbering is disabled (T381714)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:23:05] <TheresNoTime>	 Func: ready for testing ^
[14:23:12] <Func>	 looking
[14:24:59] <Func>	 TheresNoTime: looks good
[14:25:04] <logmsgbot>	 !log samtar@deploy2002 samtar, func: Continuing with sync
[14:25:47] <wikibugs>	 (CR) Gmodena: [C:+1] "LGTM. Left you question re naming convnetions." [puppet] - https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322) (owner: Btullis)
[14:25:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2186.codfw.wmnet with reason: host reimage
[14:25:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2184.codfw.wmnet with reason: host reimage
[14:28:17] <wikibugs>	 (PS3) Btullis: dse-k8s: Add tokens for mw-content-history-reconcile-enrich namespaces [puppet] - https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322)
[14:28:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2187.codfw.wmnet with reason: host reimage
[14:28:35] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2184.codfw.wmnet with reason: host reimage
[14:28:46] <wikibugs>	 (CR) Btullis: dse-k8s: Add tokens for mw-content-history-reconcile-enrich namespaces (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322) (owner: Btullis)
[14:30:20] <wikibugs>	 (CR) Hnowlan: [C:+1] APIGW: Add configuration to expose LW isvc article-country [deployment-charts] - https://gerrit.wikimedia.org/r/1102150 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[14:30:26] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1102141|ve.ui.CodeMirror.v6: Use plugin callback to load the actual module (T374072)]], [[gerrit:1102142|styles: Avoid misalignments when line numbering is disabled (T381714)]] (duration: 10m 42s)
[14:30:31] <stashbot>	 T374072: CodeMirror 6 + 2017 wikitext editor race conditions - https://phabricator.wikimedia.org/T374072
[14:30:31] <stashbot>	 T381714: Width of the cm-content element not set when line numbering is disabled in the 2017 wikitext editor - https://phabricator.wikimedia.org/T381714
[14:30:36] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4669/co" [puppet] - https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322) (owner: Btullis)
[14:30:37] <TheresNoTime>	 Func: both live on prod
[14:30:43] <wikibugs>	 (CR) Hnowlan: [C:+1] admin_ng: add the kartotherian namespace on Wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826) (owner: Elukey)
[14:30:45] <TheresNoTime>	 subbu: will do arlo's now
[14:30:49] <Func>	 thanks
[14:30:52] <subbu>	 thanks
[14:30:52] <wikibugs>	 SRE, Wikimedia-Mailing-lists: https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ won't load - https://phabricator.wikimedia.org/T381980#10397235 (Lucas_Werkmeister_WMDE) Clickable link: https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/  W...
[14:31:19] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101577 (owner: Arlolra)
[14:31:40] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2187.codfw.wmnet with reason: host reimage
[14:32:14] <wikibugs>	 SRE, Wikimedia-Mailing-lists: https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ won't load - https://phabricator.wikimedia.org/T381980#10397243 (Reedy) p:Triage→High
[14:32:17] <wikibugs>	 (CR) Kevin Bazira: [C:+2] APIGW: Add configuration to expose LW isvc article-country [deployment-charts] - https://gerrit.wikimedia.org/r/1102150 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[14:32:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2185.codfw.wmnet with reason: host reimage
[14:32:37] <wikibugs>	 (Merged) jenkins-bot: Add Atieno's public key [mediawiki-config] - https://gerrit.wikimedia.org/r/1101577 (owner: Arlolra)
[14:32:56] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1101577|Add Atieno's public key]]
[14:33:42] <wikibugs>	 (Merged) jenkins-bot: APIGW: Add configuration to expose LW isvc article-country [deployment-charts] - https://gerrit.wikimedia.org/r/1102150 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[14:33:47] <TheresNoTime>	 subbu: will this patch need testing at all?
[14:33:52] <subbu>	 nope
[14:33:56] <TheresNoTime>	 ack :)
[14:34:06] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
[14:35:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2185.codfw.wmnet with reason: host reimage
[14:36:17] <logmsgbot>	 !log samtar@deploy2002 arlolra, samtar: Backport for [[gerrit:1101577|Add Atieno's public key]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:36:21] <logmsgbot>	 !log samtar@deploy2002 arlolra, samtar: Continuing with sync
[14:36:23] <wikibugs>	 (PS1) Jelto: trafficserver: add dedicated mapping for querybuilder [puppet] - https://gerrit.wikimedia.org/r/1102320 (https://phabricator.wikimedia.org/T350793)
[14:38:32] <wikibugs>	 (CR) Xcollazo: data-engineering: add alerts for dumps2 flink app. (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: Gmodena)
[14:39:18] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
[14:41:44] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101577|Add Atieno's public key]] (duration: 08m 47s)
[14:41:49] <TheresNoTime>	 subbu: live :)
[14:41:59] <subbu>	 thanks
[14:42:07] <TheresNoTime>	 !log done UTC afternoon backport window
[14:42:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2186.codfw.wmnet with OS bookworm
[14:48:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2184.codfw.wmnet with OS bookworm
[14:48:52] <wikibugs>	 (CR) Gmodena: data-engineering: add alerts for dumps2 flink app. (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: Gmodena)
[14:51:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2187.codfw.wmnet with OS bookworm
[14:55:36] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2185.codfw.wmnet with OS bookworm
[14:56:23] <jelto>	 !log homer 'lsw1-d5-codfw*' commit 'T377877'
[14:56:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:27] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[14:56:56] <logmsgbot>	 !log klausman@deploy2002 helmfile [staging] START helmfile.d/services/api-gateway: apply
[14:57:25] <logmsgbot>	 !log klausman@deploy2002 helmfile [staging] DONE helmfile.d/services/api-gateway: apply
[14:57:45] <jelto>	 !log homer 'lsw1-c3-codfw*' commit 'T377877'
[14:57:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:15] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1043.eqiad.wmnet with OS bookworm
[14:58:21] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10397345 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es1043.eqiad.wmnet with OS bookworm...
[14:59:58] <jelto>	 !log homer 'lsw1-d3-codfw*' commit 'T377877'
[15:00:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1500)
[15:02:45] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye
[15:02:55] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2184-2187].codfw.wmnet
[15:02:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2184-2187].codfw.wmnet
[15:03:28] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381967#10397383 (Jelto)
[15:04:26] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] trafficserver: add dedicated mapping for querybuilder (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1102320 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[15:11:20] <wikibugs>	 (PS1) Elukey: Updating docker-pkg to 4.0.3 [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/1102325
[15:11:47] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] Updating docker-pkg to 4.0.3 [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/1102325 (owner: Elukey)
[15:13:04] <logmsgbot>	 !log elukey@deploy2002 Started deploy [docker-pkg/deploy@9305554]: Update to 4.0.3
[15:13:34] <logmsgbot>	 !log elukey@deploy2002 Finished deploy [docker-pkg/deploy@9305554]: Update to 4.0.3 (duration: 00m 37s)
[15:13:48] <wikibugs>	 (PS6) DCausse: wdqs: add graph_name in query logs [puppet] - https://gerrit.wikimedia.org/r/1084193 (https://phabricator.wikimedia.org/T376134)
[15:13:59] <wikibugs>	 (CR) DCausse: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1084193 (https://phabricator.wikimedia.org/T376134) (owner: DCausse)
[15:15:34] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1025.eqiad.wmnet with reason: T376150
[15:15:37] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[15:15:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on wdqs1025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:15:49] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1025.eqiad.wmnet with reason: T376150
[15:19:25] <logmsgbot>	 !log klausman@deploy2002 helmfile [codfw] START helmfile.d/services/api-gateway: apply
[15:19:50] <logmsgbot>	 !log klausman@deploy2002 helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
[15:20:15] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
[15:20:23] <wikibugs>	 (PS1) Elukey: jaeger: fix builder changelog to remove warnings [docker-images/production-images] - https://gerrit.wikimedia.org/r/1102329
[15:21:01] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] jaeger: fix builder changelog to remove warnings [docker-images/production-images] - https://gerrit.wikimedia.org/r/1102329 (owner: Elukey)
[15:21:42] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
[15:22:30] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
[15:23:48] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[15:23:54] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[15:23:57] <wikibugs>	 (PS1) CDanis: upstream_version test: be a bit more specific [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1102330
[15:24:02] <wikibugs>	 (CR) Btullis: [V:+1 C:+2] dse-k8s: Add tokens for mw-content-history-reconcile-enrich namespaces [puppet] - https://gerrit.wikimedia.org/r/1102314 (https://phabricator.wikimedia.org/T381322) (owner: Btullis)
[15:24:18] <wikibugs>	 (PS1) Elukey: spark: update 3.3 build's changelog to fix warnings [docker-images/production-images] - https://gerrit.wikimedia.org/r/1102331
[15:24:36] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] spark: update 3.3 build's changelog to fix warnings [docker-images/production-images] - https://gerrit.wikimedia.org/r/1102331 (owner: Elukey)
[15:25:25] <wikibugs>	 (CR) Elukey: [C:+1] upstream_version test: be a bit more specific [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1102330 (owner: CDanis)
[15:27:12] <wikibugs>	 (PS1) Hnowlan: mediawiki: shorten mercurius job name [deployment-charts] - https://gerrit.wikimedia.org/r/1102332 (https://phabricator.wikimedia.org/T371701)
[15:27:18] <wikibugs>	 (CR) CDanis: [C:+2] upstream_version test: be a bit more specific [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1102330 (owner: CDanis)
[15:27:21] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db1209 to s8 master [puppet] - https://gerrit.wikimedia.org/r/1102333 (https://phabricator.wikimedia.org/T381993)
[15:29:32] <wikibugs>	 (CR) Hnowlan: [C:+2] mediawiki: shorten mercurius job name [deployment-charts] - https://gerrit.wikimedia.org/r/1102332 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[15:30:32] <wikibugs>	 (Merged) jenkins-bot: upstream_version test: be a bit more specific [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1102330 (owner: CDanis)
[15:33:41] <wikibugs>	 (Merged) jenkins-bot: mediawiki: shorten mercurius job name [deployment-charts] - https://gerrit.wikimedia.org/r/1102332 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[15:34:55] <wikibugs>	 (PS7) DCausse: wdqs: add graph_name in query logs [puppet] - https://gerrit.wikimedia.org/r/1084193 (https://phabricator.wikimedia.org/T376134)
[15:35:08] <logmsgbot>	 !log fabfur@cumin1002 conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
[15:36:03] <logmsgbot>	 !log fabfur@cumin1002 conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
[15:36:30] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[15:36:35] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[15:36:47] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-videoscaler/main on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-videoscaler - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[15:37:03] <hnowlan>	 ^ just fixed
[15:37:23] <wikibugs>	 (CR) DCausse: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1084193 (https://phabricator.wikimedia.org/T376134) (owner: DCausse)
[15:38:22] <logmsgbot>	 !log klausman@deploy2002 helmfile [eqiad] START helmfile.d/services/api-gateway: apply
[15:38:46] <logmsgbot>	 !log klausman@deploy2002 helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
[15:38:59] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on archiva1002.wikimedia.org with reason: Adding new disk
[15:39:12] <wikibugs>	 (PS1) Marostegui: mariadb: Make db2235 m5 master [puppet] - https://gerrit.wikimedia.org/r/1102339 (https://phabricator.wikimedia.org/T373579)
[15:39:14] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on archiva1002.wikimedia.org with reason: Adding new disk
[15:41:12] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Make db2235 m5 master [puppet] - https://gerrit.wikimedia.org/r/1102339 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[15:41:15] <wikibugs>	 (CR) Itamar Givon: [C:+1] trafficserver: add dedicated mapping for querybuilder [puppet] - https://gerrit.wikimedia.org/r/1102320 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[15:41:47] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release mw-videoscaler/main on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-videoscaler - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[15:44:04] <wikibugs>	 (PS1) Marostegui: dbproxy2004,dbproxy2008: Add db2235 [puppet] - https://gerrit.wikimedia.org/r/1102341 (https://phabricator.wikimedia.org/T373579)
[15:44:58] <wikibugs>	 (CR) Marostegui: [C:+2] dbproxy2004,dbproxy2008: Add db2235 [puppet] - https://gerrit.wikimedia.org/r/1102341 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[15:45:19] <wikibugs>	 (PS8) DCausse: wdqs: add graph_name in query logs [puppet] - https://gerrit.wikimedia.org/r/1084193 (https://phabricator.wikimedia.org/T376134)
[15:45:41] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:45:47] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:46:31] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.177 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:46:37] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53069 bytes in 0.068 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:50:09] <wikibugs>	 (CR) Bking: [C:+2] wdqs: add graph_name in query logs [puppet] - https://gerrit.wikimedia.org/r/1084193 (https://phabricator.wikimedia.org/T376134) (owner: DCausse)
[15:54:50] <wikibugs>	 (PS1) Marostegui: mariadb: Disable master on db2135 [puppet] - https://gerrit.wikimedia.org/r/1102342
[15:56:15] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Disable master on db2135 [puppet] - https://gerrit.wikimedia.org/r/1102342 (owner: Marostegui)
[15:57:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on db2135.codfw.wmnet with reason: maintenance
[15:58:11] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2135.codfw.wmnet with reason: maintenance
[16:02:58] <wikibugs>	 (CR) Ottomata: [C:+2] "I'd like to make progress on this while I have time.  Being bold and merging.  If there are still changes needed please comment and I will" [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817) (owner: Ottomata)
[16:03:45] <wikibugs>	 (Merged) jenkins-bot: mediawiki.org/beacon/event/index.php - use EventBus->send [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817) (owner: Ottomata)
[16:10:41] <logmsgbot>	 !log otto@deploy2002 Started scap sync-world: Backport for [[gerrit:1063222|mediawiki.org/beacon/event/index.php - use EventBus->send (T353817)]]
[16:10:41] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:10:45] <stashbot>	 T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817
[16:10:47] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:10:49] <wikibugs>	 (CR) Xcollazo: "Metrics LGTM, but I am unfamiliar with syntax." [alerts] - https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: Gmodena)
[16:11:24] <wikibugs>	 (CR) Xcollazo: [C:+1] data-engineering: add alerts for dumps2 flink app. [alerts] - https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: Gmodena)
[16:12:56] <wikibugs>	 (PS1) Herron: wip [puppet] - https://gerrit.wikimedia.org/r/1102346
[16:13:40] <wikibugs>	 (CR) Scott French: [C:+2] shellbox-video: allow egress to swift [deployment-charts] - https://gerrit.wikimedia.org/r/1101944 (https://phabricator.wikimedia.org/T292322) (owner: Scott French)
[16:14:39] <jinxer-wm>	 RESOLVED: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[16:14:44] <wikibugs>	 (Merged) jenkins-bot: shellbox-video: allow egress to swift [deployment-charts] - https://gerrit.wikimedia.org/r/1101944 (https://phabricator.wikimedia.org/T292322) (owner: Scott French)
[16:16:21] <logmsgbot>	 !log otto@deploy2002 otto: Backport for [[gerrit:1063222|mediawiki.org/beacon/event/index.php - use EventBus->send (T353817)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[16:16:25] <stashbot>	 T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817
[16:16:37] <logmsgbot>	 !log otto@deploy2002 otto: Continuing with sync
[16:18:53] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:20:28] <wikibugs>	 (CR) Alexandros Kosiaris: "Documentation was my angle fwiw. Someone trying to reason about this, shouldn't have to look into the what puppet puts in for the listener" [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[16:20:31] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.169 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:20:37] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53069 bytes in 0.066 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:20:43] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 08 Feb 2025 11:19:52 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:21:31] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-video: apply
[16:21:39] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
[16:22:17] <logmsgbot>	 !log otto@deploy2002 Finished scap sync-world: Backport for [[gerrit:1063222|mediawiki.org/beacon/event/index.php - use EventBus->send (T353817)]] (duration: 11m 36s)
[16:22:21] <stashbot>	 T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817
[16:23:21] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.wdqs.restart
[16:24:33] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
[16:24:39] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
[16:25:26] <wikibugs>	 (PS2) Herron: pyrra: switch liftwing away from increase5m metrics [puppet] - https://gerrit.wikimedia.org/r/1102346 (https://phabricator.wikimedia.org/T302995)
[16:25:44] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[16:25:48] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[16:28:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.304s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:30:15] <wikibugs>	 (PS6) Elukey: services: add helmfile config for Kartotherian [deployment-charts] - https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826)
[16:32:24] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.wdqs.restart
[16:32:30] <wikibugs>	 (PS7) Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[16:33:03] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 1058224120 and 54 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:33:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.286s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:34:13] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
[16:35:19] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.wdqs.restart
[16:35:41] <wikibugs>	 ops-eqiad, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T382002 (phaultfinder) NEW
[16:38:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 14248 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:42:57] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
[16:43:30] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.wdqs.restart
[16:46:11] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
[16:47:19] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
[16:48:35] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.wdqs.restart
[16:49:23] <wikibugs>	 (PS4) Hnowlan: mediawiki: use mesh.configuration 1.11 [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701)
[16:49:26] <wikibugs>	 (CR) Scott French: [C:+1] mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[16:54:19] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10397999 (MatthewVernon) Open→Resolved ms-be2085 now sorted, thanks to @elukey, so closing again.
[16:54:33] <wikibugs>	 (CR) Scott French: "The new comments in mesh.configuration, together with a slight wording change in the mesh CHANGELOG (see comment on parent patch) indicate" [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[16:56:32] <wikibugs>	 (PS8) Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[16:57:05] <wikibugs>	 (PS1) Urbanecm: [Growth] Make the typage campaign not specific to 2023 [mediawiki-config] - https://gerrit.wikimedia.org/r/1102350 (https://phabricator.wikimedia.org/T380405)
[17:01:01] <wikibugs>	 (CR) Hnowlan: "I've added documentation above each of the sections in the template where we add the values." [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:05:14] <wikibugs>	 (CR) Elukey: [C:+1] pyrra: switch liftwing away from increase5m metrics [puppet] - https://gerrit.wikimedia.org/r/1102346 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[17:05:14] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10398045 (Scott_French)
[17:05:19] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10398046 (Scott_French) Great, thank you very much @Ammarpad and @Jdlrobson.
[17:05:29] <wikibugs>	 (PS1) Clément Goubert: wikikube: Decommission 8 hosts [puppet] - https://gerrit.wikimedia.org/r/1102352 (https://phabricator.wikimedia.org/T379788)
[17:08:00] <wikibugs>	 (CR) Herron: [C:+2] pyrra: switch liftwing away from increase5m metrics [puppet] - https://gerrit.wikimedia.org/r/1102346 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[17:09:37] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2047,2066,2085-2086,2180-2183].codfw.wmnet
[17:09:48] <wikibugs>	 (PS5) Hnowlan: mediawiki: use mesh.configuration 1.11 [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701)
[17:11:29] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T381742#10398064 (VRiley-WMF) If I recall correctly, last time this happened we ended up replacing two drives. When would it be okay to carry out this activity?
[17:12:18] <wikibugs>	 (CR) Hnowlan: [C:+2] mesh.configuration: dummy commit for 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101917 (owner: Hnowlan)
[17:13:23] <wikibugs>	 (Merged) jenkins-bot: mesh.configuration: dummy commit for 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101917 (owner: Hnowlan)
[17:13:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[17:14:32] <wikibugs>	 (PS2) DCausse: rdf-streaming-updater: add wdqs udpater streams in event stream config [mediawiki-config] - https://gerrit.wikimedia.org/r/1099727 (https://phabricator.wikimedia.org/T374919)
[17:16:53] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2047,2066,2085-2086,2180-2183].codfw.wmnet
[17:18:18] <wikibugs>	 (CR) Jsn.sherman: [C:+1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101937 (https://phabricator.wikimedia.org/T381000) (owner: Kgraessle)
[17:19:40] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
[17:19:44] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[17:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[17:21:22] <wikibugs>	 (CR) Clément Goubert: [C:+2] wikikube: Decommission 8 hosts [puppet] - https://gerrit.wikimedia.org/r/1102352 (https://phabricator.wikimedia.org/T379788) (owner: Clément Goubert)
[17:24:40] <wikibugs>	 ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:rack/setup E8/F8 new leaf switches - https://phabricator.wikimedia.org/T382017 (RobH) NEW
[17:25:28] <wikibugs>	 ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:rack/setup E8/F8 new leaf switches - https://phabricator.wikimedia.org/T382017#10398169 (RobH) @ayounsi or @cmooney: These two switches will arrive in December.  Would one of you be able to update this task with the cabling directions to...
[17:25:42] <wikibugs>	 ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:rack/setup E8/F8 new leaf switches - https://phabricator.wikimedia.org/T382017#10398171 (RobH)
[17:26:04] <wikibugs>	 ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:rack/setup E8/F8 new leaf switches - https://phabricator.wikimedia.org/T382017#10398173 (RobH)
[17:26:11] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Patch-For-Review: Decommission E/F 8 Dell switches - https://phabricator.wikimedia.org/T380050#10398174 (RobH)
[17:28:04] <wikibugs>	 (CR) Hnowlan: [C:+2] mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:28:11] <wikibugs>	 (CR) CI reject: [V:-1] mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:30:25] <wikibugs>	 (PS9) Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[17:31:53] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.wdqs.restart
[17:31:55] <logmsgbot>	 !log bking@cumin2002 END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
[17:32:00] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2047,2066,2085-2086].codfw.wmnet
[17:33:55] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Backport facter to bullseye - https://phabricator.wikimedia.org/T381538#10398196 (jhathaway) Open→In progress
[17:34:26] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Backport facter to bullseye - https://phabricator.wikimedia.org/T381538#10398198 (jhathaway) In progress→Resolved
[17:34:38] <wikibugs>	 (PS1) Clément Goubert: wikikube: Decom wikikube-worker2086 [puppet] - https://gerrit.wikimedia.org/r/1102357 (https://phabricator.wikimedia.org/T379788)
[17:34:53] <wikibugs>	 (PS2) Clément Goubert: wikikube: Decom wikikube-worker2086 [puppet] - https://gerrit.wikimedia.org/r/1102357 (https://phabricator.wikimedia.org/T379788)
[17:35:35] <wikibugs>	 (PS1) Giuseppe Lavagetto: Update code to the last two MRs [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1102360
[17:35:40] <wikibugs>	 (CR) Clément Goubert: [C:+2] wikikube: Decom wikikube-worker2086 [puppet] - https://gerrit.wikimedia.org/r/1102357 (https://phabricator.wikimedia.org/T379788) (owner: Clément Goubert)
[17:35:59] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+2 C:+2] Update code to the last two MRs [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1102360 (owner: Giuseppe Lavagetto)
[17:37:46] <logmsgbot>	 !log oblivian@cumin1002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UI improvements, add uncomitted changes warning - oblivian@cumin1002"
[17:37:48] <logmsgbot>	 !log oblivian@cumin1002 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UI improvements, add uncomitted changes warning - oblivian@cumin1002
[17:38:19] <logmsgbot>	 !log oblivian@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UI improvements, add uncomitted changes warning - oblivian@cumin1002
[17:38:20] <logmsgbot>	 !log oblivian@cumin1002 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UI improvements, add uncomitted changes warning - oblivian@cumin1002"
[17:40:09] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101937 (https://phabricator.wikimedia.org/T381000) (owner: Kgraessle)
[17:41:19] <icinga-wm>	 PROBLEM - BGP status on lsw1-a6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:45:19] <icinga-wm>	 PROBLEM - BGP status on lsw1-b6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:45:36] <wikibugs>	 SRE, fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10398276 (Jhancock.wm) @Dwisehaupt @Papaul got the cable reconnected and confirmed it pings. if there are any issues with it, lmk and I'll take care of it asap.
[17:45:58] <claime>	 BGP alerts are jasmine_ and I decommissioning k8s nodes
[17:47:31] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 34086720 and 36 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:48:31] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 85192 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:48:39] <wikibugs>	 SRE, Wikimedia-Mailing-lists: https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ won't load - https://phabricator.wikimedia.org/T381980#10398289 (Wargo) And now?
[17:48:39] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.netbox
[17:49:26] <wikibugs>	 SRE, fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10398295 (Dwisehaupt) @Jhancock.wm Thanks! I can confirm that I'm in.
[17:52:34] <wikibugs>	 (PS2) Hnowlan: base: fix pin on base.meta [deployment-charts] - https://gerrit.wikimedia.org/r/1102307
[17:53:36] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2047,2066,2085-2086].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
[17:54:15] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2047,2066,2085-2086].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
[17:54:15] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:54:16] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2047,2066,2085-2086].codfw.wmnet
[17:55:00] <claime>	 !log homer 'lsw1-a6-codfw' commit 'T379788'
[17:55:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:04] <stashbot>	 T379788: Decommission kubernetes20[07-14].codfw.wmnet - https://phabricator.wikimedia.org/T379788
[17:56:53] <wikibugs>	 (CR) Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:57:01] <claime>	 homer 'lsw1-b6-codfw*' commit 'T379788'
[17:57:12] <wikibugs>	 (CR) Hnowlan: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:57:23] <icinga-wm>	 RECOVERY - BGP status on lsw1-a6-codfw.mgmt is OK: BGP OK - up: 40, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:58:26] <icinga-wm>	 RECOVERY - BGP status on lsw1-b6-codfw.mgmt is OK: BGP OK - up: 34, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:58:32] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381967#10398377 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[17:58:33] <wikibugs>	 ops-eqiad, DC-Ops, decommission-hardware: decommission kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T373133#10398381 (VRiley-WMF)
[17:58:47] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[17:58:50] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[17:59:16] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[17:59:19] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T1800)
[18:00:07] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2180-2183].codfw.wmnet
[18:03:12] <icinga-wm>	 PROBLEM - BGP status on lsw1-c1-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:04:28] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:04:30] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:05:13] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:05:14] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:05:32] <wikibugs>	 (PS1) Herron: alertmanager: remove manually defined sli missing alert in favor or pyrra provided alert [alerts] - https://gerrit.wikimedia.org/r/1102366 (https://phabricator.wikimedia.org/T302995)
[18:05:42] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:05:44] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:06:14] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:06:17] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[18:06:24] <icinga-wm>	 PROBLEM - BGP status on lsw1-d6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:08:06] <wikibugs>	 (CR) Herron: alertmanager: remove manually defined sli missing alert in favor or pyrra provided alert (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1102366 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[18:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:10:11] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.netbox
[18:11:14] <wikibugs>	 (PS10) Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[18:15:01] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2180-2183].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
[18:16:14] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2180-2183].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
[18:16:14] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:16:15] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2180-2183].codfw.wmnet
[18:17:05] <claime>	 !log homer 'lsw1-c1-codfw*' commit 'T379788'
[18:17:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:09] <stashbot>	 T379788: Decommission kubernetes20[07-14].codfw.wmnet - https://phabricator.wikimedia.org/T379788
[18:18:12] <icinga-wm>	 RECOVERY - BGP status on lsw1-c1-codfw.mgmt is OK: BGP OK - up: 6, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:18:21] <claime>	 !log homer 'lsw1-d6-codfw*' commit 'T379788'
[18:18:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:24] <icinga-wm>	 RECOVERY - BGP status on lsw1-d6-codfw.mgmt is OK: BGP OK - up: 18, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:25:54] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure: reimage puppetmasters to puppetservers - https://phabricator.wikimedia.org/T345067#10398512 (jhathaway)
[18:40:46] <wikibugs>	 (CR) Scott French: [C:+1] "Thanks, Hugh! Yeah, I think this should address the "duplicate modules upon vendoring" issue now that base.helper 1.1.4 exists." [deployment-charts] - https://gerrit.wikimedia.org/r/1102307 (owner: Hnowlan)
[18:42:15] <wikibugs>	 ops-codfw, DC-Ops, serviceops: Decommission kubernetes20[07-14].codfw.wmnet - https://phabricator.wikimedia.org/T379788#10398556 (jasmine_) a:jasmine_→None
[18:49:24] <wikibugs>	 ops-magru, Traffic: magru temp check - https://phabricator.wikimedia.org/T382026 (RobH) NEW p:Triage→Medium
[18:56:15] <wikibugs>	 ops-esams, ops-magru, SRE, DC-Ops, Traffic: CPU temperature issues in cp hosts - https://phabricator.wikimedia.org/T373993#10398622 (RobH)
[19:13:06] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10398712 (VRiley-WMF) a:VRiley-WMF
[19:27:04] <wikibugs>	 SRE, Infrastructure-Foundations: Console domain and property access request - https://phabricator.wikimedia.org/T381904#10398751 (Scott_French) a:Scott_French→None Great, thank you @NBaca-WMF.  Alright, it seems like there are two different issues intertwined here:  **Page annotations opt-outs**...
[19:37:38] <wikibugs>	 (PS1) Eevans: aqs: Upgrade Cassandra to 4.1.7 [puppet] - https://gerrit.wikimedia.org/r/1102377 (https://phabricator.wikimedia.org/T380420)
[19:40:47] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T381742#10398772 (Eevans) >>! In T381742#10398064, @VRiley-WMF wrote: > If I recall correctly, last time this happened we ended up replacing two drives.  We did.  The original drive that had failed, and another that...
[19:44:35] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1102377 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[19:57:38] <wikibugs>	 (CR) Scott French: [C:+1] mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[20:10:55] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 11 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/PageTriage] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1102205 (https://phabricator.wikimedia.org/T381741) (owner: Novem Linguae)
[20:22:18] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10398927 (VRiley-WMF)
[20:24:19] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10398939 (VRiley-WMF)
[20:42:06] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aqs1014.eqiad.wmnet with reason: Hardware replacement
[20:42:21] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aqs1014.eqiad.wmnet with reason: Hardware replacement
[20:44:14] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service aqs1014-a:9042 has failed probes (tcp_cassandra_a_cql_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:45:57] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T381742#10399020 (VRiley-WMF) Replaced serial number S4KVNA0MB04873 (Slot 6) With S4KVNA0MB04856
[20:47:09] <jinxer-wm>	 FIRING: [4x] ProbeDown: Service aqs1014-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:47:26] <wikibugs>	 (CR) Brouberol: [C:+1] "It was probably a mistake of mine. I should have pinned the minor version, not the patch one. Thanks for the fix!" [deployment-charts] - https://gerrit.wikimedia.org/r/1102307 (owner: Hnowlan)
[20:56:50] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on aqs1014 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 12, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T382033 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[20:56:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T382033 (ops-monitoring-bot) NEW
[20:57:09] <jinxer-wm>	 RESOLVED: [4x] ProbeDown: Service aqs1014-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: That opportune time for a UTC late backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T2100).
[21:00:05] <jouncebot>	 katherine_g and NovemLinguae: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:10] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host cloudelastic1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:00:12] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host cloudelastic1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:00:13] <katherine_g>	 here
[21:00:16] <NovemLinguae>	 o/
[21:00:25] <NovemLinguae>	 hey katie :)
[21:00:30] <katherine_g>	 hi :) 
[21:00:48] <NovemLinguae>	 pagetriage backport today. got a bug
[21:01:04] <wikibugs>	 SRE-Unowned: The ops-maint-gcal.js script is missing support for some vendors - https://phabricator.wikimedia.org/T381680#10399059 (Scott_French) I was able to reproduce the Arelion issue with https://groups.google.com/u/0/a/wikimedia.org/g/ops-maintenance/c/TGXNGkB-gSo (yes, this is a reminder for a mainten...
[21:01:08] * TheresNoTime can deploy if needed
[21:01:24] <NovemLinguae>	 yes please. no deployers at the last backport i attended :P
[21:01:40] <katherine_g>	 yes please
[21:01:49] <TheresNoTime>	 katherine_g: I'll start with yours then :)
[21:01:57] <katherine_g>	 thanks! 
[21:02:03] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101937 (https://phabricator.wikimedia.org/T381000) (owner: Kgraessle)
[21:02:52] <wikibugs>	 (Merged) jenkins-bot: Enable AutoModerator on bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1101937 (https://phabricator.wikimedia.org/T381000) (owner: Kgraessle)
[21:03:03] <wikibugs>	 SRE, fundraising-tech-ops: Q1:rack/setup/install fransc2001 - https://phabricator.wikimedia.org/T367816#10399064 (Dwisehaupt) Open→Resolved Host is built out and in the configuration stages which is covered in other tasks. Closing.
[21:03:12] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1101937|Enable AutoModerator on bnwiki (T381000)]]
[21:03:15] <stashbot>	 T381000: Enable AutoModerator on bnwiki - https://phabricator.wikimedia.org/T381000
[21:03:33] <wikibugs>	 (CR) Samtar: [C:+2] "start merge for deploy" [extensions/PageTriage] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1102205 (https://phabricator.wikimedia.org/T381741) (owner: Novem Linguae)
[21:03:39] <wikibugs>	 SRE, fundraising-tech-ops: Q1:rack/setup/install fransw200[1-3].frack.codfw.wmnet - https://phabricator.wikimedia.org/T367800#10399071 (Dwisehaupt) Open→Resolved Hosts are built out and in the configuration stages which is covered in other tasks. Closing.
[21:05:14] <katherine_g>	 I'm good to sync
[21:08:01] <logmsgbot>	 !log samtar@deploy2002 kgraessle, samtar: Backport for [[gerrit:1101937|Enable AutoModerator on bnwiki (T381000)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:08:10] <TheresNoTime>	 katherine_g: hadn't yet properly hit the test servers - could you just double-check and then I'll sync? :)
[21:08:51] <katherine_g>	 yep- we're good
[21:08:57] <TheresNoTime>	 thanks! :)
[21:08:59] <logmsgbot>	 !log samtar@deploy2002 kgraessle, samtar: Continuing with sync
[21:09:14] <katherine_g>	 np
[21:10:32] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:10:37] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:11:39] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host cloudelastic1011.eqiad.wmnet with OS bullseye
[21:11:40] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
[21:11:47] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10399103 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudelastic...
[21:11:50] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10399104 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudelastic...
[21:11:58] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10399106 (Jclark-ctr)
[21:13:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[21:14:13] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101937|Enable AutoModerator on bnwiki (T381000)]] (duration: 11m 01s)
[21:14:17] <stashbot>	 T381000: Enable AutoModerator on bnwiki - https://phabricator.wikimedia.org/T381000
[21:14:18] <TheresNoTime>	 katherine_g: done :) live on prod
[21:14:55] <katherine_g>	 thanks!
[21:15:03] <TheresNoTime>	 NovemLinguae: another ~8 minutes for your patch to merge
[21:15:16] <NovemLinguae>	 👍
[21:19:44] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[21:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[21:21:25] <wikibugs>	 (03Merged) 10jenkins-bot: Follow-up I9df39fdcc: Convert missed 'this' to 'el' [extensions/PageTriage] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1102205 (https://phabricator.wikimedia.org/T381741) (owner: 10Novem Linguae)
[21:21:47] <NovemLinguae>	 merged
[21:22:02] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1102205|Follow-up I9df39fdcc: Convert missed 'this' to 'el' (T381741)]]
[21:22:06] <stashbot>	 T381741: Toolbar tag flyout: changing tag groups is broken - https://phabricator.wikimedia.org/T381741
[21:22:36] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS bullseye
[21:22:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10399135 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudelastic1012...
[21:22:43] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1011.eqiad.wmnet with OS bullseye
[21:22:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10399136 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudelastic1011...
[21:25:58] <logmsgbot>	 !log samtar@deploy2002 novemlinguae, samtar: Backport for [[gerrit:1102205|Follow-up I9df39fdcc: Convert missed 'this' to 'el' (T381741)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:26:04] <TheresNoTime>	 NovemLinguae: on mwdebug for testing ^
[21:26:39] <NovemLinguae>	 tested, works :)
[21:26:52] <logmsgbot>	 !log samtar@deploy2002 novemlinguae, samtar: Continuing with sync
[21:32:04] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1102205|Follow-up I9df39fdcc: Convert missed 'this' to 'el' (T381741)]] (duration: 10m 01s)
[21:32:08] <stashbot>	 T381741: Toolbar tag flyout: changing tag groups is broken - https://phabricator.wikimedia.org/T381741
[21:32:10] <TheresNoTime>	 NovemLinguae: done, live on prod :)
[21:32:32] <NovemLinguae>	 awesome. thank you very much for your time
[21:32:40] <NovemLinguae>	 TheresNoTime ;-)
[21:32:57] <TheresNoTime>	 np! :D
[21:33:36] <TheresNoTime>	 !log done UTC late backport window
[21:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:54] <wikibugs>	 (03PS8) 10Kamila Součková: [WIP, DNM] create sre.k8s.roll-reimage-nodes [cookbooks] - 10https://gerrit.wikimedia.org/r/1094494 (https://phabricator.wikimedia.org/T377857)
[21:35:04] <wikibugs>	 (03CR) 10Kamila Součková: [WIP, DNM] create sre.k8s.roll-reimage-nodes (037 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1094494 (https://phabricator.wikimedia.org/T377857) (owner: 10Kamila Součková)
[21:59:26] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10399164 (10Jclark-ctr) i have updated firmwares and dell sees no issues. these were ordered with 512 memory and are listing the correct amou...
[22:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241211T2200)
[22:05:41] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T382033#10399184 (10Jclark-ctr) a:03VRiley-WMF @VRiley-WMF  looks like it came back  T362841 same drive SDG
[22:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:12:21] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: hw troubleshooting: Stuck/bugged BMC on ml-lab1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381902#10399191 (10Jclark-ctr) 05Open→03Resolved
[22:19:07] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10399201 (10wiki_willy) @Jclark-ctr - there's nothing that I'm aware of.  If there's no additional info in the original procurement task or an...
[22:42:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T381742#10399231 (10Eevans) Status: Rebuilding...
[22:52:52] <tzatziki>	 !log removing three files for legal compliance
[22:52:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:02:10] <tzatziki>	 !log removing 4 files for legal compliance
[23:02:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:24:31] <tzatziki>	 !log removing 7 files for legal compliance
[23:24:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log