[10:27:31] Traffic, conftool, Sustainability (Incident Followup): Research allowing read-only access to the superset api from requestctl's web UI - https://phabricator.wikimedia.org/T379718 (Joe) NEW
[15:14:36] Traffic, ops-magru, SRE: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10317193 (RobH)
[15:19:42] Traffic, ops-magru, SRE: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10317221 (RobH) I've scheduled the work via remote hands ticket CS1028070 and also detailed on that ticket the multiple ways to reach me during the wind...
[16:00:18] Traffic: Investigate transport errors in eqsin - https://phabricator.wikimedia.org/T379611#10317520 (Fabfur) Open→Resolved Issue was related to a too strict timeout settings on haproxykafka while checking for cluster metadata. This has been fixed with https://gitlab.wikimedia.org/repos/sre/haprox...
[16:01:02] Traffic, Patch-For-Review: haproxykafka hardening - https://phabricator.wikimedia.org/T379237#10317527 (Fabfur) Open→Resolved
[16:01:14] Traffic, Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10317529 (Fabfur)
[16:08:18] Traffic: Enable SSL client authentication on haproxykafka - https://phabricator.wikimedia.org/T379776 (Fabfur) NEW
[16:24:18] netops, Infrastructure-Foundations, procurement, SRE: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778 (ayounsi) NEW
[16:33:12] netops, Infrastructure-Foundations, procurement, SRE: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10317697 (ayounsi)
[16:47:06] netops, Infrastructure-Foundations, procurement, SRE, Patch-For-Review: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10317783 (RobH)
[16:50:07] netops, Infrastructure-Foundations, procurement, SRE, Patch-For-Review: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10317801 (RobH) I'll have the xconnect disconnected by remote hands during the cross-connect disconnection, putting in the cross c...
[17:39:51] netops, Infrastructure-Foundations, serviceops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790 (akosiaris) NEW
[18:13:04] Traffic, conftool, Sustainability (Incident Followup): Research allowing read-only access to the superset api from requestctl's web UI - https://phabricator.wikimedia.org/T379718#10318358 (BTullis) I think that we can do this, but maybe we should try to avoid changing the production Superset instance...
[18:17:31] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797 (ssingh) NEW
[18:17:35] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10318387 (ssingh) p:Triage→Medium
[18:26:47] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10318425 (cmooney) Open→Resolved @Jclark-ctr I've erased the config on all the old devices no...
[18:28:59] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435#10318437 (cmooney) @robh the migration work is now done, all that remains is to remove the old devices and any cables connecting...
[18:36:38] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435#10318507 (RobH) a:Jclark-ctr I'd hand this over to either John or Valerie as ops-eqiad for them to remove any devices and ca...
[18:41:23] netops, Infrastructure-Foundations, serviceops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10318533 (cmooney) Polling Netbox to find what switch each of those are connec...
[19:23:14] Traffic, Observability-Alerting, SRE: PuppetFailure alert is not being fired for host(s) where agent has failed - https://phabricator.wikimedia.org/T379807 (ssingh) NEW
[19:23:54] Traffic, Observability-Alerting, SRE: PuppetFailure alert is not being fired for host(s) where agent has failed - https://phabricator.wikimedia.org/T379807#10318786 (ssingh) p:Triage→Medium
[20:50:15] uhh what's up with those authdns sync alerts?
[20:52:00] all good, resolving
[20:52:13] authdns-update was not run I am assuming
[20:53:12] jgreen just ran it
[20:57:39] ah ok
[21:02:06] the alert is very unforgiving, intentionally though. I have deliberated bumping up the timing and I did once but I do think there is value in keeping it fairly reasonable
[21:02:46] if it ends up being too noisy though, I will bump it up more
[21:04:02] 👍
[21:04:21] definitely better to start out on the unforgiving side and tune looser :D
[21:32:12] :) thanks for keeping an eye out
[22:22:31] Traffic, Observability-Alerting, SRE: PuppetFailure alert is not being fired for host(s) where agent has failed - https://phabricator.wikimedia.org/T379807#10319559 (colewhite) The issue [[ https://grafana-rw.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&from=1731511462516&to=1731528409686&forceLogin&view...
[23:42:39] Traffic, Data-Persistence, SRE, SRE-swift-storage, and 6 others: Change default image thumbnail size - https://phabricator.wikimedia.org/T355914#10320113 (Jdlrobson)