[07:08:06] "Exciting News! GTT adds telephone-based customer support" welcome to 1990
[07:23:03] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks D5 & D6 from asw to lsw - https://phabricator.wikimedia.org/T373104#10147444 (ABran-WMF) [] db2129: cm s6 T374806→switchback [] db2140: m s4 T374804 [] db2218: m s7 T374807
[07:23:57] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10147440 (ABran-WMF) [] db2213: m s5 T374805 [] db2214: m s6 T374806
[07:24:11] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D7 & D8 from asw to lsw - https://phabricator.wikimedia.org/T373105#10147456 (ABran-WMF) [] db2220: cm s7 T374807→switchback
[08:19:45] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10147625 (ayounsi) Catching up on backlog, I think {T374803} is related.
[08:57:27] netops, Infrastructure-Foundations, SRE: Enable BFD on 'core' EBGP peerings from L3 switches to CRs - https://phabricator.wikimedia.org/T374452#10147824 (ayounsi) Not sure it's worth it for direct (short) links. The tradeoff is to rely on an extra protocol, extra config, and adding load on the device...
[09:02:34] netops, Infrastructure-Foundations, SRE: ToR server-move Netbox script adding ".0" to end of interface names - https://phabricator.wikimedia.org/T374024#10147850 (ayounsi) @cmooney thanks for your patch ! is there something left to do on this ?
[09:26:48] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#10147916 (ayounsi) Awesome, great to see progress here !
[09:31:18] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#10147929 (aborrero) Open→Resolved a:aborrero it seems there is agreement in the addressing plan. Marking as resolved, will work on {T374712} next.
[09:37:56] netbox, Infrastructure-Foundations, Observability-Alerting, SRE Observability (FY2024/2025-Q1): Port netbox reports checks to Prometheus/Alertmanager - https://phabricator.wikimedia.org/T374823 (fgiunchedi) NEW
[09:42:01] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614#10147994 (ayounsi) Short term I think if you add `[4Gbps]` to the interface description, LibreNMS will [[ https://docs.librenms.org/Extensions/Interface-Descript...
[10:48:09] netops, Infrastructure-Foundations, SRE, Traffic: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10148385 (ayounsi) Note that the Bird exporter is already up and running: https://grafana.wikimedia.org/d/dxbfeGDZk/anycast We could in theory correl...
[10:54:20] XioNoX: that's real news from GTT? :/
[10:54:56] haha yeah, sorting through my email backlog
[10:54:56] (welcome back, by the way)
[10:57:03] time to ditch such companies, but yeah, hard to do so
[11:23:44] netops, Infrastructure-Foundations, SRE: netbox: create IPv6 entries for Cloud VPS - https://phabricator.wikimedia.org/T374712#10148509 (aborrero) p:Triage→Medium
[11:32:42] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10148524 (ABran-WMF) all hosts are depoolable for this task
[13:20:35] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10148844 (Jgreen)
[13:30:47] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#10148864 (elukey) Cross-posting from T365167#10148384, where I am testing a reimage for sretest2001. On sretest2001 we have 10G/25G cap...
[14:11:58] Puppet, Infrastructure-Foundations, Keyholder, SRE: keyholder-proxy doesn't restart on config change - https://phabricator.wikimedia.org/T374711#10149030 (joanna_borun) p:Triage→Low
[14:12:14] Puppet, Infrastructure-Foundations, Keyholder, SRE: keyholder-proxy doesn't restart on config change - https://phabricator.wikimedia.org/T374711#10149032 (elukey)
[14:18:32] netbox, Infrastructure-Foundations, Observability-Alerting, SRE Observability (FY2024/2025-Q1): Port netbox reports checks to Prometheus/Alertmanager - https://phabricator.wikimedia.org/T374823#10149047 (SLyngshede-WMF) p:Triage→Low a:SLyngshede-WMF
[14:20:34] SRE-tools, Infrastructure-Foundations, SRE: Per host access control for kerberized SSH - https://phabricator.wikimedia.org/T276790#10149062 (joanna_borun) dependent on https://phabricator.wikimedia.org/T244840
[14:22:17] Mail, Infrastructure-Foundations, SRE, Surveys: Qualtrics cannot send email to wikimedia.org addresses - https://phabricator.wikimedia.org/T176666#10149077 (joanna_borun) Open→Declined
[14:23:02] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10149067 (ayounsi) Those won't be in a VC, especially as we didn't pay for the extra VC license :) This means a bit more...
[14:23:17] SRE-tools, Infrastructure-Foundations, SRE, Release-Engineering-Team (Seen): Support running puppet Beaker on CI - https://phabricator.wikimedia.org/T253635#10149074 (joanna_borun) @hashar is this task still valid?
[14:26:23] SRE-tools, Icinga, Infrastructure-Foundations: get-raid-status-perccli should allow for commands to return non-zero exit code - https://phabricator.wikimedia.org/T320998#10149097 (SLyngshede-WMF) p:Medium→Low a:SLyngshede-WMF
[14:28:48] Mail, Infrastructure-Foundations, SRE: 2022-05-09 Exim BDAT Errors incident - https://phabricator.wikimedia.org/T309238#10149110 (jhathaway) Open→Resolved a:jhathaway Fixed with change in the config, also no longer relative, as we are now running Postfix
[14:33:34] SRE-tools, Infrastructure-Foundations, SRE: Improve sre.hosts.decommission (additionally find host yaml files) - https://phabricator.wikimedia.org/T257297#10149234 (elukey) Open→Declined Probably not needed anymore :)
[14:34:04] SRE-tools, Infrastructure-Foundations: Clarify 'wipe bootloader' step in sre.hosts.decommission - https://phabricator.wikimedia.org/T283204#10149250 (joanna_borun) Open→Declined
[14:36:04] SRE-tools, Infrastructure-Foundations, Observability-Alerting: Spicerack: add support for Alertmanager - https://phabricator.wikimedia.org/T293209#10149255 (Volans) Open→Resolved a:Volans The alertmanager support has been in place for a long time. Resolving. Any additional feature wil...
[14:36:06] SRE-tools, Infrastructure-Foundations: Clarify 'wipe bootloader' step in sre.hosts.decommission - https://phabricator.wikimedia.org/T283204#10149270 (Volans) As there were no agreement here on task and multiple years have passed we decided to close it. Feel free to reopen in case there is more consen...
[18:28:06] SRE-tools, Infrastructure-Foundations, SRE: Pairing tool for new SREs using sudo under supervision - https://phabricator.wikimedia.org/T299989#10150478 (CDanis) Today we saw another good use case for `sudo_pair`: while troubleshooting and firefighting a #phatality deploy gone wrong (T374880), several...
[20:28:53] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10150834 (Papaul)