[08:19:24] CAS-SSO, Infrastructure-Foundations, SRE: CLI tools for CAS administration - https://phabricator.wikimedia.org/T233940#10151801 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff The original script landed in the logout cookbook.
[09:01:48] SRE-tools, cloud-services-team, Infrastructure-Foundations, IPv6: Some WMCS clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271139#10151973 (Volans) This is the update list as of today: `clouddb2002-dev,cloudlb2004-dev,clouddb[1013-1020]`. I guess that the clouddb are...
[12:54:26] netops, Infrastructure-Foundations, probenet, SRE, Traffic: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10152990 (CDanis) We did a somewhat experimental version of this work as @JameelKaisar's intern project in {T332024} and friends. The infrastructure pieces h...
[12:56:33] netops, Infrastructure-Foundations, probenet, SRE, Traffic: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10153011 (CDanis) Oh, and one related thing, we should fix T347114 -- I think that's just a VCL change.
[14:42:34] small code review if you have time https://gerrit.wikimedia.org/r/c/operations/puppet/+/1073465
[14:43:32] +1
[14:43:40] thankssss
[14:52:37] SRE-tools, Infrastructure-Foundations: Output test logs of production testing of the pre switchover tasks related to databases - https://phabricator.wikimedia.org/T374972 (jcrespo) NEW
[15:01:42] SRE-tools, Infrastructure-Foundations: Output test logs of production testing of the pre switchover tasks related to databases - https://phabricator.wikimedia.org/T374972#10153634 (ops-monitoring-bot) cookbooks.sre.switchdc.databases for the switch from eqiad to codfw started by jynus@cumin1002
[15:02:40] SRE-tools, Infrastructure-Foundations: Output test logs of production testing of the pre switchover tasks related to databases - https://phabricator.wikimedia.org/T374972#10153642 (ops-monitoring-bot) cookbooks.sre.switchdc.databases for the switch from eqiad to codfw started by jynus@cumin1002 completed...
[15:05:08] elukey: moritzm: about lintian for chartmuseum and helm3, I have deployed the change
[15:05:26] however I think the lintian error can be disabled in our lintian profile.
I have left a note at https://gerrit.wikimedia.org/r/c/integration/config/+/1073426/comments/493e1b1b_982a959c
[15:05:40] cause making the job non voting would ignore any other legit build failure :)
[15:09:47] adding missing-notice-file-for-apache-license is definitely a useful change per se
[15:10:15] but the chartmuseum build also failed for other reasons, namely that the runner appears to configure backports, which is gone for buster
[15:10:34] and separately, we're rebuilding chartmuseum for bookworm as the new target OS
[15:16:14] ah yeah backports is surely to be gone
[15:16:34] I think the issue was that in `debian/changelog` we can't not mention we want both 'wikimedia' and 'backports'
[15:16:56] hence the -backports Jenkins jobs which sets BACKPORTS=1 (iirc)
[15:18:05] anyway, I have deployed the CI config change so both repos have a non voting debian-glue job now
[15:18:18] but it would be nice to fix the underlying issue and make them voting again :)
[15:22:26] sure, happy to review any patch for adapting the lintian config :-9
[15:45:12] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10153864 (ABran-WMF) d/p hosts are depooled
[16:08:56] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10154026 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3098e21e-4c2c-426d-9ca2-5661232be6df) set by cmooney@cumin100...
[16:40:50] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10154180 (cmooney) Open→Resolved a:cmooney Move completed today without issue.
[16:43:39] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154187 (Dwisehaupt) Ran into something odd in the logs when building out hosts. The pay-...
[17:02:00] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544#10154255 (cmooney) Open→Resolved Upgrade was successful today on cloudsw1-c8-codfw, the last of th...
[17:02:37] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10154276 (cmooney) All hosts moved successfully and responding to ping again.
[17:10:36] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10154320 (ABran-WMF) hosts are repooling
[17:16:55] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154354 (cmooney) >>! In T373942#10154187, @Dwisehaupt wrote: > Ran into something odd in...
[17:26:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: (2) new singlemode fiber patches from dmarc to routers for IX ports - https://phabricator.wikimedia.org/T373376#10154382 (cmooney) Open→Resolved Thankfully got a call from a really good Equinix engineer today who was...
[17:52:40] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker.
- https://phabricator.wikimedia.org/T373942#10154484 (Dwisehaupt) @cmooney Thanks for that info. I think we may have multiple things going on here, ie: the...
[18:39:31] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154700 (Jgreen) I think the modsec noise localized to one webserver is explained by pay-lb*:/etc/anycast-healt...
[19:01:15] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154772 (Dwisehaupt) Good find @Jgreen. This makes more sense knowing there are similar checks in multiple places.
[19:07:10] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154830 (ssingh) You can find similar `check_cmds` (healthchecks) in a bunch of places that use bird/anycast-hc...
[19:43:14] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154906 (Jgreen) >>! In T373942#10154830, @ssingh wrote: > The `durum.yaml` one has HTTP checks (while the othe...
[20:34:20] SRE-tools, Spicerack: Support listing pooled / active authdns hosts (rather than all) - https://phabricator.wikimedia.org/T375014 (Scott_French) NEW
[21:43:21] SRE-tools, Infrastructure-Foundations, SRE, Release-Engineering-Team (Seen): Support running puppet Beaker on CI - https://phabricator.wikimedia.org/T253635#10155446 (hashar) Acceptance tests run by CI would be quite nice to have, specially for non SRE since that helps build confidence a given pa...
[22:09:55] netops, Infrastructure-Foundations, observability: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10155473 (Dzahn) Adding netops since it seems unlikely it's only an Icinga issue but also both times all magru hosts and only magru hosts.
[22:10:02] netops, Infrastructure-Foundations, observability: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10155471 (Dzahn)
[22:39:26] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10155516 (Dwisehaupt) I believe [[ https://phabricator.wikimedia.org/T170318#10155039 | this comment ]] was supp...
[22:47:24] netops, Infrastructure-Foundations, observability: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10155518 (Scott_French) From a quick scan of events on cr2-eqdfw - assuming it's on the direct path from alerts2002 to magru, by way of the EdgeUno WAN link - there's defin...