[10:38:33] lvs1015 has dpdk installed, is there anything to consider when upgrading it? is that just for some tests?
[10:45:45] moritzm: you can remove it even
[10:46:10] let me take care of that :)
[10:46:42] (done)
[10:47:12] ack, thx!
[12:54:26] netops, Traffic, Infrastructure-Foundations, probenet, SRE: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10152990 (CDanis) We did a somewhat experimental version of this work as @JameelKaisar's intern project in {T332024} and friends. The infrastructure pieces h...
[12:56:33] netops, Traffic, Infrastructure-Foundations, probenet, SRE: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10153011 (CDanis) Oh, and one related thing, we should fix T347114 -- I think that's just a VCL change.
[14:03:56] sukhe: are you in a position to depool cp2039 & cp2040 before our maintenance in 2 hours time?
[14:04:18] topranks: yes please :)
[14:04:27] left a note that we will do it 30-minutes before the event
[14:04:37] super thanks - sorry I missed that
[14:04:42] no worries
[15:45:12] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10153864 (ABran-WMF) d/p hosts are depooled
[16:08:56] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10154026 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3098e21e-4c2c-426d-9ca2-5661232be6df) set by cmooney@cumin100...
[16:40:50] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10154180 (cmooney) Open→Resolved a:cmooney Move completed today without issue.
[16:43:39] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154187 (Dwisehaupt) Ran into something odd in the logs when building out hosts. The pay-...
[16:55:12] Traffic, DC-Ops, ops-esams, ops-magru, SRE: CPU temperature issues in cp hosts - https://phabricator.wikimedia.org/T373993#10154233 (RobH) I'm now looking into these. Overall just these specific servers report heat issues while they are weighted the same as other cp hosts within the same fle...
[16:58:27] Traffic, DC-Ops, ops-esams, SRE: cp307[12] thermal issues - https://phabricator.wikimedia.org/T374986 (RobH) NEW
[17:02:00] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544#10154255 (cmooney) Open→Resolved Upgrade was successful today on cloudsw1-c8-codfw, the last of th...
[17:02:37] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10154276 (cmooney) All hosts moved successfully and responding to ping again.
[17:10:36] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10154320 (ABran-WMF) hosts are repooling
[17:16:55] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154354 (cmooney) >>! In T373942#10154187, @Dwisehaupt wrote: > Ran into something odd in...
[17:19:27] Traffic, DC-Ops, ops-esams, SRE: cp307[12] thermal issues - https://phabricator.wikimedia.org/T374986#10154357 (RobH)
[17:20:31] Traffic, DC-Ops, ops-esams, ops-magru, SRE: CPU temperature issues in cp hosts - https://phabricator.wikimedia.org/T373993#10154362 (RobH) Open→Stalled Stalling parent task while working on fixing the esams hosts (esams is easier to get parts in and out than magru, so esams is better...
[17:26:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: (2) new singlemode fiber patches from dmarc to routers for IX ports - https://phabricator.wikimedia.org/T373376#10154382 (cmooney) Open→Resolved Thankfully got a call from a really good Equinix engineer today who was...
[17:52:40] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154484 (Dwisehaupt) @cmooney Thanks for that info. I think we may have multiple things going on here, ie: the...
[18:39:31] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154700 (Jgreen) I think the modsec noise localized to one webserver is explained by pay-lb*:/etc/anycast-healt...
[19:01:15] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154772 (Dwisehaupt) Good find @Jgreen. This makes more sense knowing there are similar checks in multiple places.
[19:07:10] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154830 (ssingh) You can find similar `check_cmds` (healthchecks) in a bunch of places that use bird/anycast-hc...
[19:43:14] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10154906 (Jgreen) >>! In T373942#10154830, @ssingh wrote: > The `durum.yaml` one has HTTP checks (while the othe...
[22:09:56] netops, Infrastructure-Foundations, observability: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10155473 (Dzahn) Adding netops since it seems unlikely it's only an Icinga issue but also both times all magru hosts and only magru hosts.
[22:10:02] netops, Infrastructure-Foundations, observability: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10155471 (Dzahn)
[22:39:26] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10155516 (Dwisehaupt) I believe [[ https://phabricator.wikimedia.org/T170318#10155039 | this comment ]] was supp...
[22:47:25] netops, Infrastructure-Foundations, observability: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10155518 (Scott_French) From a quick scan of events on cr2-eqdfw - assuming it's on the direct path from alerts2002 to magru, by way of the EdgeUno WAN link - there's defin...