[03:42:31] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10139563 (Papaul) >>! In T371434#10120335, @cmooney wrote: >>>! In T371434#10119784, @Papaul wrote: >> The diagram below will ou...
[03:55:47] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587 (Papaul) NEW
[03:55:56] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10139580 (Papaul) p:Triage→Medium
[03:56:49] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10139581 (Papaul)
[06:11:35] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10139747 (ABran-WMF) ES replication source in the path has been moved (T374592), all remaining hosts are depoolable
[09:04:09] topranks: this looks like a sane neighbor/peer config? https://www.irccloud.com/pastebin/rhe2u0vR/
[09:04:59] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10139909 (ABran-WMF) >>! In T374523#10136865, @cmooney wrote: >>>! In T374523#10136856, @ABran-WMF wrote: >> I'll get to T374425 to get...
[09:11:07] vgutierrez: yes I think so
[09:11:49] The "AS *" means only the switch/router can initiate the session (as gobgpd is not configured with the ASN of its peer to do so)
[09:12:09] but that is probably ok, and makes the config a little cleaner our side
[09:12:16] also it mirrors what PyBal already does
[09:12:34] The timers of 30/90 are fine - I assume we will use BFD here?
[09:14:02] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10139921 (cmooney) >>! In T374523#10139909, @ABran-WMF wrote: > We can add it to today's maintenance if you're up to it. Let me know so...
[09:14:33] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10139922 (ABran-WMF) ack, adding it to the pile
[09:16:38] topranks: I can definitely specify the AS of the other side
[09:17:47] from what I'm seeing pybal is only aware of the local AS
[09:22:35] but IIRC pybal attempts to initiate the session
[09:29:50] topranks: we can use bfd but we need to use an additional bfd daemon
[09:32:23] vgutierrez: indeed you are correct - PyBal does initiate
[09:32:25] so ignore me
[09:32:27] also - https://datatracker.ietf.org/doc/html/rfc4271#section-4.2
[09:32:39] I am mistaken about the remote side's expected AS needing to be in the OPEN message
[09:32:58] BFD I think is quite a good idea so we fail over fast if there is a problem
[09:33:32] Not *essential* I guess. If we don't use it we may want to change the neighbor hold/keepalive time for the BGP session
[09:33:47] so it will detect a dead peer in less than the 90 seconds those current settings have
[09:33:55] ok, so I'll make those configurable
[09:34:03] right now it's using the default values
[09:34:12] we can control the hold and keepalive timers from either side
[09:34:24] https://www.irccloud.com/pastebin/oIEC1Uwm/
[09:34:25] the peers will agree on whichever side asks for the lowest values
[09:34:32] right now I'm only setting those options
[09:34:37] so we can set them lower on the switch side and affect gobgpd
[09:34:47] or set them lower on the gobgpd side and affect the switch
[09:35:32] so yeah that seems ok
[09:35:39] cool
[09:35:52] I think we can control the timers from the switch side
[09:35:59] we can keep iterating on this of course :)
[09:35:59] If we use BFD we can leave the default 90 seconds
[09:36:14] as BFD will do the job much quicker
[09:36:33] if we don't enable BFD we can set to perhaps 10/30 on the switch or lower even
[09:36:44] but yep we can iterate on this - for now all that seems good
[11:05:09] Traffic, DC-Ops, ops-esams, ops-magru, SRE: CPU temperature issues in cp hosts - https://phabricator.wikimedia.org/T373993#10140285 (Vgutierrez) @RobH / @wiki_willy could we get this task prioritized on your side?
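The hold-timer behaviour discussed between 09:33 and 09:36 matches RFC 4271 section 4.2, where each speaker uses the smaller of its own configured Hold Time and the one received in the peer's OPEN message. A minimal Python sketch of that rule follows. The function names are hypothetical, and the keepalive value assumes the RFC's suggestion of one third of the hold time; gobgpd or the switch may derive it differently.

```python
def negotiate_hold_time(local_hold: int, peer_hold: int) -> int:
    """Return the hold time (seconds) both BGP speakers end up using.

    RFC 4271: the hold time must be 0 or at least 3 seconds, and each
    side uses the smaller of its configured value and the value
    received in the peer's OPEN message.
    """
    for value in (local_hold, peer_hold):
        if value != 0 and value < 3:
            raise ValueError(f"invalid hold time: {value}")
    return min(local_hold, peer_hold)


def suggested_keepalive(hold_time: int) -> int:
    """Keepalive interval assumed to be one third of the hold time.

    A hold time of 0 disables keepalives entirely.
    """
    if hold_time == 0:
        return 0
    return max(1, hold_time // 3)


if __name__ == "__main__":
    # gobgpd left at a 90 s hold time, switch lowered to 30 s:
    hold = negotiate_hold_time(local_hold=90, peer_hold=30)
    print(hold, suggested_keepalive(hold))
```

With one side at 90 seconds and the other at 30, the session runs with a 30 second hold time, which is why lowering the value on either the switch or the gobgpd side is enough to affect both.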
[12:40:25] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614 (cmooney) NEW p:Triage→Medium
[13:03:12] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614#10140660 (cmooney)
[13:22:50] netops, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619 (cmooney) NEW p:Triage→Low
[13:34:43] netops, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10140791 (cmooney)
[13:56:21] netops, Traffic, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10140887 (ssingh)
[13:57:46] netops, Traffic, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10140885 (ssingh) Thanks for filing this task! This is indeed something we have discussed in the past but not formally so let's use this task to do th...
[14:55:29] Traffic, Commons: Error: 503, Backend fetch failed while editing Commons - https://phabricator.wikimedia.org/T372473#10141198 (CDobbins) >>! In T372473#10127905, @Yann wrote: > https://commons.wikimedia.org/w/index.php?title=File:The_Three_Musketeers_(1921).webm&action=revert > > ` > Request from 80.82....
[15:11:17] Wikimedia-Apache-configuration, MW-Interfaces-Team, Regression: After introduction of /api/ in ATS, https://en.wikipedia.org/api/ returns 404 Not Found - https://phabricator.wikimedia.org/T373998#10141234 (daniel) @akosiaris Did I understand correctly that the rewrite rule for /api/ has been rolld ba...
[15:27:35] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141318 (jcrespo) I've stopped codfw media backups. @cmooney Would it be possible to get preferential time on mai...
[15:40:50] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141367 (ABran-WMF) @cmooney all nodes have been depooled
[15:50:48] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141418 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=bb570977-8737-4373-95ac-3765685f6e5e) set...
[15:51:39] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141420 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5073d83c-c18b-41a0-aa78-a6da63b209f9) set...
[16:04:51] Traffic: main project domains are improperly added/removed in ncmonitor patches - https://phabricator.wikimedia.org/T374640 (BCornwall) NEW
[16:05:28] Traffic: main project domains are improperly added/removed in ncmonitor patches - https://phabricator.wikimedia.org/T374640#10141530 (BCornwall) p:Triage→Medium
[16:19:05] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141576 (cmooney) Everything moved successfully, all ports up on the new switch and everything responding to ping a...
[16:30:38] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10141626 (cmooney) Will re-schedule for Tuesday Sep 17th
[16:30:51] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Migrate servers in codfw racks C6 & C7 from asw to lsw - https://phabricator.wikimedia.org/T373101#10141621 (cmooney) Open→Resolved a:cmooney
[17:22:54] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10141995 (Jgreen)
[17:28:49] Traffic, DC-Ops, ops-esams, ops-magru, SRE: CPU temperature issues in cp hosts - https://phabricator.wikimedia.org/T373993#10142033 (wiki_willy) a:RobH
[17:33:25] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Move sretest2002 primary uplink to asw-d4-codfw - https://phabricator.wikimedia.org/T370475#10142054 (cmooney) So once we have completed the move for D4 next Tuesday I have a (hopefully) small request. Could the sretest2002 uplinks...
[18:18:53] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Move sretest2002 primary uplink to asw-d4-codfw - https://phabricator.wikimedia.org/T370475#10142127 (Jhancock.wm) yeah can do
[18:19:06] Traffic: main project domains are improperly added/evaluated in ncmonitor patches - https://phabricator.wikimedia.org/T374640#10142128 (BCornwall)
[22:55:01] Traffic: ncmonitor shouldn't submit empty CRs to operations/dns repo - https://phabricator.wikimedia.org/T373780#10142822 (BCornwall)
[22:55:23] Traffic: ncmonitor shouldn't submit empty CRs to operations/dns repo - https://phabricator.wikimedia.org/T373780#10142823 (BCornwall)
[22:55:38] Traffic: main project domains are improperly added/evaluated in ncmonitor patches - https://phabricator.wikimedia.org/T374640#10142824 (BCornwall)
[23:05:50] Wikimedia-Apache-configuration, MW-Interfaces-Team, Regression: After introduction of /api/ in ATS, https://en.wikipedia.org/api/ returns 404 Not Found - https://phabricator.wikimedia.org/T373998#10142849 (Krinkle) Open→Resolved
[23:06:29] Wikimedia-Apache-configuration, MW-Interfaces-Team, Regression: After introduction of /api/ in ATS, https://en.wikipedia.org/api/ returns 404 Not Found - https://phabricator.wikimedia.org/T373998#10142850 (Krinkle) Fixed in: >>! In T364400#10126203, @gerritbot wrote: > Change #1071229 **merged**...
[23:07:22] Wikimedia-Apache-configuration, MW-Interfaces-Team, MW-on-K8s, Regression: https://en.wikipedia.org/api/ 404 Not Found due to extract2.php RewriteRule - https://phabricator.wikimedia.org/T373048#10142851 (Krinkle) Open→Resolved Fixed in: >>! In T364400#10126203, @gerritbot wrote: > C...
[23:27:42] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: lvs2011: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370891#10142876 (Papaul) a:Papaul
[23:28:05] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 3 others: lvs2013: move uplink to lsw1-c2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370927#10142877 (Papaul) a:Papaul