[09:00:53] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10129063 (Jelto) I depooled `gitlab-runner2003` for tomorrow's maintenance
[09:12:26] Traffic, Observability-Alerting, SRE Observability (FY2024/2025-Q1): No ntp query ACL for new alert hosts - https://phabricator.wikimedia.org/T374340 (fgiunchedi) NEW
[09:12:48] let me know what you think re: ^ and ntp + new alert hosts
[09:24:41] godog: I'll let su.khe answer to that one
[09:24:52] he should be around later today in his working hours
[09:30:37] cheers vgutierrez
[12:36:21] godog: thanks, will look shortly
[13:04:13] Traffic, Observability-Alerting, SRE Observability (FY2024/2025-Q1): No ntp query ACL for new alert hosts - https://phabricator.wikimedia.org/T374340#10129744 (ssingh) Thanks for filing this @fgiunchedi! Your analysis is correct: while after setting up ntpsec/ntpd (alias), the initial file was copie...
[13:10:38] Traffic, Abstract Wikipedia team: Wikifunctions is down - https://phabricator.wikimedia.org/T374318#10129753 (Joe) Resolved→In progress For the record, the cause was a relatively aggressive crawler filling up all resources. While we've rate-limited this bot, I think we should use robots.txt to ba...
[13:17:18] Traffic, Abstract Wikipedia team: Wikifunctions is down - https://phabricator.wikimedia.org/T374318#10129767 (Jdforrester-WMF) In progress→Resolved p:Triage→Unbreak! I've added a general ban of ClaudeBot for all pages to https://www.wikifunctions.org/wiki/MediaWiki:Robots.txt for now....
[13:37:15] sukhe: thank you!
[14:27:46] topranks, XioNoX: I'm checking gobgpd peer configuration options, do you have a wishlist of what we should support?
[14:28:08] stuff like auth for peering? or that's a no go?
[14:36:55] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10130193 (joanna_borun) p:Triage→Medium
[14:40:04] vgutierrez: hey
[14:40:16] no pre-compiled wish list I think, but certainly we should consider the options
[14:40:34] MD5 Auth on the TCP session is not something we've done before, so probably we don't need to change our stance there
[14:41:24] is there a doc with a list of the configurable gobgpd options?
[14:43:23] so I was reading the gRPC proto definition
[14:44:05] CLI documentation enumerates what you can set when adding a neighbor: https://github.com/osrg/gobgp/blob/master/docs/sources/cli-command-syntax.md#--syntax-2
[14:47:54] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10130269 (CDanis) >>! In T374272#10127785, @cmooney wrote: > Thanks @cdanis and @Southparkfan for the task! > > Logs relate to [[ https://n...
[14:52:32] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10130289 (cmooney) >>! In T374272#10130269, @CDanis wrote: > The timestamps in the description come from LibreNMS's logs viewer for asw2-d-e...
[15:27:44] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10130497 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=81e99a80-f593-4494-a565-ea730a19fbc7) set by cmooney@cumin1002 fo...
[15:39:27] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10130595 (cmooney) Ok link was replaced: ` Sep 9 15:36:56 asw2-d-eqiad vccpd[2257]: VCCPD_PROTOCOL_INTF_STATE_CHANGED: Member 4, interface...
[15:41:10] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10130602 (VRiley-WMF) Thank you! I appreciate it. Will be relabeling the new cable as 0325. Feel free to reach out if anything else happens.
[15:42:10] Traffic, Observability-Alerting, SRE Observability (FY2024/2025-Q1): No ntp query ACL for new alert hosts - https://phabricator.wikimedia.org/T374340#10130603 (ssingh) @fgiunchedi: Can you please try again? The changes should be rolled out to everywhere.
[15:57:43] any chance I could get a review on these patches, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1065286
[15:57:48] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1065283
[15:58:14] jhathaway: looking
[15:58:26] sukhe: thanks
[16:00:45] jhathaway: silly question: but is there a reason we are picking the cp hosts here vs something else?
[16:01:24] sukhe: no good reason, they were just an example host that was throwing this error
[16:01:58] are there some other hosts? I guess my preference would be try on those first vs affecting all cp ones
[16:02:11] feel free to take any of the durum or even doh hosts for that matter
[16:03:16] would those use the same hiera lookups? looking at 1065283
[16:04:54] the above was for 1065286
[16:05:02] looking at 1065283 now!
[16:05:28] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10130724 (Papaul) @cmooney thanks for the feedback. The discussion about not using virtual-chassis was it a final decision or j...
[16:24:14] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: (2) new singlemode fiber patches from dmarc to routers for IX ports - https://phabricator.wikimedia.org/T373376#10130820 (RobH) IRC Update: All DC Ops related items are complete and Cathal is currently working with EQ to schedul...
[16:34:32] netops, Infrastructure-Foundations, SRE: BFD won't esablish between QFX in VRF and host from IPv6 link-local - https://phabricator.wikimedia.org/T374379 (cmooney) NEW p:Triage→Low
[16:55:54] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10131007 (Papaul)
[18:24:48] Traffic, Observability-Alerting, SRE Observability (FY2024/2025-Q1): No ntp query ACL for new alert hosts - https://phabricator.wikimedia.org/T374340#10131407 (andrea.denisse) Open→Resolved a:ssingh Hi @ssingh , I tried it again and the NTP checks work now: ` denisse@alert2002:~$ /us...
[18:37:26] netops, Infrastructure-Foundations, SRE, Patch-For-Review: BFD won't esablish between QFX in VRF and host from IPv6 link-local - https://phabricator.wikimedia.org/T374379#10131477 (cmooney) Ok through trial and error it would appear the issue is something to do with the switch not dealing well wi...
[18:53:57] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10131535 (cmooney) So far things seem stable with this. I will leave task open to review as the week goes on, also considering if we need t...
[18:54:18] jhathaway: I think the context of 1065286 became clearer once I looked at the PCC output for 1065283
[18:54:31] I have +1ed both and will merge tomorrow morning
[18:55:36] awesome thanks, apologies for not painting the context a bit more
[18:55:46] no worries at all, I was looking at it incorrectly
[19:55:18] netops, Infrastructure-Foundations, SRE: Routed Ganeti: Add support for VM QoS marking - https://phabricator.wikimedia.org/T374392 (cmooney) NEW p:Triage→Medium
[20:59:59] Traffic, Patch-For-Review: purged issues while kafka brokers are restarted - https://phabricator.wikimedia.org/T334078#10131960 (BCornwall)
[21:01:13] Traffic, SRE Observability (FY2024/2025-Q1): CPU thermal throttling: saturation panel isn't working as expected - https://phabricator.wikimedia.org/T373995#10131988 (BCornwall) @herron Would you say this has been solved?
[23:24:34] Traffic, Abstract Wikipedia team: Wikifunctions is down - https://phabricator.wikimedia.org/T374318#10132230 (99of9) Can/have we set a general rate limit so that when the next bot tries it, we don't go down again?