[02:13:10] Traffic, collaboration-services, SRE, Patch-For-Review, Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11347743 (Dzahn) We had some strange results when trying to debug this together. So I ended up testing every combination betw...
[09:00:47] Traffic, MW-Interfaces-Team, serviceops, Epic, and 3 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11348482 (daniel)
[09:45:07] I'm seeing something weird related to the discovery DNS records:
[09:45:07] brouberol@cp1100:~$ dig +short growthbook-api.discovery.wmnet
[09:45:07] k8s-ingress-dse.discovery.wmnet.
[09:45:07] 10.2.2.91
[09:45:07] ---
[09:45:07] brouberol@cp6016:~$ dig +short growthbook-api.discovery.wmnet
[09:45:07] brouberol@cp6016:~$
[09:48:14] seems like that discovery record is not consistently seen across our PoPs
[09:53:24] netops, Infrastructure-Foundations, Toolforge, tools-infrastructure-team: Plan networking for Toolforge-on-Metal experiment - https://phabricator.wikimedia.org/T407140#11348677 (fgiunchedi) Thank you @cmooney for the summary, I'll add a few thoughts I had while working on the Toolforge on Metal p...
[10:49:14] it seems to have resolved
[10:49:14] brouberol@cp6014:~$ host growthbook-api.discovery.wmnet
[10:49:14] growthbook-api.discovery.wmnet is an alias for k8s-ingress-dse.discovery.wmnet.
[10:49:14] k8s-ingress-dse.discovery.wmnet has address 10.2.2.91
[10:51:20] has it been added recently?
[11:18:23] it was added about 1.5h before it started to resolve on these cp10xx hosts. It did resolve correctly on other PoPs
[11:19:32] brouberol: is it possible it got negative-cached before the deploy? have you tried to run the cookbook to clear the record cache in the resolvers?
[11:24:55] ah, it's entirely possible. IIRC I realized the record didn't exist by running a `host` command, punched in the domain in the `dns` repo, deployed and tried again
[11:33:53] so yeah most likely it got negative-cached
[11:34:06] gotcha thanks!
[11:34:08] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar), WMF-NDA: Change Gitiles caching config - https://phabricator.wikimedia.org/T409422 (LSobanski) NEW
[11:40:04] brouberol: from a quick check our negative TTL should be 1h
[11:40:33] and you can use the sre.dns.wipe-cache cookbook to clear records in case of issues
[12:52:45] thanks! Lots I didn't know!
[13:56:57] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11349648 (JKelsoteel-WMF) Hello Team, I did see **wiktionary.org** show up in GSC today, though now I need to verify domain ownership in GSC for our service account. Can you pleas...
[13:57:59] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11349650 (ssingh) Thanks @JKelsoteel-WMF, we will be picking this up today.
[15:23:51] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11350127 (ssingh) @JKelsoteel-WMF: Can you please try to log in to `wikibooks.org` as well so we can see the text of the DNS record that needs to be verified?
[15:28:16] Traffic, wikimediafoundation.org, Chinese-Sites, Patch-For-Review: Some domains point to Chinese version of Wikimedia Foundation website which no longer exists - https://phabricator.wikimedia.org/T407579#11350136 (Bugreporter2)
[15:35:00] Traffic, Prod-Kubernetes, serviceops, Kubernetes, Patch-For-Review: Handling inbound IPIP traffic on low traffic LVS k8s based realservers - https://phabricator.wikimedia.org/T352956#11350194 (ssingh) Thanks @akosiaris, that sounds good. We would like to get this done in Q3 to resolve this bl...
[15:52:01] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11350274 (JKelsoteel-WMF) @ssingh Here is the TXT record for wikibooks.org google-site-verification=COFkPSi4dDq5UyOCRY7y7XTduWBKcY5vl59LMmrb8gM Although, it looks like our servi...
[15:56:58] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11350301 (ssingh) Ah interesting, that explains why we couldn't see the verification option on the Search Console. So just to confirm, you are set for both?
[15:57:41] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11350307 (JKelsoteel-WMF) Yes, I am set for both! I am able to do all I need to do. Thank you for your help!
[15:59:25] Traffic: [Search Console Verification DNS Request] - wiktionary.org and wikibooks.org - https://phabricator.wikimedia.org/T409314#11350315 (ssingh) Open→Resolved a:ssingh
[17:33:58] Traffic, wikimediafoundation.org, Chinese-Sites, Patch-For-Review: Some domains point to Chinese version of Wikimedia Foundation website which no longer exists - https://phabricator.wikimedia.org/T407579#11350740 (Dzahn) Hmm.. I agree that it's a bit unclear how to deal with these and prevent the...
[17:51:15] Traffic, wikimediafoundation.org, Chinese-Sites, Patch-For-Review: Some domains point to Chinese version of Wikimedia Foundation website which no longer exists - https://phabricator.wikimedia.org/T407579#11350847 (Pppery) In progress→Resolved a:SCampos-WMF (Closing this ticket per...
[18:11:45] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11350945 (cmooney) So this is causing a lot of logspam on our Nokia switches right now. What I've noticed before is that our hosts tend to alternate between two LLDP neighbors co...
[19:06:33] hello traffic friends o/ any concerns or conflicts if I deploy an haproxy config change to A:cp in a few minutes? (i.e., disable-puppet, test, rolling run-puppet-agent)
[19:11:11] how to allocate service IPs for a new load-balanced service?
[19:13:16] mutante: this is an LVS VIP? if so, I think that's https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox
[19:13:31] swfrench-wmf: yes
[19:13:39] thank you, reading!
[19:14:38] no problem! hopefully it's still right :)
[19:14:41] it's linked from https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only)
[19:15:26] * swfrench-wmf will start deploying haproxy change shortly
[19:15:53] *nod*, alright. I guess I will not only need eqiad and codfw but 7 of them for each POP
[19:16:48] oh, interesting - yeah, I have no idea what this looks like outside core DCs
[19:17:07] starts with the more standard thing only
[19:26:20] first try to create an IP in netbox and already an issue :) says I created it but not showing up .. ok..
[19:30:06] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11351215 (cmooney) FWIW this will need further investigation, I've reset a bunch of these switches which will cause the scenario the alerts should fire, but I...
[19:34:17] mutante: happy to help, just seeing this now
[19:34:31] but yes basically the steps are correct, even if the text could be better
[19:34:38] assign on Netbox, run the DNS netbox cookbook
[19:34:47] and then do the manual ops/dns commit which I just saw you did
[19:35:18] but yeah, please do ping us if required. the instructions are _decent_ but they still have gaps in some places (and we are trying to keep those updated)
[19:35:53] sukhe: consider the gerrit review request the ping:) no rush
[19:37:08] mutante: ok, noted. I am sure you have seen it but also https://wikitech.wikimedia.org/wiki/LVS#Please_read_before_we_get_started... for the overall process basically
[19:37:09] I created https://netbox.wikimedia.org/ipam/ip-addresses/21741/ but I am missing it here: https://netbox.wikimedia.org/ipam/prefixes/93/ip-addresses/ even though that's where i came from
[19:37:29] tl;dr being that we get the patches ready, review and then merge one by one, same day ideally (though not a hard requirement depending on the step)
[19:38:37] mutante: change Status from Reserved to Active
[19:39:09] ok! will do. somehow assumed that comes after it's bound to loopback
[19:40:14] VRF not required as well, leave that empty
[19:44:49] set to active, VRF removed
[19:57:49] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11351345 (cmooney) Small update, right now lsw1-d6-eqiad is broken. So this alert should be present for ssw1-d1-eqiad and ssw1-d8-eqiad.
[20:12:14] Traffic: Extend ncmonitor to validate existing ncredir domains - https://phabricator.wikimedia.org/T409486 (BCornwall) NEW
[20:13:08] Traffic, wikimediafoundation.org, Chinese-Sites, Patch-For-Review: Some domains point to Chinese version of Wikimedia Foundation website which no longer exists - https://phabricator.wikimedia.org/T407579#11351406 (BCornwall) I went ahead and created T409486 which proposes extending ncmonitor...
[20:19:51] Traffic, Hiddenparma, Patch-For-Review: Introduce known-client identity objects and integrate with requestctl - https://phabricator.wikimedia.org/T403220#11351429 (Scott_French)
[20:32:02] Traffic, Hiddenparma, Patch-For-Review: Introduce known-client identity objects and integrate with requestctl - https://phabricator.wikimedia.org/T403220#11351456 (Scott_French)
[20:35:19] Traffic, Hiddenparma, Patch-For-Review: Introduce known-client identity objects and integrate with requestctl - https://phabricator.wikimedia.org/T403220#11351466 (Scott_French)
[20:57:34] Traffic: Extend ncmonitor to validate existing ncredir domains - https://phabricator.wikimedia.org/T409486#11351543 (BCornwall) p:Triage→Low
[21:13:44] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11351612 (colewhite) In today's case, the alert criteria wasn't met because the metrics [[ https://grafana-rw.wikimedia.org/explore?schemaVersion=1&panes=%7B%...
[21:39:39] Traffic, SRE, Patch-For-Review, User-notice-archive: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11351698 (Tgr) The PR looks right to me; not sure if there's an easy way to verify the header is applied to all requests, other than removing...
[21:57:42] Traffic, Patch-For-Review: ncmonitor should verify that DNSSEC is disabled in MarkMonitor - https://phabricator.wikimedia.org/T402961#11351740 (BCornwall) > We confirmed the API response depends on whether the TLD supports automated DNSSEC or not....
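
Note on the negative-caching exchange earlier in the log (growthbook-api.discovery.wmnet not resolving on some cp hosts after the record was deployed): the sketch below is a toy model of that behaviour, not the actual resolver setup. `ToyResolver`, `wipe`, and `demo` are hypothetical names made up for illustration; the 1h value is the negative TTL quoted in the chat, and the CNAME to k8s-ingress-dse.discovery.wmnet is flattened to its address for brevity. It only shows why a lookup made before the deploy can keep returning "no record" on one resolver while another resolver answers correctly, and what clearing the cached miss changes. It does not try to account for the roughly 1.5h delay that was reported.

```python
import time
from typing import Callable, Dict, Optional

NEGATIVE_TTL = 3600  # seconds; "our negative TTL should be 1h" per the chat


class ToyResolver:
    """Toy caching resolver that remembers misses for NEGATIVE_TTL seconds."""

    def __init__(self, zone: Dict[str, str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.zone = zone    # stands in for the authoritative data
        self.clock = clock  # injectable so the demo can move time forward
        self.negative: Dict[str, float] = {}  # name -> expiry of cached miss

    def resolve(self, name: str) -> Optional[str]:
        now = self.clock()
        expiry = self.negative.get(name)
        if expiry is not None:
            if now < expiry:
                return None  # cached miss served without asking upstream
            del self.negative[name]  # cached miss has expired
        answer = self.zone.get(name)
        if answer is None:
            self.negative[name] = now + NEGATIVE_TTL
        return answer

    def wipe(self, name: str) -> None:
        """Drop any cached miss for name (the role a cache wipe plays)."""
        self.negative.pop(name, None)


def demo() -> list:
    now = [0.0]
    zone: Dict[str, str] = {}
    name = "growthbook-api.discovery.wmnet"
    early = ToyResolver(zone, clock=lambda: now[0])
    results = [early.resolve(name)]     # queried before the deploy: miss is cached
    zone[name] = "10.2.2.91"            # record gets deployed
    now[0] = 600.0                      # ten minutes later
    results.append(early.resolve(name))  # still the cached miss
    other = ToyResolver(zone, clock=lambda: now[0])
    results.append(other.resolve(name))  # a resolver that never cached the miss
    early.wipe(name)                     # clear the cached miss
    results.append(early.resolve(name))  # now sees the deployed record
    return results


if __name__ == "__main__":
    print(demo())
```

In real DNS the negative TTL is derived from the zone's SOA record rather than a constant, so treat `NEGATIVE_TTL` purely as a stand-in for the figure mentioned in the conversation.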