[07:39:09] FIRING: LVSHighCPU: The host lvs1016:9100 has at least its CPU 30 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1016 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[07:44:09] RESOLVED: LVSHighCPU: The host lvs1016:9100 has at least its CPU 30 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1016 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[08:05:18] netops, Infrastructure-Foundations, Toolforge, tools-infrastructure-team: Plan networking for Toolforge-on-Metal experiment - https://phabricator.wikimedia.org/T407140#11635293 (cmooney) @fgiunchedi I got a few minutes to do a bit of a brain dump here - please see this [[ https://docs.google.com/...
[08:57:38] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11635379 (Tacsipacsi) >>! In T414805#11632990, @Ladsgroup wrote: > A global rate limit effectively is not that different than full...
[08:58:25] Traffic, collaboration-services, Continuous-Integration-Infrastructure, Gerrit, and 2 others: Gerrit events not received by Zuul due to TCP Proxy timeout (CI is not triggered for some patches) - https://phabricator.wikimedia.org/T417497#11635380 (hashar) Turns out Zuul is still being disconnected...
[09:26:30] dear traffic can I please get a +1 here too https://gerrit.wikimedia.org/r/c/operations/puppet/+/1240750 ?
[09:29:05] +1 for me!
[09:29:11] tx !
[10:13:22] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11635530 (Nux) This would not change the calculations much, but there are [[ https://global-search.toolforge.org/?q=thumb%5C%2F.%5C...
[12:03:52] Is there a page on what the official DoT and DoH addresses are?
[12:03:56] !dns
[12:04:52] hiya: https://meta.wikimedia.org/wiki/Wikimedia_DNS
[12:08:21] For DoT DNSOverTLS=yes DNSSEC=yes
[12:08:23] Right?
[12:08:32] is DNSSEC enabled?
[12:10:16] I guess it is not enabled as of yet
[12:55:34] Wikimedia-Apache-configuration, serviceops, SRE, Wikibase GraphQL, and 2 others: Create a rewrite for the GraphQL endpoint on wikidata.org - https://phabricator.wikimedia.org/T417026#11635869 (taavi)
[14:32:25] hiya: yeah, we validate DNSSEC irrespective of whether the client signals its intention to validate or not (and we return SERVFAIL in case of a bogus response)
[14:32:31] https://wikitech.wikimedia.org/wiki/Wikimedia_DNS#DNSSEC
[14:38:53] your local stub is of course free to perform its own local validation, in case you don't trust Wikimedia DNS to do it for you.
[14:39:22] in Wikimedia DNS, just like any other recursor, you are essentially trust it (or the other recursors) to have performed the DNSSEC validation for you and set the AD bit
[14:39:34] but if you don't trust it, local validation is the way to go
[14:39:47] s/trust it/trusting it
[14:43:25] sukhe: yes, I host a local resolver but it is mostly for my email server and dnsbls
[14:44:09] yeah makes sense. that's how my setup is: I have unbound for the dnsbls and then everything else is forwarded to Wikimedia DNS.
[14:44:23] forward-zone: name: "." forward-tls-upstream: yes forward-addr: 185.71.138.138@853
[14:44:30] wow nice :)
[15:42:39] Traffic, Patch-For-Review: Consider rate limiting non-standard thumbnail sizes - https://phabricator.wikimedia.org/T402792#11636655 (MatthewVernon)
[15:42:42] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11636656 (MatthewVernon)
[16:09:55] sukhe: so it is safe to set: DNSOverTLS=yes DNSSEC=yes
[16:17:55] hiya: yes, it should work for Wikimedia DNS.
[16:18:26] if you try it, let us know and we can update the docs as well, since I don't think our current docs for systemd-resolved mention DNSSEC
[16:34:00] in your experience, how much faster is DoT vs DoH?
[16:38:27] we have never really measured it I think but there is some existing research on this topic. In theory DoT should be faster, simply because it is plain TCP + TLS without the HTTP layer that DoH adds on top
[16:39:13] I think last we looked at it, it was quite close but the real reason to pick one over the other is of course that DoH is behind 443, so less susceptible to blocking, plus easier in theory to use for browsers and such
[16:39:19] whereas DoT tends to be at the OS level
[16:39:46] so you would configure DoT on Android or in systemd-resolved whereas you would do DoH in your browser
[16:41:35] if you care about speed, plain DNS over port 53 is of course the fastest but at least Wikimedia DNS does not support that. but I am guessing if you are asking about configuring DNSSEC that you don't want to use that at all :)
[16:55:20] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012 (RobH) NEW p:Triage→Medium
[16:56:06] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012#11636926 (RobH)
[17:45:46] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018 (RobH) NEW
[17:46:03] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11637114 (RobH)
[17:48:01] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11637117 (RobH) @ayounsi or @papaul: Would one of you be best suited to provide the cable diagram and matrix so we know how/where each switch is conne...
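The DoT setup discussed between 12:03 and 16:41 (a systemd-resolved client with DNSOverTLS=yes and DNSSEC=yes, or sukhe's unbound forward-zone) can be sketched as a small Python script that renders both configurations. The address 185.71.138.138 and port 853 come from the chat. The TLS server name `wikimedia-dns.org` and the CA bundle path are assumptions to check against https://meta.wikimedia.org/wiki/Wikimedia_DNS and your distribution, and `resolved_dropin` / `unbound_forward_zone` are hypothetical helper names. The `#name` suffix gives each resolver a server name to check the upstream certificate against. As far as we understand, DNSSEC=yes makes systemd-resolved validate locally, which is the "local validation" option sukhe mentions at 14:38.

```python
"""Render resolver configs that forward to Wikimedia DNS over TLS (a sketch)."""

# Address and port come from the chat log; TLS_NAME is an ASSUMPTION,
# verify it on https://meta.wikimedia.org/wiki/Wikimedia_DNS before use.
WIKIMEDIA_DNS_V4 = "185.71.138.138"
DOT_PORT = 853
TLS_NAME = "wikimedia-dns.org"  # assumed certificate name


def resolved_dropin(dnssec: bool = True) -> str:
    """Return a systemd-resolved drop-in enabling DoT (and optionally DNSSEC)."""
    lines = [
        "[Resolve]",
        # "#name" after the address tells resolved which name the
        # upstream certificate should carry; DoT uses port 853 by default.
        f"DNS={WIKIMEDIA_DNS_V4}#{TLS_NAME}",
        "DNSOverTLS=yes",
        # With DNSSEC=yes, resolved performs its own validation locally.
        f"DNSSEC={'yes' if dnssec else 'no'}",
    ]
    return "\n".join(lines) + "\n"


def unbound_forward_zone(ca_bundle: str = "/etc/ssl/certs/ca-certificates.crt") -> str:
    """Return an unbound snippet forwarding every query (zone ".") over TLS.

    The default ca_bundle path is an ASSUMPTION (Debian/Ubuntu layout).
    """
    return (
        "server:\n"
        f"    tls-cert-bundle: {ca_bundle}\n"
        "forward-zone:\n"
        '    name: "."\n'
        "    forward-tls-upstream: yes\n"
        # "@853" selects the DoT port, "#name" the certificate name to verify.
        f"    forward-addr: {WIKIMEDIA_DNS_V4}@{DOT_PORT}#{TLS_NAME}\n"
    )


if __name__ == "__main__":
    print(resolved_dropin(), end="")
    print(unbound_forward_zone(), end="")
```

The first output could go into a drop-in such as `/etc/systemd/resolved.conf.d/wikimedia-dns.conf` (a hypothetical file name), followed by a restart of systemd-resolved; the second is meant to be included from unbound.conf alongside the DNSBL zones.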
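sukhe's explanation at 14:32-14:39 (the recursor validates DNSSEC whether or not the client asks, sets the AD bit on validated answers, and returns SERVFAIL for bogus ones) maps onto a few bits of the DNS wire format. A minimal standard-library Python sketch, with no network I/O: it builds a query carrying the EDNS0 DO bit (RFC 3225 / RFC 6891), wraps it in a DoH GET URL as RFC 8484 describes (base64url without padding in the `dns` parameter, ID 0 for cache friendliness), and classifies a response header by its RCODE and AD bit. The endpoint `https://wikimedia-dns.org/dns-query` is an assumption, not taken from the chat; the function names are hypothetical.

```python
import base64
import struct

# Header flag bits (RFC 1035 / RFC 4035)
FLAG_QR = 0x8000  # message is a response
FLAG_RD = 0x0100  # recursion desired
FLAG_AD = 0x0020  # authentic data: the resolver validated the answer
RCODE_SERVFAIL = 2


def build_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a recursive query (type A by default) with an EDNS0 OPT RR and the DO bit."""
    labels = name.rstrip(".").split(".")
    if any(not 0 < len(label) <= 63 for label in labels):
        raise ValueError("invalid label length")
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", query_id, FLAG_RD, 1, 0, 0, 1)
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS = IN
    # OPT RR: root owner, TYPE 41, CLASS = UDP payload size, TTL holds
    # ext-RCODE/version/flags where DO is the top flag bit, RDLENGTH 0.
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0x00008000, 0)
    return header + question + opt


def doh_get_url(query: bytes, endpoint: str = "https://wikimedia-dns.org/dns-query") -> str:
    """Encode a wire-format query for an RFC 8484 GET request (endpoint is ASSUMED)."""
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


def classify_response(response: bytes) -> str:
    """Return 'servfail', 'authenticated' (AD set) or 'unauthenticated'."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    _id, flags = struct.unpack("!HH", response[:4])
    if not flags & FLAG_QR:
        raise ValueError("not a response")
    if flags & 0x000F == RCODE_SERVFAIL:
        return "servfail"  # may mean a bogus signature, or another upstream failure
    return "authenticated" if flags & FLAG_AD else "unauthenticated"
```

A client that does not trust the recursor's AD bit would, as sukhe says, validate locally instead; this sketch only reads what the recursor reports.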
[18:40:25] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11637204 (RobH)
[18:51:43] FIRING: [2x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[18:56:43] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:14:18] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11637304 (Shawn) Why are those 429 errors unpredictable? All this afternoon LiveRC (real time monitoring tool of recent changes on...
[19:31:43] FIRING: [13x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:36:43] FIRING: [14x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:41:43] FIRING: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:46:43] FIRING: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:51:43] RESOLVED: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:01:58] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:19:16] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11637442 (Shawn) I'm not sure if I understood properly the issue because I tried to fix LiveRC by changing size of icons but I stil...
[20:46:58] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:51:58] RESOLVED: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:06:43] FIRING: [3x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:06:58] FIRING: [4x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:11:43] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:21:43] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:21:44] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11637548 (Tacsipacsi) >>! In T414805#11636273, @Ladsgroup wrote: > The rate limit is at the edge layer so it doesn't know whether i...
[21:26:43] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:31:43] RESOLVED: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[23:44:07] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11637743 (Papaul) The first step was completed by remote hands yesterday but the port number and the cable ID's were not given to me so I just got the informati...