[09:37:08] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11870110 (SLyngshede-WMF) Depooling command: `$ ssh cumin1003.eqiad.wmnet` `$ sudo cookbook sre.dns.admin depool ulsfo`
[09:51:59] Traffic, MW-Interfaces-Team, ServiceOps new, ServiceOps-SharedInfra, and 4 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11870159 (daniel)
[10:02:42] Traffic, MediaWiki-File-management, MediaWiki-General, MW-Interfaces-Team, and 5 others: Make MediaWiki comply with Wikimedia user agent policy - https://phabricator.wikimedia.org/T424768#11870226 (daniel)
[10:03:17] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, and 2 others: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11870228 (ABran-WMF) Open→Stalled p:High→Medium reducing the priority of the task, given the error rat...
[10:06:26] Traffic, MediaWiki-File-management, MediaWiki-General, MW-Interfaces-Team, and 5 others: Make MediaWiki comply with Wikimedia user agent policy - https://phabricator.wikimedia.org/T424768#11870238 (LucasWerkmeister) (The task title and/or description could probably still do with some refining. Ar...
[10:55:51] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: Enforce API rate limits (WE5.1.3c) - https://phabricator.wikimedia.org/T412585#11870421 (daniel)
[10:59:01] Traffic, MW-Interfaces-Team, ServiceOps new, ServiceOps-SharedInfra, and 4 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11870439 (daniel)
[11:01:20] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: Enforce API rate limits (WE5.1.3c) - https://phabricator.wikimedia.org/T412585#11870448 (daniel)
[11:17:41] Traffic, ServiceOps new, FY2025-26 KR 5.1, MediaWiki-Platform-Team (Radar), OKR-Work: rest-gateway: configure comprehensive set of cg-nat IPs - https://phabricator.wikimedia.org/T424827 (daniel) NEW
[11:17:45] Traffic, ServiceOps new, FY2025-26 KR 5.1, MediaWiki-Platform-Team (Radar), OKR-Work: rest-gateway: configure comprehensive set of cg-nat IPs - https://phabricator.wikimedia.org/T424827#11870552 (daniel) a:daniel
[11:20:42] Traffic, collaboration-services, Gerrit, Quibble, and 3 others: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11870572 (A_smart_kitten) (given the patch to #quibble to, IIUC, attempt to address the remaining CI errors)
[11:32:05] Traffic, ServiceOps new, Epic, FY2025-26 KR 5.1, and 2 others: CDN Backend API: recognized centralauth tokens in haproxy - https://phabricator.wikimedia.org/T424831 (daniel) NEW
[12:00:43] FIRING: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=esams&var-instance=cp3068&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[12:10:43] RESOLVED: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=esams&var-instance=cp3068&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[12:26:35] netops, Infrastructure-Foundations: POPs - free up 2xQSFP ports - https://phabricator.wikimedia.org/T424611#11870933 (cmooney)
[13:05:19] Traffic, MediaWiki-File-management, MediaWiki-General, MW-Interfaces-Team, and 5 others: Make MediaWiki comply with Wikimedia user agent policy - https://phabricator.wikimedia.org/T424768#11871173 (matmarex) Does MWHttpRequest automatically support cookies, and if yes, where does it persist them?...
[13:06:10] Traffic, DNS, SRE: [Update DNS Record Request] - wikimedia.org - Add TXT verification for Anthropic - https://phabricator.wikimedia.org/T424785#11871180 (ssingh) a:CDobbins
[13:14:37] Traffic, DC-Ops, Infrastructure-Foundations: ulsfo switch work May 2026: Host reimaging - https://phabricator.wikimedia.org/T424686#11871222 (SLyngshede-WMF)
[13:14:41] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11871223 (SLyngshede-WMF)
[13:18:54] Traffic, DC-Ops, Infrastructure-Foundations: ulsfo switch work May 2026: Host reimaging - https://phabricator.wikimedia.org/T424686#11871241 (MoritzMuehlenhoff) Are ganeti4006/4008 the only servers which will be touched in this maintenance? Then I would already go ahead and drain/remove ganeti4006
[13:24:13] Traffic, DC-Ops, Infrastructure-Foundations: ulsfo switch work May 2026: Host reimaging - https://phabricator.wikimedia.org/T424686#11871253 (SLyngshede-WMF) Only hosts in this one rack will be touched: https://netbox.wikimedia.org/dcim/racks/34/ but we'll depool the entire site before work starts.
[13:25:30] Traffic, DC-Ops, Infrastructure-Foundations: ulsfo switch work May 2026: Host reimaging - https://phabricator.wikimedia.org/T424686#11871260 (MoritzMuehlenhoff) Ack, thanks!
[13:28:51] followup question for the ulsfo maintenance and Ganeti: do I need to drop both Ganeti nodes ahead of the maintenance or will I be able to move them sequentially? so that e.g. ganeti4006 moves to the new switch first and in the meantime 4008 will remain connected to the old switch? or is there a flag day which will impact both at the same time?
[14:28:47] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11871716 (A_smart_kitten) Prompted by {T424511}, I'm probably gonna try and work a bit from (subsets of) [[https://codesearch.wmclo...
[14:36:12] moritzm: the latter
[14:38:46] ok, then I'll drain both nodes tomorrow ahead of the maintenance
[15:03:30] hey folks
[15:03:47] I ran authdns-update on dns1004 for https://gerrit.wikimedia.org/r/c/operations/dns/+/1277099
[15:04:14] but deploy1003 seems to be using dns1006 that returns NXDOMAIN
[15:04:35] it should be propagated with authdns-update
[15:04:36] strange
[15:05:36] elukey: deploy1003 seems to be using dns1006 meaning?
[15:05:52] probably because someone did a lookup for it and it got cached, the NXDOMAIN
[15:05:58] wipe the caches and try again I think
[15:06:11] cookbook sre.dns.wipe-cache
[15:06:58] sukhe: recdns-anycast wise I mean, but yeah I may have tried the domain before the change and got the cache polluted, right
[15:07:03] thanks :D
[15:07:03] yep
[15:07:41] don't know if you did something already but it works now
[15:07:46] I thought nxdomains were not cached though, anywayyy
[15:07:52] dig wikifunctions-python-evaluator.svc.codfw.wmnet
[15:07:52] thanks and sorry for the trouble :)
[15:08:10] wikifunctions-python-evaluator.svc.codfw.wmnet. 3600 IN CNAME k8s-ingress-wikikube.svc.codfw.wmnet.
[15:08:10] k8s-ingress-wikikube.svc.codfw.wmnet. 3600 IN A 10.2.1.70
[15:08:28] not with dig wikifunctions-javascript-evaluator.discovery.wmnet @dns1006.wikimedia.org afaics
[15:08:35] I'll wipe the cache
[15:10:01] elukey: yeah, NXDOMAIN TTL of 1H
[15:10:09] sukhe: TIL
[15:10:21] works now
[15:10:23] <3
[15:10:27] <3
[15:11:28] <3 (I was the only one missing)
[15:20:53] Traffic, MW-Interfaces-Team, ServiceOps new, ServiceOps-SharedInfra, and 4 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11872065 (daniel)
[15:21:15] Traffic, MW-Interfaces-Team, ServiceOps new, ServiceOps-SharedInfra, and 4 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11872081 (daniel)
[15:21:20] Traffic, MediaWiki-File-management, MediaWiki-General, MW-Interfaces-Team, and 5 others: Make MediaWiki comply with Wikimedia user agent policy - https://phabricator.wikimedia.org/T424768#11872080 (daniel)
[15:25:21] Traffic, MediaWiki-File-management, MediaWiki-General, MW-Interfaces-Team, and 5 others: Allow MediaWiki to make authenticated requests to other instances of MediaWiki - https://phabricator.wikimedia.org/T424768#11872105 (daniel)
[15:33:30] Traffic, MediaWiki-File-management, MediaWiki-General, MW-Interfaces-Team, and 5 others: Allow MediaWiki to make authenticated requests to other instances of MediaWiki - https://phabricator.wikimedia.org/T424768#11872158 (daniel) >>! In T424768#11871173, @matmarex wrote: > Does MWHttpRequest auto...
[15:35:03] Traffic, MW-Interfaces-Team, ServiceOps new, ServiceOps-SharedInfra, and 4 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11872169 (daniel)
[16:18:51] Traffic, Beta-Cluster-Infrastructure, Patch-For-Review: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://phabricator.wikimedia.org/T424549#11872370 (elukey) I added the new intermediate config, and now puppet should do its job in propagating the c...
[18:56:24] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11873042 (Nux) I went through this [[ https://global-search.toolforge.org/?q=%5C%2Fthumb%5C%2F%28%5B%5E%5C%2F%5D%2B%3F%5C%2F%29%7B3...
[18:58:38] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873046 (Papaul)
[19:16:17] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11873158 (xcollazo) >>! In T422030#11873071, @CodeReviewBot wrote: > xcollazo **merged** https://gitlab.wikimedia.org/repos/data-engineering...
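Note on the 15:03-15:11 DNS exchange: the behaviour described there (an NXDOMAIN answer being cached by the recursor for the "TTL of 1H" sukhe mentions, until the cache is wiped) is standard negative caching as described in RFC 2308. The toy model below is only a sketch of that idea. The `NegativeCache` class, the `new-service.example.` name and the `10.0.0.1` address are hypothetical, and this is not how the Wikimedia recursors or `sre.dns.wipe-cache` are implemented.

```python
import time


class NegativeCache:
    """Toy recursor cache that remembers NXDOMAIN answers for a fixed TTL."""

    def __init__(self, negative_ttl, clock=time.monotonic):
        self.negative_ttl = negative_ttl  # seconds an NXDOMAIN stays cached
        self.clock = clock                # injectable clock, handy for demos
        self._nx = {}                     # name -> expiry timestamp

    def lookup(self, name, authoritative):
        """Resolve `name`, consulting the negative cache before the zone."""
        expiry = self._nx.get(name)
        if expiry is not None and self.clock() < expiry:
            return "NXDOMAIN (cached)"
        self._nx.pop(name, None)          # entry expired or never existed
        answer = authoritative.get(name)
        if answer is None:
            self._nx[name] = self.clock() + self.negative_ttl
            return "NXDOMAIN"
        return answer

    def wipe(self, name=None):
        """Drop one cached name, or the whole negative cache."""
        if name is None:
            self._nx.clear()
        else:
            self._nx.pop(name, None)


# Demo with a fake clock so no real time has to pass.
now = [0.0]
cache = NegativeCache(negative_ttl=3600, clock=lambda: now[0])
zone = {}                                 # stands in for the authoritative data
name = "new-service.example."             # hypothetical record name

r1 = cache.lookup(name, zone)             # looked up before the record exists
zone[name] = "10.0.0.1"                   # record is published afterwards
now[0] = 600.0                            # ten minutes later
r2 = cache.lookup(name, zone)             # still answered from the negative cache
cache.wipe(name)                          # analogous to wiping the recursor cache
r3 = cache.lookup(name, zone)             # now resolves

print(r1, r2, r3, sep="\n")
```

In this model the stale answer would also disappear on its own once the 3600 seconds elapse; wiping the entry just avoids the wait.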
[19:28:34] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873200 (Papaul)
[19:38:49] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873230 (Papaul)
[19:41:41] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873235 (Papaul)
[20:35:58] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873424 (Papaul)
[20:53:34] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873477 (Papaul)
[21:48:04] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873578 (Papaul) @ssingh important note: The public subnet mask for servers in rack 103.02.22 will be changing from /28 to /27 so we will have to manually...
[22:21:26] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873638 (Papaul) @RobH Remote hands instructions are ready @ https://docs.google.com/document/d/1EW6hxHCQjXPy1PXQWluwOTnCl_AHddI34iOYHdJuvek/edit?tab=t.0 Pl...
[23:42:49] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873863 (Papaul)
[23:53:14] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873875 (Papaul)
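Note on the 21:48 comment about the public subnet going from /28 to /27: the sketch below shows what such a prefix change means for the netmask, address count and broadcast address. The real ulsfo prefix is not given in the log, so `198.51.100.0/28` (an RFC 5737 documentation range) is used purely as a stand-in, and `describe` is a hypothetical helper.

```python
import ipaddress

# Stand-in documentation prefix; the actual ulsfo subnet is not in the log.
old = ipaddress.ip_network("198.51.100.0/28")
# The /27 that contains the old /28 (one bit shorter prefix).
new = old.supernet(new_prefix=27)


def describe(net):
    """Collect the values that differ between the two prefix lengths."""
    return {
        "netmask": str(net.netmask),
        "addresses": net.num_addresses,
        "broadcast": str(net.broadcast_address),
    }


old_info = describe(old)
new_info = describe(new)

print(old, old_info)
print(new, new_info)
```

With these stand-in values the mask goes from 255.255.255.240 to 255.255.255.224 and the block doubles from 16 to 32 addresses, so the broadcast address also moves.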