[08:06:00] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st): Investigate raise in Invalid HAProxyKafka messages in esams - https://phabricator.wikimedia.org/T422033#11807603 (JAllemandou) Open→Resolved Having discussed this with Traffic, this was related to SSL handshake problem (regular tra...
[08:07:08] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11807606 (JAllemandou) Removing myself as the task assignee so that someone else take it wh...
[08:07:14] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11807611 (JAllemandou) a:JAllemandou→None
[08:09:03] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11807625 (Vgutierrez) Open→Resolved a:Vgutierrez Closing the task as this h...
[10:09:24] Traffic, ServiceOps new: Thumbor is using an unmantained HAProxy version - https://phabricator.wikimedia.org/T422926 (Vgutierrez) NEW
[10:09:28] Traffic, ServiceOps new: Thumbor is using an unmantained HAProxy version - https://phabricator.wikimedia.org/T422926#11808084 (Vgutierrez) p:Triage→High
[10:26:15] Traffic, ServiceOps new, ServiceOps-Services-Oids, Thumbor: Thumbor is using an unmantained HAProxy version - https://phabricator.wikimedia.org/T422926#11808164 (Clement_Goubert) Tagging @JTweed-WMF for awareness. Thanks for restoring the component while we work out a solution. I think the most...
[10:26:23] Traffic, ServiceOps new, ServiceOps-Services-Oids, Thumbor: Thumbor is using an unmantained HAProxy version - https://phabricator.wikimedia.org/T422926#11808167 (Clement_Goubert) a:Clement_Goubert
[10:27:26] Traffic, ServiceOps new, ServiceOps-Services-Oids, Thumbor: Thumbor is using an unmantained HAProxy version - https://phabricator.wikimedia.org/T422926#11808177 (Vgutierrez) meanwhile I've restored the component and uploaded haproxy 2.8.20 there
[11:30:57] Traffic, SRE: Nakavo - Rate Limiting Query - https://phabricator.wikimedia.org/T422872#11808307 (NakavoDev) >>! In T422872#11806792, @Reedy wrote: > By you extract one of the links... What do you mean? > > Are you always getting thumbs? Or are you sometimes (often?) requesting the originals based on siz...
[12:06:22] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11808413 (cmooney) >>! In T422043#11804537, @Jhancock.wm wrote: > D row has no specialty rack at all so we can easily work around that for future private vlan installs. To be clear the...
[13:09:20] FIRING: DnsboxServiceMismatch: Service ntp-a state mismatch on dns1004:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://grafana.wikimedia.org/d/96fb573c-0f3c-456a-886c-e50c29f3ed48/dns-box-service-state?var-site=eqiad&var-instance=dns1004:9100 - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[13:09:53] ^ just a delay in the alert firing. should it fire at all? no.
[13:10:32] [service was restarted, so it's expected]
[13:14:20] RESOLVED: DnsboxServiceMismatch: Service ntp-a state mismatch on dns1004:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://grafana.wikimedia.org/d/96fb573c-0f3c-456a-886c-e50c29f3ed48/dns-box-service-state?var-site=eqiad&var-instance=dns1004:9100 - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[13:19:50] FIRING: [2x] DnsboxServiceMismatch: Service ntp-a state mismatch on dns1004:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[13:20:19] soooo weird. it fires after being resolved and clearly when there is no mismatch now.
[13:24:50] RESOLVED: DnsboxServiceMismatch: Service ntp-a state mismatch on dns1004:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://grafana.wikimedia.org/d/96fb573c-0f3c-456a-886c-e50c29f3ed48/dns-box-service-state?var-site=eqiad&var-instance=dns1004:9100 - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[13:31:50] FIRING: DnsboxServiceMismatch: Service ntp-c state mismatch on dns1006:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://grafana.wikimedia.org/d/96fb573c-0f3c-456a-886c-e50c29f3ed48/dns-box-service-state?var-site=eqiad&var-instance=dns1006:9100 - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[13:33:59] https://www.youtube.com/watch?v=pndhO5DcSI0
[13:37:33] Traffic, DC-Ops, ops-eqiad, SRE: hardware troubleshooting: NVMe errors on cp1115.eqiad.wmnet - https://phabricator.wikimedia.org/T421007#11808649 (ssingh)
[14:19:40] Traffic, SRE: Nakavo - Rate Limiting Query - https://phabricator.wikimedia.org/T422872#11808793 (Reedy) Do you know what stats of thumbnail vs original you’re requesting? Generally, thumbnails are definitely preferred, so if you’re preferring original because it’s first match, that will start to explain...
[14:41:31] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 5 others: Increase visibility of kubernetes network status - https://phabricator.wikimedia.org/T356877#11808865 (cmooney) Broadly the patch submitted looked good to me, though I see it was abandoned. As per the comment I lef...
[14:43:46] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 5 others: Increase visibility of kubernetes network status - https://phabricator.wikimedia.org/T356877#11808868 (Blake) Ah, thanks Cathal! The original patch was abandoned because I was struggling with git, the new patch is n...
[14:54:13] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11808900 (xcollazo) Resolved→Open Reopening and tagging #data-engineering since we...
[14:54:52] Traffic, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11808902 (xcollazo) p:Triage→Medium a:Vgutierrez→None
[15:26:57] Traffic, SRE: Nakavo - Rate Limiting Query - https://phabricator.wikimedia.org/T422872#11808983 (NakavoDev) >>! In T422872#11808793, @Reedy wrote: > Do you know what stats of thumbnail vs original you’re requesting? > > Generally, thumbnails are definitely preferred, so if you’re preferring original bec...
[16:07:50] RESOLVED: DnsboxServiceMismatch: Service ntp-a state mismatch on dns7001:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://grafana.wikimedia.org/d/96fb573c-0f3c-456a-886c-e50c29f3ed48/dns-box-service-state?var-site=magru&var-instance=dns7001:9100 - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[16:08:21] restarts done. anything from now on is an actual alert.
[16:09:20] FIRING: DnsboxServiceMismatch: Service ntp-b state mismatch on dns7002:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://grafana.wikimedia.org/d/96fb573c-0f3c-456a-886c-e50c29f3ed48/dns-box-service-state?var-site=magru&var-instance=dns7002:9100 - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[16:12:00] stale!
[16:14:20] RESOLVED: [2x] DnsboxServiceMismatch: Service ntp-a state mismatch on dns7001:9100 - https://wikitech.wikimedia.org/wiki/DNS#DnsboxServiceMismatch - https://alerts.wikimedia.org/?q=alertname%3DDnsboxServiceMismatch
[16:31:02] Traffic, Diff-blog, Technical Blog: Redirect techblog.wikimedia.org to diff.wikimedia.org - https://phabricator.wikimedia.org/T417940#11809114 (CKoerner_WMF) >>! In T417940#11641988, @ssingh wrote: > Hi @CKoerner_WMF: To clarify, this should cover all posts and the domain techblog should redirect to...
[19:44:33] Traffic, DC-Ops, ops-eqiad, SRE: hardware troubleshooting: NVMe errors on cp1115.eqiad.wmnet - https://phabricator.wikimedia.org/T421007#11809628 (VRiley-WMF) Finally was able to get Dell to send out a new part for the unit. Part should arrive next business day.
[22:34:00] Traffic, Pywikibot, SRE, Wikidata, and 2 others: Pywikibot reports maxlag retry error - https://phabricator.wikimedia.org/T421642#11809908 (MisterSynergy) >>! In T421642#11785461, @Xqt wrote: > The problems began on March 25th: > {F74901675} Exact timestamp seems to be shortly after 2026-03-25 1...