[03:19:16] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11776183 (Papaul) @SLyngshede-WMF thank you very much.
[03:25:45] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11776198 (Papaul)
[03:55:09] netops, DC-Ops, Infrastructure-Foundations: Standardize management routers interfaces - https://phabricator.wikimedia.org/T421674#11776217 (Papaul) @VRiley-WMF @Jclark-ctr when you are next onsite, can you please look for 1 QFX-SFP-1GE-T and plug it into mr1-eqiad ge-0/0/7? Thank you https://www.fs....
[06:50:29] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776348 (ABran-WMF) In progress→Resolved
[08:11:25] netops, Infrastructure-Foundations, SRE: Re-IP eqiad private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T421704#11776508 (MLechvien-WMF)
[08:53:06] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776679 (ArthurTaylor) Saw this issue again this morning: https://integration.wikimedia.org/ci/job/quibbl...
[08:56:39] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776685 (SomeRandomDeveloper) Also seen a couple hours ago in https://integration.wikimedia.org/ci/job/qu...
[09:08:36] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776712 (ArthurTaylor) And this - different but maybe related: https://integration.wikimedia.org/ci/job/q...
[09:09:24] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776713 (ABran-WMF) Resolved→Open thanks for raising these, I'll check
[09:27:29] Traffic, MediaWiki-Action-API: Notifications API is returning an authorisation error since today - https://phabricator.wikimedia.org/T421991#11776808 (Aklapper)
[09:27:46] Traffic, MediaWiki-Action-API: Notifications API is returning a permissions error since 2026-04-01 for a bot account - https://phabricator.wikimedia.org/T421991#11776809 (Aklapper)
[09:53:15] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776939 (ABran-WMF) @ArthurTaylor @SomeRandomDeveloper is it OK for me to retry some of these jobs to test [[...
[09:55:38] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776946 (ArthurTaylor) @ABran-WMF sure, but I think my patches have now been merged (by retrying a bunch of t...
[09:57:01] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11776949 (SomeRandomDeveloper) The jobs I linked are also associated with patches that are already merged. The...
[10:01:06] Traffic, MediaWiki-Action-API, MW-Interfaces-Team: Notifications API is returning a permissions error since 2026-04-01 for a bot account - https://phabricator.wikimedia.org/T421991#11776994 (SomeRandomDeveloper) Related: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/1265608 (and I assu...
[10:30:08] Traffic, Liberica, Prod-Kubernetes, ServiceOps new, Kubernetes: Migrate Wikikube k8s apiserver and services to IPIP - https://phabricator.wikimedia.org/T420436#11777137 (MLechvien-WMF)
[10:35:25] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 4 others: Increase visibility of kubernetes network status - https://phabricator.wikimedia.org/T356877#11777204 (JMeybohm)
[12:46:27] Traffic, Liberica, Prod-Kubernetes, ServiceOps new, Kubernetes: Add missing wikikube workers to conftool-data - https://phabricator.wikimedia.org/T420729#11777638 (MLechvien-WMF)
[13:05:12] Traffic, MediaWiki-Action-API, Notifications (Echo): Notifications API is returning a permissions error since 2026-04-01 for a bot account - https://phabricator.wikimedia.org/T421991#11777815 (AGhirelli-WMF)
[13:08:35] Traffic, ServiceOps new, ServiceOps-Services-Oids, Product Safety and Integrity (Sprint Forsythia (Mar 23 - Apr 10))), WE4.2 Bot detection (WE4.2 hCaptcha editing trial): hCaptcha: Stop using urldownloader for health checks of the secure-api.js... - https://phabricator.wikimedia.org/T421464#11777841
[13:13:48] Traffic, Patch-For-Review: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402#11777867 (Fabfur)
[13:28:15] sukhe we are getting ready to order some NVMEs for the first time, do y'all have any recommendations on a particular type? We will probably end up using y'all's config F but with smaller drives
[13:29:37] inflatador: Config F is what we are doing for the cp's yep. they seem to have worked well for us so far.
[13:30:06] we have had a few SSDs fail here and there but none of the NVMes
[13:40:02] sukhe thanks, we're gonna get some Config F's with smaller drives then
[13:48:47] note our config F has *huge* amounts of RAM, too, which is costly if you don't need it
[13:49:15] it has in recent times mostly been a config only used for the cps
[13:56:01] We are gonna need the extra RAM, but the good news is we should be able to shrink our overall server footprint
[13:59:25] And the new servers will be running Virtuoso or qlever, **not** blazegraph ;)
[14:05:28] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030 (JAllemandou) NEW
[14:16:37] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11778265 (hashar)
[14:16:49] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Investigate raise in Invalid HAProxyKafka messages in esams - https://phabricator.wikimedia.org/T422033 (JAllemandou) NEW
[14:18:08] inflatador: those nvmes will make a huge difference if you're iops or i/o latency-bound. Note that they can be (and should be) formatted for native 4K block sizes too, which can have some trickle-up effects on database-level storage tuning stuff too.
[14:19:28] bblack Thanks, I appreciate the tip. I assume that is in y'all's partman recipes already?
[14:20:34] While I have ya, any opinions on filesystem? We were a Red Hat shop at my old job and it was pounded into me to use XFS for databases.
[14:20:57] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11778317 (hashar) >>! In T421904#11778130, @A_smart_kitten wrote: > I tried the link in the task description a...
[14:21:26] inflatador: no idea, I figured you were using raw block devices or something. that's what we do for cache storage on nvme
[14:21:35] if you look in: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/install_server/files/autoinstall/scripts/late_command.sh
[14:22:04] we have some late_command stuff there, if you search around for "nvme". It's conditional on cp-hostnames, and now varies for the new cp2 nodes vs others, because there was a slight change to how you format the disks
[14:22:14] probably just need to update the conditionals there to cover your new hosts
[14:22:48] partman doesn't know how to cover this afaik, that's why we just execute the low-level formatting in late_command
[14:23:04] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Investigate raise in Invalid HAProxyKafka messages in esams - https://phabricator.wikimedia.org/T422033#11778343 (JAllemandou)
[14:23:08] whaaa? Partman doesn't know how to do something? ;P
[14:23:33] I don't think raw devices are an option for blazegraph, but worth a look for the new contenders. Will get a ticket started
[14:24:07] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030#11778354 (JAllemandou)
[14:24:15] even if you're using a filesystem, you still want the 4K formatting. and then you may want to tune some mkfs params depending on which filesystem you pick (so that it works in 4K blocks at the bottom too)
[14:24:24] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030#11778357 (JAllemandou)
[14:24:42] 4K native blocks on the device means there's a 1:1 mapping between OS memory pages and disk block pages :)
[14:26:28] NICE
[14:31:16] speaking of partman/preseed, I've been playing around with FAI ( https://fai-project.org/ ) in my homelab. It can bootstrap Debian servers without partman/preseed (the Debian team uses it for building their cloud images https://salsa.debian.org/cloud-team/debian-cloud-images/-/tree/master/config_space/13 ). It is really fast and the disk config is actually sane. Hope to have a demo for y'all one of these days ;)
[14:37:54] Traffic, Patch-For-Review: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402#11778468 (Fabfur)
[14:52:03] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030#11778559 (Fabfur) This could be related to upgrade to HAProxy 3.2 (T421402) that started on the drmrs datacenter, we'll investigate if the sequence...
[14:54:59] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11778595 (DLynch) More: https://integration.wikimedia.org/ci/job/quibble-with-gated-extensions-selenium-php83/...
[15:09:36] netops, Infrastructure-Foundations: Create public vlan on eqiad and codfw pods E/F - https://phabricator.wikimedia.org/T422043 (ayounsi) NEW
[15:24:06] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11778802 (ABran-WMF) The change has been merged, please let us know if that does not fix the situation.
[15:43:41] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030#11778885 (Fabfur) This is most probably due to a deprecation in haproxy configuration directives https://www.haproxy.com/blog...
[15:56:18] netops, Infrastructure-Foundations: Create public vlan on eqiad and codfw pods E/F - https://phabricator.wikimedia.org/T422043#11778904 (cmooney) If we are going to have one public-enabled rack per "pod" then should we not have just one vlan assigned for codfw row E/F (and then one also for a/b and c/d)?...
[16:14:48] Traffic, Patch-For-Review: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402#11779013 (Fabfur)
[16:15:51] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030#11779016 (Fabfur) [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1266301 | The patch ]] has been applied to all impa...
[16:17:08] Traffic, Patch-For-Review: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402#11779019 (Fabfur) [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1266301 | Related patch ]]
[16:50:06] <_joe_> inflatador: yeah anything that has the potential to get us out of partman/preseed would be great, I think we had looked into FAI in the past, maybe moritzm remembers more
[17:07:58] hello traffic friends - just FYI in case you see it go by: I'll be merging a service.yaml patch shortly that moves a service directly from production to service_setup. this is a k8s ingress service, not an LVS service, so this should be fine :)
[17:10:04] swfrench-wmf: thanks for the heads up
[17:21:59] Traffic, SRE: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup - https://phabricator.wikimedia.org/T411097#11779456 (BCornwall) In progress→Resolved a:BCornwall
[17:24:03] _joe_ cool, I've gotten it to bootstrap baremetal EFI servers in my lab, here's an example storage config and how FAI renders it https://phabricator.wikimedia.org/P90195
[17:32:25] Traffic, SRE: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup - https://phabricator.wikimedia.org/T411097#11779507 (ssingh) Thanks for taking care of this @BCornwall!
[19:55:57] Traffic, cloud-services-team, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11779966 (taavi) Open→Invalid Thanks @BBlack. I've forked {T422040} for making this as a pu...
[19:57:06] Traffic, cloud-services-team, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11779972 (taavi) Invalid→Declined
[20:38:48] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11780122 (SomeRandomDeveloper) Seen again in https://integration.wikimedia.org/ci/job/quibble-with-gated-exten...
[21:24:03] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11780253 (BCornwall)
[21:24:21] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11780256 (BCornwall)
[21:59:51] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Surge in webrequest sequence-id validation check - https://phabricator.wikimedia.org/T422030#11780385 (Ahoelzl) a:JAllemandou
[22:00:34] Traffic, Data-Engineering (Q3 FY25/26 January 1st - March 31th): Investigate raise in Invalid HAProxyKafka messages in esams - https://phabricator.wikimedia.org/T422033#11780389 (Ahoelzl) a:JAllemandou
[22:07:57] Traffic, 2026-user-javascript-incident, ContentSecurityPolicy: Can't debug scripts on localhost on URLs that omit /w/index.php - https://phabricator.wikimedia.org/T421565#11780400 (sbassett) Yes, this is an odd edge case that will hopefully disappear soon, once we migrate CSP config in Wikimedia prod...
[23:25:10] netops, Infrastructure-Foundations, Patch-For-Review: mr1-eqiad: move from OSPF to BGP - https://phabricator.wikimedia.org/T421238#11780563 (Papaul) All the BGP sessions are up ` mr1-eqiad# run show bgp summary group Production Threading mode: BGP I/O Default eBGP mode: advertise - accept, receive...
[23:26:38] netops, DC-Ops, Infrastructure-Foundations: Standardize management routers interfaces - https://phabricator.wikimedia.org/T421674#11780565 (Papaul) @VRiley-WMF any update on this? I would like to move the OOB tomorrow during the maintenance window. Thanks