[02:14:20] serviceops, Patch-For-Review: Migrate production Shellbox variants to PHP 8.1 - https://phabricator.wikimedia.org/T377038#10589628 (Scott_French) I spent some time this afternoon looking more closely at this, and I suspect we're only going to be able to get at this with more data than we have now. A cou...
[02:33:20] serviceops, MediaWiki-Engineering, Traffic, Upstream, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10589637 (Scott_French) As of ~ 15:40 UTC (Thursday), the traffic migration has returned to the state we rolled back fr...
[02:46:26] serviceops: Move conf2005 within the same rack - https://phabricator.wikimedia.org/T387416#10589662 (Scott_French) Ah, thanks for clarifying, Janis. That makes this quite a bit simpler, then.
[02:55:15] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 2 others: Tracking List: Relocating servers to free up 10G switch space in codfw - https://phabricator.wikimedia.org/T383709#10589667 (Scott_French) Ah, that's good to know, @Jhancock.wm. If leaving it in place isn't causing any tr...
[08:01:58] Hi folks, I'd like your help migrating several of your LVS services to IPIP encapsulation and maglev as part of the ongoing migration to liberica. First one is docker-registry (T387294); I've already submitted the CR https://gerrit.wikimedia.org/r/q/topic:%22T387294%22. I'd need your help reviewing the CR and coordinating the migration with me; I can happily take care of all the hands-on work. Thanks!
[08:20:32] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Fix alternatives entries in helm and kubernetes-client packages - https://phabricator.wikimedia.org/T387548 (JMeybohm) NEW
[10:38:22] serviceops, SRE, Wikimedia-Apache-configuration, Wikimedia-Portals, and 2 others: www.wikipedia.org: prefilling the search box with the "search" URL parameter does not work - https://phabricator.wikimedia.org/T318285#10590297 (elukey) Resolved→Open Hi folks! I am really sorry to ruin the...
[10:39:29] vgutierrez: o/ I can help with the registry bits
[10:39:43] lovely
[10:41:31] we can do eqiad first that is not serving traffic, and then codfw
[10:41:44] do you have a timeline? Monday?
[10:41:55] Monday it's ok
[10:42:09] let me revert the order of the DCs then
[10:42:10] all right, I think eqiad can be done in the morning anytime, just ping me
[10:42:19] I usually go with codfw first by default
[10:42:34] the registry is special sadly :(
[10:50:14] elukey: ok, order adjusted, CRs are ready for your review :D
[10:51:52] <3
[11:03:23] serviceops, MW-on-K8s, Patch-For-Review: Allow members of restricted to run maintenance scripts - https://phabricator.wikimedia.org/T378429#10590367 (Urbanecm_WMF) Note this makes it challenging to post the "stream logs" command at tasks (see what I had to do in T385780#10590355). Unfortunately, we c...
[11:16:17] serviceops, Datacenter-Switchover: Investigate burst of read only errors during live test - https://phabricator.wikimedia.org/T387509#10590414 (hnowlan) On the session loss errors - I believe these are actually other operations that resulted in [[ https://logstash.wikimedia.org/app/discover#/doc/logstash...
[11:40:53] serviceops, MW-on-K8s: Ensure tls-proxy container is started before launching main container - https://phabricator.wikimedia.org/T387208#10590443 (Clement_Goubert)
[13:04:34] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Respect kubeVersion constraints in deployment-charts CI - https://phabricator.wikimedia.org/T387376#10590686 (JMeybohm) p:Triage→High a:JMeybohm
[13:04:55] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Fix alternatives entries in helm and kubernetes-client packages - https://phabricator.wikimedia.org/T387548#10590688 (JMeybohm) p:Triage→Low
[13:05:24] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.02.10 - 2025.02.28), Kubernetes: Update wikikube-staging-codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T384450#10590689 (JMeybohm) p:Medium→High
[13:27:40] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.03.01 - 2025.03.21), Kubernetes: Update wikikube-staging-codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T384450#10590788 (Gehel)
[13:37:34] serviceops, SRE, Wikimedia-Apache-configuration, Wikimedia-Portals, and 2 others: www.wikipedia.org: prefilling the search box with the "search" URL parameter does not work - https://phabricator.wikimedia.org/T318285#10590993 (Gehel)
[13:51:46] serviceops, SRE Observability: chartmuseum prometheus metrics cardinality spam - https://phabricator.wikimedia.org/T386808#10591100 (kamila) a:kamila The fix hasn't been merged. I pinged them, I'll see if I can get things moving. As for where these requests come from, I looked at a random sample of...
[14:44:14] serviceops, MW-on-K8s, Observability-Logging, Kubernetes: Move rsyslog-generated mediawiki logs within k8s to their own kafka topics - https://phabricator.wikimedia.org/T384335#10591199 (fgiunchedi) Something else to consider: not only udp-localhost sources but also file sources like php-fpm erro...
[15:27:35] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 2 others: Tracking List: Relocating servers to free up 10G switch space in codfw - https://phabricator.wikimedia.org/T383709#10591355 (Jhancock.wm) Open→Resolved we can leave it. The last server should be getting decomm...
[16:10:29] serviceops, SRE, Wikimedia-Apache-configuration, Wikimedia-Portals, and 3 others: www.wikipedia.org: prefilling the search box with the "search" URL parameter does not work - https://phabricator.wikimedia.org/T318285#10591539 (jcrespo)
[16:13:19] serviceops: Move conf2005 within the same rack - https://phabricator.wikimedia.org/T387416#10591542 (Scott_French)
[16:13:28] serviceops: Move conf2005 within the same rack - https://phabricator.wikimedia.org/T387416#10591543 (Scott_French)
[16:15:24] serviceops: Move conf2005 within the same rack - https://phabricator.wikimedia.org/T387416#10591546 (Scott_French) Open→Resolved After further discussion on T383709, conf2005 does not need moved at this time. Thanks in any case for reviewing the procedure, Janis.
[17:29:29] serviceops, SRE: HTTP 429 error on private wikis trying to create account via Special:CreateAccount - https://phabricator.wikimedia.org/T359901#10591796 (Bugreporter)
[20:13:26] serviceops, Datacenter-Switchover, Patch-For-Review: Investigate burst of read only errors during live test - https://phabricator.wikimedia.org/T387509#10592333 (Tgr) > In theory, putting the read only datacentre in read only should be a fine thing to do I don't think this was ever even remotely tru...
[20:19:09] serviceops, DC-Ops, ops-eqiad, Prod-Kubernetes, and 2 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T383620#10592354 (VRiley-WMF) Open→Resolved These have been relabeled
[20:36:51] serviceops, Datacenter-Switchover, Patch-For-Review: Investigate burst of read only errors during live test - https://phabricator.wikimedia.org/T387509#10592384 (Tgr) Specifically for `deleteServerObjectsExpiringBefore`, it should be straightforward to add a check of whether the DB is readonly - ther...