[09:02:24] serviceops, Wikidata, Discovery-Search (2025.06.13 - 2025.07.04), Patch-For-Review: Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538#10946273 (Clement_Goubert) Since the issue is tracked elsewhere, and the jobs are effectively migrated, I'm resolving this t...
[09:02:36] serviceops, Wikidata, Discovery-Search (2025.06.13 - 2025.07.04), Patch-For-Review: Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538#10946275 (Clement_Goubert) Open→Resolved
[10:28:07] serviceops, MW-on-K8s, Patch-For-Review: Turn down mwmaint production servers - https://phabricator.wikimedia.org/T397017#10946607 (Clement_Goubert) a:Clement_Goubert
[11:28:40] serviceops, DC-Ops, ops-eqiad: hw troubleshooting: Backplane error for wikikube-worker1069.eqiad.wmnet - https://phabricator.wikimedia.org/T397829 (Clement_Goubert) NEW
[12:23:57] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Fix alternatives entries in helm and kubernetes-client packages - https://phabricator.wikimedia.org/T387548#10946856 (Jelto) Open→Resolved All hosts have been updated to use the `kubernetes-client`...
[12:24:39] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10946859 (Jelto)
[13:14:11] serviceops, Page Content Service: mobileapps is comparatively slower to handle changeprop events - https://phabricator.wikimedia.org/T397750#10947003 (hnowlan) Removing CPU and memory limits results in similar behaviour. mobileapps will eventually peg itself at just over 1*num_workers CPUs and once it hi...
[13:27:45] can i go ahead and deploy mobileapps after this deployment window? cc hnowlan in case you are working on mobileapps prod
[13:29:45] nemo-yiannis: would you mind if I got my changes in place first?
it'll hopefully be a big improvement
[13:29:53] yeah sure
[13:29:57] nemo-yiannis: what's in your change, just so I'm informed?
[13:30:12] i am upgrading node versions on all our services and mobileapps is the last
[13:30:13] it can wait
[13:31:11] cool - any major changes expected after the upgrade?
[13:40:17] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, and 2 others: Update Kubernetes clusters to 1.31 - https://phabricator.wikimedia.org/T341984#10947130 (Clement_Goubert) We should make an announcement on SRE mailing lists for the next upgrade, to avoid surprises.
[13:43:23] hnowlan: not really, just regular maintenance
[13:50:56] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Make mw-experimental production ready - https://phabricator.wikimedia.org/T396767#10947228 (Tgr) The easy way to make private security patches was to make the edit on the deployment host in `/srv/mediawiki-sta...
[14:02:59] serviceops, DC-Ops, ops-eqiad, SRE: hw troubleshooting: Backplane error for wikikube-worker1069.eqiad.wmnet - https://phabricator.wikimedia.org/T397829#10947289 (Jclark-ctr) Confirmed: Service Request 211933253
[14:11:06] serviceops, MW-on-K8s: Turn down mwmaint production servers - https://phabricator.wikimedia.org/T397017#10947323 (Clement_Goubert)
[14:17:01] serviceops, MW-on-K8s: Turn down mwmaint production servers - https://phabricator.wikimedia.org/T397017#10947375 (Clement_Goubert)
[14:53:28] serviceops, MW-on-K8s: Turn down mwmaint production servers - https://phabricator.wikimedia.org/T397017#10947620 (Urbanecm_WMF) hey, before we plug the switch, would it be possible to adjust the default `mysql` prompt on the deployment machine? I like mwmaint tells me where am I connected, while deploy*...
[14:57:44] serviceops, DBA, MW-on-K8s: Should deployment servers include mariadb::maintenance profile - https://phabricator.wikimedia.org/T397847 (Clement_Goubert) NEW
[15:09:50] serviceops, MediaWiki-Engineering, Release-Engineering-Team: Deprecate mwdebugXXXX hosts - https://phabricator.wikimedia.org/T397498#10947665 (jijiki) Open→In progress p:Triage→Medium
[15:33:45] serviceops, DC-Ops, ops-eqiad: hw troubleshooting: Backplane failure for wikikube-worker1243.eqiad.wmnet - https://phabricator.wikimedia.org/T397851 (Clement_Goubert) NEW
[15:34:52] !log homer "cr*eqiad*" commit 'wikikube-worker1243 failed'
[15:40:17] claime: wrong channel presumably
[15:40:31] oh sorry, I'm late
[15:40:32] yeah already posted it to the right one
[15:40:35] 's ok
[16:24:22] serviceops, Patch-For-Review: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245#10947908 (Scott_French) Let's start with the good news: Everything that //could// be evaluated after migrating a single host (conf2006) seems to work as expected. We were able to con...
[16:25:04] serviceops, DBA, MW-on-K8s: Should deployment servers include mariadb::maintenance profile - https://phabricator.wikimedia.org/T397847#10947913 (Ladsgroup) The puppet file for this profile is empty (reading `modules/profile/manifests/mariadb/maintenance.pp`) I think it's safe to drop it (but make sur...
[16:34:21] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Make mw-experimental production ready - https://phabricator.wikimedia.org/T396767#10947997 (jijiki) >>! In T396767#10947228, @Tgr wrote: > The easy way to make private security patches was to make the edit on...
[17:31:42] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Make mw-experimental production ready - https://phabricator.wikimedia.org/T396767#10948139 (Tgr) It would be great to find a solution for it. I don't do security changes often though, so maybe worth asking som...
[18:34:28] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Make mw-experimental production ready - https://phabricator.wikimedia.org/T396767#10948310 (sbassett) >>! In T396767#10948138, @Tgr wrote: > It would be great to find a solution for it. I don't do security cha...
[18:52:15] serviceops, Patch-For-Review: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245#10948370 (MoritzMuehlenhoff) One rather "cheap" way of solving this could be a systemd override which adds "RestartSec=5s" (exact value TBD), I think that would reliably avoid the resta...
[20:02:59] serviceops, Datacenter-Switchover: Update switchover behavior for mw-wikifunctions - https://phabricator.wikimedia.org/T397874 (Scott_French) NEW
[20:06:21] serviceops, SRE-swift-storage, Datacenter-Switchover: Turn down unused swift-r[ow] discovery services - https://phabricator.wikimedia.org/T376237#10948585 (Scott_French) Open→Resolved This is done now. Thanks for the reviews, all!
[20:13:24] serviceops, Datacenter-Switchover: Assess switchover behavior for mw-wikifunctions - https://phabricator.wikimedia.org/T397874#10948604 (Scott_French)
[21:29:40] serviceops, MW-on-K8s, Data-Platform-SRE (2025.06.13 - 2025.07.04), Discovery-Search (2025.06.13 - 2025.07.04): Investigate EQIAD daily completion suggester rebuild failure - https://phabricator.wikimedia.org/T395465#10948832 (EBernhardson) Relatively minimal reproduction of the OOM we trigger. I...
[21:43:13] serviceops, MW-on-K8s, Release-Engineering-Team, Scap: helmfile/scap does not reliably bootstrap mediawiki - https://phabricator.wikimedia.org/T397685#10948866 (Scott_French) I was chatting with @dancy earlier today about what might have caused this, and it's kind of a puzzling one. Super-naivel...
[22:16:23] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Make mw-experimental production ready - https://phabricator.wikimedia.org/T396767#10948922 (bd808) `scap pull` is tied to the legacy bare metal PHP deployment process that is rapidly becoming obsolete. Improvi...
[22:26:57] serviceops, Datacenter-Switchover: Assess switchover behavior for mw-wikifunctions - https://phabricator.wikimedia.org/T397874#10948936 (Scott_French)