[02:28:43] serviceops, DC-Ops, ops-eqiad, Prod-Kubernetes, and 2 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T383620#10599156 (VRiley-WMF)
[07:36:06] kubestagemaster1005 will be on DRBD for like half an hour (ganeti node reimage)
[07:40:36] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823 (hashar) NEW
[07:40:47] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823#10599459 (hashar) p:Triage→Unbreak!
[07:41:10] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823#10599461 (hashar)
[07:49:41] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823#10599484 (hashar) ` $ systemctl status train-presync.service ● train-presync.service - Perform beginning-of-week train operations Loaded: loaded (/lib/systemd/system/train-presync.service; static)...
[08:00:37] and back to normal
[08:03:08] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823#10599570 (hashar) Oh I can do: `scap stage-train -Dfull_image_build:True --yes auto`
[08:45:19] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Priority Backlog 📥): Automated validation of mediawiki-multiversion images - https://phabricator.wikimedia.org/T288629#10599692 (JMeybohm) Resolved→Open >>! In T288629#10598113, @dancy wrote: >>>! In T288629#10582102, @JMeybohm wrote:...
[08:45:49] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Priority Backlog 📥): Automated validation of mediawiki-multiversion images - https://phabricator.wikimedia.org/T288629#10599694 (JMeybohm) a:dancy→None
[08:50:29] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823#10599725 (hashar) Open→Resolved a:hashar I cancelled the `scap stage-train` since the backport window was starting. @dcausse did a backport and as a result ended up pushing the train, which...
[09:25:48] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: top-level config key environments must be defined before releases in helmfile.yaml - https://phabricator.wikimedia.org/T387836 (JMeybohm) NEW
[09:25:55] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: top-level config key environments must be defined before releases in helmfile.yaml - https://phabricator.wikimedia.org/T387836#10599898 (JMeybohm)
[09:29:37] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Fix installed key in dependend helmfile releases - https://phabricator.wikimedia.org/T387837 (JMeybohm) NEW
[09:32:17] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Respect kubeVersion constraints in deployment-charts CI - https://phabricator.wikimedia.org/T387376#10599919 (JMeybohm) Open→Resolved This works now for helm charts as well as admin_ng.
[09:38:30] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: top-level config key environments must be defined before releases in helmfile.yaml - https://phabricator.wikimedia.org/T387836#10599937 (Jelto) a:Jelto
[10:31:40] James_F: ack, I'll check in with them then
[10:44:51] serviceops, Page Content Service, RESTBase Sunsetting, Epic: Add time jitter on TTL when invalidating caches on PCS - https://phabricator.wikimedia.org/T387472#10600109 (Jgiannelos) a:Jgiannelos
[10:45:01] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress), Epic: Add time jitter on TTL when invalidating caches on PCS - https://phabricator.wikimedia.org/T387472#10600110 (Jgiannelos)
[11:10:09] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress), Epic: Add time jitter on TTL when invalidating caches on PCS - https://phabricator.wikimedia.org/T387472#10600182 (Jgiannelos) Merge request here: https://gitlab.wikimedia.org/repos/content-trans...
[11:53:53] serviceops, Datacenter-Switchover, User-notice: MoveComms support for March 2025 Datacentre switchover - https://phabricator.wikimedia.org/T387444#10600293 (hnowlan) >>! In T387444#10588175, @Trizek-WMF wrote: > @hnowlan, any notable changes since the last switchover? Not too many of community sign...
[12:25:56] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Change changeprops rules to pre-generate/invalidate cache directly to PCS rather than in restbase - https://phabricator.wikimedia.org/T348996#10600451 (MSantos)
[12:28:30] serviceops, RESTBase Sunsetting, Epic: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559#10600457 (MSantos) In progress→Resolved a:MSantos Nothing else is needed in the scope of this work.
The remaining Parsoid re-routing tasks will be tracked in...
[12:35:08] serviceops, Page Content Service, Content-Transform-Team (Work In Progress), Patch-For-Review: Rollout more wikis after week 1 of testing with production traffic - https://phabricator.wikimedia.org/T387277#10600508 (MSantos) LGTM.
[12:43:13] serviceops, Release-Engineering-Team: train presync failed - https://phabricator.wikimedia.org/T387823#10600532 (hashar) From the information @jnuche gave me, when we run `scap stage-train -Dfull_image_build:True --yes auto` the full image build causes the code to be in a single layer, that is to av...
[13:34:27] serviceops, MW-on-K8s, SRE Observability: Periodic job alerting - https://phabricator.wikimedia.org/T385709#10600672 (fgiunchedi) >>! In T385709#10596762, @Clement_Goubert wrote: >>>! In T385709#10595920, @fgiunchedi wrote: >>>>! In T385709#10578781, @Clement_Goubert wrote: >>> Hmm so obviously it's...
[13:45:08] serviceops, Page Content Service, Content-Transform-Team (Work In Progress), Patch-For-Review: Rollout more wikis after week 1 of testing with production traffic - https://phabricator.wikimedia.org/T387277#10600714 (Jgiannelos) After talking with @Seddon it's probably better if we swap ptwiki with...
[14:18:47] serviceops, CirrusSearch, Data-Platform-SRE, Discovery-Search: Repartition [eqiad|codfw].cirrussearch.update_pipeline.update.v1 topics in kafka-main@[eqiad|codfw] - https://phabricator.wikimedia.org/T387863#10600872 (dcausse)
[14:19:19] serviceops, CirrusSearch, Data-Platform-SRE, Discovery-Search: Repartition [eqiad|codfw].cirrussearch.update_pipeline.update.v1 topics in kafka-main@[eqiad|codfw] - https://phabricator.wikimedia.org/T387863#10600874 (dcausse)
[14:30:51] serviceops, CirrusSearch, Data-Platform-SRE, Discovery-Search: Repartition [eqiad|codfw].cirrussearch.update_pipeline.update.v1 topics in kafka-main@[eqiad|codfw] - https://phabricator.wikimedia.org/T387863#10600929 (dcausse)
[14:55:49] serviceops, Datacenter-Switchover, User-notice: MoveComms support for March 2025 Datacentre switchover - https://phabricator.wikimedia.org/T387444#10601039 (Trizek-WMF)
[14:56:45] serviceops, Datacenter-Switchover, User-notice: MoveComms support for March 2025 Datacentre switchover - https://phabricator.wikimedia.org/T387444#10601043 (Trizek-WMF) >>! In T387444#10600293, @hnowlan wrote: >they're not very user visible! Great! I did all the preparation scheduled for this week...
[16:23:17] serviceops, Page Content Service, RESTBase Sunsetting, Epic, Event-Platform: Add page namespace information on resource change events - https://phabricator.wikimedia.org/T387435#10601615 (Ottomata)
[16:36:11] serviceops, CirrusSearch, Data-Platform-SRE, Discovery-Search: Repartition [eqiad|codfw].cirrussearch.update_pipeline.update.v1 topics in kafka-main@[eqiad|codfw] - https://phabricator.wikimedia.org/T387863#10601689 (Gehel) p:Triage→High
[16:36:26] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2025.03.01 - 2025.03.21): Repartition [eqiad|codfw].cirrussearch.update_pipeline.update.v1 topics in kafka-main@[eqiad|codfw] - https://phabricator.wikimedia.org/T387863#10601694 (Gehel)
[16:38:26] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Update wikikube-staging-codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T384450#10601697 (Gehel)
[17:38:17] serviceops, Patch-For-Review: Migrate production Shellbox variants to PHP 8.1 - https://phabricator.wikimedia.org/T377038#10602052 (jijiki) >>! In T377038#10598675, @Scott_French wrote: > Alright, progress. > > After a bit of debugging, there are two things going on here. > > One is that we're clearly...
[17:39:24] Any chance someone will be available later in the day to deploy the admin_ng part of https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1122678/5/helmfile.d/admin_ng/values/main.yaml following the process outlined in https://wikitech.wikimedia.org/wiki/Kubernetes/Remove_a_service#Deploy_changes_to_helmfile.d/admin_ng ?
[17:40:17] (I or inflatador are also happy to drive but sounds like the docs want it to be someone from serviceops)
[17:40:55] ^^
[17:48:27] serviceops, Release-Engineering-Team: deploy1003 reports helmfileAdminPendingChanges - https://phabricator.wikimedia.org/T387900 (jijiki) NEW
[18:11:15] claime: thanks for the merges on the php 8.1 & PCRE puppet patches for deployment-prep
[18:35:09] serviceops, Data-Engineering, Data-Engineering-Radar, Page Content Service, and 3 others: Add page namespace information on resource change events - https://phabricator.wikimedia.org/T387435#10602213 (Ahoelzl)
[18:55:10] ryankemper: I don't want to touch prod anymore today, but I can do it my tomorrow, would that be okay?
[18:57:15] kamila_ that's fine (I'm working on this w/ryan)...the endpoint (which doesn't touch k8s) works so that's good enough for now
[18:59:38] ok, thanks inflatador
[19:16:59] serviceops, Datacenter-Switchover, Patch-For-Review: Investigate burst of DBReadOnlyError during switchover test - https://phabricator.wikimedia.org/T387509#10602513 (Krinkle) >>! In T387509#10592333, @Tgr wrote: > In theory, putting the read only datacentre in read only should be a fine thing to do....
[19:41:55] serviceops, Datacenter-Switchover, Patch-For-Review: Investigate burst of DBReadOnlyError during switchover test - https://phabricator.wikimedia.org/T387509#10602665 (Krinkle) >>! @hnowlan wrote in the task description: > During the live test, there was [[ https://logstash.wikimedia.org/goto/d9067125...
[20:12:39] serviceops: Migrate mw-cron to PHP8.1 - https://phabricator.wikimedia.org/T387916 (jijiki) NEW
[20:13:27] serviceops: Migrate mw-script to PHP 8.1 - https://phabricator.wikimedia.org/T387917 (jijiki) NEW
[20:13:37] serviceops: Migrate mw-script to PHP 8.1 - https://phabricator.wikimedia.org/T387917#10602770 (jijiki)
[20:15:59] serviceops, Scap: scap leaves mediawiki main releases in the "forward" version after a canary rollback - https://phabricator.wikimedia.org/T375497#10602782 (dancy) @Scott_French I like your idea (and patches are always welcome!). Don't worry too much about `--stop-before-sync`. It is used for automated...
[21:37:32] serviceops, MW-on-K8s: Ensure tls-proxy container is started before launching main container - https://phabricator.wikimedia.org/T387208#10603146 (RLazarus) Open→Resolved a:Joe There's a nondeterministic element to the original bug obviously, but as far as I can tell from repeated testing...
[23:56:51] serviceops, Data-Persistence (work done), Parsoid (Tracking): nodejs can't connect to mysqld via tcp/localhost any longer (was: mariadb failing on testreduce1001) - https://phabricator.wikimedia.org/T274034#10603543 (Dzahn) Open→Resolved a:Dzahn Yea, it's me from a couple years later!...