[00:44:53] serviceops, Content-Transform-Team, Wikifeeds: Significant increase in wikifeeds latency since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11381876 (Scott_French) I happened to notice some `KubernetesDeploymentUnavailableReplicas` alert noise for mobileapps in codfw in -operations today....
[07:49:16] serviceops, Data-Engineering, Machine-Learning-Team: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11382296 (achou) Hi, thanks all for the input. :) Due to our tight timeline, ML team has decided to move forward with Option D for now. > Tha...
[08:06:17] serviceops, Data-Engineering, Machine-Learning-Team: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11382330 (Joe) I'm very happy you're going with the option @jijiki recommended, which sounds like both the path of least resistance and the be...
[11:45:57] serviceops, Epic, MW-Interfaces-Team (MWI-Roadmap), OKR-Work: Revisit backend routing for rest-gateway - https://phabricator.wikimedia.org/T401396#11383272 (Clement_Goubert) Resolved→In progress p:Triage→Medium Reopening for followup discussion of https://gerrit.wikimedia.org/r/c/...
[11:51:48] serviceops, Content-Transform-Team, Wikifeeds: Significant increase in wikifeeds latency since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11383292 (hnowlan) >>! In T410296#11380452, @ssastry wrote: > [[ https://grafana.wikimedia.org/d/8169987e-2ef2-4bf2-ba85-eefad1edbefa/rest-gateway-per-...
[12:34:58] serviceops, sre-alert-triage: Alert in need of triage: KubernetesWorkerUnschedulable - https://phabricator.wikimedia.org/T400969#11383492 (Clement_Goubert) Silencing for 3 months.
[12:49:29] serviceops, decommission-hardware, Patch-For-Review: decommission wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet - https://phabricator.wikimedia.org/T409102#11383522 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cf6542b8-b3f4-4883-99ac-108de7724557)...
[12:53:22] serviceops, decommission-hardware, Patch-For-Review: decommission wikikube-worker[2003-2004,2007-2010,2019-2032,2040,2043,2045,2048].codfw.wmnet - https://phabricator.wikimedia.org/T409102#11383525 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c33e05b1-9496-4424-a3a5-4d9922830c62)...
[13:15:17] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11383548 (Clement_Goubert) The `kafka-main` rebalance question is now pretty critical to figure out. One of the broker's certificates is e...
[13:30:20] serviceops, MediaWiki-Platform-Team, OKR-Work: api rate limiting: Assign ratelimit class based on IP range - https://phabricator.wikimedia.org/T410273#11383568 (Clement_Goubert) Hmm, we should probably also figure out a way to route these to `mw-api-int` instead of `mw-api-ext` somehow. I have to thi...
[13:35:14] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11383609 (Clement_Goubert) >>! In T405950#11379425, @RobH wrote: > @Clement_Goubert, > > Is it possible that I could send the commands for this or do we need someone in your...
[13:41:17] serviceops, Content-Transform-Team, Wikifeeds: Significant increase in wikifeeds latency since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11383642 (Dbrant) Very curious - I don't think we've made any changes to either wikifeeds or mobileapps that directly affect the type(s) or quantities...
[14:18:48] serviceops, MediaWiki-Platform-Team, OKR-Work: Determine the source of internal requests going through the API gateway. - https://phabricator.wikimedia.org/T410198#11383772 (Clement_Goubert) According to the [[ https://logstash.wikimedia.org/goto/6cd5f9eacd02a78f846e4f83f84ed1f2 | rest-gateway logs ]...
[15:48:13] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11384205 (RobH) Awesome! We're also moving dns1006 at the same time (we'll move it first while k8 hosts drain) and then we'll move onto moving these! I'll ping you in about...
[15:59:05] serviceops, RESTBase Sunsetting, Essential-Work, MW-Interfaces-Team (MWI-Sprint-23 (2025-11-18 to 2025-12-02)), Patch-For-Review: Reroute /api/rest_v1 documentation to REST Sandbox - https://phabricator.wikimedia.org/T396807#11384275 (HCoplin-WMF)
[15:59:22] serviceops, DiscussionTools, MediaWiki-REST-API, RESTBase Sunsetting, and 2 others: Reroute RESTBase /page/lint/ endpoints to MediaWiki REST endpoints - https://phabricator.wikimedia.org/T384216#11384277 (HCoplin-WMF)
[15:59:28] serviceops, Content-Transform-Team, Wikifeeds, Patch-For-Review: Significant increase in wikifeeds latency since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11384279 (hnowlan) Just as a datapoint - I roll-restarted mobileapps and it had an immediate impact on wikifeeds: https://grafana...
[16:04:53] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11384357 (Clement_Goubert) ` Validating broker list: Broker 1003 does not have a rack.id defined Broker 1001 does not have a rack.id d...
[16:10:28] serviceops, Content-Transform-Team, Wikifeeds, Wikipedia-Android-App-Backlog, Patch-For-Review: Significant increase in wikifeeds latency since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11384399 (Dbrant)
[16:19:02] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11384441 (brouberol) As stated to @Clement_Goubert on IRC: feel free to RR the cluster in its current state. Once the certificates are ren...
[16:25:45] topranks: you around? I have something weird happening to kubestagemaster1004 since it's been port-moved in eqiad row D. Its lldp facts don't have the vlans array anymore
[17:13:11] serviceops, Content-Transform-Team, Wikifeeds, Wikipedia-Android-App-Backlog (Android Release - FY2025-26): Significant increase in wikifeeds latency since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11384817 (Dbrant)
[17:14:50] claime: sorry I was in a meeting
[17:15:08] topranks: 's all right, it's staging
[17:15:20] Just worried it may happen for wikikube workers/ctrl
[17:15:51] so what is the difference, the info from the host side or the switch side?
[17:16:37] kubestagemaster1004 is a VM actually....
[17:17:56] right now it's showing the host as its LLDP neighbor, which makes sense to me I guess
[17:18:12] I've never really looked into what LLDP does over these virtual interfaces though
[17:18:36] wikikube workers/ctrl are all bare metal right?
[17:18:50] there may also be something different with those, but scenario is different
[17:20:18] aaaah it's a VM, but why is it going in that branch then
[17:20:32] topranks: see modules/profile/manifests/kubernetes/node.pp L141
[17:26:26] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11384918 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by cgoubert@cumin1003 depool for host wikikube-worker1016.eqiad.wmnet completed: - wikikube-w...
[17:38:49] serviceops, MediaWiki-Platform-Team, OKR-Work: Determine the source of internal requests going through the API gateway. - https://phabricator.wikimedia.org/T410198#11384980 (Tgr) >>! In T410198#11383772, @Clement_Goubert wrote: > MediaWiki probably has a similar call somewhere. In [[https://gerrit.w...
[17:40:18] claime: I suspect we could replace that line with:
[17:40:20] if $facts['networking']['interface'][$facts['interface_primary']]['netmask'] == "255.255.252.0":
[17:40:46] topranks: yeah that's what I'm writing more or less
[17:41:02] topranks: that holds for codfw as well right?
[17:41:05] ok thank you <3
[17:41:13] yes
[17:41:18] works for codfw also
[17:41:19] Great
[17:45:53] topranks: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1206929
[17:47:10] lgtm, just gonna wait to see what puppet compiler shows
[17:47:16] yup
[17:48:31] and yep the 'parent' is still there in lldp, makes sense to verify bare metal based on that I think
[17:49:43] actually the 'parent' is there for the VM too
[17:49:44] https://phabricator.wikimedia.org/P85369
[17:53:11] topranks: I'm gonna disable puppet fleet-wide on wikikube, merge this, run it on a couple normal nodes to confirm nothing's screwed up
[17:53:23] Then we can see if we move forwards
[17:53:35] ok yep that makes sense to me +1
[17:57:40] Hey, I'm trying to build a new wmf-debci and I can't invoke the correct docker-pkg command.
I'm following https://wikitech.wikimedia.org/wiki/Kubernetes/Images#Production_images on build2001 - I can't get it to select wmf-debci at all, it returns the selection as blank.
[17:57:42] Removing the --select arg only has it claiming it would build a few images too.
[17:58:06] https://paste.debian.net/plainh/c0b5fbe7
[17:58:22] topranks: It's not exactly gonna give the same result on VMs as it used to, since it used to just say ganeti-somethingsomething
[17:58:28] but whatever, it'll be fine for now
[17:58:43] I'm more worried about the loss of fidelity of the is_virtual fact
[17:58:47] It's no longer a FACT
[17:59:20] cmooney@kubestagemaster1004:~$ sudo puppet facts | jq .is_virtual
[17:59:20] true
[17:59:37] I'm slightly confused as to how anything on the LLDP changed for the VMs, that's confirmed right?
[17:59:41] topranks: wtf
[17:59:42] cgoubert@kubestage1004:~$ sudo facter -p is_virtual
[17:59:44] false
[17:59:51] That was like 4 minutes ago lmao
[17:59:53] ok.........
[18:00:33] Oh wait, facter -p vs puppet facts
[18:00:51] oh wait... I'm on kubestagemaster1004
[18:00:55] not kubestage1004
[18:00:59] Oh wait my bad
[18:01:01] That's my fault
[18:01:25] ok ran puppet on one codfw node, noop
[18:01:48] Running on the broken node in eqiad
[18:02:07] and kubestage1004 is not a VM
[18:02:20] sorry.... sorry.......
[18:02:21] yeah yeah I just ssh'd to the wrong host
[18:03:21] wikikube-worker1016 unbroken
[18:03:28] awesome
[18:20:55] serviceops, Patch-For-Review: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245#11385211 (Scott_French) Alright, that went quite smoothly, if somewhat involved. Some observations: **Leadership transfer**: By the time we started work on this today, conf2005 had...
[18:50:41] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385387 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1063.eqiad.wmnet completed: - wikikube-worke...
[18:53:11] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385407 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1063.eqiad.wmnet completed: - wikikube-worker1...
[18:54:30] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385413 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1305.eqiad.wmnet completed: - wikikube-worke...
[18:55:18] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385420 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1313.eqiad.wmnet completed: - wikikube-worke...
[18:56:04] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385422 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1157.eqiad.wmnet completed: - wikikube-worke...
[19:01:23] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385434 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1305.eqiad.wmnet completed: - wikikube-worker1...
[19:03:14] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385458 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1313.eqiad.wmnet completed: - wikikube-worker1...
[19:03:45] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385461 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1157.eqiad.wmnet completed: - wikikube-worke...
[19:04:51] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385466 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1157.eqiad.wmnet completed: - wikikube-worker1...
[19:05:47] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385475 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker[1254-1256].eqiad.wmnet completed: - wikikube-...
[19:06:28] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385477 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1306.eqiad.wmnet completed: - wikikube-worker1...
[19:07:17] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385481 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1306.eqiad.wmnet completed: - wikikube-worke...
[19:09:11] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385498 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker[1254-1256].eqiad.wmnet completed: - wikikub...
[19:25:37] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385581 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker[1254-1256].eqiad.wmnet completed: - wikikube-...
[19:25:50] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385582 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1306.eqiad.wmnet completed: - wikikube-worker1...
[19:44:52] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385707 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 depool for host wikikube-worker1016.eqiad.wmnet completed: - wikikube-worke...
[19:46:11] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385710 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by robh@cumin2002 pool for host wikikube-worker1016.eqiad.wmnet completed: - wikikube-worker1...
[19:48:22] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11385733 (RobH) a:Clement_Goubert→RobH IRC Discussion Update: We moved about half the wikikube workers today after a sync up with Clement and Cathal on the particular...
[20:41:35] serviceops, collaboration-services, MW-on-K8s, SRE: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858#11386004 (Dzahn)
[20:42:39] serviceops, collaboration-services, MW-on-K8s, SRE: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858#11386020 (Dzahn) Tagging collab; since we are probably the ones who need to get back to this nowadays. Sorry for the delay; slipped off the radar.
[21:25:38] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11386182 (Scott_French) @RLazarus and I were looking into verifying workloads returning to the migrated workers, and ran into a few surprises. Going by what's been marked com...
[21:47:35] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11386255 (RobH) >>! In T405950#11386181, @Scott_French wrote: > @RLazarus and I were looking into verifying workloads returning to the migrated workers, and ran into a few sur...
[21:56:31] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11386300 (Scott_French) Got it - thanks for clarifying, @RobH! Alright, in that case, let us know if you'd like a second pair of eyes on anything ahead of the next wave of mig...
[22:32:15] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11386421 (RobH) I think it'll be ok when we move things tomorrow, since I know exactly the mistake I made I don't think I'll make it again for a few months minimum ; D The cu...