[05:31:27] serviceops, CXServer, LPL Projects (Other), Unplanned-Sprint-Work: cxserver: Remove Yandex MT key from production - https://phabricator.wikimedia.org/T408138#11399287 (KartikMistry) Possible to do the removal of the key this week? @jijiki
[09:05:21] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11399489 (brouberol) I've started rebalancing `kafka-logging-eqiad` with the following overall plan: ` Storage free change estimations:...
[09:34:46] serviceops, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T410858 (LSobanski) NEW
[09:34:56] serviceops, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T410858#11399550 (LSobanski) Also eqiad-staging and codfw-staging.
[09:43:54] serviceops, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T410858#11399655 (Clement_Goubert) Open→Resolved a:Clement_Goubert
[11:21:19] serviceops, Content-Transform-Team, Wikifeeds, Wikipedia-Android-App-Backlog: Significant increase in wikifeeds latency and mobileapps error rate since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11400066 (Jgiannelos) Would it make sense to try to upgrade from node 18 to 20 to see if t...
[11:27:46] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11400092 (brouberol) {F70572765} All done
[11:29:44] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11400103 (brouberol) Because kafka-logging has heterogeneous partition sizes, the leadership count is expected, as it leads to traffic and...
[13:33:26] hello everyone there's a pending changeprop change (https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1210505) after it's merged i'd like to apply it, I don't foresee any issues so this is just a FYI. yell if I shouldn't :)
[13:56:26] the change is applied
[14:50:36] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11400705 (brouberol) a:brouberol→Clement_Goubert
[15:08:37] 👋 who would be the PoC for thumbor stuff?
[15:12:01] vgutierrez: I *was* at least, anything I can help with? At the very least I can inflict I mean triage upon someone else
[15:12:02] hnowlan: it looks like you've written thumbor's haproxy config
[15:12:16] we need to reduce the queue timeout
[15:13:00] the current 10s is kinda dangerous during attacks
[15:15:19] sgtm, any recommendations for what to drop it to? 5s to start?
[15:15:41] 2s? :)
[15:16:57] also if I'm reading this correctly... for a single request we could have 10 seconds of queuing... 2 seconds of connect timeout to a server + response time on the retry or another 2 seconds of connect timeout, right?
[15:17:52] serviceops, Epic, MediaWiki-Platform-Team (Kanban Board): Migrate Wikimedia production from PHP 8.1 to PHP 8.3 - https://phabricator.wikimedia.org/T360995#11400830 (Krinkle)
[15:18:19] 1 second of connect timeout sorry
[15:18:35] so 10 + 1 + response time or 10 + 1 +1 till a 503
[15:21:15] serviceops, Epic, MediaWiki-Platform-Team (Kanban Board): Migrate Wikimedia production from PHP 8.1 to PHP 8.3 - https://phabricator.wikimedia.org/T360995#11400854 (Krinkle)
[15:21:37] hnowlan: happy to submit the 2 seconds CR and we can discuss there / involve whoever you consider necessary
[15:22:13] vgutierrez: sounds good. and yeah I think your figures are correct
[15:28:48] hnowlan: cool, https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1210611
[15:29:45] serviceops, Epic, MediaWiki-Platform-Team (Kanban Board): Migrate Wikimedia production from PHP 8.1 to PHP 8.3 - https://phabricator.wikimedia.org/T360995#11400951 (Krinkle)
[15:34:27] hnowlan: what's the procedure in terms of getting that merged and deployed?
[15:35:04] vgutierrez: it's the standard k8s service workflow, I can +2 it and deploy it for you if you'd like
[15:35:12] hnowlan: thx <3
[15:35:23] BTW this is the kind of thing we want to avoid: https://grafana.wikimedia.org/goto/iHDdR3iDg?orgId=1
[15:38:26] yeah, fair enough
[15:39:47] Honestly the queues are fairly ineffectual in general. I wouldn't want to not have them, but we're either at a veerrrry low queue level or we're close to down.
[15:40:03] the worker model needs to be overhauled
[15:40:15] I'll deploy once the backport window is done
[15:40:55] thx
[16:06:28] serviceops, Content-Transform-Team, Wikifeeds, Wikipedia-Android-App-Backlog: Significant increase in wikifeeds latency and mobileapps error rate since 2025/11/13 - https://phabricator.wikimedia.org/T410296#11401188 (Scott_French) @Jgiannelos - Yes, I think that would be highly informative. I sus...
[16:25:43] vgutierrez: heh, oops - that value is overridden in helmfile.d :/ it's not 10s, it's 30s! which is a carry-over from the metal days. I've filed fixes to bring it to 10s and then 2s
[16:26:02] ouch
[16:28:26] serviceops, collaboration-services, MW-on-K8s, SRE: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858#11401282 (LSobanski) p:Medium→Low
[17:37:39] serviceops, MW-Interfaces-Team, MediaWiki-Platform-Team (Kanban Board), OKR-Work: api-gateway chart: support rate limits for multiple time units - https://phabricator.wikimedia.org/T408132#11401849 (daniel) Open→Resolved a:daniel Deployed and tested. We still only use per-hour lim...
[18:28:52] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11402180 (RobH) conf1009 migrated, @brouberol: Please provide feedback on migration of wikikube-ctrl1003 and kafka-main1008 as these are the last #serviceops hosts to migrate...
[18:49:52] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11402364 (RobH) >>! In T405950#11402180, @RobH wrote: > conf1009 migrated, > > @brouberol: Please provide feedback on migration of wikikube-ctrl1003 and kafka-main1008 as the...
[18:52:52] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11402389 (RobH) IRC Echo Update (chatting with Scott in irc about this just echoing to task for history): * We want to get feedback from @brouberol on migration of kafka-main...
[19:55:19] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11402695 (RobH) I've chatted with @brouberol via IRC: > 11:50 kafka hosts can be shut down / disconnected from the network, but not more than one at a time, to b...
[20:57:59] serviceops, Patch-For-Review: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955#11402905 (Scott_French) Both deployment hosts have now had their local PHP CLI installations migrated to 8.3: ` swfrench@deploy2002:~$ php -v PHP 8.3.26 (cli) (built: Oct 10 2025...
[20:58:15] serviceops, Patch-For-Review: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955#11402908 (Scott_French)
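
Note on the thumbor timeout arithmetic discussed between 15:16 and 16:25: the worst case before a 503 was described as queue timeout + connect timeout + a second connect timeout on the retry. The sketch below only reproduces that sum for the three queue timeouts mentioned (10s assumed, 30s actually set in helmfile.d, 2s proposed). It assumes exactly one retry and that the waits simply add up; the function name is hypothetical, and this is not a model of how haproxy actually accounts for time.

```python
def worst_case_seconds_to_503(queue_timeout, connect_timeout, connect_attempts=2):
    """Upper bound on the wait before a 503 when every connect attempt times out.

    Assumption (from the chat, not from haproxy itself): the request first waits
    the full queue timeout, then each connect attempt burns the full connect
    timeout. connect_attempts=2 means the first attempt plus one retry.
    """
    return queue_timeout + connect_timeout * connect_attempts


# Queue timeouts mentioned in the conversation; connect timeout is 1 second.
scenarios = [
    ("assumed in the chat", 10),
    ("actual helmfile.d override", 30),
    ("proposed", 2),
]

for label, queue_timeout in scenarios:
    total = worst_case_seconds_to_503(queue_timeout, connect_timeout=1)
    print(f"{label}: queue {queue_timeout}s -> up to {total}s before a 503")
```

With the 10s figure this gives the "10 + 1 + 1" from the chat; the same sum with the 30s override is what made the real exposure larger than first thought.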