[03:05:27] 06serviceops, 10Citoid, 06Editing-team, 10RESTBase Sunsetting, and 2 others: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10627058 (10Ryasmeen)
[03:38:34] 06serviceops, 10Image-Suggestions, 10Structured Data Engineering, 06Structured-Data-Backlog: Migrate data-engineering jobs to mw-cron - https://phabricator.wikimedia.org/T388537#10627082 (10Ottomata)
[07:38:18] 06serviceops, 10Shellbox, 10SyntaxHighlight, 13Patch-For-Review, 07Wikimedia-production-error: Shellbox bubbles GuzzleHttp\Exception\ConnectException when it should probably wrap it in a ShellboxError? - https://phabricator.wikimedia.org/T374117#10627201 (10hashar)
[07:59:18] hnowlan: o/
[07:59:47] There are some patches lined up to move changeprop to node20 and librdkafka 2.3 (we have node18 and librdkafka 2.2 now)
[07:59:54] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1126215 (+nexts)
[08:00:35] I am going to double check but this time we didn't see bumps in memory/cpu usage in staging, so in theory I don't expect any fireword
[08:00:38] *firework
[08:00:50] but there is the switchover lined up so this may be postponed
[08:01:21] lemme know your preference - I can deploy changeprop and changeprop-jobqueue in eqiad today in case, and complete the rollout by end of week
[08:01:36] or we can postpone to after the switchover week, which is probably safer
[08:18:10] 06serviceops, 10CX-cxserver, 10LPL Essential (LPL Essential 2025 Feb-Mar), 13Patch-For-Review, 07Technical-Debt: Use openapi compliant examples in swagger spec - https://phabricator.wikimedia.org/T382294#10627292 (10Nikerabbit) Please create a new task for the remaining work so that this can be resolved.
[08:52:09] 06serviceops, 10MediaWiki-extensions-CentralAuth, 10MW-on-K8s, 10MediaWiki-Platform-Team (Radar), 13Patch-For-Review: Missing backfill_localaccounts periodic jobs - https://phabricator.wikimedia.org/T388564#10627387 (10ArielGlenn) >>! In T388564#10624852, @Clement_Goubert wrote: > @ArielGlenn I've create...
[09:02:16] elukey: This is already apparently being handled by MW teams per https://phabricator.wikimedia.org/T381588
[09:02:38] akosiaris: yep I am helping them :D
[09:02:46] which is the linked task in the change, I expected to see you subscribed on the task too
[09:02:52] 🤦
[09:03:00] ma bad, disregard
[09:03:31] nono it was a good hint, when they reached out saying "we'd like to upgrade changeprop" I almost cried
[09:04:01] didn't expect it so really glad about it :)
[09:04:19] lol, why did they reach out to you specifically though?
[09:04:45] the usual curse, git log
[09:04:55] I upgraded the last time :D
[09:05:00] lol
[09:05:13] it was way more brutal, node10 to node18 + librdkafka etc..
[09:05:28] but if we keep upgrading in small steps I hope it will get better
[09:06:21] yes, that's the hope
[09:06:42] we'll see how that pans out. There is no nodejs upgrade slated for APP next year, unlike this year.
[09:07:12] but then again, node20 is going to be ok until 30 Apr 2026
[09:13:12] 06serviceops, 06Infrastructure-Foundations, 10Maps (Kartotherian), 13Patch-For-Review: Scale up Kartotherian on Wikikube and move live traffic to it - https://phabricator.wikimedia.org/T386926#10627585 (10elukey) @Jgiannelos I have three things to propose: 1) Try to use jemalloc (see above patch) via LD_P...
[09:14:01] 06serviceops, 06Content-Transform-Team, 07Epic, 10Maps (Kartotherian): Move Kartotherian to Kubernetes - https://phabricator.wikimedia.org/T216826#10627588 (10elukey) Status: Kartotherian runs on k8s now! We are still investigating a slow memory leak in T386926, so we are not totally done.
[09:55:53] elukey: I'd say go ahead and see how we do
[09:55:57] thanks for checking in though
[10:45:02] ack! I sadly found out that the deploy to staging brought a bit more cpu/memory usage
[10:45:05] https://grafana.wikimedia.org/d/000300/change-propagation?orgId=1&var-dc=eqiad%20prometheus%2Fk8s-staging&from=1741622567919&to=1741697593357
[10:45:09] (see saturation graphs)
[10:45:28] it is similar to what happened the last time, I think that bumping librdkafka causes this for some reason
[10:45:45] (I don't think it is a viz issue due to avg/max being used)
[10:48:20] 06serviceops, 10MediaWiki-extensions-CentralAuth, 10MW-on-K8s, 10MediaWiki-Platform-Team (Radar), 13Patch-For-Review: Missing backfill_localaccounts periodic jobs - https://phabricator.wikimedia.org/T388564#10627838 (10Clement_Goubert) 05In progress→03Resolved Jobs are now deployed on the mainten...
[10:49:15] not *enormous* jumps though in the grand scheme of things
[10:49:47] interesting that it comes with increased network traffic also, similar jump. has a poll rate increased?
[10:51:57] not that I know, but maybe librdkafka 2.2 -> 2.3 causes this? (plus I imagine noderdkafka changes)
[12:35:11] 06serviceops, 13Patch-For-Review: MediaWiki on PHP 8.1 production traffic ramp-up - https://phabricator.wikimedia.org/T383845#10628235 (10TheDJ) ping @jijiki as scap deployer
[12:38:22] 06serviceops, 13Patch-For-Review: MediaWiki on PHP 8.1 production traffic ramp-up - https://phabricator.wikimedia.org/T383845#10628252 (10Clement_Goubert) >>! In T383845#10628231, @TheDJ wrote: > ping @jijiki as scap deployer for the possible change that kicked this error rate of T388659 up This is an unrelat...
[13:03:18] 06serviceops, 10Page Content Service, 10RESTBase Sunsetting, 07Code-Health-Objective, 07Epic: Move PCS endpoints behind API Gateway - https://phabricator.wikimedia.org/T264670#10628342 (10MSantos)
[13:04:50] 06serviceops, 10Page Content Service, 10RESTBase Sunsetting, 07Code-Health-Objective, 07Epic: Move PCS endpoints behind API Gateway - https://phabricator.wikimedia.org/T264670#10628350 (10MSantos)
[13:07:47] 06serviceops, 10Page Content Service, 10RESTBase Sunsetting, 07Code-Health-Objective, 07Epic: Move PCS endpoints behind API Gateway - https://phabricator.wikimedia.org/T264670#10628355 (10MSantos) p:05Low→03High
[13:08:41] 06serviceops, 10Page Content Service, 10RESTBase Sunsetting, 07Code-Health-Objective, and 2 others: Move PCS endpoints behind API Gateway - https://phabricator.wikimedia.org/T264670#10628357 (10MSantos)
[14:24:49] 06serviceops, 10Deployments, 10Shellbox, 10Wikibase-Quality-Constraints, and 4 others: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker - https://phabricator.wikimedia.org/T371633#10628743 (10Lucas_Werkmeister_WMDE)
[14:26:55] 06serviceops, 10Deployments, 10Shellbox, 10Wikibase-Quality-Constraints, and 4 others: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker - https://phabricator.wikimedia.org/T371633#10628754 (10karapayneWMDE) To do: Update the gerrit change to catch the ClientExceptionInt...
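On the poll-rate question above (10:49–10:51): changeprop consumes Kafka through node-rdkafka, and the fetch cadence toward the brokers is largely governed by librdkafka consumer properties rather than application code, so a 2.2 → 2.3 bump can shift network behaviour even with an unchanged chart. Below is a minimal sketch of where those knobs live, assuming node-rdkafka's standard KafkaConsumer API; the broker, topic, group, and values are illustrative, not changeprop's real configuration.

```typescript
// Sketch only: shows which librdkafka consumer properties influence how often
// the client fetches from the brokers. Names/values here are examples, not
// the settings changeprop actually ships in deployment-charts.
import * as Kafka from 'node-rdkafka';

const consumer = new Kafka.KafkaConsumer(
  {
    'metadata.broker.list': 'kafka-example1001:9092', // hypothetical broker
    'group.id': 'changeprop-example',
    // How long a fetch request may wait server-side before returning even
    // without fetch.min.bytes of data; lower values mean more round trips.
    'fetch.wait.max.ms': 100,
    'fetch.min.bytes': 1,
    // Emit internal librdkafka statistics so a cadence change is visible.
    'statistics.interval.ms': 60000,
  },
  { 'auto.offset.reset': 'latest' }
);

consumer
  .on('ready', () => {
    consumer.subscribe(['example.topic']);
    consumer.consume(); // flowing mode: the library drives the fetch loop
  })
  .on('data', (msg) => {
    console.log(`offset ${msg.offset} from ${msg.topic}`);
  })
  .on('event.stats', () => {
    // Inspect librdkafka stats here to compare fetch behaviour across versions.
  });

consumer.connect();
```

Comparing the emitted statistics (or the Grafana saturation panels linked at 10:45) before and after the library bump is one way to tell whether the extra network traffic comes from a changed fetch cadence or simply from higher per-message overhead.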
[14:27:09] 06serviceops, 10Deployments, 10Shellbox, 10Wikibase-Quality-Constraints, and 5 others: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker - https://phabricator.wikimedia.org/T371633#10628756 (10karapayneWMDE)
[14:34:47] 06serviceops, 06Infrastructure-Foundations, 10Maps (Kartotherian): Scale up Kartotherian on Wikikube and move live traffic to it - https://phabricator.wikimedia.org/T386926#10628841 (10elukey) Deployed the jemalloc change to staging, and verified that jemalloc's so is loaded: ` elukey@kubestage1005:~$ sudo...
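The verification command at 14:34:47 is truncated in this log. As a rough companion to the jemalloc-via-preload idea proposed at 09:13, here is a minimal sketch, not what was run on kubestage1005, of how the same check could be done from inside the Node process itself, assuming a Linux /proc filesystem and a jemalloc library whose path contains "jemalloc".

```typescript
// Sketch only: confirm from inside a Node service (e.g. Kartotherian) that a
// preloaded allocator such as jemalloc is actually mapped into the process.
// Assumes Linux; library path and env-var handling are illustrative.
import { readFileSync } from 'node:fs';

function mappedJemallocLibs(): string[] {
  // /proc/self/maps lists every memory-mapped file for the current process;
  // the last whitespace-separated field of each line is the pathname.
  const maps = readFileSync('/proc/self/maps', 'utf8');
  const libs = new Set<string>();
  for (const line of maps.split('\n')) {
    const path = line.trim().split(/\s+/).pop() ?? '';
    if (path.includes('jemalloc')) {
      libs.add(path);
    }
  }
  return [...libs];
}

const found = mappedJemallocLibs();
console.log('LD_PRELOAD =', process.env.LD_PRELOAD ?? '(unset)');
console.log(
  found.length > 0
    ? `jemalloc mapped: ${found.join(', ')}`
    : 'jemalloc not mapped into this process'
);
```

A check like this can run at service startup and log the result, which makes it easy to see at a glance whether the allocator swap is actually in effect when comparing memory graphs before and after the change.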