[10:24:46] serviceops, Cassandra, Growth-Team, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), Patch-For-Review: mediawiki: migrate from image-suggestion to data-gateway - https://phabricator.wikimedia.org/T368096#11093511 (Michael) Looking at our longer term KPI dashboard that records actual user interactio...
[11:50:28] serviceops, Abstract Wikipedia team (26Q1 (Jul–Sep)), Essential-Work: Cannot deploy function-orchestrator in staging environment due to insufficient quotas - https://phabricator.wikimedia.org/T401833#11093746 (Jdforrester-WMF) Open→Resolved a:akosiaris Confirm this is now fixed via a...
[12:01:39] serviceops: Puppet CA certificate push-notifications.discovery.wmnet is about to expire - https://phabricator.wikimedia.org/T402183 (MoritzMuehlenhoff) NEW
[12:11:13] nemo-yiannis: I was looking at the apache logs of mw-parsoid because I was wondering if we could start tearing the deployment down (it alerts sometimes, mostly because of the very low traffic which means any long-ish request skews pXX), and there seems to be a couple of routes still hitting there (see https://logstash.wikimedia.org/goto/dd5271cef6df17ea3cf2d0fcc2383fab ), is that normal in your
[12:11:15] opinion or should I keep digging?
[12:12:01] Actually not a couple, just /transform/wikitext/to/lint afaict
[12:20:33] I think there are still some pending /transform/ endpoints that are still to be migrated to core
[12:20:37] There is work in progress
[12:21:11] https://phabricator.wikimedia.org/T388401
[12:21:26] If i am not mistaken this is the last thing pending
[12:22:32] serviceops: Puppet CA certificate push-notifications.discovery.wmnet is about to expire - https://phabricator.wikimedia.org/T402183#11093825 (Clement_Goubert) Replaced by a cert-manager cfssl based cert in kubernetes, cleaning up.
[12:24:22] serviceops: Puppet CA certificate push-notifications.discovery.wmnet is about to expire - https://phabricator.wikimedia.org/T402183#11093829 (Clement_Goubert) Open→Resolved p:Triage→Medium a:Clement_Goubert
[12:26:45] nemo-yiannis: Cool, thanks for the link. It's probably not worth 25 replicas to serve these last few endpoints though, so we could probably scale down further, what do you think?
[12:26:55] yes probably
[12:27:02] i think the endpoint gets minimal traffic
[12:27:54] Excluding healthchecks, mw-parsoid gets about 1000 requests per week over the last two weeks
[12:28:09] so yeah very minimal
[12:28:32] Sampled traffic shows 0.6K wikitext/to/lint requests the past 7 days
[12:29:00] agreed
[13:16:13] claime: i think the ticket was just closed
[13:17:14] but i assume it also needs routing changes
[14:24:32] nemo-yiannis: yeah we would need to add the proper routing to the gateway-check config, like we did for /api/rest_v1/transform/wikitext/to/html
[14:24:48] so it gets passed to the rest-gateway, which has the right configuration already
[14:40:34] nemo-yiannis: looking at https://phabricator.wikimedia.org/T385066 it seems internal functional tests are next, and then we'll start rerouting
[15:29:22] serviceops, Page Content Service, Content-Transform-Team (Work In Progress): mobileapps is comparatively slower to handle changeprop events - https://phabricator.wikimedia.org/T397750#11094689 (Jgiannelos) This should be fixed by now after T397072
[15:29:30] serviceops, Page Content Service, Content-Transform-Team (Work In Progress): mobileapps is comparatively slower to handle changeprop events - https://phabricator.wikimedia.org/T397750#11094690 (Jgiannelos) Open→Resolved a:Jgiannelos
[15:32:45] T397072
[15:33:07] sigh, wrong channel
[15:53:28] serviceops, Page Content Service, Content-Transform-Team (Work In Progress): mobileapps is comparatively slower to handle changeprop events -
https://phabricator.wikimedia.org/T397750#11094776 (hnowlan) fwiw I think that many of the performance issues here are still unresolved. I'm happy to resol...
[16:18:08] serviceops, Page Content Service, Content-Transform-Team (Work In Progress): mobileapps is comparatively slower to handle changeprop events - https://phabricator.wikimedia.org/T397750#11094897 (Jgiannelos) From what I understand from the execution times diagram it looks like the numbers have drop...
[17:09:23] serviceops, Page Content Service, Content-Transform-Team (Work In Progress): mobileapps is comparatively slower to handle changeprop events - https://phabricator.wikimedia.org/T397750#11095170 (Jgiannelos) I assume the main concern is the behaviour identified in https://phabricator.wikimedia.org/...
[21:55:02] serviceops, Scap, Patch-For-Review: Provide MediaWiki app image PHP version in helm values - https://phabricator.wikimedia.org/T401721#11096229 (Scott_French) In short, not all deployers are able to invoke raw docker commands (i.e., in this case, `docker image inspect`), so the current approach that...
[23:31:56] serviceops, SRE: deploy1003 running out of disk space - https://phabricator.wikimedia.org/T401647#11096500 (Ladsgroup) We ran into this again tonight. This seems to be the biggest problem: ` root@deploy1003:/srv/homedirs# du -hs mwmaint* 14G mwmaint1002 13G mwmaint2002 `
[23:33:10] serviceops, SRE: deploy1003 running out of disk space - https://phabricator.wikimedia.org/T401647#11096508 (Ladsgroup) Top offenders: ` 1293600 tstarling 1774084 oblivian 1781788 samtar 2098668 cparle `
[23:33:15] serviceops, SRE: deploy1003 running out of disk space - https://phabricator.wikimedia.org/T401647#11096509 (Ladsgroup) Resolved→Open
[23:35:14] serviceops, SRE: deploy1003 running out of disk space - https://phabricator.wikimedia.org/T401647#11096512 (dancy) I freed a ton of space by running `scap clean-images`.
Note that currently you must be a member of the `docker` group to successfully run this command.
[23:42:22] serviceops, SRE: deploy1003 running out of disk space - https://phabricator.wikimedia.org/T401647#11096519 (dancy) Noting that we're building 8.1 and 8.3 multiversion images now, and the single "next" version images too. And there have been a few separate adjustments made to the base images recently, so...
[23:45:51] serviceops, SRE: deploy1003 running out of disk space - https://phabricator.wikimedia.org/T401647#11096527 (Scott_French) I now wonder if this has also been exacerbated by {T402212}, which would have resulted in unnecessary full rebuilds. This should be better as of today, thanks to the workarounds @danc...