[12:08:03] hello everyone, I just caught up with the conversation, sorry that you now have a bug to fix, but glad to have somehow contributed. I can confirm the endpoint has always worked for me, it was just the Swagger spec that went missing. I guess I can keep using it without worrying then. Thank you very much! [12:22:17] sandrello: you're welcome! [15:19:00] Krinkle: I think this may be T348254, there was a rollout of parsoid to various wiktionaries around Feb 26-27 [15:19:01] T348254: Add ParserOutput::getHtmlHolder() - https://phabricator.wikimedia.org/T348254 [15:19:32] large pages on mwdebug demonstrate the problem quite well, e.g. https://performance.wikimedia.org/excimer/profile/317c87eee175962f [15:22:20] I might just file a separate task for an interim fix, as post-cache transforms currently take up 5.3% or so of index.php wall time [15:29:32] Filed as T394059 [15:29:32] T394059: Post-cache output transforms are expensive on large pages - https://phabricator.wikimedia.org/T394059 [15:53:35] fyi: fixing this pipeline performance issue is the thing that we're working on... right now (as in: it was one of the topics of our tech meeting just now) [15:53:52] thanks mszabo for the patch, that's super helpful :) [15:54:29] Yeah, I just came across your task while digging and hesitated whether to file a new one, but figured it might be worth it since this is currently having an impact on non-parsoid projects too [15:56:27] that is vaguely more disturbing though, because it kind of "shouldn't" - the only thing we've done (theoretically) for these was a refactor that ""should"" not have had a major impact. in particular, as i remember it, the dedupe code was probably taken as is from somewhere else :think [15:57:00] perhaps PHP 8.1 caused a performance regression within remex [15:57:27] but that wouldn't explain the large discrepancy between desktop and mobile [15:57:57] on large pages on mwdebug, the difference is 1:2 as expected (1 transform vs 2), but the graph ori shared shows a significantly larger uptick from late Feb onwards [16:00:47] (i just opened that page) yikes. [16:06:51] aha! [16:07:04] on that week we have deployed https://gerrit.wikimedia.org/r/c/mediawiki/core/+/989259 [16:07:25] which would be a strong contender if deduplicate styles is indeed iffy [16:11:27] oh that'd probably do it :D [16:12:49] hmm, looking at the context from https://gerrit.wikimedia.org/r/c/mediawiki/core/+/977814 - I suppose this might bring that problem back [16:13:02] so maybe we should limit the preg_replace_callback()'s scope to non-parsoid transforms [16:13:39] then we shouldn't mangle stray style tags in data-parsoid attributes and the impact of tokenizing the whole html for parsoid transforms should be addressed by your tasks [16:25:10] on the other hand your patch *limits* thee nodes on which it's executed but it doesn't make it execute on more nodes than it would have, right? [16:26:19] yea but Scott's original patch says "The previous implementation was using an ad-hoc regular expression whichwas matching inside the data-mw attribute of Parsoid output, eg: ..." [16:26:27] so that might still cause issues [16:26:36] yeah, but it was also doing the replace with regexps [16:26:41] so it was overreaching [16:27:02] since you're still using remex in there, there's a fair chance it's still fine [16:27:22] the optimization still does that too since we replace the element with the output of its transformed version, so it'd still end up containing unencoded quotes AIUI [16:27:37] remex wouldn't see the context [16:28:10] oh i see [16:29:22] yeah but ideally this is all interim anyways since the parsoid read world will eventually solve this via the DOM pipeline you described [16:29:33] it's going to be marvelous [16:29:42] :)) [16:30:32] maybe we'll update to PHP 8.3 soon to give y'all an opportunity to find new inconsistencies in its new HTML5 DOM implementation [16:31:11] c.scott has already started poking that bear :D [16:40:19] mmmh. scott points out that there's no reason that that specific patch would have an impact on mobile but not on desktop, and he has a point [16:40:35] ihurbain: we don't really much transforming if any on desktop. [16:40:40] that's mostly a mobilefrontend thing [16:41:04] so if use of transforms explains it, that would correlate naturally with mobile [16:41:16] the default html is more or less the desktop html [16:41:36] yeah but if we're looking specifically at deduplicatestyles, that one *would* be run on desktop, right? i think? [16:41:44] (let me check my stupid pipeline) [16:42:11] mszabo: https://phpc.social/@nielsdos/114491428964495361 [16:42:17] see that post and the rest of the thread up/down [16:42:29] it's a bit scattered but some analysis of the new HTML5 Php-dom api there [16:43:02] Yes, I would assume so. That's the TemplateStyles extension which afaik is a "normal" parser extension. Not different one way or the other. [16:43:13] ha, nice :) [16:43:20] yeah. and the patch that i was poking at was specifically a deduplicatestyle patch [16:43:22] DeduplicateStyles is a core transform [16:43:40] desktop: [16:43:42] it's a default since https://gerrit.wikimedia.org/r/c/mediawiki/core/+/393284 [16:43:44] mobile: [16:43:50] it is interesting that mobile uses <> and desktop [16:43:57] which suggests some normalise step is only applied on mobile [16:44:00] but yeah in practice most of the style tags will have come from TS [16:44:17] MobileFrontend has an additional post processing step which uses PHP DOM to build a document and process it via htmlformatter and co [16:44:24] I guess that pretty definitely shows that the desktop one isn't run through any full dom parse/serialize transform [16:44:29] yup [16:44:48] there are tasks to eventually subsume the MF subpipeline into core's own output transforms [17:16:15] waiiiiiiiiiit a minute [17:16:28] is it a matter of mobile vs desktop, or a matter of skin? [17:18:07] because i'm looking at the https://performance.wikimedia.org/excimer/profile/317c87eee175962f which has the TOC on top of it, and it is twice as large because we inject TOC (fine) (well, not, but you get my point), except TOC injection is not mobile-dependent, it's skin dependent [17:18:42] and vector2022 i *think* doesn't inject TOC because it handles TOC on its own, but plausibly mobile does? [17:22:43] (anyway. dinner time.) [18:13:15] I can't speak to what vector does or doesn't, but the "mobile formatter" is specific to mobile indeed, not by skin. [18:13:58] If you use useskin=minerva on desktop it is the skin we typically see on mobile but without the mobile html formatter transforms [18:14:30] Likewise, if you use useformat=mobile on regular URLs you'll trigger that transform even in vector. [18:14:55] ihurbain: ^ fyi [18:24:46] ihurbain: yeah sorry I was using confusing terminology - it's a minerva vs vector thing indeed [18:25:19] the skin must have toc: false set in its options in skin.json to disable the toc injection, which vector does (since vec2022 has the toc in the sidebar) and minerva doesnt [18:58:38] hm.. so both desktop (if using vector22) and mobile have a mututally exclusive transform. one for MobileFormatter, the other for TOC removal / non-insertion. Although there are still wikis and many users on desktop where regular Vectr is used, where TOC would be displayed. [18:59:03] Is the default TOC insertion where this spike comes from? [18:59:47] * Krinkle compares week over week flamegraphs [22:08:02] Krinkle: possibly, the TOC transform runs only on mobile since it performs the replacement of the incontent TOC placeholder with the actual TOC, whereas vector2022 instructs the output pipeline to eschew this so that it can manage its own toc rendering [22:08:22] but in both cases this is distinct from the DOM-based transforms that MobileFrontend is performing - those haven't been touched for a while [22:09:39] both DeduplicateStyles and HandleTOCMarkers are run by core's OutputTransformPipeline - they used to be disparate post-cache transforms living in ParserOutput::getText() until CTT cleaned the area up and introduced an abstraction to more clearly delineate and execute them