[17:50:35] I'm looking at action=raw and regret what I mentally started.
[17:51:21] wfHttpError is used and claims to be "simple", but there ain't nothing simple about it once it calls wgOut->sendCacheControl() and the global monkey zoo it exposes in terms of global state from whatever may have run before.
[17:51:45] and apparently throwing HttpError results in cache headers and vary headers sometimes but not other times, depending on how far down the class you throw it
[18:08:35] oh gosh, it supports parsing wikitext, too?
[18:08:41] and has a hook for modifying its output
[18:10:09] * Krinkle feels like Marty in https://www.youtube.com/watch?v=mGrFtyd_LMI&t=99s
[18:13:09] the 'monkey zoo' bit made me laugh
[18:19:54] Krinkle: it predates query.php and api.php no? feels like people just stuffed in whatever features they needed :p
[18:21:17] I definitely used it a lot back in the day
[18:30:49] I'm going to propose we limit it to js/css/json/wikitext for latest (public) revs only. Thus removing 'oldid' support (and revdel/auth requirements), 'templates' support (preproc parsing), partial response with the 'section' param, and the 'maxage'/'smaxage' params.
[18:34:30] if a task is created, feel free to toss me into the subscriptions, I don't have strong thoughts about it right now, need to mull it over
[18:35:06] might be nice to see what sort of traffic that gets with what kinds of params, these days
[18:42:49] yeah, the parts that remove support and change 200s to 40xs will not be done overnight and we need to look at usage indeed
[18:43:17] https://phabricator.wikimedia.org/T279120
[18:47:33] ty
[19:30:26] about 30% of action=raw is spent computing EchoSeenTime, because of its hook in OutputPage 304 handling
[19:30:41] to determine whether or not to miss the cache to show you an echo bell
[19:30:50] except those don't appear on action=raw or action=render
[19:31:21] https://performance.wikimedia.org/arclamp/svgs/daily/2021-04-02.excimer-wall.index.svgz click into RawAction
[19:39:21] * bd808 watchlists [[mw:User:Krinkle/Gadgets 3.0]]
[19:53:47] can we have 2.0 first?
[19:55:58] I love the recursion of this issue - I love the recursion
[19:56:13] https://phabricator.wikimedia.org/T258347#6969078
[19:56:54] also - we are apparently still serving pretty much the entire PDF content as file metadata through the API
[19:57:03] from OCR
[19:57:15] I think Tim proposed recently in light of the image table issue to move that out
[19:57:57] might be worth considering whether we still want to expose that in this manner in full as well, or keep it internal for use only to feed the search index
[19:58:49] callers probably don't expect megabytes of strings when asking for media file metadata, I wonder if that means mediaviewer and mobile apps download that too
[19:59:57] is that one of the use cases that "revision slots" were supposed to help with?
[20:00:15] mediaviewer does not support paged media.
[20:00:28] that == derived data that needs to be cached somewhere
[20:00:30] the desktop media viewer, at least.
[20:01:26] derived revision slots were discussed but never implemented.
[20:09:38] Cindy recently implemented storage for derived slots
[20:10:09] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/669277
[20:10:19] oh, cool!
[20:10:31] not much on top of it, just the storage layer
[20:11:28] not sure about where it's going yet, you should ask her if you need more details
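The EchoSeenTime cost mentioned at [19:30:26] comes from an extension hooking into OutputPage's 304 (If-Modified-Since) handling, a path that action=raw also goes through, so the per-user lookup runs even on requests that never render the notification bell. Below is a minimal sketch of what such a handler can look like, assuming the OutputPageCheckLastModified hook and a hypothetical NotificationSeenTimeStore and ExampleSeenTimeHooks class; this is not Echo's actual code.

    class ExampleSeenTimeHooks {
        /**
         * Any timestamp added to $modifiedTimes feeds into the Last-Modified
         * comparison, so a recent per-user "seen" time prevents a 304 and lets
         * the notification badge refresh - even on action=raw, which never
         * shows the badge.
         */
        public static function onOutputPageCheckLastModified( array &$modifiedTimes, OutputPage $out ) {
            $user = $out->getUser();
            if ( !$user->isRegistered() ) {
                return;
            }
            // Hypothetical store; the real extension reads a cached seen-time.
            $seenTime = NotificationSeenTimeStore::get( $user );
            if ( $seenTime !== null ) {
                $modifiedTimes['notifications-seen'] = $seenTime;
            }
        }
    }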
[20:29:50] Krinkle: do you have a minute for a quick question?
[20:30:13] sure
[20:30:39] do you think the split of ParserCache metrics by content-model is useful?
[20:31:14] we are moving away from the idea of a page content model, it doesn't make much sense in the MCR world
[20:31:43] so I was going to remove the split and update the dashboard. Maybe add some fake 'wikitext' there for metrics continuity
[20:31:58] but if the split is useful - will leave it as is for now
[20:35:12] Krinkle: patch in question: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/674616
[20:38:48] Pchelolo: interesting, I'm not sure. instinctively I feel like saying yes, especially to differentiate between "normal" articles and e.g. everything else (Wikidata, Flow, etc.)
[20:39:41] but I'm not sure because maybe we don't actually do anything significantly different there from a parsercache and output handling perspective.
[20:40:19] I don't know enough about how Flow and Wikibase use ParserCache on their pages to know if that would mess with the usefulness of the overall metric
[20:40:21] from the correctness standpoint, what's cached in there is the combined renders of all the slots of the revision, each slot having its own content model. And the page content model is essentially the main slot content model
[20:40:45] and each slot rendering can mark the output uncacheable
[20:41:31] can, but generally doesn't afaik. I would expect non-wikitext slots to not have abilities like {{int}} or {{CURRENTTIME}}
[20:42:00] and things like Flow may be entirely uncacheable/personalised for logged-in
[20:42:19] WikibaseMediaInfo seems to use OOUI
[20:42:31] and until recently I did not realize that that output ends up in the parser cache
[20:42:38] I don't know if that's an unused artefact or actually used
[20:43:15] same for wikidata entity pages, not sure how much there is dynamic vs cached with regards to rendering that.
[20:43:23] but maybe it's all the same as articles and nothing special
[20:43:35] ok, makes sense. I guess I'll leave it for now - it's not a big deal, just a little dependency on Title
[20:43:50] in terms of diagnosing incidents it would surely help if e.g. cache hits drop, to know why that happens.
[20:44:08] eventually we'd get rid of it and replace it with something like a 'page type' concept
[20:44:37] so yeah, I think it would help in that sense. but maybe we can derive it in a cheaper way if ParserOutput knows mainslot->contenttype
[20:45:33] metrics are for cache misses too, so we might not have a ParserOutput..
[20:45:43] right.
[20:45:53] but we'd still end up with one to serve
[20:46:24] the DI issue - $wikiPage->getContentModel() - that doesn't have to use Title I suppose
[20:46:39] and page content type seems like a significant enough thing to be able to query one way or another long-term without Title
[20:46:44] yeah, not a wikiPage anymore :)
[20:46:46] part of page record I guess?
[20:46:58] can be updated when you get to that
[20:47:02] to use that instead
[20:47:45] ok. sounds good. Will restore the content-model for now
[20:47:47] thank you
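A footnote on the main-slot point at [20:40:21] and the Title dependency raised at [20:46:24]: since under MCR the "page content model" is effectively the content model of the revision's main slot, a ParserCache metric label could be derived from a RevisionRecord without going through Title or WikiPage. A rough sketch under that assumption; the helper name and the metric key are made up for illustration, and it only covers the case where a RevisionRecord is at hand.

    use MediaWiki\Revision\RevisionRecord;
    use MediaWiki\Revision\SlotRecord;

    // Label a ParserCache metric by the main slot's content model rather than
    // asking Title/WikiPage for a page-level content model.
    function parserCacheMetricLabel( RevisionRecord $rev ): string {
        $mainSlot = $rev->getSlot( SlotRecord::MAIN, RevisionRecord::RAW );
        return $mainSlot->getModel(); // e.g. 'wikitext', 'json', 'flow-board'
    }

    // Hypothetical usage with a statsd-style counter:
    // $stats->increment( 'ParserCache.save.' . parserCacheMetricLabel( $rev ) );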