[10:30:15] Not sure if sandrello is still here but what you're looking for might be: https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_random__format_ and I can see that it's still supported, though "stability" is marked as "unstable".
[11:11:41] xSavitar: hm.. I don't see it listed. CTRL-F for "random" gives me no results.
[11:12:00] These are the swagger docs sandrello referred to: https://usercontent.irccloud-cdn.com/file/TSypF6O4/Screenshot%202025-05-12%20at%2012.11.43.png
[11:33:12] Krinkle, this is very strange. I saw random on that list (I wish I had taken a screenshot). But for some reason, I can't see it again.
[11:34:20] If you click https://en.wikipedia.org/api/rest_v1/page/random/title, you should see the response of a random article.
[11:34:40] So the endpoint is still there and working as expected.
[11:50:34] Yep
[11:50:41] but the docs are missing
[11:50:58] cc duesen bpirkle, I'm guessing something about swagger changes may've caused that to go missing
[11:51:07] and you probably had it cached once from before?
[12:00:36] The endpoint lives in the wikifeeds service and the last patch that touched the random-related endpoint is https://github.com/wikimedia/mediawiki-services-wikifeeds/commit/e93bb639f30c391ea791359d7bb91a7e27c1bd3f
[12:01:04] Maybe something has happened elsewhere between then and now that has made the spec no longer render?
[12:31:26] xSavitar: that patch is tagged with T267223
[12:31:26] T267223: Move /random endpoint from RESTBase to Wikifeeds - https://phabricator.wikimedia.org/T267223
[12:31:55] which says that the endpoint used to be in RESTBase, but when Wikifeeds moved to become an independent service, a copy of this endpoint was made in the new service.
[12:32:11] so assuming REST Gateway is handling this, that suggests indeed that it no longer exists in restbase and thus is no longer part of its swagger spec
[12:32:18] Actually, something else seems to be happening which is very weird - https://phabricator.wikimedia.org/M338
[12:32:23] I don't see a commit on that task that removes it from restbase, but it makes sense
[12:32:26] I was able to see the /page/random again
[12:33:10] xSavitar: https://en.wikipedia.org/api/rest_v1/?spec
[12:33:39] `$ curl -i 'https://en.wikipedia.org/api/rest_v1/?spec' | grep -i random`
[12:33:43] exit 1, for me
[12:33:47] So what I did is potentially reproducible. I visited this link: https://en.wikipedia.org/api/rest_v1/ then uncollapsed all the sections, then refreshed/reloaded, and then somehow the /page/random endpoint shows.
[12:34:19] there's no state there afaik. it's a static JSON request
[12:35:48] TBH, I don't know why it shows sometimes and doesn't other times. But whatever is causing it may need to be looked at since it's not showing what it's supposed to be showing.
[12:37:04] xSavitar: that depends. if it was removed from restbase and now exists privately within wikifeeds for compat, then it makes sense that it is no longer advertised in restbase.
[12:37:18] it might be that it is stuck in your nearest varnish cache for some reason
[12:37:21] what does curl give you?
[12:37:56] https://phabricator.wikimedia.org/M338/1123/
[12:39:14]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[12:39:14]                                  Dload  Upload   Total   Spent    Left  Speed
[12:39:14] 100  106k    0  106k    0     0  89421      0 --:--:--  0:00:01 --:--:-- 89441
[12:40:54] This same link https://en.wikipedia.org/api/rest_v1/?spec when I search random, (sometimes) I find a hit and other times I don't.
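To make the two observations above concrete, a minimal sketch assuming only curl and grep (the expected 200 status is an inference from the chat, not verified here):

```
# 1. The endpoint itself still works (per the 11:34 messages):
curl -s -o /dev/null -w '%{http_code}\n' \
  'https://en.wikipedia.org/api/rest_v1/page/random/title'   # expect 200

# 2. ...but whether the spec advertises it varies between requests
#    (per the 12:33-12:40 exchange); count lines mentioning "random":
curl -s 'https://en.wikipedia.org/api/rest_v1/?spec' | grep -ci random
```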
[12:41:26] do you see anything in the headers above the json response that correlates with it finding and not finding?
[12:42:02] e.g. something in the `x-cache` header maybe
[12:42:58] $ curl -i -s 'https://en.wikipedia.org/api/rest_v1/?spec' | grep -iE 'x-cache:|\bage:|random'
[12:42:58] age: 0
[12:42:58] x-cache: cp3067 miss, cp3067 pass
[12:45:46] Yes, when I don't find it, the spec response is 14.2 kB and when I find it, the response is 17.7 kB. No difference in the x-cache header between the two requests -- pasting below
[12:46:11] (without): x-cache
[12:46:12] cp6011 miss, cp6009 pass
[12:46:22] server: restbase1034
[12:46:37] (with): x-cache
[12:46:38] cp6012 miss, cp6009 pass
[12:46:50] server
[12:46:51] restbase1043
[12:47:57] cp6011 and cp6012 are different servers. but the miss status should mean that it isn't a cache problem, so which one you proxy through probably isn't important, but it would be interesting if it correlates that way consistently for you, i.e. cp6012 with and cp6011 without.
[12:47:57] So a response from `restbase1042` doesn't include the /page/random spec
[12:48:06] is that consistently the case?
[12:48:07] But the one from restbase1043 does
[12:48:13] i.e. if you repeat it
[12:49:16] restbase1043 consistently has /page/random, yes!
[12:49:51] Also, the x-cache for restbase1043 is consistent too: `cp6012 miss, cp6009 pass`
[12:50:59] Other restbase instances such as restbase1038, 1035, etc. don't have /page/random in the spec
[12:52:06] Consistently for me, for every request that hits restbase1043, I get the /page/random section in the spec. Everything else doesn't show it.
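A sketch of the repeat-and-correlate check described in this exchange, assuming bash with curl and grep (the loop count of 10 is arbitrary; the hostnames it would print are the restbase backends quoted above):

```
# Fetch the spec ten times; for each response, print which restbase
# backend served it (the "server" header) alongside whether the body
# advertises the random endpoint.
for i in $(seq 1 10); do
  resp=$(curl -si 'https://en.wikipedia.org/api/rest_v1/?spec')
  backend=$(printf '%s\n' "$resp" | grep -i '^server:' | tr -d '\r')
  if printf '%s\n' "$resp" | grep -qi random; then
    echo "$backend -> has /page/random"
  else
    echo "$backend -> missing /page/random"
  fi
done
```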
[13:10:58] xSavitar: nice. want to file a task for that? (tagged MWI and RESTBase, with a ref to T267223 and the above user report as well!)
[13:10:58] T267223: Move /random endpoint from RESTBase to Wikifeeds - https://phabricator.wikimedia.org/T267223
[13:29:43] Krinkle https://phabricator.wikimedia.org/T393897.
[13:30:32] There is something more - specs for other standalone services are also missing, not just wikifeeds
[13:30:51] This also affects `/page/pdf` which is in Proton
[13:53:28] Looks like there's a major latency regression on mobile as of ~1 month ago, almost doubled. https://grafana.wikimedia.org/d/QLtC93rMz/backend-pageview-timing?viewPanel=panel-60
[13:53:50] started Feb 28
[17:12:37] tgr: Have you seen these ones in Logstash?
[17:12:38] > PHP Deprecated: Use of $_SESSION was deprecated in MediaWiki 1.27. [Called from session_write_close in (internal function)]
[17:12:48] The trace isn't very telling.
[17:13:27] The same reqId does log 1 other message: `Something wrote to $_SESSION!`
[17:13:34] but that one lacks a stack trace
[18:10:30] Krinkle: hm, that seems like a bug in SessionManager
[18:11:38] it saves dirty sessions on shutdown, but it calls session_write_close() before that, which triggers the "PHP tried to write to the session" warning
[18:12:38] seems like that should be the other way around
[18:13:16] or it should check the shutdown flag, or something like that
[18:14:13] btw, 7 of the top 10 normalized messages now belong to us
[18:14:23] I wonder if we should do something about that
[23:01:54] 2+ months of a regression of this magnitude going unnoticed deserves a postmortem
[23:03:16] not for the underlying bug, for the process failure
[23:13:55] is there a list of high-signal dashboards for the on-call rotation to check periodically?
[23:22:04] We have no SLOs for MediaWiki, no performance team, no alert owner, and generally the amount of code owned and maintained by teams comfortable with their code is at an all-time low (given various reorgs that dropped ownership, code/people moving to teams unfamiliar with that code, and many years of churn without appropriate backfills; e.g. the three people maintaining the nodejs jobqueue left gradually over several years, but this seemingly wasn't understood, as nobody was assigned or backfilled).
[23:25:04] We have the data and telemetry. Although their stability isn't great since they're no longer consumed by the same team that maintains them, so regressions are common and only affect "other" parties (as you've seen with flame graphs).