[00:23:57] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10530854 (Scott_French) One additional point of interest: If we zoom out and look at //all// php-fpm request timeouts logged by the 8...
[01:05:25] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10530980 (ssingh) On the Varnish side of things, I got some clarity from bblack. Specifically that `first_byte_timeout` only applies...
[01:40:56] serviceops, Patch-For-Review: Improve / extend prometheus metrics exported by mercurius - https://phabricator.wikimedia.org/T383641#10531043 (Scott_French)
[01:46:19] serviceops, Patch-For-Review: Improve / extend prometheus metrics exported by mercurius - https://phabricator.wikimedia.org/T383641#10531049 (Scott_French) p:Medium→Low Alright, that should cover everything I've got lingering from December.
[08:42:42] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10531262 (daniel) > This suggests we're getting stuck in some long-running C function, perhaps? A shot in the dark, given DT's heavy...
[10:34:20] serviceops, MediaWiki-Platform-Team, MW-on-K8s: Migrate CentralAuth maintenance jobs to mw-cron - https://phabricator.wikimedia.org/T385866 (Clement_Goubert) NEW
[10:38:01] serviceops, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team, MW-on-K8s: Migrate CentralAuth maintenance jobs to mw-cron - https://phabricator.wikimedia.org/T385866#10531432 (Clement_Goubert)
[10:38:17] serviceops, CampaignEvents, Campaigns-Product-Team, MW-on-K8s: Migrate CampaignEvents jobs to mw-cron - https://phabricator.wikimedia.org/T385867 (Clement_Goubert) NEW
[10:42:22] serviceops, MW-on-K8s: Migrate cleanupUploadStash job to mw-cron - https://phabricator.wikimedia.org/T385868 (Clement_Goubert) NEW
[10:49:00] serviceops, MW-on-K8s: Make sure jobs are still defined on deployment-prep as systemd timers - https://phabricator.wikimedia.org/T385869 (Clement_Goubert) NEW
[10:49:04] serviceops, MW-on-K8s: Make sure jobs are still defined on deployment-prep as systemd timers - https://phabricator.wikimedia.org/T385869#10531474 (Clement_Goubert) p:Triage→High
[11:05:29] serviceops, Page Content Service, Patch-For-Review: Dont propagate server error details to end users - https://phabricator.wikimedia.org/T385821#10531481 (Jgiannelos) @hnowlan In case we don't want to implement that logic on rest-gateway I sent a patch to hide details when `NODE_ENV == production`. I...
[11:25:02] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10531509 (mszabo) >>! In T385395#10531262, @daniel wrote: >> This suggests we're getting stuck in some long-running C function, perha...
[13:05:54] serviceops, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team, MW-on-K8s: Migrate CentralAuth maintenance jobs to mw-cron - https://phabricator.wikimedia.org/T385866#10531647 (Tgr) All of these are low-risk (in the sense that if the job stops working for a few days or weeks, that's not mu...
[13:08:50] serviceops, Data-Persistence, MW-on-K8s: Migrate ParserCachePurging jobs to mw-cron - https://phabricator.wikimedia.org/T385800#10531650 (Ladsgroup) Migrating PC cache jobs should be easy. Probably migrate one first and see if everything works fine. The rest are exactly the same job but on different...
[13:10:13] serviceops, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team, MW-on-K8s: Migrate CentralAuth maintenance jobs to mw-cron - https://phabricator.wikimedia.org/T385866#10531666 (Clement_Goubert) >>! In T385866#10531647, @Tgr wrote: > All of these are low-risk (in the sense that if the job s...
[13:17:38] serviceops, MW-on-K8s, Trust and Safety Product Team, MediaModeration (MediaModeration 2.1): Migrate MediaModeration jobs to mw-cron - https://phabricator.wikimedia.org/T385799#10531677 (Clement_Goubert) >>! In T385799#10529183, @Dreamy_Jazz wrote: > Thanks for checking for our input! > >> * job...
[14:20:31] serviceops, CampaignEvents, Campaigns-Product-Team, MW-on-K8s: Migrate CampaignEvents jobs to mw-cron - https://phabricator.wikimedia.org/T385867#10531806 (Daimona) Thanks for taking care of this! As for the input requested: > jobs that should be watched more closely Of the two job types, aggre...
[14:41:43] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10531858 (Daimona) >>! In T385395#10529425, @Scott_French wrote: > These seem to indicate a large number of very slow (> 1000s) discu...
[14:44:06] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10531865 (mszabo) I also took a glance at local and global AFs using `rlike` in case there might be a problematic regex there. I thin...
[14:45:17] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10531866 (mszabo) >>! In T385395#10531858, @Daimona wrote: > How exactly are y'all reproducing this? Would trying to edit https://uk....
[15:10:25] serviceops, MW-on-K8s, Trust and Safety Product Team, MediaModeration (MediaModeration 2.1): Migrate MediaModeration jobs to mw-cron - https://phabricator.wikimedia.org/T385799#10531958 (Dreamy_Jazz) >>! In T385799#10531677, @Clement_Goubert wrote: > We can arrange its migration to happen say 15...
[15:31:17] serviceops, Content-Transform-Team-WIP, Data-Persistence, iOS-app-feature-Performance, and 7 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365#10532014 (Jgiannelos) Open→Resolved Closing this one for now since we have a c...
[15:41:19] serviceops, Abstract Wikipedia team, function-evaluator, Patch-For-Review: Have SRE provide a production-ready Rust image upstream - https://phabricator.wikimedia.org/T380807#10532025 (akosiaris) > This is a good technical reason, we 'll need to discuss within #SRE what that would mean and how w...
[16:00:33] serviceops, MW-on-K8s, Patch-For-Review: Make sure jobs are still defined on deployment-prep as systemd timers - https://phabricator.wikimedia.org/T385869#10532064 (bd808) See also: {T276650}
[16:02:45] serviceops, MW-on-K8s, Patch-For-Review: Make sure jobs are still defined on deployment-prep as systemd timers - https://phabricator.wikimedia.org/T385869#10532067 (Clement_Goubert) >>! In T385869#10532064, @bd808 wrote: > See also: {T276650} That's not in scope for this task, which is just about no...
[19:13:46] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10532577 (Daimona) I wanted to reproduce this locally. So I went over to https://uk.wikipedia.org/wiki/Special:Export, added the titl...
[19:34:10] serviceops, MW-on-K8s: Better test coverage for MediaWiki chart's Apache config - https://phabricator.wikimedia.org/T385905 (RLazarus) NEW p:Triage→Low
[20:52:02] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10532748 (Daimona) I realized I also needed to install TemplateStyles, so I did that; I also already had Scribunto installed (althoug...
[23:20:25] serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error: 503 error when edit large size pages on PHP 8.1 - https://phabricator.wikimedia.org/T385395#10533126 (mszabo) @daniel was right—this is indeed a case of catastrophic backtracking, although not in DT but in AbuseFilter. We st...