[10:02:26] serviceops, Data-Engineering, Event-Platform: Make eventstreams-internal available to WMF staff without an ssh tunnel - https://phabricator.wikimedia.org/T348763#10266916 (phuedx) Would it be valuable to move MPIC to using oauth2-proxy for consistency with these other systems?
[14:55:36] serviceops, SRE, Data-Platform-SRE (2024.10.19 - 2024.11.08): DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268041 (Gehel)
[14:56:47] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10268045 (Michael) Recording here that I'm noticing myself still running one-off scripts on the maint-hosts because, as I understand it, for the new way of running them, I would...
[16:04:20] serviceops, SRE, Data-Platform-SRE (2024.10.19 - 2024.11.08): DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268574 (bking) a:bking
[16:22:34] serviceops, SRE, Data-Platform-SRE (2024.10.19 - 2024.11.08): DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268679 (bking) Based on `/etc/wikimedia/contacts.yaml`, these hosts are owned by Data Persistence. As such, I'm re...
[16:25:39] serviceops, SRE: DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268693 (bking) a:bking→None
[16:34:42] serviceops, Data-Persistence, SRE: DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268811 (taavi)
[17:20:54] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269171 (CDanis)
[17:22:15] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269184 (Scott_French) It looks like there was a large influx of `flaggedrevs_CacheUpdate` [0] jobs that started around that time, and ended around 2024-10-25 20:30. That's re...
[17:41:17] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10269389 (Jhancock.wm) a:Clement_Goubert→Jhancock.wm
[17:45:00] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10269407 (Jhancock.wm) a:Clement_Goubert→Jhancock.wm
[17:48:58] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269432 (Scott_French) Given the rate at which the backlog is draining, this should self-resolve in ~ 24h. We can try to relax the concurrency limit for the `low_traffic_jobs`...
[17:50:37] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269442 (Dreamy_Jazz) >>! In T378385#10269432, @Scott_French wrote: > Given the rate at which the backlog is draining, this should self-resolve in ~ 24h. > > We can try to rel...
[17:56:15] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install kubestage200[3-4] - https://phabricator.wikimedia.org/T377009#10269497 (Jhancock.wm) a:Clement_Goubert→Jhancock.wm
[18:20:33] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10269618 (Jhancock.wm) a:Clement_Goubert→Jhancock.wm
[18:21:19] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10269610 (Jhancock.wm) a:Clement_Goubert→Jhancock.wm
[18:31:34] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269740 (Scott_French) Thanks for the follow-up, @Dreamy_Jazz. If self-resolving in 24h should be fine, then I might not make any changes to the concurrency limit (particularly...
[18:32:27] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269744 (kostajh) >>! In T378385#10269442, @Dreamy_Jazz wrote: >>>! In T378385#10269432, @Scott_French wrote: >> Given the rate at which the backlog is draining, this should se...
[18:47:14] serviceops, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269818 (Scott_French) Looking at the breakdown by wiki for `flaggedrevs_CacheUpdate` jobs among the last 1M entries in the executor log on mwlog1002: ` $ tail -1000000 JobExe...
[19:09:03] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269919 (kostajh)
[19:12:41] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269928 (kostajh) >>! In T378385#10269818, @Scott_French wrote: > Looking at the breakdown by wiki for `flaggedrevs_CacheUpdate` jobs among the last 1M entries...
[19:21:49] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269970 (Scott_French) Thanks, @kostajh. FWIW, it looks like `Kategorie:Wikipedia:Seite,_die_JsonConfig_verwendet` is a brand new category created on the 24th...
[19:43:37] serviceops, Data Products, Data-Platform-SRE, Dumps-Generation, and 2 others: Migrate current-generation dumps to run from our containerized images - https://phabricator.wikimedia.org/T352650#10270085 (AeinBagheri) a:AeinBagheri
[19:48:38] serviceops, Data Products, Data-Platform-SRE, Dumps-Generation, and 2 others: Migrate current-generation dumps to run from our containerized images - https://phabricator.wikimedia.org/T352650#10270119 (Reedy) a:AeinBagheri→None