[14:48:08] Raine: hey, https://phabricator.wikimedia.org/T421732 seems like potential fallout from https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1262091
[14:48:32] taavi: thanks, reverting
[14:54:10] seems like the patch enabling it for all of s3 was missing one layer of arrays?
[14:54:26] was it?
[14:55:00] I tested category creation and that did the right thing (writing into both tables)
[14:55:05] so I think the problem is in the code, not the config
[14:55:14] (this code has not been run in production before)
[14:55:19] or am I missing something?
[14:55:27] $wgTempCategoryCollations is an array of associative arrays, while the patch set it as a single associative array
[14:55:47] that's true
[14:55:54] but then why did it seem to work? :D
[14:56:52] that one I have no idea about :D
[14:58:03] fascinating :D but yeah, I think you're right
[14:58:05] thank you <3
[16:25:22] Hi! I have found that a maintenance job (testkitchen-updateconfigs-29577414) failed because of a timeout from etcd I was wondering if you know something about it. It seems that it failed just one time and, since then, the maintenance job is working fine. So it seems it's not a big deal. Just confirming it.
[16:25:23] Thanks!
[16:26:40] Please, also let me know if this is the right place to ask about this kind of things. I have read somewhere that you are the team that owns that component
[16:37:14] unfortunately i don't have much information other than what's in the log here, which is the etcd timeout
[16:40:29] in general though, given that this cron is running every minute and does not appear to fail often, it doesn't look particularly worrisome
[17:15:39] That's what I was wondering. Since then (several days ago) the maintenance job has been running perfectly. Also, if you don't know about anything relevant related to etcd and that message doesn't worry you, I guess we can consider this was an isolated incident
[17:15:41] Thank you very much!
[17:37:38] sounds good, thanks!
[17:43:39] I have something else that I thought it was related. Not sure if it's something to ask here. It's related to a grafana dashboard we have that is related to that maintenance job and the extension (TestKitchen) which the maintenance job belongs to.
[17:43:44] https://grafana-rw.wikimedia.org/d/66d17209-c793-42ab-bba7-dfe69ad350bd/testkitchen?orgId=1&from=now-7d&to=now&timezone=utc&var-site=codfw
[17:44:13] It stopped working 2 days before the maintenance job. And the metrics used there are generated by some code that is executed via the maintenance job
[17:45:31] And I can see some related errors in https://thanos.wikimedia.org/graph?g0.expr=mediawiki_TestKitchen_test_kitchen_api_requests_total&g0.tab=1&g0.stacked=0&g0.range_input=1h&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.engine=prometheus&g0.analyze=0&g0.tenant=&g0.end_input=2026-03-26%2017%3A29%3A57&g0.moment_input=2026-03-26%2017%3A29%3A57
[17:51:03] sfaci: change the Cluster selector at the top of the dashboard to "eqiad" rather than "codfw"
[17:51:20] the switchover moved periodic job execution to eqiad
[17:51:36] wow!
[17:51:42] I didn't realize!
[17:52:04] I'm sorry for bothering with these kind of things!
[17:52:10] and thank you very much!
[17:52:16] no worries! they can be non-obvious :)
[17:56:51] And now I have found the email where you are explaining that thing xD
[17:56:56] Thank you very much!
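
Note on the 14:55:27 message: the shape mismatch described there (a setting expected to be an array of associative arrays, but set to a single associative array) can be illustrated with a rough Python analogy, where a list of dicts stands in for the PHP array of associative arrays. This is only a sketch, not the MediaWiki code or config: the entry keys (`name`, `enabled`) and the helpers `is_list_of_mappings` and `normalize` are hypothetical, and the log itself does not establish why the misshapen value appeared to work.

```python
from __future__ import annotations

from typing import Any


def is_list_of_mappings(value: Any) -> bool:
    """Return True if value is a list whose items are all dicts."""
    return isinstance(value, list) and all(isinstance(item, dict) for item in value)


def normalize(value: Any) -> list[dict]:
    """Wrap a lone mapping in a list; pass a list of mappings through unchanged."""
    if isinstance(value, dict):
        return [value]
    if is_list_of_mappings(value):
        return value
    raise TypeError("expected a mapping or a list of mappings")


# Hypothetical entry: the keys are placeholders, not the real setting's keys.
entry = {"name": "example", "enabled": True}

expected_shape = [entry]  # list of mappings: the shape the setting expects
patched_shape = entry     # single mapping: one layer of arrays missing
```

A check like `is_list_of_mappings` run at config-load time would flag the single-mapping form before it reaches code that iterates over entries, while `normalize` shows the alternative of tolerating both shapes.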