[09:58:26] GitLab needs a short maintenance break at 11:30 UTC (in 90 minutes)
[11:38:03] GitLab maintenance done
[12:21:33] hello oncallers!
[12:21:44] i am about to deploy thumbor in codfw to pick up the new poolcounter node
[12:32:34] 👍
[12:40:42] done! Everything seems good
[13:02:03] This is mostly an Observability question I suppose, so will ask here rather than in #mediawiki-core – in the context of T374231 where we're having timeouts that are themselves getting jsonTruncated because the stack trace is too long, is there an obvious point in wikimedia/request-timeout or maybe wikimedia/normalized-exception where our logging code should trim the passed-in exception before it gets to syslog?
[13:02:05] T374231: wikifunctions mediawiki instance can't sustain more than 5rps - https://phabricator.wikimedia.org/T374231
[13:09:15] Actually, filed as T374618 as it's probably more helpful there.
[13:09:16] T374618: Trim exceptions (?in wikimedia/normalized-exception) before they get to syslog, so that they aren't jsonTruncated - https://phabricator.wikimedia.org/T374618
[14:30:52] I can see the alerts dashboard back, and weirdly, I notice icinga like more responsive
[15:05:22] it is plausible, cpu is four years faster
[15:06:53] nice
[15:08:49] unrelated, but heads up on disk space warning for grafana2001
[15:16:57] ack, thank you jynus
[16:08:43] FYI, I'm in the process of running the `sre.discovery.datacenter` cookbook in `--dry-run` mode for testing ahead of the switchover [0].
[16:08:43] this should not have any effect on anything (barring some subtle regression in dry-run), but flagging here for visibility
[16:08:43] [0] https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Dry_Run
[16:30:06] ack, thx
[16:41:40] all done, no ill effects encountered :)
[16:43:08] \o/
[16:43:55] I did find one spot where we don't set a timeout on a DNS query, which happened to hit a DNS server in codfw rack D1 at exactly the wrong time, heh
[16:44:04] I'll send a patch to fix that later today
[16:45:57] now running the equivalent dry-run for the `sre.switchdc.mediawiki` cookbooks, ahead of the live-test next week [0].
[16:45:57] as before, no ill effects expected, though I'll be keeping an eye out.
[16:45:57] [0] https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Live_Test
[17:09:28] pausing cookbook testing while coordinating a change that needs to happen during the mediawiki infrastructure window
[17:28:57] unpausing dry-run testing of `sre.switchdc.mediawiki`
[17:58:58] all done. once again, no ill effects
[18:11:54] swfrench-wmf: <3