[04:24:24] <ryankemper> We should add some further retry logic to our rolling operation elasticsearch cookbook. Most common failure scenario is `elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=60))` which seems like a pretty easy error to detect and retry a few times on
[09:14:47] <pfischer> dcausse: you were right about the failing rdf-spark-tools tests. After replacing the constructor calls with a builder chain, it compiles again.
[09:41:40] <gehel> weekly update: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2024-06-07
[09:43:11] <gehel> ryankemper: what operation causes timeouts? 60 seconds seems pretty long already
[09:44:26] <gehel> We should still implement retries, but I'm wondering if we have an underlying issue
[10:13:58] <dcausse> lunch
[12:52:03] <dcausse> hm... was about to re-deploy the cirrus-streaming-updater to staging (for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1039727) but I realize that that might enable the sanitizer there
[12:52:10] <dcausse> not sure we want it there...
[13:00:59] <dcausse> pfischer: if you're around https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1040151
[13:11:38] <inflatador> <o/
[13:11:43] <dcausse> o/
[13:52:04] <inflatador> dropping off my son, back in ~20
[14:24:30] <inflatador> back
[14:33:13] <ebernhardson> \o
[14:33:24] <pfischer> dcausse: I noticed that yesterday, already set up a patch: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1039736
[14:35:20] <dcausse> pfischer: oh thanks!
[14:35:22] <dcausse> o/
[14:35:55] <dcausse> will get that deployed to upgrade staging
[14:36:08] <inflatador> gotta drop off the other kid ;) . Back in ~30
[14:55:04] <ebernhardson> hmm, in the mediawiki page_change events for the prior state of a page move, do we think it should always include the namespace id or only if it changed?
[14:55:18] <ebernhardson> Currently only the page title is included there, but we need the old namespace id to know if it moved between indices
[14:56:10] <ebernhardson> seems like it should simply always be there for consistency, allow consumer to compare .page.namespace_id against .prior_state.page.namespace_id
[14:57:32] <dcausse> yes, makes to me
[14:57:36] <dcausse> *sense
[14:59:19] <gehel> dcausse, pfischer: last minute, but if you want to chat about Search and languages, feel free to join
[14:59:41] <gehel> I just sent you the invite
[15:01:10] <inflatador> back
[15:44:47] <pfischer> gehel: I'm sorry, I was AFK
[15:46:09] <pfischer> BTW: looks like rate-limiting via envoy is now ready https://phabricator.wikimedia.org/T362310#9870761 - shall we enable this in general or only for backfill setup?
[15:49:40] <inflatador> heading back home, be there in ~30
[16:09:21] <dcausse> pfischer: no objections to enable it everywhere but is the pipeline ready to slow down on 429 or will we fail more events?
[16:12:29] <dcausse> going offline, have a nice week-end
[16:16:12] <inflatador> back
[16:17:14] <ebernhardson> hmm, deciding what counts as language support is not easy...in a way glad they asked for binary. I was thinking binary isn't specific enough, but then choosing an appropriate divider is hard
[17:07:45] <inflatador> lunch, back in ~40
[17:21:28] * ebernhardson tries turning off FSLockManager on cindy...seems like many of the tests are failing on upload due to it
[17:56:11] <pfischer> ebernhardson: I just noticed, there are two PRs for sending a distinct user agent with requests originating from the SUP: https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests
[18:00:35] <pfischer> Looks like yours is offering greater flexibility, I'll discard mine.
[18:02:20] <ebernhardson> pfischer: i just saw that as well, not sure how i missed yours was already in MR
[18:15:29] <inflatador> back, working from my 3rd venue today! It's a new record!
[18:27:34] <ebernhardson> lol
[18:37:33] <inflatador> too many summer kids' activities ;)
[18:40:20] * ebernhardson realizes while looking at this that event page titles are namespace prefixed, and our redirect update handling isn't stripping them
[18:43:19] <pfischer> ebernhardson: which redirect handling? SUP or cirrus?
[19:21:18] <ryankemper> gehel: It seems to be the flushing markers causing the timeout
[19:21:28] <ryankemper> https://www.irccloud.com/pastebin/qXmZPRTW/stack_trace.log
[20:15:34] <ebernhardson> pfischer: in SUP
[20:16:29] <ebernhardson> pfischer: the redirect_page_link field is a prefixed db key, so namespace and underscores, but we load it into the TargetDocument.pageTitle which is mostly unused, except in the case of add/remove redirects
[20:19:22] <ebernhardson> for extra fun, `Kill Bill: volume 1` and such things have :, but not to delimit the namespace. But as long as we get the namespace id we can then strip when ns > 0
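
(Editor's illustration, not part of the channel log.) The stripping rule described in the last two messages could look roughly like the sketch below. It is written in Python for brevity and is not the SUP's actual code; the function name `strip_namespace_prefix` is hypothetical, and converting underscores to spaces is an assumption about the desired title form. The point is that the namespace id, not the presence of `:`, decides whether a prefix is removed.

```python
def strip_namespace_prefix(prefixed_db_key: str, namespace_id: int) -> str:
    """Hypothetical helper: drop the namespace prefix from a prefixed db key.

    The prefix is removed only when namespace_id > 0, as suggested in the
    log, so a main-namespace title that merely contains ':' stays intact.
    """
    title = prefixed_db_key
    if namespace_id > 0:
        # Split on the first ':' only; any later ':' belongs to the title.
        _prefix, sep, rest = prefixed_db_key.partition(":")
        if sep:
            title = rest
    # Assumption: consumers want spaces rather than db-key underscores.
    return title.replace("_", " ")
```

With this sketch, `Kill_Bill:_volume_1` in namespace 0 keeps its full title, while `Talk:Kill_Bill:_volume_1` in namespace 1 loses only the leading `Talk:`.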