[09:16:58] o/
[09:17:42] o/
[10:55:41] lunch
[14:58:20] \o
[14:58:44] o/
[14:58:54] .o/
[15:38:17] randomly pondering while looking at the cirrusDumpQuery for natural title sorts, I almost wonder if we could swap out query building for something that just hits the all field, we are throwing away the scores anyways when sorting. Probably adds unnecessary complication, just a thought
[15:39:10] the frequency of those kinds of queries is low enough that the perf probably doesn't matter to us, maybe a minor benefit to end users if they run quicker
[15:39:58] I wonder if lucene will do some magic optimizations if not using a scoring collector and seeing a query like {bool: {filter: [], should: []}}
[15:42:09] we at least seem to skip the rescore part
[17:01:58] workout, back in ~40
[17:07:08] sigh... i did muck up dump timestamps still. 20260104 in the directory name, but 20251228 in the file names :( https://dumps.wikimedia.org/other/cirrus_search_index/20260104/index_name%3Dcommonswiki_file/
[17:17:24] :/
[17:18:20] well I guess it's recent enough that we can still make changes without breaking many existing clients
[17:22:24] looks like an easy enough fix, just an oversight. I was reviewing before submitting the patches to repoint the html pages and shut down the old dumps.
[17:23:18] was also thinking it would be nice if we published some way to download and import the dumps, both for us to verify and for end-users to semi-easily get an opensearch instance with the data... but that's more like a hackathon project
[17:23:40] a sidecar docker image that can talk to a configured opensearch instance or some such
[17:25:52] could be nice indeed, it's generally a one-liner with curl piping to the _bulk api, but something that does everything from creating the index to copying mappings/analysis from the cirrus api would be great
[17:26:43] yea i was mostly thinking about easing the mapping/analysis bits
[17:27:01] i usually download from the api then hand-edit to remove things that can't be there
[17:27:06] same
[17:43:30] back
[17:48:29] maybe the easier answer is to expose a variant of the settings/config dump that generates them (like cirrusdoc vs cirrusbuilddoc)
[17:48:37] ebernhardson I have a crappy script that does that, feel free to hack on it if you like https://gitlab.wikimedia.org/repos/search-platform/sre/dumps-2-opensearch
[17:48:47] Guessing you probably already have something better though ;)
[17:49:19] inflatador: nothing better, usually i hand-edit the settings/mapping data, create the index, then fire off the curl command found in the old cirrus dump script (basically curl w/ gnu parallel)
[17:49:59] * ebernhardson is separately annoyed that `parallel` in prod instances is not GNU parallel
[17:50:44] that one's mainly borrowed from an old elasticsearch blog post
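(Editor's note: a minimal sketch of the kind of curl + GNU parallel one-liner discussed above. The hostname, index name, dump file name, job count, and chunk size are placeholder assumptions, and it assumes the target index was already created from hand-edited cirrus settings/mappings and that the dump is ndjson with alternating action/source lines.)

```bash
# Stream a gzipped cirrus dump into an existing index via the _bulk API.
# -L 2: one record = two lines (bulk action line + document source line)
# -N 500: 500 records (1000 lines) per curl invocation, 4 jobs in parallel.
zcat commonswiki_file.ndjson.gz \
  | parallel --pipe -j4 -L 2 -N 500 \
      "curl -s -o /dev/null -H 'Content-Type: application/x-ndjson' --data-binary @- http://localhost:9200/commonswiki_file/_bulk"
```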
[17:57:51] +1 to a cirrus-build-mapping version in addition to extracting the current mapping
[17:58:23] +2
[17:59:41] will ponder, part of the question is how far to go with it. I think the current one talks to the search cluster to do feature detection
[17:59:54] (mostly plugins)
[18:37:31] What do y'all think? Should an average bulk API rejection rate of > 5% over 2 minutes be considered a critical alert for OpenSearch on K8s? I'm leaning towards yes but the default is "warning". hmm
[18:38:49] hmm, hard to say. On the one hand i would be very surprised if we were seeing 5% rejections on the bulk update api, so then yes. But I'm also not sure how it gets there. It's either a bug in the updater, or the whole system is crashing and all the alerts are firing anyways
[18:40:08] but i guess i don't watch those numbers, i just assume they are low-ish. There is also the problem that we expect some failures, but those should be per-action failures and not a failure of the top-level bulk request iirc
[18:42:18] if it's 5% of bulk requests that fail it seems pretty bad...
[18:43:14] even 5% of actions seems quite bad, that'd mean 5% of your writes get ignored
[18:43:56] yea 5% is pretty high, maybe it's reasonable to alert there. We can find out in the future if it only fires when 12 other alerts are also firing
[18:46:16] ACK, thanks, I'll err on the side of caution and make it 'critical'
[19:06:57] deployed the change to the dumps and cleared the tasks so it re-runs the dumps released on 20260104. In part just to better understand what happens when we re-publish a dump and whether we need extra cleanup routines
[19:16:39] sure
[19:16:42] dinner
[19:30:07] sigh... i guess i could just think about it. AFAICT since the target directory already exists, it will probably create a `snapshot={internal_snapshot}` directory inside the public snapshot directory
[21:12:05] Hmm, i wonder what the appropriate way would be to place a DEPRECATED file in https://dumps.wikimedia.org/other/cirrussearch/
[21:12:10] i could just manually place a file there, but that seems wrong
[21:12:29] but doing it with puppet feels like overkill :P
[21:30:50] hmm, actually i'm not sure how exactly to manually place a file anymore. Before i could log into snapshot instances and manipulate the NFS fs directly, but snapshot servers don't exist anymore :P
[21:38:28] proposed deprecation document: https://phabricator.wikimedia.org/P86770
[21:42:15] i wonder if we have docs on mw.org or elsewhere that also need updating...
[21:47:08] Looks good to me, FWIW ;)
[21:47:22] https://gerrit.wikimedia.org/r/c/operations/alerts/+/1223727 ryankemper CR up for adding a few more OpenSearch on k8s alerts
[21:48:46] I set JVM to 95% for a critical alert, the default 75% seemed a bit silly
[22:01:30] Feeding dog, at pairing in 4’
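(Editor's note: a small sketch illustrating the per-action vs. top-level failure distinction from the 18:38-18:46 exchange, and where raw rejection counts can be read from. The host, index, and file names are placeholders, not the real cluster.)

```bash
# Write-queue rejections usually surface as per-item 429s inside an HTTP 200
# bulk response with "errors": true, not as a failed top-level request.
curl -s -H 'Content-Type: application/x-ndjson' --data-binary @actions.ndjson \
     'http://localhost:9200/_bulk' \
  | jq '{errors,
         failed: [.items[] | to_entries[0].value | select(.status >= 300)] | length,
         total: (.items | length)}'

# Cumulative rejected counts per node for the write (bulk) thread pool,
# i.e. the raw numbers behind a >5% rejection-rate alert.
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed'
```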