[10:05:03] lunch
[13:56:43] Hi search team, I am a new developer on the growth team who is reaching out with a question about getting cirrus search to work locally.
[13:58:40] https://www.irccloud.com/pastebin/SaWPR5Rk/
[13:59:26] eileen-m: hi! seems like you use the wrong elasticsearch image
[14:00:02] Thanks, which elastic search image should I use?
[14:00:24] you need the cirrus opensearch image, I'm using this one at the moment: docker-registry.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image:v1.3.20-6
[14:01:50] dcausse: does that mean https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/ElasticSearch has the wrong image name?
[14:10:17] \o
[14:12:19] urbanecm: should be https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/OpenSearch
[14:12:38] o/
[14:12:48] .o/
[14:12:48] maybe it's not linked properly from cirrus
[14:13:36] yes... https://www.mediawiki.org/wiki/MediaWiki-Docker/Extension/CirrusSearch needs some update
[14:16:26] seeing in query clicks many clicks pointing to https://en.wikipedia.org/w/index.php?curid=55280845 or https://en.wikipedia.org/w/index.php?curid=19266855
[14:20:42] eileen-m, urbanecm: updated the doc at https://www.mediawiki.org/wiki/MediaWiki-Docker/Extension/CirrusSearch, hopefully it should point to the right recipes now
[14:23:16] dcausse: huh, curious. Is there some new ui component?
[14:24:11] no clue... just stumbled on those because they failed feature collection on my side, no clue what's happening but over a 10k sample I got ~50 of them
[14:43:23] eileen-m, urbanecm if you are working on an Apple Silicon Mac, note that if you use the default opensearch image it will probably run, but very slowly. If you use gmodena's arm64 image, the plugins it uses are out of date. IIRC, Chinese will fail because there is a typo in a plugin name ("opensearch-analisys-stconvert"
[14:43:55] ...), and Japanese will fall back to CJK instead of using Sudachi (which is not installed).
[14:44:26] * urbanecm uses `image: kostajh/wmf-elasticsearch-arm64:7.10.2` right now, and it seems to work, although i haven't needed to re-set CirrusSearch locally in a long while
[14:44:50] Nice.. I'll check that out!
[14:46:43] if y'all want to file a ticket for making a universal OpenSearch image I'm happy to follow up. Not sure what's required, but it seems like the right thing to do
[14:48:11] kostajh/wmf-elasticsearch-arm64:7.10.2 is no longer what we run, could be fine for some light testing but might soon become obsolete as we upgrade to a newer opensearch version
[14:48:23] Ah.. `kostajh/wmf-elasticsearch-arm64:7.10.2` is 3 years old. I did use that one for a while but had to abandon it, I forget why now.
[14:48:50] it's elasticsearch not opensearch
[14:48:53] Possibly because dcausse told me to. `;)`
[14:48:59] :)
[14:49:01] oh, yeah.. that'll do it
[14:49:33] * Trey314159 obviously does not love docker and wants it to just magically work in the background...
[14:50:25] one can build one's own cirrus opensearch image with https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image
[14:51:17] tried to add arm64 support but stumbled on some blubber limitations
[15:09:27] Thanks for all the info. I was in meetings for the last hour, which is why I didn't respond, but I'll update my local docker-compose.override.yml now to reflect all this new information. (Also, I am on an apple silicon mac, for context.)
[15:50:33] Hi, I tried following the updated instructions and the updated docker-compose.override.yml, but I still get an error. I'll post my output below.
[15:51:10] https://www.irccloud.com/pastebin/KPgVBFeO/
[15:51:39] eileen-m: hmm, suggests it doesn't have the latest version of the search-extra plugin
[15:51:50] i thought i updated the image, since our integration bot uses it...sec
[15:52:13] oh, but you have the arm version? I don't know how to build an arm image, i just have x86
[15:52:41] the integration bot uses docker-registry.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image:v1.3.20-6
[15:53:16] Also, I noticed that the docker-compose.override.yml here https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/OpenSearch has
[15:53:16] volumes:
[15:53:16] - esdata:/usr/share/opensearch/data
[15:53:45] and then later under volumes it refers to osdata... was that intentional? I had to make both osdata in my local docker-compose.override.yml
[15:54:07] eileen-m: probably they should both say osdata, sounds like it was missed in the migration from elastic -> opensearch
[15:54:22] got it, thanks.
[15:55:01] eileen-m: the error from mediawiki installation is almost certainly due to the version of the search-extra plugin (which, among other things, adds that add_regex_start_end_anchor char filter). I added that just a month ago or so
[15:55:15] also, to answer your earlier question - I do have the arm version. I'll look at the integration bot link, thanks
[15:55:49] eileen-m: i suspect you need to build cirrussearch-opensearch-image on an arm instance, but honestly i've never successfully made an arm container so not sure
[15:59:26] this esdata vs osdata is my fault... fixed
[16:00:21] I don't think any of us have an arm machine to push a fresh build somewhere :(
[16:01:42] I should try again with blubber, in theory the baseimage is multiarch and we don't build anything (it's just jars which are multi-arch) so it should be possible to build it without an arm cpu
[16:02:20] yea, in theory you can build an x86 image and then copy appropriate things into an arm image and it "just works"
[16:02:35] shouldn't need to run the jvm, i think
[16:02:47] quick break, back in ~30
[16:04:36] dcausse: if you can provide me with some instructions on how to build it, i can try. eileen-m and i are meeting tomorrow to hopefully set it up somehow.
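The esdata vs osdata mismatch discussed above (a service mounting a named volume that the top-level volumes section does not declare) is the kind of slip a small check can catch. Below is a minimal Python sketch, assuming the compose file has already been parsed into a plain dict. The function name `undeclared_volumes`, the service name `opensearch`, and the `driver: local` entry are illustrative assumptions, not taken from the actual recipe.

```python
def undeclared_volumes(compose: dict) -> dict:
    """Map each service to the named volumes it mounts that are not declared at top level."""
    declared = set(compose.get("volumes") or {})
    missing = {}
    for name, service in (compose.get("services") or {}).items():
        for mount in service.get("volumes") or []:
            if not isinstance(mount, str):
                continue  # long-form (mapping) mounts are ignored in this sketch
            source = mount.split(":", 1)[0]
            # Bind mounts and anonymous volumes start with a path character;
            # anything else is treated as a named volume.
            if source.startswith((".", "/", "~")):
                continue
            if source not in declared:
                missing.setdefault(name, []).append(source)
    return missing


# Hypothetical fragment mirroring the mismatch described in the log.
compose = {
    "services": {
        "opensearch": {  # assumed service name
            "image": "docker-registry.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image:v1.3.20-6",
            "volumes": ["esdata:/usr/share/opensearch/data"],
        }
    },
    "volumes": {"osdata": {"driver": "local"}},  # assumed declaration
}

print(undeclared_volumes(compose))  # {'opensearch': ['esdata']}
```

Renaming the mount to `osdata:/usr/share/opensearch/data`, as was done locally in the log, makes the function return an empty dict.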
[16:04:37] thanks for all the help! My onboarding buddy, Martin, and I are going to work on this synchronously tomorrow. I'll reach out then if I hit more blockers or have follow-up questions.
[16:07:09] urbanecm: sure, see the "build locally" section at: https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image#building-locally
[16:07:20] ty!
[16:14:58] that's where I stopped: T334254
[16:14:59] T334254: Error when using multi arch build on gitlab with blubber and kokkuri - https://phabricator.wikimedia.org/T334254
[17:16:57] * ebernhardson glances through the did-you-mean ab test analysis code...and hope i don't have to change too much and it just works :P
[17:17:30] but of course have to change it some, because config required technically running three tests, but i think i can inject a simple rename early on and ignore that
[17:48:45] * ebernhardson accidentally passed a pandas dataframe to `sorted` and now, even after interrupting the kernel, it's been spinning for 3 minutes :S
[17:49:05] no alerts for that stat hosts yet ;P
[17:49:16] lol, it's probably single threaded at least.
[17:49:36] yea, just sitting with one cpu at 100%
[17:50:20] oh, actually it's worse, it's going to try to stringify that dataframe 3 million times :( I guess i should just kill python
[17:50:50] (and then it will feed those 3 million stringified and joined dataframes into the regex engine, lol)
[17:53:08] Now that's what I call fun
[17:55:17] Oops, looks like I already created a ticket for a multi-arch opensearch image T398461
[17:55:18] T398461: Attempt to build multi-arch cirrussearch-opensearch docker image - https://phabricator.wikimedia.org/T398461
[17:57:12] totally unrelated, but i realized while working on this...in python you can reach up the stack and extract local variables from the local context of anything in your callstack. You could also change them
[18:08:18] results from d1 (prefix_len=1) and d1v (opening-text+prefix_len=1) test: more suggestions from d1v than control, more from d1 than d1v. manual selection rate stayed constant (~0.5%), automatic rewriting increased 0.5% to 10.9 on d1, and 11.0 on d1v. article clickthrough on auto-rewrite up from 24.5 to 25.3 on d1, down to 23.9 on d1v.
[18:09:13] article clickthrough on manually-selected queries constant between d1 and control, d1v declines by 3%, but it's not significant
[18:09:25] overall, surprisingly to me, it seems opening text doesn't really help.
[18:09:33] but prefix_len=1 makes a difference
[18:10:34] (on en, de, and fr. I'm pondering what splits to generate)
[18:14:54] but on en, de, and fr that adds up to approximately 1k additional successful rewrites from a poorly performing query to something that generates an article clickthrough
[18:15:03] 1k per day
[18:49:02] doh, realizing i have to combine some of this data. If automatic rewrite rate increases, but clickthrough rate decreases, can it still be a net win? Or maybe we need a second rate against the full dataset
[18:52:08] d1 did 35k additional rewrites, and 8.3k more article clickthroughs. d1v did 44k additional rewrites and only 1.5k more article clickthroughs. guess it doesn't matter :P
[18:52:47] although if we expand that with credible intervals, pretty sure it overlaps
[21:20:51] ebernhardson: interesting stats.. but it's too much to do over IRC this late in the day. Definitely should talk tomorrow in the Wednesday meeting!
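One way to read the 18:52 numbers is as a marginal rate: extra article clickthroughs per extra rewrite. The Python sketch below works through that arithmetic using only the counts quoted in the log, and assumes both "additional" figures are relative to control. The helper name `marginal_clickthrough_rate` is made up for illustration, and the calculation says nothing about significance; as noted in the log, the credible intervals probably overlap.

```python
# Additional counts as quoted in the log (assumed to be relative to control).
extra = {
    "d1": {"rewrites": 35_000, "clickthroughs": 8_300},
    "d1v": {"rewrites": 44_000, "clickthroughs": 1_500},
}


def marginal_clickthrough_rate(rewrites: int, clickthroughs: int) -> float:
    """Extra article clickthroughs gained per extra automatic rewrite."""
    return clickthroughs / rewrites


for bucket, counts in extra.items():
    rate = marginal_clickthrough_rate(counts["rewrites"], counts["clickthroughs"])
    print(f"{bucket}: {rate:.1%} of the additional rewrites led to an article clickthrough")
# d1: 23.7% of the additional rewrites led to an article clickthrough
# d1v: 3.4% of the additional rewrites led to an article clickthrough
```

On these point estimates both buckets add clickthroughs in absolute terms, which is the sense in which a higher rewrite rate with a lower clickthrough rate can still come out ahead.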