[09:24:30] Hey folks, this is Ozge from the ML Team. I'm working on a semantic search project. One of the ideas for the next steps is to have hybrid (lexical + vector) search in OpenSearch. https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/index/ Therefore, I'm trying to explore our OpenSearch instance. Do we have an OpenSearch Dashboard that we can look into via a UI (I'm looking for something similar to the one in the screenshot)?
[09:24:30] Do you know the name of the index we use for lexical search? Do you think we should keep investigating OpenSearch for vector search before we start looking into other options? Finally, is this the right channel to ask these kinds of questions? :) Indeed, we will have a meeting soon but I wanted to start collecting some information.
[09:24:36] https://usercontent.irccloud-cdn.com/file/XswIuzyk/image.png
[09:48:44] ozge_: hey, yes this is the right channel :)
[09:54:12] ozge_: the production opensearch cluster we use for CirrusSearch and where we index all the wikis is very old (1.3.20) and does not have all the features we'd need
[09:54:46] but DPE should have opensearch running in k8s very soon and with a more recent version (checking which version)
[09:55:20] DPE SRE is working on an OpenSearch cluster on k8s, which should allow us more flexibility to create additional OpenSearch instances. It will also allow us to play with a more recent version of OpenSearch.
[09:55:29] :)
[10:00:57] seems like it's opensearch 2.7.0... ideally we'd want at least 2.11 I think, it has neural sparse search which could perhaps be an alternative to a costly vector index
[10:02:25] ozge_: no we don't have opensearch dashboard running on the production search indices, but if you have specific questions please let us know (shape of the indexed doc, fields we index)
[10:03:18] ozge_: if you're around this afternoon we could have a chat all together with the rest of the search team?
sending an invite just in case
[10:03:55] ah today is P&T staff meeting
[10:05:41] ozge_: sent an invite, this meeting runs longer than scheduled, so feel free to pop in even after the scheduled end, we might still be there
[10:49:05] lunch
[10:49:09] Awesome! Thank you both David and Guillaume. Thank you for the meeting invite as well. I'll join. All useful information! Indeed, according to the documentation, hybrid search was introduced in 2.11, and I guess it'd be even better to get closer to the latest ~3.x. The other option is that we could serve vector search as a separate API (independent from CirrusSearch/OpenSearch). The service which queries CirrusSearch would then also query
[10:49:09] the new vector search API and merge the search results into a single list, giving us hybrid search results. Indeed, we have some promising results for vector search from the previous phase of the Semantic Search PoC. We are missing results for neural sparse search though. I see it was introduced in 2.11 and the ANN version was introduced in 3.3 https://docs.opensearch.org/latest/vector-search/ai-search/neural-sparse-search/ Let's
[10:49:09] discuss our options in the meeting.
[11:57:17] lunch
[13:28:49] ozge_: Here’s the ticket BTW: https://phabricator.wikimedia.org/T409898 - Could you comment with your requirements, please?
[14:00:14] o/
[15:27:06] o/ I have a few items for our Wednesday Meeting, but that collides with the P & T Staff Meeting (and I don’t have time afterwards). So just a quick survey: Who wants to attend the P & T live?
[15:28:26] \o
[15:31:34] o/
[15:32:14] I'll attend the wed meeting I guess since I invited Ozge
[15:32:39] Yeah, I can be at the weds mtg too if that is helpful
[15:39:22] inflatador: I'm back in https://meet.google.com/aod-fbxz-joy?authuser=0 if you still want to chat
[16:02:23] * pfischer is still stuck in a meeting
[16:10:59] Reminder I’m out today on pto
[17:16:53] workout, back in ~40
[17:56:22] dinner
[18:08:13] ebernhardson: did you see Jaimie's message on slack? https://wikimedia.enterprise.slack.com/archives/C05H0JYT85V/p1762970714102749?thread_ts=1762958311.362749&channel=C05H0JYT85V&message_ts=1762970714.102749
[18:08:30] thanks
[18:10:21] back
[19:13:14] lunch, back in ~40
[19:37:05] back