[12:02:19] lunch
[14:21:11] \o
[14:38:35] o/
[14:42:55] .o/
[14:43:43] I'm presenting on Incus at DPE deep dive in ~15m if anyone is interested. Ref https://linuxcontainers.org/incus/docs/main/
[16:22:33] dcausse: for the settings that go in /_cluster/settings i was waffling...i could add a bit to this to also manage cluster settings but wasn't sure if we want it here or if we want a static json of config in the helmfile's repo
[16:23:18] for cluster settings i guess the idea would be to unify multiple json files into a single thing, then make sure the cluster settings match
[16:24:31] ebernhardson: sure, issue I think that today we have no way to push these settings, although we could into why this additionalConfig thing is not working as expected in the opensearch chart
[16:24:45] s/could/could look
[16:25:22] it's a bit annoying that we can't configure the cluster from the helm file
[16:25:27] hmm, i suppose if it's easier i can get it done here pretty easily
[16:25:37] i'm not sure about helm, could get lost for days :P
[16:25:45] we can do it manually in the meantime I guess
[16:26:16] tbh we can survive setting this manually
[16:27:30] kk, i'm working up the auth bits now. Should be easy
[16:27:40] thanks,
[16:27:58] relatedly I uploaded a google sheet with some numbers
[16:28:53] it's mainly based on disk capacity for now, I'm hesitant to draw any conclusions on the throughput without having the data actually sitting in ceph and the k8s opensearch cluster
[16:31:39] the same semantic search requirements one?
[16:33:04] seems plausible, one thing i wonder about is deleted docs. We have very few in the relforge indices, but those will take space in prod
[16:33:15] although with a once a week update, maybe we force merge?
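[Editor's note: the cluster-settings push discussed above could be done manually against the REST API in the meantime. A minimal sketch, assuming the standard OpenSearch `/_cluster/settings` endpoint; the setting key and URL shown are illustrative placeholders, not the actual values from the helmfile repo:]

```python
import json

def cluster_settings_body(settings: dict) -> str:
    """Wrap flat settings under "persistent" as PUT /_cluster/settings expects."""
    return json.dumps({"persistent": settings})

# Illustrative setting; the real unified config would come from the json files
# mentioned above.
body = cluster_settings_body({"action.auto_create_index": "false"})

# To apply (needs a user with cluster admin permissions):
#   requests.put(f"{base_url}/_cluster/settings", data=body,
#                headers={"Content-Type": "application/json"}, auth=auth)
```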
[16:34:17] ebernhardson: it's at https://docs.google.com/spreadsheets/d/17Ipli-b1Mlrqx22cihgsiJOKUFDQSREFYQKVQPC5zPo/edit?gid=0#gid=0
[16:34:32] yup, same one
[16:34:33] for now no deleted docs, it's assuming a complete re-index per week
[16:34:50] ahh, a new index each week?
[16:34:56] but means at least 2 copies have to be live
[16:34:59] ya
[16:35:57] I can look into a diff approach but was also worried by deleted docs
[16:36:15] the two copies should work fine as long as there aren't space concerns
[16:36:58] there's 140Tb of ceph ssd apparently, here we'd need a bit less than 10Tb for en,fr and pt
[16:37:06] i guess i was thinking since spark-nlp is only re-embedding the new ones we would ship that dataset, but two indices keeps things simpler (maybe more indexing)
[16:37:10] oh nice!
[16:40:05] another possibly optimization is indexing lower dimensions, qwen3 embeddings should support this but this needs further testing
[16:40:17] s/possibly/possible
[16:46:17] dcausse: whats the dns for the k8s cluster? i should know this...
[16:46:36] ebernhardson: https://opensearch-semantic-search-test.svc.eqiad.wmnet:30443/
[16:48:13] meh, my first attempt at passing credentials still gets 401 :P never as easy as i hope
[16:49:22] i wonder if it's something like requests changing the password into a hash for basic auth
[16:52:06] It works with curl but not with requests?
[16:52:48] inflatador: didn't try curl yet, i probably should
[16:54:07] hmm, yea `curl -u opensearch:password https://opensearch-semantic-search-test.svc.eqiad.wmnet:30443/_search/pipeline` fails
[16:54:26] using the password from deployment host helm secrets
[16:54:55] ebernhardson does the password have a special character? might need to wrap in single quotes?
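[Editor's note: the weekly "new index each week, two copies live" approach above is typically wired up with an atomic alias swap, which also sidesteps the deleted-docs concern since the old index is dropped whole. A sketch of the `/_aliases` request body; the alias and index names are hypothetical examples, not the actual index names used here:]

```python
import json

def alias_swap_body(alias: str, old_index: str, new_index: str) -> str:
    """Move an alias atomically so readers never see a half-built index."""
    return json.dumps({"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]})

# Hypothetical weekly rotation: POST {base_url}/_aliases with this body once
# the new index finishes, then delete the old index to free the second copy.
body = alias_swap_body("enwiki-embeddings",
                       "enwiki-embeddings-2025w01",
                       "enwiki-embeddings-2025w02")
```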
[16:55:06] nope, it's all ascii
[16:55:10] err, printable ascii
[16:56:16] lol, i had double checked which error code 401 is (i forget sometimes :P) and just now noting the ai-answer linked to a video explaining what a 401 is...i wonder if people actually click through
[16:56:29] * ebernhardson just wanted the MDN page
[16:57:30] ah yes, it is probably a permissions issue then. The default user has r/w on indices
[16:57:36] :)
[16:58:02] oh, yes we need to update cluster settings other things
[16:59:48] i wonder if i should bake in the REQUESTS_CA_BUNDLE bits, by default python requests uses the python cacerts package instead of the system certs, so it would always have to be run with REQUESTS_CA_BUNDLE env set. But something feels icky about baking that in
[17:00:24] OK, that's on me. I'll get a ticket started to add the permissions for the default user
[17:00:40] hmm, i guess i didn't try the operator user. I can just source that credentials file
[17:00:54] yeah, if you have permissions to read the operator pw feel free
[17:01:53] hmm, the operator user/pass also fails with curl :S
[17:03:19] what error do you get? I get `type":"security_exception","reason":"no permissions for [cluster:admin/search/pipeline/get]` for the opensearch user
[17:03:33] i get a 401 with the content 'Authentication finally failed'
[17:09:47] "Authentication finally failed" usually means user/pw issue
[17:10:23] with curl I do `UPW='opensearch:pw'` and then `curl -u ${UPW} https://opensearch-semantic-search-test.svc.eqiad.wmnet:30443/_search/pipeline`. That gives me the permissions error from above
[17:10:28] hmm, i get the same for both username/passwords in the /etc/helmfile-defaults/private/dse-k8s_services/opensearch-semantic-search/dse-k8s-eqiad.yaml file
[17:11:39] ebernhardson: perhaps you use the prod cluster pass and the test cluster url?
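[Editor's note: two of the failure modes debugged above can be illustrated in a few lines. Basic auth is just a base64-encoded `user:password` header, so a shell-quoting slip (or the wrong cluster's password) simply produces a wrong header and a 401; and python-requests verifies TLS against certifi's bundle rather than the system trust store, hence the `REQUESTS_CA_BUNDLE` workaround. The credentials and the Debian CA path below are placeholder assumptions:]

```python
import base64
import os

# Basic auth header construction: curl -u and requests both do exactly this,
# so "Authentication finally failed" means the cluster saw the wrong string.
user, password = "operator", "secret-from-helmfile"  # placeholders
auth_header = "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()

# Point requests at the system trust store so an internal CA verifies;
# the path is the Debian default (an assumption about the deploy host).
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"
```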
[17:12:02] I'm using the passwd from `cat /etc/helmfile-defaults/private/dse-k8s_services/opensearch-semantic-search-test/dse-k8s-eqiad.yaml` as well, maybe try setting the user/pass as a variable like I did?
[17:12:04] oh! yea that's probably it
[17:13:22] yea now i get a forbidden instead of a denied on the opensearch user, and it works if i use the operator user
[17:13:35] i suppose i should have it default to the operator? No need to give the default user access to change things i suppose
[17:14:18] I've opened T416714 and T417328 to improve the UX
[17:14:19] T416714: OpenSearch on K8s: Create separate admin user for cluster operations - https://phabricator.wikimedia.org/T416714
[17:14:19] T417328: Explore K8s-native OpenSearch user management - https://phabricator.wikimedia.org/T417328
[18:35:13] dinner
[20:13:05] hmm, clearly something i don't understand about search processors :S Attempting to save a search_pipeline that includes innerhits gives 3 errors, one for each node, saying it's not installed
[20:13:21] but clearly, for it to get far enough to replace inner_hits with the full processor_type from the code...it's installed
[20:13:30] (and _plugins agrees)
[21:04:06] oh i was just being silly...you have to register the processor with the same value you return from getType()
[22:23:57] o/ I just had a chance to talk to Jazmin and she invited us to share our concerns regarding Semantic Search in a WIP document, that will eventually become a decision brief for how to scale semantic search beyond the first experiment. I created a section “Risk Assessment & Mitigation Considerations” and would highly appreciate your input:
[22:24:05] https://docs.google.com/document/d/11b8cG1anQaGC2mYlcIRfMIUfdIxPDt7U5GpvM4ET0B4/edit?tab=t.uw184inb8xt
[22:24:22] She also offered to join us for a Wednesday meeting to do a guided risk assessment with us. If you are interested, I would invite her for next week.