[10:33:56] just discussed with joseph and looked at the data, only ~100k, 25k and 5k pages are changed on enwiki, frwiki and ptwiki respectively over one week, I'm changing strategy and will index the diff [10:43:48] o/ dcausse, regarding https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1243696: According to Guillaume, they cannot accept the patch because that would result in >70% utilisation of the k8s cluster, which is outside the comfort zone. SRE wanted to get back to us with affordable numbers. [10:44:58] pfischer: this patch is now just about testing frwiki to get a sense of the IO and minimal RAM requirements [10:47:40] we initially requested 768Gb of ram, based on https://grafana.wikimedia.org/d/WG4NjDISk/cluster-status-and-capacity?orgId=1&refresh=1m&var-datasource=000000026&var-site=eqiad&var-cluster=k8s-dse&from=now-30d&to=now&timezone=utc&viewPanel=panel-2 [10:48:03] Oh, I missed Balthazar’s comment. Alright. So this would definitely only fit frwiki at max? [10:48:13] peak usage seems around 1.2Tb [10:49:10] 1.2Tb + 768Gb is ~2Tb which I think is still under 70% [10:49:52] pfischer: this frwiki test is just for us to learn how opensearch behaves with remote volumes and how low we can go in terms of ram [10:50:55] but I'm open to see what we can do with the max they can give [10:51:55] Understood. Thanks! I’ll get back to SRE and unblock this either way. [10:52:18] thanks! [10:56:12] lunch [14:09:09] \o [14:32:33] o/ [14:38:38] .o/ [15:18:49] Trey314159 sorry for the late scratch! I have to get ready for a capacity planning mtg at the top of the hour. Hooray [15:19:29] no worries! [15:47:20] I feel that writing rows like '{"_index":123}\n{...}' as text is going to hit me at some point... wondering what to do, replace \n with a kind of separator or not write as text... [15:48:00] dcausse: hmm, i don't follow the concern? Is it that we might miss the \n?
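[editor's note] The concern above: in the bulk format an index/create/update action header is followed by its document on the next line, while a delete action is a single line, so anything that re-shuffles per-line text can split a pair. A minimal illustration in plain Python (`pair_bulk_lines` is a hypothetical helper for illustration, not existing code):

```python
import json

def pair_bulk_lines(lines):
    """Re-group raw ndjson bulk lines into one string per action so a
    per-line shuffle can never separate a header from its document.
    Assumes delete headers stand alone and every other header is
    followed by exactly one source doc (hypothetical helper)."""
    it = iter(lines)
    for line in it:
        action = next(iter(json.loads(line)))  # first key: "index", "delete", ...
        if action == "delete":
            yield line
        else:
            yield line + "\n" + next(it)

bulk = [
    '{"index": {"_id": 309}}',
    '{"text": "some paragraph"}',
    '{"delete": {"_id": 42}}',
]
units = list(pair_bulk_lines(bulk))
```

Keeping header and doc together as a single unit of work (or as two columns concatenated only at write time) sidesteps the problem.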
[15:48:49] it's that it'll end up being two lines in the dataset and if for some it's re-shuffled it might break the bulk format? [15:49:07] s/some/some reasons/ [15:49:34] spark.read.text("/user/dcausse/semantic_search/ptwiki_full_paragraphs_20260215.ndjson").take(1) -> [Row(value='{"index": {"_id": 309}}')] [15:50:00] but also at read time, I need to read two lines then? [15:50:01] ohh, yes if it shuffles or treats things as per-line that could totally break things....hmm [15:51:25] i guess in convert_to_esbulk i solved that by having a very short path, basically repartition->format to pairs->saveAsTextFile [15:51:38] keeping it in row format until it's ready to write out [15:53:23] reading is more tricky...i think i dealt with that somewhere but forget where...hmm [15:53:56] naively, i suppose i would read->mapPartitions to pair it up, but not sure where i've done that before [15:54:04] also I'll have deletes which are single lines... [15:54:28] :( [15:54:41] perhaps I keep a structured format with two columns and concat at the very end? [15:55:08] seems reasonable, one way or another it probably mostly needs to be processed with the unit of work as a single item [15:55:19] only spreading out/in at the very edges [15:55:49] or everything wrapped as a json: {"header": {"index": ...}, "data": {}} that I re-interpret later [15:57:12] but I liked the idea of being able to do hdfs dfs -text /part | split/parallel -> curl [16:00:17] hmm, yea it is convenient to use parallel and curl, needing a pyspark script to ship the changes is less than ideal. But maybe we could have a generic data shipper? A generic shape and a generic script we can reuse? [16:00:33] similar in idea to the mjolnir bulk daemon, but simpler and more specialized? [16:00:39] (and not a daemon) [16:01:47] dcausse: are you around for the resource sizing meeting? [16:01:53] oops [16:42:37] dcausse: are we going to use index aliases?
And should cirrus expect some alternate index suffix, frwiki_semantic or some such? Or should we just call it frwiki_content? [16:48:03] calling it frwiki_content is probably the easiest [17:09:46] yes I was planning to add an alias and keep _content for this one [17:10:49] i was also pondering where we set the search_pipeline, it seems like the qwen3-embedding could simply be the default pipeline set in index settings, it only defines the default model to use for neural searches to embed their queries [17:11:16] sure [17:11:18] but longer term if we introduce reranking it's not clear if that should be the default or not. I suppose as long as it's a dedicated cluster it can be, but otherwise need to thread through to add the query param in cirrus (probably not that hard) [17:12:18] yes... all these search_pipeline bits are still very fuzzy to me... [17:16:09] i'll try and wrap up the cirrus bits today, if they want a plausible api by the end of next week it needs to be merged before mid-day eu on monday [17:16:24] ack [17:16:43] skipping search pipeline in cirrus for now, assuming we set the qwen3-embedding as default [17:17:08] can this be set after the fact on the index? [17:17:08] well, i might spend an hour to see where it fits in, if easy enough will attach it to the Search objects [17:17:43] yes, it's an index setting, `index.search.default_pipeline` [17:17:56] this means all searches will go through this pipeline?
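[editor's note] `index.search.default_pipeline` is a dynamic index setting, so it can indeed be applied after the fact to an existing index. A minimal sketch of the settings body (index and pipeline names taken from the discussion; assume something like `PUT frwiki_content/_settings`):

```python
import json

# Dynamic index setting that names the default search pipeline; the
# pipeline name "qwen3-embedding" is from the discussion above.
settings_body = json.dumps(
    {"index": {"search": {"default_pipeline": "qwen3-embedding"}}}
)
```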
[17:18:13] yes, but in the case of qwen3-embedding it doesn't actually do anything, it just sets a default model that a neural query looks up [17:18:24] ok [17:18:26] the reranking one does though, and probably doesn't make sense as a default [17:19:10] i suppose the main thing the qwen3-embedding is doing is avoiding needing arbitrary model ids (that change per cluster) in mw config [17:25:26] yes that model id was going to be a pain without this search pipeline thing [17:25:54] watching the cluster doing a rolling restart after helmfile apply :) [17:26:46] i imagine for now, you are manually creating the index settings/mapping? [17:26:53] yes... [17:27:23] i can maybe fit that into the cirrus-toolbox bits, index templates are another stateful api [17:27:29] (but later, not this week) [17:27:47] sure [17:27:58] hm... it did not pick up my jvm settings [17:28:18] still seeing -Xms1g, -Xmx1g, [17:28:21] :S [17:28:47] should be able to kubectl exec into the host via the -deploy credentials and see what actually got passed [17:29:18] you might have to run `env` in the pod itself, it doesn't update the configmaps [17:29:27] it just added new extra pods but did not seem to pick up my new mem setting [17:29:44] oh yeah, that's a known issue. The operator basically does nothing besides stand up the cluster and replace dead pods [17:30:21] https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/OpenSearch-on-K8s/Administration#Changing_Resources_on_a_Live_Cluster [17:30:37] it's annoying, but deleting the pods is really fast b/c it doesn't actually delete the data [17:30:59] can I delete the pods myself?
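[editor's note] When old and new heap flags both end up on the command line, the JVM resolves duplicates by letting the last occurrence win, so a stale -Xmx1g alongside a new -Xmx4096M is harmless. A toy check (`effective_heap` is a hypothetical helper, assuming last-one-wins):

```python
def effective_heap(jvm_args):
    """Return the value of the last -Xmx flag, mirroring how the JVM
    resolves duplicate flags (later occurrences win)."""
    heap = None
    for arg in jvm_args:
        if arg.startswith("-Xmx"):
            heap = arg[len("-Xmx"):]
    return heap

# Roughly what the restarted pods showed: old and new -Xmx both present.
args = ["-Xms1g", "-Xmx1g", "-Xmx4096M"]
```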
[17:31:34] yeah, you need to do `kube_env opensearch-semantic-search-deploy dse-k8s-eqiad` , the regular user doesn't have perms [17:31:36] * ebernhardson loves pods: OCI runtime exec failed: exec failed: unable to start container process: exec: "ps": executable file not found in $PATH: unknown [17:31:55] sure i could dig through /proc, but that's not fun :P [17:32:08] I'd say use nsenter but ya gotta be root for that ;( [17:32:43] hm.. I see duplicated Xmx1G and then Xmx4g on the new pods... [17:35:22] yea same, -Xmx4096M comes last and should win... [17:35:38] ok [17:36:48] Where are y'all setting it? I can take a look at the chart [17:37:05] i hadn't realized it would munge the arguments, /proc/1/cmdline shows -Xmx4096M, but the opensearch startup says -Xmx4G [17:37:21] it's the same, i just didn't expect it to normalize it [17:37:28] I need to find the settings to kill these audit & top-queries indices and the like, not super keen on having shards being created at random times [17:37:36] same [17:37:49] Yeah, that's something I'd like to apply via whatever default cluster settings thingie we come up with [17:38:35] inflatador: oh i should mention, i went with a narrow cluster settings bit in the cirrus-toolbox that's managing the other stateful bits. It's currently limited to only affecting plugins.ml_commons.* parts of /_cluster/settings [17:38:53] since there is a direct dependency between the config (it references hosts) and the settings (it has a host whitelist) [17:39:47] dcausse: i guess make a ticket?
I can probably look into which plugins to remove next week [17:40:25] looks like it did pick up the 4g now: [opensearch-semantic-search-masters-0] heap size [4gb], compressed ordinary object pointers [true] [17:40:44] yes just restart the first original pods [17:40:47] ahh ok [17:40:48] *ed [17:41:04] I guess I can create the index now [17:42:52] apparently there are cluster settings to disable the top_queries indices: https://docs.opensearch.org/latest/observing-your-data/query-insights/settings-api/ [17:42:54] lemme see [17:49:07] applied cluster settings to both that should (probably?) disable the top_queries indices from being created and deleted the existing top_queries indices [17:49:37] the security auditlog will probably still be created, i dunno what we want to do with that. Doesn't seem like the worst idea, but we would need a cleanup routine [17:49:49] ok it's pushing data to frwiki_content_20260215 [17:50:29] or maybe it clears old audit indices on its own, not clear. It looks like the query-insights plugin might have an auto-cleanup at least [17:52:27] oh hmm, looking at the query i generate in the semantic search, i also need to disable the suggester, shouldn't be too hard [17:54:43] seems very slow :/ [17:54:48] compared to relforge [17:55:17] index is still red, can't be helping [17:55:55] oh [17:55:58] 6 nodes and 7 shards [17:56:13] did I mess something up? [17:56:37] one node died? [17:57:08] hmm, i see 7 in deployment-charts. Dying would be suspicious, we have nodes 0,1,2,3,4,5. Maybe 6 died, but seems like maybe it never got started? [17:57:37] random possibility, denied for resource limits?
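[editor's note] The cluster settings to disable top_queries collection could look roughly like this, following the query-insights settings API linked above; the exact keys are an assumption here and should be verified against the deployed plugin version:

```python
import json

# Persistent cluster settings to stop the query-insights plugin from
# collecting top queries (and thus creating top_queries indices).
# Key names per the settings-api docs; verify against the plugin version.
payload = json.dumps({
    "persistent": {
        "search.insights.top_queries.latency.enabled": False,
        "search.insights.top_queries.cpu.enabled": False,
        "search.insights.top_queries.memory.enabled": False,
    }
})  # body for PUT _cluster/settings
```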
[17:57:37] java.lang.UnsatisfiedLinkError: no opensearchknn_faiss_avx512 in java.library.path: /usr/java/packages/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib [17:58:02] oh indeed, there was a 6, logs are available [17:58:27] I use kctl logs pod_id --previous on the node that restarted [17:58:33] hmm, but then how do the others work :S [17:58:56] this is a setting we had to add IIRC on relforge initially [18:00:06] all failed [18:00:08] :) [18:00:08] fwiw, that lib is in plugins/opensearch-knn/lib [18:00:14] just ran _count on the index [18:00:39] i guess it's called libopensearchknn_faiss_avx512.so [18:00:44] where should this go? the image or helm values? [18:01:19] Interesting, I wonder if we need some kind of AVX device passthrough? [18:01:20] can it be set from helm? Seems more changeable there [18:01:40] baking things into the image when we aren't 100% sure just seems time consuming i guess [18:02:14] -Djava.library.path=$OPENSEARCH_HOME/plugins/opensearch-knn/lib should work in helm values [18:02:37] but not sure about "$OPENSEARCH_HOME" might need to hardcode the path [18:02:55] yea i'm not sure the context that runs in either [18:04:07] you should be able to exec into the pod and run env, that might help nail down the OPENSEARCH_HOME value [18:04:20] it should be /usr/share/opensearch [18:04:25] (from `pwd`) [18:04:27] ok [18:04:31] yeah, that sounds right [18:06:04] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1244729 [18:06:10] lunch, back in ~1h [18:12:50] I deleted all pods at the same time, perhaps not the best idea, I wonder if the operator is waiting for a sane state before going to the next pod [18:13:03] seems kind of stuck :/ [18:13:35] :S [18:13:45] delete the cluster? In theory that's what this should be good at [18:13:51] can re-apply settings [18:14:32] you mean helmfile undeploy?
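[editor's note] The fix being discussed, sketched as helm values; the key names here are illustrative, not the actual chart schema (the real change is in the gerrit patch above). The knn native libs may also need LD_LIBRARY_PATH, since the faiss .so files link against libopensearchknn_util.so in the same directory:

```yaml
# Illustrative values sketch, not the actual chart schema.
opensearch:
  jvmOptions:
    # Hardcoded since $OPENSEARCH_HOME may not expand in this context.
    - "-Djava.library.path=/usr/share/opensearch/plugins/opensearch-knn/lib"
  extraEnvs:
    # So dependent shared objects (libopensearchknn_util.so) resolve too.
    - name: LD_LIBRARY_PATH
      value: "/usr/share/opensearch/plugins/opensearch-knn/lib"
```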
[18:14:45] yea, basically remove it from helm and re-initialize [18:14:55] would lose all the data, but nothing important there [18:15:03] well, actually i wonder what happens in ceph [18:15:20] equally plausible it just gets abandoned and lives forever? [18:15:39] or it gets assigned the same namespace/paths [18:15:40] yes no clue :/ [18:15:53] me either, fun with new infra :) [18:15:55] the initial deployment mentions a cluster-bootstrap-0 [18:16:59] ok time to learn then [18:17:18] all gone [18:17:25] :) [18:19:27] dcausse: btw for fields, i think cirrus is going to want to see namespace/title/namespace_text/wiki/timestamp. [18:19:52] surprisingly, os doesn't seem to complain about unknowns in `_source` or `fields`, but cirrus might be surprised later [18:20:28] we can always stuff those in later though, title is there and the namespace/namespace_text/wiki should be constants across the whole index since it's only NS_MAIN [18:20:50] ebernhardson: this is what I have: https://phabricator.wikimedia.org/P89064 [18:21:24] oh yea that should be plenty. Thanks! [18:23:02] sigh it seems stuck in a kind of error loop: join validation on cluster state with a different cluster uuid zgn4ztgOTnGdW3g6NImERg than local cluster uuid zr2lEu_QQ3iB_xF6SlRUqg, rejecting [18:23:09] seems like data on disk was kept [18:23:57] not sure what to do... manually wipe out some state within the pod to help it? [18:24:13] or recreate the ceph volumes? [18:24:39] I guess I broke it for real and I now need an adult :P [18:24:48] hmm, yea deleting /var/lib/opensearch/nodes/0 might work [18:24:58] lol [18:25:21] i would be tempted to delete the ceph volumes somehow, but not sure if it initializes that itself or if it was done separately [18:25:32] I'm going to guess it creates them itself, it would have to, to add nodes? [18:26:28] no clue :/ [18:27:24] well...
Node 'opensearch-semantic-search-masters-0' initialized [18:27:37] -1 is booting [18:27:45] but will probably fail the same way [18:29:36] ok doing rm /var/lib/opensearch/nodes/0 on every new node [18:42:23] ok seems back [18:42:47] ebernhardson: I guess I broke your settings as well, is it easy to reship them? [18:43:40] hm not sure... it did not complain when I re-created the index with the search_pipeline [18:43:45] yea super easy, sec [18:44:14] well, if the cluster is happy :P {"error":{"root_cause":[{"type":"m_l_exception","reason":"Fetching master key timed out."}],"type":"m_l_exception","reason":"Fetching master key timed out."},"status":500} [18:44:32] hmm [18:45:05] :/ [18:45:07] oh there it goes, just needed a sec [18:45:13] should all be applied now [18:45:17] thanks [18:45:29] indexing now [18:45:53] dinner, will check back a bit later [18:46:38] hmm, i guess i need to figure out a variant of that config that can be applied to a local cluster...maybe with llama to provide the embedding [18:51:06] hmm, do we pass the fulltext profile into SemanticResultsType, or add some sort of SearchContext::$resultsTypeOptions to allow the query builder to provide the options [18:51:26] * ebernhardson realized he has hardcoded options in SemanticResultsType that are parameters to the SemanticQueryBuilder [18:52:10] i guess i feel better not adding even more indirection...seems plausible to pass them in directly [19:03:27] sigh... java.lang.UnsatisfiedLinkError: /usr/share/opensearch/plugins/opensearch-knn/lib/libopensearchknn_faiss_avx512_spr.so: libopensearchknn_util.so: cannot open shared object file: No such file or directory [19:07:53] it needs LD_LIBRARY_PATH as well I guess [19:14:10] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1244764 [19:15:31] LGTM, +2 [19:15:58] back [19:18:04] it's slowly crashing all pods... hoping that it won't be stuck again :/ [19:19:45] applied, will wait a bit to see if it can recover [19:21:02] phew...
yes it recovers [19:22:13] spoke too soon [19:26:12] yes.. when it's in this state it does not want to pick up changes and will refuse to spin up deleted pods [19:26:59] deleted opensearch-semantic-search-masters-6 and it's not willing to re-add it while the cluster is in this bad shape [19:27:14] well... giving up for today [19:31:27] Did you completely wipe out the cluster and try to bring it back? If you do that without deleting the persistent volume claims it won't form a new cluster. I'll add that to the docs [19:33:48] inflatador: we weren't sure how to drop the ceph stuff, so instead issued a manual rm -rf on the nodes/0 directory in ceph [19:33:58] (not sure if he did this time around, but it worked the first time to reset the cluster) [19:34:19] well, the rm -rf was from the host that had it mounted, but the effect was to clear out the node state in ceph [19:34:45] s/host/container/ [19:37:45] ebernhardson, ACK. I'll add how to do it to the docs, after that I can wipe everything out if you'd like [19:40:33] inflatador: sure, docs on how to re-create would be awesome, if you have time to also run it and verify would be great [19:41:00] on the upside, it turns out i need all this extra config for my local test instance too...shouldn't be surprised [19:58:44] * ebernhardson hearts java error messages. Never enough context: Invalid URI. Please check if the endpoint is valid from connector. [19:58:50] like, you could at least print the URI you tried to use :P [20:01:05] LOL [20:01:41] Just redeployed the cluster. Docs on how to completely wipe it here: https://w.wiki/H$ru [20:06:17] thanks!! [20:11:02] * ebernhardson apparently never remembers that _ is not a valid hostname character [20:11:57] blame arpanet apparently :P [20:18:35] i suppose a side benefit of knn...the results don't have to be related at all.
Stuff any 5 docs into the index and they will be returned by knn [20:19:16] (maybe a downside too, better not deploy knn to tiny wikis :P) [20:21:16] randomly curious, cluster has been deployed 20 minutes, first attempt to upload an ml connector still failed with the timeout bit. Re-issuing request 2s later worked fine. weird [21:11:37] restarted frwiki import, working fine finally :) [21:12:08] thanks! [22:08:08] hmm, wmf-config really wants a labs service