[10:53:44] lunch
[14:12:49] o/
[14:30:13] I think the opensearch image is ready, ref T414697. But if y'all are gonna keep iterating on the innerhits plugin I could help y'all set something up where you can rebuild the docker image locally on relforge
[14:30:14] T414697: Build the required plugins for opensearch 3 - https://phabricator.wikimedia.org/T414697
[14:30:45] inflatador: thanks!
[14:31:02] \o
[14:33:21] o/
[14:42:12] dcausse: looks like relforge is idle, any problem if i start swapping it to the new opensearch image?
[14:44:44] ebernhardson: no problem, I'm not using it at the moment
[14:45:16] playing with the test k8s cluster
[14:45:27] kk
[14:55:43] ebernhardson: and whenever you have a moment I'd have this dag https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/2010 to enable
[15:00:21] dcausse: sure i'll take a look
[15:00:27] thanks!
[15:03:53] i still randomly wonder if we need some way to manage hive schemas. Having the same schema in the test suite and the airflow repo seems awkward, but no great solution
[15:05:43] very true :(
[15:06:06] I could have added the create table in discolytics perhaps
[15:06:26] but tbh I created it manually beforehand to start populating some data
[15:06:31] totally fine :)
[15:06:48] hmm, apparently dbt has something for managing DDL lifecycles
[15:06:58] (no clue if we actually need yet another tool :P)
[15:07:49] yes... tbh, I was planning to ignore dbt for now :)
[15:08:04] but definitely worth a look once there are some use cases in place
[15:08:19] same :)
[15:36:29] hmm, i wonder what opensearch is going to think about downgrading relforge 3.5.0 -> 3.3.2... testing locally before blowing that up
[15:36:40] meh.. just thinking about something possibly annoying... if we can't control opensearch model ids, that means we need to vary them depending on the cluster we target?
[15:37:11] which is going to be a pain with discovery endpoint :/
[15:37:34] dcausse: hmm, i guess it has to be wrapped into a named pipeline?
[15:37:45] or does that even work with query embedding? hmm
[15:37:51] no clue...
[15:38:28] will test a few options... possibly we can force the model id at creation?
[15:40:32] i don't think we can force the model id, i didn't look too closely but when i was investigating how to manage configuration of pipelines across clusters i didn't see anything
[15:40:44] school run, back in 20
[15:43:54] :(
[15:58:37] https://docs.opensearch.org/latest/search-plugins/search-pipelines/neural-query-enricher/ seems way too complicated :(
[16:03:02] If we need a 3.5.0 OpenSearch image LMK, shouldn't be too hard to do
[16:58:56] heading out
[16:59:03] take care
[17:00:42] for cirrus-toolbox, i guess my idea was to add python scripts that will idempotently apply pipelines/models/etc and be able to look up model-ids, etc.
[17:00:57] would be fed from .json files defining them
[17:01:10] workout, back in ~40
[17:18:54] meh, the upstream opensearch image uses user 1000, the one in our registry uses 999. Which means they don't like to reuse each other's volumes without a manual step in the middle :P fixable but annoying
[17:22:28] hmm, no, opensearch is very specific. no downgrading: java.lang.IllegalStateException: cannot downgrade a node from version [3.5.0] to version [3.3.2]
[17:23:02] it's fine, i was just trying to keep things the same across the board
[17:39:51] hmm, i suspect ml model groups might have something to help with aligning versions, but the 'Model group APIs' section of the docs 404's :P
[17:45:52] hmm, maybe not. At least if gemini is appropriately able to guess what would have been on those pages. It sounds like they are more of an access control bit
[18:49:10] heh, opensearch has a "guardrails" bit for llm inference.
It amounts to a list of naughty words/regexes it should reject
[19:02:45] very curious error... [relforge1010.eqiad.wmnet] Updated ML task successfully: OK, taskId: 603Xi5wBeX-V_pZjsoY4, updatedFields: {state=FAILED, error=timeout after 600 seconds}
[19:53:23] _plugins/_ml/connectors/_create vs _plugins/_ml/models/_register ... i should find a way to ignore the inconsistencies
[19:55:33] sigh... the answer is you can embed a connector into a model definition, but if you do and it doesn't like the connector definition it just hangs forever
[19:58:41] i wonder if the tasks ever timeout ... it starts a task (in /_tasks) and that task just hangs, and is marked uncancelable. So far it's been idling for 25 minutes :(
[20:01:56] * ebernhardson will just avoid nesting them... already have to have routines to look up ids anyways
[21:07:25] * ebernhardson waffles on if we should add a bit for configuring cluster settings as well... but it's a little awkward since models/connectors/pipelines each have one config per item, while cluster settings are all unified
[21:33:50] most of the way there, have some tooling that can compare a directory of configs against the existing ml-commons state and apply it if requested. Next up: figure out the search pipeline bits...
[21:34:16] NICE
[21:36:05] i skipped the cluster settings bit :P Probably could do it, but should probably talk about what exactly we want it to do beforehand
[21:38:12] yeah, I'm just starting to test the new chart and operator version. I assume we just want something that takes a JSON or YAML from a values file and applies it. Balthazar was showing me helm chart hooks ( https://helm.sh/docs/topics/charts_hooks/ )
[21:39:15] Lots of ways to do it, I just don't want to diverge from the upstream chart too much if we don't have to
[21:39:21] in this case i added to the cirrus-toolbox repo an extra python tool, maybe best if we keep the settings super simple and apply them there.
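The compare-and-apply tooling described at 21:33:50 could look roughly like this: read a directory of desired configs (here assumed to be one JSON file per connector/model/pipeline, keyed by filename) and diff it against the live ml-commons state. A sketch only; the file layout and function name are assumptions, not the actual cirrus-toolbox code:

```python
# Sketch of the idempotent apply: compute which objects need creating or
# updating by comparing desired JSON files against the live state, so that
# re-running the tool against an already-converged cluster is a no-op.
import json
from pathlib import Path

def plan_changes(config_dir: Path, live: dict[str, dict]) -> dict[str, list[str]]:
    """Return object names grouped into create / update / unchanged."""
    plan = {"create": [], "update": [], "unchanged": []}
    for path in sorted(config_dir.glob("*.json")):
        desired = json.loads(path.read_text())
        name = path.stem  # filename (minus .json) identifies the object
        if name not in live:
            plan["create"].append(name)
        elif live[name] != desired:
            plan["update"].append(name)
        else:
            plan["unchanged"].append(name)
    return plan
```

The actual apply step would then only issue API calls for the "create" and "update" buckets, which also sidesteps the hanging-task problem above by never re-registering something that already matches.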
I imagine we can just store the expected settings in a file in helmcharts?
[21:39:33] (by there i mean not integrating cluster settings into the new bit)
[21:40:52] You mean keeping plugin settings in a file in helm charts? Yeah, that should be fine AFAIK
[21:41:52] not the plugin settings, but the cluster settings. I guess i was thinking the plugin settings we can manually bring the repo to some host (deployment host or whatever) and run them, but if we want the cluster settings integrated with the helm chart (totally reasonable) that might make sense as its own thing that just applies the json file to the /_cluster/settings endpoint
[21:42:21] the plugin settings could maybe go there too, for now i have it reading a config dir in the same cirrus-toolbox repo, but that's totally configurable
[21:42:36] it did seem a bit awkward to store the settings in that repo, but i didn't want to create yet-another-repo
[21:43:31] yeah, agreed on both counts. I think we do have to keep the settings in the deployment-charts repo. As far as applying it I'd like to have that integrated into the chart too, but again without diverging too much from upstream
[21:44:40] I doubt it will be that hard, but I just started looking into the newest version of chart and operator, makes more sense to start there since we'll need to switch over soon for the new operator
[21:45:13] we could probably have gitlab build some container it can sidecar if we do want to automate the application of plugin bits, but i suppose i was thinking changes would be rare and can be deployed "manually"
[21:46:24] yeah, that's totally fine, especially considering the quirks you mentioned above ;)
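For the cluster-settings piece, a minimal sketch of the "just applies the json file to the /_cluster/settings endpoint" step discussed above, assuming the settings from the deployment-charts values file are applied as persistent settings and diffed first so repeated runs stay idempotent (helper names are illustrative):

```python
# Sketch: build the PUT /_cluster/settings request body from a flat settings
# dict, and diff against what the cluster already reports as persistent so
# the tool can skip the PUT entirely when nothing changed.

def settings_body(desired: dict) -> dict:
    """Wrap flat settings into the persistent section of the API body."""
    return {"persistent": desired}

def settings_diff(desired: dict, current_persistent: dict) -> dict:
    """Only the keys whose values differ from the cluster's current state."""
    return {k: v for k, v in desired.items() if current_persistent.get(k) != v}
```

If `settings_diff()` comes back empty the apply step is a no-op, which matches the keep-it-super-simple approach mentioned above.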