[10:27:47] errand+lunch
[14:25:21] I think we need the wmf certs on the opensearch image and to tweak the jvm options to tell the jvm to load them, I installed them manually last time
[14:25:28] difficulty is that it's not debian :/
[14:25:55] and of course I did not take notes about what I did :(
[14:29:03] o/
[14:29:38] dcausse could you use our opensearch image on relforge? I guess not until it has the plugins?
[14:29:59] inflatador: I think Erik installed it recently
[14:30:14] but I can't seem to be able to access liftwing now
[14:31:30] I tweaked things directly inside the image, which is why I need to do that at image build time instead
[14:33:02] or we go debian? no clue how hard it can be
[14:36:39] inflatador: where is the image repo located? searching gitlab but no luck yet
[14:39:18] dcausse sorry, it's at https://gitlab.wikimedia.org/repos/data-engineering/opensearch
[14:39:50] oh ok thanks!
[14:40:36] ah I totally missed something then!
[14:40:44] this one is debian based
[14:40:50] I wouldn't feel too bad, gitlab search sux ;(
[14:41:26] nvm this image should hopefully be good, it's just the relforge one that I need to "retweak"
[15:15:11] \o
[15:16:01] yea i built a new opensearch image with the plugin, but that lost any customizations :(
[15:16:18] o/
[15:16:22] no worries!
[15:26:09] dcausse https://phabricator.wikimedia.org/T417457
[15:44:09] sigh.. feeling a bit dumb, was importing a big index and tweaked one of the instances, obviously the index process did not like that and crashed...
[15:47:00] Our Wednesday Meeting conflicts with the P&T Staff Meeting. I do have a question regarding re-ranking, but I could ask that offline if you would prefer to attend the P&T meeting. Who would want to join P&T?
[15:54:03] will have some spark questions for Erik as well and I'm fine watching the recording of the P&T meeting
[16:57:25] I'm in the P&T meeting and they said they will have another session next week for questions
[18:50:43] err, meh...realizing that even though we have the inner_hits now available, we don't have any ability via mustache to combine values. I guess inner_hits needs an additional set of in/out fields so it can augment
[18:51:05] basically it means we can pass the text, but not the context + text per-hit
[18:51:22] * ebernhardson just wants a zip function in mustache :P
[18:51:55] or you could just assemble the whole thing at index time, where context is context + text
[19:40:28] oh completely missed this problem... and made things even worse recently by dropping the context field and keeping only parent_sections, meaning that to reconstruct the context you have to use the title, the section (if not null) and parent_sections...
[19:41:22] having a text_with_context field is simple but we yet again dup the content :(
[19:42:02] or we could encode this in the text field but that means we have to decode it and remove the context before displaying it
[19:42:06] i put together a very simple bit so the inner_hits processor takes a list of fields, a target field, and a delimiter. Right now it only combines strings, but it could do whatever
[19:42:56] i suppose it's a bit annoying to make generic enough to also combine the parent_sections and what not, but not impossible
[19:43:02] could just accept specialization
[19:43:09] I'm building enwiki_content_evaluation_index_20260125 in relforge please let me know if it's too hard
[19:43:35] it's totally fine, it sounds like i should also join List and not just throw it away?
[19:43:38] we can feed a "context" field, should not add much
[19:46:48] it's only hard if we try and make it super generic, but if we accept some ugliness it can probably stay pretty simple to pull from parent + nested doc and join the bits
[19:47:22] alternatively it would be nice if there was a script response processor, but i probably shouldn't get into the weeds with that :)
[19:48:03] :)
[19:48:46] anyways it's not too late to change how we index things and happy to change to make things easier
[20:08:36] i just don't know :S On the one hand i'm not the biggest fan of duplicating everything even more times, but on the other hand the source is usually (i think?) a compressed json document so it probably isn't a big space problem to have a 'context+text' field for inference and a 'text' field for returning to the user.
[20:24:33] sure, I'll re-import with a text_with_context field, we can always compare the diff in size
[20:25:37] i've also updated the inner_hits plugin to have some joining possibilities, handles String and List, but uses a simple joiner so no `[foo][bar]\ntext`, but provides some options
[20:25:54] would be `foo | bar | text` instead
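
For readers following the joiner discussion, here is a minimal sketch of the behaviour described: take a list of fields, accept String or List values, and join them with a delimiter into one value. It is not the plugin's code. The function name `join_fields`, the ` | ` default, and the choice to skip null or empty values are assumptions made for illustration; the field names come from the conversation.

```python
from typing import Any, Mapping, Sequence


def join_fields(doc: Mapping[str, Any],
                fields: Sequence[str],
                delimiter: str = " | ") -> str:
    """Join the named fields of `doc` into a single string.

    Hypothetical helper: strings are used as-is, lists are flattened,
    and missing, None, or empty values are skipped (an assumption).
    """
    parts = []
    for name in fields:
        value = doc.get(name)
        if value is None:
            continue  # e.g. a null section
        if isinstance(value, str):
            values = [value]
        elif isinstance(value, (list, tuple)):
            values = [str(v) for v in value if v is not None]
        else:
            raise TypeError(
                f"unsupported type for field {name!r}: {type(value).__name__}"
            )
        parts.extend(v for v in values if v)
    return delimiter.join(parts)


# Rebuild context + text for one hit from title, parent_sections, section, text.
hit = {"title": "foo", "parent_sections": ["bar"], "section": None, "text": "text"}
print(join_fields(hit, ["title", "parent_sections", "section", "text"]))
# foo | bar | text
```

The same helper could populate a `text_with_context` field at index time instead of at query time, which is the trade-off weighed in the conversation.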