[06:59:39] o/
[07:00:51] o/
[08:35:15] pfischer: aren't you out today ?
[08:36:25] gehel: just the half day (afternoon)
[08:37:06] Oh, that's why our pre-triage moved to this morning!
[08:37:24] gehel: Yes, sorry for the short notice.
[08:37:26] dcausse: I pinged Marco about T405712, but I am somewhat stuck on how to resolve this. I would assume a potential solution depends on who is working with those scores.
[08:37:26] T405712: image_suggestions_weekly fails with event validation errors - https://phabricator.wikimedia.org/T405712
[08:37:38] I have to be on an errand, back in ~1h. So let's cancel. I'll do it on my own this afternoon
[08:37:57] gehel: Alright. Thank you.
[08:40:05] pfischer: these scores are used in MediaSearch I think, perhaps the issue is that we did not normalize them by dividing them by 1000?
[08:40:37] curious that issue pops up just now? perhaps we only had recommendation tags before?
[08:40:50] s/that/that this /
[08:42:53] I think the problem is that the source dataset is very close to our index representation, where the score is a frequency; with the weighted tag stream we abstracted this, making it a score between 0 and 1
[08:43:39] if we keep their input dataset as-is I think we need to apply the normalization factor (/1000)
[08:44:00] dcausse: …and de-normalize in SUP?
[08:44:18] it should already be denormalized in the SUP
[08:44:41] we should do (int) $score*1000 or something's not right
[08:47:26] dcausse: right, I forgot that the factor was applied tag-agnostically
[08:47:42] ideally their data pipeline should be reworked to output something closer to what's expected by the weighted_tag stream...
[09:51:25] pfischer / dcausse: about T405712, if the events don't follow the schema, should this be owned by the producers? That would probably be reader growth?
[09:51:26] T405712: image_suggestions_weekly fails with event validation errors - https://phabricator.wikimedia.org/T405712
[09:53:19] gehel: I think it's a problem we introduced when switching this batch process to the weighted tag stream
[09:53:51] so it makes sense for us to own it?
[09:54:08] I think so?
[09:57:08] I think so too!
[10:04:41] lunch
[12:17:38] https://wikimedia.slack.com/archives/C0975D4NLQY/p1759530543473839 : latest semantic search concepts seem to address the "let's not confuse advanced users" concern!
[12:25:21] sent a message to discovery-alerts-owner to request adding Gabriele but unsure who's receiving the emails
[12:31:02] I seem to be an owner
[12:31:53] gmodena: done
[12:32:26] thanks!
[13:18:44] o/
[13:30:04] I'm back from Texas Linux Fest! Lots of cool stuff to share, but here's a nice talk about RAG to getcha started https://www.youtube.com/watch?v=Gzmzb0fhsKM
[13:31:55] ^^ the presenter (Major Hayden) has a great blog as well https://major.io/
[15:03:17] dcausse: triage?
[15:03:20] oops
[16:00:49] workout/errand, back in ~1h
[16:54:59] back
[17:54:18] wondering how often the mrl models may return similar results for different query-result pairs (for the same query)
[17:54:31] s/mrl/mlr/
[17:56:51] dinner
[17:57:55] lunch, back in ~1h
[18:51:15] actually I was trying to understand why some results might randomly change ordering; turns out that case is kind of important sometimes... search "france" vs "francE" and you get different results, could be due to our camel case handling
[19:06:50] back
[20:49:08] ryankemper you have anything for pairing?
I thought we could look at the wdqs reimages again and what can be done to make the process cleaner, see also https://wikimedia.slack.com/archives/C055QGPTC69/p1759254616035349
[20:50:01] oh, and we also need to look at all the allowlist requests, ref https://wikimedia.slack.com/archives/C055QGPTC69/p1759478866155629
[20:50:14] inflatador: that sounds like a good idea
[21:03:50] inflatador: feeding dog, will be there in a few
[21:08:33] ACK, np
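
For context on the normalization issue discussed around T405712 above: a minimal sketch, assuming (per the chat) that the image_suggestions source dataset carries integer scores close to the index representation (roughly 0-1000), that the weighted_tags stream expects a score between 0 and 1, and that the consumer (SUP) turns it back into an integer weight by multiplying by 1000. The helper names, the exact range check, and the use of Python are assumptions for illustration, not the actual pipeline or SUP code.

```python
# Illustrative sketch only, not the real image_suggestions or SUP implementation.
# It mirrors the /1000 producer-side normalization and the `(int) $score*1000`
# consumer-side denormalization mentioned in the chat.

def normalize_for_weighted_tags(raw_score: int, factor: int = 1000) -> float:
    """Producer side (hypothetical helper): map a raw integer score onto the
    (0, 1] range expected by the weighted_tags stream schema."""
    if not 0 < raw_score <= factor:
        raise ValueError(f"raw score {raw_score} outside expected range (0, {factor}]")
    return raw_score / factor


def denormalize_for_index(score: float, factor: int = 1000) -> int:
    """Consumer side (hypothetical helper): recover the integer weight stored
    in the search index from the normalized stream score."""
    return int(score * factor)


if __name__ == "__main__":
    raw = 87                                            # frequency-like value from the source dataset
    event_score = normalize_for_weighted_tags(raw)      # 0.087 -> valid for the weighted_tags schema
    index_weight = denormalize_for_index(event_score)   # back to 87 for the index representation
    print(raw, event_score, index_weight)
```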