[08:08:55] pfischer: do you have a few minutes for a chat? Are you doing the weekly report this week?
[08:13:28] gehel: sure
[08:14:36] gehel: yes, I will do the report
[08:17:24] pfischer: https://meet.google.com/nzy-gqve-jnb
[08:51:59] the relaxed profile is enabled, you can trigger it by appending &cirrusFTQBProfile=perfield_builder_relaxed to your search results URL. Testing a few queries, I can see that it might need a lot more tuning...
[08:52:47] pfischer: ^ seems like a thing to highlight in the weekly status report!
[08:53:48] dcausse: Thanks! Definitely worth reporting. What are the effects you would want to tune?
[08:54:55] It might be worth adding a bit of context on T343148, and sharing it in #semantic-search (slack)
[08:54:56] T343148: Relax 'AND' operator in search queries - https://phabricator.wikimedia.org/T343148
[08:55:17] pfischer: the questions to answer are: is the current MLR model adapted? And the various weights we manually set are probably heavily biased towards the previous strict approach
[08:56:42] strict approach: https://en.wikipedia.org/w/index.php?search=what+event+triggered+WW2%3F&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1
[08:56:47] relaxed: https://en.wikipedia.org/w/index.php?search=what+event+triggered+WW2%3F&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1&cirrusFTQBProfile=perfield_builder_relaxed
[09:07:51] too early to share imo
[09:33:28] quick heads up that the wikikube@eqiad k8s cluster upgrade is going to happen soon (next week, before Oct. 2) T405703
[09:33:29] T405703: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703
[09:35:45] do we have anything to demo for the staff meeting next week?
[09:37:11] ebernhardson: maybe a quick presentation of T404822?
[09:37:12] T404822: Analysis: how many search queries are using natural language vs keywords - https://phabricator.wikimedia.org/T404822
[10:08:25] dcausse: I don't have to mention it, if you'd prefer. Maybe a mention that we're on it will do for now.
[10:09:57] pfischer: it's fine to mention, just wanted to warn that it might not be ready for broader evaluation; it seems to me that we could do better
[10:55:34] lunch
[13:17:54] o/
[13:44:18] \o
[13:44:47] gehel: i guess i could? I dunno how interesting it is. It's basically two numbers :P
[13:45:36] I think the way you arrived at those 2 numbers is interesting. The mix of automatic classification and manual reclassification.
[13:45:38] wow, those relaxed results are terrible :P
[13:47:02] gehel: technically, the automatic classification is just a sampling strategy. I have a long pending comment that explains that, in response to martin's questions, but i only made it through the first two yesterday and still have to address the last one about agent_type
[13:47:28] but i think i explained it poorly initially, because martin's comment implies the same misunderstanding
[13:47:54] :)
[13:48:41] ebernhardson: your choice (presenting or not). I'll let you ping Guilherme, who manages the list (see ping on slack)
[13:52:55] .o/
[14:18:10] o/
[14:19:25] yes... I believe we could spend a couple of hours manually tuning some weights; i.e. using the old field you get something more reasonable: https://en.wikipedia.org/w/index.php?search=what+OR+event+OR+triggered+OR+WW2&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1
[14:19:35] s/old/all/
[14:20:10] I bet these weights were tuned aggressively for concept matches but strongly relied on the AND filter
[14:20:28] s/but/and/
[14:20:33] that also skips the mlr rescore, which i suspect does poorly with the new stats
[14:20:45] yes, I was wondering about that
[14:22:54] should we make a ticket? It will probably take a couple of days' work to figure out what we need to train a model under the different constraints (also because mjolnir's dbn run failed this week... need to check what failed)
[14:23:40] yes, I can spend time manually tuning the retrieval query
[14:23:50] for mlr I'm not sure what could be done
[14:24:19] i'm not 100% sure, but i suspect we are seeing 0's in the min_* features, and the model might not be used to that? Or it might have previously meant something else
[14:24:40] like, 0's might have been a slightly different signal under strict AND
[14:25:55] that'd mean perhaps pulling some negative samples from this relaxed query in addition to the results present in the backend logs?
[14:27:10] not sure if what I'm saying makes sense; I can't remember how we pull candidates (e.g. whether we take all the results presented)
[14:27:10] yea, i suspect it means we need more samples. although i'm realizing i haven't done significant work in mjolnir in so long i'm not quite sure :P
[14:27:26] i was also just poking in the code to understand if we run the query, or use the hits from the search logs
[14:28:59] i think we might only be using the results from the logs, so indeed we might need to augment somehow
[14:30:16] i dunno, i'll have to think about it... i kinda expected mjolnir was going to be difficult here, but hadn't come up with a plan
[14:33:21] negative sampling could be interesting to add, but it's hard to tell if this will play well, esp. if the so-called negative sample happens to be a good result that users would have clicked on if presented to them
[14:33:51] yea, sadly we don't get the easy negative sampling of other things... outside search, negative sampling often means "take positive samples from an unrelated thing"
[14:33:59] but here... they would get so many 0 features as to mean nothing
[14:34:43] but the current retrieval query with OR is certainly very bad on multi-word queries, so perhaps good enough
[14:34:53] so the negatives have to at least match the query, and we can't randomly select them (random is nice because it's highly probable that a random result from the dataset is not a "good result")
[14:34:59] yes
[14:35:49] yea, i'm actually surprised at how bad that is, but then again, with it looking much better with MLR disabled, it might have promise
[14:39:19] school run, back in 20
[14:40:44] saw somewhere that a method to harvest so-called "hard negatives" is to simply use a naive bm25 retriever instead of just random samples; if the query has few clicks, perhaps chances are low that these "hard negatives" are actually good?
[14:57:23] heading out, have a nice week-end!
[15:06:59] hmm, that does seem plausible
[15:31:37] workout, back in ~40
[15:33:43] i wonder what would happen if we re-ran queries under the loosened matching, then just declared everything after position x (5?) a negative sample
[15:35:59] Time to end the day. See you all next week!
[16:27:38] sorry, been back, but I need to step out for lunch! Back in ~90
[16:59:04] was checking more_like stats; it looks like we might be seeing a ~20% increase in more_like qps vs last week. Nothing to do, just noting it
[16:59:23] (it's almost certainly the new search suggestions)
[17:26:53] heh, testing the new date filters. surprisingly `lasteditdate:today=10y` has 11 results on enwiki
[17:27:20] they are all disambiguation pages
[18:06:20] Cool! I decided to quickly check how many pages were last edited in each year... https://usercontent.irccloud-cdn.com/file/sWOXyhqr/pages_vs_year.png
[18:09:29] 2013 must've been a banger :)
[18:12:14] a curious thing though: if i keep refreshing `lasteditdate:2025` i get varied results, but they should be approximately popularity sorted by incoming_links and popularity score
[18:14:10] pondering... but i don't have a working theory on how that's possible :S hitting different shards should give the exact same scores, since it's only math on those two float fields
[19:10:39] back
[20:56:32] ryankemper: you have anything for pairing? I'm just working on my Asana updates
[20:57:07] inflatador: nah, just working on the incident timeline and spicerack stuff, I'm fine skipping
[20:57:30] ryankemper: sounds good, see ya Monday!
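Editor's note: the strict-vs-relaxed comparison discussed at 08:51–08:56 boils down to adding one query parameter. A minimal sketch of building side-by-side search URLs for both profiles; the parameter name `cirrusFTQBProfile=perfield_builder_relaxed` and the other query parameters come from the log above, while the helper itself (`search_urls`) is hypothetical:

```python
from urllib.parse import urlencode

BASE = "https://en.wikipedia.org/w/index.php"

def search_urls(query: str) -> dict:
    """Build strict (default) and relaxed fulltext search URLs for a query."""
    common = {
        "search": query,
        "title": "Special:Search",
        "profile": "advanced",
        "fulltext": "1",
        "ns0": "1",
    }
    # The relaxed profile is selected by appending this one extra parameter.
    relaxed = dict(common, cirrusFTQBProfile="perfield_builder_relaxed")
    return {
        "strict": f"{BASE}?{urlencode(common)}",
        "relaxed": f"{BASE}?{urlencode(relaxed)}",
    }

urls = search_urls("what event triggered WW2?")
```

Opening both URLs side by side reproduces the comparison linked in the log.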
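Editor's note: the idea floated at 15:33 (re-run queries under the loosened matching and declare everything after position x a negative sample) could be sketched roughly as below. All names here (`Labeled`, `label_by_position`) are hypothetical and not part of mjolnir; the caveat raised at 14:33 still applies, since a "negative" below the cutoff may be a result users would actually have clicked:

```python
from dataclasses import dataclass

@dataclass
class Labeled:
    query: str
    page_id: int
    label: int  # 1 = candidate positive, 0 = position-based negative

def label_by_position(query: str, hit_page_ids: list[int],
                      cutoff: int = 5) -> list[Labeled]:
    """Label relaxed-query hits by rank: top `cutoff` are kept as
    candidates, everything below the cutoff becomes a negative sample."""
    return [
        Labeled(query, pid, 1 if rank < cutoff else 0)
        for rank, pid in enumerate(hit_page_ids)
    ]
```

The hard-negative variant mentioned at 14:40 would differ only in where `hit_page_ids` comes from: a naive bm25 retrieval instead of the relaxed production query.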