[08:08:55] pfischer: do you have a few minutes for a chat? Are you doing the weekly report this week?
[08:13:28] gehel: sure
[08:14:36] gehel: yes, I will do the report
[08:17:24] pfischer: https://meet.google.com/nzy-gqve-jnb
[08:51:59] the relaxed profile is enabled, you can trigger it by appending &cirrusFTQBProfile=perfield_builder_relaxed to your search results URL. Testing a few queries, I can see that it might need a lot more tuning...
[08:52:47] pfischer: ^ seems like a thing to highlight in the weekly status report!
[08:53:48] dcausse: Thanks! Definitely worth reporting. What are the effects you would want to tune?
[08:54:55] It might be worth adding a bit of context on T343148, and sharing it in #semantic-search (slack)
[08:54:56] T343148: Relax 'AND' operator in search queries - https://phabricator.wikimedia.org/T343148
[08:55:17] pfischer: the questions to answer are: is the current MLR model adapted? And the various weights we manually set are probably heavily biased towards the previous strict approach
[08:56:42] strict approach: https://en.wikipedia.org/w/index.php?search=what+event+triggered+WW2%3F&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1
[08:56:47] relaxed: https://en.wikipedia.org/w/index.php?search=what+event+triggered+WW2%3F&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1&cirrusFTQBProfile=perfield_builder_relaxed
[09:07:51] too early to share imo
[09:33:28] quick heads up that the wikikube@eqiad k8s cluster upgrade is going to happen soon (next week, before Oct. 2) T405703
[09:33:29] T405703: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703
[09:35:45] do we have anything to demo for the staff meeting next week?
[09:37:11] ebernhardson: maybe a quick presentation of T404822?
[09:37:12] T404822: Analysis: how many search queries are using natural language vs keywords - https://phabricator.wikimedia.org/T404822
[10:08:25] dcausse: I don't have to mention it, if you'd prefer. Maybe a mention that we're on it will do for now.
[10:09:57] pfischer: it's fine to mention, just wanted to warn that it might not be ready for broader evaluation; it seems to me that we could do better
[10:55:34] lunch
[13:17:54] o/
[13:44:18] \o
[13:44:47] gehel: i guess i could? I dunno how interesting it is. It's basically two numbers :P
[13:45:36] I think the way you arrived at those 2 numbers is interesting. The mix of automatic classification and manual reclassification.
[13:45:38] wow, those relaxed results are terrible :P
[13:47:02] gehel: technically, the automatic classification is just a sampling strategy. I have a long pending comment that explains that, in response to martin's questions, but i only made it through the first two yesterday and still have to address the last one about agent_type
[13:47:28] but i think i explained it poorly initially, because martin's comment implies the same misunderstanding
[13:47:54] :)
[13:48:41] ebernhardson: your choice (presenting or not). I'll let you ping Guilherme, who manages the list (see ping on slack)
[13:52:55] .o/
[14:18:10] o/
[14:19:25] yes... I believe we could spend a couple of hours manually tuning some weights; i.e. using the old field you get something more reasonable: https://en.wikipedia.org/w/index.php?search=what+OR+event+OR+triggered+OR+WW2&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1
[14:19:35] s/old/all/
[14:20:10] I bet these weights were tuned aggressively for concept matches but strongly relied on the AND filter
[14:20:28] s/but/and/
[14:20:33] that also skips the mlr rescore, which i suspect does poorly with the new stats
[14:20:45] yes, I was wondering about that
[14:22:54] should we make a ticket? It will probably take a couple of days' work to figure out what we need to train a model under the different constraints (also because mjolnir's dbn run failed this week... need to check what failed)
[14:23:40] yes, I can spend time manually tuning the retrieval query
[14:23:50] for mlr I'm not sure what could be done
[14:24:19] i'm not 100% sure, but i suspect we are seeing 0's in the min_* features, and the model might not be used to that? Or it might have previously meant something else
[14:24:40] like, 0's might have been a slightly different signal under strict AND
[14:25:55] that'd mean perhaps pulling some negative samples from this relaxed query in addition to the results present in the backend logs?
[14:27:10] not sure if what I'm saying makes sense; I can't remember how we pull candidates (e.g. whether we take all the results presented)
[14:27:10] yea, i suspect it means we need more samples. although i'm realizing i haven't done significant work in mjolnir in so long i'm not quite sure :P
[14:27:26] i was also just poking in the code to understand if we run the query, or use the hits from the search logs
[14:28:59] i think we might only be using the results from the logs, so indeed we might need to augment somehow
[14:30:16] i dunno, i'll have to think about it... i kinda expected mjolnir was going to be difficult here, but hadn't come up with a plan
[14:33:21] negative sampling could be interesting to add, but it's hard to tell if this will play well, esp. if the so-called negative sample happens to be a good result that users would have clicked on if presented to them
[14:33:51] yea, sadly we don't get the easy negative sampling of other things... outside search, negative sampling often means "take positive samples from an unrelated thing"
[14:33:59] but here... they would get so many 0 features as to mean nothing
[14:34:43] but the current retrieval query with OR is certainly very bad on multi-word queries, so perhaps good enough
[14:34:53] so the negatives have to at least match the query, and we can't randomly select them (random is nice because it's highly probable that a random result from the dataset is not a "good result")
[14:34:59] yes
[14:35:49] yea, i'm actually surprised at how bad that is, but then again, with it looking much better with MLR disabled, it might have promise
[14:39:19] school run, back in 20
[14:40:44] saw somewhere that a method to harvest so-called "hard negatives" is to simply use a naive bm25 retriever instead of just random samples; if the query has few clicks, perhaps chances are low that these "hard negatives" are actually good?
[14:57:23] heading out, have a nice week-end!
[15:06:59] hmm, that does seem plausible
[15:31:37] workout, back in ~40
[15:33:43] i wonder what would happen if we re-ran queries under the loosened matching, then just declared everything after position x (5?) a negative sample
[15:35:59] Time to end the day. See you all next week!
[16:27:38] sorry, been back, but I need to step out for lunch! Back in ~90
[16:59:04] was checking more_like stats; it looks like we might be seeing a ~20% increase in more_like qps vs last week. Nothing to do, just noting it
[16:59:23] (it's almost certainly the new search suggestions)
[17:26:53] heh, testing the new date filters. surprisingly `lasteditdate:today=10y` has 11 results on enwiki
[17:27:20] they are all disambiguation pages
[18:06:20] Cool! I decided to quickly check how many pages were last edited in each year... https://usercontent.irccloud-cdn.com/file/sWOXyhqr/pages_vs_year.png
[18:09:29] 2013 must've been a banger :)
[18:12:14] a curious thing though: if i keep refreshing `lasteditdate:2025` i get varied results, but they should be approximately popularity sorted by incoming_links and popularity score
[18:14:10] pondering... but i don't have a working theory on how that's possible :S hitting different shards should give the exact same scores, since it's only math on those two float fields
[19:10:39] back
[20:56:32] ryankemper: you have anything for pairing? I'm just working on my Asana updates
[20:57:07] inflatador: nah, just working on the incident timeline and spicerack stuff, I'm fine skipping
[20:57:30] ryankemper: sounds good, see ya Monday!
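Editor's note: the strict-vs-relaxed comparison discussed at 08:51–08:56 boils down to adding one query parameter. A minimal sketch of building side-by-side search URLs for both profiles; the parameter name `cirrusFTQBProfile=perfield_builder_relaxed` and the other query parameters come from the log above, while the helper itself (`search_urls`) is hypothetical:

```python
from urllib.parse import urlencode

BASE = "https://en.wikipedia.org/w/index.php"

def search_urls(query: str) -> dict:
    """Build strict (default) and relaxed fulltext search URLs for a query."""
    common = {
        "search": query,
        "title": "Special:Search",
        "profile": "advanced",
        "fulltext": "1",
        "ns0": "1",
    }
    # The relaxed profile is selected by appending this one extra parameter.
    relaxed = dict(common, cirrusFTQBProfile="perfield_builder_relaxed")
    return {
        "strict": f"{BASE}?{urlencode(common)}",
        "relaxed": f"{BASE}?{urlencode(relaxed)}",
    }

urls = search_urls("what event triggered WW2?")
```

Opening both URLs side by side reproduces the comparison linked in the log.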
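Editor's note: the idea floated at 15:33 (re-run queries under the loosened matching and declare everything after position x a negative sample) could be sketched roughly as below. All names here (`Labeled`, `label_by_position`) are hypothetical and not part of mjolnir; the caveat raised at 14:33 still applies, since a "negative" below the cutoff may be a result users would actually have clicked:

```python
from dataclasses import dataclass

@dataclass
class Labeled:
    query: str
    page_id: int
    label: int  # 1 = candidate positive, 0 = position-based negative

def label_by_position(query: str, hit_page_ids: list[int],
                      cutoff: int = 5) -> list[Labeled]:
    """Label relaxed-query hits by rank: top `cutoff` are kept as
    candidates, everything below the cutoff becomes a negative sample."""
    return [
        Labeled(query, pid, 1 if rank < cutoff else 0)
        for rank, pid in enumerate(hit_page_ids)
    ]
```

The hard-negative variant mentioned at 14:40 would differ only in where `hit_page_ids` comes from: a naive bm25 retrieval instead of the relaxed production query.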