[07:41:05] re lasteditdate:2025 with varied results: I suspect that even in the initial retrieval window of 8192 results/shard we get different candidates depending on the shard
[07:47:57] filed T405867 & T405869 to capture the discussion we had last Friday (cc pfischer)
[07:47:58] T405867: MLR: Mine and use negative samples - https://phabricator.wikimedia.org/T405867
[07:47:59] T405869: Tune the perfield_builder_relaxed query builder profile - https://phabricator.wikimedia.org/T405869
[09:56:55] lunch
[10:07:09] lunch
[13:14:11] o/
[14:26:37] \o
[14:28:11] o/
[15:54:54] Hey team: if you're around, we're having a quick chat in https://meet.google.com/nbv-cerc-auh
[15:55:18] dcausse, ebernhardson, inflatador ^
[16:03:46] gehel: in SRE staff mtg, sorry
[16:04:21] I'll be at Texas Linux Fest this weekend https://2025.texaslinuxfest.org/ . If anyone has suggestions for panels to attend, LMK. I'm definitely going to the one about RAG
[16:36:42] workout, back in ~40
[17:25:10] curiously, only ~1M queries in labeled_query_page for enwiki. I can't remember if that's normal or not. It works out to ~31M labeled (query, page) pairs though, so probably?
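A toy sketch of the 07:41 suspicion, assuming the first pass keeps only a fixed window of top-scoring docs per shard (8192 in the log) and that only those docs can reach any later rescoring. The scores, doc ids and doc-to-shard layouts below are made up, and the helper names are hypothetical; this illustrates the mechanism, not how OpenSearch actually routes or scores documents. Because each shard cuts its own list, a strong doc on a crowded shard can fall outside the window while weaker docs on another shard survive.

```python
from typing import Dict, List, Set


def per_shard_candidates(
    scores: Dict[str, float],
    shard_of: Dict[str, int],
    window: int,
) -> Set[str]:
    """Union of each shard's local top-`window` docs by first-pass score.

    Models a per-shard retrieval window: every shard ranks only its own docs
    and cuts at `window`, so only these docs can reach a later rescore phase.
    """
    by_shard: Dict[int, List[str]] = {}
    for doc_id, shard in shard_of.items():
        by_shard.setdefault(shard, []).append(doc_id)

    candidates: Set[str] = set()
    for docs in by_shard.values():
        docs.sort(key=lambda d: scores[d], reverse=True)  # best first, per shard
        candidates.update(docs[:window])
    return candidates


def global_top(scores: Dict[str, float], n: int) -> Set[str]:
    """Top-n docs as if there were a single shard (for comparison)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


# Toy data: six docs, two shards, a window of 2 per shard (stand-in for 8192).
scores = {"a": 9.0, "b": 8.0, "c": 7.0, "d": 6.0, "e": 5.0, "f": 4.0}
crowded = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
spread = {"a": 0, "c": 0, "e": 0, "b": 1, "d": 1, "f": 1}

print(sorted(per_shard_candidates(scores, crowded, 2)))  # ['a', 'b', 'd', 'e']
print(sorted(per_shard_candidates(scores, spread, 2)))   # ['a', 'b', 'c', 'd']
print(sorted(global_top(scores, 4)))                      # ['a', 'b', 'c', 'd']
```

With the "crowded" layout, c (third-best overall) never enters the candidate set while d and e do; the same scores over a different layout give a different set. Whether this is actually behind the lasteditdate:2025 variance is still the open question from the log.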
[17:29:15] * ebernhardson sadly remembers far too little about how mjolnir works... this should be interesting :P
[17:55:34] sorry, been back
[18:06:33] lunch, back in ~40
[18:13:17] dinner
[18:31:18] * ebernhardson notices random oddities, like how we pass norm_query through es_hits.transform even though it isn't used anywhere; we just happen to need it in the output
[18:32:06] i guess part of the problem is we didn't want to do joins or anything like that with the data that comes from kafka; we wanted to read from kafka and write to hdfs as directly as possible to avoid error cases, iirc
[19:47:17] turns out... just 40 threads doing parallel fulltext requests (from hadoop) is enough to take eqiad from <5% to 50% cpu
[19:47:29] i guess they are sending msearches though, with 15 requests each
[19:47:38] so maybe it's 600 queries in parallel
[20:00:23] That makes me wonder if we need to enable the performance governor everywhere. I guess we need a more scientific way to test it
[20:01:06] I guess we should start on relforge and see what (if any) difference it makes
[20:04:18] it's hard to say... what we did a long, long time ago was record queries with goreplay and play them back at various speeds
[20:04:34] that way we had a repeatable test
[20:09:12] I'd be down to try that again. Probably after OpenSearch on K8s though ;) . See also T396501
[20:09:13] T396501: Decide on/run a benchmark for DPE SRE-owned OpenSearch clusters - https://phabricator.wikimedia.org/T396501
[20:20:20] doh, just realized this collection has taken a full hour because i forgot to de-duplicate the queries :(
[20:20:35] it was supposed to take like 20-30 min, was curious why it kept going and going...
[20:22:22] yup... it was running 21.8M queries, instead of 925k
[20:35:18] figures... run properly it took 10 min
[20:36:29] * ebernhardson should have been suspicious much sooner
[20:59:04] ryankemper you have anything for pairing today?
I'd say let's try and get some of those wdqs hosts into production if not
[21:04:29] inflatador: wdqs sounds good, brt
[22:48:01] ryankemper I don't think either of my reimages is gonna work. wdqs2017 was locked up, checking it out from console now
[22:52:46] yup, console is completely unresponsive on 2017, even after power cycle. Will take a closer look tomorrow ;(
[22:57:17] firmware update cookbook failed too ;( `AttributeError: 'NoneType' object has no attribute 'lower'`
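Re the 20:20 to 20:35 collection run (21.8M queries issued instead of 925k distinct ones): a minimal sketch of the missing de-duplication step. It assumes the input is (query, page_id) rows where each query repeats once per labeled page; the helper names are hypothetical and this is plain Python for illustration. In the actual Spark job the equivalent would presumably be something like `.select("query").distinct()` before building the msearch requests, which is an assumption about where the fix belongs, not a description of mjolnir's code.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (query, page_id): one labeled (query, page) pair


def queries_to_run(rows: Iterable[Row]) -> List[str]:
    """Each distinct query once, in first-seen order.

    dict.fromkeys keeps insertion order (Python 3.7+), so this is a stable dedup.
    """
    return list(dict.fromkeys(query for query, _page_id in rows))


def pages_by_query(rows: Iterable[Row]) -> Dict[str, List[int]]:
    """Group labeled pages under their query so hits can be joined back later."""
    grouped: Dict[str, List[int]] = {}
    for query, page_id in rows:
        grouped.setdefault(query, []).append(page_id)
    return grouped


rows = [("foo", 1), ("foo", 2), ("bar", 3), ("foo", 4), ("bar", 5)]
print(queries_to_run(rows))  # ['foo', 'bar']: 2 searches instead of 5
print(pages_by_query(rows))  # {'foo': [1, 2, 4], 'bar': [3, 5]}
```

Searching once per distinct query and fanning the hits back out to the labeled pages keeps the request count proportional to the number of distinct queries rather than the number of pairs.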