[08:12:53] inflatador oof... saw the graph.
[08:12:56] o/
[09:34:23] ryankemper: BulkByScrollResponse seems to indeed be about indexing documents. Once servers are overloaded, it is reasonable to assume that write operations will also take more time. Or we might have a job that does a bunch of updates at a specific time? Or even an external bot updating a bunch of pages at 8pm, generating a bunch of page reindexes?
[11:47:04] Trey314159 thanks for the write-up. Let's touch base Monday or Wednesday? jawiki puzzles me a bit, but right now we have no LTR enabled on it. That's one case we'll need to handle with care.
[11:48:04] I'm on the same page re not deploying the 2025-02 models. I wonder if it would be worth planning these A/B tests at least once a quarter, in the hope of catching some seasonality effect (if it makes sense)?
[13:45:12] gmodena: not clear to me: is there more work needed on T386068?
[13:45:12] T386068: Implement articlecountry a new CirrusSearch keyword - https://phabricator.wikimedia.org/T386068
[13:53:22] gehel based on the latest comments in phab we should be ready (modulo testing). I had a small f/up patch to limit the number of terms in a search, but it might not be needed after all.
[13:58:08] gehel i'll add a comment in that thread
[13:58:20] inflatador o/
[13:58:47] gmodena: thx!
[15:11:21] going in to the office, back in ~30
[16:12:56] gmodena: to be clear, I'd be okay with deploying the 2025-02 models. The Japanese model is weird, but the new one is clearly better than the old one. I was just using that as an example where one is better than another and thinking about how to automate that decision.
[16:13:03] Running quarterly A/B tests would be interesting. I'd also like to compare a much older model with a newer one to try to gauge whether changes from model to model are random fluctuations, or if there is real drift in a consistent direction—i.e., older models don't perform as well because something has actually changed over time, either in our data or our users' search behavior.
[16:21:29] Trey314159 ack
[16:22:34] re older models. It would be interesting, and I think feasible. AFAIK we do have the full history of training models.
[16:24:22] what would be nice to have IMHO is a way to persist the results of the A/B tests. Maybe in some Iceberg table? I've been toying a bit with the idea today (to support some doc in scope for T385972)
[16:24:22] T385972: Deploy and test new MLR models - https://phabricator.wikimedia.org/T385972
[16:27:25] or maybe the ML folks already have some form of tracking we could piggyback on. There's no shortage of tooling for these use cases :)
[16:30:05] Trey314159 I do like your suggestion regarding improving the notebook's readability. I'll give it a try on Monday to see how much work it takes to implement the changes.
[16:31:04] but for today, I'm calling it a day :). Happy Friday - enjoy the weekend!
[16:35:13] Have a good weekend!
[16:36:40] .o/
[16:50:51] time to start the weekend!
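On the 16:24:22 idea of persisting A/B test results in an Iceberg table: a minimal sketch of what that could look like from a notebook, assuming a SparkSession with an Iceberg catalog is available. The catalog, table name, schema, model names, and values below are hypothetical placeholders, not an existing dataset.

```python
# Hypothetical sketch only: persist MLR A/B test summary metrics to an Iceberg table.
# Assumes a SparkSession configured with an Iceberg catalog named "analytics";
# the table name, schema, and row values are illustrative, not an existing dataset.
import datetime

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.search.mlr_ab_test_results (
        test_id   STRING,
        wiki      STRING,
        model_old STRING,
        model_new STRING,
        metric    STRING,
        value_old DOUBLE,
        value_new DOUBLE,
        run_date  DATE
    ) USING iceberg
    PARTITIONED BY (run_date)
""")

# Placeholder row: one record per test / wiki / metric pair.
results = spark.createDataFrame([
    Row(test_id="mlr-2025-02", wiki="jawiki", model_old="2024-11",
        model_new="2025-02", metric="ndcg@10", value_old=0.0,
        value_new=0.0, run_date=datetime.date(2025, 2, 28)),
])

# DataFrameWriterV2 append into the Iceberg table.
results.writeTo("analytics.search.mlr_ab_test_results").append()
```

A per-test, per-wiki, per-metric row layout like this would also make it straightforward to later compare a much older model against a newer one, as discussed at 16:13:03.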
[16:51:15] And I have friends coming over for a raclette - https://en.wikipedia.org/wiki/Raclette
[19:13:34] Ahh, the Power of Cheese!™ https://www.youtube.com/watch?v=-f_d7JBIwMA
[19:48:06] latency alerts again ;(
[19:59:27] Good news though, this one looks pretty clear https://logstash.wikimedia.org/goto/8f2da3b11aae385ac635965dd69eb76a
[20:00:13] `org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@28be706e on QueueResizingEsThreadPoolExecutor[name = elastic1066-production-search-eqiad/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 310.3ms, adjustment amount = 50,
[20:00:13] org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@4e773165[Running, pool size = 61, active threads = 61, queued tasks = 1000, completed tasks = 62763343]]`
[20:46:36] So this does seem capacity-related, but also constrained by the thread pool write queue size, which we set with https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/role/common/elasticsearch/cirrus.yaml#73 . Would it be safe to experiment with raising this value? It seems we haven't changed it in a while
[21:20:07] need to look at re-re-purposing those Relforge hosts too
[21:25:38] Hmm yeah I could go either way on the write queue
[21:26:15] Feels like given the writes are taking multiple seconds that bumping the threadpool probably won’t help a ton, but it’s hard to say
[21:26:36] I really want to figure out where these requests are coming from in the first place… quite difficult though
[21:33:03] Yeah, agreed on all counts
[22:10:08] still a lot of `BulkByScrollResponse` messages in logstash... the Bulk API uses the write thread pool (ref https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-threadpool.html ). Docs imply the default write thread pool settings are a "size of # of allocated processors, queue_size of 10000"
[22:10:35] whereas ours is set to 6, queue size of 1000
[22:10:58] There's probably a very good reason for that, but we still might want to look into it
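A sketch of how the thread pool pressure and the by-scroll traffic discussed above could be inspected, using standard Elasticsearch 7.x APIs (`_cat/thread_pool` and `_tasks`); the endpoint URL below is a placeholder, not a production host.

```python
# Sketch: check write/search thread pool pressure and list running
# update_by_query / delete_by_query / reindex tasks (the operations that
# produce BulkByScrollResponse). The endpoint is a placeholder.
import requests

ES = "http://localhost:9200"  # placeholder; point at any cluster node

# Per-node thread pool stats: active threads, queue depth, rejection counts.
pools = requests.get(
    f"{ES}/_cat/thread_pool/write,search",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(pools.text)

# Currently running by-scroll and reindex tasks, with their descriptions,
# to help trace where the bulk update traffic is coming from.
tasks = requests.get(
    f"{ES}/_tasks",
    params={"detailed": "true", "actions": "*byquery*,*reindex*"},
)
for node in tasks.json().get("nodes", {}).values():
    for task_id, task in node.get("tasks", {}).items():
        print(task_id, task.get("action"), task.get("description", ""))
```

If experimenting with a larger queue, keep in mind that thread pool settings such as `thread_pool.write.queue_size` are static node settings in 7.x (here coming from the cirrus.yaml hieradata linked at 20:46:36), so a change would have to be rolled out with node restarts.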