[13:00:16] \o
[13:58:43] ryankemper I started a new data-reload under my user on cumin2002, ref T386098
[13:58:44] T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels - https://phabricator.wikimedia.org/T386098
[14:05:21] Hi y'all! Lots of SearchSatisfaction event validation errors since July 17. https://grafana.wikimedia.org/goto/WTamtQUNR?orgId=1
[14:05:56] Oh, nm, already known: https://phabricator.wikimedia.org/T399965
[14:05:57] carry on
[14:32:10] * ebernhardson wonders, after several minutes of python just spinning the cpu, if 5000 rounds of bootstrapping is the "right" amount
[16:00:32] looks like the wdqs reload failed again...not sure what happened there ;(
[16:06:37] lol, 99th percentile autocomplete length: 39 chars. 100th percentile (max): 7201
[16:06:53] oddly, 10 chars is 55th percentile...seems really high
[16:35:02] ebernhardson: I'd say that you should look at the whole distribution of query lengths (maybe ignoring anything over 50 so any graphs you make aren't ridiculous) to make sure the A & B data look similar, but restrict the analysis comparison to ≥6 ... or analyze ≤5 and ≥6 separately to see what random variation (in ≤5) looks like. OTOH, if that's too much trouble, just looking at ≥6 is reasonable.
[16:37:48] curiously, right now things look worse, but only barely :P mean characters from 9.27 to 9.33, mean click position from 1.695 to 1.71
[16:38:59] but i have to poke at this data some more, it looks odd. 2.7M ac submits, but only 1.65M success (direct to page). those don't match what we saw in the sankey...
[16:38:59] Those differences can't be statistically significant to any degree, can they?
[16:39:09] Trey314159: they are, the dataset is pretty big
[16:39:23] wow
[16:39:25] ci on mean characters typed is 0.02 characters
[16:40:07] (that's after clipping max to the 95th percentile)
[16:40:20] but clipping didn't make much of a difference
[16:44:19] Well, too much recall can make precision worse, right? OTOH, can a human tell the difference between 9.27 and 9.33 characters typed? 9.25 is "usually 9, but 10 a quarter of the time" and 9.33 is "usually 9, but 10 a third of the time"... given that "usually 9" really means "from 6 to 12 most of the time", that's no perceptible difference. If click position is also statistically significantly worse, though, then maybe there's no point.
[16:44:36] It's kinda cool when a test subverts your expectations.
[16:44:39] if these metrics are right, no human should be able to notice a difference :)
[16:45:33] Presumably the extra fuzzy is more expensive, too, so not using it seems like a win. What wikis are you testing over? Might it do better in other languages/writing systems?
[16:45:36] i still have to do the per-position values abandonment rate, but i'm suspecting abandonment is off and i have to dig through this data
[16:45:43] this is all wikis
[16:46:17] err, the per-position clickthrough rates, and separately the abandonment rate. And maybe the "fall-through" rate or whatever we call ending up at Special:Search
[16:46:40] did you break it down by wiki or language? Comparing mean characters across languages would be interesting!
[16:47:15] not yet, but we have the data. I could probably generate a table with the wikis/languages, but calculating all those CIs might be a bit much
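Not from the log itself: a minimal sketch of what the bootstrap CI on mean characters typed discussed above might look like, assuming the A/B events are already loaded into a pandas DataFrame. The column names `bucket` and `chars_typed` and the input path are made up; the 5000 rounds and 95th-percentile clipping follow the numbers mentioned in the conversation.

```python
# Sketch only (not from the log): percentile-bootstrap CI on the mean
# characters typed per bucket, with the long tail clipped first.
import numpy as np
import pandas as pd

def bootstrap_mean_ci(values, n_rounds=5000, alpha=0.05, clip_pct=95, seed=0):
    """Percentile-bootstrap CI for the mean, clipping values above clip_pct."""
    rng = np.random.default_rng(seed)
    clipped = np.clip(values, None, np.percentile(values, clip_pct))
    means = np.array([
        rng.choice(clipped, size=len(clipped), replace=True).mean()
        for _ in range(n_rounds)
    ])
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return clipped.mean(), lo, hi

# Hypothetical usage; the file and column names are placeholders.
# df = pd.read_parquet("ac_events.parquet")
# for bucket, grp in df.groupby("bucket"):
#     mean, lo, hi = bootstrap_mean_ci(grp["chars_typed"].to_numpy())
#     print(f"{bucket}: mean {mean:.2f} chars, 95% CI [{lo:.2f}, {hi:.2f}]")
```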
[16:47:59] might be interesting for each metric to have a "winners and losers" bit that shows the wikis/languages with the most increase/most decrease
[16:48:11] will have to see how long that would calculate though...one ci metric already takes a minute or two :P
[16:49:26] separately, the Search Metrics, Web dashboard shows fulltext abandonment dropped from 48% to 27% since friday. I don't know what's going on there :P
[16:49:54] maybe some sort of metrics thing....counts are down too
[16:49:59] You could skip the CI and sort by diff and maybe calculate CI for anything that we could even pretend might be human perceptible. If they are all less than 0.05 keystrokes, CI doesn't matter.
[16:50:07] Maybe the Bots of Abandonment got tired.
[16:50:20] *Bots of Useless Abandonment
[16:50:20] yea my first guess is some bots got turned off
[16:50:50] If a botfarm really makes that big of an impact, that's horrifying.
[16:51:14] lots of metrics down. ZRR sessions from ~25k/day to ~8k/day
[16:51:39] The AI scrapers finally got everything they were after?
[16:51:54] search sessions down from 330k to 265k, which sounds quite low
[17:53:28] starting CODFW restart now
[18:03:34] * ebernhardson should just unmap the `insert` key...i don't know if i've ever used it intentionally
[18:04:09] but it changes my insert mode in jupyter when i try and push `end` but hit `insert` instead
[18:31:59] damn, looks like we have some nodes that need to be added to conftool
[18:40:41] might be the best userscript i've found in a while: https://github.com/vmatt/airflow-colorblind-status/?tab=readme-ov-file
[18:40:55] (yes, the symbols are ugly...but i don't have to guess if green and dark gray are the same color :P)
[18:42:33] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1171258
[18:43:01] ryankemper ebernhardson ^^ patch for adding some cirrussearch nodes back to pybal if y'all have time to look
[18:47:47] inflatador: seems plausible, not sure how to check if that's really all of them. I guess compare against what's joined the cluster?
[18:50:20] ebernhardson yeah, I just did `for n in 2073 2078 2094 2095 2096 2110; do curl -s https://search.svc.codfw.wmnet:9243/_cat/nodes | grep ${n}; done`
[18:50:36] terrible way to do it, but luckily it's only 6 nodes ;)
[18:50:46] sec, i can probably compare them easier than doing by hand...but looking by hand i already found mismatches :P
[18:55:39] inflatador: https://phabricator.wikimedia.org/P79538
[18:56:21] so that claims 2064 is in cluster but missing in puppet, 2089 and 2091 are listed in puppet but not joined to the big cluster
[18:57:32] interesting, let me check that out + update patch
[18:58:51] fwiw can't ping or ssh 2089 or 2091 either
[18:59:37] I know 2091 is borked, but I didn't know about 2089...checking
[19:00:01] 2089 gave a 100% packet loss alert on the 17th
[19:00:06] thursday
[19:05:40] damn, I need to get these alerts into IRC. Definitely missed that one
[19:06:00] even the DRAC is down for 2089 ;(
[19:08:13] ouch, full hardware fail
[19:13:02] 2091 has DRAC but no disks. It's been broken for a while. I'll try and reimage, but in the meantime I'll get a ticket started for DC Ops
[19:23:59] OK, the patch should be good now (I hope). Quick break, back in ~15
[19:50:07] Back...the patch is merged and I've pooled all the hosts that were missing. Resuming rolling operation...
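An editorial sketch, not from the log: the cluster-vs-puppet check done by hand at 18:50 could also be scripted. This assumes the `_cat/nodes` endpoint above is reachable from wherever it runs, and that matching a host number against the node name is good enough (as it was for the grep); the host numbers are the ones from the conversation.

```python
# Rough sketch (not from the log): ask Elasticsearch which nodes have joined
# and check each expected host number against the returned node names.
import requests

CAT_NODES = "https://search.svc.codfw.wmnet:9243/_cat/nodes?h=name&format=json"
expected = ["2073", "2078", "2094", "2095", "2096", "2110"]

joined_names = [node["name"] for node in requests.get(CAT_NODES, timeout=10).json()]

for host in expected:
    status = "joined" if any(host in name for name in joined_names) else "NOT in cluster"
    print(f"elastic{host}: {status}")

# A full two-way diff (like the paste at 18:55) would compare joined_names
# against the complete puppet/conftool host list, not just these six.
```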
[20:01:07] hmm, now something's screwy with the rolling operation. It doesn't want to restart any hosts even though some of the hosts' services haven't been restarted in weeks. I guess I'm gonna restart the whole cluster again (shrug)
[20:42:12] patch for moving the broken CODFW hosts back to insetup: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1171270
[20:52:48] ryankemper are you around today? I'm gonna try and get the rolling-operations done today so we can unblock the reindexes for T399162
[20:52:49] T399162: Regression: Cirrus exact string regexp search for insource:/"u.a."/ has stopped working - https://phabricator.wikimedia.org/T399162
[20:55:05] +1'd
[21:02:05] ebernhardson thx
[21:02:58] inflatador: hitting some construction otw back, about 12 mins late to pairing
[21:03:13] ACK
[21:04:05] * inflatador is still wondering why `minikube image load` takes 3-5m to complete every time
[21:29:50] damn, I missed some more of those cirrus hosts in the LB patch ;(. Time to make a new one
[21:31:58] ah, no patch needed. just need to change weight in conftool
[23:51:44] ryankemper anything going on w/eqiad? I just saw some alerts come thru
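One last editorial sketch, not from the log, related to the 20:01 confusion about the rolling operation skipping hosts whose services hadn't been restarted in weeks: `_cat/nodes` can report per-node uptime, so scanning for long-running nodes is one way to see what actually got skipped. The endpoint, threshold, and parsing below are assumptions, not taken from the restart cookbook.

```python
# Hypothetical helper (not from the log): list Elasticsearch nodes whose
# uptime exceeds a threshold, as a sanity check on a rolling restart.
import requests

CAT_NODES = "https://search.svc.codfw.wmnet:9243/_cat/nodes?h=name,uptime&format=json"
THRESHOLD_DAYS = 14  # assumed cutoff for "hasn't been restarted recently"

def uptime_to_days(uptime: str) -> float:
    """Rough parse of _cat-style uptime strings like '90.3d', '16.5h', '42m'."""
    if uptime.endswith("ms"):
        return float(uptime[:-2]) / 86_400_000
    units = {"d": 1.0, "h": 1 / 24, "m": 1 / 1440, "s": 1 / 86_400}
    return float(uptime[:-1]) * units[uptime[-1]]

nodes = requests.get(CAT_NODES, timeout=10).json()
stale = [(n["name"], n["uptime"]) for n in nodes
         if uptime_to_days(n["uptime"]) > THRESHOLD_DAYS]
for name, up in sorted(stale):
    print(f"{name}\t{up}")
```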