[09:03:00] pfischer: 1:1 ?
[09:03:44] gehel: sure, thought you were out for the PT MGMT offsite
[09:04:05] I'm here in the mornings. The Offsite is in Philadelphia
[10:05:39] lunch
[13:36:47] o/
[14:16:16] \o
[14:22:52] o/
[14:34:49] o/
[14:44:10] * pfischer has a conflicting meeting, will join The Wednesday Meeting 30’ late
[14:53:48] .o/
[14:53:59] working on OpenSearch k8s w/Ben, will not make the Weds mtg
[15:38:24] https://people.wikimedia.org/~ebernhardson/T390858-prefix_len-all_wikis.html https://people.wikimedia.org/~ebernhardson/T390858-prefix_len-enwiki-frwiki-dewiki.html
[18:40:55] Trey314159: i'm noting on the ticket what else to do, the request was to basically re-run the report in ja and zh? or am i forgetting something else
[18:47:00] ebernhardson: yeah, I'd love to see a few other languages.. ar, ja, hi, sw, sr would make for nice variety to look at different writing systems, non-European languages, and one with heavy use of language converter. That's probably enough.. though if you want to add any from the second tier (100K+) on the Wikipedia portal that'd be cool, too.
[18:47:06] huh. on zhwiki and jawiki default_1v increases automatic rewrite rate from 1.65% -> 2.28%, a relative change of 38%, with article clickthrough increasing from 8.6% to 11.4%, a 32% relative change
[18:47:56] which is still not enough of a change to get a significant change in overall article clickthrough rate :P
[18:49:17] I just glanced at the en, fr, de report. I didn't realize they were all lumped together. I don't think "ar, ja, hi, sw, sr" is a natural group that would make sense together. Maybe ar and sw separately?
[18:49:31] that certainly says i probably need to do the analysis per-wiki for a selection of large wikis
[18:49:47] I would read them all!
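Note on the 18:47:06 figures: the "relative change" quoted there is the difference between the two rates divided by the starting rate. The sketch below recomputes it from the rounded percentages in the log. The function name `relative_change` is made up for illustration, and the report presumably works from unrounded rates, so the last digit can differ from what was quoted.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Rounded rates (in percent) as quoted in the log for zhwiki and jawiki.
rewrite = relative_change(1.65, 2.28)      # automatic rewrite rate
clickthrough = relative_change(8.6, 11.4)  # article clickthrough

print(f"automatic rewrite rate: {rewrite:.1%}")
print(f"article clickthrough:   {clickthrough:.1%}")
```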
[18:50:12] i mean that the one we decided was "not good enough" from the overall report is the clear winner in zh and ja, with huge improvements
[18:51:12] Maybe 1v2 will be better than everything, everywhere (it's never that easy, though)
[18:55:14] aww, unfortunately the volume is so small...402k source queries, works out to saving ~570 queries/wk in baseline, and 1040/wk with default_1v
[18:56:40] In the P&T Mgr offsite, talking about https://wikimediafoundation.org/news/2025/04/30/our-new-ai-strategy-puts-wikipedias-humans-first/. In particular "Giving Wikipedia’s editors time back by improving the discoverability of information on Wikipedia to leave more time for human deliberation, judgment, and consensus building;" sounds like something that Search could help with.
[18:57:24] pfischer: this seems at least somewhat related to our discussion around discoverability of search features. You should talk to Chris or Leila or Miriam.
[20:10:44] such mysterious answers. automatic rewrite rate went from 8.6% to 24.4% on hiwiki with the variant field (but only 2.8k queries/wk)
[20:11:16] wow.. that's a *lot*
[20:11:30] indeed
[20:12:52] Trey314159: it's still generating the last couple, but most can be found here under T390858-prefix_len-* https://people.wikimedia.org/~ebernhardson/
[20:12:53] T390858: Improve CirrusSearch DYM suggestions using the phrase suggester on more content - https://phabricator.wikimedia.org/T390858
[20:15:45] Nice! I will check them out
[20:17:29] they are all there now
[20:35:05] Fun new wrinkle! Searching for a regex of four or more four-byte characters (`insource:/𐌀𐌁𐌂𐌃/`) causes an error on-wiki. My local changes to the experimental highlighter plugin are not the cause (it's happening in prod right now!), so that's why it didn't quite make sense.
[20:40:42] Provided analyzer generated more than one token, if using 3grams make sure to use a 3grams analyzer, for input [\uDF00\uD800\uDF01\uD800\uDF02\uD800\uDF03] first is [\uDF00\uD800\uDF01\uD800\uDF02] but [\uD800\uDF01\uD800\uDF02\uD800\uDF03] was generated.
[20:41:25] but yea, sounds like another problem to work out
[20:42:56] other random silliness in the log: regex search for 'insource:/''''/ -insource:/'''''/ above' timed out and only returned partial results!
[20:47:49] I haven't looked into it yet because I was looking at my new code to make sure everything makes sense. It's odd that it only gets grumpy when there are 4 or more characters. One to three work fine!
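Note on the 20:40:42 error: the escapes in that message are UTF-16 surrogates. Each character in `𐌀𐌁𐌂𐌃` (U+10300 to U+10303) takes four bytes in UTF-8 and two 16-bit code units in UTF-16. The sketch below is only an illustration of that encoding, not a diagnosis of the plugin: it shows that the "input" in the error has the same shape as the UTF-16 form of the query with its first code unit dropped, which would happen if something indexed by code unit rather than by code point. The helper names `utf16_units` and `ngrams` are made up for this example.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of `s` as integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def ngrams(seq, n=3):
    """All contiguous length-n windows of `seq`."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


text = "\U00010300\U00010301\U00010302\U00010303"  # the four characters in the query
units = utf16_units(text)

print(len(text), "code points,", len(units), "UTF-16 code units")
print("trigrams by code point:", len(ngrams(text)))
print("trigrams by code unit: ", len(ngrams(units)))

# Dropping the first code unit splits the first surrogate pair and leaves a
# sequence that starts with a lone low surrogate, as in the error's input.
shifted = units[1:]
print(" ".join(f"\\u{u:04X}" for u in shifted))
```

With one to three such characters there is at most one code-point trigram, which may be related to why only four or more trigger the error, though the log does not confirm that.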