[06:59:02] o/
[07:15:59] o/
[07:39:19] o/
[07:40:14] pfischer: I'll let you send the email invite for next Search Platform office hours...
[07:47:23] Oh, I am sorry. Did we end up with multiple invites for yesterday's office hours?
[08:00:10] I don't think so. I see only mine in https://lists.wikimedia.org/hyperkitty/list/discovery@lists.wikimedia.org/
[08:04:43] gehel: interesting, I definitely sent an invite at 15:00 yesterday. Anyways, better twice than not at all, I guess.
[08:56:54] much better!
[08:57:03] Where did you send your invite?
[09:31:20] errand+lunch
[09:57:01] gehel: Wikimedia developers, "Discussion list for the Wikidata project.", A public mailing list about Wikimedia Search and Discovery projects
[09:57:56] Weird. I don't see your email in the archives. Are you subscribed to those lists?
[11:47:27] gehel: Just checked. Yes, I am.
[11:56:22] gehel: seems like something is wrong on your side, I see Peter's emails: https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/NAODCXYZ5VARZOA375A4CZ6GR3AWEWPM/
[14:14:12] \o
[14:40:27] o/
[14:40:46] .o/
[15:42:10] * ebernhardson wonders if we should try upgrading xgboost from 1.7.6 to 3.x
[15:49:13] yes probably, if not a huge pain :)
[16:02:56] sigh...should have noticed this earlier...i'm going to guess the reason i get ndcg > 1 is that we have some labels with the value 2147483647
[16:03:13] i guess i did something wrong somewhere :P
[16:09:50] that number is never right :)
[16:35:16] it turns out...you probably want DataFrame.unionByName, and not the plain DataFrame.union. I guess i expected it would have blown up, but instead it does implicit type casting in a union
[16:38:57] lunch/driving to Austin, back in ~3h
[16:39:38] Trey314159: would you have objections if I split SecondTrySearch into one class per strategy?
[16:40:28] I think I might want to put the language variant thing in as another kind of SecondTrySearch so they're treated the same way
[16:46:38] heading out
[16:55:00] dcausse: I wasn't sure early on how much the different approaches would have in common, so it was easier to lump everything together... feel free to break it up however you like
[17:39:31] hmm, i suspect we might be able to support alphabetical sorting these days, but first we would have to migrate from our fake-keywords to modern normalized keyword fields.
[17:40:04] (in the past you couldn't attach analyzers to keywords, so we made our own pseudo-keywords. We can now have normalizers on keyword fields, but we never migrated)
[17:41:49] the basic reason is we can't use doc_values on text fields, but we can on keywords, and doc_values allows the sort without having to load fielddata (== constant memory usage)
[18:53:37] I think my runUpdate.sh script has stopped working because of the new strict user agent enforcement. Where in the stack would I go about setting a user agent?
[18:54:02] For the query service, I mean
[18:55:15] Something within UPDATER_OPTS, probably?
[18:57:52] hare: hmm, not sure where we set this but can look
[18:58:29] hare: we have a USER_AGENT variable in runBlazegraph.sh, it looks like it gets passed to the JVM with `-Dhttp.userAgent=${USER_AGENT}`
[18:58:49] oh, updater, not blazegraph..ummm
[18:59:17] which updater? If it's java it's probably quite similar, but we run the flink updater
[19:01:30] hare: guessing you're using runUpdate.sh, I couldn't say definitively but i would try adding the -Dhttp.userAgent bit to UPDATER_OPTS. If it uses that setting, UPDATER_OPTS looks like a reasonable way to inject it
[19:50:22] back
[19:53:24] I'll try that, thanks!
[20:26:38] shipped the dym ab test for control vs prefix_len=1 vs prefix_len=1 + opening_text
[21:08:06] ryankemper I'm in pairing if you're around
[22:07:57] later
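On the ndcg > 1 symptom from 16:02: computed correctly, NDCG@k is the DCG of a ranking divided by the DCG of the ideal ordering of the same labels, so it cannot exceed 1. A value above 1 therefore points at bad input rather than a great model. Below is a minimal sketch, assuming the common exponential-gain formulation (gain 2^rel - 1, discount log2(rank + 1)) and a hypothetical valid label range of 0..3. It is not xgboost's implementation; `check_labels` and the label range are illustrative. It also shows why "that number is never right": 2147483647 is 2^31 - 1, the Java/Scala Int.MaxValue.

```python
import math
from typing import List, Sequence

# 2147483647 == 2**31 - 1, i.e. Java/Scala Int.MaxValue.
INT32_MAX = 2**31 - 1


def check_labels(labels: Sequence[int], max_label: int = 3) -> List[int]:
    """Return labels outside the (hypothetical) valid range 0..max_label."""
    return [rel for rel in labels if not 0 <= rel <= max_label]


def dcg(labels: Sequence[int], k: int) -> float:
    """DCG@k with exponential gain (2**rel - 1) and a log2(rank + 1) discount."""
    # Position i is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum((2**rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))


def ndcg(labels_in_ranked_order: Sequence[int], k: int) -> float:
    """NDCG@k: DCG of the model's ranking divided by DCG of the ideal ranking."""
    bad = check_labels(labels_in_ranked_order)
    if bad:
        # Fail loudly instead of computing gains like 2**2147483647.
        raise ValueError(f"out-of-range labels: {bad}")
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg(labels_in_ranked_order, k) / ideal if ideal > 0 else 0.0
```

Validating labels before training or evaluation turns a silent metric anomaly into an immediate error that points at the offending rows.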
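For the 17:39 to 17:41 discussion about migrating pseudo-keywords to normalized keyword fields, here is a sketch of what such a field could look like. It is written as Python dicts mirroring the Elasticsearch index-creation and search request bodies. The field name `title`, the sub-field `title.sort`, and the normalizer name `sort_normalizer` are hypothetical and not the actual CirrusSearch mapping. The `lowercase` and `asciifolding` filters are just one plausible choice of normalization.

```python
import json

# Hypothetical custom normalizer: normalizers take char_filter/filter lists
# but no tokenizer, so the whole value stays a single token.
settings = {
    "analysis": {
        "normalizer": {
            "sort_normalizer": {
                "type": "custom",
                "char_filter": [],
                "filter": ["lowercase", "asciifolding"],
            }
        }
    }
}

# Hypothetical mapping: "title" stays a text field for full-text search,
# with a normalized keyword sub-field used only for sorting.
mappings = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "sort": {
                    "type": "keyword",
                    "normalizer": "sort_normalizer",
                    # doc_values already defaults to true for keyword fields;
                    # spelled out here because it is the point of the migration.
                    "doc_values": True,
                }
            },
        }
    }
}

# Sorting on the keyword sub-field reads on-disk doc_values instead of
# building heap fielddata, which sorting on a text field would require.
query = {"query": {"match_all": {}}, "sort": [{"title.sort": "asc"}]}

index_body = {"settings": settings, "mappings": mappings}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(query, indent=2))
```

Keeping the text field for search and adding a keyword sub-field for sorting avoids changing how existing queries match while still making alphabetical sorting possible.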