[21:39:08] Reedy wondering if you could review https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/565765/ please? :)
[21:41:55] Reedy addressed!
[21:42:15] Does it actually fix the issue?
[21:43:06] I haven't tested, but that's what page_title uses.
[21:43:22] i can test it
[21:49:15] That'd certainly be useful
[21:50:04] yup, i've applied it and now running rebuildall.php
[21:50:30] In theory it makes sense
[21:50:44] as binary is measured in bytes rather than characters
[21:50:54] and 255 chars doesn't necessarily fit into 255 bytes
[21:53:34] Though, you'd kinda think 255 bytes would fit in 255 chars?
[21:54:51] yeh
[21:55:01] otherwise wouldn't page_title also have the issue?
[21:55:26] Though, the description says "Munged version of title"
[21:55:30] I've no idea what's munged about it
[21:56:54] hmm, it still fails.
[21:57:10] (just did the easy way and copied a db, updated searchindex)
[21:57:13] then did this:
[21:57:33] https://phabricator.wikimedia.org/P10211
[21:57:35] Reedy ^
[21:57:47] (that's the query it errored out with the first time)
[22:07:52] Reedy script failed again with same error so yeh that doesn't work :(
[22:08:02] Guess we could make this a small/medium text blob?
[22:08:12] I guess the question is what "munging" actually means
[22:08:24] If it's adding stuff, having it at the same length doesn't help for sure
[22:08:34] I mean, we don't use this in Wikimedia... So I guess it's not so well tested
[22:08:44] munging?
[22:09:11] -- Munged version of title
[22:09:11] si_title varchar(255) NOT NULL default '',
[22:09:11] -- Munged version of body text
[22:09:11] si_text mediumtext NOT NULL
[22:09:20] Manipulated, somehow
[22:09:34] ohh
[22:09:47] https://phabricator.wikimedia.org/P10212
[22:09:47] I'd definitely suggest filing a bug
[22:09:50] ok
[22:10:03] Ideally providing some example titles that don't fit, etc
[22:11:05] Reedy done https://phabricator.wikimedia.org/T243162
[22:14:55] Reedy could it be because page_title may be 255, but it adds namespace too
[22:15:00] so that makes it more than 255?
[22:15:14] https://github.com/wikimedia/mediawiki/blob/master/maintenance/rebuildtextindex.php#L105
[22:16:21] actually this particular query does not include a ns
[22:16:30] (as it looks like it's in main)
[22:16:51] I'm guessing it's SearchUpdate::getNormalizedTitle
[22:17:21] It doesn't obviously look to include the NS
[22:18:17] paladox: As above, you need to include some example titles that obviously fit in page_title, but don't fit in si_title
[22:19:08] Well looking at the query, looking in page_title for labyrinth, shows "Why_do_i_get_black_squares_with_lost_labyrinth_sims_2_graphic_rules"
[22:19:35] and lost for that one too (though it brings up other titles that contain it too)
[22:21:31] Reedy ^
[22:23:31] That's 68 originally
[22:23:35] So that's quite some expansion
[22:25:45] yup
[22:26:48] it seems to be mixing stuff up
[22:32:08] I'm guessing there's no tests for any of this
[22:33:42] i wonder why we normalise the title twice, here https://github.com/wikimedia/mediawiki/blob/master/includes/deferred/SearchUpdate.php#L98 and https://github.com/wikimedia/mediawiki/blob/master/includes/search/SearchMySQL.php#L365
[22:35:41] Pass
[22:36:14] i've added some output so will hopefully see what it does
[22:36:22] (to $title in rebuildtext)
[22:41:58] Reedy seems the titles look sane to me (no mix up)
[22:51:47] Reedy ahhhh
[22:52:07] User blog comment:Lost Labyrinth/If you missed out on certain expansions or stuff packs for The Sims 2, here's your chance to grab them now/@comment-xx.xxx.xx.xx-20140715230100/@comment-2201555-20140715230503
[22:54:46] So i think going with text will fix this.
[22:55:05] Since strict mode doesn't truncate it.
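
Editor's note on the bytes-versus-characters point raised at 21:50 in the log: a minimal Python sketch of why 255 characters need not fit in 255 bytes, while 255 bytes can never decode to more than 255 characters. The helper names `utf8_byte_length` and `fits_in_bytes` are hypothetical and used for illustration only; this does not reproduce MediaWiki's title munging, which the log leaves unexplained.

```python
def utf8_byte_length(text: str) -> int:
    """Return how many bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_bytes(text: str, limit: int = 255) -> bool:
    """Hypothetical check: would this string fit in a byte-limited column?"""
    return utf8_byte_length(text) <= limit


# 255 ASCII characters take exactly 255 bytes in UTF-8.
ascii_title = "a" * 255

# 255 accented characters take two bytes each in UTF-8.
accented_title = "\u00e9" * 255  # "é" repeated 255 times

if __name__ == "__main__":
    for label, title in (("ascii", ascii_title), ("accented", accented_title)):
        print(label, len(title), "chars,", utf8_byte_length(title), "bytes,",
              "fits" if fits_in_bytes(title) else "does not fit")
```

Both strings are 255 characters long, but only the ASCII one fits a 255-byte limit. This matches the observation in the log that switching to a byte-measured type alone would not help if the stored value is longer than the original title.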