[09:13:10] <_joe_> duesen: I think jaime made an interesting proposal at https://phabricator.wikimedia.org/T219279#5092155 re: the unicode issues with the php 7.2 transition
[09:13:35] <_joe_> I would like to take a stab at it, but first I'd need a pointer to the function that normalizes titles and usernames
[13:32:23] Krinkle, davidwbarratt: Currently it's mandatory that 'main' exist. duesen has in the past expressed a desire to "eventually" make it non-mandatory. A UI for "renaming" a slot (that would generate Special:Log entries or something) isn't impossible in the future, although what's more likely to actually have a use case in the near future is a sort of move/merge, e.g. moving the main (only) slot of Template:Foobar.css to the "templatestyles" slot of Template:Foobar.
[13:38:48] _joe_: I like Jaime's proposal too; I said something similar in the #wikimedia-cpt channel to eprodromou the other day. In MediaWiki the patch would likely go in Language::ucfirst(), conceptually similar to what we already do in e.g. LanguageTr::ucfirst() to handle dotless-I in Turkish.
[13:39:24] hello
[13:41:11] <_joe_> anomie: heh, yeah. I'm finishing compiling a full character diff between HHVM and my patched php 7.2
[13:41:14] anomie: it looks like tests with the custom patch rollback have happened?
[13:41:40] <_joe_> there are still differences but on very specific (and odd) unicode characters
[13:41:46] <_joe_> so that way is still viable
[13:43:03] eprodromou: It looks like _joe_ made a php 7.2 package with mbstring (mostly) reverted to the ancient version of Unicode used in HHVM. I don't know whether any MediaWiki testing was done, you'd have to ask _joe_.
[13:43:40] my head hurts
[13:44:38] <_joe_> no mw tests performed for now :)
[13:44:50] _joe_: I don't know if you missed it, but https://phabricator.wikimedia.org/T219279#5059771 points to a patch with a more complete set of differences between HHVM and PHP 7.2. The JS patch linked in the task's description filtered out many of the differences.
[13:45:01] <_joe_> yes
[13:45:03] <_joe_> ot
[13:45:16] <_joe_> it filters out all chars unavailable on javascript
[13:45:31] _joe_: metaquestion, are you getting what you need from Core Platform Team?
[13:46:03] <_joe_> eprodromou: now that anomie chimed in, yes :)
[13:47:36] <_joe_> but in a non-distant future, I'll need you to handle the transition to the newer unicode encodings anyways
[13:48:17] U+F0003 CAPITAL THUMBS UP EMOJI
[13:51:40] <_joe_> now, we'll need to understand how many titles are affected by this transition
[14:50:59] who would be the person to talk to about VIPS/VipsScaler?
[14:57:25] Krenair: gilles was the person to work on that area of stuff last IIRC
[15:15:46] Is anyone with a bit of Postgres familiarity available to review https://gerrit.wikimedia.org/r/c/mediawiki/core/+/499013 ? This updates some queries in class SearchPostgres for MCR schema changes. I'd like to get it merged before I'm distracted away by other things and forget everything about Postgres that I just learned. :)
[15:54:28] _joe_: Personally, I'd rather we go the other way by adopting the new uppercasing under HHVM (either in MW or patching HHVM) rather than by reverting mbstring in PHP 7.2. The revert seems like it could confuse other things like the Parsoid-PHP work which is specifically targeting 7.2+.
[15:55:43] <_joe_> anomie: it would be only to have a consistent behaviour during the transition
[15:56:38] <_joe_> I would plan to switch it off and let you fix the dupe pages at some point I guess
[15:57:28] <_joe_> we definitely don't want to keep this patch long-term, I just want to decouple the php 7 migration from this problem
[16:01:25] _joe_: FYI, I'm running DB queries to list all affected titles in unpatched 7.2, based on the diff in that Scribunto patch. Output is going to mwmaint1002:/home/anomie/tmp/T219279-php7.2-unicode-titles/check.txt
[16:01:38] <_joe_> anomie: oh thanks
[16:02:00] <_joe_> anomie: I obtained the same list AFAICS from hhvm/php7.2 in production
[16:10:47] bpirkle: Done
[16:12:04] _joe_, two comments: (a) breaking behavior should be a separate phase from the move to php 7.2 .. i.e. either have editors fix pages (rename, whatever) OR add compatibility code to MW (b) if compatibility code is added to MW, preferable to add the desired behavior vs previous behavior .. i.e. break early and explicitly so those pages can be fixed in the desired direction (unless tons of pages are impacted and breakage is going to be a huge disruption)
[16:12:55] <_joe_> subbu: we're trying to understand that. I expect at least a few ligatures to be used quite widely in non-english wikis
[16:13:48] thanks, makes sense .. getting that data would be useful .. also worth considering if a bot (or db script) can fix this en masse ... after providing communities adequate notice.
[16:13:49] anomie: thank you!
[16:13:53] <_joe_> so my idea would be: if it's a few 100s of pages, we can probably avoid doing the compat work, and fix them once the transition is complete
[16:14:20] <_joe_> else, add a php7 compatibility layer for the transition period and fix all those pages with a script
[16:54:34] <_joe_> I see from running wc on anomie's file (which is still being generated) it's several thousands of pages
[16:54:46] <_joe_> so yeah, I think we need some sort of fixup
[16:55:54] _joe_: Although that file is going to include pages in wiktionaries' main namespaces that aren't uppercased. Plus various noise lines.
[16:56:08] <_joe_> oh ok
[16:56:53] <_joe_> anyways, the performance impact if we just apply this change to Language::ucfirst() is not that large
[19:12:33] _joe_: After filtering out *wiktionary ns0 and ns1, looks like 917 live pages across all wikis. enwiki has the most (160), followed by ruwiki (141), commonswiki (109), and kawiki (107). 77 distinct local usernames, only one is not SUL (I didn't check if any others were unattached though). 92 SUL accounts, apparently some have no local account (locally renamed for being harassing, maybe?). Also 249 deleted page titles (2208 deleted revisions) across all wikis. I didn't think to check logging, links tables, and so on.
[19:36:55] <_joe_> anomie: that's not many after all
[19:38:18] <_joe_> can you add your findings to the task?
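
Editor's note: the idea raised in the log of special-casing certain first characters inside Language::ucfirst(), much as LanguageTr::ucfirst() is said to do for Turkish, can be sketched roughly as follows. This is only an illustration in Python, not MediaWiki's PHP code. The names `ucfirst`, `LEGACY_OVERRIDES` and `TURKISH_OVERRIDES` are hypothetical, and the table entries are made up for the example rather than taken from the HHVM/PHP 7.2 character diff.

```python
from typing import Mapping, Optional

# Hypothetical override table: first characters whose uppercasing should NOT
# follow the runtime's current Unicode tables during a transition period.
# The single entry (U+00DF, sharp s) is illustrative only; it is not taken
# from the real HHVM vs PHP 7.2 diff.
LEGACY_OVERRIDES = {"\u00df": "\u00df"}

# Illustrative Turkish-style rule: dotted lowercase i maps to U+0130
# (capital I with dot above) instead of plain "I".
TURKISH_OVERRIDES = {"i": "\u0130"}


def ucfirst(text: str, overrides: Optional[Mapping[str, str]] = None) -> str:
    """Uppercase the first character of text, consulting overrides first."""
    if not text:
        return text
    first, rest = text[0], text[1:]
    if overrides and first in overrides:
        # Forced mapping wins over the runtime's default case tables.
        return overrides[first] + rest
    # Fall back to whatever the runtime's Unicode tables say.
    return first.upper() + rest
```

With an approach like this, the compatibility table could be emptied once the affected titles have been renamed, leaving the default uppercasing in place.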