[19:03:37] TimStarling: Assuming the claim I just made is correct ( https://phabricator.wikimedia.org/T184458#5729071 ). Would it make sense to remove mutability of this setting entirely to avoid footgun scenarios? Alternatively, if there is no single cross-platform value we can use (looks like that might be the case, we pick a "good" one instead based on `bin/locale` options in the installer), perhaps that kind of check should be moved to a (cached) run-time check? Or if we want to ensure consistency between servers, maybe the installer is the right way to go, but then with some kind of run-time check that fatals if it isn't "good enough" / supported for MW?
[19:04:05] seems like we can operate with en_US.UTF-8 but not with fr_FR.UTF-8 for example.
[19:04:22] also cf. https://phabricator.wikimedia.org/T201749
[20:19:33] Krinkle: yes, probably the installer is the right time to complain and runtime is the right time to choose a locale
[20:22:04] TimStarling: would it matter if within a single wiki setup one server picks C.UTF-8 and another en_US.UTF-8? That is, are there subtle differences within the subset of locales we currently support fully that we don't want to vary on within a wiki / its servers? If so, then hardcoding into LS.php at install time might still make sense. Otherwise, if we can make the subset strict enough I guess run-time is fine.
[20:22:28] I don't think it really matters
[20:23:36] what are the consequences of using just "C"? IIRC we introduced locale selection because some shell commands (e.g. rsvg) were failing with non-ASCII command line parameters if "C" was used
[20:24:04] I'm just wondering if C may be a better fallback than fr_FR.UTF-8
[20:24:09] C is always available
[20:25:18] the point of always selecting a UTF-8 locale was to avoid that problem with rsvg etc. but users probably care more about DB query errors than rendering of non-ASCII SVGs
[20:28:58] The inline docs reference https://bugs.php.net/bug.php?id=45132 for the UTF-8 part. The "C" part is originally that the locale affects how Lua handles various string comparisons (T107128).
[20:28:59] T107128: Scribunto string comparison works case insensitive while the standard Lua case sensitive - https://phabricator.wikimedia.org/T107128
[20:29:26] We could potentially force C for LC_NUMERIC while using $wgShellLocale for some of the other LC_ stuff.
[20:43:14] the obvious solution for that PHP bug is to not use escapeshellarg()
[20:46:15] for Lua, well, we could throw an exception if the locale is wrong when Scribunto starts up
[21:24:56] TimStarling: It was my impression/assumption going in that for some reason we can't assume "C.UTF-8" to always be available
[21:25:12] but if it is, then I'd be in favour of just enforcing that all the way through and deprecating the config var and everything related to it.
[22:16:05] C.UTF-8 is not always available, but C is
[22:16:30] C is the legacy 8-bit locale, charset unspecified
[22:44:32] TimStarling: Hm.. and plain "C" isn't similar enough/compatible for our needs?
[22:47:02] as anomie was saying, we use escapeshellarg(), which is broken if the locale is C
[22:47:12] that's https://bugs.php.net/bug.php?id=45132
[22:47:24] LANG=C php -r 'print escapeshellarg("dög")."\n";'
[22:47:24] 'dg'
[22:50:32] looking at codesearch, there is actually significant usage of escapeshellarg() in extensions, despite it being prohibited by phpcs
[22:50:47] loads of extensions are overriding the default phpcs rules to allow it
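
For reference, the run-time approach discussed above (prefer a UTF-8 locale, fall back to plain "C", which is always available) might look roughly like the sketch below. It is written in Python for illustration, not in MediaWiki's PHP, and it is not MediaWiki code: `pick_locale`, `is_utf8_name`, and the candidate list are hypothetical names chosen for this example. It also assumes the C library rejects unknown locale names, which glibc does but some other libcs may not.

```python
import locale


def is_utf8_name(name):
    """Heuristic: does the locale name advertise a UTF-8 charset?"""
    return name.lower().replace("-", "").endswith("utf8")


def pick_locale(candidates, category=locale.LC_CTYPE):
    """Return the first candidate that setlocale() accepts, else "C".

    The process locale is restored before returning, so this only probes
    availability and does not change global state.
    """
    saved = locale.setlocale(category)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
                return name
            except locale.Error:
                continue  # not installed on this host, try the next one
        return "C"  # always available, but the charset is unspecified
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    # Hypothetical preference order, for illustration only.
    chosen = pick_locale(["C.UTF-8", "en_US.UTF-8"])
    if not is_utf8_name(chosen):
        print("warning: no UTF-8 locale found, falling back to", chosen)
    print("selected locale:", chosen)
```

Whether a non-UTF-8 fallback should only warn, as here, or fatal, as suggested earlier in the discussion, is a policy choice the sketch leaves open.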