[03:04:31] <Squox>	 Anybody here?
[19:58:28] <Krinkle>	 TimStarling: Hm. that includes our Shell.php in core.
[19:58:37] <Krinkle>	 It only has an alternate implementation for Windows
[19:59:03] <Krinkle>	 e.g. Shell::escape
[19:59:05] <TimStarling>	 yeah I know, I'm saying that's fixable
[19:59:12] <Krinkle>	 right
[19:59:25] <TimStarling>	 I said it when that bug was current but for some reason the solution was not liked
[19:59:55] <TimStarling>	 all you really need is $arg = str_replace("'", "'\''", $arg);
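The replacement described above can be sketched in C as well, to show that it works on raw bytes and never consults the locale. This is only an illustration: `shell_quote` is a hypothetical helper name, not anything from MediaWiki or PHP, and it assumes a POSIX shell where everything inside single quotes is literal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, single-quoted copy of arg
 * for a POSIX shell. Works byte by byte, so the locale is never consulted.
 * The caller frees the result. Returns NULL if allocation fails. */
char *shell_quote(const char *arg)
{
    size_t len = strlen(arg);
    size_t quotes = 0;
    for (size_t i = 0; i < len; i++) {
        if (arg[i] == '\'') {
            quotes++;
        }
    }

    /* Each ' grows from 1 byte to 4 ('\''), plus 2 enclosing quotes and NUL. */
    char *out = malloc(len + quotes * 3 + 3);
    if (out == NULL) {
        return NULL;
    }

    char *p = out;
    *p++ = '\'';
    for (size_t i = 0; i < len; i++) {
        if (arg[i] == '\'') {
            /* Close the quote, emit an escaped quote, reopen the quote. */
            memcpy(p, "'\\''", 4);
            p += 4;
        } else {
            *p++ = arg[i];
        }
    }
    *p++ = '\'';
    *p = '\0';
    return out;
}
```

Non-ASCII bytes are copied through untouched, which is the property that escapeshellarg() loses under LANG=C on Linux.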
[20:00:04] <Krinkle>	 So breaking escapeshellarg() is the only thing blocking us adopting LANG/LC_ALL=C universally or smth like that?
[20:00:43] <Krinkle>	 And we can't require C.UTF-8 to be present cross-platform? Don't know how long a tail has it missing but that could be a consideration as well.
[20:00:54] <TimStarling>	 that's the only thing that stops us from using C internally, the other issue is what environment to use for external shell commands since they have their own bugs with LANG=C
[20:02:36] <Krinkle>	 Right, we don't distinguish that right now. We use the same for both (via wgShellLocale), not overridden by Shell.php et al.
[20:03:34] <Krinkle>	 Although I suppose if we can't require C.UTF-8 then distinguishing between what we set for MW proc vs sub proc presumably won't help. The one for MW proc itself can work with C but the difficult one remaining then is sub procs.
[20:05:24] * Krinkle reads https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
[20:06:42] <Krinkle>	 Ah, this is much newer than I thought
[20:06:57] <Krinkle>	 yeah, `locale -a` on macOS for me doesn't contain C.UTF-8 for example
[20:07:44] <Krinkle>	 although it has `en_US.UTF-8` I don't know if there are distros we need/want to support that would exclude that intentionally or something like that? If not, that might be good enough as a fallback.
[20:10:00] <Krinkle>	 If we're not sure, I suppose we could do a Pingback bucket for this to gather some data in the wild over a period of time.
[20:13:14] <TimStarling>	 2011 in Debian, 2015 in Redhat
[20:16:40] <TimStarling>	 just reading those bugs to figure out whether it is default and/or optional
[20:18:37] <Krinkle>	 From the macOS-related thread linked from sourceware, I gather that on macOS "C" behaves as "C.UTF-8" despite not being called that
[20:19:12] <Krinkle>	 LANG=C LC_ALL=C LC_COLLATE=C php ~/Documents/Temp/tmp.php
[20:19:13] <Krinkle>	 string(6) "'dög'"
[20:20:16] <Krinkle>	 confirmed also via /usr/bin/locale that it doesn't normalise to something different. It seems to identify fully as "C".
[20:27:00] <Krinkle>	 whatever is running behind 3v4l.org doesn't have C.UTF-8 or C-like-UTF-8 it seems https://3v4l.org/u3D8X
[20:27:36] <TimStarling>	 I was curious about the mac case so I looked at the implementation in PHP
[20:28:31] <TimStarling>	 escapeshellarg() calls mbrlen() which gives you a locale-sensitive number of bytes in the next character in a string
[20:29:09] <TimStarling>	 with LANG=C in linux it appears that mbrlen() on a non-ASCII character gives an error, "invalid multibyte sequence"
[20:29:20] <TimStarling>	 which causes PHP to skip it
[20:30:02] <TimStarling>	 so if OSX's mbrlen() just always returns 1, like an 8-bit clean locale, then it would pass through UTF-8
[20:30:37] * Krinkle has looked at more C code this months than the past 5 years prior
[20:30:41] <Krinkle>	 month*
[20:31:15] <Krinkle>	 right, I follow you there. I'm trying to get a hello world .c file to run now to test that in isolation.
[20:34:22] <Krinkle>	 I'm impressed my two lines of code can produce such a long wall of errors
[20:34:29] <TimStarling>	 lol
[20:36:40] <TimStarling>	 for internal usage the correct fallback sequence is probably C.UTF-8 -> C, with escapeshellarg() usage replaced with our own thing
[20:37:06] <TimStarling>	 for external commands, I guess C.UTF-8 -> en_US.UTF-8 -> C
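The two fallback sequences above could be tried in order roughly like this. A minimal sketch in C, assuming only the standard `setlocale()`; the helper name `set_first_available_locale` and the two array names are made up for illustration, and the locale names are the ones from the discussion.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: tries each candidate in order and returns the name of
 * the first one that setlocale() accepts, or NULL if none could be set. */
const char *set_first_available_locale(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* setlocale() returns NULL when the locale is not installed. */
        if (setlocale(LC_ALL, candidates[i]) != NULL) {
            return candidates[i];
        }
    }
    return NULL;
}

/* Internal use: plain C is acceptable once escaping no longer depends on it. */
static const char *const INTERNAL_LOCALES[] = { "C.UTF-8", "C" };

/* External commands: prefer any UTF-8 locale before falling back to C. */
static const char *const EXTERNAL_LOCALES[] = { "C.UTF-8", "en_US.UTF-8", "C" };
```

Since "C" is always available, both lists are guaranteed to end in a locale that can be set; which entry wins depends on what the system has installed.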
[20:38:24] <Krinkle>	 OK. I copied a C++ program and was using gcc instead of g++. Fair enough.
[20:39:28] <TimStarling>	 if we're too lazy to replace escapeshellarg(), then I guess C.UTF-8 should be required except if we're on OSX
[20:40:19] <TimStarling>	 OSX could be detected by just doing escapeshellarg('ⓒ') === "'ⓒ'", i.e. a feature test instead of an OS test
[20:41:21] <Krinkle>	 Right, yeah, that is assuming C.UTF-8 has proliferated enough for our needs. Requiring PHP 7.2+ like we do and it only applying to the next release, while obvious, does narrow it down quite a bit.
[20:44:06] <Krinkle>	 I've been working on a CPP project recently and one thing I did fall in love with immediately is the compiler and its super helpful warnings and errors (not joking). I don't know if real gcc is better or worse than the clang alias macOS ships, but it's spot on every time and easy to use.
[20:44:43] <TimStarling>	 clang claims to have the best errors, gcc's are not bad though
[20:45:06] <TimStarling>	 better than they used to be
[20:45:10] <Krinkle>	 Like, it never tells me something generic like "Syntax error, was expecting random thing you most definitely didn't want to do" as PHP or JS would.
[20:45:40] <Krinkle>	 but instead it tells me "you need a pointer here, use * to make it so" or "missing semicolon"
[20:46:25] <TimStarling>	 this virtual offsite idea is kind of working, although I'm not sure what to do about meals when I've been up for 5 hours and it's only 7:45am
[20:46:53] <TimStarling>	 I think I'll get something before techcom but not sure what
[20:51:34] <Reedy>	 At that sort of extent, unless you're gonna have like breakfast with the kids, have whatever takes your fancy
[20:52:46] <Krinkle>	 well, I'm not sure I did this right, but; https://gist.github.com/Krinkle/6c2fc025d0bdba08143329264f1f4034
[20:52:58] <Krinkle>	 Yeah, on LANG=C/LC_ALL=C, ⓒ produces len 1
[20:59:23] <TimStarling>	 I think your mbstate_t needs to be zeroed out, that syntax would give uninitialized stack garbage in s
[21:03:07] <TimStarling>	 e.g. mbstate_t s = {0};
[21:06:13] <TimStarling>	 the manual suggests using memset(), see https://www.gnu.org/software/libc/manual/html_node/Converting-a-Character.html#index-mbrlen
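A sketch of the counting approach with the state zeroed via memset(), as the manual suggests. This is not the code from the gist; `count_chars` is a hypothetical name, and the result for non-ASCII input depends on the active LC_CTYPE locale and on the platform's libc, which is exactly the difference being investigated here.

```c
#include <string.h>
#include <wchar.h>

/* Counts the characters in s according to the current LC_CTYPE locale.
 * Returns (size_t)-1 if mbrlen() reports an invalid or incomplete sequence. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);  /* begin in the initial conversion state */

    size_t remaining = strlen(s);
    size_t count = 0;

    while (remaining > 0) {
        /* Number of bytes that make up the next character. */
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2) {
            return (size_t)-1;  /* invalid (-1) or truncated (-2) sequence */
        }
        if (n == 0) {
            break;  /* null character; not expected since remaining is from strlen() */
        }
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

A caller would normally run setlocale(LC_ALL, "") or set a specific locale first; without that the program runs in the "C" locale. Pure ASCII input such as "Hello" yields 5 either way, while a string containing ⓒ either counts per character, counts per byte, or fails, depending on how the locale treats bytes above 0x7F.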
[23:56:07] <Krinkle>	 TimStarling: hm. no protection/warning against that uh?  ok. (done) still seems to behave the same. Although trying out the counting approach (so that Hello yields 5 instead of 1) didn't work for me, got -1 instead.
[23:56:31] <Krinkle>	 TimStarling: btw, not sure if you can squeeze this in, but could really use a hand here at least to confirm or rule out my existing theory - https://phabricator.wikimedia.org/T239724#5729709