[00:25:14] Question: how are we going to ensure that articles stay in the same version of a language? Right now, if we have an article written in British English, additions should be in BrE.
[00:25:15] That's why I think ISO 639-3 is not going to be fine-grained and flexible enough.
[00:26:04] We'll be using BCP47 codes, which are ultimately arbitrarily fine-grained, though the mapping is imperfect.
[00:27:05] I'm not sure `en-GB-Devon-Salcombe` is likely to be useful, but we could provide it technologically. Obviously which 'languages' are allowed would need to be controlled.
[00:45:01] This is good to hear, thanks!
[00:53:46] en-GB-Devon-Salcombe wouldn't be a valid BCP47 tag though
[00:54:08] "en-u-sd-gbdev" is, though (for Devon, not specifically Salcombe) (re @Nikki: en-GB-Devon-Salcombe wouldn't be a valid BCP47 tag though)
[00:54:14] yeah
[00:55:27] For Salcombe we'd need to resort to en-x-Q1247786
[00:56:15] (which raises the question of how to handle combinations of country code, Qid, and Unicode BCP47 extension in language tags used in Abstract Wikipedia)
[00:56:43] (is en-GB-u-sd-gbdev-x-Q1247786 overkill?)
[00:57:48] For en-gb-devon-salcombe to be valid, someone would need to convince the subtag registry to add devon and salcombe, but since it's already possible to say devon another way, that's highly unlikely... but I'm mostly just being pedantic (and technically, using Qids doesn't always produce valid codes either, since they can't be longer than 8 characters :/)
[01:02:34] I don't think it's overkill to include that many parts if necessary, in that the Qid is private use and the u extension ones are barely known, let alone supported, but it's overkill in that I don't think we'd have a need to be that specific there
[01:03:44] With respect to the Qids being more than 8 characters, would base36-encoding the Qids be feasible? (re @Nikki: For en-gb-devon-salcombe to be valid, someone would need to convince the subtag registry to add devon and salcombe, but since it's already possible to say devon another way, that's highly unlikely... but I'm mostly just being pedantic (and technically, using Qids doesn't always produce valid codes either, since they can't b…)
[01:04:52] (then we'd have room for 36^7 ≈ 78 billion possible Qids, e.g. Qxxxxxxx)
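A minimal sketch of how the base36 idea could work, assuming the encoded value keeps a leading "q" so it resembles the Qxxxxxxx pattern above (that prefix is purely illustrative, not an agreed convention):

```python
import string

BASE36 = string.digits + string.ascii_lowercase  # 0-9 followed by a-z

def qid_to_subtag(qid: str) -> str:
    """Encode a Qid's number in base36 so it fits an 8-character subtag.

    Example: "Q1247786" -> "qqqsq" (5 characters). A leading "q" plus up to
    seven base36 digits stays within 8 characters, hence room for 36^7 Qids.
    """
    n = int(qid.lstrip("Qq"))
    encoded = ""
    while True:
        n, r = divmod(n, 36)
        encoded = BASE36[r] + encoded
        if n == 0:
            break
    # How to mark these as base36-encoded (versus plain decimal Qids)
    # is a separate question, raised just below in the discussion.
    return "q" + encoded

def subtag_to_qid(subtag: str) -> str:
    """Reverse the encoding: "qqqsq" -> "Q1247786"."""
    return "Q" + str(int(subtag[1:], 36))

print(qid_to_subtag("Q1247786"))  # qqqsq -> a tag like en-x-qqqsq
print(subtag_to_qid("qqqsq"))     # Q1247786
```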
[01:06:00] and at the moment we're at roughly 0.1 billion, i.e. 0.1/78 ≈ 0.13% of that
[01:06:44] we'd presumably need a way to distinguish the unencoded ones from the base36 ones, but it seems like it would be theoretically plausible
[01:07:32] Well, before using such an encoding a mass migration (particularly in the realm of lexemes) would be necessary (re @Nikki: we'd presumably need a way to distinguish the unencoded ones from the base36 ones, but it seems like it would be theoretically plausible)
[01:07:36] I remember someone (I think Daniel) mentioned that character restriction (though I seem to remember it as 4 rather than 8 characters) during the initial Lexeme development work
[01:07:50] I don't remember what we decided to do about it, just ignore it I guess
[01:08:38] yeah
[01:09:16] Variant, extension, and private-use subtags are each a maximum of eight characters (re @lucaswerkmeister: I remember someone (I think Daniel) mentioned that character restriction (though I seem to remember it as 4 rather than 8 characters) during the initial Lexeme development work)
[01:15:56] I'm pleasantly surprised to come across someone else who knows about the -u-sd- syntax... never happened before :D
[01:15:59] aha https://phabricator.wikimedia.org/T167166
[01:19:27] okay, and is that restriction relevant in practice? for instance, I randomly found https://github.com/jsommers/langtags via google and installed it, and that one doesn't complain about longer private-use tags (re @mahir256: Variant, extension, and private-use subtags are each a maximum of eight characters)
[01:20:27] Was this ever addressed in any issues related to that package? Are there libraries which _do_ care about that restriction? (re @lucaswerkmeister: okay, and is that restriction relevant in practice? for instance, I randomly found the Python langtags package via google and installed it, and that one doesn't complain about longer private-use tags)
[01:21:40] (sorry, I pasted the wrong library – https://github.com/LuminosoInsight/langcodes is the one I tried, actually)
[01:22:24] but yeah, my guess would be that not a lot of people/libraries care about the restriction (unless someone has a counterexample)
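The restriction being discussed is the RFC 5646 rule that each hyphen-separated subtag is 1-8 alphanumeric characters. A rough stand-alone check of just that rule (a simplified sketch, not how langtags or langcodes actually validate tags) could look like this:

```python
import re

# Only the length/character rule from RFC 5646: each subtag is 1-8
# characters from [A-Za-z0-9]. Full BCP47 well-formedness and registry
# validity are deliberately out of scope here.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")

def subtag_lengths_ok(tag: str) -> bool:
    return all(SUBTAG.fullmatch(part) for part in tag.split("-"))

print(subtag_lengths_ok("en-GB-u-sd-gbdev-x-Q1247786"))  # True: Q1247786 is exactly 8 characters
print(subtag_lengths_ok("mis-x-Q123456789"))             # False: Q123456789 is 10 characters
```

Raw Qids from Q10000000 upwards are already nine characters or more, which is where the base36 encoding sketched earlier would come in.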