[21:59:53] * robla starts prepping for chairing this
[22:00:58] * DanielK_WMDE imagines the sounds of shuffling, chairs being moved, and low mumbling as the room fills
[22:00:58] https://phabricator.wikimedia.org/E140
[22:01:38] #startmeeting T114662: RFC: Per-language URLs for multilingual wiki pages
[22:01:38] Meeting started Wed Feb 10 22:01:38 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:01:38] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:01:38] The meeting name has been set to 't114662__rfc__per_language_urls_for_multilingual_wiki_pages'
[22:02:20] #topic T114662: RFC: Per-language URLs for multilingual wiki pages Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[22:02:35] o/
[22:03:02] #link https://phabricator.wikimedia.org/T114662
[22:03:29] hi all!
[22:03:41] so, let's start by re-stating the problem I'm hoping to solve.
[22:03:54] * aude waves
[22:04:10] on some wikis we have pages that show different content depending on the user language. i'll call them "multi-lingual pages".
[22:04:30] examples are file descriptions on commons, and wikidata items
[22:04:34] o/
[22:05:00] these are different from translated pages in that there is only one page source, from which different language versions ("renderings") are generated
[22:05:14] DanielK_WMDE: so when I look at wikidata item, the content also seems to depend on my babel settings, not only primary language. Namely, the "In more languages" box
[22:05:42] SMalyshev: yes, true, which is why we inject that box on the fly, and don't put it into the parser cache :)
[22:05:47] I know I've said this before, but almost everything about this is identical to variants, isn't it?
[22:06:00] DanielK_WMDE: ah, ok, that solves it
[22:06:05] TimStarling: yes, pretty much. i want to generalize the mechanism we use for variants
[22:06:30] your proposed hyphen is interesting, /wiki-de/
[22:07:06] #info question discussed: should this generalize the mechanism for language variants?
[22:07:11] in the variant wikis we use the language at the top level, /de/ which works well enough for variants but will have conflicts if it is extended to languages
[22:07:23] so basically, we don't want to serve different renderings from the same url. the web caches don't like that. which means that currently, all non-english views of commons and wikidata bypass the web caches. which is why we don't make it very easy for anons to get to a non-english view
[22:07:23] for example there is a language code "api"
[22:07:42] TimStarling: heh, good point :)
[22:08:27] TimStarling: also, putting the language into the path makes it hard to explicitly link to a specific language representation. links like [[File:Foo.jpg@@es]] would be kind of nice to have
[22:08:56] Or the language code could be attached to the namespace. Easier to avoid conflicts there than in page names.
[22:09:24] DanielK_WMDE: how often would you link to "this file in Spanish" instead of "this file in my current language"?
[22:09:40] I see not much case for the former...
[22:09:50] Which brings up another issue: wikis typically have multi-lingual pages in one namespace, but monolingual or translated in another. Should different namespaces then have a different path in the url?
[22:10:17] SMalyshev: not that often, but in important cases. when discussing problems with the rendering, for example
[22:10:39] basically, in meta-content like discussions.
[22:10:54] DanielK_WMDE: well, if it's for technical discussions then it may have awkward syntax :)
[22:11:10] TimStarling: what do you think of the per-namespace issue? variants apply to all namespaces, right?
[22:11:35] there are hacks in some of the subclasses which disable variant conversion in some of the namespaces
[22:11:37] but for wikidata, i wouldn't want to split the cache for all the talk pages and policy pages, which are not multilingual...
[22:11:47] yay, hacks ;)
[22:12:07] #info questions discussed: how do we ensure a good caching strategy? how often do we want URLs to be language specific?
[22:12:09] SMalyshev: i could live with that, as long as it's possible
[22:12:32] * DanielK_WMDE shuts up now and listens for questions/comments
[22:13:29] DanielK_WMDE: why it is /wiki-de/ vs. /wiki/...?lang=de ?
[22:13:33] you know on the variant wikis we currently deliver content from /wiki/ instead of redirecting
[22:13:52] I recently have been thinking about this since I proposed changing it and then thought again
[22:14:11] the main argument for delivering content from /wiki/ is so that people can share links
[22:14:20] SMalyshev: prettiness. /wiki-de/ would rewrite to ?uselang=de. if we put the language elsewhere in the url, we may want to handle this in php though
[22:14:25] i.e. copying the URL out of the address bar and pasting into facebook or whatever
[22:15:04] probably a better solution for that is to nag users on first view to see if they want to change their language
[22:15:15] redirect and nag with a popup
[22:15:24] TimStarling: that's a good point... if somebody reads Wikidata in Russian and posts link on Facebook and somebody from Spain clicks on it - should they see it in Russian or Spanish?
[22:15:26] TimStarling: yes, when copy&pasting a full url, what do you *mean* by that? Are you referencing the abstract content, or the concrete representation?...
[22:15:56] a sane thing that doesn't need nagging might be to "fix" the address bar with JavaScript
[22:16:02] TimStarling: it would still be possible to get a language neutral url that would do some kind of guessing. but if you take the url from your browser, it will be for "your" language
[22:16:11] i think if we make it easy to switch the language, it should be fine
[22:16:43] TimStarling: i was thinking of a persistent navigation bar instead of a popup, but yea
[22:16:47] jzerebecki: hmm... that's a nice idea... i.e. if your language == current language then remove language from browser URL?
[22:17:10] #question discussed: what should happen when visiting a language neutral URL?
[22:17:17] jzerebecki: possible... but it's cheating, right? it messes with the semantics of urls.
[22:17:38] sounds like some sort of dystopic vision of the web
[22:17:43] DanielK_WMDE, even if you do the offer to switch in a 48px font, lots of users will still fail to notice. users are like that. shit should just work :P
[22:17:44] robla and All: (Re "how often do we want URLs to be language specific?" For ContentTranslation and a planned CC Universal Translator e.g. at WUaS, the more the merrier ... but with much ArchCom planning)
[22:17:53] I think language variants actually support both types of urls? the /wiki/ url gives you the default language, and if you're logged in you can change your preferred variant, but then the /zh-cn/ url gives you a fixed language/variant.
[22:18:11] TimStarling: one assumption is that it is *really* a good idea to serve different content from different urls. and that this is the main reason that we don't allow anons to set their language.
[22:18:13] DanielK_WMDE: the thing is most people don't care in which language it is as long as it's in their preferred language. I.e. shareable link should be language-neutral
[22:18:16] does that assumption still hold?
[22:18:18] which is what a user would want most often? the language neutral url or the language specific one?
[22:18:21] #info question discussed: what should happen when visiting a language neutral URL?
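The "/wiki-de/ would rewrite to ?uselang=de" idea from [22:14:20] can be illustrated with a minimal sketch. This is Python rather than an actual Apache or MediaWiki rewrite rule, and the path pattern, the simplified language-code shape, and the function name `rewrite_language_url` are assumptions made for illustration only; nothing here was agreed in the meeting.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical path shape "/wiki-<code>/<title>"; the code pattern is a
# simplified stand-in for a real BCP 47 check.
LANG_PATH = re.compile(r"^/wiki-(?P<lang>[a-z]{2,3}(?:-[a-z0-9]+)*)/(?P<title>.+)$")


def rewrite_language_url(url: str) -> str:
    """Map /wiki-<code>/<title> to /wiki/<title>?uselang=<code>."""
    parts = urlsplit(url)
    match = LANG_PATH.match(parts.path)
    if match is None:
        # Language-neutral or unrelated URL: leave it untouched.
        return url
    # Drop any existing uselang so the path-derived language wins.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "uselang"]
    query.append(("uselang", match.group("lang")))
    return urlunsplit((parts.scheme, parts.netloc,
                       "/wiki/" + match.group("title"),
                       urlencode(query), parts.fragment))


print(rewrite_language_url("https://commons.wikimedia.org/wiki-de/File:Foo.jpg"))
```

Because each language gets its own path, a web cache can key on the URL alone; the language-neutral /wiki/ URL is passed through unchanged and would still need the guessing logic discussed above.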
[22:18:27] but for cache, URL should be language-specific
[22:18:35] robla: i think uls takes care of that, but only for interface language
[22:18:38] so commons..../wiki would give you "your" language (as best we can guess) but commons..../de/ or commons..../wiki-de would give you specifically the german version
[22:18:51] SMalyshev: and for bookmarks?...
[22:19:00] DanielK_WMDE: I'd say the same, neutral
[22:19:04] afaik, so if you are in the uk, you get british english (may be for logged in users only, not sure)
[22:19:21] SMalyshev: that would mean using jzerebecki's proposal of "fixing" the address bar
[22:19:30] it also considers browser accept languages, etc to guess
[22:19:30] DanielK_WMDE: unless you are a developer debugging specific language rendering :)
[22:19:35] you know how permalinks work? we could have that kind of thing for language-neutral URLs
[22:19:37] DanielK_WMDE: right, I like it
[22:19:46] I've added some example links at https://phabricator.wikimedia.org/T114662#2016907
[22:20:08] it's not necessary to guess the user's intention and rewrite the address bar, breaking their assumptions about how URLs work and breaking all use cases except for the one we are guessing they want
[22:20:15] it's a "casual vs power user" issue again. a casual user would want links that are "magic", a power user would want links that are fixed.
[22:20:23] quiddity: shouldn't https://zh.wikipedia.org/zh-cn/12 example have some wiki in the path?
[22:20:33] quiddity: oh, thank you!
[22:20:34] we can just provide a toolbar link to the language-neutral URL
[22:20:47] #info examples at https://phabricator.wikimedia.org/T114662#2016907
[22:20:55] SMalyshev: not at the moment.
[22:21:05] SMalyshev: "wiki" isn't really a pan-language "word"
[22:21:06] ah, I meant in current proposal...
[22:21:26] ok, I got it
[22:21:30] SMalyshev: i don't know. i'd prefer to keep it consistent with how zhwiki and languageconverter are already set up, myself.
[22:21:42] I don't know what DanielK_WMDE prefers
[22:22:04] cscott: it does look nice but I'm afraid of namespace conflicts.
[22:22:15] #info jzerebecki suggests to use javascript to "fix" the address bar to show a language neutral url. SMalyshev thinks language neutral links are best for bookmarks and sharing
[22:22:20] i think /w/ is the only possible conflict, and that's not a valid language code
[22:22:21] cscott: what if some language has a code that is already a used url?
[22:22:39] am i wrong about that? are there other protected url roots?
[22:22:53] yeah, I mentioned /api earlier
[22:23:03] cscott: api?
[22:23:10] in any case, \w\w(-\w+) is the language code protected root
[22:23:15] cscott: https://en.wikipedia.org/api/
[22:23:17] #info we can just provide a toolbar link to the language-neutral URL
[22:23:20] api isn't a language code
[22:23:29] gwicke put restbase there
[22:23:29] but api is a really small language, moribund, we probably won't ever have a wiki in it
[22:23:38] http://www.ethnologue.com/language/api
[22:23:49] ISO 639-3
[22:23:49] api
[22:23:52] TimStarling: we're using two-letter language codes, not three-letter language codes.
[22:23:54] ISO 639-2
[22:24:02] #info question: how does /api interact with this proposal?
[22:24:03] cscott: We use both...
[22:24:03] * quiddity starts an incubator proposal, now! >.>
[22:24:03] they are two separate standards
[22:24:16] cscott: https://azb.wikipedia.org/wiki/%D8%A2%D9%86%D8%A7_%D8%B5%D9%81%D8%AD%D9%87
[22:24:28] https://meta.wikimedia.org/wiki/Special:SiteMatrix plenty of 3 letter ones...
[22:24:32] no, we use BCP 47
[22:24:49] ah, boo.
[22:24:50] DanielK_WMDE: so, the use cases you are interested in will have hundreds of different language variants?
[22:24:54] and there's stuff like be-x-old
[22:24:56] cscott, TimStarling: i'm not thinking of putting the language code at the beginning of the path. I'd go for /wiki-xy/, or put the code later in the url, into the namespace, or even as a suffix
[22:24:58] which specifies that we use 2 letter codes for languages that have them, and 3 letter codes otherwise
[22:25:12] so that's why I worry about namespace clashes :)
[22:25:19] structurally, a suffix would make most sense. practically, putting it into the namespace would be easier, though
[22:25:23] BCP 47 also gives some rules about hyphens in language codes
[22:25:30] e.g. en-US
[22:25:30] DanielK_WMDE: yes, i know. i'm advocating for something we can (eventually at least) make consistent with how LanguageConverter works
[22:25:38] #info daniel says: i'm not thinking of putting the language code at the beginning of the path. I'd go for /wiki-xy/, or put the code later in the url, into the namespace, or even as a suffix. structurally, a suffix would make most sense. practically, putting it into the namespace would be easier, though
[22:25:50] I'll just say again that I think we should look at our URLs holistically.
[22:25:52] DanielK_WMDE: you shouldn't need to specify that you want the page in zh-cn twice in the URL, once for languageconverter and once for the commons ui
[22:26:07] I think adding a fourth URL pattern is a bad idea without a clear plan.
[22:26:12] or three times, even, if you account for the domain name
[22:26:34] cscott: yes, absolutely
[22:26:45] #info you shouldn't need to specify that you want the page in zh-cn twice in the URL, once for languageconverter and once for the commons ui
[22:26:47] en.wikidata.org?
[22:26:50] yue.wikipedia.org/wiki-yue/File:Foo?langinfo=yue
[22:26:58] #info I think adding a fourth URL pattern is a bad idea without a clear plan.
[22:27:04] Marybelle: what do you suggest?
[22:27:14] DanielK_WMDE: Writing a specification for our URLs.
[22:27:15] I agree with Marybelle. Do I get to say that often?
[22:27:15] but like I say, Apiaká has "population 1" in the ethnologue, a single native speaker
[22:27:20] SMalyshev: subdomains are a pain wrt same origin policy and such
[22:27:21] and en.commons.wikipedia.org
[22:27:22] SMalyshev: that was once a real domain! :P
[22:27:29] #info purging hundreds of variants with different URLs won't be feasible until we have Varnish 4
[22:27:31] And Glottolog - http://glottolog.org/glottolog/language - now lists 7,943 engaging many different code systems including ISO 639 +
[22:27:59] DanielK_WMDE: same origin can be fixed with crossdomain.xml I think?
[22:28:06] and some headers
[22:28:07] DanielK_WMDE: I think we need to say "this is how you access German content" and if www.wikidata.org doesn't conform, we should change wikidata.org, for example.
[22:28:15] SMalyshev: yes. for 300 languages. maintained by hand. fund.
[22:28:18] fun even
[22:28:36] I don't know of a good reason that Wikidata is special and doesn't use de.wikidata.org, if that's what the other 11(?) projects use.
[22:28:52] doesn't have to be by hand... probably can be scripted. But just an idea, I don't insist
[22:29:35] Marybelle: because the prefix indicates that the projects have different content. we are discussing serving different representations of the *same* content
[22:29:36] Marybelle: none of the other multilingual projects (meta, commons, mw.o) use language subdomains
[22:29:45] anyway, my concrete proposal is to use the root part of the path, acknowledging the issue with /api/ and /w/ and apache configs requiring a BCP47 regexp. I think the precedent of zhwiki and others have established that this is workable, although they have only had to worry about a single set of language variants, not "all possible BCP47 tags".
[22:29:57] legoktm: Maybe they should?
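The proposal at [22:29:45] depends on telling a language-code path root apart from reserved roots such as /api/ and /w/. The following Python sketch shows one way such a check could look. The regular expression is a deliberately simplified approximation of a BCP 47 tag (2 or 3 letters plus optional hyphenated subtags), not a full validator, and the names `RESERVED_ROOTS` and `split_language_root` are hypothetical.

```python
import re

# "api" is both a reserved URL root and a valid three-letter language code,
# so it has to be excluded explicitly. Roots like /w/, /wiki/ and /static/
# do not fit the 2-3 letter shape and never match in the first place.
RESERVED_ROOTS = {"api"}

# Simplified approximation of a language tag as the first path segment.
LANG_ROOT = re.compile(
    r"^/(?P<code>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)(?:/(?P<rest>.*))?$"
)


def split_language_root(path):
    """Return (language code or None, remaining path)."""
    match = LANG_ROOT.match(path)
    if match is None or match.group("code").lower() in RESERVED_ROOTS:
        return None, path
    return match.group("code").lower(), "/" + (match.group("rest") or "")


for example in ("/zh-cn/12", "/be-x-old/Foo", "/api/rest_v1/page", "/w/index.php"):
    print(example, split_language_root(example))
```

The explicit reserved set is the design choice worth noting: any new top-level path that happens to look like a language tag would have to be added to it, which is the namespace-clash worry raised earlier in the discussion.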
[22:30:09] cscott: there's /static/ too, IIRC
[22:30:14] I agree :)
[22:30:15] (in what ways might it be possible to allow for planning for all 7,943+ languages plus invented ones etc at a later stage in related CC wiki projects?)
[22:30:26] gwicke: sure. but that doesn't conflict.
[22:30:32] DanielK_WMDE: Different content in what sense? de.wikipedia.org/wiki/Barack_Obama and en.wikipedia.org/wiki/Barack_Obama are essentially the same content.
[22:30:54] Scott_WUaS: (?!api)\w\w\w?(-\w+)?
[22:31:08] They're both encyclopedic biographies of Barack Obama, one's in German and one's in English.
[22:31:29] legoktm: Or go the other way and put everything under www.wikipedia.org/de/ or wikipedia.org/en/ or something.
[22:31:30] Marybelle: "the Spanish translation of page X" is different from "the Spanish community article about X", which is again different from "the Spanish rendering of X". Mixing them together is not going to help.
[22:31:45] I'm all for a clear URL scheme though
[22:31:53] Marybelle: I'm with DanielK_WMDE here, it's different content on wikipedia, but same content under different rendering on wikidata
[22:31:56] DanielK_WMDE: for the projects you are interested in, is that actually an issue?
[22:32:00] We already have several schemes.
[22:32:02] My proposal is still preserving the difference between zh.wikipedia.org/yue and yue.wikipedia.org/yue
[22:32:04] so yeah, domain is slightly misleading
[22:32:26] Marybelle: they are the same kind of content. they are not the same content, they are not even translations. I mean the *exact* same content. there is only one place to edit, only one edit history.
[22:32:40] that is, the first is "chinese wikipedia, presented in cantonese" and the second is "cantonese wikipedia, presented in cantonese"
[22:33:10] wikidata information is exposed under many domains
[22:33:12] although i personally would love to see en.wikipedia.org/es/.. *eventually* be a machine translation of english wikipedia into spanish
[22:33:23] Marybelle: I'd like to combine variant and uselang. one scheme less.
[22:33:25] which is, again, different than the content of spanish wikipedia, or es.wikipedia.org/en
[22:33:30] gwicke: is what actually an issue?
[22:33:33] but what about Daniel's idea for title extension, [[Foo@@de]] or whatever
[22:33:43] DanielK_WMDE: a mix of different content vs. presentation
[22:34:01] I guess the title doesn't have to be in the path, with interwiki titles it is not
[22:34:03] my understanding is that the projects you are thinking of are only varying the presentation, not the content
[22:34:05] TimStarling: I think linking is already terribly confusing and I'm hesitant to see it made more complicated.
[22:34:18] #info My proposal is still preserving the difference between zh.wikipedia.org/yue and yue.wikipedia.org/yue
[22:34:21] We already have [[w:en:foo]] and [[en:foo]], some of which change based on context.
[22:34:46] Marybelle: also [[:en:foo]] and [[foo/en]].
[22:35:00] Yep.
[22:35:35] Marybelle: my idea is to reduce the confusion by conflating uselang with variant, and (for multilingual content) content language (or variant) with interface language.
[22:35:46] i hope that will make things less confusing.
[22:35:58] cscott, DanielK_WMDE: would there actually be a difference between de.wikidata.org and wikidata.org/de/ ?
[22:36:20] gwicke: no, wikidata and commons both use the translate extension to manage page translation on policy and help pages.
[22:37:04] wikidata.org/de/ doesn't solve the problem of namespaces though... unless we do some clever redirects further
[22:37:20] or link all namespaces to wikidata.org/wiki/
[22:37:20] gwicke: no, but between different transliterations of wikidata.org/wiki/Help:Foo/zh
[22:37:21] yeah, i'd hope they'd be the same. in theory, data.wikimedia.org would be a better parallel to commons.wikimedia.org
[22:37:37] so, about namespaces.
[22:37:40] Commons wasn't really supposed to live on wikimedia.org forever.
[22:38:10] Marybelle: be that as it may, my point was that "commons" appears in the slot designated for "language" to indicate that it is a pan-language project.
[22:38:18] en.commons.wikimedia.org ;)
[22:38:29] How about /wiki/File@en:Foo.jpg? /wiki/File:Foo.jpg/en would also be nice, but the suffix syntax is used by translations. and if we need both the translation language and the variant, it doesn't work,
[22:38:35] cscott: Sure. And www appears in the designated slot for Wikidata.
[22:38:40] variants aren't that relevant for wikidata though
[22:38:56] Marybelle: bah, ban all www hostnames
[22:39:05] DanielK_WMDE: what about localized namespace names?
[22:39:18] cn.zh.wikipedia.org
[22:39:20] DanielK_WMDE: would they direct to their language or current one?
[22:39:24] DanielK_WMDE: well, they are, strictly speaking. it's just that you'd probably have explicit UI localizations into the variant languages, instead of using LanguageConverter to automatically create them.
[22:39:25] Marybelle: sadly. apparently, all the scripts expect there to be a subdomain. i'd love to get rid of the www
[22:39:30] gwicke: SSL issues.
[22:39:32] (DanielK_WMDE: other CC wiki organizations will likely build on Wikimedia's 300 language standard here and add many more languages to this)
[22:39:53] SMalyshev: localized namespace names are tricky. for one thing, they conflict and contradict.
[22:40:05] Marybelle: good point, I guess zh-cn.wikipedia.org would avoid that
[22:40:09] you may end up with a bidi title
[22:40:15] DanielK_WMDE: that's why i think it makes sense to share url space. I'm just saying "I want to see content in Traditional Chinese". The user doesn't need to be hit on the head with the different implementation mechanisms we use to make that happen.
[22:40:27] but, that wouldn't give us a clean hierarchy
[22:40:29] cscott: +1
[22:40:42] does everyone feel the problem defined in the RFC is well scoped?
[22:41:41] I don't understand this bit about language codes in namespaces
[22:42:01] @robla: please repost
[22:42:09] the RFC has a fairly wide scope, but I think most of it is uncontroversial
[22:42:18] robla: ish? T114662 and T114640 are related, but i think we're making good progress on this half of the issue.
[22:42:44] are there any objections to the central idea of making wikidata use URLs for language variants instead of cookies?
[22:42:49] TimStarling: so, the different-url-per-user-language thing would not apply to all namespaces. because most of them wouldn't use it, the content would always render to the same target language.
[22:42:52] that is, figure out how to specify "i want to see content in target language/variant X". then there's the question of how we pass that around the various bits of mediawiki and template land, but i think that's separable.
[22:43:05] TimStarling: technically it's blocked on purging
[22:43:18] (you know I argued for this from wikidata's inception)
[22:43:20] gwicke: and on making the Linker smarter
[22:43:22] unless we also want to disable caching
[22:43:23] robla: so i'm saying i think there is a boundary here that's working for the moment, and therefore the rfc is well scoped. until/unless it isn't any more. ;)
[22:43:26] use vcl_hash
[22:44:11] cscott: "...and not screw with caching" ;)
[22:44:19] Varnish 4 has the XKey mechanism which makes purging lots of variants with different urls feasible
[22:44:33] #info figure out how to specify "i want to see content in target language/variant X".
[22:45:27] TimStarling: for wikidata, would we want /wiki-de/Q1234 but /wiki/Talk:Q1234? Or would it be /wiki-de/Talk:Q1234? But what does that mean?
[22:45:32] gwicke, isn't Varnish 4 impossible to use for us due to stuff we need being made nonfree?
[22:45:33] although TimStarling you are right that we could add VCL magic to understand variants & their relation
[22:45:53] DanielK_WMDE: sure. and i think the meta question about whether and how to present "generic" URLs is still relevant. we sort of moved on from that but i don't know if we settled it.
[22:45:58] /wiki-de/Talk:Q1234 could redirect to /wiki/Talk:Q1234
[22:45:58] and rewrite things so that all variants map to the same hash
[22:46:00] DanielK_WMDE: do we mean wikitext, links in wiki or direct URL here?
[22:46:02] TimStarling: if it's per-namespace, we may not want the language code in the path
[22:46:38] #info /wiki-de/Talk:Q1234 could redirect to /wiki/Talk:Q1234
[22:46:49] the implication being that language variants would be globally enabled in apache configuration and configurable in MW
[22:46:49] ok fine. could work. the linker should do the right thing, though
[22:46:51] MaxSem: it's under consideration; I think we'd lose persistent storage as it stands right now
[22:46:55] DanielK_WMDE: I understand that Talk is assumed to be multilingual on wikidata.org? Even though project chat is not?
[22:46:59] see https://phabricator.wikimedia.org/T122880
[22:47:07] (and in fact all Talk is in English)
[22:47:12] DanielK_WMDE: I think the strawman was that all links would go to the generic URL, and logged-in users would be either redirected or URL-rewritten to their preferred language?
[22:47:22] MW core would see a language variant URL and would check local configuration to see if it is enabled for that namespace
[22:47:28] SMalyshev: no in the sense that mediawiki could magically show it in the right language. multi-lingual here just means you can write in any language.
[22:47:35] i think that last part is the controversial part, right? having logged-in users always have uncachable content?
[22:47:43] (maybe we need a distinction between multi-lingual and poly-lingual?)
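The per-namespace behaviour described at [22:45:58] and [22:47:22] (a language URL is accepted everywhere by the web server, and the wiki then checks whether that namespace actually renders per language) can be sketched as follows. This is an illustrative Python model, not MediaWiki code: the namespace lists, the `resolve` function, and the tuple it returns are all hypothetical.

```python
# Hypothetical local configuration: namespaces whose pages are rendered per
# user language. "" stands for the main namespace (e.g. Wikidata items).
MULTILINGUAL_NAMESPACES = {"", "Property", "File"}
KNOWN_NAMESPACES = {"Talk", "Help", "Property", "File", "Wikidata"}


def namespace_of(title):
    """Return the namespace prefix of a title, or "" for the main namespace."""
    prefix, sep, _ = title.partition(":")
    return prefix if sep and prefix in KNOWN_NAMESPACES else ""


def resolve(lang, title):
    """Decide whether to serve a language-specific URL or redirect.

    lang is the code taken from the URL (None for the neutral /wiki/ form).
    """
    neutral = "/wiki/" + title
    if lang is None:
        return ("serve", neutral)
    if namespace_of(title) not in MULTILINGUAL_NAMESPACES:
        # Monolingual namespace: one canonical URL, so one cache entry.
        return ("redirect", neutral)
    return ("serve", f"/wiki-{lang}/{title}")


print(resolve("de", "Q1234"))
print(resolve("de", "Talk:Q1234"))
```

Redirecting instead of serving the same page under many language paths keeps talk and policy pages from being split across the cache, which was the concern raised about non-multilingual namespaces.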
[22:47:52] for content-affecting variants, we'll need a way to address this in APIs as well
[22:47:53] DanielK_WMDE: yeah that's what I meant. One Talk page for all, not one per language
[22:48:26] #info we could add VCL magic to understand variants & their relation and rewrite things so that all variants map to the same hash
[22:48:29] (definitely need some better terms :)
[22:48:29] for apis, modeling variants as a domain would be a lot simpler currently
[22:48:40] Technically it's possible to better embrace polylingual talk pages, with machine translation into a particular set of favored languages, possibly aided by explicit manual translation (where the discussion warrants)
[22:48:45] MaxSem: won't we have that problem anyway, as I assume upstream moved from varnish 3 to 4
[22:49:00] cscott: logged in users always bypass the web cache anyway.
[22:49:04] gwicke: we can't model variants as a domain because there is overlap in zhwiki-land (maybe elsewhere as well)
[22:49:12] #info for apis, modeling variants affecting content as a domain would be a lot simpler currently
[22:49:19] gwicke: that is, there is both a cantonese wikipedia, as well as cantonese being a variant on chinese wikipedia
[22:49:19] jzerebecki, well - there were talks about ATS, for example =)
[22:49:29] #info question discussed: should talk pages associated with multilingual content be polylingual?
[22:49:39] cscott: how does that rule out domains?
[22:49:39] gwicke: it's not just variants, but whole languages
[22:50:19] gwicke: currently the domain specifies the project, not the target language. but maybe you're thinking of something else -- could you explain how you want domains to work?
[22:50:44] robla: you could also have Talk:Foo/en Talk:Foo/de etc
[22:50:57] for an api, something like zh-cn.wikipedia.org would be pretty straightforward
[22:51:10] #info one of the main concerns is to have a *sane* url scheme, and not come up with something too ad-hoc. And overall, the confusing mess of i18n for mediawiki shouldn't become even more complicated.
[22:51:13] with zh.wikipedia.org serving the un-translated content
[22:51:36] gwicke: right. that doesn't work because there are conflicting demands for yue.wikipedia.org
[22:51:46] politics...
[22:51:48] * robla is putting in the questions as an attempt at NPOV summarizing the discussion for the notes, rather than asking those questions on his own behalf
[22:51:52] yeah, doesn't work, very sad
[22:52:03] the politics of not having perfect machine translation?
[22:52:19] no, i think it's worthwhile separating *projects* from *languages*. there is not a 1-to-1 mapping.
[22:52:23] and that's probably a good thing.
[22:52:27] obviously en.wikipedia.org/zh/Foo is the Foo article translated into chinese, right?
[22:52:29] cscott: could you elaborate?
[22:52:32] no, the politics of which variant is the "main" variant....
[22:52:46] robla: great
[22:52:57] TimStarling: yes. eventually, at least.
[22:53:30] TimStarling: if I could design the url pattern from scratch, i'd go for en.wikipedia.org/wiki/Foo@zh
[22:53:42] but it's too late for that now
[22:53:46] gwicke: the "zhwiki" project treats "yue" (cantonese) as a dialect/variant of chinese. But there is also a separate "cantonese wikipedia project" at yuewiki.
[22:53:55] * robla notes we have 6 minutes left of officially scheduled time before he hits #endmeeting
[22:54:05] no, i think it's worthwhile separating *projects* from *languages*. there is not a 1-to-1 mapping.
[22:54:11] #info no, i think it's worthwhile separating *projects* from *languages*. there is not a 1-to-1 mapping.
[22:54:14] cscott: okay, so zh-yue. and yue.?
[22:54:49] gwicke: i think keeping the name of the project completely separate from the name of the target language would be wise
[22:54:53] broadly, languages don't always follow political boundaries nicely. i don't think conflating projects and languages is a good idea. at all.
[22:55:18] ok, can we wrap up now?
[22:55:20] and vice versa -- there are languages which turn out to be very similar, but happen to have different names on different sides of a political border
[22:55:30] DanielK_WMDE: there is content language vs. ui language as well
[22:55:37] i've already been asked about helping to allow some india/pakistan wiki projects to merge
[22:55:39] and for apis, it's primarily about the content
[22:55:46] please can we have action items and summaries only for a few minutes?
[22:55:55] #info TimStarling likes language codes in the path, like /zh/ or /wiki-de/
[22:56:09] cscott does too, fwiw
[22:56:23] I'm open to the title extension idea
[22:56:36] :)
[22:56:44] #info I'm open to the title extension idea
[22:56:48] but not to gwicke's domain reuse if I understand it correctly
[22:56:49] action items would be welcome
[22:56:54] i'm unsure about the next steps
[22:57:09] except for refactoring Linker. we'll need that no matter which route we go
[22:57:31] #action DanielK_WMDE to list proposed URL schemes with pros/cons
[22:57:37] DanielK_WMDE: I think given the type of meeting, this is about ensuring the problem definition is correct
[22:57:38] we could push the config changes to allow as the first part of the url, even before rolling out other support
[22:57:43] (cscott and all: for future reference, CC WUaS's starting place for coding online CC schools/universities - and we donated to WUaS last autumn - is with all 7,938 languages and 204 nation states)
[22:57:45] 'k
[22:57:53] that would allow us to find out if there are other gotchas there, like api or static or something unexpected
[22:58:14] who is going to implement the machine translation to power this?
[22:58:16] *WUaS donated WUaS to Wikidata last autumn
[22:58:23] cscott: i actually don't want the language code as the very first thing
[22:58:32] DanielK_WMDE: there's a lot of work that is not blocked, right?
[22:58:46] gwicke: we already do. the base content is not in any natural language. we are rendering structured data
[22:59:06] well, i'm saying en.wikipedia.org/en-uk/Foo could start providing content, even though it won't actually be britishized.
[22:59:08] en.wikipedia.org/zh/Foo <- how about this case?
[22:59:21] TimStarling: yes, if there is vague consensus that we want *something* like this, there's quite a lot we can already do
[22:59:39] gwicke: that will have to wait for machine translation.
[22:59:48] my point exactly is that we can roll out a consistent URL scheme even in advance of having all the machinery in place to implement the UX / language switching on the backend.
[22:59:49] we could actually offer that. it would suck, but maybe better than nothing
[23:00:10] * aude cringes at the idea of en.wikipedia.org/en-uk
[23:00:11] this one is complicated enough that a good goal for a followup RFC meeting is a field narrowing discussion
[23:00:15] cscott: so we should decide on a nice URL scheme?
[23:00:20] on wikidata, wikidata.org/en-uk/ would actually do something useful sooner rather than later
[23:00:24] but for wikidata, it would be uncontroversial and more simple to start there imho
[23:00:37] and most beneficial (and on commons)
[23:00:52] cscott: all you need is a rewrite to uselang=en-gb. that already works.
[23:00:58] #info robla suggests this one is complicated enough that a good goal for a followup RFC meeting is a field narrowing discussion
[23:01:00] on wikipedia, en.wikipedia.org/en-uk/Foo could probably localize the UX at least without too much pain.
[23:01:26] hitting endmeeting in less than 60 seconds....
[23:01:29] we also need to confirm that cache purging is feasible in varnish 3 with vcl_hash
[23:01:32] anyway, i'm trying to propose implementing the URL parsing part as a possible action item, but maybe it's not reached enough consensus yet.
[23:01:34] ok, excellent! thanks for all the feedback, everyone!
[23:01:37] cscott: but it would fuel the expectation that it become en-gb
[23:01:42] gwicke: for now, it might just produce english wiki with uselang=zh?
[23:01:52] #action confer with ops re vcl_hash
[23:01:53] hoo: that expectation does not displease me.
[23:02:01] SMalyshev: with chinese ui. that's what we already do.
[23:02:08] DanielK_WMDE: right
[23:02:09] all done I think, time's up
[23:02:12] ok...thanks everyone!
[23:02:13] but it bypasses the caches
[23:02:15] #endmeeting
[23:02:15] Meeting ended Wed Feb 10 23:02:15 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[23:02:16] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-10-22.01.html
[23:02:16] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-10-22.01.txt
[23:02:16] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-10-22.01.wiki
[23:02:16] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-10-22.01.log.html
[23:02:16] and isn't sticky
[23:02:39] * cscott has to turn into a pumpkin to go pick up his kids
[23:02:59] seems like a good topic to continue elsewhere (e.g. #wikimedia-tech)
[23:57:47] https://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme#Planned_implementation
[23:58:00] I remember that