[16:12:31] I just opened Wikipedia and a popup asked for a donation . . . which I've given in the past. In the spiel, it talked about "We believe that facts matter." Well, I've read Dr. Matthias Rath's Wikipedia article and it says nothing of Dr. Rapp's positive research and findings. It ONLY casts a very negative image. FACTS to the positive in addition to the negative would be fair. This all and only negative reminds me of "lamestream
[16:13:03] access to knowledge . . . . the positive about Dr. Rapp is knowledge . . . why would you hide that?????
[16:14:03] I switched between Rath and Rapp . . . that is an error on my part.
[21:06:01] is there supposed to be an archcom rfc meeting here now?
[21:08:09] phab:E384 says 2016-11-30, 21:00 UTC
[21:35:59] YairRand, 25 more minutes. Something got the time wrong.
[21:36:35] * quiddity blames DST.
[22:00:43] #startmeeting RFC meeting
[22:00:43] Meeting started Wed Nov 30 22:00:43 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:00:43] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:00:43] The meeting name has been set to 'rfc_meeting'
[22:01:20] #topic RFC: Per-language URLs for multilingual wiki pages
[22:01:43] hi all!
[22:01:54] #link https://phabricator.wikimedia.org/T114662
[22:02:03] #link https://phabricator.wikimedia.org/E384
[22:02:13] So, super brief synopsis:
[22:02:37] Need: we want anon visitors to browse Wikidata and other multilingual wikis in their language
[22:02:44] Problem: Serving different renderings for the same URL messes with web caches.
[22:02:54] Solution: Force uselang based on some part of the URL path, similar to how language variants are handled
[22:03:33] so the idea is to serve english from a path like /wiki-en/Q123 and french from /wiki-fr/Q123
[22:03:52] o/
[22:03:56] the language code in the path would force the interface language via uselang, which on wikidata also causes the content to be generated in that language
[22:04:08] first question:
[22:04:17] wasn't there originally an idea to use code.wikidata.org/wiki/Q### kind of url structure?
[22:04:30] yes, that was the first idea.
[22:04:38] we can also go with that if it's easy enough to do
[22:04:39] Q: would /w/... become /w-en/... etc?
[22:04:51] i think there were concerns regarding using subdomains, because of cookie domains and stuff
[22:04:53] is this a totally silly idea, or something we'd likely want to support?
[22:04:58] YairRand: no.
[22:05:07] YairRand: /w/ is the resource path, that would not be language specific
[22:05:43] "other multilingual wikis" = commons? meta? mediawiki.org?
[22:05:47] legoktm: i think subdomains look prettier, but are more tricky to get right. a rewrite based on the path is pretty simple.
[22:05:50] so, would anons editing and whatnot just not get localized pages?
[22:06:02] DanielK_WMDE: right.
[22:06:05] mhm, existing variant wikis use a slightly different schema, e.g. /wiki/ is default, /langcode-variant/ for variants
[22:06:20] We currently have a gadget for Commons and Wikidata that does this, right?
[22:06:30] TimStarling: Commons mostly. Things that use {{int}} and {{LanguageSwitch}} or whatever it's called would benefit. Things using Translate-style subpages would not yet.
[22:06:43] I'd also like to do that, for mw.o and meta, but that takes more thought and more work
[22:07:15] Isn't Commons transitioning over to E:Translate though?
[22:07:27] YairRand: oh, you mean when using an "ugly" path for editing, /w/index.php?title=...action=edit?
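The core of the proposal, mapping a language-prefixed article path to a forced `uselang`, can be sketched as below. This is a Python illustration, not MediaWiki code; it assumes the `/wiki-<lang>/<title>` layout from the example above and a hypothetical, hard-coded set of supported language codes. Plain `/wiki/` paths keep their current behaviour, and `/w/` resource paths are left alone because they are not language specific.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical whitelist; a real deployment would use MediaWiki's own language list.
SUPPORTED_LANGS = {"en", "fr", "de", "zh-hans"}

# Assumed layout: /wiki-<lang>/<title>, e.g. /wiki-fr/Q123
LANG_PATH = re.compile(r"^/wiki-([a-z]+(?:-[a-z]+)*)/(.+)$")


def route(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (title, extra request params) for an article path, else None."""
    m = LANG_PATH.match(path)
    if m and m.group(1) in SUPPORTED_LANGS:
        # The language code in the path forces the interface language.
        return m.group(2), {"uselang": m.group(1)}
    if path.startswith("/wiki/"):
        # Default path: unchanged behaviour, no forced language.
        return path[len("/wiki/"):], {}
    return None  # not an article path (e.g. /w/index.php); leave it alone
```

Unknown codes such as `/wiki-xx/...` fall through to `None` rather than being treated as a language, which keeps the rewrite conservative.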
[22:07:30] https://commons.wikimedia.org/wiki/MediaWiki:AnonymousI18N.js
[22:07:40] DanielK_WMDE: Yes
[22:07:43] DanielK_WMDE, reading through the two rfcs (lang variants and uselang one), one concern was whether there would be a default url for a default rendering vs. forcing all reads to have a lang component always
[22:07:48] YairRand: good question. I hadn't thought of that. maybe we *do* need /w-en/ etc for that case.
[22:08:02] i think niklas raised that.
[22:08:10] #info YairRand notes that we may need /w-en/ etc too, for action=edit at least
[22:09:00] MaxSem: Variants modify content language. I'm talking about user language. I'd love to combine them somehow, but that needs more thought
[22:09:16] subbu: ah, thanks for mentioning Nikerabbit's RFC
[22:09:19] #link https://phabricator.wikimedia.org/T149419
[22:09:40] there is a competing RFC --^ that proposes to allow anons to select language, but not encode it in the url
[22:09:54] the proposal is to split the cache based on a language cookie instead
[22:10:05] DanielK_WMDE, it affects UI language too
[22:10:09] iirc that's not easy, but possible, with varnish
[22:10:48] MaxSem: The interaction is... confusing. I'd love to sit with you and cscott and discuss how to integrate those things.
[22:11:14] MaxSem: when we first discussed this rfc, i wanted to solve that problem at the same time. I have now decided to narrow the scope, and focus on user language as needed by wikidata
[22:11:46] for simplicity we can probably assume that commons is a different problem also
[22:12:00] subbu: re default path for a default language: i think that when we first try this, the default path should stay as it is now. eventually, the default path can trigger a redirect to the appropriate language path
[22:12:11] wfm.
[22:12:17] TimStarling: yes. for now, let's just focus on wikidata
[22:12:44] TimStarling: so, for wikidata, i want the language in the url.
reasons:
[22:12:54] bookmarks, links, and separate indexing by google
[22:13:07] can't be done with cookies
[22:13:33] #info subbu: re default path for a default language: i think that when we first try this, the default path should stay as it is now. eventually, the default path can trigger a redirect to the appropriate language path wfm.
[22:14:13] #info we can use subdomains instead of paths, but it's probably harder to get right
[22:14:42] #info competing RFC T149419 proposes to split the cache on a cookie, instead of the url.
[22:14:43] T149419: Interface language selection for unregistered users on Wikimedia projects - https://phabricator.wikimedia.org/T149419
[22:14:52] ok. second question:
[22:15:15] if we put separate renderings (language versions) of a page into the cache, how do we purge them when the underlying data changes?
[22:15:43] I think there's a hook? /me looks
[22:15:47] varnish 4.1 has xkey. looks to me like it should do the trick.
[22:15:48] DanielK_WMDE, wait .. subdomains? weren't subdomains frowned upon when we discussed the lang. variants proposal? Or, is this only a solution for wikidata?
[22:15:52] how does XKey work/not work exactly?
[22:16:18] TimStarling: when serving the page, you set xkey headers. the cache entry gets "tagged" with the xkey.
[22:16:45] you then need a custom rule for purging those. e.g. by sending PURGE with a GET param xkey=Foo
[22:17:10] sounds pretty simple
[22:17:11] your rule would then call the xkey purge function (whatever it is called internally)
[22:17:17] yes. let me find a link...
[22:17:29] #link https://github.com/varnish/varnish-modules/blob/master/docs/vmod_xkey.rst
[22:18:01] TimStarling: and you can send as many xkey headers as you like. one for each template used, maybe
[22:18:38] xkey is apparently already installed on the wmf cluster. but i'm not sure there's a rule for purging
[22:18:52] TimStarling: what would be the steps for getting this deployed?
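To make the xkey explanation above concrete, here is a toy in-memory model of tag-based invalidation. It is not Varnish or VCL code, and the class and method names are hypothetical: each stored object carries the keys the backend would announce in its `xkey` response header, and purging one key drops every object tagged with it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TaggedCache:
    """Toy model of xkey-style invalidation (not the real vmod)."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}                    # url -> cached body
        self.by_key: Dict[str, Set[str]] = defaultdict(set)  # xkey -> tagged urls

    def store(self, url: str, body: str, xkeys: Iterable[str]) -> None:
        # The backend would send these keys as xkey response headers.
        self.objects[url] = body
        for key in xkeys:
            self.by_key[key].add(url)

    def purge_xkey(self, key: str) -> int:
        """Drop every object tagged with `key`; return how many were removed."""
        removed = 0
        for url in self.by_key.pop(key, set()):
            if self.objects.pop(url, None) is not None:
                removed += 1
        return removed
```

The caller never lists the tagged URLs; the cache keeps that index itself, which is the property the discussion relies on for purging all language renderings of an item at once.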
[22:19:01] how would we best test the functionality of xkey?
[22:19:22] well, you can presumably test locally or on labs or whatever
[22:19:28] #info varnish xkey could be used to purge all language versions (renderings) of a page in one go
[22:19:45] TimStarling: sure. I was thinking of trying it with a small-scale feature on the live system
[22:20:04] before making a big project depend on it
[22:20:10] i'll think of something...
[22:20:21] we already purge half a dozen URLs when article text changes
[22:20:36] that's not a small project though :)
[22:20:39] not at the same time, though
[22:20:43] heh
[22:20:56] most are triggered by async updates, and can't easily use xkey
[22:21:03] but we could have an option to use xkey for this. and enable it on a small wiki
[22:21:12] yes
[22:21:13] gwicke: why can't they?
[22:21:46] #info perhaps we could have an experimental option to use xkey for regular page purges, and enable it on a small wiki
[22:22:24] actually... wikidata also needs to purge JSON and RDF renderings of an item on top of the HTML. We could also try using xkey for that
[22:22:24] DanielK_WMDE: there are several open tasks like T122867 about using xkey once bblack says that Varnish 4 is ready to start using that
[22:22:25] T122867: Evaluate the feasibility of cache invalidation for the action API - https://phabricator.wikimedia.org/T122867
[22:23:11] not the least of which is thumbnails
[22:23:22] bd808: i thought xkey was ready to go? https://phabricator.wikimedia.org/T122881
[22:23:23] according to https://phabricator.wikimedia.org/T131499#2821261 we're all on varnish 4 now
[22:23:44] we need 4.1 for xkey
[22:23:52] 4.0 has something similar, but not the same
[22:24:00] it's called... hashtwo i think
[22:24:07] they are async
[22:24:13] mostly we need bblack to say "start using this feature"
[22:24:44] gwicke: as long as it's known what resource a rendering is based on when it is served, that is all that is needed.
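The difference between today's per-URL purges (the "half a dozen URLs", plus JSON and RDF renderings for Wikidata) and a single xkey purge can be sketched from MediaWiki's side as the client. Assumptions: the cache host name is made up, and the `xkey=` query parameter follows the convention suggested in the discussion, since the xkey vmod itself does not define an HTTP purge interface. Requests are only constructed, never sent.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical cache endpoint; nothing here opens a connection.
CACHE_HOST = "http://cache.example.internal"


def purge_by_url(paths: List[str]) -> List[Request]:
    """Per-URL style: the caller must enumerate every rendering to drop."""
    return [Request(CACHE_HOST + path, method="PURGE") for path in paths]


def purge_by_xkey(xkey: str) -> Request:
    """xkey style: one request; the cache finds every object tagged with xkey.

    Assumes a custom VCL rule that reads the key from an `xkey` query parameter.
    """
    return Request(CACHE_HOST + "/?" + urlencode({"xkey": xkey}), method="PURGE")
```

With per-URL purging the list grows with every cached language and data format; the xkey request stays the same size, and the caller no longer needs to know which renderings exist.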
[22:24:45] the mod being loaded isn't the same as a blessing that the feature is stable for use at scale
[22:24:51] action item to corner bblack and make him say that?
[22:24:57] gwicke: asynchronous processes for purging are no longer needed
[22:25:07] TimStarling: sounds like a good idea
[22:25:07] https://wikitech.wikimedia.org/wiki/XKey has some docs
[22:25:15] bd808: true
[22:25:23] legoktm: oh, thanks!
[22:25:37] #link https://wikitech.wikimedia.org/wiki/XKey
[22:25:38] "We currently do not support setting the same XKey on very large numbers of objects. In practice something on the order of 1-100 objects attached to a given XKey is reasonable."
[22:25:52] so templates are out
[22:26:00] for now
[22:26:03] and lang variants are borderline I would assume
[22:26:10] but it would be ok for the language use case.
[22:26:14] it sounds like you could go ahead with the software development assuming XKey will be used
[22:26:24] in practice, we are rarely going to have more than a few languages in the cache.
[22:26:31] for some languages it could be 200 though
[22:26:40] #info "We currently do not support setting the same XKey on very large numbers of objects. In practice something on the order of 1-100 objects attached to a given XKey is reasonable."
[22:26:49] there is a need for orchestration
[22:27:02] #info action item to corner bblack and make him say that?
[22:27:13] gwicke: for the use case of purging language variants?
[22:27:51] purge after updating X
[22:28:11] ok... so how about the cache fragmentation as such.
[22:28:13] most content updates work this way (ex: propagating template edits)
[22:28:20] if xkey needs testing then we need to write the clients for it
[22:28:34] this proposal basically means putting 20x or maybe 200x as much data into the cache for wikidata.
[22:28:37] so, may as well do this on our side
[22:29:24] what do you mean by "client" in this context?
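The concern about putting 20x or 200x as much data into the cache can be illustrated with a toy LRU model. Eviction bounds memory, but every extra language rendering of a popular item competes for the same slots. The capacities and the perfectly cyclic request pattern are made up and deliberately pessimistic; real traffic is skewed and would degrade more gently.

```python
from collections import OrderedDict
from typing import List


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        if url in self.data:
            self.data.move_to_end(url)  # mark as most recently used
            self.hits += 1
            return self.data[url]
        self.misses += 1
        body = f"rendering of {url}"  # stand-in for a backend render
        self.data[url] = body
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return body


def hit_rate(requests: List[str], capacity: int) -> float:
    cache = LRUCache(capacity)
    for url in requests:
        cache.get(url)
    return cache.hits / len(requests)
```

In this toy, four items requested repeatedly fit in four slots, but the same four items in two languages need eight; with only four slots the cyclic pattern misses every time. That memory-versus-hit-rate trade-off is what ops would need to weigh.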
[22:29:42] yeah I am being loose with terminology
[22:29:53] for purging an xkey, MW is the client
[22:30:02] right, ok
[22:30:22] one thing that bothers me is that xkey doesn't provide a standard http interface for purging. you get to build your own. there is no standard.
[22:30:32] the matrix <-> irc bridge is slow, so using webchat for now
[22:30:36] but i guess we can just require it to accept xkey=foo as a url param
[22:30:48] wb gwicke_freenode
[22:31:00] DanielK_WMDE: I don't think async updates are going anywhere
[22:31:13] we aren't going to re-render millions of pages synchronously
[22:31:22] gwicke_freenode: my point is i don't think we need them for language variants
[22:31:40] this isn't about re-rendering, and the template thing was a silly tangent
[22:31:49] HTML cache invalidation via the job queue eventually ends up calling the same functions as synchronous updates
[22:32:05] DanielK_WMDE: wait, what would replace async updates?
[22:32:18] heh, matrix has finally caught up
[22:32:57] DanielK_WMDE: all of our Varnish purges are handled by custom code.
[22:33:04] so not a scary thing
[22:33:07] gwicke_freenode: so. Q123 changes. I want to purge all 27 renderings of that from varnish. with xkey, i can do that with one HTTP request instead of 27. And I don't need to know *which* 27.
[22:33:12] that's it.
[22:33:20] right
[22:33:22] no change to sync/async for now.
[22:33:39] yeah, I was only objecting to your earlier "async is no longer needed" statement
[22:34:12] the patch would be to CdnCacheUpdate
[22:34:16] gwicke_freenode: that was for the "brave new world" where we put an xkey for every template used... then the work would be left to varnish.
but that's a silly tangent, and not going to happen any time soon
[22:34:20] ok, back to reality
[22:34:32] #info the patch would be to CdnCacheUpdate
[22:34:37] even then it wouldn't work
[22:34:43] anyway, back to the topic at hand
[22:34:53] would it be ok to fragment the web cache for wikidata by x20 [rough guess]
[22:35:12] that's a question for bblack, i suppose?
[22:35:22] yes, we need ops to comment on that
[22:35:30] do you have data on the percentage of anons who would actually select a non-standard language?
[22:35:31] my intuition is "yes"
[22:35:42] #info ask ops about impact of cache fragmentation (x20?)
[22:35:43] or a rough guess?
[22:35:47] gwicke_freenode: no.
[22:35:51] no data.
[22:36:00] better to fragment the cache than to not have a cache, right?
[22:36:02] rough guess... depends on how obvious we are going to make this
[22:36:10] and we have LRU eviction to control cache size
[22:36:18] your goal is to get the localized versions indexed, right?
[22:36:19] my inclination is that there are relatively few "readers" on wikidata
[22:36:30] does the client set the xkeys or is it inside varnish config?
[22:36:46] gwicke_freenode: indexed, and also available to people we want to engage
[22:36:52] SMalyshev: it is a response header from the backend
[22:36:57] so MW controls it
[22:37:12] aha. So we'll need the code to create proper xkeys
[22:37:36] SMalyshev: the page title name will do for this use case. with some prefix, to avoid clashes with other use cases
[22:37:40] yes, and as I was saying before, I think we can go ahead with that code without ops approval
[22:37:58] we can use it for testing, and then get ops approval for final deployment
[22:38:20] #info think we can go ahead with [xkey support in CdnCacheUpdate] without ops approval
[22:38:27] DanielK_WMDE: hmmm not sure.
Special:EntityData/Q123 is not the same page title as Q123, but probably needs a common xkey
[22:38:52] I think a URL-like namespace would make sense
[22:38:53] SMalyshev: the xkey for both would be Q123
[22:38:55] that's the point
[22:39:24] DanielK_WMDE: but that's not the page title, that's only part of it.
[22:39:28] TimStarling: like wikibase:entity:Q123?
[22:39:48] I mean www.wikidata.org/wiki/Q123
[22:39:48] SMalyshev: it's the page title of the actual data. for wikibase, call it the entity id.
[22:39:59] basically use the language-neutral URL as the xkey
[22:40:19] TimStarling: oh, you want the full URL as the xkey? right - the domain needs to be in there i guess. didn't think of that.
[22:40:23] I don't know if there are implementation reasons to make the keys be really compact
[22:40:37] #info basically use the language-neutral URL as the xkey
[22:41:04] instead of the URL, we might go with the URI, but yea
[22:41:07] cool
[22:41:43] so... i have two more bikeshedding questions:
[22:42:00] what should the path look like? and how do we make wiki-links to a specific language version?
[22:42:23] I still think that paths are problematic
[22:42:29] how often would wikilinks to a specific language actually be necessary?
[22:42:43] YairRand: rare, but useful for discussions and bug reports
[22:42:47] gwicke_freenode: why?
[22:42:53] gwicke_freenode: what would you propose instead?
[22:43:03] parameters? cookies? subdomains? parrots?
[22:43:17] it seems that this mixes content language and UI language
[22:43:21] [{{fullurl:title|lang=xx}} link text]
[22:43:40] gwicke_freenode: well, on wikidata, at least in entity namespaces, they are the same.
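The agreement to use the language-neutral URL as the xkey, shared by the HTML renderings and `Special:EntityData`, could be implemented along these lines. This is a sketch under assumptions: the `/wiki-<lang>/` path scheme is the one proposed earlier in the meeting, `Special:EntityData/<id>.<format>` is taken as the shape of data URLs, and the normalization rules are illustrative only. The domain is kept in the key, as noted above.

```python
import re
from urllib.parse import urlsplit

# Assumed URL shapes: /wiki-<lang>/<title> for localized pages and
# /wiki/Special:EntityData/<id>.<format> for data renderings.
LANG_ARTICLE = re.compile(r"^/wiki-[a-z]+(?:-[a-z]+)*/(?P<title>.+)$")
ENTITY_DATA = re.compile(r"^/wiki/Special:EntityData/(?P<id>[A-Z]\d+)(?:\.\w+)?$")


def xkey_for(url: str) -> str:
    """Map any rendering of a page to its language-neutral URL, used as the xkey."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}/wiki/"
    m = ENTITY_DATA.match(parts.path)  # checked first: it also starts with /wiki/
    if m:
        return base + m.group("id")
    m = LANG_ARTICLE.match(parts.path)
    if m:
        return base + m.group("title")
    if parts.path.startswith("/wiki/"):
        return base + parts.path[len("/wiki/"):]
    return url  # not a page URL; leave it untouched
```

Every rendering of Q123 (any language, JSON, RDF) maps to the same key, so one purge covers them all.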
[22:43:44] we already have a mechanism that (to users) appears to be a mechanism to select the content language: the domain
[22:44:13] normally the domain selects the wiki, not the language
[22:44:39] different domain, different community, different rules, different content
[22:44:41] to users, it's the German wiki page about foo vs. the English one
[22:44:58] DanielK_WMDE, to repeat my earlier comment ... weren't subdomains frowned upon when we discussed the lang. variants proposal? Or, is this only a solution for wikidata?
[22:45:01] #info gwicke would prefer subdomains instead of paths to select the language
[22:45:08] I agree that editors might have a different perception, I'm mainly talking about readers
[22:45:34] subdomains were the original proposal for wikidata back in wikidata prehistory
[22:45:50] gwicke_freenode: i'm open to using subdomains if it doesn't cause too much trouble with cookies and SOP for user scripts and stuff. it should be possible technically. then it's a product decision.
[22:46:00] internally, it should set the uselang request param.
[22:46:24] I think I agreed to it at the time, but I don't think it really helps the user understand what is going on
[22:46:32] how would this affect API responses, for example?
[22:46:42] subdomains might cause confusion, especially when the "X language wiki" seems to have loads of content and discussions in language Y and what is it doing here, etc...
[22:46:59] YairRand: otoh, subdomains would solve the /w/index.php issue nicely
[22:47:01] there can be more multilang wikis than wikidata...
[22:47:16] if the API responses are expected to be localized as well, then imho the domain really makes more sense
[22:47:19] can use the actual uselang parameter for /w/index.php
[22:47:28] hm. fr.commons.wikimedia.org?
[22:47:42] TimStarling: yes, we could.
[22:48:00] #info can use the actual uselang parameter for /w/index.php
[22:48:06] that would be messy
[22:48:09] or we could use /fr/wiki/... and /fr/w/...
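The candidate URL layouts raised in this part of the discussion can be compared side by side. The scheme labels and the helper functions are hypothetical names for illustration; the edit URL follows the suggestion to keep using the ordinary `uselang` parameter for `/w/index.php`.

```python
from urllib.parse import urlencode


def page_url(host: str, title: str, lang: str, scheme: str) -> str:
    """Build the URL of `title` rendered in `lang` under one candidate layout."""
    if scheme == "wiki-lang":    # e.g. /wiki-fr/Q123
        return f"https://{host}/wiki-{lang}/{title}"
    if scheme == "lang-prefix":  # e.g. /fr/wiki/Q123
        return f"https://{host}/{lang}/wiki/{title}"
    if scheme == "subdomain":    # e.g. fr.commons.wikimedia.org/wiki/...
        return f"https://{lang}.{host}/wiki/{title}"
    raise ValueError(f"unknown scheme: {scheme}")


def edit_url(host: str, title: str, lang: str) -> str:
    """Script paths keep the ordinary uselang request parameter."""
    query = urlencode({"title": title, "action": "edit", "uselang": lang})
    return f"https://{host}/w/index.php?{query}"
```

Only the subdomain layout also localizes `/w/` and API paths without extra rules, which is the trade-off debated above against the cookie and same-origin concerns.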
[22:48:15] you'd have to encode some custom rules for wikidata
[22:48:17] are subdomains piling up? fr.m.meta.wikimedia.org...
[22:48:21] or /lang/fr/wiki/...
[22:48:33] YairRand: heh, indeed!
[22:48:51] /api/lang/fr/rest_v1/?
[22:49:07] ok for an api
[22:49:13] i don't really want to see that in my browser
[22:49:17] or type it in
[22:49:59] ok, i guess we'll have to discuss paths vs subdomains another time. it doesn't really touch the other issues
[22:50:05] actually...
[22:50:23] one thing this needs to make the language "stick" is to mess with $wgArticlePath
[22:50:40] * cscott pokes his head in
[22:50:45] that will need some fun refactoring, to make it work with both proper "DI" style code and stuff that relies on global state
[22:50:52] ...and with IContext...
[22:51:02] legoktm has been refactoring link generation already
[22:51:14] yes, I love it!
[22:51:25] that will help a lot
[22:51:41] probably better to work on top of that than to hack $wgArticlePath
[22:51:45] DanielK_WMDE, MaxSem: let's definitely sit down at dev summit in a month and a half to discuss ui language & other language stuff.
[22:51:46] anyway, I think it will be simple enough to make this work for 95% of links.
[22:52:05] cscott: yes, let's!
[22:52:05] one other note about implementation details
[22:52:24] you said in the task that the language-neutral URL should be rewritten to a special page
[22:52:47] yes, eventually. probably not initially
[22:53:03] I don't think we should have intelligence in the web server, I think the web server should just send everything to index.php
[22:53:11] then use PathRouter, which is hookable
[22:53:32] here's how variants work:
[22:53:40] if ( $wgVariantArticlePath ) {
[22:53:40]     $router->add( $wgVariantArticlePath,
[22:53:40]         [ 'variant' => '$2' ],
[22:53:40]         [ '$2' => $wgContLang->getVariants() ]
[22:53:40]     );
[22:53:41] }
[22:53:46] simple, right?
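The quoted PathRouter call registers a path pattern whose `$2` placeholder may only take values from a given list, and maps it to a request parameter. The same idea is reimplemented below in Python purely for illustration; it is not MediaWiki's PathRouter, and the `/wiki-$2/$1` pattern with a `uselang` parameter is a hypothetical registration for this RFC.

```python
import re
from typing import Dict, List, Optional


class PatternRouter:
    """Toy path router in the spirit of the PathRouter call quoted above.

    '$1' captures the page title; any other '$N' placeholder may only match
    one of the values listed for it in `options`.
    """

    def __init__(self) -> None:
        self.routes: list = []  # (compiled regex, params template) pairs

    def add(self, pattern: str, params: Dict[str, str],
            options: Dict[str, List[str]]) -> None:
        regex = re.escape(pattern)
        for name, allowed in options.items():
            choices = "|".join(re.escape(value) for value in allowed)
            regex = regex.replace(re.escape(name), f"(?P<p{name[1:]}>{choices})")
        regex = regex.replace(re.escape("$1"), "(?P<p1>.+)")
        self.routes.append((re.compile(f"^{regex}$"), params))

    def match(self, path: str) -> Optional[Dict[str, str]]:
        for regex, params in self.routes:
            m = regex.match(path)
            if m is None:
                continue
            groups = m.groupdict()
            result = {"title": groups["p1"]}
            for key, template in params.items():
                # Replace '$N' references with the value captured for that placeholder.
                result[key] = re.sub(r"\$(\d)", lambda g: groups["p" + g.group(1)], template)
            return result
        return None
```

Restricting the placeholder to an explicit list, as the variants code does with `getVariants()`, keeps unrelated paths such as `/wiki-foo/Bar` from being treated as language paths.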
[22:53:56] easily extended to do what you want to do
[22:54:49] not quite sure i get what that does, but it tells me there is a nice hookable place to deal with paths
[22:54:58] yeah, the hook is WebRequestPathInfoRouter
[22:54:59] i'm fine with doing it in php
[22:55:14] more control to me, less reason to bother ops ;)
[22:55:24] #info Gabriel would like to see some details on how language selection would affect API responses (both PHP and REST)
[22:55:44] #info [use PathRouter, which is hookable; the hook is WebRequestPathInfoRouter]
[22:56:16] gwicke_freenode: it just sets uselang. how does uselang affect RESTbase responses?
[22:56:25] it does not at all
[22:56:29] Thanks for the pointer, Tim!
[22:56:42] well then, that's how it will be.
[22:57:03] gwicke_freenode: does restbase support different page renderings based on user language?
[22:57:06] that wouldn't work if your frontend widget wants localized content
[22:57:50] well, wikibase widgets don't use restbase
[22:58:01] we currently support one content language per domain
[22:58:10] and the only other pages affected would be pages that work like image description pages
[22:58:36] so restbase doesn't work on multilingual wikis. that's not going to change with this proposal.
[22:58:58] my point is that we should have a coherent plan
[22:59:06] rather than treating it as an afterthought
[22:59:26] i agree, but i'm not quite sure i understand your question
[22:59:29] any other things for the notes before the meeting ends?
[22:59:37] to make restbase work with multilingual wikis, it needs to support multilingual wikis
[22:59:44] that has nothing to do with per-language urls
[22:59:52] but i'm happy to discuss the topic some other time
[23:00:14] well, for RB it would be quite natural to model this as different domains
[23:00:28] hence my preference for using those
[23:00:30] gwicke_freenode: do you want RB to support commons?
[23:00:38] gwicke made a note that he would like to see more info about API schemes
[23:00:39] it already does
[23:00:52] ok, thank let's talk about how that works
[23:00:54] so fine, let's move on with other business considering there is -1 minute remaining
[23:01:21] thanks for chairing, TimStarling!
[23:01:32] and thank you everyone for the fruitful discussion!
[23:01:40] I new feel we can move forward with this
[23:01:43] *now
[23:01:55] did we have a topic for next week in the end?
[23:02:03] Deprecation policy
[23:02:18] https://phabricator.wikimedia.org/T146965
[23:02:23] up next week --^
[23:02:23] i guess we didn't get to discussing https://phabricator.wikimedia.org/T122942 today.
[23:02:36] ok, be there or be square
[23:02:41] #endmeeting
[23:02:42] Meeting ended Wed Nov 30 23:02:41 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[23:02:42] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-11-30-22.00.html
[23:02:42] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-11-30-22.00.txt
[23:02:42] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-11-30-22.00.wiki
[23:02:42] or is gwicke_freenode going to figure that out in the services team?
[23:02:42] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-11-30-22.00.log.html
[23:02:51] subbu: that wasn't scheduled... or did i put the wrong link somewhere?
[23:03:12] i assumed that both that and your rfcs were related and would be discussed.
[23:03:15] but clearly i was mistaken. :)
[23:03:37] subbu: gwicke_freenode said he would prefer to discuss variants in restbase later
[23:03:44] ok
[23:03:52] wfm.
[23:05:10] gwicke: for the record, i agree we should have a cpoherent plan. it seems like per-lang urls are othogonal to restbase, but I'll let me be convinced otherwise. let's talk about it at the suimmit, at the latest.
[23:05:38] * DanielK_WMDE can't type any more
[23:05:55] it depends on what the effect on the API responses should be
[23:06:04] hence my request to specify that