[21:00:02] #startmeeting RFC meeting
[21:00:02] Meeting started Wed May 18 21:00:02 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:00:02] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:00:02] The meeting name has been set to 'rfc_meeting'
[21:00:14] #topic RFC: Requirements for change propagation | Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:00:44] hi
[21:00:58] hello
[21:01:03] #link https://phabricator.wikimedia.org/T102476
[21:02:11] DanielK_WMDE & I thought that it would be useful to catch up on what has been going on in eventbus & change propagation land, and talk about needs and open questions
[21:02:52] Great, Gabriel
[21:03:39] while most of the RFC is older & aimed at providing high-level background to the general issue, the section "Current status" is new & has information on current work, as well as some notes on next steps & open questions
[21:05:02] the biggest of those open questions is probably cross-project dependency tracking & change propagation
[21:05:50] gwicke: what are your thoughts about my last comment? https://phabricator.wikimedia.org/T102476#2296335 I think it'd be easier to keep track of the prose for long RFCs (like this one) using our software.
[21:06:20] so, for the sake of stating the obvious: the general idea is to track dependencies between "things" (identified by URIs or some such) as a DAG, so we know what to purge or re-generate when. at least, that seems to be the core of it.
[21:06:21] I know DanielK_WMDE is involved in that area, so I hope that he can tell us a bit more about the situation & needs for wikidata
[21:06:52] gwicke: yea. so, one important issue is that the artifact we track may be created on the fly, during a page view (GET request)
[21:07:19] if we track that in an SQL db, that means a master write during a GET request, possibly a cross-DC operation.
[21:07:25] robla: I'm not opposed to moving the general portion especially to mw.org, let's just make sure we don't end up with multiple copies
[21:07:25] aaron and ori don't like that...
[21:07:42] DanielK_WMDE, there are workarounds for it, as long as it can be queued.
[21:07:55] DanielK_WMDE, what is an example of triggering on a GET?
[21:07:55] DanielK_WMDE: and the artefact dies with the req or?
[21:08:09] cross-dc master writes on GET certainly aren't optimal
[21:08:25] matt_flaschen: yes, we currently use the job queue for that - but the JQ doesn't seem to cope well with the load.
[21:08:40] DanielK_WMDE: is this about queries?
[21:08:49] mobrovac: no, the artifact should be persisted. think parser cache.
[21:09:10] DanielK_WMDE: "aaron and ori don't like that...", I'm assuming you mean AaronSchulz, right?
[21:09:30] gwicke: no, just tracking which rendering of a page (think parser cache key) uses which bits of which wikidata entity
[21:09:36] robla: indeed
[21:10:12] DanielK_WMDE: the job queue doesn't cope well with cycles
[21:10:18] gwicke: in case of the parser cache, we currently always purge all renderings if any of them needs purging. we haven't found a good mechanism to avoid that
[21:10:18] DanielK_WMDE, what kind of stuff might get generated during GET reqs? that sounds non-ideal.
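A minimal sketch of the dependency DAG gwicke describes at 21:06:20: derived artifacts and the resources they use are identified by URIs, each artifact records what it depends on, and a reverse index answers "what needs purging or re-generation when X changes". All names and the URI scheme here are invented for illustration; this is not actual MediaWiki or changeprop code.

    from collections import defaultdict

    class DependencyGraph:
        """Hypothetical sketch: 'artifact depends on resource' edges, both URIs."""

        def __init__(self):
            self.deps = defaultdict(set)     # artifact URI -> resource URIs it uses
            self.reverse = defaultdict(set)  # resource URI -> artifact URIs using it

        def record(self, artifact, resources):
            """Replace the recorded dependencies of one derived artifact."""
            for old in self.deps[artifact] - set(resources):
                self.reverse[old].discard(artifact)
            for res in resources:
                self.reverse[res].add(artifact)
            self.deps[artifact] = set(resources)

        def dependents(self, resource):
            """Everything to purge or re-generate when `resource` changes."""
            return set(self.reverse[resource])

    # e.g. the de rendering of Foo used template X and a wikidata label
    g = DependencyGraph()
    g.record("render:Foo?lang=de",
             ["wikitext:Foo", "template:X", "wikidata:Q7/label/de"])
    print(g.dependents("template:X"))  # {'render:Foo?lang=de'}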
[21:10:21] that's another open issue
[21:10:27] ok, so it's not critical that the write happens during the life of the request, it just needs to be recorded somehow
[21:10:46] which question are we trying to answer here?
[21:10:48] subbu: rendering in any of the possible user languages. multilingual wikis can't pre-generate all possible renderings on save.
[21:10:50] DanielK_WMDE, why does that need to be updated on a GET request? I would think that dependencies only change on POST (either update to Wikidata or update to Lua module or update to Wikipedia article)?
[21:11:19] matt_flaschen: we are tracking *renderings* of a page. they are generated on demand.
[21:11:47] "tracking"?
[21:12:02] like a log event or something else entirely?
[21:12:06] matt_flaschen: if page Foo uses template X, the wikitext of Foo doesn't depend on X (it does reference X, but that's another issue). the HTML rendering depends on X (and on the wikitext of Foo)
[21:12:25] bd808: like link tables.
[21:12:28] DanielK_WMDE, right, i understand that, but the dependencies in that case only change on POST.
[21:12:29] exactly the issue we have in RESTBase
[21:12:34] This part from the RFC seems relevant:
[21:12:36] "Our current approach of re-rendering all seven million articles can easily result in large backlogs of template updates. It might be useful to consider pull based or hybrid solutions (where only a timestamp is propagated and polled) as an alternative to pure push."
[21:12:37] #info robla asks for response on T102476#2296335. gwicke answers "I'm not opposed to moving the general portion especially to mw.org, let's just make sure we don't end up with multiple copies"
[21:12:37] T102476: RFC: Requirements for change propagation - https://phabricator.wikimedia.org/T102476
[21:12:37] but also for generated artifacts. possibly stuff that is generated on demand
[21:12:42] in that we care more about renders than wikitext
[21:13:09] So maybe not all the HTML renderings should be pre-generated on save, but all the *dependencies* should be tracked on POST, and when to actually re-render is a different question, just as with templates.
[21:13:17] matt_flaschen: the dependencies of the de rendering of page Foo are unknown until it is actually rendered for de output, and then cached. that happens on demand, during a GET request.
[21:13:32] matt_flaschen: image description pages have conditionals that depend on the user language.
[21:13:41] not to speak of wikidata, where all of the output depends on the user language
[21:14:13] matt_flaschen: we can't know the dependencies from the wikitext. not even after resolving templates.
[21:14:16] currently we have a canonical ParserOptions which is used to determine dependencies, and then those dependencies are used to purge all renderings
[21:14:25] DanielK_WMDE: I'm trying to understand how this is different from links, templates & media; is the main issue that dependencies differ between languages?
[21:14:34] gwicke: yes.
[21:14:38] even though renderings may have different dependencies to the canonical one
[21:14:44] gwicke: which is already broken for links, templates, and media.
[21:14:57] we don't track dependencies that only occur in non-canonical renderings
[21:15:04] yeah, I was just going to say.. language variants are happily messing with that too
[21:15:08] so we sometimes fail to purge
[21:15:50] TimStarling: yea, currently, we just ignore the nastiness, because it's not very visible. but with better support for multilingual content, we need a better solution
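A sketch of the queued-write workaround mentioned above (matt_flaschen: "there are workarounds for it, as long as it can be queued"; and "it just needs to be recorded somehow"): the GET path only enqueues a small job describing the dependencies discovered while rendering, and a worker in the primary DC does the actual write. The parse() stub, the queue, and the job format are assumptions, not the real MediaWiki JobQueue; the DependencyGraph is the hypothetical one sketched earlier.

    import queue

    dependency_jobs = queue.Queue()  # stands in for a real job queue / event bus

    def parse(title, lang):
        # stub standing in for the real parser; returns (html, resources used)
        return f"<p>{title} ({lang})</p>", {f"wikitext:{title}", "template:X"}

    def render_page(title, lang):
        """GET path: render on demand, but defer the dependency write."""
        html, used_resources = parse(title, lang)
        # no synchronous master write here; just enqueue what rendering discovered
        dependency_jobs.put({
            "artifact": f"render:{title}?lang={lang}",
            "depends_on": sorted(used_resources),
        })
        return html

    def dependency_writer(graph):
        """Worker in the primary DC: drain the queue and do the actual writes."""
        while not dependency_jobs.empty():
            job = dependency_jobs.get()
            graph.record(job["artifact"], job["depends_on"])
            dependency_jobs.task_done()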
[21:16:15] there is a proposal in here to migrate all link tables to cassandra, is that correct? T105766
[21:16:15] T105766: RFC: Dependency graph storage; sketch: adjacency list in DB - https://phabricator.wikimedia.org/T105766
[21:16:24] gwicke: i want to point to another open question: usage tracking for 3rd parties. think InstantCommons. and perhaps InstantWikidata in the future.
[21:16:59] TimStarling: as I said on the task, it's largely theoretical & premature at this point
[21:18:24] how about redis?
[21:18:35] DanielK_WMDE: we have structured some of the recent change propagation work around URLs, with a view to possibly supporting outside resources
[21:18:37] i think this raises the question of how reliable our tracking needs to be
[21:19:01] is it ok to have this transient? or should we be sure that it's persistent?
[21:19:01] but we have not tackled any dependency tracking so far
[21:19:29] gwicke: 3rd party dependency tracking would basically require a pubsub service. but with very high granularity
[21:19:37] it's not hard to do, but hard to scale
[21:19:38] DanielK_WMDE, i am confused by your response to matt_flaschen's question .. "the dependencies of the de rendering of page Foo are unknown until it is actually rendered for de output" .. that seems independent of when a page is rendered .. on a POST or on a GET, right?
[21:20:06] how do you re-render all affected pages on a POST then?
[21:20:15] or is that what you mean when you say it is currently broken?
[21:20:18] subbu: correct. so in some cases, we will only have this information in a GET request - and need to somehow store it
[21:20:26] DanielK_WMDE: to me, the other big question is how to structure interfaces for dependency updates, so that they are both usable & efficient
[21:20:55] subbu, I think DanielK_WMDE's point is that it's computationally infeasible to determine the dependencies for all languages ahead of time. Since you need to parse it for each language just to determine what the dependencies of e.g. the Spanish language version are.
[21:21:12] With e.g. Commons and {{int:}}, etc.
[21:21:38] the XKey Varnish work is heading in this direction as well
[21:21:50] gwicke: well, basically, you need add(x,y), remove(x,y), lget(x), rget(y), lpurge(x), rpurge(y). Plus perhaps a batch interface.
[21:22:27] DanielK_WMDE: alternatively, you could post the dependencies each time & let the API figure out diffs
[21:22:47] matt_flaschen: also, we only need that info if we actually have a rendering cached. if the page was never rendered in hebrew, i don't care what dependencies the hebrew version might have
[21:23:19] gwicke: yes. that would be part of the batch interface.
[21:23:35] DanielK_WMDE: could you describe an example for such language variance?
[21:23:45] which kind of content would be pulled in conditionally?
[21:24:11] gwicke: all renderings of any item page on wikidata. all of it is language dependent.
[21:24:26] gwicke: license templates on file description pages on commons
[21:24:35] image descriptions on commons
[21:24:44] (blame the translate extension)
[21:24:59] https://commons.wikimedia.org/wiki/Template:LangSwitch
[21:25:05] anything that uses {{int}}
[21:25:10] matt_flaschen: indeed
[21:25:29] DanielK_WMDE, for Wikidata, how are the dependencies language-dependent? Doesn't one page always depend on the same thing regardless of language? It's just the actual output would be different, but how is the dependency graph different?
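A sketch of the interface DanielK_WMDE lists at 21:21:50 (add(x,y), remove(x,y), lget(x), rget(y), lpurge(x), rpurge(y), plus a batch call), with the reading that x is the dependent artifact and y is the resource it depends on; that reading, and everything beyond the method names, is an assumption. The batch method follows gwicke's suggestion to "post the dependencies each time & let the API figure out diffs".

    class DependencyStore:
        """x = dependent artifact, y = resource it depends on (assumed reading)."""

        def __init__(self):
            self._edges = set()          # (x, y) pairs

        def add(self, x, y):             # record "x depends on y"
            self._edges.add((x, y))

        def remove(self, x, y):          # forget a single edge
            self._edges.discard((x, y))

        def lget(self, x):               # all y that x depends on
            return {b for a, b in self._edges if a == x}

        def rget(self, y):               # all x that depend on y
            return {a for a, b in self._edges if b == y}

        def lpurge(self, x):             # drop every edge originating at x
            self._edges = {(a, b) for a, b in self._edges if a != x}

        def rpurge(self, y):             # drop every edge pointing at y
            self._edges = {(a, b) for a, b in self._edges if b != y}

        def put_batch(self, x, ys):      # "post the dependencies each time":
            self.lpurge(x)               # replace x's edges wholesale,
            for y in ys:                 # letting the store work out the diff
                self.add(x, y)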
[21:25:32] fun ;/
[21:25:36] For Q pages?
[21:25:50] for item pages, is there a concern beyond CDN purging?
[21:25:50] Since Q pages don't have LangSwitch AFAIK
[21:26:11] gwicke, it's also the ParserOutput itself, right?
[21:26:48] matt_flaschen: in case of wikidata, the dependency graph is probably the same, that is true. that's not true for anything that uses {{int}} (or variants)
[21:26:50] yeah, but afaik that's still keyed on the Q page & it's possible to purge all at once
[21:27:19] at first sight, item pages sound like a somewhat simpler case to me
[21:27:49] matt_flaschen: but it depends on the granularity. If Q1 references Q7, that will be the same for all languages. But the de-ch rendering of Q1 would depend on Q7.label.de, and the en rendering would depend on Q7.label.en
[21:28:10] conditional dependencies will always be tricky, and languages are not the only source of complexity there
[21:28:13] gwicke: is work on this underway? which parts of the RFC need comment most urgently?
[21:28:13] I'd like to have a system that can at least in theory handle this kind of thing.
[21:28:20] lua code can transclude things based on the phase of the moon
[21:28:37] gwicke +1 on reference
[21:28:47] heh :)
[21:29:14] gwicke: they are not conditional. different artifacts depend on different artifacts. the dependency tracking doesn't care how or why.
[21:29:35] TimStarling: work is underway on eventbus & changeprop, but for dependency tracking we are mostly trying to better understand the issues at this point
[21:29:37] if you have a uri for "phase of the moon" and you touch it every day, the dependency management should be able to handle this
[21:29:58] uri = artifact, here
[21:30:00] TimStarling: this discussion is very helpful in that regard
[21:30:23] DanielK_WMDE, that could be abstracted, though. You could say Q1[de-ch] depends on Q7.label[de-ch], and Q7.label[de-ch] depends on Q7.actual-label[de-ch] and Q7.actual-label[de].
[21:30:26] gwicke: which question are you hoping to get consensus on?
[21:30:55] robla: I wasn't hoping for any decisions in this meeting
[21:30:59] With label depending on which, if any, actual-label is available.
[21:31:09] robla: I think this is more requirements gathering
[21:31:19] matt_flaschen: yes, exactly. in case of wikibase with nice structured data, it would probably be feasible to do this in advance. for wikitext, it isn't - so we don't know until we render on demand.
[21:32:19] TimStarling, robla: since the entire rfc is about gathering requirements, it's kind of meta... how could it ever be "approved" or "implemented"? what does that mean?
[21:32:25] * robla is trying to figure out if/when he should be capturing any of this with #info commands, and can't figure out how to make that useful
[21:33:10] i have tried to formulate the requirements we have for wikidata. i can try and put that into #info tags, but i try not to do that too often with my own comments...
[21:33:23] DanielK_WMDE: I think that's why I'm hoping to move it to MediaWiki. The finished RFC can be a clear description of the options.
[21:33:42] robla: right, but the rfc process seems useful for gathering requirements
[21:33:47] #info significant use: track dependencies when rendering pages with {{int}} so that they can be purged when conditional dependencies change
[21:33:53] The biggest open question I see is how to handle the case where the dependency graph itself varies depending on user language.
[21:34:07] Sorry, didn't see Tim's #info before saying that.
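matt_flaschen's abstraction at 21:30:23 can be expressed as intermediate nodes in the same graph: the de-ch rendering depends on an "effective label" node, which in turn depends on the concrete labels in the fallback chain, so the shape of the graph stays the same across languages while a change to Q7's de label still reaches the de-ch rendering. A sketch reusing the hypothetical DependencyStore above; the URI scheme is invented for illustration.

    store = DependencyStore()

    # the de-ch rendering of Q1 uses the *effective* de-ch label of Q7 ...
    store.add("render:Q1?lang=de-ch", "label:Q7?lang=de-ch")
    # ... which falls back through the language chain de-ch -> de
    store.add("label:Q7?lang=de-ch", "actual-label:Q7/de-ch")
    store.add("label:Q7?lang=de-ch", "actual-label:Q7/de")

    def purge_transitively(store, changed):
        """Walk reverse edges so an edit to a concrete label reaches renderings."""
        dirty, frontier = set(), {changed}
        while frontier:
            y = frontier.pop()
            for x in store.rget(y):
                if x not in dirty:
                    dirty.add(x)
                    frontier.add(x)
        return dirty

    # editing Q7's German label invalidates the de-ch rendering of Q1
    print(purge_transitively(store, "actual-label:Q7/de"))
    # {'label:Q7?lang=de-ch', 'render:Q1?lang=de-ch'}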
[21:34:14] DanielK_WMDE: gathering them in your head?
[21:34:34] I'm not sure that this is a problem we can or should try to solve in general
[21:34:47] matt_flaschen: just think of renderings as natural parts of the dependency graph. that should Just Work (tm).
[21:35:17] unpredictable dependencies are always going to be a pain to handle, and even if we fully handled them, it would be fairly expensive to do so
[21:35:18] one consequence is actually: wikitext *never* depends on wikitext.
[21:35:20] DanielK_WMDE, yeah, I'm trying to get my mind around that perspective.
[21:35:24] you never purge wikitext
[21:35:38] I think MW's link tables need to be moved out of the core DBs fairly urgently, I've thought so for years
[21:35:46] what needs purging is *generated* content, so we need to track what it depends on
[21:36:29] I guess it is fine to start with a new thing like shadow namespaces, but the existing LinksUpdate system is not so awesome that I would want to see it preserved
[21:36:42] TimStarling: i think there are two use cases for those: dependency tracking for purging (which should be moved and improved), and tracking references (in whatlinkshere, related changes, etc) for maintenance (which we should keep, i think)
[21:36:45] TimStarling, could you explain about moving the links tables out?
[21:36:49] one work-around for the unpredictable dependency problem is to eliminate the direct dependency by a) making things modular, or b) rendering things dynamically on the client
[21:37:18] TimStarling: i think we need to consider these two things separately. with more dynamic content, we can no longer treat them the same
[21:37:33] matt_flaschen: the *links tables are large and heavily-updated, it's not scalable
[21:37:47] #info what needs purging is *generated* content, so we need to track what it depends on
[21:38:51] DanielK_WMDE: yes, they can be split up
[21:38:56] gwicke: but in order to do that, i need to know the dependencies, right? the client somehow needs to know what resources to load.
[21:39:11] maybe we can stop putting entries in whatlinkshere when someone uses #ifexist
[21:39:14] TimStarling: i think that is actually the core issue this rfc is about
[21:39:24] DanielK_WMDE: yes, but the response of each part can vary independently, which avoids a lot of dependencies to the composite
[21:39:54] it avoids materializing the composite
[21:40:07] the dependencies that need tracking are the same
[21:40:09] (DanielK_WMDE, matt_flaschen, gwicke: re https://commons.wikimedia.org/wiki/Template:LangSwitch - long term question - in what ways could this be extended to include translation from WMF Content Translation as well as later with Wiktionary - and in combination conceptually with Google Translate?)
[21:40:19] well, the RFC is requirements gathering, so I am stating my support for a use case which is on the list of things this could solve
[21:40:26] DanielK_WMDE: no, you no longer need to track the dependency to the composite
[21:40:33] instead, you can just purge one component
[21:40:49] & have it update wherever that component is pulled in dynamically
[21:41:09] the same way CSS can be updated without re-rendering everything, or tracking dependencies
[21:41:33] Scott_WUaS: this is one small step towards allowing people to view pages in their favorite language without logging in. but it's fairly technical/low level, not directly related to translation.
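A sketch contrasting the two approaches discussed above: with server-side composition the backward edge (component to composites) must be tracked so the composites can be purged, while with dynamic client-side inclusion only the component's own URL needs purging, as gwicke notes at 21:40:26. Purely illustrative; the include mechanism shown is hypothetical, not how MediaWiki composes pages, and it reuses the hypothetical DependencyStore from earlier.

    # Server-side composition: the component's HTML is baked into the page, so the
    # backward edge (component -> composites) must be tracked to purge composites.
    def purge_server_side_composition(store, component):
        return store.rget(component)      # every composite page needs a purge

    # Client-side composition: the page only embeds a stable reference; the client
    # fetches the current component at view time, so one URL purge is enough.
    def render_with_client_side_include(title, component_url):
        return (f"<p>{title}</p>"
                f'<div data-include="{component_url}"></div>')  # resolved in browser

    def purge_client_side_composition(component_url):
        return {component_url}            # no backward edges needed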
[21:41:39] (leading eventually to very sophisticated machine translation with artificial intelligence re natural language processing, extending WMF's ~300 languages to all languages even?) Thanks!
[21:41:52] of course, this only works with reasonable efficiency for a limited number of components
[21:41:54] DanielK_WMDE: Thanks
[21:42:37] gwicke: the dependencies still have to be known somewhere. but perhaps they can just live on the client
[21:43:23] DanielK_WMDE: the difference is that you only need to store one edge (the reference), and don't need to track the backwards edge
[21:43:41] ah, yea... i guess you are right
[21:43:41] it's an example of polling
[21:43:50] "for item pages, is there a concern beyond CDN purging?" (asked earlier, at 21:25)
[21:43:55] what's the answer to that?
[21:44:20] gwicke: anyway - do you agree that the main issue is tracking dynamically generated artifacts, in order to determine when they need to be re-generated? and the proposal is to no longer try to do this based on link tables?
[21:44:21] * AaronSchulz is curious about xkey too
[21:45:07] if per-language renderings were bucketed in the CDN, they could be purged via a single URL purge, without tracking each language variant, right?
[21:45:07] AaronSchulz, gwicke: about item pages and CDN purging: yes. that's why anons can only view wikidata in english.
[21:45:33] DanielK_WMDE: that is the general problem, yes -- but I think we'll need to work on this from both sides: a) avoiding some dependencies by making things more modular, and b) improving our infrastructure for tracking dependencies
[21:45:46] avoiding the need to track new renderings on GET
[21:46:33] AaronSchulz: for item pages, even a Vary might be able to do it right now
[21:46:40] "anons can only view wikidata in english" -- what a stupid problem for a project like ours to have
[21:46:45] AaronSchulz, but how do you know which pages to purge, if it varies by language what the dependencies are?
[21:46:59] embarrassing, we should fix that
[21:47:03] AaronSchulz: related to the cdn issue: https://phabricator.wikimedia.org/T114662
[21:47:10] gwicke: yeah, if we validate the language and 500 on bogus ones, that limits the hash-chain to only a few hundred possibilities
[21:47:10] Also, regarding the CDN question, the parser cache itself also needs to be purged.
[21:47:36] AaronSchulz: yeah, only a teeny bit of fragmentation ;)
[21:47:41] once you hit MW, there are lots of ways to handle validating variant caches of a single source
[21:48:10] (e.g. checking page_touched, some other field, WAN cache check keys, ...)
[21:48:21] AaronSchulz: no, we would still need that info, so we can purge the parser cache
[21:48:27] AaronSchulz, never mind, "varies by language what the dependencies are" doesn't apply to Wikidata Q pages.
[21:48:59] DanielK_WMDE: why?
[21:49:13] AaronSchulz: also, we want selective purging. if only the french rendering depends on resource X, and X changes, it would be nice if we could purge only the french version.
[21:49:19] not all of them
[21:49:36] "would be nice" or "actually is worth it"?
[21:49:46] AaronSchulz: you'd need multi-dimensional buckets. not just a bucket for "all renderings of Foo", but also a bucket for "all renderings that depend on X"
[21:49:56] both on the CDN, and for the parser cache
[21:50:12] (that's actually closely related to the recently rejected PSR-6 proposal)
[21:50:39] https://phabricator.wikimedia.org/T130528
[21:50:39] what is X? like magic words and templates that vary on language?
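A sketch of the "multi-dimensional buckets" idea from 21:49:46, in the spirit of the XKey/surrogate-key work mentioned earlier: every cached rendering carries several tags (the page it belongs to, plus each resource it depends on), so either "all renderings of Foo" or only "renderings that depend on X" can be purged selectively. The cache API and tag names are assumptions, not the actual Varnish xkey module interface.

    from collections import defaultdict

    class TaggedCache:
        """Hypothetical cache where every entry carries a set of purge tags."""

        def __init__(self):
            self.entries = {}               # cache key -> rendered HTML
            self.by_tag = defaultdict(set)  # tag -> cache keys carrying it

        def put(self, key, value, tags):
            self.entries[key] = value
            for tag in tags:
                self.by_tag[tag].add(key)

        def purge_tag(self, tag):
            for key in self.by_tag.pop(tag, set()):
                self.entries.pop(key, None)

    cache = TaggedCache()
    cache.put("Foo?lang=fr", "<html>fr</html>", tags={"page:Foo", "dep:template:X"})
    cache.put("Foo?lang=de", "<html>de</html>", tags={"page:Foo"})  # de does not use X

    cache.purge_tag("dep:template:X")  # selective: only the fr rendering is dropped
    cache.purge_tag("page:Foo")        # broad: every remaining rendering of Foo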
[21:50:59] AaronSchulz: a template, or a wikidata item. or the phase of the moon. whatever the rendering depends on
[21:51:35] is there a list of concrete use cases for wikidata?
[21:51:46] AaronSchulz: the fact that only the french rendering depends on X would be due to {{langswitch}} or something, yes
[21:51:59] for MW core, I'd rather discourage/deprecate stuff like that (use lower TTLs where it is needed)
[21:52:25] AaronSchulz: the one that has the most need for language-dependent tracking is file description pages on commons.
[21:52:39] that's why i implemented on-render usage tracking for entities.
[21:52:53] DanielK_WMDE, couldn't that be solved (the same way it's non-dynamic for Q pages as discussed above) once Commons uses Wikibase?
[21:53:03] To avoid the "actual dependency graph varies by language" issue?
[21:53:08] At least for file description pages.
[21:53:11] there are a lot of trade-offs here between cost of dependency tracking, cost of purges, accuracy of tracking non-deterministic dependencies etc
[21:53:13] AaronSchulz: well, if i understand gwicke's vision correctly, he would like to have a *lot* more of this kind of thing.
[21:53:21] it adds a lot of complexity to go from just varying on rendering of entities/pages to using different helper entities/pages to render an asset/entity
[21:53:48] matt_flaschen: once commons *only* uses wikibase, and doesn't generate wikitext from it - then yes, to an extent.
[21:54:00] i expect that transition period to take about 5 to 10 years.
[21:54:15] for wikipedia, we kind of track dependencies per language already by virtue of having one project per language
[21:54:26] heh, another decade anniversary cake ;)
[21:55:09] AaronSchulz: this rfc is about a generic mechanism to allow this kind of tracking, not just for wikidata, but for all kinds of content. this allows us to re-use rendered snippets/widgets, and purge them when appropriate
[21:55:17] you know {{int:}} varying by user language was an accident
[21:55:31] gwicke: indeed. the language issue arises for multilingual pages.
[21:55:38] TimStarling: yea :D
[21:55:43] if I hadn't made that implementation error then commons would have been stuck with JS hacks to hide languages other than the current one
[21:55:51] which is what they did to start with
[21:55:52] or CSS
[21:56:16] they would just have extended that to cover this
[21:57:16] my personal inclination is still to reduce the reasons for such variance in the content
[21:57:45] but it's clear that there are so many sources and use cases for this already that it won't be possible to avoid it altogether
[21:58:11] right
[21:58:36] any action items or #info for the notes before we wrap up?
[21:59:03] I want to thank you all for participating
[21:59:16] (Glad for this focus on language and translation)
[21:59:25] I want to thank you for listening to my rants ;)
[21:59:33] next week we don't have a particular RFC scheduled, so we are planning on doing a triage session
[21:59:33] one action would seem to be to clarify the anticipated concrete dependency relations
[21:59:42] lang variants seem to be the most painful point so far
[21:59:44] this kind of discussion is a perfect reminder of all the tricky issues that we are trying so hard to forget
[21:59:58] * robla looks up link to next week
[22:00:29] mobrovac: yes, along with update volume & suitable APIs
[22:00:30] next week: https://phabricator.wikimedia.org/E187
[22:00:43] as TimStarling said, it's a triage
[22:00:52] mobrovac: that is an action item for gwicke?
[22:01:21] i'd say this is an action item for all of us that want to get the most out of this
[22:01:34] yeah, I think we'll look into this further as a team
[22:01:35] the clearer the problem, the simpler the solution, as always
[22:01:38] #action update the RFC to clarify the anticipated concrete dependency relations
[22:02:18] we might also want to split out the dependency tracking part
[22:02:24] I think it's usually best if an action item is assigned to a single person, since shared responsibility is equivalent to no responsibility
[22:02:40] you know the bystander effect
[22:03:06] TimStarling: I can be that person, but mobrovac is leading changeprop development, so is heavily involved
[22:03:15] along with pchelolo
[22:03:22] ok
[22:03:45] #endmeeting
[22:03:46] Meeting ended Wed May 18 22:03:45 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:03:46] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.html
[22:03:46] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.txt
[22:03:46] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.wiki
[22:03:46] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.log.html
[22:04:25] :)
[22:04:54] thanks again, everyone -- and see you on phabricator!
[22:07:35] lol