[12:55:29] WMF Language office hour starting here in 5 mins
[13:00:32] #startmeeting Language Team office hour - March 2018
[13:00:32] Meeting started Wed Mar 28 13:00:32 2018 UTC and is due to finish in 60 minutes. The chair is kart_. Information about MeetBot at http://wiki.debian.org/MeetBot.
[13:00:32] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[13:00:32] The meeting name has been set to 'language_team_office_hour___march_2018'
[13:00:43] Welcome to this online+IRC office for the Language team of the Wikimedia Foundation
[13:00:54] Our main conversation is happening on Google Hangout/youtube:
[13:01:01] #link https://www.youtube.com/watch?v=RmZcL6zVcTA
[13:01:11] Please let us know if you would like to join on the hangout
[13:01:17] We will also be taking questions here
[13:01:22] Reminder that the logs of this channel will be recorded and posted on meta wiki
[13:01:27] The recording from the last meeting is at:
[13:01:34] #link https://www.youtube.com/watch?v=MD-BKoSj-oY
[13:01:42] and logs are at:
[13:01:48] #link https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2017-09-20
[13:03:54] good day
[13:06:35] We are talking today about changes to the interlanguage links.
[13:06:45] If you have any questions please ask us here and we will address them either on IRC or in the main session.
[13:53:51] https://translatewiki.net/w/i.php?title=Special:Translate&showMessage=ext-uls-compact-link-count&group=ext-universallanguageselector&filter=&optional=1&action=translate (pick language)
[14:00:46] Bye!
[14:00:48] #endmeeting
[14:00:49] Meeting ended Wed Mar 28 14:00:48 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot .
(v 0.1.4)
[14:00:49] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-13.00.html
[14:00:49] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-13.00.txt
[14:00:49] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-13.00.wiki
[14:00:49] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-13.00.log.html
[16:37:26] !log T189075 upload lttoolbox_3.4.0~r84331-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
[16:37:26] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[16:37:26] akosiaris: Not expecting to hear !log here
[16:37:26] T189075: Package apertium-separable and dependencies - https://phabricator.wikimedia.org/T189075
[16:37:31] meh
[16:46:08] the other #wikimedia-o
[20:00:38] TimStarling: good morning! you here?
[20:01:28] Hello Daniel and Tim and Bryan!
[20:02:27] #startmeeting RFC meeting
[20:02:27] Meeting started Wed Mar 28 20:02:27 2018 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[20:02:28] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[20:02:28] The meeting name has been set to 'rfc_meeting'
[20:02:41] only one bot today
[20:03:16] bd.808 killed the other one some weeks ago
[20:03:16] yay \o/
[20:03:27] :)
[20:03:41] * anomie waves
[20:03:57] #topic T190063 Tracking dependencies for multiple Content objects per page (MCR)
[20:03:57] T190063: Tracking dependencies for multiple Content objects per page (MCR) - https://phabricator.wikimedia.org/T190063
[20:05:51] * DanielK_WMDE_ wibbles
[20:06:56] this class file is enormous DanielK_WMDE_
[20:07:00] So, this session is about a nitty gritty question of how MCR should behave with respect to tracking things like template usage, external link usages, etc.
The "links tables", in a word
[20:07:54] TimStarling: PageUpdater? I have already split it twice, i'm happy to split it more. Most of it is copied more or less verbatim from WikiPage.
[20:09:22] Ah, no, I guess you mean PageMetaDataUpdater. That doesn't have as much copied code - it's basically prepareContentForEdit and doEditUpdates refactored.
[20:09:29] anyway
[20:10:29] when we're considering visibility etc. I think we should consider not only UI but things like search. I.e. if you don't see something by default, but still want to use it in search (main or keyword) then changes in secondary document should be known to update index for primary
[20:10:44] There are two main questions about tracking stuff in non-main slots:
[20:10:45] 1) should extensions that define slots have to do work to make such tracking happen? or should it happen per default, with some way to suppress it?
[20:11:20] (Are there templates both on the Wikidata /Wikibase side, and the Mediawiki side? What about external TEMPLATE link usage eg ... https://wiki.worlduniversityandschool.org/wiki/SUBJECT_TEMPLATE ...?)
[20:11:31] 2) what's the primary purpose of the links tables anyway? Enabling purging when referenced resources change (so it should be minimal), or allowing editors to find where things like templates or images are used (so it should be maximal)?
[20:12:05] is there a concrete use case for slots not visible in the default view?
[20:12:16] so, per the last few comments on the RFC, the goal of this change is to enable MCR to the extent necessary for CommonsData. It is explicitly not the goal to support all use cases listed on [[mw:MCR]], that will happen with a next-gen tracking system which will require a schema change, or maybe some kind of external service
[20:12:22] am I understanding that correctly?
[20:12:57] SMalyshev: search integration is indeed a related TODO, but out of scope for today's discussion.
As to external templates - the discussion is limited to the existing tracking mechanism in MW core, which is local only.
[20:13:09] TimStarling: see https://www.mediawiki.org/wiki/Topic:U8zvaqr5vxw5d1pw
[20:13:45] tgr: the goal is to build code that will enable other use cases as well, and which can benefit from a better tracking system when that becomes available.
[20:13:51] DanielK_WMDE_: well, search relies somewhat on how updates are tracked I think (LinkUpdate jobs etc.) so that's the aspect I am concerned with, not the rest of it
[20:14:21] tgr: other use cases should not be blocked by the introduction of a new tracking system per-se. it should only be a blocker if there are specific performance issues with the specific use case.
[20:14:42] "Categories: should be invisible, as categories are handled in a different way, not via ParserOutput"
[20:14:44] this is incorrect
[20:15:57] tgr: oh, i didn't see your comment there, sorry! do you want to copy it to the RFC? or at least link it?
[20:16:04] tgr: I still don't see any use case for suppressed secondary data updates
[20:16:31] TimStarling: "not via ParserOutput" there may have meant "not via HTML in ParserOutput".
[20:16:31] categories are in fact generated via ParserOutput, they are visible in that sense
[20:16:36] TimStarling: OK, the different way also includes ParserOutput, but they are not handled as rendered HTML
[20:17:23] maybe we interpret "visible" differently
[20:17:49] One of the proposals for handling this RFC and the default view is "make a ParserOutput for each slot's Content as it would be stand-alone, then smash them all together"
[20:17:51] the RFC is proposing that invisible content shouldn't have its links tracked
[20:18:01] obviously categories are visible in this sense
[20:18:18] TimStarling: if we only track for purging, and the ParserOutput in the ParserCache doesn't depend on slot X, we don't need to run data updates for slot x. That's anomie's point, I guess.
I disagree with the "if".
[20:19:00] categories do appear in the HTML generated by the skin
[20:19:01] anomie: "smash them together *excluding* the HTML".
[20:19:16] for the html, we want to be at least a *little* smarter
[20:19:17] links tables are used to purge that HTML
[20:21:37] TimStarling: if some referenced resource has no impact on the output in the cache, do you think it's desirable to not track that reference in the links tables?
[20:22:02] you mean the ParserCache or varnish?
[20:22:10] ParserCache, primarily
[20:22:24] then my answer is that you still have to track the reference for varnish
[20:22:30] but the same applies to varnish, conceptually
[20:22:38] but without any use cases
[20:23:32] we need to have a concept of incremental re-rendering, parsoid has this, and in a couple of years, parsoid will be the default parser
[20:24:23] yes. we also need fine grained dependency tracking for purging, for modular content, page composition, etc.
[20:24:30] * anomie hopes "parsoid" in that sentence refers to the concept rather than the existing nodejs service [ob-complaint]
[20:25:09] I suggested calling the PHP port of parsoid "pharsoid" but subbu has not yet adopted that name
[20:25:38] but as long as we don't have that.. Let's assume slot X references external URLs and local images, but has no impact on the default view. So it's not in the parser cache, nor in varnish. Would you still want to have this tracked in externallinks and imagelinks?
[20:25:49] right now we need whatever is a minimum viable product for SDC and won't completely screw us over for the next step
[20:26:28] what is the use case DanielK_WMDE_ ?
[20:26:30] annotations could be implemented as invisible by default, and might need tracking, but that's a very hypothetical use case
[20:26:36] i think yes, for maintenance, spam fighting, etc. if these links don't go into the "main" ParserOutput, AbuseFilter will miss them (until it knows about slots). Editors won't be able to find them.
[20:26:55] I can't think of any other invisible + tracked combination
[20:27:30] TimStarling: the use case for "slot x is not shown in main view"? i don't think it's a likely use case.
[20:27:32] although I'm a bit wary of treating visibility as a property of the content, that should really be a skin-level thing
[20:28:12] how would the mobile interface for structured file pages look, for example? do we really want to cram a wikidata-like property table into the default view?
[20:28:30] ^^ that's a very good point, about visibility being UI feature
[20:29:01] If TemplateData were purely data for VE and didn't double as displaying a table in the template's documentation, that could be a use case. More generally, MCR lets us add metadata that isn't necessarily visible to humans.
[20:29:49] tgr: That sort of thinking about mobile is problematic, IMO. Responsive design is better than trying to vary output server-side based on the client.
[20:30:19] Just look at MobileFrontend and the related content hacks.
[20:30:21] when you are trying to optimize for bandwidth, not really
[20:30:59] I'm saying track everything, it's simple and suits apparently every concrete use case
[20:31:27] you either end up with mobile-first design where the desktop content is crippled to contain very little (compared to what the desktop interface could handle) or need a separate mobile view
[20:32:01] which could be handled by a secondary transformation (like MCR / MobileFrontend now) but it's not ideal
[20:32:04] Ok, let's assume the opposite: "invisible slots" are not a thing. How about documentation for a Lua module? It would be shown on the module's default view. It may be using templates, images, external links, etc. When specifying this new slot, should extra work be needed to track such things? Or should it just happen?
[20:32:17] if an extension wants links to be untracked, it can just omit them from the ParserOutput, right?
[20:33:27] yes links in lua module documentation should be tracked
[20:33:46] this is the case at the moment, right?
[20:34:07] anomie suggested that the code that defines the slot should explicitly trigger that by calling a utility function. my thinking is that it should just happen
[20:34:50] should we talk about the (class/method) API now?
[20:35:16] Much of the debate here is how they get into "the" ParserOutput in the first place. Do we have some generic code that blindly calls Content->getParserOutput() for every slot and merges everything? Or do we have a "SlotHandler" that does that, maybe by doing that or maybe by just pulling links and whatever directly out of structured data in the Content.
[20:35:24] for the utility methods? I say we don't need one. just combine all the links from all the slots always, and track them.
[20:36:36] this is the safe option, it's closest to what we do now. it provides optimal information to editors. it's not optimally efficient wrt purging - but we need a better solution for that anyway, and we are working on one.
[20:36:37] s/"SlotHandler" that does that/"SlotHandler" for each slot that does the merging for that slot/
[20:37:02] the first option is that a Content understands its slot role?
[20:37:14] currently Content is more generic than that, right?
[20:37:44] yes - that would be bad, slot roles may be defined by extensions that re-use content models defined by core (or other extensions)
[20:37:50] The first option doesn't require the Content know about its slot role.
[20:38:52] We will need some kind of SlotHandler for some things. We may not need one for "basic" slots. And I see no need to have boilerplate code in every slot handler to enable tracking
[20:39:28] so really you want SlotHandler->getParserOutput?
[20:40:02] I don't want SlotHandler->getParserOutput. I'd want SlotHandler->addToParserOutput.
[20:40:20] to avoid the need for merging?
[20:40:56] if you have merging then that more easily leads to incremental parsing/updates
[20:40:56] although SlotHandler->getParserOutput will probably be necessary anyway for single-slot views
[20:41:00] We'll need the ability to generate per-slot ParserOutput anyway. We may be able to avoid actually instantiating them in some cases. But we need the ability.
[20:41:26] For B/C for various extensions, and for single-slot viewing/editing.
[20:42:04] tgr: that doesn't *have* to go through the SlotHandler, but it's probably sensible.
[20:42:13] presumably getParserOutput() could be implemented as $po = new ParserOutput; $slot->addToParserOutput($po); return $po;
[20:42:38] For the baseline impl for SDC, I'd like to get away without even having SlotHandler. though we'll need them for more complex stuff eventually
[20:42:57] There's two things being "merged", HTML and metadata. tgr's list earlier was more oriented to the HTML merging. Do we want to generate a whole ParserOutput, even without populating the HTML, just to merge metadata if the merging of the HTML doesn't need one?
[20:43:36] TimStarling: not sure that makes sense. a slot rendering standalone is not the same as a slot attaching itself to a "main view". I'm not sure such attaching of the HTML is viable, really.
[20:43:54] tgr: Single-slot views might use the existing Content->getParserOutput(). Assuming we don't have to make that do weird things to support generic merging.
[20:44:22] anomie: what's the problem with generating a near-empty ParserOutput?
[20:44:50] anomie: why would that be so horrible? constructing a ParserOutput is cheap. Parsing Wikitext is expensive. But in many cases, it's not going to be wikitext. And for the use cases where it is wikitext, we'd need to parse that eventually anyway.
[20:44:55] TimStarling: Seems like a bunch of extra work to make a PO and then merge it when the code could just add to the existing PO.
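(Editor's sketch, not part of the meeting.) The 20:42:13 remark, that getParserOutput() could be built on top of addToParserOutput(), can be illustrated with a small toy model. Everything below is an assumption made for illustration: the `ParserOutput` dataclass only mimics the idea of "HTML plus link metadata", and `SlotHandler` with its two methods is the hypothetical class discussed in the log, not an existing MediaWiki API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ParserOutput:
    """Toy stand-in for a parser output: rendered HTML plus link metadata."""
    html: str = ""
    links: Set[str] = field(default_factory=set)


class SlotHandler:
    """Hypothetical per-slot handler, named after the idea in the log."""

    def __init__(self, html: str, links: Set[str]) -> None:
        self._html = html
        self._links = set(links)

    def add_to_parser_output(self, po: ParserOutput) -> None:
        # Write this slot's contribution directly into an existing output.
        po.html += self._html
        po.links |= self._links

    def get_parser_output(self) -> ParserOutput:
        # Stand-alone view: start from an empty output and add this slot to it.
        po = ParserOutput()
        self.add_to_parser_output(po)
        return po
```

In this model the stand-alone view and the combined view share one code path, which seems to be the point of the remark; whether attaching HTML to a main view this way is viable is exactly what the participants go on to dispute.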
[20:45:29] actually make the code simpler, imho
[20:45:34] I don't really see that
[20:46:03] merging is extra work, yes
[20:46:18] but generating an empty ParserOutput doesn't seem like extra work
[20:46:31] and merging needs to happen anyway.
[20:46:45] the question is just what source the merge is taking the data from
[20:46:51] anomie: that would require all slots that are displayed differently to have a different content type
[20:46:59] feels like a lot of unnecessary boilerplate
[20:47:09] DanielK_WMDE_: That's my point. Source -> PO -> PO rather than Source -> PO.
[20:48:29] i don't care too much about this in theory. in practice, adding the "add stuff to PO" code is tricky, because only the Content object understands the source. That is, we'd need to add a method to the Content interface
[20:48:37] adding methods to interfaces breaks extensions
[20:48:43] on an unrelated note, the way link tracking is done will have to be changed somewhat
[20:48:43] much easier to just go via the PO
[20:49:03] consider the case when a link is removed from the main content and added to the templatedoc slot
[20:49:12] the tracking table should not change
[20:49:18] tgr: What slots would be displayed "differently" but have the same content model?
[20:49:23] I don't think the current interface can handle that
[20:50:05] tgr: that will happen trivially if you merge ParserOutput objects and pass them to the existing LinksUpdate
[20:50:06] DanielK_WMDE_: You seem to be thinking of a generic SlotHandler trying to do something fancy with a generic Content. That's not the case that's interesting here.
[20:50:09] anomie: Source -> PO -> PO can be done in a completely generic way for merging links. Source -> PO has to be re-implemented for every content model. And would require workarounds so we don't break extensions that implement Content.
[20:51:04] tgr, TimStarling: That will happen trivially as long as the same set of links winds up in the merged PO, no matter what the mechanism for them to get there.
[20:51:12] tgr: with LinksUpdate based on the merged PO, it would not change
[20:52:12] DanielK_WMDE_: No, it would not break any extensions. Source -> PO -> PO is always an option even if it's not the most efficient one.
[20:52:47] So let's implement that as the baseline.
[20:52:54] I think anomie is correct that you could use Content as a slot role if you wanted to, you would just have to subclass it, have a content type specific to a slot role
[20:53:46] DanielK_WMDE_: OTOH, if the whole API is built around that as the baseline and only option, then ever changing that means refactoring all the code.
[20:53:56] TimStarling: in theory, though you can't re-use the same model ID. which is inconvenient. also, it doesn't really help
[20:54:15] anomie: as opposed to refactoring all the code right away.
[20:54:49] DanielK_WMDE_: We're needing merging code for the HTML anyway.
[20:55:19] So, 5 minutes left. I'm still proposing to a) track everything always and b) do it by merging POs. That's the simplest thing, the smallest change, and the safe option that preserves a maximum of information and compatibility.
[20:55:22] you'd need to subclass both the Content and ContentHandler, as opposed to creating a single SlotRoleHandler
[20:55:31] not terrible, but extra boilerplate
[20:56:13] I'd really like to be able to go ahead with that solution, we are supposed to have this done by the end of Q4. and we are behind about 2 months.
[20:56:38] (Thank you, All!)
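(Editor's sketch, not part of the meeting.) The "track everything, merge the POs" proposal and the 20:49:03 example, where a link moves from the main content to the templatedoc slot, can be modelled with plain sets. This is a simplification under stated assumptions: the function names `merge_links` and `links_delta` are hypothetical, and the real LinksUpdate works on a ParserOutput and database tables rather than Python sets.

```python
from typing import Dict, Set, Tuple


def merge_links(per_slot_links: Dict[str, Set[str]]) -> Set[str]:
    """Union the link targets of every slot; which slot a link came from is dropped."""
    merged: Set[str] = set()
    for links in per_slot_links.values():
        merged |= links
    return merged


def links_delta(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (rows to insert, rows to delete) for a page-level tracking table."""
    return new - old, old - new


# A template link moves from the main slot to the templatedoc slot.
before = {"main": {"Template:Infobox", "File:A.png"}, "templatedoc": set()}
after = {"main": {"File:A.png"}, "templatedoc": {"Template:Infobox"}}

to_insert, to_delete = links_delta(merge_links(before), merge_links(after))
```

Because the merged set is the same before and after the edit, both `to_insert` and `to_delete` come out empty here, which matches the claim in the log that the tracking table would not change as long as the same set of links ends up in the merged output.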
[20:57:04] as a general rule, I think the implementor should have the flexibility to choose between feasible solutions
[20:57:20] and I don't think anomie has demonstrated that this proposal by DanielK_WMDE_ is infeasible
[20:57:38] I still don't much like the idea of having code to merge only the HTML, and then other code that requires generic POs for merging metadata.
[20:58:01] merging will be in core, right?
[20:58:06] yes.
[20:58:16] ...if we do it in the generic way
[20:58:17] so we can change details of how it works later
[20:58:19] ...via POs
[20:58:24] otherwise, no
[20:58:37] if we have addToParserOutput() then merging is effectively in extensions, and we don't have so much flexibility in the future
[20:59:33] If we have it in core via POs, then there's no flexibility to do it in any way besides generating a PO.
[20:59:59] we have to finish now
[21:00:28] I'm siding with DanielK_WMDE_ but I'm not sure we have a quorum to call it approved
[21:00:31] are we worried about putting something in 1.31 (LTS) that we expect to rewrite later?
[21:01:05] the generic solution doesn't prevent us from adding a way to override that behavior using SlotHandlers or whatever later
[21:01:54] How would you override that behavior? Have the "generic" method return a completely empty PO so the SlotHandler can do something different?
[21:02:37] Probably not.
[21:02:59] TimStarling: we are late, i guess we have to close this.
[21:03:02] tgr: last words?
[21:03:07] I'm in the hangout already
[21:03:10] if it does something different, it probably doesn't need to return a PO at all
[21:03:23] +1 to Tim's general rule
[21:03:47] I don't think we have another sensible way to break the tie
[21:04:17] #endmeeting
[21:04:17] Meeting ended Wed Mar 28 21:04:17 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot .
(v 0.1.4)
[21:04:17] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-20.02.html
[21:04:17] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-20.02.txt
[21:04:18] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-20.02.wiki
[21:04:18] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-03-28-20.02.log.html