[21:58:57] * Marybelle waves jackmcbarn.
[21:59:09] * jackmcbarn waves back
[22:02:22] hi all!
[22:02:30] so, let's see if i can remember how to use meetbot...
[22:02:37] #startmeeting
[22:02:37] DanielK_WMDE_: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
[22:02:51] #startmeeting ArchCom RFC meeting
[22:02:51] Meeting started Wed Jan 25 22:02:51 2017 UTC and is due to finish in 60 minutes. The chair is DanielK_WMDE_. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:02:52] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:02:52] The meeting name has been set to 'archcom_rfc_meeting'
[22:02:53] * legoktm waves
[22:03:06] #link https://phabricator.wikimedia.org/T154738
[22:03:15] I'm not sure I can control the bot.
[22:03:24] Anyway, we're discussing that.
[22:03:45] Marybelle: ha, thanks!
[22:03:59] i was hunting for the link, i have the tab open on the Other Laptop ;)
[22:04:08] #topic Accessing page properties from wiki pages https://phabricator.wikimedia.org/T154738
[22:04:16] I'm reading over your note now.
[22:04:22] Is Tim around?
[22:04:32] he wasn't in the archcom meeting
[22:04:49] Okay.
[22:04:49] but gwicke and Krinkle had a few thoughts too
[22:05:08] Sure.
[22:05:14] I think focusing on the category sort key use-case is simplest.
[22:05:20] Specifically edits like this: https://en.wikipedia.org/w/index.php?title=Talk:Thomas_W._O%27Brien&diff=757480527&oldid=610994123
[22:05:40] So, it seems to me that accessing the page's *own* page-props is quite different from accessing *another* page's page-props.
[22:05:44] That's a user re-using the subject-space page's sort key on the talk page.
[22:05:45] different use cases, different problems
[22:06:00] DanielK_WMDE_: yes, but if you're not careful, the latter can easily give you the former too
[22:06:02] I don't think they're very different.
[22:06:37] jackmcbarn: it will give you a broken version of the former, providing access to the page-props of the previous revision...
[22:06:40] My goal for this meeting is to figure out if there are any immediate action items we can take to eliminate edits such as https://en.wikipedia.org/w/index.php?title=Talk:Thomas_W._O%27Brien&diff=757480527&oldid=610994123
[22:07:00] The answer may be no, but I'm hoping we can find a yes somewhere.
[22:07:08] DanielK_WMDE_: it's not always the previous revision. it can also be the previous parse of the current revision
[22:07:16] Marybelle: if you ask me: don'
[22:07:19] and that's where the trouble starts
[22:07:29] ...don't store meta-data in wikitext, let's make MCR happen :)
[22:07:38] Is that going to happen in 2017?
[22:07:41] that --> MCR
[22:07:55] I'll happily donate this meeting to MCR if it's realistic and feasible.
[22:07:57] if not, Sloan will be very unhappy with us
[22:08:09] it's a blocker for structured meta-data on commons
[22:08:15] so, yes, pretty sure it's gonna happen
[22:08:19] i didn't know about MCR until just now. it seems like a really nice way out of this
[22:08:38] Marybelle: is your goal end of 2017 for this?
[22:08:38] jackmcbarn: it's a very big change, but it's going to enable a lot of things
[22:08:51] legoktm: My goal was end of January when I thought this was just adding a parser function. :P
[22:08:52] it's adding a degree of freedom to our content model
[22:08:54] now the issue comes when people want to be able to call templates from the other things
[22:09:01] even with MCR, presumably we'd still support {{DEFAULTSORT:}} in wikitext for a while for back-compat right?
[22:09:02] from the other streams, that is
[22:09:05] then we're back at square 1
[22:09:16] Marybelle: if you don't care about stale data, and you are never accessing the page's own props, then yes.
[22:09:30] I vaguely care about stale data.
[22:09:32] we could use templatelinks for cache invalidation right?
[22:09:39] Yes.
[22:09:55] You could think of the default category sort key as a partial page transclusion.
[22:10:08] actually, templatelinks don't entirely fix stale data
[22:10:29] We should acknowledge that we already have stale data and volatile wikitext.
[22:10:33] it's possible that you'd need to purge two pages instead of one for everything to be live again
[22:10:44] yes, but currently, purging the stale page always fixes the staleness
[22:10:46] So as long as we're not making the mess much worse, I don't think we should let perfect be the enemy of the good.
[22:10:53] Purging is a horrible hack.
[22:10:59] The purge action, I mean.
[22:11:05] yes, but we don't have a way to fix that as of now
[22:11:12] so it's a necessary evil
[22:11:25] DanielK_WMDE_: Are Krinkle and gwicke gonna weigh in?
[22:11:39] legoktm: how exactly would you use templatelinks? just put fake template names in there, like #pageprop:Foo|bar?
[22:11:43] jackmcbarn, legoktm, DanielK_WMDE_: Could we do https://www.mediawiki.org/wiki/?curid=647378#Option_3 ?
[22:12:07] Marybelle: we could do those, but they seem really hacky since they're so single-purpose
[22:12:21] More hacky than a parser function?
[22:12:25] and they also don't fix the fact that you'd have to sometimes purge the subject page to make the talk page not be out of date
[22:12:26] Everything is hacky and terrible.
[22:12:33] imo, yes more hacky
[22:12:54] I'll mention two things: Firstly, it seems to me that accessing a page's own pageprops has a less clear usecase so far compared to accessing other pages' props. Especially considering the ones set by the wikitext itself (makes it rather fragile). E.g. a template that varies based on whether the page is a disambiguation page.
[22:12:56] Marybelle: i don't know. I can try to summarize. 1) page-props are not a stable interface. Pages that rely on specific page-props may break unexpectedly. a more specific interface would give more control, and prevent access to nasty things.
[22:13:11] DanielK_WMDE_: if you register a normal templatelink, then when the [[Thomas W. O'Brien]] page is edited, it queues an update for the talk page
[22:13:11] does having single-purpose magic words make editor integration like VE easier? I would think so
[22:13:31] Marybelle: 2) we should look at concrete use cases instead of insisting on a general solution, if that general solution is problematic
[22:13:43] DanielK_WMDE_: I have three concrete use-cases. :-)
[22:13:55] I'm specifically discussing category sort keys for now, though.
[22:14:04] Krinkle: i 100% agree. we shouldn't open the can of worms of letting pages access their own props
[22:14:08] legoktm: i know how templatelinks are used for purging. but how are you going to fit something in there that is not a templatelink?
[22:14:09] Secondly, we do have a few foreign-page magic words already. Both ones that work for both current and other pages, and those for other pages only. Such as {{PAGESINCATEGORY:}} and {{PAGESIZE:}} however it seems PAGESINCATEGORY for example has no cache invalidation strategy (no link table entry).
[22:14:11] however, i don't know of an effective way to prevent that
[22:14:33] correct, PAGESINCATEGORY is never invalidated
[22:14:38] DanielK_WMDE_: it *is* a templatelink though. We'll just say that Talk:... transcludes [[Thomas W. O'Brien]]
[22:14:54] DanielK_WMDE_: What Lego said.
[22:15:03] You'd just treat the page as a regular transclusion.
[22:15:08] legoktm: oh, not a specific page prop, just all of it? and any change to that page will then cause the talk page to be purged?...
[22:15:18] pussible... but... not nice...
[22:15:24] What is nice?
[22:15:25] possible, even
[22:15:30] Besides MCR, allegedly, maybe one day.
[22:15:32] DanielK_WMDE_: legoktm: template links would be used the same way as #ifexist, not a fake entry. but an entry for the value passed to it.
[22:15:33] Marybelle: only purge when the page-prop changes
[22:15:37] that would be nice
[22:15:40] Krinkle: exactly.
[22:15:42] So that it has a backlink.
[22:15:48] fwiw, I added the parsing team on the RFC; I'm sure they will have thoughts on issues around the parser processing model
[22:15:48] ok, i get it.
[22:15:58] would work, but seems wasteful, since *any* edit would trigger the purge
[22:16:05] gwicke: Cool, was just thinking we needed more cooks in the kitchen. ;-)
[22:16:05] even though most edits don't change the respective prop
[22:16:06] we already have plenty of cases where we do that
[22:16:14] That seems acceptable.
[22:16:14] i don't think wasteful invalidation is a major concern
[22:16:31] jackmcbarn: wikidata is getting chided for it
[22:16:38] how so?
[22:16:46] PAGESIZE is dependent on any page edit?
[22:16:54] That one would be yeah
[22:16:59] wasteful invalidation is a significant concern
[22:17:02] Which we already do?
[22:17:22] as well as the complexity of tracking even more dependencies
[22:17:22] Like I realize that everything sucks and is inelegant, but if we're already doing these things...
[22:17:29] Yeah, PAGESIZE does add a templatelink entry
[22:17:32] Currently.
[22:17:34] jackmcbarn: when a wikipedia page uses data from a data item, it will be purged when that data item changes. that can cause a lot of purges, which causes a lot of load for rendering.
[22:17:52] we are being told to have more specific usage tracking to avoid this (and other issues)
[22:17:59] Marybelle: that's not a reason to keep doing it necessarily
[22:18:12] DanielK_WMDE_: You mean wikidata only tracks the entity (Q id) being used, not the individual statement?
[22:18:16] gwicke: It's also not a reason to block things.
[22:18:18] Krinkle: yep.
[22:18:27] DanielK_WMDE_: That seems solvable though?
[22:18:32] Krinkle: we do track some things (like, only the label being used), but not the statement
[22:18:43] solvable, but not easy at scale
[22:18:49] DanielK_WMDE_: What's your thought on this RFC? Wait for MCR?
[22:18:53] Would just need to add more precision to the tracking process, afaik they all have unique ids (P ids).
[22:19:12] Krinkle: yes. we are talking a table with a billion rows, then...
[22:19:17] Extra column and repeat entries if multiple are read. Or perhaps up to a certain number and consider the entire Q instead.
[22:19:34] DanielK_WMDE_: That table presumably exists already, just with one less column.
[22:19:37] Anyhow.
[22:19:45] Krinkle: https://phabricator.wikimedia.org/T151717
[22:20:06] It sounds like we're currently so paralyzed by bad architecture that we'll do nothing.
[22:20:12] And hope that it gets better this year.
[22:20:20] With the support of the Sloan Foundation.
[22:20:22] Or something.
[22:20:26] Given how internal many page properties are I'd also say that whatever solution we come up with should not allow arbitrary access to them, but rather be a stable interface with just a subset of values we can support and have solid use cases. So that they don't depend on the current implementation.
[22:20:37] Marybelle: adding more things that we know are going to bite us is not going to help
[22:20:44] Not as specific as a new magic word/function for each one, we can have a generic one, but not as generic as any page prop.
[22:20:56] we need to be deliberate in how we change things
[22:21:06] gwicke: Maybe we can make another datastore that nothing can easily access.
[22:21:10] currently, the RFC is not even discussing different options for solving the use cases
[22:21:20] page_props, Wikidata, what's the harm in a third?
[22:21:45] gwicke: There are literally numbered option sections. What do you want?
[22:21:47] Marybelle: that's pretty much the situation with regards to the storage layer, yes.
that's what happens when investments into infrastructure are neglected.
[22:21:51] e.g. {{#pagemeta:Sandbox|pagesize}} and {{#pagemeta:Sandbox|disambig}} or some such.
[22:22:08] gwicke: Did you read https://www.mediawiki.org/wiki/?curid=647378#Option_3 and sections surrounding?
[22:22:11] Marybelle: but i'm not saying using templatelinks is a no-no. i'm saying it should be considered with care, as it's not free
[22:22:14] Krinkle: so whitelist the pageprops we allow?
[22:22:18] that seems pretty reasonable
[22:22:19] i think it's important to distinguish between pre-parse things, like page size, and post-parse things, like whether a page is a dab page
[22:22:32] legoktm: Not just whitelist, but also with stable return values. Not raw from the db.
[22:22:35] Yeah
[22:22:40] pre-parse things cause almost no harm at all, and some (like page size) can already be simulated
[22:22:41] Stable how?
[22:22:51] I apologize; I did not read the MW RFC carefully
[22:22:56] jackmcbarn: yes, but how do you know? in some cases, it's during-parse-added-by-an-extension
[22:23:00] gwicke: :-)
[22:23:04] E.g. cast to number or tinyint based on what the field is supposed to be.
[22:23:21] requiring a bit of knowledge of the individual field. Not just freeform string return of whatever is in there.
[22:23:21] DanielK_WMDE_: i lump during-parse in as post-parse. what i meant was pre-parse vs non-pre-parse
[22:23:31] legoktm: whitelist would make it a lot more sane.
[22:23:45] This would presumably be extendable for extensions as well (adding more fields and a callback for the processing of the value)
[22:23:50] Whitelist would accompany a generic parser function?
[22:23:54] so that disambig ext can add its field there.
[22:24:02] Marybelle: That's an idea yeah :)
[22:24:04] Like #getpageprop:foo|bar|baz where foo, bar, and baz are defined as whitelisted?
[22:24:26] Marybelle: No, the whitelist would be in the software, not in the wikitext.
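A minimal sketch of the whitelist idea raised above (22:22:14 to 22:24:26): the allowed fields live in software, each field carries a caster so callers get a stable typed value rather than the raw stored string, and an extension can register its own field with a callback. Everything here is hypothetical: the names `FIELDS`, `register_field`, and `get_page_meta`, the field names, and the assumption that a disambiguation flag is stored as a bare marker are illustrations only, not MediaWiki APIs.

```python
from typing import Callable, Dict, Optional, Tuple, Union

PropValue = Union[int, bool, str]

# Hypothetical registry: public field name -> (caster, default when unset).
# The whitelist is defined here in software, never in wikitext.
FIELDS: Dict[str, Tuple[Callable[[str], PropValue], PropValue]] = {
    "pagesize": (int, 0),
    # Assumption for this sketch: the flag is stored as a bare marker,
    # so its mere presence means "yes".
    "disambig": (lambda raw: True, False),
}


def register_field(name: str, caster: Callable[[str], PropValue],
                   default: PropValue) -> None:
    """Let an extension add a field together with its value callback."""
    FIELDS[name] = (caster, default)


def get_page_meta(raw_props: Dict[str, str], field: str) -> Optional[PropValue]:
    """Return a typed value for a whitelisted field, or None if not allowed."""
    entry = FIELDS.get(field)
    if entry is None:
        return None  # not whitelisted: internal props stay hidden
    caster, default = entry
    if field not in raw_props:
        return default
    return caster(raw_props[field])


if __name__ == "__main__":
    raw = {"pagesize": "1234", "disambig": "", "internal_thing": "x"}
    print(get_page_meta(raw, "pagesize"))
    print(get_page_meta(raw, "disambig"))
    print(get_page_meta(raw, "internal_thing"))
```

The sketch only covers the "stable interface" half of the discussion; it does nothing about cache invalidation or props that vary on purge, which the participants go on to raise.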
[22:24:31] i feel like whitelisting isn't really very helpful, since most of the props that Marybelle would want to be whitelisted are ones i have concerns with exposing
[22:24:31] I'm not sure whitelisting solves much.
[22:24:37] jackmcbarn: what pre-parse page-props are there? and are we talking about things that are actually in the page_props table? Because page size isn't there, is it?
[22:24:47] Krinkle: Right, but I don't see how that solves cache invalidation or wikitext volatility.
[22:24:48] jackmcbarn: Which ones are you concerned about and why?
[22:24:52] DanielK_WMDE_: not sure. i went off seeing it in Krinkle's example
[22:24:54] Marybelle: whitelisting would avoid access to nasty unstable things.
[22:25:00] It just hobbles a generic parser function.
[22:25:10] Krinkle: i'm concerned about all of them that can be changed by purging only without having to edit the actual page
[22:25:10] DanielK_WMDE_: Like {{CURRENTTIMESTAMP}}?
[22:25:31] We're pretty good at firmly closing the door after all the horses have escaped the barn.
[22:25:38] Marybelle: that just reads the real time clock, i think :)
[22:25:41] You mean that expansion of #time, can change with purge, and influence a page prop?
[22:25:47] yes
[22:26:01] Marybelle: it's evaluated before the revision is saved, so it can't be the *actual* time stamp
[22:26:09] The size is pre-pst, but yeah in theory someone could vary __DISAMBIG__ or DEFAULTSORT on the time.
[22:26:12] at which point a purge would change it
[22:26:14] That's a good point.
[22:26:42] Yeah, we wouldn't be retrieving it directly from TIMESTAMP, but since wikitext could pass it to something that is stored in a page prop..
[22:26:48] DanielK_WMDE_: You can use magic words with parser functions to create instability in any *links table.
[22:26:56] [[Category:{{CURRENTTIMESTAMP}}]]
[22:27:01] {{DEFAULTSORT:{{CURRENTTIMESTAMP}}}} but links tables... yeah
[22:27:06] it's already pretty broken
[22:27:17] It's already volatile.
It's not really broken.
[22:27:34] legoktm: But can it currently cause one purge to trigger multiple other purges?
[22:27:35] legoktm: yes, that's exactly one of the ways that you could cause problems if you could access props
[22:27:49] Krinkle: That sounds like templates.
[22:27:50] Krinkle: the problem isn't that it can trigger it, the problem is that it can't
[22:28:01] it would if we allow page prop access since those can vary.
[22:28:09] so there's times when you'd have to purge multiple pages just to make one page up to date
[22:28:10] I commented on the task
[22:28:11] Marybelle: Except templates only change when they are edited, not when they are purged.
[22:28:24] FYI:
[22:28:25] #link https://phabricator.wikimedia.org/P4806
[22:28:29] Purging a template may re-expand a timestamp, but it won't cascade.
[22:28:29] Krinkle: They definitely used to submit jobs without a real edit.
[22:28:41] Do you remember someone null-editing Template:! or whatever?
[22:28:44] that's the output of select distinct pp_propname from page_props --^^
[22:28:49] And it flooding the job queue for days?
[22:29:25] I think the behavior changed after it required someone tracking down the edit.
[22:29:36] Marybelle: null-edit != purge
[22:29:39] but yea
[22:29:40] DanielK_WMDE_: It's very frustrating to store page properties and then not be able to use them.
[22:29:51] Sounds like a bug. Neither a null edit nor a purge should result in a cascading purge.
[22:30:00] Krinkle: they don't
[22:30:09] there is a lot of frustrating things that nonetheless are as they are for a reason :)
[22:30:10] cascading as in, purging all pages that use it as a template.
[22:30:12] you can trigger a cascading purge manually though via the api
[22:30:33] Or you can just null-edit all the pages.
[22:31:04] Marybelle: but don't get me wrong. i'm not totally adamantly against it.
i tend to raise all issues i can think of even if in the end i'm saying something like "oh, that doesn't sound too bad" ;)
[22:31:23] DanielK_WMDE_: One option is to put sort keys into Wikidata.
[22:31:32] But I don't know if it's possible to get them back out.
[22:32:02] Or if storing the sort keys elsewhere would be reasonable anyway.
[22:32:05] i don't think that makes any sense
[22:32:11] the sort key is language-specific, no?
[22:32:28] it's specific to the *page*, not the thing described by the page
[22:32:41] so wikidata isn't a good place for this info
[22:32:49] I think there's a genuine use case for meta data that does not belong in Wikidata. I know that may sound controversial, but e.g. how many pages there are in a category, the size of a page's wikitext, the last author, whether it's a disambiguation page, that's not about the subject, it's about the page.
[22:32:50] I dont see why purging is an issue, we already have lots of experience dealing with that for templates
[22:32:51] DanielK_WMDE_: We already have page-specific data, as badges
[22:33:10] Krinkle: Local Wikibase?
[22:33:22] YairRand: Did badges finally get implemented?
[22:33:27] I thought they were controversial on Wikidata.
[22:33:43] YairRand: yea. badges can only be other items though, and the list of possible badges is configured.
[22:33:49] bawolff: Would you re-use templatelinks or make a new table?
[22:33:54] Badges have been implemented for years, yea
[22:33:55] Marybelle: Yes, badges were implemented a while ago
[22:34:11] i dont support putting data that functionally depends on a page in wikidata
[22:34:15] unfortunately guys, i have to go now
[22:34:24] bawolff: +1
[22:34:34] jackmcbarn: thanks for coming, jackmcbarn!
[22:34:39] i like templatelinks, less complexity to reuse, but no strong opinions
[22:34:48] jackmcbarn: Thank you!
[22:35:12] Krinkle: i agree, there is a use case for such meta-data. is there a compelling use case for such meta-data in wikitext?
[22:35:28] Storing in wikitext or re-using in wikitext?
[22:35:34] I think re-using in wikitext has clear use-cases.
[22:35:35] re-using.
[22:35:48] We store/choose page images, for example.
[22:35:54] Why can't I use that page image in an article?
[22:36:03] bawolff: should existing badges also be moved off of wikidata? if not, where's the line drawn?
[22:36:04] {{#getpageimage:Barack Obama}}
[22:36:05] circular dependency
[22:36:10] Marybelle: because that would be circular, if you do it on the page itself.
[22:36:12] gwicke: Same as templates?
[22:36:16] DanielK_WMDE_: ^
[22:36:21] We already have all these problems.
[22:36:30] templates can currently be processed in parallel, without circular dependencies
[22:36:31] have/had
[22:36:36] i honestly dont know what a badge really is
[22:36:45] gwicke: Page_title transcludes {{:Page_title}}.
[22:36:46] changing that would have major performance implications
[22:36:56] but i thought they were like interlanguage links
[22:36:59] We track dependencies and have a hard limit, right?
[22:37:10] bawolff: I think they're more like featured article badges?
[22:37:12] the preprocessor blocks recursive templates
[22:37:12] bawolff: things like featured article status are currently stored in wikidata, associated with the sitelinks
[22:37:14] e.g. something that is needed on all lang links
[22:37:27] so i think that makes sense in wikidata
[22:37:31] Ok, straw man time.
[22:37:32] Consider {{#pageprop:name|page}} with 1) a whitelist of props 2) no access to the page's own props 3) an entry in templatelinks.
[22:37:38] yes? no? maybe?
[22:37:49] * bawolff likes
[22:37:58] any data that can potentially be added by a template cannot be accessed by the same page during parse without introducing a circular dependency
[22:38:36] Computer science has grappled with circular dependencies previously, right?
[22:38:38] This isn't novel.
[22:38:59] nope, and the solution is predictable
[22:39:01] DanielK_WMDE_: Fine with me.
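The straw man at 22:37:32 can be sketched roughly as follows, assuming a toy in-memory prop store in place of the page_props table. `ParseContext`, `pageprop`, `ALLOWED_PROPS`, and the sample sort key are all hypothetical names and data invented for this illustration; this is not how MediaWiki parser functions are actually registered.

```python
from typing import Dict, Set, Tuple

# Hypothetical whitelist (rule 1 of the straw man).
ALLOWED_PROPS = {"defaultsort", "page_image"}

# Toy stand-in for the page_props table: (page title, prop name) -> value.
PropStore = Dict[Tuple[str, str], str]


class ParseContext:
    """State for one page parse, including the links it will register."""

    def __init__(self, current_page: str) -> None:
        self.current_page = current_page
        self.template_links: Set[str] = set()


def pageprop(ctx: ParseContext, store: PropStore, name: str, page: str) -> str:
    """Sketch of {{#pageprop:name|page}} under the three straw-man rules."""
    if name not in ALLOWED_PROPS:
        return ""  # rule 1: only whitelisted props are readable
    if page == ctx.current_page:
        return ""  # rule 2: no access to the page's own props
    # Rule 3: record the target as if it were transcluded, so an edit to it
    # queues a re-render of this page. The link is recorded even when the
    # prop is unset, so that setting it later still reaches this page.
    ctx.template_links.add(page)
    return store.get((page, name), "")


if __name__ == "__main__":
    store = {("Thomas W. O'Brien", "defaultsort"): "Obrien, Thomas W."}
    ctx = ParseContext("Talk:Thomas W. O'Brien")
    print(pageprop(ctx, store, "defaultsort", "Thomas W. O'Brien"))
    print(sorted(ctx.template_links))
```

Rule 2 as written only blocks the direct case. It does not catch indirect cycles across several pages, which is the concern raised at 22:37:58.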
[22:39:02] Marybelle: no, but that doesn't mean it's feasible to do at scale, with the current architecture.
[22:39:24] i dont see it as a big deal
[22:39:29] gwicke: namely? Don't Do That Then?
[22:39:45] that's the normal solution, yes
[22:39:51] tell users not to expect circular dependencies to be stable
[22:39:51] or try to break the cycle
[22:39:56] gwicke: When I look at the various ways that wikitext is already volatile, I have difficulty caring.
[22:40:05] or just dont allow it
[22:40:09] trying to break the cycle is a whole other can of worms, of course
[22:40:16] ftr, i'm kind of ok with the strawman, though i'm worried that it will cause a lot of pointless purging, if used pervasively.
[22:40:19] You can save pages that a "null edit" will trigger a new revision, for example.
[22:40:25] You can use parser functions and magic words in horrible ways.
[22:40:43] DanielK_WMDE_: For sort keys, you're talking about millions of uses.
[22:40:49] in any case, I don't see us sacrificing parallelism for a feature like this
[22:40:50] bawolff: it's hard to detect, if it's indirect
[22:40:58] Since every talk page will presumably want access to the subject-space's sort key.
[22:41:04] Marybelle: yes, that's what worries me
[22:41:13] gwicke: So we do nothing until when?
[22:41:37] Marybelle: so every edit to every page will then purge two pages instead of one. effectively doubling the rendering load. ugh...
[22:41:37] Oh, i guess for multipage cycles
[22:41:40] access to properties that are stored as separate metadata could work
[22:41:52] this is where we are headed longer term
[22:42:09] We are?
[22:42:10] Says who?
[22:42:10] yes. MCR saves the day again ;)
[22:42:10] we will also need better means for tracking fine-grained dependencies
[22:42:23] DanielK_WMDE_: Right after LiquidThreads and FlaggedRevs, I guess.
[22:42:27] And Flow.
[22:42:34] all those issues are not specific to this instance
[22:42:37] Marybelle: MCR will save Flow, too :D
[22:42:53] but they aren't simple ones, and it will take some time
[22:42:54] * bawolff sees this as basically equivalent to #ifexist
[22:42:55] No, seriously, there is commitment to doing this now.
[22:43:00] gwicke: We need many things. We have actual use-cases and problems today, tho.
[22:43:11] gwicke: Geological timescale?
[22:43:29] bawolff: yes, it seems to be pretty much the same to me, too.
[22:43:36] How much time? If I schedule an IRC meeting for January 2018 and there's no progress on storing metadata in some magical place, can we do something else instead then?
[22:43:37] it would actually make this more tractable if we didn't pile on more backwards compatibility issues
[22:43:50] Also, if we had a time machine.
[22:43:55] So. crazy idea.
[22:44:01] put something fake into templatelinks
[22:44:12] We could make another links table.
[22:44:16] like #defaultsort|Whateverpage
[22:44:17] It doesn't have to use templatelinks.
[22:44:19] ^
[22:44:26] new links table if we're going to do something weird
[22:44:29] and purge on *that* when defaultsort changes
[22:44:40] heh
[22:44:51] Otoh its also like per page REVISION magic words which were vetoed back in the day
[22:44:54] ok, new link table.
[22:44:56] How often do you all think sort keys change?
[22:45:09] a lot less often than normal edits
[22:45:15] Sure.
[22:45:19] been there, done that, broke the site...
[22:45:23] but the proposed scheme would purge on each edit
[22:45:43] that's the "fine grained" part
[22:45:49] which is missing
[22:45:54] if we go for a separate link table, it would of course contain the propname
[22:45:59] and only purge when that prop changes
[22:45:59] Or it could just be stale for a while.
[22:46:03] like wbc_entity_usage
[22:46:33] We could also just change MediaWiki behavior.
[22:46:49] So that Talk pages sort under the subject-space sort key all the time.
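The "separate link table that contains the propname" idea (22:45:54 to 22:45:59) could look roughly like the sketch below: usage is recorded per (target page, prop name), and an edit only purges the pages that read a prop whose value actually changed. `PropUsageTable` and its methods are hypothetical names for this illustration, and the in-memory dictionary stands in for a database table; it is not the schema of wbc_entity_usage or of any existing MediaWiki table.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class PropUsageTable:
    """Toy stand-in for a links table keyed on (target page, prop name)."""

    def __init__(self) -> None:
        self._users: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def record_usage(self, user_page: str, target_page: str, prop: str) -> None:
        """Note that user_page read `prop` of target_page during its parse."""
        self._users[(target_page, prop)].add(user_page)

    def pages_to_purge(self, target_page: str,
                       old_props: Dict[str, str],
                       new_props: Dict[str, str]) -> Set[str]:
        """Return only the pages that used a prop whose value changed."""
        changed = {
            prop
            for prop in old_props.keys() | new_props.keys()
            if old_props.get(prop) != new_props.get(prop)
        }
        affected: Set[str] = set()
        for prop in changed:
            # .get() avoids creating empty entries for unused props.
            affected |= self._users.get((target_page, prop), set())
        return affected


if __name__ == "__main__":
    table = PropUsageTable()
    table.record_usage("Talk:Example", "Example", "defaultsort")
    # An ordinary edit that leaves the sort key alone purges nothing extra.
    print(table.pages_to_purge("Example",
                               {"defaultsort": "Example"},
                               {"defaultsort": "Example"}))
    # Changing the sort key purges the talk page that reads it.
    print(table.pages_to_purge("Example",
                               {"defaultsort": "Example"},
                               {"defaultsort": "Example, The"}))
```

Compared with reusing templatelinks, this avoids re-rendering the talk page on every edit to the subject page, at the price of a new table and the comparison step on save, which is the cost the participants weigh later in the meeting.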
[22:46:55] That wouldn't solve the other use-cases.
[22:46:58] Like page images.
[22:47:00] Marybelle: yes. that's the alternative. No templatelinks (or other links), just make the page expire after... what. 1 day? 7 days? the default is 30, no?
[22:47:18] DanielK_WMDE_: Parser cache expiry? No idea.
[22:47:24] In my experience, never.
[22:47:29] touch.py
[22:48:04] We're three-fourths through this time slot.
[22:48:11] #info We could also just change MediaWiki behavior. So that Talk pages sort under the subject-space sort key all the time.
[22:48:11] Have we accomplished anything?
[22:48:32] I don't know - did you get any question answered that you wanted answered?
[22:48:51] I would honestly recommend to take a second look at the actual use cases, and see if they absolutely need general access to random page properties
[22:48:55] I'm only willing to invest time into something that I think can get deployed.
[22:49:03] It sounds like there are no solutions that are deployable currently.
[22:49:09] So I'll probably go with option 4 (do nothing).
[22:49:30] gwicke: There are three use-cases: page image, disambiguation status, and category sort key.
[22:49:33] Marybelle: i think the "sort talk pages under the subject page's sort key" actually has merit.
[22:49:35] I like Daniel's proposal about sort keys
[22:49:38] for example
[22:49:51] gwicke: that was Marybelle's, i just echoed it
[22:50:08] ah, right- sorry
[22:50:19] so yes, i also think that is a pretty good one
[22:50:40] having a generic parser function is very tempting, i can see that
[22:50:40] And leave page images and disambiguation status for a different day?
[22:50:54] page image and disambig status seem likely to become separate metadata
[22:51:03] DanielK_WMDE_: I thought when we decided to store page properties that we would actually re-use them.
[22:51:09] but i can also see that it will be used a lot, and will cause a lot of load for not much benefit, unless we add special purpose dependency tracking for it
[22:51:21] Marybelle: we do! a lot! in code.
[22:51:22] Used a lot and not much benefit?
[22:51:28] That's a contradiction.
[22:51:33] unless we determine that we still need to support manipulating them from templates
[22:51:34] If it's used a lot, it has a lot of benefit.
[22:51:50] Currently people are using a .NET program to duplicate the sort keys on the talk page.
[22:51:52] Like come on.
[22:51:59] Marybelle: most purging/re-rendering would be spurious, since most edits do not affect the respective page prop.
[22:52:08] that's what i meant by "pointless".
[22:52:23] We already do a whole lot of pointless purging.
[22:52:30] And most pages aren't edited that much, FWIW.
[22:52:34] yea!
[22:52:41] let's not make it worse by factor 2!
[22:52:46] we currently re-render about 400 pages per second
[22:52:56] And yet I constantly find stale pages.
[22:52:59] that's keeping a non-trivial part of the cluster busy
[22:53:13] Between old HTML cache and old parser cache.
[22:53:19] Can we set out requirements of what any solution has to do? Like prevent against recursion, have proper cache invalidation via the job queue, etc.
[22:53:25] And garbage *links entries.
[22:53:42] if we wanted to re-render everything within a couple of minutes, we'd need *a lot* more hardware
[22:53:59] remember, single edits can touch millions of pages
[22:54:08] legoktm: add to that "allow access only to a select set of page props".
[22:54:30] I wonder what an IRC meeting to discuss implementing a pagelinks table would've been like.
[22:54:44] 1) Prevent against recursion 2) Have proper invalidation via the job queue/whatever method 3) Only allow whitelisted page properties
[22:54:54] Anything else?
[22:55:01] Marybelle: at an educated guess, using templatelinks for purging, the sortkey use case, if used on all talk pages, would add 10 to 20 percent to the rendering load. maybe more.
[22:55:04] I think a new *links table should meet all of those
[22:55:17] don't allow access to anything that could come from templates
[22:55:19] in the same page
[22:55:30] nor to anything that could come from wikitext
[22:55:43] gwicke: i don't see that as a hard requirement. it would return garbage in some cases. not do real damage
[22:55:44] So don't use wikitext?
[22:55:52] You know the whole site is built on wikitext, right?
[22:56:03] It'd be cool if we could, like, leverage that.
[22:56:08] DanielKWMDE: lol
[22:56:08] legoktm: yes, with a new tracking table, it would work, i think. that's a pretty high cost, though.
[22:56:13] easy implementation: return random values
[22:56:17] hehe
[22:56:24] that would avoid the cycles
[22:56:45] ok, 5 minute warning
[22:57:25] gwicke: how about a nice, generic, central system for tracking dependencies of rendered artifacts?...
[22:57:42] DanielK_WMDE_: what's the high cost exactly?
[22:57:44] Didn't MCR get postponed indefinitely?
[22:57:52] I could swear I heard that.
[22:58:02] And that was related to some content model schema change.
[22:58:10] DanielK_WMDE_: it's clear that we'll need that sooner rather than later
[22:58:34] legoktm: developer time, lines of code to maintain, dba time, storage, i/o. it *extremely* high. but if the use case isn't very compelling, probably not worth it.
[22:58:39] err,
[22:58:45] *not* extremely high
[22:58:57] but even if we had that, we probably wouldn't want to use it with raw page properties as-is
[22:59:03] gwicke: +1. this rfc is yet another reason to have it
[22:59:13] DanielK_WMDE_: Have what?
[22:59:21] DanielK_WMDE_: If Marybelle is willing to take that cost on, and has a worthwhile use case (I think so), then the cost seems reasonable.
[22:59:23] What magical thing are you all building and when will it be available to use?
[22:59:26] legoktm: with a generic dependency tracking system, the cost would be much lower
[22:59:33] All features have cost, i wouldnt call this proposal excessively high
[22:59:35] A dependency tracking factory.
[22:59:47] We'll be pumping out dependencies in no time.
[22:59:47] legoktm: Marybelle can only take the first in that list, right?
[23:00:02] Marybelle: https://phabricator.wikimedia.org/T102476
[23:00:32] This sounds like the PubSubHubBub nightmare/rabbit hole.
[23:00:35] bawolff: i'm personally undecided on that question. it just may be worth it. or not.
[23:00:41] anyway
[23:00:43] the time is up
[23:00:47] any last comments?
[23:00:51] Can we do a #agree on what the requirements are?
[23:00:53] Everything is terrible.
[23:00:55] Or are we not even agreed upon that?
[23:01:00] it sounds like we have pretty broad agreement on looking into solving the category sort issue separately
[23:01:07] Marybelle: no, everything is insanely complex :)
[23:01:13] I'm sure RobLa will be moving https://phabricator.wikimedia.org/T102476 forward any day now.
[23:01:27] Sigh.
[23:01:35] gwicke wants it
[23:02:04] legoktm: i think we agree on the requirements you listed earlier. not sure we agree that the list is complete, though
[23:02:09] > GWicke lowered the priority of this task from "High" to "Low".
[23:02:24] #agree minimum requirements are 1) Prevent against recursion 2) Have proper invalidation via the job queue/whatever method 3) Only allow whitelisted page properties
[23:02:51] I hope that's helpful to Marybelle
[23:02:55] i actually think this was a useful discussion. even if it may sound disheartening. we have assessed the cost of implementing the feature as requested, and have considered alternatives.
[23:02:57] but I need to go now, bye
[23:03:08] legoktm: Bye, thank you!
[23:03:18] i'll summarize tomorrow. unless Marybelle wants to do that.
[23:03:21] #endmeeting
[23:03:22] Meeting ended Wed Jan 25 23:03:21 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[23:03:22] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-01-25-22.02.html
[23:03:22] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-01-25-22.02.txt
[23:03:22] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-01-25-22.02.wiki
[23:03:22] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-01-25-22.02.log.html
[23:03:35] DanielK_WMDE_: I'm not sure about useful.
[23:03:55] Everything discussed was pretty much already captured on the wiki page, as far as I can tell.
[23:04:21] I have no interest in summarizing.
[23:05:30] do you have interest in a summary, though?
[23:06:08] I'm not sure what a summary would provide that isn't already at https://www.mediawiki.org/wiki/?curid=647378
[23:06:27] Except maybe the part about just making talk pages sort better.
[23:06:30] Better/smarter.
[23:07:16] thoughts on a separate link table. Thoughts on circular dependencies. Estimates of spurious purging. And yes, consideration of addressing one of the use cases in a different way.
[23:07:28] I do think the separate link table is a viable solution btw.
[23:08:08] It's just not free. I don't have the info to see all costs/benefits. But it may be worth finding that out.
[23:08:56] Marybelle: i have implemented such a separate links table, along with a purging mechanism. i know it's possible, i know it's not trivial, but i also know it may be worth it, depending on the use case.
[23:09:55] Marybelle: https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/docs/usagetracking.wiki
[23:51:47] Thanks all for the discussion today.