[19:09:37] mic is cutting out
[19:12:02] victorgrigas: Wrong channel!
[19:12:41] what is the correct channel?
[19:16:06] the irony... lecture about remoting is streaming to remotes in a bad quality ))) I hope the recording will be ok
[19:16:35] victorgrigas: -staff
[19:20:12] i see "before" and "after" slide right now. what was the change between them?
[19:41:46] any tips for people w/ pets?
[19:42:29] bgerstle: Use -staff please.
[19:42:36] ah whoops
[20:41:39] loved the talk ! thanks much!
[20:42:12] Loads of good content and tips we can apply.
[20:42:18] #wrongchannel
[20:42:40] * marktraceur is on the verge of moderating the channel
[20:43:20] marktraceur: #wikimedia-staff? you need an invitation to join seems like...
[20:43:45] Yeah, you should be set for that, I'll make sure
[20:44:29] nuria: Go now
[20:44:53] marktraceur: working now
[20:45:03] Yup.
[20:54:55] #wrongchannel2015
[21:00:54] #startmeeting
[21:00:54] TimStarling: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
[21:01:00] #startmeeting RFC meeting
[21:01:00] Meeting started Wed Apr 15 21:00:59 2015 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:01:00] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:01:00] The meeting name has been set to 'rfc_meeting'
[21:01:21] #topic Watch Categorylinks | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:01:29] #link https://www.mediawiki.org/wiki/Requests_for_comment/Watch_Categorylinks
[21:03:29] Kai_WMDE: hello
[21:03:37] hi there
[21:03:56] * gwicke implemented such functionality in a previous life
[21:04:02] hehe
[21:04:04] it looks like a nice feature
[21:04:16] i'm wondering whether it should go via the recentchanges table
[21:04:26] Hey, Kai_WMDE.
[21:04:27] the current implementation is based on echo, right?
[21:04:28] I'm on the Collaboration team (maintains Echo).
[21:04:36] question: how this is different from just watching category page? isn’t every addition/deletion a change for category page?
[21:04:41] correct, DanielK_WMDE
[21:04:46] DanielK_WMDE, what current implementation?
[21:04:49] SMalyshev, no.
[21:04:50] SMalyshev: it isn't. that's exactly what kai wants to fix
[21:05:12] I'm a little concerned this is going to result in too many notifications.
[21:05:16] superm401: the one kai proposes. it's in progress afaik
[21:05:34] so this is just functionality so that watching category page actually does what it is expected to do?
[21:05:34] hey RoanKattouw
[21:05:40] Hey sorry for the delay
[21:05:41] What's the reason this is proposed to use Echo when the watchlist does not (mostly for the same reason, the watchlist is way more active than typical Echo traffic).
[21:05:45] SMalyshev: pretty much
[21:05:49] My desk is two floors away from the previous meeting's room
[21:05:57] o/
[21:06:13] SMalyshev: the question is mainly how to make it scale, not only on the database, but also in terms of ux
[21:06:25] I personally think that events like 'was added to category' would best be distributed through an event queue with pub/sub support
[21:06:36] "The users are able to configure the desired notification channel (email, web). Both of them are disabled by default to keep the current behaviour. The notifications can be held back for daily/weekly digests" -- these are proposed features, not existing features of echo?
[21:06:46] gwicke: that's what the recentchanges table is :)
[21:06:55] TimStarling, email/web is implemented. Bundling is implemented. Daily/weekly digests not at all.
[21:07:01] what's the rationale for using Echo instead of the watchlist?
[21:07:10] +1, this is a key question for me as well.
[21:07:17] but yea, a generic pubsub infrastructure would be *very* nice
[21:07:38] I wonder though what happens if somebody adds a popular category to a widely used template…
[21:07:40] * aude waves
[21:07:52] SMalyshev: lots of notifications :)
[21:07:59] SMalyshev: that is a possible drawback i mentioned in the rfc
[21:08:00] DanielK_WMDE, there is already a hook. You could implement pub/sub today.
[21:08:09] I see that as separate from the RFC.
[21:08:10] well, we do have pubsub for recentchanges
[21:08:16] exactly
[21:08:17] I wonder if template ones shouldn’t be separate setting...
[21:08:32] There's also a hook specifically for CategoryAdded/CategoryRemoved.
[21:08:35] it's not implemented using pubsub, it's just a hook that extensions can plug pubsub support into
[21:08:36] SMalyshev: templates are syntactical macros. you can't differentiate
[21:08:57] one important bit is buffering in a queue
[21:08:59] afaik the hook is called on every category embedding that template as well
[21:09:07] SMalyshev: the parser doesn't (always/reliably) know whether something came from a template
[21:09:07] so that bursts of activity can be absorbed
[21:09:18] isn't echo already utilizing a job queue?
[21:09:50] the big difference between watchlist and echo is that watchlist is pull and echo is push, right?
[21:10:05] yeah ^
[21:10:15] is it, internally? you register handler functions to echo?
[21:10:19] the recentchanges table performs that push/pull inversion, it is the event queue
[21:10:21] or do you mean on the user level?
[21:10:30] Kai_WMDE, no, not in production.
[21:10:35] Except test wikis and MW.org
[21:10:40] i see
[21:10:42] Kai_WMDE: i thought echo uses redis directly, but i could be wrong. perhaps superm401 knows better
[21:10:43] I mean RecentChange pushes events into the recentchanges table, which is effectively a queue
[21:10:52] then it is presented to the user as a pull
[21:10:55] DanielK_WMDE: no it just uses a database
[21:11:06] TimStarling, can we start with why this is proposed to use Echo?
[21:11:10] so... pretty much the same as recentchanges
[21:11:16] recentchanges is efficient because it has a single queue per wiki
[21:11:34] we don't duplicate events by the number of listening users
[21:11:38] it also only has a single event type
[21:11:47] not really
[21:11:51] there is a type field
[21:11:55] gwicke: no, it has rc_source so you can put it other stuff
[21:12:03] put in*
[21:12:10] though it currently only supports a handful of hard coded types
[21:12:22] so we'd add 'page x became member of category x' in there?
[21:12:25] custom types and type categories can be defined
[21:12:40] (even by other extensions)
[21:13:03] gwicke: why not? there's a blob field too, for the extra info you need
[21:13:13] that would potentially be a lot of events
[21:13:16] more than edits
[21:13:28] yeah, I don't see how that could perform well
[21:13:37] TimStarling: via template edits, yes
[21:14:10] gwicke: i also see the performance issue. it's the same whether we use echo or rc, right?
[21:14:21] DanielK_WMDE, it's not the same at all.
[21:14:22] so events triggered by template edits definitely have to be filtered/ignored
[21:14:45] superm401: echo would be worse if it has one queue per user. or am i missing something?
[21:14:55] #info Kai_WMDE: so events triggered by template edits definitely have to be filtered/ignored
[21:15:00] The Echo table has an event entry for every user/event pair.
[21:15:06] Kai_WMDE: yes, but I don't see a good mechanism to do this
[21:15:07] that'd be a work-around
[21:15:30] superm401: so echo is a lot worse for events targeted at a lot of people
[21:15:45] DanielK_WMDE, it's not designed for a watchlist scenario.
[21:15:47] DanielK_WMDE: why is it hard to filter template changes?
[21:16:10] the hook doesn't pass that info
[21:16:21] it has to be determined, which might be expensive as well
[21:16:23] As far as I know, the watchlist is just implemented by filtering the RC table on titles in the user's watchlist table. So the only per-user data stored is the watchlist table, which is solely user->page.
[21:16:25] TimStarling: when the page including the template is re-parsed, how do you know which categories come from the template?
[21:17:01] distinguish by who calls LinksUpdate
[21:17:13] if it is the main reparse on save, emit RC events
[21:17:22] so, some calls to LinksUpdate would just not trigger the notifications?
[21:17:28] DanielK_WMDE: rc_source is not for hardcoded things
[21:17:28] yes
[21:17:34] * aude catching up
[21:17:34] or have LinksUpdate return an event list
[21:17:41] or trigger only for those that explicitly asked for it
[21:17:46] which the caller will either dispatch to RC or not
[21:17:49] rc_type was hardcoded things and is essentially deprecated
[21:18:10] aude: i wous thinking of rc_type
[21:18:13] *was
[21:18:57] but wasn't that replaced by rc_source?
[21:19:15] legoktm, it's in-progress/stalled.
[21:19:15] legoktm: yes
[21:19:19] there's also rc_log_type, it's all very confusing :)
[21:19:23] the old thing is not gone yet
[21:19:26] https://phabricator.wikimedia.org/T74157
[21:19:40] but people should stop using it (e.g. core, wikibase)
[21:19:48] so... is there consensus that this should use RC rather than echo, because echo doesn't scale well in a multicast scenario?
[21:20:16] DanielK_WMDE, I would favor that. Also from a user experience point of view, Kai_WMDE has not explained why they want to use Echo.
[21:20:50] +1 on using watchlist/rc
[21:21:04] superm401: i assumed there was some kind of job queue
[21:21:16] Kai_WMDE, there is, but it's an optional feature not turned on everywhere.
[21:21:45] Only problem with watchlist is that you might want two ways to filter ("Only show things affecting categorization" vs. "Only show things in namespace X that are added to/removed from categories").
[21:21:53] what happens if someone adds, say, 10000 categories to a page in a single edit?
[21:21:57] If desired, that could be done by improving the watchlist UI (though it might be a little complicated).
[21:22:05] should the edit just be rejected?
[21:22:09] TimStarling: as a DoS attack?
[21:22:15] I'm still sceptical about performance, and consider filtering a bit of a hack
[21:22:23] It definitely can't be rejected, it would break templating.
[21:22:29] superm401: 10k categories??
[21:22:46] Oh, 10,000 categories to a page, I thought you meant 10,000 pages affected by a cat change to a template.
[21:22:48] deliberate or accidental DoS, people do strange things sometimes
[21:22:49] My bad.
[21:23:05] they do indeed :P
[21:23:08] TimStarling: either reject it or only send notifications for the first 100 or other arbitrary number?
[21:23:09] Yeah, that could be rejected.
[21:23:20] maybe there should be a limit on how many notifications are sent?
[21:23:24] if you truncate the list, then what happens when the change is reverted?
[21:23:40] notification limitation is also a feature e
[21:23:42] Does anyone actually still favor using Echo for this?
[21:23:43] of echo
[21:23:57] TimStarling: notifications about the revert would again be truncated, so they would only be sent for the first 100 or so
[21:24:02] Kai_WMDE, what limit are you referring to in Echo?
[21:24:21] configurable maximum number of notifications
[21:24:39] Kai_WMDE, total number of notifications per user?
[21:24:43] yes
[21:24:51] I think people are more talking about number of notifications triggered by an edit (across all users).
[21:25:30] probably truncation is the best we can do
[21:25:33] It might make sense to use a different table, so as not to flood recentchanges, but a similar model.
[21:26:01] recent_category_changes, and then keep the watchlist table as is.
[21:26:05] Not sure if that would help.
[21:26:07] well, there is worst case event flow, like adding 10k categories to a page
[21:26:15] and there is typical event flow, which is probably more relevant
[21:26:25] I don't know if typical event flow will actually flood RC
[21:26:37] especially if we don't log updates due to template changes
[21:26:39] from a user perspective I'd prefer a collapsed, but complete listing for template changes that add a category to many pages
[21:26:59] gwicke: enhanced watchlist?
[21:27:02] * DanielK_WMDE is annoyed by Konversation giving superm401, TimStarling and Kai_WMDE all the same color.
[21:27:27] DanielK_WMDE, yeah, I hate that you can't reconfigure the bucketing.
[21:27:31] legoktm: it sounds like it would be incompatible with truncation / filtering though
[21:27:41] TimStarling, if we just ignored template changes a lot of things would get missed.
[21:28:53] gwicke: so, since the change event would be tied to the category, it could be a single event (single row in rc) for all the pages?
[21:29:08] there is rc_params, you can fit a lot of bytes in there
[21:29:22] DanielK_WMDE: maybe, yeah
[21:29:42] if we had an event in MW marking the end of all refreshLinks jobs for a given edit, that would be feasible
[21:29:43] definitely a lot more efficient
[21:29:55] assuming we wanted the RC event to fire at that time
[21:29:57] the problem is of course that currently, the hook would be fired by each page individually, when each gets re-rendered.
[21:30:24] TimStarling: isn't there a concept of sub-jobs?
[21:30:34] yes, it just doesn't do this exact thing
[21:30:36] wouldn't that allow a bracket around all the links update jobs?
[21:30:45] maybe it could be extended to do that
[21:30:49] yes
[21:30:53] there's a root job id
[21:31:05] is aaron around?
[21:31:18] If we don't need to filter on the page side (only show Project namespaces pages that are added to categories I'm watching), the rc_params idea could work.
[21:31:29] Kai_WMDE: did we lose you along the way?
[21:31:30] 356 pages were added to Category:Foo, which you're watching.
[21:31:33] I don't think it's straightforward for a job runner to figure out when the last job of a split job is processed though
[21:31:36] And show the exact pages if it's a small number.
[21:31:38] there are retries etc to consider
[21:31:47] DanielK_WMDE: He's on IRC... can invite him, if needed
[21:31:52] I'm following :)
[21:32:15] hoo: he might have ideas on how to best do this
[21:32:40] anyway...
[21:32:53] Hook is not necessarily the right way to do this.
[21:33:12] there is two extreme cases: a category being added to a lot of pages via a template. could be ignored. and a lot of categories being added to a page. could be truncated.
[21:33:15] but, the root job could have a link to an id or the like that makes it possible to look up the list of affected pages separately
[21:33:33] and jobs could just use the same id to extend the list of affected pages
[21:33:56] gwicke: could use the revision id to tie them all together
[21:33:57] so rc_params would have the id rather than the actual titles
[21:34:09] I think ignoring changes caused by a template reduces the utility. At the very least, need to show the root change "Template:Foo was added to Category:Bar", even if the consequent changes are ignored.
[21:34:19] DanielK_WMDE: yup
[21:34:47] gwicke: yeah, it is possible, at the expense of at least 3x longer development time on top of Kai's proposal
[21:34:50] superm401: that's not necessarily true, categorization is often done in sections
[21:35:11] DanielK_WMDE, true, so in some cases that root change could not be shown.
[21:35:20] #info I think ignoring changes caused by a template reduces the utility. At the very least, need to show the root change "Template:Foo was added to Category:Bar", even if the consequent changes are ignored.
[21:35:55] superm401: since it doesn't affect the category, it wouldn't be a change to that category. there would still be the edit to the template itself recorded in rd
[21:35:56] TimStarling: not sure; with such a scheme it might be feasible to keep everything in recentchanges
[21:35:58] *rc
[21:36:05] and only move out the affected pages
[21:36:21] each template edit would only cause a single RC entry
[21:36:36] #info However, with , the template itself is not always affected.
[21:36:50] a long time ago I lobbied for having a timestamp in categorylinks
[21:37:00] yeah, what you are saying is possible, and 3x longer is not the end of the world
[21:37:27] if that's still there, maybe that could even be used as a cheap way to figure out which pages were affected
[21:37:43] revid would be better of course
[21:38:12] I propose a baseline implementation: make RC entries when pages are added to / removed from a category. cut off if there is too many categories being added/removed. ignore hook calls if the parse was triggered by a template edit.
[21:38:18] categorylinks has always had a timestamp
[21:38:32] hehe
[21:38:37] by long ago do you mean like a decade?
[21:38:37] not always ;)
[21:38:39] that would give us a baseline that is easy enough to implement, easy to expand on, and safe enough
[21:38:54] TimStarling: yeah, roughly
[21:39:05] 2005 sounds about right
[21:39:12] :P
[21:39:19] gwicke thinking ahead as always
[21:39:41] DanielK_WMDE, baseline should also specify how it's exposed in the user interface.
[21:39:43] I used it for a category-based news feed for wikinews back then
[21:40:33] for this use case the timestamp wouldn't necessarily be unique
[21:41:51] superm401: in the watchlist, as usual. but i'm not sure how you'd subscribe to these events
[21:42:05] watch the category
[21:42:14] what would happen if the watchlist looked for all pages in watched categories added after last watchlist TS in categorylinks?
[21:42:29] TimStarling: no way to watch just the category page, not the contents?
[21:42:40] i mean, i'm all for it, but people *will* complain
[21:42:40] no
[21:42:57] meh
[21:43:12] gwicke, what about "removed from category"?
[21:43:20] we'll be killing some uncommon but important use case :)
[21:43:22] this is a 4 digit bug number we're solving here
[21:43:24] superm401: yeah, that part would work less well
[21:43:24] people will love us
[21:43:27] :)
[21:43:43] superm401: be positive, only show new shiny categories ;)
[21:43:44] I don't think "watch cat page, not contents" is needed (it might be if we went the Echo route).
[21:43:53] TimStarling: i was thinking of the visible whining minority :)
[21:44:07] On the watchlist, we could probably allow filtering ("Don't show categorization changes").
[21:44:08] but yea, i'm pretty sure most people will love it
[21:44:31] superm401: yes, you are right. except you have to implement this twice
[21:44:41] because there are two watchlist implementations
[21:44:41] DanielK_WMDE, implement what twice?
[21:44:49] Oh, enhanced?
[21:44:51] yea
[21:44:55] isn't it lovely?
[21:44:58] whether there is some important use case for watching category page contents is a question for product managers, right?
[21:45:09] DanielK_WMDE: more than 2
[21:45:16] DanielK_WMDE, oh, believe me, we know plenty about watchlist and RC proliferation from Flow.
[21:45:17] mobile stuff, feeds, etc
[21:45:19] oh rly?
[21:45:27] * aude thinks of feeds as another view
[21:45:30] then api
[21:45:35] TimStarling, there are no product managers for "general core" anymore, right?
[21:45:38] aude: true
[21:46:30] implementing this filter in all the right places is going to be harder than the actual functionality >_<
[21:46:32] * aude wants some of this code more unified
[21:46:41] and vary where needed on presentation things etc
[21:46:56] yes, we should take that opportunity to consolidate this somewhat
[21:47:05] baby steps..
[21:47:17] yet, also remember recent changes / related changes share code with watchlist
[21:47:19] so I think we are...
[21:47:31] which is good yet can present challenges
[21:47:36] #agreed use RC not echo
[21:47:36] aude: maybe you could work with kai directly on this
[21:48:05] maybe though have bazillion things to do :o
[21:48:20] but can help with questions, at least or something
[21:48:46] #info Two basic implementation options: "baseline" with refreshLinks ignored, and "full" with refreshLinks merged per edit, end of job batch event
[21:48:48] yea, i was mostly thinking of helping him to dig in when needed
[21:49:01] * aude nods
[21:49:32] #info UI question unresolved: is it necessary to continue to allow watching of category page contents only
[21:49:39] TimStarling, RC line per cat-page pair, or one per category?
[21:49:45] Latter with rc_params.
[21:49:49] #info watching a category page subscribes to pages being added/removed. filtering should be possible in all the various rc/watchlist feeds
[21:49:58] #info alternative to end of job batch event would be to point from RC entry to ID and lazily store information about changes from jobs under ID somewhere
[21:50:31] gwicke: a mutable event?
[21:50:49] no, an event storing an id of associated information
[21:50:51] if you see the RC entry early, it will not have all the page changes in it
[21:50:52] which is built lazily
[21:51:08] a delay + coalescing mechanism, maybe...
[21:51:35] resourcing?
[21:51:43] end of batch is difficult to establish afaik
[21:52:12] we've had parsoid update jobs being retried for about a year
[21:52:32] not sure if they are still in there
[21:52:36] TimStarling: wmde committed to implement the feature, though the resourcing is limited
[21:52:57] anyone from WMF here interested in helping?
[21:53:07] Kai_WMDE: how much time is planned for this, do you know?
[21:53:11] superm401?
[21:53:29] TimStarling, I can be available to provide support/code review.
[21:53:42] DanielK_WMDE: since the concept isn't completed, there is no time planning yet
[21:54:39] DanielK_WMDE: WMDE will still be committed if we require the full template-aware feature?
[21:55:47] TimStarling: help from WMF with the job queue side of things would be very much appreciated in that case, I think
[21:56:03] ok
[21:56:26] RFC status?
[21:56:47] needs to be updated, i guess
[21:56:57] rfc itself
[21:57:07] TimStarling: do you think we need another discussion about the updated rfc, or can kai just go ahead?
[21:57:18] so, is this something that WMF would use?
[21:57:22] Might be good to have 15 minutes to review updated RFC in the future.
[21:57:33] gwicke: definitely think so
[21:57:35] gwicke: yes
[21:57:40] something the community wants
[21:57:59] WMDE are doing this because the German Wikipedia wants it
[21:58:18] I see; what are the minimum requirements for a WMF deployments then?
[21:58:25] *deployment
[21:58:33] just baseline, or category support?
[21:58:38] I guess we can decide that in a second meeting
[21:58:50] gwicke, you mean baseline or template support, right?
[21:58:51] also we should have a wikitech-l post
[21:59:00] superm401: yeah, sorry
[21:59:05] multitasking ;)
[21:59:14] TimStarling, summarizing the meeting? I can send that if you want.
[21:59:35] superm401: yeah, especially the two open design questions
[21:59:42] if our minimum requirement is template support, then I think it would be good to see whether we'd have the resources to make that happen
[21:59:57] before starting work
[22:00:14] TimStarling, two being:
[22:00:21] 1. Whether to ignore template changes, or merge them at batch end?
[22:00:34] 2. Whether to allow watching only cat page, while ignoring categorization/decategorization
[22:00:36] ?
[22:00:40] yes
[22:01:05] I'm glad we had a full hour for this RFC
[22:01:13] simpler: ignore the individual updates triggered by a template edit, and just generate an update for all pages that include the template
[22:01:26] we're out of time now, thanks for coming everyone
[22:01:27] DanielK_WMDE, wasn't that the batch end option?
[22:01:28] much easier than collecting info from multiple events
[22:01:37] thanks for writing the RFC, Kai_WMDE
[22:01:50] Yep, thanks.
[22:01:50] thank you all
[22:02:01] superm401: well, batching jobs/events after they have been queued is much harder than posting a single "bulk" event directly
[22:02:35] DanielK_WMDE, but how do you know what to put in the bulk event. In theory, a template could cause every page transcluding it to go into a different cat, right?
[22:03:07] #endmeeting
[22:03:09] Meeting ended Wed Apr 15 22:03:07 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:03:10] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-04-15-21.00.html
[22:03:10] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-04-15-21.00.txt
[22:03:10] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-04-15-21.00.wiki
[22:03:10] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-04-15-21.00.log.html
[22:03:16] superm401: yeah, good point
[22:03:20] superm401: if the category name includes parameters, then yes. this isn't uncommon. needs more thought
[22:03:26] anyway, time for bed
[22:03:28] ttfn
[22:03:36] goodnight!
[22:03:39] Have a good night.
[22:03:45] even a simple conditional in the template can assign different categories