[21:57:42] just about to start https://phabricator.wikimedia.org/E138 at the top of the hour
[22:00:45] #startmeeting RFC meeting
[22:00:45] Meeting started Wed Feb 3 22:00:45 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:00:45] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:00:45] The meeting name has been set to 'rfc_meeting'
[22:01:04] #topic Expiring watch list entries | RFC meeting | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[22:01:11] #link https://phabricator.wikimedia.org/T124752
[22:01:16] o/
[22:01:20] \o
[22:03:09] hello
[22:03:13] the first question on the key questions list is "Should the expiry date be its own field or should a properties field be introduced?"
[22:04:01] So, I think that one has basically been answered. Although per discussion on T124752 a properties field might be useful further down the line (for other features)
[22:04:36] is there a UI proposal?
[22:05:34] #info re: expiry date: addshore believes it's basically been answered, though a properties field might be useful further down the line
[22:05:39] the first thing that concerns me about this is that we are introducing user-visible complexity, so the question is whether we can manage that in the UI without a lot of clutter
[22:05:39] FWIW, I am inclined to agree with MZMcBride that while this feature would be useful, it is not solving the problem at the right level of generality. Items in watch lists can and should be differentiated using any number of criteria.
[22:05:52] i think there are quite a few unanswered UI questions. but it seems like we'll need an expiry date field in the watchlist table in any case. the question is really whether that is sufficient, or whether we should add more fields right away.
[22:06:14] for instance, if we want to be able to reset the expiry period, we have to somehow know by how much.
[22:07:18] I think this RfC (which -- full disclosure -- I co-authored) from 2013 is the right approach: .
[22:07:24] #info question discussed: is this solving the problem at the right level of generality?
[22:07:36] ori: so we may want another field, for named sub-watchlists? or join a watchlist_tags table?
[22:07:49] DanielK_WMDE: yes.
[22:08:06] jynus: I thought you might want to be involved since there is a proposal for DB changes, and I know watchlists can cause performance problems when they have a lot of pages
[22:08:13] 1) use case 2) design
[22:08:20] and maybe another field, for things like whether to watch category membership changes and such...
[22:08:21] ori: do we need to decide which of the two we want before we can agree that we want an expiry field?
[22:08:37] Nothing you have said for now is good or bad without a concrete use case
[22:09:02] ori: or is that really an orthogonal proposal? at least on the database level, it seems unrelated to me. though in the UI, the two are closely connected
[22:09:11] if you want basic functionality, timestamps is ok, if you want more, needs more
[22:10:16] People add pages to watch lists for all sorts of reasons: to see if a recent change they had made will stick; to keep an eye on pages which they have authored (or substantially revised); to watch pages which are prone to vandalism or edit wars; to stay up-to-date with topics or discussions they are interested in.
[22:10:25] on the patch, addshore asked whether purging of expired watchlist items would be required
[22:10:26] So part of the RFC was exposing UI changes as a beta feature only
[22:10:52] while changing the watchlist, it may make sense to fully remake it and allow multiple watchlists
[22:10:54] my answer was yes, add an index and an autoincrement wl_id field to support batching, similar to recentchanges purging
[22:11:07] TimStarling: i think there should be a maintenance script for purging expired entries. if we want to run it, and how often, is another question.
[22:11:08] You can either add columns (and user-interfaces) for each of these use-cases, or try to provide some general feature that users can build workflows on top of.
[22:11:30] +1 for wl_id
[22:11:39] you can theoretically do batching without a wl_id field, but if we are talking about having options attached to a watchlist entry then it makes sense to be able to refer to it with an ID
[22:11:56] So in the current patch both wl_id and job based purging have been implemented
[22:12:47] #info: question discussed: what sort of database/maintenance overhead does this impose? will this require a maintenance script?
[22:12:56] ori: watchlist_tags plus maybe a wl_info blob should do. do you think another join would be ok? we are already often joining watchlist with recentchanges, I suppose.
[22:13:49] I think so, but I'll let jynus respond :)
[22:14:09] I don't think wl_info should be added until there is a use case for that
[22:14:23] * aude waves :)
[22:14:47] ...same for the watchlist_tags table, i guess. it's related, but should be done separately.
[22:14:51] I feel these can all be done independently
[22:15:08] i agree though that it would be very nice if i could set up multiple watchlists, and define an expiry period for each.
[22:15:30] The watchlist UI is already beyond horrendous; it needs to be significantly simplified.
Adding more things to it for expiry is a great use-case but I worry it is going to make them even worse without a plan to improve them more generally.
[22:15:32] addshore: for the baseline case, when and how would an expiry be set?
[22:16:15] *goes to find a ticket*
[22:16:32] addshore: it should at least become possible via the api, and displayed on Special:Watchlist/edit or whatever the watchlist management page is called.
[22:16:55] Yes, so currently in the patch expiry can be set through ApiWatch
[22:16:58] James_F: as the task points out, it is commonly requested, initially in 2006 in T8964, but it's common for power users to request terrible clutter without regard for the consequences
[22:17:25] UI changes to Special:Watchlist etc would be a separate change (and the API change could hold off until then)
[22:17:41] witness action=history when you have oversight permissions
[22:17:41] changes to the UI can be introduced as a beta feature and slowly worked on.
[22:17:47] TimStarling: Indeed. I'm not saying "no", just asking for ideas on how to make them better for all our users, not just the few power users that would actually use this.
[22:17:59] * James_F nods.
[22:18:13] addshore: i think it should at least be visible if an expiry date is set.
[22:18:16] Also there were some basic wireframes done at https://phabricator.wikimedia.org/T103309 a while back, although I imagine that would be far in the future
[22:18:49] DanielK_WMDE: indeed, hence holding off on the API change until Special:Watchlist is sorted would make sense
[22:19:40] yea, but perhaps adding *display* of expiry dates to the watchlist edit page is simple.
Figuring out the editing interface for expiry may take a while
[22:19:57] so i would suggest the first for the baseline, and keep the editing interface for later
[22:20:23] yup, and of course if the api is exposed I imagine people will create gadgets in the early stages
[22:20:43] then they can figure out the management interface themselves ;)
[22:20:44] it may be the case that power users need a more complicated interface than typical editors & readers, in the same way that most programming IDEs are more complicated than most software UI
[22:22:17] robla: Well, ish. Adding more complexity to a power-user interface means some of the power-users on the threshold of not using it (including editors who aren't active yet) will choose not to. It's not just the current population we have to worry about.
[22:22:24] Okay, so it sounds like patch 1 should add the internals, but not expose anything, patch 2 adds basic API support and viewing of expiries on Special:Watchlist, and then the UI can move forward from there
[22:23:20] about the API, it looks like we have strtotime() in there now, which allows relative dates
[22:23:29] yes
[22:23:37] I wonder if anomie or other people interested in the API have comments on that?
[22:24:08] So, the way the expiry is done in the patch is taken from the protection api currently (which also has expiries)
[22:24:31] right, should be uncontroversial then
[22:24:43] relative dates sound like a convenient feature to have in the api
[22:24:52] as long as we don't end up with relative dates in the database
[22:24:58] how synchronous would be the expiration?
[22:25:24] what do you mean, jynus?
[22:25:41] #info addshore> So, the way the expiry is done in the patch is taken from the protection api currently (which also has expiries)
[22:25:42] jynus: so all queries looking for watchlist items will take the expiry field into account (ignoring expired things) so immediate
[22:25:51] jynus: the current patch adds a wl_expiry>now condition to the selects
[22:25:54] would we need to filter by expiration date on show or would we assume if it is there it is not expired?
[22:26:01] purging will be done shortly after (though not immediate)
[22:26:14] that worries me because of 2 ranges, that ones
[22:26:34] if there is no index then it should be OK as long as the purging is done promptly, right?
[22:26:41] *that one, and the one the article changed
[22:26:48] mysql will just have to skip a few rows
[22:27:03] filter == order_by for performance purposes
[22:27:06] yea, we *could* rely on purging, if we have to. things may get watched for a few minutes more than intended. that shouldn't be a problem
[22:27:22] is there precedent for ephemeral data being stored in the database?
[22:27:25] maybe FORCE INDEX will be required to stop mysql shooting itself in the foot?
[22:27:36] DanielK_WMDE: yeh, doesn't sound too worrying having things watched for a short amount of time extra
[22:27:38] ori: recentchanges?
[22:27:47] ori: ipblock?
[22:28:02] #info question discussed: how quickly does expiry-based watchlist purging need to happen? does the feature need to rely on purging to work?
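[editor's sketch] The read-time filter and the lazy purge discussed above can be illustrated roughly as follows. This is a minimal sketch using an in-memory SQLite database, not the code from the patch: only the names watchlist, wl_id and wl_expiry come from the discussion, while wl_user, wl_title, the index name, the integer timestamps and the batch size are assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Create a simplified, hypothetical watchlist table (not the real MediaWiki schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE watchlist ("
        " wl_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # ID used for batching
        " wl_user INTEGER NOT NULL,"
        " wl_title TEXT NOT NULL,"
        " wl_expiry INTEGER)"  # assumption: NULL means the entry never expires
    )
    db.execute("CREATE INDEX wl_expiry_idx ON watchlist (wl_expiry)")
    return db


def watched_titles(db, user, now):
    """Read path: expired rows are ignored even if they have not been purged yet."""
    rows = db.execute(
        "SELECT wl_title FROM watchlist"
        " WHERE wl_user = ? AND (wl_expiry IS NULL OR wl_expiry > ?)"
        " ORDER BY wl_id",
        (user, now),
    )
    return [title for (title,) in rows]


def purge_expired(db, now, batch_size=2):
    """Maintenance path: delete expired rows in small batches keyed on wl_id."""
    purged = 0
    while True:
        ids = [
            wl_id
            for (wl_id,) in db.execute(
                "SELECT wl_id FROM watchlist"
                " WHERE wl_expiry IS NOT NULL AND wl_expiry <= ?"
                " ORDER BY wl_id LIMIT ?",
                (now, batch_size),
            )
        ]
        if not ids:
            return purged
        db.executemany("DELETE FROM watchlist WHERE wl_id = ?", [(i,) for i in ids])
        db.commit()  # commit per batch so each transaction stays short
        purged += len(ids)
```

Because the read path already filters on the expiry, the purge only reclaims space; running it late does not change what a user sees.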
[22:28:05] recentchanges is more of a materialized view imo
[22:29:21] I mean, I agree it is not the perfect datastore, but I would not see it as a huge problem with my previewed throughput
[22:29:52] we have "worse" performance problems regarding purges
[22:30:17] but it would be interesting to have some data about watchlist usage
[22:30:32] yeah the number of deleted rows should be tiny compared to recentchanges
[22:30:43] Yeah, but the mismatch of data to data store is not just an issue for query performance; there is also the added maintenance complexity and conceptual baggage that goes with introducing this to MediaWiki, IMO.
[22:31:38] it means data changes independently of user actions, which is hard to reason about
[22:31:56] ori, what are you talking about? MediaWiki is _easy_ to understand! ;-)
[22:32:02] in the case it would be a huge issue, we could just *not* delete rows- and handle that at server side
[22:32:05] as has been mentioned, this feature just follows a number of existing precedents
[22:32:13] performance problems are more that one can watch e.g. 30 days, and there are a bazillion filter combinations and users have many thousands of pages they watch
[22:32:17] namely protection expiry and block expiry
[22:32:26] so, being able to load Special:Watchlist, etc
[22:32:41] * aude doesn't think purging would be a problem
[22:33:02] recentchanges purging is also a precedent, except on a much larger scale
[22:33:10] we are storing rendered pages in mysql for 3 months and purging them
[22:33:15] aka parsercache
[22:33:20] I think we can handle that
[22:34:12] Okay, so another thing on the list was possible refactoring
[22:35:10] do we currently have a "watchlist cleanup" tool or anything like that now? (other than just getting the big text field?)
[22:35:46] we have Special:EditWatchlist
[22:35:47] robla: not in core or as an extension
[22:36:34] an alternative that would serve the same use-case without introducing the same maintenance overhead would be a creation timestamp for each watchlist entry, instead of an expiration time.
[22:37:07] If you could sort by creation time, the items that you added temporarily would presumably be at the top
[22:37:14] Special:EditWatchlist gives you your watchlist with a big list of checkboxes, you click the checkboxes to remove the items, it is basically a cleanup tool
[22:37:14] ori: indeed, and then have some way of saying, remove all items that have been in my watchlist for X days
[22:37:28] so it helps with that, and it also helps in the case where you don't know in advance how long you'll want to keep something
[22:37:30] the textbox thing is Special:EditWatchlist/raw
[22:37:47] e.g. when you want to watch a page until some controversy dies down
[22:37:57] ori: the question is granularity, if a user wants to watch page A for time period X and page B for time period Y then having a start time and sorting by that doesn't really help
[22:38:00] BTW, the watchlist editor doesn't work on extreme cases
[22:39:12] addshore: that's true, but I guess it depends on what your requirements are for a solution. If only full automation will do, then yeah, you need expiration. But if you can be satisfied with making what is now very tedious substantially easier and quicker to do, then there is an opportunity to do this in a manner that is simpler.
[22:39:13] and I think the same thing applies with the discussion of multiple watchlists each with an expiry time
[22:41:02] The TTL will generally be short -- I find it hard to imagine that people will want to add items that expire in a year. So the set of recently-added watchlist items is a superset of ephemeral watchlist items.
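[editor's sketch] The creation-timestamp alternative proposed above could look something like this. It is a rough illustration on an in-memory SQLite table, not a proposed schema: the column wl_added and both helper functions are hypothetical names, and timestamps are plain integers for brevity.

```python
import sqlite3

DAY = 86400  # seconds per day


def make_db():
    """Simplified, hypothetical watchlist table with a creation timestamp."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE watchlist ("
        " wl_user INTEGER NOT NULL,"
        " wl_title TEXT NOT NULL,"
        " wl_added INTEGER NOT NULL)"  # hypothetical "watched since" column
    )
    return db


def newest_first(db, user):
    """Sort by creation time so recently (often temporarily) added items come first."""
    rows = db.execute(
        "SELECT wl_title FROM watchlist WHERE wl_user = ? ORDER BY wl_added DESC",
        (user,),
    )
    return [title for (title,) in rows]


def remove_older_than(db, user, now, days):
    """Bulk cleanup requested by the user: drop items watched for more than `days` days."""
    cur = db.execute(
        "DELETE FROM watchlist WHERE wl_user = ? AND wl_added < ?",
        (user, now - days * DAY),
    )
    db.commit()
    return cur.rowcount
```

Nothing changes in the table unless the user asks for it, so there are no scheduled updates to maintain; the trade-off is the granularity concern raised above, since one age threshold cannot express a different period per page.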
[22:42:32] Oh, and there's also the fact that many people add all pages that they edit to their watchlist by default
[22:42:49] ori: combined with tags, this should work: "remove entries older than x days and tagged with quux"
[22:42:57] * aude adds all the pages :)
[22:43:03] tags would be cool, yes
[22:43:06] (or "from list 'potentially crazy people'" or whatever)
[22:43:38] one obvious tag would be "auto-added"
[22:43:43] If you like the convenience of having pages added automatically, without you having to explicitly add them, then presumably you would not stop (or want to stop) to declare an expiration time for each item that is added
[22:43:54] and "created". that's a good one, i want that.
[22:44:03] which means that you will accumulate a backlog of watchlist items which you only cared about for a short period of time but which do not have an expiration time
[22:44:32] #info questions discussed: is full watchlist cleanup automation required? would tags be helpful? do many people add all pages they edit to their watchlists?
[22:44:35] being able to sort by creation time solves for that
[22:44:49] ori: i want a way to automatically watch everything I *create*... maybe that should just always be the case.
[22:45:20] DanielK_WMDE: that exists
[22:45:20] ori: you would want a default expiry time for auto-added entries
[22:45:25] as an option also
[22:45:34] like watching anything you edit for just 1 week
[22:45:58] #info question: do we want an expiration date, or a watched-since timestamp?
[22:46:20] Platonides: I don't think so; the fact that an item is added automatically does not necessarily mean you don't want to watch it on a persistent basis (see DanielK_WMDE's comment above about created pages). If you have an automatic expiration date, then you need to go and undo it for every item that is added automatically which you actually want to retain.
[22:46:24] expiring watchlist entries would be more useful for watching talk pages
[22:46:26] Or you forget to do it, and it silently disappears from your watchlist.
[22:46:38] and something i would more explicitly set
[22:46:45] (or maybe as an option)
[22:46:53] e.g. user talk pages also
[22:46:56] ori: that depends on whether you do more watching of pages you want permanently
[22:47:04] or of pages you want temporarily
[22:47:22] I wouldn't be surprised if people started wanting different expiries per NS, too
[22:47:40] but a Gadget could choose a different expiry based on the page you are in
[22:48:01] Possibly helpful use-case: An editor who deals with a lot of edge-case deletions, and nominates dozens of articles per day (for discussion/deletion), and wants to watchlist each of the articles AND their respective AfD subpages, for 10 days.
[22:48:10] having some way of bulk editing your watchlist based on sortable view(s) of pages one has edited and pages one created is the feature I would like
[22:48:22] Platonides: which I think supports the point I made earlier, that people overload the notion of a watched page to mean all sorts of things, for all sorts of purposes, and the only way to accommodate that without forcing everyone to adopt a particular model or implementing every possible workflow is to implement something sufficiently general that it can be dressed up to represent any workflow.
[22:48:24] robla: +1
[22:48:27] you know the only answer to these questions, right? A/B testing, choose based on user feedback...
[22:49:24] ori: I support that
[22:49:31] so... sounds like a "watched-since" timestamp would be useful. but it caters to slightly different use cases / usage patterns than the expiration date.
[22:49:41] having both... is tempting, but might get confusing...
[22:50:28] I guess with a combination of watched since and maybe a tag of number of days to expiry would actually work for most of the main cases for the expiry field?
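[editor's sketch] One way to read the "watched since plus a number-of-days tag" idea is that the expiry is never stored, only derived when a cleanup view is built. The sketch below assumes exactly that; WatchEntry, the expire_days tag name and both functions are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class WatchEntry:
    """Hypothetical watchlist entry: a watched-since timestamp plus free-form tags."""
    title: str
    watched_since: datetime
    tags: Dict[str, int] = field(default_factory=dict)  # e.g. {"expire_days": 10}


def effective_expiry(entry: WatchEntry) -> Optional[datetime]:
    """Derive an expiry from watched-since and the (hypothetical) expire_days tag."""
    days = entry.tags.get("expire_days")
    if days is None:
        return None  # untagged entries are watched indefinitely
    return entry.watched_since + timedelta(days=days)


def due_for_cleanup(entries: List[WatchEntry], now: datetime) -> List[str]:
    """List titles whose derived expiry has passed, for review rather than silent removal."""
    due = []
    for entry in entries:
        expiry = effective_expiry(entry)
        if expiry is not None and expiry <= now:
            due.append(entry.title)
    return due
```

For the deletion-nomination use case mentioned above, a gadget could tag both the article and its AfD subpage with expire_days set to 10 at the moment they are watched.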
[22:50:47] As I argued above, I think "watched-since" covers more use-cases, at the cost of being nominally less convenient for the specific use-case the RfC solves for -- and it is substantially easier to maintain, because it does not require scheduling future updates to the data
[22:50:50] if we assume the bulk of watched things will not have an expiry
[22:50:57] jynus: A/B-testing only works if you have a well defined goal and a pretty homogeneous interaction path. hard to do with the diverse user base and complex interactions on wikipedia & co.
[22:51:01] addshore++
[22:51:21] especially since these are slow-to-adopt use cases
[22:51:26] DanielK_WMDE, and even that, we do not have the resources to code both, then throw one away
[22:51:43] and once you support it, you would get a lot of complaints if you removed X from people's watchlist
[22:51:44] jynus: right
[22:52:00] (even if it's actually a property of the wl item)
[22:52:01] ori: well, it sounds like that might be the next iteration of this proposal then
[22:52:04] (9-minute warning)
[22:52:22] addshore: I'm really glad you think so!
[22:52:30] Okay, would it make sense to try and squeeze in a bit of talking about possible refactoring? https://gerrit.wikimedia.org/r/#/c/267259/
[22:52:52] if you have very specific and simple questions
[22:52:55] ori: i like the "magic is bad" argument. pretty convincing. stuff shouldn't just vanish. it should just be easier to clean up.
[22:53:18] TimStarling: perhaps not then, I tried to untangle WatchedItem in the patch linked above
[22:53:30] DanielK_WMDE: yeah, you put it better
[22:53:55] I will comment on gerrit later
[22:54:02] #info I guess with a combination of watched since and maybe a tag of number of days to expiry would actually work for most of the main cases for the expiry field
[22:54:13] TimStarling: many thanks!
[22:55:13] ok, any other action items?
it seems like we're petering out
[22:55:29] tangential question i came across when looking into this: why are we not logging page creations to the log table?
[22:55:50] sounds like a question for another RFC ;)
[22:56:03] that would make it easy to find pages created by a given user efficiently
[22:56:15] TimStarling: well, it's just a question, not a proposal :)
[22:56:18] I think that's it from my side!
[22:56:22] and solve the typical "I want to know the page creator"
[22:56:35] #endmeeting
[22:56:36] Meeting ended Wed Feb 3 22:56:35 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:56:36] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-03-22.00.html
[22:56:36] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-03-22.00.txt
[22:56:36] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-03-22.00.wiki
[22:56:36] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-03-22.00.log.html
[22:56:44] Thanks everyone! :)
[22:56:45] Platonides: yes. and it would be very simple.
[22:56:48] just a few lines of code
[22:56:50] thanks addshore :)
[22:57:27] DanielK_WMDE: not so much when you have recreated pages, merges…
[22:58:03] Platonides: restoring a page creates a different log entry. merges... are rare, and not really supported.
[22:58:18] (i'd love that, but that's a different story)
[22:58:33] thanks TimStarling for chairing!
[22:58:44] next week: https://phabricator.wikimedia.org/E140
[22:58:48] I'm just listing things that would break the "creator" list
[23:07:47] We do have page creator stored somewhere? or at least, it's listed at action=info...
https://en.wikipedia.org/w/index.php?title=Monk&action=info#Edit_history
[23:08:21] DanielK_WMDE, ^
[23:08:57] quiddity: it's easy to find the creator of a single page from the history
[23:09:21] but if you want to list all pages a given user created, the query becomes very awkward and inefficient
[23:09:44] ah, so action=info is just checking that info when requested?
[23:09:50] you have to get the minimum revision id, then grab the user from that revision, filter by that user, and then join in the page table
[23:09:52] takes forever
[23:10:05] quiddity: i assume so. for a single page, it's no problem
[23:10:23] ok. I figured I was probably missing something. :) ty
[23:10:45] just writing it to the log table would be so simple... it must have been suggested before...
[23:11:19] DanielK_WMDE, https://phabricator.wikimedia.org/T12331 ?
[23:12:16] quiddity: Tested on MediaWiki 1.10.0. wow
[23:12:51] "It was the best of times, it was the worst of times..."
[23:12:58] hehe...
[23:13:39] hm, i think i'll just plunk it into the rfc process and see what happens ;)
[23:13:40] :)
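
[editor's sketch] The "pages created by a given user" query described above can be contrasted with the log-table idea roughly as follows. The tables are heavily simplified stand-ins built in SQLite, not the real MediaWiki schema, and the 'create' log type is hypothetical: the whole point of the exchange is that creations were not being logged.

```python
import sqlite3


def make_db():
    """Simplified stand-ins for the page, revision and logging tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER NOT NULL,
            rev_user INTEGER NOT NULL
        );
        CREATE TABLE logging (
            log_id INTEGER PRIMARY KEY AUTOINCREMENT,
            log_type TEXT NOT NULL,
            log_user INTEGER NOT NULL,
            log_page INTEGER NOT NULL
        );
        """
    )
    return db


def pages_created_via_revisions(db, user):
    """The awkward route: first revision per page, then its user, then join page."""
    rows = db.execute(
        """
        SELECT p.page_title
        FROM (SELECT rev_page, MIN(rev_id) AS first_rev
              FROM revision GROUP BY rev_page) AS f
        JOIN revision AS r ON r.rev_id = f.first_rev
        JOIN page AS p ON p.page_id = f.rev_page
        WHERE r.rev_user = ?
        ORDER BY p.page_title
        """,
        (user,),
    )
    return [title for (title,) in rows]


def pages_created_via_log(db, user):
    """The suggested route, assuming a hypothetical 'create' entry is logged per new page."""
    rows = db.execute(
        """
        SELECT p.page_title
        FROM logging AS l
        JOIN page AS p ON p.page_id = l.log_page
        WHERE l.log_type = 'create' AND l.log_user = ?
        ORDER BY p.page_title
        """,
        (user,),
    )
    return [title for (title,) in rows]
```

The first query has to aggregate over every page's revisions before it can filter by user, which is why it scales badly; the second can filter on the user straight away, though recreated or merged pages would need the handling raised in the discussion.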