[21:01:28] hello
[21:01:32] o/
[21:01:38] Howdy!
[21:01:55] hello!
[21:02:00] today's topic: https://phabricator.wikimedia.org/T164990
[21:02:15] * gwicke tries to activate the bot
[21:02:51] #startmeeting https://phabricator.wikimedia.org/T164990 | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:02:51] Meeting started Wed Jun 14 21:02:51 2017 UTC and is due to finish in 60 minutes. The chair is gwicke. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:02:51] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:02:51] The meeting name has been set to 'https___phabricator_wikimedia_org_t164990___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
[21:03:43] coreyfloyd or mdholloway: could you give a quick summary of what you are trying to do, and what you would like to get out of this meeting today?
[21:03:57] gwicke: sure…
[21:04:37] Reading wants to create a service for syncing private reading lists first within Android and then iOS and mobile web
[21:04:58] For what the feature will look like, you can check out the current Android app
[21:05:44] It will be much the same, except will allow users to sync their reading lists (which users use for bookmarking and offline reading) so that they will not lose them if they change or lose their device
[21:06:03] mdholloway: do you have a screen shot of the feature to post?
[21:06:21] * mdholloway finds one quickly
[21:06:32] gwicke: does that answer the question well enough?
[21:06:58] coreyfloyd: would there be just one list per user, or should the system be designed with multiple lists per person in mind?
[21:07:18] DanielK_WMDE: it would be multiple lists per user
[21:07:34] all private? or public?
[21:07:43] Lists are all private
[21:07:43] #info Reading wants to create a service for syncing private reading lists first within Android and then iOS and mobile web
[21:07:57] And so this would require a user to be authenticated to access their lists
[21:07:58] #info it would be multiple lists per user, all private
[21:09:03] coreyfloyd: which questions would you especially like to get feedback on / make decisions on?
[21:09:11] One major architectural decision we have to make - as laid out on the RFC - is whether we build this within MediaWiki as an extension using RESTBase as a proxy, or we build it as a separate node.js service
[21:09:21] so you have to log in with your wikipedia user account to get reading lists at all?
[21:09:43] correct
[21:09:47] TimStarling: no, only for syncing purposes
[21:09:54] oh right
[21:09:54] ^
[21:10:13] how would anon lists work?
[21:10:14] TimStarling: the feature will continue to work as it does now for those who do not want to log in
[21:10:20] DanielK_WMDE: ^
[21:10:24] They would be local only
[21:11:08] so the proposed REST service is only for logged in users, but the client side part would also support local-only lists for anons?
[21:11:24] Without logging in, the user can still have reading lists locally. Then the first time they log in, the lists will start being synced.
[21:11:25] DanielK_WMDE: correct
[21:11:53] DanielK_WMDE: we have to work out UX, but the expectation is that users will need to opt in to syncing in some way
[21:11:56] for apps yes, for mobile web I don't think we have decided yet
[21:12:09] are the CRUD operations sufficient for reliable syncing? wouldn't a log be needed?
[21:12:11] the worry is that storage is much more limited there
[21:12:18] tgr: good point
[21:12:30] DanielK_WMDE: we have laid out some sync APIs for that purpose
[21:12:42] Basically getting "changes since"
[21:12:42] #info Without logging in, the user can still have reading lists locally. Then the first time they log in, the lists will start being synced.
[21:13:21] DanielK_WMDE: I am guessing this would use If-Match, and resolve conflicts client side
[21:13:22] ah, update timestamp and soft delete...
[21:13:35] yep
[21:13:35] the GET /lists/reading/changes/since/{date} endpoint would be used for sync
[21:13:45] can the feature be extended to the desktop site?
[21:13:47] screenshots https://usercontent.irccloud-cdn.com/file/BZhR0mSq/Screenshot_20170614-171133.png https://usercontent.irccloud-cdn.com/file/Y2qMFmM0/Screenshot_20170614-171139.png
[21:13:55] TimStarling: yes it can be
[21:14:09] TimStarling: we wanted to start out with the Android client first as a testing ground
[21:14:19] ok
[21:14:32] And then we would do some analysis then move to iOS, then mobile web, and then desktop
[21:14:42] coreyfloyd: i'm blurry on sync algorithms. this sounds like it can work, but could get interesting in the nitty gritty. is this following a standard sync strategy?
[21:15:13] does soft delete mean rows never get deleted?
[21:15:34] DanielK_WMDE: basically just a standard time stamp syncing strategy… clients keep a time stamp of the last sync, and must pass it to get changes since the last sync
[21:15:34] (coreyfloyd: What are your plans for Wikipedia's/Wikidata's 358 languages here?)
[21:15:53] DanielK_WMDE: it's basically turned into a tombstone
[21:16:01] DanielK_WMDE: soft deletes will need to be cleaned up later…
[21:16:12] gwicke: and the tombstones are forever
[21:16:22] We haven’t decided on the strategy, it could be after a period of time - kill after 30 days
[21:16:36] the overall strategy looks quite standard
[21:16:52] DanielK_WMDE: basically if a client hasn’t synced in a while, then it will be required to do a full sync
[21:16:53] coreyfloyd: sounds a bit unreliable
[21:17:03] Scott_WUaS: these lists should not be wiki-specific; they can contain pages from any number of projects.
[21:17:09] DanielK_WMDE: can you be more specific in how it would be unreliable?
[21:17:10] a full sync means losing all local changes
[21:17:48] DanielK_WMDE: to be clear… and this has not been decided… a full sync would not be about losing changes, but since it would have been a disconnected client for so long, it is best to do a full sync
[21:18:11] DanielK_WMDE: fully syncing does not preclude doing a merge of local changes
[21:18:16] coreyfloyd: if i have unsynced local changes older than 30 days, i might lose them. I don't mean to pick on the details here. my concern is the storage schema
[21:18:19] you can probably tell when there were local changes that weren't synced yet
[21:18:26] you can still use timestamps to merge on a full sync
[21:18:43] it seems the timestamp/soft delete is baked in. i wonder if a log/event based storage should be considered instead.
[21:18:46] Scott_WUaS: as far as all languages, mdholloway answered… these are cross wiki
[21:18:46] but let's move on
[21:19:23] Scott_WUaS: so this will support all languages… and potentially other projects
[21:20:00] #info the timestamp/soft delete is baked into the storage model. perhaps a log/event based sync should also be considered.
[21:20:01] #info it seems the timestamp/soft delete is baked in. i wonder if a log/event based storage should be considered instead.
[21:20:18] also, beware of race conditions
[21:20:19] gwicke: hehe
[21:21:13] Any thoughts on the 2 options in the RFC?
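The timestamp-based sync discussed above (a "changes since" query, soft deletes that turn into tombstones, and a full sync for clients that have been offline longer than the tombstone lifetime) could be sketched roughly as follows. This is a minimal in-memory illustration, not the RFC's design: the class and method names are hypothetical, and the 30-day lifetime is only the figure floated in the discussion, not a decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumption: "kill after 30 days" was only suggested in the meeting.
TOMBSTONE_TTL = timedelta(days=30)


@dataclass
class Entry:
    entry_id: int
    url: str
    updated: datetime      # last change, used for "changes since" queries
    deleted: bool = False  # soft delete: the row stays behind as a tombstone


class ReadingListStore:
    """Hypothetical server-side store for one user's list entries."""

    def __init__(self) -> None:
        self.entries: Dict[int, Entry] = {}
        # Tombstones older than this have been purged, so changes before
        # this point can no longer be replayed to a client.
        self.purged_before: Optional[datetime] = None

    def upsert(self, entry_id: int, url: str, now: datetime) -> None:
        self.entries[entry_id] = Entry(entry_id, url, now)

    def soft_delete(self, entry_id: int, now: datetime) -> None:
        entry = self.entries[entry_id]
        entry.deleted = True
        entry.updated = now

    def purge_tombstones(self, now: datetime) -> None:
        cutoff = now - TOMBSTONE_TTL
        self.entries = {
            i: e for i, e in self.entries.items()
            if not (e.deleted and e.updated < cutoff)
        }
        if self.purged_before is None or cutoff > self.purged_before:
            self.purged_before = cutoff

    def changes_since(self, since: datetime) -> Optional[List[Entry]]:
        """Return entries changed after `since`, oldest first.

        Returns None when the client is too stale and must do a full sync,
        because it may have missed tombstones that were already purged.
        """
        if self.purged_before is not None and since < self.purged_before:
            return None
        changed = [e for e in self.entries.values() if e.updated > since]
        return sorted(changed, key=lambda e: e.updated)
```

A client would keep the timestamp of its last successful sync and pass it to `changes_since`; a `None` result corresponds to the "full sync" case, where local unsynced changes can still be merged by comparing timestamps.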
[21:21:15] this is private data so races are not very likely
[21:21:30] unless the user has a phone in each hand
[21:21:38] Tgr: lol
[21:21:50] tgr: my joke was more about me & daniel info-ing the same thing than the RFC ;)
[21:21:53] Maybe not too uncommon nowadays
[21:21:59] option 1 seems fine to me
[21:22:13] I'm skeptical about RESTBase becoming the new enormous monolithic blob
[21:22:25] it can be a path router, that's simple, easy to maintain
[21:22:44] that's the idea, I believe
[21:22:55] Yeah for proposal 1
[21:23:07] and it seems like jaime favoured option 1?
[21:23:09] The only other work from RESTBase would be injection of summaries
[21:23:09] tgr: a laptop and a phone and a tablet, all syncing periodically, when they have connectivity. used phone and tablet on the plane, both go online again at the same time...
[21:23:11] I think somehow "RESTBase service" has become a synonym for "node service"
[21:23:24] tgr: otoh, offline utility of the app is limited ;)
[21:23:40] gwicke: I think you are right… I do that myself
[21:24:03] to clarify, option 2 is proposing a stand-alone service, right?
[21:24:22] proxied through RB, which would do auth & summary hydration
[21:24:30] jcrespo said "This has to be integrated into mediawiki or other existing service, as we will not have hw available for proxies or other middleware to have a dedicated service, specially at the beginning."
[21:24:40] that's right. just using MediaWiki for the authentication layer.
[21:24:49] TimStarling: IIRC he did not have a strong opinion but predicted a larger hardware need for 2
[21:25:16] TimStarling: https://phabricator.wikimedia.org/T164805 has more on the subject
[21:25:17] gwicke: yes… it would exist outside of MediaWiki, but use MediaWiki for authentication. And how the service would access the DB is still under discussion
[21:26:05] #info option 2 is proposing a stand-alone node service, proxied through RB, which would do auth & summary hydration
[21:26:24] coreyfloyd: why sql for storage? just because it's there and we know it?
[21:26:29] good reasons, for sure.
[21:26:40] but are there others? are there tempting alternatives?
[21:26:50] DanielK_WMDE: large queries, known performance and stability characteristics
[21:27:04] range queries are important for this use case
[21:27:13] the task description says "Reading lists contain primary data (cannot be regenerated from other sources and losing it would have a major UX impact), and data needs to be fetched based on criteria other than the id (e.g. all lists containing a given page, all entries which have changed after a given date) so MariaDB will be used"
[21:27:22] which don't scale as well in some of the distributed alternatives
[21:27:37] DanielK_WMDE: we did talk about Cassandra but did not seem to be a good fit
[21:28:01] range queries are a good point, yea
[21:28:56] this service is also not as critical, so doesn't necessarily need to be active-active right away
[21:29:58] DanielK_WMDE: we also didn’t want to just store the data as a JSON blob
[21:30:20] We want to have each entry be a row, so we can do more interesting queries on the data
[21:30:26] if you stored it as a JSON blob you could have combined it with T128602
[21:30:26] T128602: Create and deploy an extension that implements an authenticated key-value store. - https://phabricator.wikimedia.org/T128602
[21:31:15] TimStarling: we looked at that, but then you lose granularity - for instance if we need to do push notifications, how do we know which users have which pages in a reading list?
[21:31:24] one big json blob would be bad. but how about one per list entry? not as a page, of course.
[21:31:54] Similar issues… if we want to push changes to users when a page changes, then there is no sensible way to query that information
[21:31:56] that would make it easy to add meta-data later. personal notes, last viewed, etc
[21:32:32] One part of this that isn’t clear, is that the clients use reading lists as a means to manage offline pages
[21:32:39] coreyfloyd: oh, i don't want to get rid of the db table. just have a blob in one of the fields.
[21:32:49] that makes it easy to add fields that we don't need to query by
[21:33:02] So an important part of having offline pages is updating them when the page changes
[21:33:25] TimStarling: also the feedback on that RfC was that people would prefer a dedicated API
[21:33:26] So being able to tell which clients need to be notified when the article “dog” changes becomes important
[21:33:30] some feedback at least
[21:33:34] (Thanks)
[21:33:46] #info if you stored it as a JSON blob you could have combined it with T128602
[21:33:46] (I seem to remember brion)
[21:34:05] coreyfloyd and tgr will be the implementors?
[21:34:16] DanielK_WMDE: some fields could be in a blob if needed… just not the urls
[21:34:24] TimStarling: tgr would be implementing
[21:34:40] #idea for extensibility, have a field for a JSON blob in the reading_list_entry table
[21:34:54] right, and tgr prefers option 1, judging by the task description?
[21:34:55] coreyfloyd: sure
[21:35:05] yeah
[21:35:18] DanielK_WMDE: yeah that makes sense… we have some meta that will probably get added as clients need it over time
[21:35:53] coreyfloyd: querying "pages on this list that changed since X" turns this into a watchlist. that opens a pretty big can of worms.
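The storage idea discussed above (one row per list entry so pages stay queryable, plus a JSON blob field for metadata that never needs to be queried) might look roughly like this. It is only a sketch under stated assumptions: SQLite stands in for MariaDB so the example is self-contained, only the `reading_list_entry` table name comes from the discussion, and every column name, the `reading_list` table, and the helper functions are hypothetical.

```python
import json
import sqlite3

# Assumption: in-memory SQLite as a stand-in for the MariaDB storage.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reading_list (
    rl_id      INTEGER PRIMARY KEY,
    rl_user_id INTEGER NOT NULL,
    rl_name    TEXT NOT NULL
);
CREATE TABLE reading_list_entry (
    rle_id      INTEGER PRIMARY KEY,
    rle_list_id INTEGER NOT NULL REFERENCES reading_list(rl_id),
    rle_project TEXT NOT NULL,            -- lists are cross-wiki
    rle_title   TEXT NOT NULL,
    rle_extra   TEXT NOT NULL DEFAULT '{}' -- JSON blob, never queried
);
CREATE INDEX rle_page ON reading_list_entry (rle_project, rle_title);
""")


def add_entry(list_id, project, title, **extra):
    """Store the page as queryable columns and the rest as a JSON blob."""
    conn.execute(
        "INSERT INTO reading_list_entry "
        "(rle_list_id, rle_project, rle_title, rle_extra) VALUES (?, ?, ?, ?)",
        (list_id, project, title, json.dumps(extra)),
    )


def users_with_page(project, title):
    """Which users have this page in any list (e.g. for notifications)?"""
    rows = conn.execute(
        "SELECT DISTINCT rl_user_id FROM reading_list "
        "JOIN reading_list_entry ON rle_list_id = rl_id "
        "WHERE rle_project = ? AND rle_title = ? ORDER BY rl_user_id",
        (project, title),
    )
    return [row[0] for row in rows]


# Illustrative data: two users, three lists.
conn.executemany(
    "INSERT INTO reading_list (rl_id, rl_user_id, rl_name) VALUES (?, ?, ?)",
    [(1, 1, "animals"), (2, 2, "offline"), (3, 2, "later")],
)
add_entry(1, "en.wikipedia.org", "Dog", note="read on the plane")
add_entry(2, "en.wikipedia.org", "Dog")
add_entry(3, "de.wikipedia.org", "Katze")
```

Keeping project and title in indexed columns is what makes a lookup like `users_with_page("en.wikipedia.org", "Dog")` possible; with a single JSON blob per user, that question could not be answered without scanning every blob.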
[21:35:54] we had a rough consensus on 1 in the Reading Infra / Services discussion I think, but were sufficiently unsure to pose it as a question
[21:36:03] TimStarling: yeah… I think that it seems easier to do as option 1, but wanted to hear ideas here as well
[21:36:47] I believe gwicke (or someone on services) was talking about using this as an opportunity to extract a library that can access MariaDB directly without MediaWiki
[21:37:21] that was just jaime & me talking about mysql proxying etc
[21:37:34] DanielK_WMDE: yeah - I think “any list of URLs” can be seen as a watchlist at some level
[21:37:48] only if you need to join against recentchanges
[21:37:53] that's the crux here
[21:38:28] DanielK_WMDE: changes for now are mostly about adds/removes from the list
[21:38:29] or, more broadly: change propagation infrastructure
[21:38:40] does this thing need to handle page moves?
[21:38:54] coreyfloyd: no, i mean edits to pages on your list. so you can update offline pages.
[21:39:16] from a testing and fault isolation perspective a stand-alone has some advantages, but I think in this case there are a lot of pragmatic reasons to add this to MW
[21:39:39] DanielK_WMDE: I wouldn’t expect that to necessarily live in the reading list service
[21:39:59] agreed, conceptually option #2 is better, but involves a lot more work
[21:40:06] DanielK_WMDE: some other service may want to query reading lists to see if a particular user needs notified
[21:40:44] DanielK_WMDE: originally we were hoping to put it on x1 to support such joins, but jaime said it would not be a good idea
[21:40:57] ok, so I will propose that you remove option 2 and we put the RFC to last call in that form
[21:40:58] legoktm: I think that page moves are ok as long as we have the redirect follow
[21:41:04] but that was more of a "would be nice" thing, not a planned use case
[21:41:55] there are a bunch of "TBD" marks in the RfC, if someone would like to advise on them
[21:42:05] small things
[21:42:16] #info from a testing and fault isolation perspective a stand-alone service has some advantages, but I think in this case there are a lot of pragmatic reasons to add this to MW
[21:42:25] (mostly DBA questions though)
[21:42:35] coreyfloyd: yea, some change propagation, somewhere. it's tricky to do at scale. but that's out of scope of this rfc, i suppose
[21:42:35] #info agreed, conceptually option #2 is better, but involves a lot more work
[21:42:49] #info ok, so I will propose that you remove option 2 and we put the RFC to last call in that form
[21:42:49] C
[21:43:47] any objections to TimStarling's proposal?
[21:43:53] nope
[21:44:07] you can also move it to an "also considered" section
[21:44:09] #info querying "pages on this list that changed since X" turns this into a watchlist; some other service may want to query reading lists to see if a particular user needs notified
[21:44:23] just so that it's clear which other options were considered later
[21:44:39] DanielK_WMDE: yeah… it's mostly forward looking… and just a reason we want to make sure we can query the individual pages later on for such things
[21:45:14] (coreyfloyd: and mdholloway: any way to plan for or anticipate translation between wikipedia's 358 languages at this early stage and especially re querying ... re syncing private reading lists first within Android and then iOS and mobile web?)
[21:45:58] the goal is to make this available on the app first, and via the web interface later?
[21:45:58] Scott_WUaS: translation of what content?
[21:46:29] (coreyfloyd: what will emerge in the reading lists)
[21:46:38] DanielK_WMDE: that's correct; and perhaps only tentatively on the web (coreyfloyd probably knows better about their plans)
[21:46:57] DanielK_WMDE: yes… Android, then iOS, then mobile web, then web - this gives us a reasonable ramp up of users and allows us to vet it from the project from both performance and product perspectives
[21:47:25] Scott_WUaS: I’m not sure I quite understand the question
[21:47:43] Scott_WUaS: there will be some UI in the apps/web interface that will go through the normal translate wiki process
[21:47:44] #action Corey will update the RFC to make it clear that Option 1 is proposed
[21:48:00] the product perspective is probably quite different on the web interface. people have browser bookmarks, editors have watchlists and user pages
[21:48:04] Scott_WUaS: is that what you are asking about?
[21:48:20] (coreyfloyd: I'll check out further https://phabricator.wikimedia.org/T164805 re what you mean by reading lists - thanks)
[21:48:38] DanielK_WMDE: yeah… Reading Lists are being heavily investigated by the product team in Readers currently
[21:48:46] And the designers…
[21:49:08] Rita ho just did a survey: https://goo.gl/nC5NpX
[21:49:11] why bake "reading" into the name, btw?
[21:49:27] (re Reading Lists - https://phabricator.wikimedia.org/T164990)
[21:49:30] nothing in the functionality suggests "reading". just lists of pages
[21:49:38] could be my "deletion list" ;)
[21:50:05] "killfile"
[21:50:07] :)
[21:50:16] just an artifact of the first planned use case, i think
[21:50:17] hehehe
[21:50:19] DanielK_WMDE: yeah it was specifically to scope this to reading
[21:50:26] And not watch lists for instance
[21:51:06] mdholloway: i'd prefer not to have "reading" in all the service names and tables... "page lists"?
[21:51:07] DanielK_WMDE: we are not sure that this would be the ideal infrastructure for other types of lists
[21:51:42] DanielK_WMDE: however, the route naming exists to not preclude the option of having other types of lists (lists/reading/…)
[21:51:55] DanielK_WMDE: but I am not against removing it
[21:52:47] I guess there is a namespacing concern as well; if this service was to be used for all kinds of other lists, you'd need to be able to distinguish those from the reading lists
[21:53:01] I'm fine with reading lists, being specific makes it easier to add features
[21:53:10] Why not let users customize the name?
[21:53:34] Zppix: users can customize the name of lists
[21:53:51] the MW extension could contain both the API for the mobile apps, and also the UI for the websites
[21:54:17] Zppix: this is just mostly about the name of the routes - and the name of the service itself
[21:54:28] coreyfloyd: oh i see my bad
[21:54:29] TimStarling: yeah… that is a possibility for sure
[21:54:59] why not incorp this into mw itself?
[21:55:06] TimStarling: web (mobile at least) is moving towards relying on APIs instead of the MW skin
[21:55:28] coreyfloyd: speaking of other list-of-title use cases, what is the plan for collections?
[21:55:33] if you make it generic then you have to imagine all possible use cases when you make b/c breaking changes
[21:55:46] where that effort will be at by the time reading lists reach the web interface (planned for Q4-ish) is an open question
[21:56:00] (collections is the print-many-pages-to-PDF feature)
[21:56:52] https://www.mediawiki.org/wiki/Extension:Collection
[21:56:57] gwicke: you mean for how collections relate to lists?
[21:57:17] yeah, both need lists of pages
[21:57:31] and some people might want to print their reading list.. just speculating
[21:57:34] OCG seems to be a favourite punching bag at the moment
[21:57:35] TimStarling: we considered building it on top of some kind of generic list feature in MediaWiki core but then decided for rule of three
[21:57:38] gwicke: ahh… well they are separate but being thought of…
[21:57:45] as an example of an unmaintained service
[21:57:51] not enough use cases / implementations yet to generalize
[21:57:57] gwicke: collections are public and lists are private… so we have talked about being able to convert between the 2
[21:58:26] and there were vague plans for sunsetting it, but it wasn't clear what that would mean for Collection
[21:58:29] gwicke: also we have talked about having a “make a pdf” button for reading lists that will use the collection extension
[21:58:42] coreyfloyd: makes sense; some kind of unguessable list UUID could help with that, I guess
[21:58:58] TimStarling: yeah… I am not the expert on OCG, but it is being sunset and replaced with something else… that research is in process now
[21:59:17] perhaps OCG would be replaced by browser automation?
[21:59:27] TimStarling: current plan is to keep (refactor, hopefully) Collection and switch out OCG to Electron (or wkhtmltopdf, decision still pending)
[21:59:39] tgr: three, like watchlists, collections, and now reading lists?...
[21:59:44] something browser-based, in any case
[22:00:18] DanielK_WMDE: well, Collection never had any serious list support
[22:00:20] okay, we should wrap up soon
[22:00:32] session and wikitext hacks
[22:00:47] DanielK_WMDE: how these 3 work together and what can be eventually decommissioned is being looked at… they do have different use cases… but maybe we can get them to the same backend
[22:01:06] I didn't hear any objections against Tim's proposal, and I think there were no major objections raised here that would make a last call inappropriate
[22:01:20] sgtm
[22:01:26] 👍
[22:01:54] #agreed After changes to call out Option 1 as proposed, this RFC will enter its Final Comment Period.
[22:02:43] final chance to add anything to the log..
[22:02:50] Nope i agree with tim
[22:03:08] #endmeeting
[22:03:08] Meeting ended Wed Jun 14 22:03:08 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:03:09] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-06-14-21.02.html
[22:03:09] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-06-14-21.02.txt
[22:03:09] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-06-14-21.02.wiki
[22:03:09] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-06-14-21.02.log.html
[22:03:21] great, thanks everyone!
[22:03:22] thanks all!
[22:03:25] Thanks all
[22:03:27] thanks!
[22:06:47] that ' in meetbot name is driving me nuts
[22:10:36] it's a `
[22:10:48] a backtick