[19:28:16] ohai [19:28:32] hey greg-g [19:28:39] Hey! [19:28:50] :) [19:29:07] dev summit meeting here, right? [19:29:15] yes :-) [19:29:31] Hi! [19:29:56] o/ [19:30:02] hi [19:30:32] I wonder whether we should use the meetingbot or just go ahead. [19:30:44] hi qgil et al. [19:30:54] * thedj votes for meetingbot [19:31:00] +1 [19:31:21] OK, I almost forgot to use it... [19:31:29] Even if I was among the ones asking for it back in the day [19:31:40] hi all! :) [19:31:46] Can someone volunteer, please? [19:32:38] leila, while we're setting up, can you take a look at https://phabricator.wikimedia.org/T148690 now? [19:32:45] anyone else here for the meeting besides greg-g, Birgit_WMDE, cscott, qgil, halfak, thedj, leila & srishakatux? [19:33:05] * qgil looks for meeting bot commands [19:33:25] * leila looks up the link halfak [19:33:27] are there people invited, who might have curled up in a ball due to events, that we might be able to awake with a simple ping ? [19:33:36] https://meta.wikimedia.org/wiki/Meetbot [19:33:39] brion, ? [19:33:45] Daniel wanted to join [19:33:55] rfarrand: me :) [19:34:04] * DanielK_WMDE pokes Birgit_WMDE [19:34:06] #startmeeting [19:34:17] er... nope :D [19:34:32] is it even online / [19:34:39] not here... [19:34:44] OK, let's forget about MeetBot [19:34:44] wm-bot: help [19:35:12] Hi there, this is the Wikimedia Developer Summit Program committee meeting! First one in real-time! [19:35:13] qgil: let's pretend it's here. then we can grep for #info and #link in the log [19:35:29] First of all, who is here for the meeting? [19:35:33] o/ [19:35:35] o/ [19:35:39] \o [19:35:39] o/ [19:35:40] \o/ [19:35:43] me [19:35:47] o/ [19:36:08] me [19:36:14] leila is [19:36:40] OK, let's say we have quorum. (If you arrive later, please still say Hi!) [19:37:02] \o/ [19:37:05] ;) [19:37:08] I feel like having talked / communicated a lot in many venues already, probably half-confusing most of you. 
[19:37:34] For this reason, I think it would be good to start by simply you mentioning topics you want to discuss right now [19:37:36] or questions, etc. [19:37:42] We take note, and then we move forward. [19:37:42] ok? [19:37:46] ok [19:38:23] I would like to hear what the program commitee needs from me. I will be providing details on the rooms that can be prescheduled, working out timing & providing a list of the interests of the participants [19:38:43] i'd like to hear more about next steps for organizing/scheduling/winnowing down sessions [19:38:43] qgil: one thing I'd like to ask is whether we have a timeline and clear set of tasks for each of us. [19:39:11] maybe one goal of today's meeting should be to come up with a timeline [19:39:45] I'm hearing timeline... [19:39:48] :) [19:39:49] OK, all these questions have a common theme [19:40:05] i note there is a partial timeline here: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Call_for_participation#Selection_process [19:40:13] when scheduling, some key questions are: how many (and what kinds of) sessions do we have in parallel? What breaks do we have for room changes? How far are the rooms apart, how big are they? What equipment is available in which room? [19:40:14] Between now and the beginning of the Summit, the main goal of the Program committee is to produce... a Program. :) [19:40:34] #link https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Program [19:41:01] The closest we have to a timeline today is https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Call_for_participation#Selection_process [19:41:10] based on the current proposals, qgil? or are we accepting new proposals? inviting keynote speakers? etc.? 
[19:41:22] #link https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Call_for_participation#Selection_process [19:41:24] That refers to the selection of proposals that are willing to be pre-scheduled [19:41:30] The proposal deadline was a long time ago [19:42:09] We have enough proposals already. [19:42:24] so, we have a ton of tickets, a rough timeline for proposals, a set of timeslots and rooms [19:42:39] what's the next step? [19:42:40] In fact we got two beyond the deadline, and I will ask what we should do about them. (let's not get stuck with this detail now) [19:42:43] are there more requirements that need to be considered ? [19:42:55] i sec [19:42:58] 1 [19:43:16] There are some tasks that depend on each other, and we need to solve. [19:43:28] rfarrand: is there an overview of the available rooms? a floor plan maybe? [19:43:32] The selection process depends on what the grid looks like [19:44:00] And the shape of the grid (how many rooms, how long the sessions, where are the breaks...) also depends on us [19:44:01] http://www.presidio.gov/venues/golden-gate-club [19:44:10] considering the limitations of time and space that rfarrand will provide [19:44:27] thedj: yea, i was just looking at that. not much detail, but it's a start [19:44:37] yeah, I can provide a better one [19:44:38] So one conversation we will need to have in the next days is: what does the grid look like? [19:44:42] do we have all spaces ? [19:44:44] with the intended uses for each room [19:44:59] The good news is that (afair) nobody had big objections with the grid last time, so we have a default to start with. [19:45:09] some will be available for the pre-scheduled sessions, others will be for unconference and others have other uses allocated. We have the entire venue, both floors. [19:45:16] "the grid"? [19:45:26] Any questions about the skeleton of the schedule [19:45:33] grid = skeleton [19:45:35] there's always the meta-question of "tracks" [19:45:42] "the skeleton"? 
[19:45:43] I can send a first draft (which can be modified) of intended uses of the space out to this group later today. [19:45:54] how many, are topics mutually exclusive in audience, etc [19:45:56] +1 cscott, I'd also like to talk about tracks [19:46:11] Good, this brings us to the content to fill the schedule [19:46:23] qgil, what schedule? [19:46:42] i'm looking at the 2016 schedule for guidance: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016/Program#Schedule [19:46:46] tha content will come from a selection of proposals submitted plus other sessions we will inject: opening, closure, perhaps a keynote... [19:46:52] DanielK_WMDE, wrong year? [19:46:56] so this year, we had up to 6 sessions in parallel. But mostly 4. [19:47:04] halfak, that is our reference for this year [19:47:07] Oh! [19:47:35] so qgil asked if there were questions about the skeleton of the schedule. Where is that? [19:47:50] Just as a start: at the moment we are allocating two rooms to the program commitee to fill with pre-scheduled tracks. One is the main room that fits 200 people and the other is a large room fitting 70 people. DanielK_WMDE [19:47:51] DanielK_WMDE, note that we only need to take care of the pre-scheduled rooms (1? 2? 3?) [19:47:57] qgil: do you think last year's grid worked well? doe we need to change it due to specifics of the location? [19:47:59] all the rest will land there as part of the Unconference [19:48:09] Does "skeleton" = "tracks"? [19:48:16] I'm so confused [19:48:22] ok [19:48:25] I feel like there was a call for questions and then we didn't stop [19:48:30] fyi URLs for this year should be in format https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/ [19:48:41] A schedule consists of a table (empty) and then sessions to fill the table, right? [19:48:43] maybe we need an agenda for this meeting, based on the qs asked? 
[19:48:59] +1 for agenda [19:49:19] We've got 30 minutes left [19:49:27] sorry for jumping ahead [19:49:43] Well, I was still trying to explain the work that needs to be done. [19:49:51] Which, if you wait a bit, is not that difficult to explain. [19:50:20] (don't you love IRC meetings?) [19:50:21] 1. Define the structure of the table (the empty table with rooms and times) [19:50:30] 2. Select the sessions pre-scheduled. [19:50:59] 3. Define the rest of pre-scheduled activities (opening, closure, maybe a keynote, maybe some other exception) [19:51:07] That's it between now and the beginning of the event. [19:51:23] And I would leave our tasks during the event for another meeting, a month or so from now. [19:51:51] OK, then the agenda can be one point for each of the 3 tasks described above. [19:51:56] Comments? [19:52:17] let's just start with that [19:52:20] Any questions still open about 1? [19:52:20] agh, didn't see the calendar invite until just now :) [19:52:30] ( greg-g :D) [19:53:08] Any questions still open about 1. Define the structure of the table (the empty table with rooms and times)? [19:53:14] well, we need to know the room list, right? [19:53:17] so, copy-paste last year's schedule table, empty it out and paste it in a new location is task 1 I guess :) [19:53:25] :D [19:53:28] Yes, we will know the room list. 
[19:53:42] thedj, that would work for me as a start [19:53:45] then given how many rooms, we need to decide whether to fill them all with separate tracks, or just have a single track and the rest unconference or hacking space, or some middle ground between those two extremes [19:54:12] there are 7 rooms, 2 rfarrand just said targetted for pre-scheduled [19:54:21] list of rooms: http://www.presidio.gov/venues/golden-gate-club [19:54:22] I don't think the measurement for tracks or not tracks is the number of rooms, but the nukmber of good proposals for each candidate for a track [19:54:22] cscott: this year, we mostly had two pre-scheduled sessions in parallel. [19:54:29] BUT this is point 2 [19:54:35] Any more questions about point 1? [19:54:36] we also have two days of sessions, so we have to decide whether we're doing the "same thing" (same structure) both days, or else one thing the first day and a different thing the second day [19:54:45] ok, i'll copy and past the old table into a new location [19:55:17] thedj, can you plais just wait a sec? [19:55:19] this isn't point 2, this is point 1. i'm not sure there's consensus on "two tracks in two rooms" yet. [19:55:27] and that's the "structure of the table" [19:55:28] srishakatux, had some feedback about that table vs other presentations [19:55:40] "tracks" = content [19:55:50] thedj: i vote for shifting the schedule to one hour later... [19:55:51] "tracks" = simultaneous sessions [19:56:00] which is the number of columns in the table [19:56:02] thedj: correct. the program commitee currently has 2 rooms of the sizes I listed above for pre-scheduled sessions. 3 smaller rooms will be allocated to the unconference tracks and 2 very very small rooms will be for other uses (not sessions). [19:56:04] sessions = content [19:56:32] that means at any one time we could have 5 sessions ongoing at a time. Probably wont, but could. 
[19:56:49] Question: is it understood that the program committee needs to define the structure of the schedule (something that we will not do here and now)? [19:56:52] just to throw a wrench in: we could decide to have 2 main tracks and 1 minor track of scheduled content, and 2 tracks of unconference, right? [19:56:56] cscott: usually, tracks refer to content topics. and usually, you don't have two sessions for the same topic in parallel. and often, all sessions about a topic are in the same room. But often, you have fewer rooms than (content) tracks. [19:57:05] having simultaneous unconference tracks and pre-scheduled session tracks seems somewhat fraught [19:57:18] cscott: yes, that is totally fine if you guys want that [19:57:25] how would i be able to participate in the unconference w/o having it potentially conflict with a scheduled session on the same day [19:57:34] cscott, let me insist: discussing specifics about tracks without having any idea of the sessions we have to fill them outt seems premature to me. [19:57:38] cscott: i'd speak of columns, or just rooms, to avoid confusion [19:57:49] cscott, I think that's on whoever schedules the unconference event [19:57:55] DanielK_WMDE: sure. [19:57:55] So it's a non-concern now [19:57:55] cscott: also possible to do scheduled sessions in the mornings and unconferece in the afternoons [19:58:05] DanielK_WMDE: we have 5 columns in the table [19:58:18] can we say anything about how many rows? ie, how long each "session" would be? [19:58:19] i see 6 [19:58:21] The scheduler of the unconference event should make sure it doesn't overlap with anything super relevant. [19:58:33] halfak: that doesn't work very well in my experience [19:58:39] OK, I see that you got 1. right. Let's move onto 2. [19:58:46] we tried that last year, and i was severely overbooked and the scheduler hated me [19:59:00] Let me cut, as timekeeper. [19:59:03] cscott: you are just interested in too much stuff ;) [19:59:05] 20 minutes left,. 
[19:59:14] Any questions about 2. Select the sessions pre-scheduled. [19:59:24] qgil: yea. how. [19:59:25] well sure, but it's a common complaint for tracks in general, can't do X because of Y [19:59:31] i mean, beyond personal preference [19:59:41] i'm trying to start the meta-conversation about whether or not there's anything we can do about it, or do we just say suck it up [19:59:43] Sure, have you seen the milestones at https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Call_for_participation#Selection_process ? [19:59:44] qgil, do you have a proposal for how we'll select? [19:59:58] I would like the program committee to tell me if they would like to do prescheduled sessions in the morning and unconference in the afternoon OR both at the same time. Don't have to decide now though. [20:00:08] They suggest a first cut next week based on quality of the proposals. [20:00:16] Then a second cut a buit later based on registered interest. [20:00:32] What do you think? [20:00:34] rfarrand, I think both at the same time, but that depends on how many pre-scheduled sessions we have. [20:00:40] If they can fill the day, let's do that. [20:00:54] OK, just like last year [20:01:01] qgil, sounds OK to me. [20:01:02] rfarrand, I'll take note of your questions, but we cannot answer them today. [20:01:07] OK [20:01:16] no problem [20:01:31] i'm wondering what happens between "consolidating a discussion, regularly summarized in the proposal" and "The Program committee publishes the draft schedule" [20:01:51] According to the timeline proposed, by the end of this month we should know which proposals are well formulated and have discussions already started. [20:01:59] and in this context, i'm wondering if we are happy with our Main Topics as they are now. Maybe some are too broad, or too narrow? [20:02:00] * halfak goes to other meeting [20:02:30] Ideally, all those would make the pre-schedule. 
If they are too many or too little, we will need to apply other criteria (each of you on your main topics, the ArchCom guiding too...) [20:02:44] DanielK_WMDE, we are not touching the main topics. [20:02:46] Or better said [20:02:58] Each facilitator can touch their main topic as much they want. [20:02:58] as a strawman, let's say that *all* sessions are unconference sessions. that is, we put on thumb on the scale by ensuring that our selected "topic" sessions "win" the unconference selection process, but so do a bunch of other things [20:02:59] and then we come up with some mechanism to schedule actual slots for these things based on participants on the fly [20:02:59] like, say, someone writes a little web script that lets everyone list the sessions they are interested in and it cranks out a schedule w/ minimum overlaps [20:03:00] that is, tries to schedule things so that people can go to the maximum number of sessions they expressed interest in [20:03:00] that's a strawman proposal. it involves someone doing coding. ;) [20:03:03] in the interest of fairness, let's float the other strawman: we pick two tracks, unconference topics on the first two days are "best effort" only, and the unconference schedule manually tries to pick times that agree with those who expressed interest, and we acknowledge there will be conflicts but alas that's life. [20:03:09] But we are past the point of questioning them as awhole. [20:04:03] I had one idea that might or might not be related to cscott ideas... [20:04:14] By the deadline on 2016-11-28, [20:04:24] here's another meta question, forgive me for coming in last to this whole process, you may have discussed this already: are we talking big "discuss themes" sessions that combine a bunch of specific proposals, or more WMF-like sessions which have slides and are led by a specific proponent? [20:04:41] We could make a call to all participants registered + wikitech-l and neighboring regions... 
[20:04:42] (oops, pls table my question until qgil has a chance to finish this idea) [20:04:49] * DanielK_WMDE seems to remember that this scheduling problem is NP-complete [20:05:06] ... asking them to express their opinion about each session using Phabricator tokens [20:05:07] DanielK_WMDE: sure, like many things NP-complete but there are reasonable heuristics [20:05:19] This way [20:05:37] we could have a sense of interest beyond "comments The tokens also have some meaning themselves. [20:05:49] This would also give a chance to unconference sessions to actually register high interest [20:05:54] and become candidates to be pre-selected [20:05:57] What do you think? [20:06:28] I'm having trouble getting participation on the phab tasks in general. I'm not sure how many votes i'd actually be able to solicit. [20:06:52] That's the point, you don't have to solicit. [20:07:06] We make a list with all the sessions organized by main topics + other [20:07:13] We ask everybody to take a lot and give tokens [20:07:16] qgil: if we can get many people to actually do this, i think it's a good basis for session selection [20:07:16] and see what happens [20:07:28] I'm not saying we would make decisions based on that [20:07:29] i like the general idea, i'm just saying i'm not sure that phab is the right mechanism. [20:07:45] but it might be worth a try, and if we don't get participation we can try something else [20:07:47] but it would be an element additional to your criteria as program committee and number / speed of comments [20:08:21] All the session proposals have phab tasks, and all those tasks have a collection of tokens at a single click [20:08:24] simple [20:08:33] simple for people that live in phab [20:08:54] well, if you are attending the summit or interested in the evolution of MediaWiki etc... [20:09:00] qgil: is it fair to record that "have the unconference in parallel" vs "have the unconference after the scheduled stuff" is undecided? 
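The "little web script" strawman above (everyone lists the sessions they are interested in, and a script produces a schedule with minimal overlaps) could be sketched roughly as follows. This is only a greedy heuristic of the kind hinted at in the NP-complete exchange, not an optimal solver; the function name `greedy_schedule` and the shape of the interest data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def greedy_schedule(interests, num_slots, rooms_per_slot):
    """Place sessions into time slots so that few people want two sessions at once.

    interests: dict mapping a person to the set of session names they want
               (hypothetical input format).
    Returns a dict mapping each session name to a slot index.
    """
    # For every pair of sessions, count how many people want to attend both.
    conflicts = Counter()
    for wanted in interests.values():
        for a, b in combinations(sorted(wanted), 2):
            conflicts[(a, b)] += 1

    # Place the most popular sessions first; ties are broken by name.
    popularity = Counter(s for wanted in interests.values() for s in wanted)
    order = sorted(popularity, key=lambda s: (-popularity[s], s))

    slots = [[] for _ in range(num_slots)]
    assignment = {}
    for session in order:
        best_slot, best_cost = None, None
        for index, placed in enumerate(slots):
            if len(placed) >= rooms_per_slot:
                continue  # every room in this slot is already taken
            # Cost = number of people who would be double-booked in this slot.
            cost = sum(conflicts[tuple(sorted((session, other)))] for other in placed)
            if best_cost is None or cost < best_cost:
                best_slot, best_cost = index, cost
        if best_slot is None:
            raise ValueError("not enough slots and rooms for all sessions")
        slots[best_slot].append(session)
        assignment[session] = best_slot
    return assignment
```

A greedy pass like this gives no guarantee of the minimum number of clashes; it only avoids the worst ones for popular sessions, which may be enough for a first draft that humans then adjust.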
[20:09:09] qgil: i don't think that characterization is at all fair [20:09:12] totally undecided [20:09:18] assuming the goal of the summit is to broaden participation [20:09:49] DanielK_WMDE: i think we've "decided" that at least one day of unconference is "after the scheduled stuff"? [20:09:55] turns out that people out there use things that look like tokens in pages that look like phab tasks [20:09:58] i think it's just the programming of the five rooms which is undecided [20:10:21] Wednesday is not unconference [20:10:28] is non-structured [20:10:46] or at least that is the initial idea [20:10:57] Wednesday is for hacking and meetings and whatever results from the first two days [20:11:20] That is why we came up with a third day, because people asked for "non-structured" time after the two days. [20:11:24] which means find a spot, find your peeps (or not if u wish) and get stuff done (be it discussion, code, governance etc). [20:11:30] right ? [20:11:37] This is how I see it, yes [20:12:17] hm. that could well be too much freedom, it may be hard to get the people you need together at the same time in the same place. but i guess we'll find out. [20:12:24] based on last year, we might want to at least encourage people to note down plans they make for that day somewhere. [20:12:52] well, the good news is that we have time to plan Wednesday better. [20:12:53] cscott: you can still schedule a session by posting it on the wall and/or wiki [20:13:06] time check: 7 minutes [20:13:07] i was thinking a small-scale optional unconference, mostly for the proposing things-to-work-on and picking-a-time-for-them parts. [20:13:25] Any questions about 3. Define the rest of pre-scheduled activities (opening, closure, maybe a keynote, maybe some other exception)? [20:13:37] but i guess that can be run independent of any actual organization for the final day [20:13:54] qgil: is the CTO interested in talking? 
[20:14:08] i think we'd all be interested in hearing her take, either as an intro or a wrap up [20:14:13] I don't know, but Rachel and I have thought about it. [20:14:24] i'd like to propose a single unified session for kickoff and wrap up regardless, actually. [20:14:33] assuming we can all squeeze into a single spot in the venue. [20:14:34] I think we have to leave behind the model of having our hughest managers opening the summit with a keynote [20:14:41] participants last year and the year before asked for unscheduled time to meet with people, unless we have really good reason to do so lets not start scheduling out Wed at this point [20:14:41] highest :) [20:14:54] i'm not thinking keynote so much as "introduction" [20:15:15] but it would also be very interesting to hear her thoughts at the conclusion of the event. what sessions did she attend, what did she learn [20:15:18] I think a non-plenary Q&A with the WMF CTO and the VP of Product would be useful [20:15:41] hm. [20:16:17] qgil: thats what I tried to do with Terry at the Mexico City wikimania hackathon and it did not work super well for a few reasons [20:16:31] a completely unscheduled Q&A [20:16:31] well, i'm looking for events that would bring us together as one org, given that the rest of the schedule will divide us into our parochial interests [20:16:37] Well, the good news is that we don't need to decide this either. :) [20:16:43] Right, empty table: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Schedule_table [20:16:48] What else can we discuss here and now? [20:16:53] and hearing from our management would certainly be a unifying opportunity. [20:16:54] no times, just rooms [20:16:57] in the few minutes left. [20:17:26] thanks thedj [20:17:41] qgil: when do we meet next, and what's our homework? [20:18:00] Do we need meetings? [20:18:01] thanks thedj: ill mess around with that a bit later today [20:18:09] I mean regular meetings. 
[20:18:59] i'd suggest at least one more [20:19:02] Current homework can be found and discussed at https://www.mediawiki.org/wiki/Topic:Teiv1oh680cvw876. If you have questions or comments, post them there please. [20:19:21] are we going to do the token-voting thing? [20:19:24] https://www.mediawiki.org/wiki/Talk:Wikimedia_Developer_Summit/2017/Program_committee is where our discussion and our activities happen. Please watch those pages. [20:19:26] that page [20:19:27] ...during the period in which we are supposed to come up with a draft schedule. so, first week of December. [20:19:43] i have concerns about the self-selection that will occur when using phab as the mechanism, but i'd rather use phab than not do the experiment at all [20:19:49] "are we going to do the token-voting thing?" is something that I will comment on in that page, and then the rest can give opinions [20:20:04] remember that many program committee members were not here today, so no decisions. [20:20:28] DanielK_WMDE, yes, we can have more meetings when we need them. [20:20:36] This one was needed, and I hope it has been useful. [20:20:52] I am just reluctant to schedule another meeting today just because... [20:21:10] let's try to hammer out an agenda first next time? [20:21:36] cscott, yes, we can say that when we have enough topics for a meeting, we will schedule the meeting. No topics, no meeting. [20:21:47] works for me [20:22:10] Any other questions / comments? We need to finish. [20:22:24] I will post logs and summary in the page linked above. [20:22:33] K [20:22:51] (I'll do it tomorrow morning, if you don't mind) [20:23:22] 5 [20:23:24] 4 [20:23:27] 3 [20:23:29] 2 [20:23:31] 1 [20:23:34] #endmeeting :P [20:23:37] Thank you! 
[20:24:31] thanks qgil and everyone [22:01:25] #startmeeting RFC meeting: Define an official thumb API [22:01:41] \o/ [22:02:27] no meetbot [22:02:32] poop [22:03:32] and the instructions at https://wikitech.wikimedia.org/wiki/Tool:Meetbot don't work for me, it just closes the ssh connection immediately after login [22:05:23] become meetbot > "You are not a member of the group tools.meetbot." [22:05:35] who are the key participants in this meeting? AaronSchulz? [22:06:13] gwicke I suppose? [22:06:16] i had some interest but starting with wanting to see what updates came along [22:06:41] hey [22:06:58] anomie previously commented but is marked away [22:07:04] someone from the mobile apps or MCS [22:07:21] so, we repeatedly run into the issue of having no API for requesting thumbnails in a client-selected size or quality [22:07:50] this need comes up in the context of apps, lazy image loading, and API design [22:08:06] for example, lots of references to page thumbs currently provide a set of fixed thumb sizes [22:08:28] * bearND is lurking [22:08:31] and every so often another use case needs a different size, which then triggers API changes [22:08:54] the RFC describes requirements and some options for implementing this [22:09:29] the source of the problem is that you can't request arbitrary sizes, otherwise it would just be a matter of changing the size in the url you already have [22:09:34] that's not solved in the current proposal [22:09:44] an important requirement is to preserve cache efficiency, which basically means that we don't want to fragment caches significantly [22:09:52] this is https://phabricator.wikimedia.org/T66214 for anyone without the link handy [22:10:04] why can't you request arbitrary sizes? 
[22:10:09] I think there are two scenarios worth differentiating: 1 you have a file name and want to get a thumbnail with given parameters 2 you have a thumbnail URL and want to get a thumbnail of the same image but with different parameters [22:10:16] because if it's bigger than the original size, you get an error back [22:10:24] as I understand no one is really interested in 1 [22:10:35] and you'd have to keep requesting smaller ones until you get a thumbnail that actually exists [22:10:35] I mean, I probably wrote the code which caused that limitation, but surely it's just a couple of lines of code to fix if that's desirable [22:10:43] well 1 is a case of 2 where you get the filename by extracting it from the url [22:10:44] the status quo is that thumb APIs are private information, and explicitly not an API [22:11:01] returning an image on requested-larger is an actionable change which simplifies some of these interactions, yes [22:11:02] so you can't construct random URLs unless you are prepared for your app to break any time [22:11:06] tgr: would be nice for the cache if a request by name would redirect to a request by hash [22:11:13] it's just not documented, I don't know where you're getting the information that it doesn't try to be used as an API [22:11:27] everybody can request it and we've alwayd allowed hotlinking and the like [22:11:45] I've generally encouraged people to use it as an API [22:11:49] de facto it works, but there is no guarantee that it will continue to work [22:11:58] I think our current thumbnail URL schema is a de facto API, ugly as it is [22:12:00] and it is not designed as an API [22:12:01] yes, practically we've avoided making major changes to the url structure because they *do* get used ad-hoc as an api :) [22:12:14] it just is poorly documented and has weird edge cases [22:12:17] it's used in the iOS app, MCS, MediaViewer... 
[22:12:38] MCS is not actually making up random urls [22:12:40] the old schema could redir to the new schema [22:12:46] we could as well as make it official, document it and get rid of the size limitations [22:12:47] having recently written a parser for all its cases, it's not as bad as it first appears [22:12:56] * DanielK_WMDE_ likes redirects [22:12:59] what do you want it to do when scaling up is requested? [22:13:16] redirects cause an extra round trip, they're terrible for people with high latency [22:13:19] just return the original [22:13:25] may be better to return the original yeah [22:13:29] we're talking about several requests per page on average [22:13:32] ok, so let's make that change [22:13:34] i like the idea of a redirect but the latency sucks [22:13:40] or a thumbnail with the dimensions of the original rather [22:13:44] will it break anything to make it return the original? [22:13:46] so one option is to re-define the current format as an api [22:13:49] so no, permanent redirect isn't a viable migration option unless temporary [22:14:03] TimStarling: it currently returns an error so not likely [22:14:28] gilles: sure, when we generate html, we should use a url that doesn't need a redirect. but redirects are still useful for discoverability, and to keep compat, or provide aliases (file names instead of hashes, for example) [22:14:32] make that change, as in redirect to the original in case of requesting a size too large? [22:14:34] parsing and serialization would require custom code [22:14:58] would this work outside the WMF environment? 
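The behaviour change being discussed (serve the original, or a thumbnail at the original's dimensions, when a larger size is requested, instead of returning an error) comes down to clamping the requested width. A minimal sketch, assuming a hypothetical helper name and plain integer pixel widths; this is not MediaWiki's actual code:

```python
def effective_thumb_width(requested_width, original_width):
    """Return the width that would actually be served.

    Hypothetical helper: requests wider than the source image are clamped
    to the source width rather than rejected with an error.
    """
    if requested_width <= 0 or original_width <= 0:
        raise ValueError("widths must be positive")
    return min(requested_width, original_width)
```

With this rule a client can ask for any size without first knowing how wide the original is.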
[22:15:17] no, make the change as in stream out the original file with a 200 response code [22:15:18] (people use far more horrible hacks to get thumbnail URLs, like Special: pages) [22:15:40] yeah if you want a redirect we do have a special page for that [22:15:47] TimStarling: that's creating inbounded cache fragmentation, where currently it was at least limited by how wide the image is [22:15:48] the other main option is to consider this as an API design problem, and see if we can do better than the current format [22:15:52] thumb_handler.php is more for streaming [22:15:53] unbouded [22:16:08] and standardize on an official API that is a bit more designed, and easier to use [22:16:32] for instance, would it be beneficial to standardize how parameters are passed? [22:16:44] language for svg [22:16:50] lossless/lossy for tiff [22:16:54] page for pdf, tiff, djvu [22:16:59] time for video thumbnail [22:17:13] I personally think that the gains from a decent API are large enough to warrant cleaning this up once [22:17:36] what gains? [22:17:41] and what about adding more parameters for future rich media types? coordinates for prerendered maps? [22:17:59] I think the current API is ugly but usable (with a few small fixes) [22:18:04] i like hash based urls for files. and we can have the old urls redirect to the new ones for compat. [22:18:05] any API should be extensible [22:18:08] ability to set camera position for 3d model renderings or panoramas? [22:18:09] so far the main version proposed has qualities that are shared with the existing url scheme (low cache fragmentation, strict format, etc.) [22:18:39] gilles: ease of parsing and construction is definitely different [22:18:52] the current URL format is ad-hoc, and not used anywhere else [22:18:57] gwicke: as I've mentioned before, show me code that proves that what you propose is easier to parse. 
it's not [22:19:12] compared to query strings for example, it's a lot more code to parse it [22:19:17] (Not sure if relevant) Possibly this relates to https://www.mediawiki.org/wiki/Extension:PageImages ? (IIUC that's used to determine the page_image property which is then used by hovercards, action=info, and mobile search) [22:19:34] 4 different characters for separation, logic about some parameters needing to be in some place, etc. it's just as complicated to parse. it just looks nicer at a glance when you don't think about the code involved to parse it [22:19:34] i kinda like query strings for params, they're standard already :) [22:19:37] current de facto API format is 1) break up by '/', get the last component 2) get the first match for px 3) break up the part before that by '-', those are the extra parameters [22:19:40] gilles: you can easily try it yourself [22:19:50] the existing URL format is obviously extensible, considering that we have extended it to support arbitrary parameters [22:19:52] tgr: last component fails if filename is very long [22:19:59] gwicke: why no .png etc file extension on the urls? [22:20:09] tgr: how do you parse the extra parameters? [22:20:12] how are they constructed? [22:20:16] gwicke: you're the one making a case, do the work. if you're not going to bother doing that I have absolutely no faith that you will be ready for the huge undertaking that the migration involves [22:20:17] what is their order? 
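The three parsing steps listed above for the de facto thumbnail URL format (take the last path component, find the first `<width>px` match, split whatever precedes it on `-` to get the extra parameters) can be sketched as below. The function name, the returned dictionary, and the example URL are illustrative assumptions; as noted in the discussion, the trailing part may not be the file name when the name is very long.

```python
import re
from urllib.parse import unquote, urlsplit


def parse_thumb_url(url):
    """Split a thumbnail URL into width, extra parameters and trailing name."""
    # Step 1: break up the path by '/' and take the last component.
    last = unquote(urlsplit(url).path.split("/")[-1])
    # Step 2: find the first "<digits>px-" marker.
    match = re.search(r"(\d+)px-", last)
    if match is None:
        raise ValueError("no '<width>px-' marker found")
    # Step 3: everything before the marker, split on '-', is extra parameters.
    params = [part for part in last[:match.start()].split("-") if part]
    return {
        "width": int(match.group(1)),
        "params": params,
        "rest": last[match.end():],
    }


# Hypothetical URL, shaped like the thumbnails discussed above.
example = "https://upload.example.org/thumb/a/ab/Example.pdf/page2-120px-Example.pdf.jpg"
print(parse_thumb_url(example))
```

The sketch does not interpret the parameters themselves (page, language, lossy and so on); it only separates them, which is the part the steps above describe.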
[22:20:18] brion: don't think so, it just won't contain the file name [22:20:34] gwicke: alphabetical [22:20:46] gilles: I trust that you have parsed a query string before [22:21:06] gwicke: it's unsorted, we could sort with minimal B/C break [22:21:32] gwicke: there have been libraries for that in any decent languages for decades [22:21:38] and just have prefix-based ownership which is also the current de facto standard [22:21:56] built-in functions, even [22:22:16] pageXXX, qXXXX etc [22:22:18] gilles: for mediawiki thumbnail syntax? I don't think so. [22:22:28] and that's my point [22:22:40] are query string parameters an acceptable thing for 404-handler setup? what are the practical issues of getting something like that into production versus encoding settings into the filename? [22:22:43] you just said "query string" please be more accurate in your statement [22:22:52] to me, the biggest argument for changing the thumbnail api is to move to hash based urls. [22:22:58] that's not the same thing as mediawiki thumbnail url [22:23:00] the rest is cosmetics [22:23:06] gilles: as you say, one format has built-in support, the other does not [22:23:08] the existing parameter order is defined by makeParamString() in the relevant MediaHandler subclass [22:23:18] if the issue is lack of libraries for the existing format, that's very easy to resolve [22:23:30] more importantly, most clients don't care about the part before the size, they just want to change sizes, so we could alter that with limited B/C breakage [22:23:44] DanielK_WMDE_: benefit of hash-based urls being.... versioning? cache sharing for duplicates? fixed length? 
all good bits i agree :) [22:24:00] DanielK_WMDE_: that's kind of an orthogonal subject [22:24:12] just replace filename with hash and you are done [22:24:15] the event page on phabricator said that we weren't going to talk about hashing [22:24:18] brion: yes :) [22:24:26] DanielKWMDE: I decoupled the two proposals, as the move to hash-based image names will take a bit longer, and can be done in a second phase [22:24:26] unless I misunderstood it [22:24:32] DanielK_WMDE_: including the hash in the thumbnail url is already implemented in core, I wrote that while investigating hash-based cache busting a year ago. the vagrant thumbor role uses that option [22:24:34] ah yes there was talk of separating the hash issue [22:24:52] the second instance of the filename in the thumbnail url is simply replace by the original's sha1 [22:24:57] the RFC is explicit about this [22:25:01] tgr: true. but if we have a compelling reason to change the url format, it makes more sense to discuss niceties of parameter passing. Without the move to hash based urls, there is no pressing need, imho [22:25:06] imho, for that reason and many others, it's a feature need decouple from a url scheme overhaul [22:25:10] decoupled [22:26:00] the most immediately pressing need is the 'given a thumbnail, request same thumbnail in different size' case i think [22:26:04] so, to be clear, is anybody offering to develop the current thumb syntax in a stable & documented API? [22:26:20] which is improved if we make the single change of letting original files through for too-large requests [22:26:39] my point is: if we *don't* want to go to hashes, messing with the current thumb "api" is probably not worth the pain. but if we change the format anyway, we should make it nice. [22:26:41] brion: but that's possible in the current URI scheme. 
and the new scheme doesn't solve the width > original, which as we've mentioned earlier can be solved by a redirect that doesn't require touching the URI scheme [22:26:47] it sounds like some here favor that solution, but I'm not sure if anybody would be willing to take it on [22:27:02] an additional need, i think important, is to create image urls 'from whole cloth' during editing, parsing, plugin magic, etc [22:27:11] gilles: agreed [22:27:27] DanielK_WMDE_: moving to hash-based doesn't require a URI scheme overhaul [22:27:33] gwicke: I can come up with an RfC if decide to prefer that route, I really don't think it's all that complex [22:27:46] this add'l need will get worse when we add new media types, assuming we do (panoramas, finishing the 3d support, etc) [22:27:56] tgr: doesn't really need an RFC if you are describing the current situation, not proposing a change [22:28:09] I primarily care about having a sane API some time soon [22:28:18] just write about it on mediawiki.org [22:28:25] will adding more parameters on the existing schema re-complicate things that need to deal with media and images? [22:28:39] and I care less about the exact syntax [22:28:40] brion: the question is, to we go through a painful migration before those needs materialize, or do we leave things as-is with the potential implication that new ideas get shot down because it's too hard to do just for the sake of panoramas, for instance [22:28:43] gilles: no, but doing both at the same time may be easier than doing one without the other. and more useful, too [22:29:02] the cost of radical changes to the API is that 1) all media handler extensions need to be rewritten (which would be about time, the way they work is terrible) 2) all clients need to be rewritten, and I don't know if we have a grasp on the size of that [22:29:06] gilles: how painful a migration do we expect? [22:29:18] gwicke: what's insane about the current api? 
besides "ugly" [22:29:38] maybe the mobile apps and the media viewer are the only ones that actually try thumbnail URL guessing currently, in which case no big deal [22:29:56] tgr: 2) may or may not come with caveat "existing clients may or may not be correct under existing schema" :) [22:29:58] yeah [22:30:05] brion: code to adapt in mediawiki, extensions, VCL, Swift's rewrite.py, probably miscellaneaous puppet, thumbor, apps, restbase. you name it [22:30:06] tgr: i have written toolserver/labs tools that do that [22:30:16] *nod* [22:30:22] key-value maps can encode a lot of things, so I'm not too worried about future parameter passing needs [22:30:45] and i'm still concerned about parameters mapping onto low-level backing files [22:30:48] it also seems that the current options map pretty directly to key-value maps [22:30:52] DanielK_WMDE_: main problem with the current API beyond ugliness is that parameters are completely ad hoc [22:32:07] the current API is meant to be compact and human-readable [22:32:13] ok, we can sort the parameters, and slowly standardize them. use redirects for compat [22:32:18] the way it works internally is that the pre-filename-part of the URL (e.g. qlow-page1-123px) is passed through the MediaHandler inheritance chain and any handler is free to do any kind of processing [22:32:28] append, regex-parse, whatever [22:32:30] which I think maps to insane and ugly from a programmer's perspective [22:32:33] I'm in favor of a key-value format, preferably one that already exists for the sake of available tooling (which is why I brought up the classic ?&= URL format convention). 
the issues to solve at the Varnish level and client best practices don't add much work to the migration that needs to happen anyway for a URI overhaul to happen [22:32:44] I personally think that cleaning this up will only get more expensive over time, especially once we encourage users to rely on this format [22:32:59] yes, the arbitrary filename adjustments are painful and mean we have to duplicate knowledge of special structures in multiple places [22:33:06] so yeah, a key-value format would be a significant increase in sanity [22:33:23] +1 [22:34:10] with plain old URL query strings you can't really mandate a sort order [22:34:13] caching: we can prefer a canonical order for key-value maps, but will it be honored consistently? what about human usability? [22:34:20] although I guess you can rewrite it in varnish [22:34:20] OTOH we could go with /thumb/Filename/key:val-key:val-width:123px-Filename for example which is similar enough of the current scheme that most tools would not notice the difference [22:34:21] ...should we support the same key/valeu format in the file link syntax, then?... [22:34:59] DanielK_WMDE_: that does raise the related question of how to specify available keywords [22:35:05] tgr: downside is that it's still custom [22:35:10] TimStarling: you can encourage it. it's only problematic if it's inconsistent from the client, for the client's own cache's sake. at the varnish level we would normalize it. 
so that a random order of the sake parameter values would hit the same cache entry [22:35:14] the file link syntax doesn't distinguish between parameter options and caption text at the syntax level [22:35:22] *of the same [22:35:30] I have personally warmed up more & more to just using query strings [22:35:47] query strings _strike me_ as the right thing, i just am cautious :D [22:36:04] we do already have a key/value API, that's what imageinfo uses [22:36:18] I'm happy with that idea because the cache fragmentation seems solvable at the varnish level. and we need to fix it for our other API calls with are also query string-based [22:36:19] gwicke: yes. I think it would be easier to evaluate trade-offs if we had more of an idea of what clients have built-in knowledge about the current URL schema and to what extent [22:36:38] and we already have thumb.php which streams out files with parameters specified in the query string [22:37:21] The apps and MCS try to change thumbnail widths downwards via regex [22:37:22] thumb.php's scheme is not extensible, that's the main problem [22:37:26] tgr: yeah, I think that's a good point that we should record as a follow-up [22:37:31] indeed, we can use the existing param names used internally... 
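The normalization idea raised above (parameters arriving in any order should hit the same cache entry) can be illustrated with a small sketch. It uses Python's standard `urllib.parse` rather than Varnish VCL, so it only shows the canonical-ordering logic, not how the traffic layer would implement it; the function name `canonicalize` and the parameter names `width` and `page` are assumptions for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url):
    """Return the URL with its query parameters sorted by key, then value.

    Hypothetical helper: a cache keyed on the result treats
    ?width=120&page=1 and ?page=1&width=120 as the same object.
    """
    parts = urlsplit(url)
    # keep_blank_values so that "?lossless=" is not silently dropped
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


a = canonicalize("https://example.org/thumb/Example.pdf?width=120&page=1")
b = canonicalize("https://example.org/thumb/Example.pdf?page=1&width=120")
print(a == b)
```

Sorting at the cache layer means clients are only encouraged, not required, to send a fixed order, which matches the position taken in the discussion.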
[22:37:34] not without making it even more awkward anyway [22:38:00] we try to stick to certain bucket sizes: 320px, 640px, 800px, 1024px [22:38:02] extensibility is going to be important though, in a way that's as transparent as possible [22:38:14] to the code parsing through things [22:38:51] but that's really for width since that's the only thing that can be changed through URL manipulation [22:39:00] bucket sizes (thumbnail rendering speed) is a whole different can of worms, let's keep that separate IMO [22:39:22] sure, was just trying to answer the question what the apps and MCS use [22:39:27] okay, so it sounds like there is some amount of support for considering moving to query strings, with the main caveat being that we need to gauge the cost by figuring out how many clients rely on the current syntax [22:40:13] I will say one thing on that topic, though, which is that we intend to study the distribution of sizes again with filippo, to determine whether we can move away from storing all thumbnails forever in swift to storing only the most requested formats (de facto buckets based on actual use). if we find that the long tail being cut that way is significant in terms of storage size [22:40:15] ok so just summarizing a couple things. 1) broad agreement(?) on letting orig file through on requesting oversized thumb. 2) jury still out on whether to use query strings for params, but lots of interest. 
3) extensible parameters are important, but need to know more about other params that might be used [22:40:24] we also need to estimate how much change would be needed at the media handler layer [22:40:32] yes [22:40:52] media handlers mostly take the key-value pairs [22:41:09] so i think not huge [22:41:14] brion, for 1) the people who would disagree are probably in the editor community, not the developer one [22:41:16] the implication going forward with the issue of fragmentation is that we'll have a class of thumbnails that are more likely to be misses when they go out of varnish, so on average lower performance, when requesting exotic parameters [22:41:17] also tgr is going to write a spec of the existing situation [22:41:37] but the investigation to see if that's worth doing hasn't happened yet [22:41:52] media handlers use regex parsing, not key-value pairs [22:42:03] but fixing that would be time well spent IMO [22:42:26] tgr: when generating a thumbnail from parameters they use key-value pairs. [22:42:39] tgr: when extracting those parameters from URLs they use regexes [22:42:42] yeah, maybe we can pull out a centralised way of parsing URLs and feeding structured data to media handlers [22:43:10] and then extracting those parameters from [[File:foo=bar]] they use magic word regex chunks [22:43:12] *when [22:43:20] yeah, they transform between a key-value hash and a string [22:44:02] so just replacing that with putting key-value in the query is indeed easy [22:44:14] so i'm very interested in making some stuff happen on this :) anything i can help with on the media handlers end? [22:44:28] maybe thumb.php should give a Content-Disposition header with the human-readable filename [22:44:47] making this work on vagrant without varnish (a.k.a. "small wikis") would be a nice first step [22:45:02] ah yes, the no-varnish question :D [22:45:13] would this require a thumb.php-like intermediary for them? 
[22:45:15] you know we do that already to work around filename length limits in swift [22:45:25] varnish works on vagrant, if you feel adventurous in the vmod side of the issue [22:45:46] brion, re: replacing original with thumbnail, there is some discussion on the dedicated task, people use originals embedded in articles to demonstrate technical concepts [22:45:52] color spaces and whatnot [22:46:00] which get removed when thumbnailing [22:46:25] tgr: that bears investigation, reminds me of occasional requests to run videos at a specific resolution etc. need to think about a solution there. [22:46:58] as for the redirect to the original, I just remembered that it's a bad idea for EXIF rotation, which we apply and strip on thumbnails. what we really need is an original-sized thumbnail [22:46:59] an option to force fixed-original size would sometimes be useful [22:47:08] hehe yeah [22:47:11] good point :D [22:47:15] brion, tgr, timstarling: in the first step, would you be primarily interested in moving the key-value parsing of the existing syntax into a single step & then pass key-value pairs to the individual handlers? [22:47:38] gwicke: sounds about right [22:48:04] how would we handle migration? varnish magic? [22:48:14] so this would be a prep step that would make it easy to support query strings [22:48:35] gwicke: I think that would be desirable, I don't really have a good sense of how feasible it is [22:48:38] gwicke: isn't that how it works now with MediaHandler::parseParamString ? [22:49:07] brion: varnish magic is needed to avoid doubling the thumbnails stored, yes. 
rewriting both concurrent schemes to the same URI varnish bases its caching on [22:49:11] currently parseParamString() is spread out in lots of little bits, in extensions and core [22:49:33] in theory we could instead have a single grammar for it, which is parsed in core [22:49:34] we could have a single back-compat param string parser that handles all known existing options [22:49:35] yeah, there are lots of individual implementations [22:49:41] presumably rewriting the new scheme to the old one, since that's what existing entries are stored by [22:50:06] the output of parsing that grammar might not be the actual b/c key/value pairs, that is the bit that might not be feasible [22:50:12] what would be the point of rewriting how a soon-to-be-deprecated syntax gets parsed? [22:50:56] well, would we still need it for back-compat / migration in updater? [22:50:57] re varnish migration: embedding complex parameter rewriting in varnish sounds fairly ugly [22:51:22] what do you propose? doubling the hardware for the varnish caches we have? [22:51:30] brion: yes, but no need to touch the code for that [22:51:40] we really don't want to duplicate the thumbs themselves, but maybe we could afford some duplication of metadata (headers) [22:51:59] tgr: well, depends whether we want to keep the old piecemeal extension bits or consolidate it into one bit of code that lives in the updater [22:52:17] gwicke: as in, cache redirects and resolve them internally in varnish? [22:52:29] tgr: yeah, something like that [22:52:39] would have to talk to bblack on that [22:52:47] could VCL distinguish between the 99% of thumbs with simple syntax and the "other cases"? [22:52:52] that would probably be useful on a more general level as well [22:52:53] then only double storage on the "others"? 
[22:53:09] brion: yeah, that sounds like a good idea as well [22:53:19] in case full k/v pairs are too awkward [22:53:28] most thumbs don't have any other parameters [22:54:55] I think that the discussions about varnish need to involve the traffic team, we have to guess too much about what's possible or practical [22:54:59] so, for next steps.. [22:55:18] 1) look into which users rely on the current thumb format [22:56:14] 2) investigate effort needed to clean up the mediahandler parameter parsing [22:56:39] 3) discuss possible migration strategies with the traffic team [22:56:56] and then reconvene with the results? [22:57:00] The apps/MCS get their initial thumbnail URLs from mobileview, Parsoid, or the RB /page/summary endpoint. [22:57:02] We can't control when old versions of the Android app get updated. Can we use some sort of flag in the APIs which include thumbnail URLs for a transition period? If we don't find the /(\d+)px- regex then the older Android app versions won't be able to request a different size. Maybe something akin to formatversion=2 for action=mobileview? Content-type [22:57:02] versioning for RB [22:58:08] bearND: good point! needs to be considered in migration [22:58:38] (it's also conceivable we could keep the XYZpx- prefix as the lone exception to k/v pairs for other options) [22:58:49] ok, I guess we're pretty much done? [22:59:01] sounds like! [22:59:12] brion: that sounds like an interesting option [22:59:20] TimStarling, tgr, brion, gilles, DanielKWMDE: does the summary of next steps make sense to you? [22:59:21] keeping some of the ugliness for nostalgia's sake, right? [22:59:27] it does [22:59:28] gilles: hehehe [23:00:03] cool, thank you all for the discussion! [23:00:20] I will copy the IRC log to somewhere [23:00:23] sounds good to me [23:00:34] do you want it on the task or the event? 
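The client-side concern raised above (older apps rewrite the width with a `(\d+)px-` regex and cannot resize if the marker is missing) can be made concrete with a rough sketch. This is not the apps' or MCS's actual code; the function name `downsize`, the example URLs, and the choice to return `None` when no marker is found are assumptions. The bucket sizes are the ones quoted earlier in the log.

```python
import re

WIDTH_RE = re.compile(r"(\d+)px-")
BUCKETS = (320, 640, 800, 1024)  # bucket sizes quoted in the discussion


def downsize(url, wanted):
    """Rewrite the width in a thumbnail URL to the largest bucket <= wanted.

    Hypothetical helper. Returns None when the last path component has no
    "<digits>px-" marker, which is the failure mode a query-string scheme
    would trigger in unmodified clients.
    """
    head, _, tail = url.rpartition("/")
    match = WIDTH_RE.search(tail)
    if match is None:
        return None
    current = int(match.group(1))
    # only ever go downwards, and only to a known bucket
    candidates = [b for b in BUCKETS if b <= wanted and b < current]
    if not candidates:
        return url
    new_tail = tail[:match.start(1)] + str(max(candidates)) + tail[match.end(1):]
    return head + "/" + new_tail


print(downsize("https://example.org/thumb/a/ab/Example.jpg/1024px-Example.jpg", 700))
print(downsize("https://example.org/thumb/Example.jpg?width=1024", 700))
```

The second call shows why a transition flag or a retained `XYZpx-` prefix was suggested: a client built this way has nothing to rewrite once the width moves into the query string.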
[23:00:37] gwicke: i'll do some general looking over media handlers this week, will include the parameter handling in my logs [23:00:44] Thank you guys! This is a really pressing topic for the apps and MCS [23:00:47] looks like it has previously been on the event [23:00:56] TimStarling: probably the event yeah [23:03:47] thanks everyone! [23:15:25] Protests planned at 5pm in SF and Oakland. There could be interruptions in transportation. plan accordingly.