[12:57:00] <arrbee>	 Language team office hour/online session starts here soon
[13:00:15] <Nikerabbit>	 #startmeeting Language Engineering office hour - June 2016
[13:00:16] <wm-labs-meetbot`>	 Meeting started Wed Jun 15 13:00:15 2016 UTC and is due to finish in 60 minutes.  The chair is Nikerabbit. Information about MeetBot at http://wiki.debian.org/MeetBot.
[13:00:16] <wm-labs-meetbot`>	 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[13:00:16] <wm-labs-meetbot`>	 The meeting name has been set to 'language_engineering_office_hour___june_2016'
[13:00:39] <Nikerabbit>	 Welcome to the online+IRC office of the WMF Language team
[13:01:19] <Nikerabbit>	 Our main conversation is happening on Google Hangout/youtube: https://www.youtube.com/watch?v=0FrowkpBEnQ
[13:01:39] <Nikerabbit>	 me and kart_ will be taking questions here
[13:01:57] <Nikerabbit>	 And logs of this channel will be recorded and posted on a wiki
[13:04:28] <Nikerabbit>	 We just did introductions
[13:05:20] <Nikerabbit>	 Runa is explaining what has changed in this three month period with regards to our work
[13:06:23] <Nikerabbit>	 this time we are not only working with Content Translation, but also on Translate and Universal Language Selector
[13:06:48] <Nikerabbit>	 One of the highlights is time compact language links
[13:07:47] <santhosh>	 https://www.mediawiki.org/wiki/Universal_Language_Selector/Compact_Language_Links
[13:09:02] <Nikerabbit>	 Pau is explaining why and how and giving a demo now
[13:12:59] <Cheol>	 Do you have an user profile of language usages for ULS?
[13:15:17] <arrbee>	 #link https://www.mediawiki.org/wiki/Wikimedia_Language_engineering/Code_review_statement_of_intent
[13:18:11] <arrbee>	 Cheol: can you help us understand the question a bit more?
[13:18:13] <Cheol>	 If you edit on EN, JA and KO frequently,..
[13:18:39] <Cheol>	 you would expect them listed at the head.
[13:19:02] <Cheol>	 right
[13:19:46] <Cheol>	 Thanks I see.
[13:24:16] <Nikerabbit>	 More detailed updates about Translate and other projects on technical level can be found in our monthly report
[13:24:20] <Nikerabbit>	 #link https://www.mediawiki.org/wiki/Wikimedia_Language_engineering/Reports/2016-May
[13:24:45] <Nikerabbit>	 Currently Runa is talking about community consultation of Content Translation
[13:28:44] <Nikerabbit>	 #link https://www.mediawiki.org/wiki/Content_translation/Community_Consultation_June_2016
[13:39:28] <santhosh>	 In case you are wondering, Content translation produced 91573+ articles so far :)
[13:39:56] <TinoDidriksen>	 Neat
[13:40:56] <Cheol>	 any plan on Mobile version of Content tool?
[13:46:53] <arrbee>	 Cheol: Does that answer your question?
[13:46:56] <Cheol>	 research first, great! I see.
[13:47:17] <Cheol>	 yes thanks
[13:53:06] <TinoDidriksen>	 Was watching. Couldn't find a place to join the actual broadcast, but wasn't anything extra to add.
[13:55:10] <arrbee>	 #endmeeting
[13:55:16] <Cheol>	 safe trip and good bye, all.
[13:55:36] <arrbee>	 ouch... Nikerabbit you may have to end the meeting?
[13:55:49] <arrbee>	 TinoDidriksen: ohh thats odd. I had sent you the invite.
[13:55:54] <Nikerabbit>	 #endmeeting
[13:55:55] <wm-labs-meetbot`>	 Meeting ended Wed Jun 15 13:55:55 2016 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[13:55:55] <wm-labs-meetbot`>	 Minutes:        https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-13.00.html
[13:55:55] <wm-labs-meetbot`>	 Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-13.00.txt
[13:55:55] <wm-labs-meetbot`>	 Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-13.00.wiki
[13:55:56] <wm-labs-meetbot`>	 Log:            https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-13.00.log.html
[13:56:15] <arrbee>	 Thanks Nikerabbit
[13:56:16] <TinoDidriksen>	 I know, but I have so many Google accounts. I should probably have used another one.
[13:56:28] <arrbee>	 TinoDidriksen: heh. Okay, next time maybe. :)
[13:56:38] <arrbee>	 Thanks for watching
[14:28:46] <TinoDidriksen>	 Hm, no, invite was sent to the ID I was logged in as, but I couldn't find an option other than watch. Oh well.
[20:59:24] <robla>	 I'm chairing the upcoming RFC meeting but don't have a literal chair secured yet  :-)
[20:59:38] <robla>	 the meeting: https://phabricator.wikimedia.org/E213
[21:00:42] <DanielK_WMDE__>	 robla: if you find a literal sofa, will you be sofaing the meeting?
[21:00:49] <DanielK_WMDE__>	 hey yurik
[21:01:05] <yurik>	 hi DanielK_WMDE__
[21:01:39] <SMalyshev>	 hey all
[21:01:41] <cscott>	 if he doesn't find a chair he's going to stand us up
[21:01:58] <robla>	 alright, going to start it...
[21:02:07] <robla>	 #startmeeting T120452 Technical aspects of Data namespace blob storage
[21:02:08] <wm-labs-meetbot`>	 Meeting started Wed Jun 15 21:02:08 2016 UTC and is due to finish in 60 minutes.  The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:02:08] <wm-labs-meetbot`>	 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:02:08] <wm-labs-meetbot`>	 The meeting name has been set to 't120452_technical_aspects_of_data_namespace_blob_storage'
[21:02:08] <stashbot>	 T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
[21:02:08] <yurik>	 hi everyone, thx for making it to the discussion that shall change the face of the earth... again
[21:02:35] <robla>	 #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:02:44] <cscott>	 let's change all the things!
[21:02:58] <yurik>	 cscott, oh no, not again!
[21:04:12] <SMalyshev>	 question: so this thing is using content handler from JsonConfig?
[21:04:38] <robla>	 so, Yuri, things ArchCom just discussed in our last meeting (E212) is "is this a question for ArchCom?" and "should this be a formal RFC?"
[21:04:55] <yurik>	 SMalyshev, correct
[21:05:34] <DanielK_WMDE>	 I propose that we don't discuss "on which domain shall it live" today. that's a community/product question which should be discussed elsewhere, i think
[21:05:53] <yurik>	 robla, there are clearly two topics, just as brion said.  The social aspect - should this be on commons - should probably be discussed with the community, and so far the majority of commons seems to be in favour. What I do want this meeting to address is the technical aspects
[21:06:01] <DanielK_WMDE>	 that is to say, that's the part of this that i think is not a question for archcom
[21:06:21] <yurik>	 ok, seems like we are in an agreement on that one
[21:06:21] <SMalyshev>	 yurik: ok. It may be worth it to make it more search-friendly, but I guess that is generic for all JsonConfig things then
[21:06:49] <yurik>	 SMalyshev, sure, search friendliness is definitly on the todo list
[21:06:51] <DanielK_WMDE>	 SMalyshev: how well do you think will the structured search interface you are working on work with tabluar data?
[21:07:20] <DanielK_WMDE>	 would it be possible to search for values in spoecific columns, for example? if the content handler does it right, i mean
[21:07:23] <SMalyshev>	 DanielK_WMDE: it might, but I wouldn't go that far for starters. I'd just start with something like being able to search the description of the dataset
[21:07:28] <cscott>	 (i personally like the idea of making a namespace available from every wiki, then letting social pressures move stuff around as necessary.  eg if i want to index the winners of the US "Dancing with the Stars" maybe that live on enwiki in Data:DancingWithTheStars.json, since that's US-specific and there's a different "dancing with the stars" in basically every country.)
[21:07:36] <SMalyshev>	 DanielK_WMDE: search inside the data is much bigger fish to fry
[21:08:12] <cscott>	 hm.  i agree that search can be done later.
[21:08:20] <DanielK_WMDE>	 it actually shouldn't be much harder than making the description searchable. but yea, let's start6 small
[21:08:21] <SMalyshev>	 yurik: my next question: is it attached to specific namespace or can be on any namespace?
[21:08:27] <cscott>	 like search of wiki articles is essentially separate from storage/viewing those articles
[21:08:30] <brion>	 of course we'll have articles on several different countries' shows in each language...
[21:08:49] <SMalyshev>	 cscott: not anymore :)
[21:09:06] <cscott>	 brion: right, at that point it starts to make sense to move it to commons.  but the community can decide this themselves organically.
[21:09:12] <SMalyshev>	 cscott: at least not completely. I'm working on a patch that makes ContentHandler handle both
[21:09:13] <brion>	 hmmm, interesting
[21:09:34] <cscott>	 SMalyshev: "not anymore" re search?  not sure what you were responding to.
[21:09:36] <DanielK_WMDE>	 SMalyshev: i suggest to rely on the content model, not the namespace, for any special processing. currently, content models are mostly bound to namespaces, but that can change
[21:09:38] <brion>	 cscott: so put the system on *every* wiki that doesn't opt-out, plus a common backing, like File: uploads?
[21:09:50] <SMalyshev>	 cscott: yes, re search. But let's not get too offtopic :)
[21:09:58] <cscott>	 brion: that would be my suggestion.  it also ensures we're running the same software everywhere.
[21:10:12] <cscott>	 i have a proposal for a new template system that could really use data blob storage.  for example.
[21:10:13] <SMalyshev>	 DanielK_WMDE: yes, I agree. But how you specify that you want to create page in certain content model?
[21:10:19] <brion>	 *nod* sensible :D
[21:10:26] <yurik>	 cscott, i am not too happy about allowing it everywhere.  We already got burnt by it waaay too many times - i would much rather keep it in one place, allow multiple wikis to reuse it, and name it accordingly - "dancing with the stars US 2015"
[21:10:28] <cscott>	 https://phabricator.wikimedia.org/T114454
[21:10:50] <DanielK_WMDE>	 SMalyshev: currently, via the namespace or a title suffix (like .css or .js)
[21:10:56] <cscott>	 yurik: sure, default should be commons, but that should be social pressure not technical restriction.
[21:11:02] <SMalyshev>	 DanielK_WMDE: ok, got it
[21:11:11] <SMalyshev>	 sounds reasonable
[21:11:17] <DanielK_WMDE>	 SMalyshev: but that'S just at the time of creation. after that, we should just look at the model associated with the page
[21:11:26] <robla>	 #chair DanielK_WMDE brion robla TimStarling Krinkle
[21:11:27] <wm-labs-meetbot`>	 Current chairs: DanielK_WMDE Krinkle TimStarling brion robla
[21:11:31] <DanielK_WMDE>	 it's actually stored separately for every revision
[21:11:41] <yurik>	 cscott, i would wait until community explicitly demands that feature. It is much easier to enable than to disable
[21:11:55] <yurik>	 we can always enable it later if needed, but disabling is next to impossible
[21:12:05] <yurik>	 so for the step 1 i would really like just 1 wiki
[21:12:09] <DanielK_WMDE>	 yurik: T120452 says "CSV, TSV, JSON, XML". But we are down to just a specific JSON format now, right?
[21:12:10] <stashbot>	 T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
[21:12:11] <yurik>	 and wait for the feedback
[21:12:18] <SMalyshev>	 only thing is that now we are using model that's called JsonConfig for some things that are definitely not config... but I guess it's too late to rename...
[21:12:30] <yurik>	 DanielK_WMDE, my first goal is to get two types of data out :  tabular and maps
[21:12:40] <cscott>	 yurik: will https://phabricator.wikimedia.org/T91162 let me refer to the data store on commons as if it were local?  (like instantcommons does)?
[21:12:50] <yurik>	 maps would allow geojson storage (so that all maps can overlay with extra stuff), and tabular - .... is tabular :)
[21:13:00] <DanielK_WMDE>	 yurik: maps as in geo-shapes?
[21:13:05] <yurik>	 DanielK_WMDE, yes
[21:13:06] <DanielK_WMDE>	 or full maps?
[21:13:08] * robla will brb
[21:13:08] <cscott>	 yurik: one issue might be (for example) country-specific data, which then gets hung up on different countries/wiki's ideas of which are valid countries
[21:13:09] <yurik>	 and pushpins
[21:13:12] <cscott>	 ie, is "taiwan" a country?
[21:13:37] <cscott>	 yurik: allowing zhwiki to override the table from commons with a zhwiki-specific table is a valuable way to defuse that situation
[21:13:50] <cscott>	 and i think wikidata already has something like this, where certain facts are only true for certain wikis?
[21:13:59] <DanielK_WMDE>	 cscott: they can just choose to use a different table. much simpler solution.
[21:14:12] <cscott>	 DanielK_WMDE: how does wikidata handle this?
[21:14:15] <yurik>	 cscott, no, not via shadow.  The current implementation will only target Lua and Graph users at first, which means Lua will simply say   mw.data.get('Page.tab') - and use that data
[21:14:23] <SMalyshev>	 yurik: I wonder if we need a higher level API to operate with such data. I.e. if I want to store a tabular data set, I don't want to need to know specific JSON scheme (which could also change)...
[21:14:23] <DanielK_WMDE>	 but i thin the "where does it live" question is out of scope here.
[21:14:28] <yurik>	 we can add shadow later if requested
[21:15:10] <SMalyshev>	 yurik: and by operate I mean not just read (Lua probably covers that) but also write
[21:15:15] <TimStarling>	 presumably Lua will see the decoded object, not the JSON-encoded format?
[21:15:22] <cscott>	 i'm fine with deploying first without instantcommons/shadow and on a single wiki, but i'd like to state for the record that, if this functionality turns out to be useful, we'll eventually need that functionality.  we should ensure that we're not *foreclosing* that possibility, even if we're not initially enabling it.
[21:15:29] <DanielK_WMDE>	 cscott: in theory, wikis can pick specific statements over the default for a specific property, or they can filter by well-known authorities being cited as sources. i don't think this is actually being done, but the data model is specifically designed to allow this
[21:15:29] <yurik>	 cscott, the idea here is to provide the most basic usage that will cover 80%. If we get strong desire for 1) multi-wiki storage, or 2) multi-wiki overrides, we can always add that
[21:16:02] <yurik>	 TimStarling, correct
[21:16:17] <yurik>	 TimStarling, more specifically, Lua will get the json as table
[21:16:39] <DanielK_WMDE>	 yurik: i see some overlap between the use cases of wikidata queries and tabular data. it would be nice if the formats and interfaces would be very similar, of not identical.
[21:16:39] <yurik>	 so that it has access to all the meta fields
[21:16:46] <yurik>	 BUT, we can provide additional helpers to resolve the multi-lingual resolution
[21:16:46] <cscott>	 again, not many people have grokked T114454 yet, but the basic idea there is to separate code, content, and presentation, so every template potentially will have a "data" component, along with the "code" and the "presentation" component.
[21:16:46] <stashbot>	 T114454: [RFC] Visual Templates: Authoring templates with Visual Editor - https://phabricator.wikimedia.org/T114454
[21:17:02] <yurik>	 DanielK_WMDE, agree - we have discussed it briefly with Lydia_WMDE
[21:17:41] <cscott>	 assuming some basic separation of that happens in the future, we'll want the data namespace to be roughly on par with the template namespace.  ie, shadowed from a default on commons, override-able from specific wikis.
[21:17:56] <YairRand_>	 maybe have data blobs be a wikidata datatype, like commons file
[21:18:05] <DanielK_WMDE>	 yurik: to be compatible with wikidata, the representation of data values would have to become more complex. but i'm not sure whether we should require that, or offer it as an optional ferature.
[21:18:12] <DanielK_WMDE>	 yurik: btw, multilingual is pretty complex when it gets to the nitty gritty. do you think it's really needed from the start?
[21:18:13] <yurik>	 DanielK_WMDE, lets sync up afterwards to see if we can match wikidata api with this, or if they should go different routes
[21:18:46] <brion>	 could also do things like instead of multilingual text, refer to a wikidata item and then look up its data by name .... though that may have perf issues with large batches
[21:18:47] <brion>	 :D
[21:18:49] <DanielK_WMDE>	 yurik: not tonight, i'm going to bed after this :) will you be at wikimania?
[21:19:06] <yurik>	 DanielK_WMDE, absolutelly - i really think simple "multi-lingual" feature that will allow a fallback is something we need from the start
[21:19:07] <TimStarling>	 ideally Lua would see a read-only wrapper object like what is returned from mw.loadData()
[21:19:12] <DanielK_WMDE>	 brion: yes, wikidata Q-id is a very useful datatype to have
[21:19:16] <yurik>	 DanielK_WMDE, sadly, no Wikimania for me - no budget :(
[21:19:28] <TimStarling>	 that way you don't have to clone it for each #invoke instance
[21:19:39] <DanielK_WMDE>	 yurik: sad. let's find another time and place then.
[21:20:01] * aude wavez
[21:20:10] <DanielK_WMDE>	 yurik: it would be nice to re-use the language-fallback stuff we have in wikidata. we should factor it out into a library, i guess.
[21:20:15] <yurik>	 DanielK_WMDE, indeed it is :(   Yes, lets. I will schedule a hangout with you. Anyone else - pls poke me if you want to parttake-
[21:20:19] <DanielK_WMDE>	 hey aude
[21:20:33] <yurik>	 DanielK_WMDE, sure, but i already have something like that for the zero banners that i'm reusing
[21:20:35] <yurik>	 but sure
[21:20:58] * robla returns to the meeting he's allegedly chairing :-)
[21:21:11] <SMalyshev>	 yurik: so what do you think about having higher-level API to manipulate specifically the tabular data?
[21:21:17] * yurik thinks chairing !== participating ;)
[21:21:40] <yurik>	 SMalyshev, "manipulate" is out of the scope i think at this point.  I'm all for it though :)
[21:21:54] <DanielK_WMDE>	 yurik: do you want to start out with your own (simpler) data types for now (and spend tiem to specify them properly)?  or do you want to go with the representation that is used by the wikidata? that's already in reusable libraries
[21:21:55] <SMalyshev>	 yurik: thinking forward :)
[21:21:57] <yurik>	 especially because i can totally see some pages being custom-defined to store data in the backing SQL
[21:22:03] <yurik>	 SMalyshev, ^
[21:22:38] <yurik>	 DanielK_WMDE, i would like to match datatypes in wikidata as much as possible, but probably only provide a subset of them from the start
[21:23:04] <SMalyshev>	 yurik: well, there's two venues here: a) run external query, store results on wiki (I don't want to know too much details about how wiki stores it)
[21:23:13] <DanielK_WMDE>	 yurik: yea, an "table aware" storage backend is an interesting idea, and fits in with the blob-store refactoring i'm thinking about. but it's for later.
[21:23:14] <yurik>	 if that means re-implementing some of it first - lets, because otherwise we might spend years making it perfect only to realize that community needs are totally orthogonal to wikidata usage
[21:23:25] <SMalyshev>	 yurik: and b) represent internal query as data set on wiki (e.g. WDQS query)
[21:23:47] <yurik>	 SMalyshev, yep, that's what DanielK_WMDE is talking about i think. But again, lets not discuss it now :)
[21:23:49] <SMalyshev>	 all that needs clean API so that clients don't know too much
[21:23:57] <yurik>	 otherwise we might be redesigning SQL engine next ;)
[21:24:16] <SMalyshev>	 that's why I mention it - if we make it too specific now, it'd be hard to change it ater
[21:24:18] <SMalyshev>	 *later
[21:24:47] <yurik>	 SMalyshev, that i agree. But remember, the use case here is for Lua to GET EVERYTHING, and deal with it.  If we say we want SQL-like  GET EVERYTHING THAT MATCHES THE WHERE CLAUSE, we might get into all sorts of weird issues
[21:25:13] <yurik>	 especially because we might go the route that is not needed (yet or ever)
[21:25:26] <brion>	 huge-data-set needs are quite different yes
[21:25:27] <DanielK_WMDE>	 yurik: ok then. have you looked at how wikidata represents data values? e.g. look at https://www.wikidata.org/wiki/Special:EntityData/Q42.json
[21:25:38] <SMalyshev>	 yurik: Lua is a good enough API if we don't get too specific about the structure
[21:26:08] <yurik>	 exactly brion - that's what we actually discussed in the task earlier - dealing with large datasets is a very different beast, with a different reqs
[21:26:10] <aude>	 DanielK_WMDE: i'm not sur eabout duplicating all the json for each value
[21:26:12] <DanielK_WMDE>	 yurik: e.g. we have something like {"snaktype":"value","property":"P577","datavalue":{"value":{"time":"+2002-01-01T00:00:00Z","timezone":0,"before":0,"after":0,"precision":9,"calendarmodel":"http://www.wikidata.org/entity/Q1985727"},"type":"time"},"datatype":"time"}
[21:26:13] <stashbot>	 P577 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P577
[21:26:24] <aude>	 if all the values have the same calendar, e.g.
[21:26:30] <DanielK_WMDE>	 yurik: the "value" thing is what i think should be in your table fields.
[21:26:54] <aude>	 then have calendar, before /after + array of timestamps?
[21:27:15] <DanielK_WMDE>	 yurik: the "time" data type is akind of a nice nasty example. it'S json representation really isn't too great, and i'd love to change it... we'll probably have to use a new type id for the new version, not sure yet
[21:27:24] <aude>	 maybe precision might vary though
[21:27:37] <yurik>	 DanielK_WMDE, i would prefer to go with the tabular data as defined by the industry for tabular data (see the bug), but for specific datatypes like time - sure
[21:28:01] <DanielK_WMDE>	 aude: yea, i at least wouldn't dublicate the time. we could have a "defaults" row, that gets merged into every value.
[21:28:11] <aude>	 the json is very verbose
[21:28:12] <DanielK_WMDE>	 a bit hacky, but woudl work...
[21:28:21] <aude>	 just saying...
[21:28:23] <yurik>	 DanielK_WMDE, btw, time is not part of this proposal just yet :)
[21:28:32] <yurik>	 too complex to have it in ver1
[21:28:42] * aude agrees with yurik 
[21:28:49] <aude>	 start simple
[21:29:06] <yurik>	 it can be easily added later - simply add a new type, and make the value object mean as DanielK_WMDE described above
[21:29:21] <DanielK_WMDE>	 yurik: yea, sure. for string-based types, sime literals work fine. for numbers, too. once we get into measured quantities, things get more complex
[21:29:38] <yurik>	 DanielK_WMDE, default row is fairly complex - should it be the "null" that gets used as the default?
[21:29:41] <aude>	 counts could be ok
[21:29:45] <SMalyshev>	 oh let's not get into units :)
[21:30:07] <yurik>	 agree, lets get back to overall strategy :)
[21:30:13] <DanielK_WMDE>	 yurik: we should make sure to avoid naming conflicts. if you define a type name and use it with a different format than wikibase does, that will become annoying
[21:30:21] <robla>	 #info much of the first half of the discussion was about defining datatypes
[21:30:24] <yurik>	 DanielK_WMDE, agree
[21:31:13] <DanielK_WMDE>	 yurik: the idea with the defaults row wats that you can e.g. say that all dates in a column use the same caledar, or all coordinates refer to earth, without havign to repeat that info for every field. but that's an optimization that can be added later.
[21:31:49] <yurik>	 ver1 datatypes:  strting, numbers, multilingual strings.  I don't even know if i want to allow bools for now.
[21:32:11] <robla>	 #info DanielK_WMDE and yurik agree to try to avoid naming conflicts (e.g. with wikibase types)
[21:32:13] <aude>	 multilingual gets somewhat complex
[21:32:22] <yurik>	 these three types should cover almost 90% of the usecases from the start - simply because it will be Lua doing the processing and presenting of the dat
[21:32:24] <yurik>	 data
[21:32:25] <SMalyshev>	 yurik: bools are just numbers 1/0. Or strings yes/no :)
[21:32:33] <aude>	 and numbers (units? no units / counts?)
[21:32:34] <DanielK_WMDE>	 yurik: i think the way you represent multilingual is different from what the DataValues lib does. but wikidata doesn't use multilingual yet, so it can be changed
[21:32:44] <yurik>	 aude, simple JSON numbers
[21:32:55] <yurik>	 which means if you want units, you add a string column
[21:32:56] <aude>	 yurik: like counts
[21:33:00] <SMalyshev>	 aude: I don't think we need units and associated headache. We havent' properly figured them on wikidata even
[21:33:14] <aude>	 SMalyshev: that's why i am asking :)
[21:33:16] <DanielK_WMDE>	 robla: i think data types are a crucial issue. but i agree that we should leave some room for other topics ;)
[21:33:41] <yurik>	 DanielK_WMDE, lets make this part of our wikidata-jsonconfig sync up meeting
[21:34:06] <yurik>	 are there any other issues that people are concerned about?
[21:34:22] <aude>	 yurik: btw, maybe you can visit berlin before SOTM in belgium?
[21:34:27] * DanielK_WMDE thinks that values from a query api will actually be full "snaks"...
[21:34:30] <SMalyshev>	 yurik: are there any limits on how big it can get?
[21:34:36] <aude>	 and we can talk more of the details
[21:34:43] <yurik>	 SMalyshev, 2mb - same as a wiki page
[21:34:50] <DanielK_WMDE>	 oh, a visit sounds nice!
[21:34:53] <SMalyshev>	 ok
[21:34:55] <yurik>	 because it uses storage engine
[21:35:36] <DanielK_WMDE>	 do you think we will want to expand to very large data sets later?`
[21:35:37] <robla>	 yurik: can/should you formalize T120452 as an ArchCom-RFC?
[21:35:37] <stashbot>	 T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
[21:35:53] <yurik>	 DanielK_WMDE, aude, do we really want to wait until september to deploy this?  JsonConfig has been in production for the past 2 years, for all wikis (as part of the zero system)
[21:35:55] <robla>	 (at least the technical side)?
[21:36:28] <yurik>	 DanielK_WMDE, i don't want to tackle large datasets until after this thing has had some usage, e.g. half a year
[21:36:30] <robla>	 (perhaps T134426 is the right one to focus on)
[21:36:30] <stashbot>	 T134426: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426
[21:36:51] <DanielK_WMDE>	 yurik: that's a pretty brisk pace ;)
[21:37:04] <yurik>	 DanielK_WMDE, agree, i will wait a year until large datasets :D
[21:37:05] <DanielK_WMDE>	 if you really want to support data sets by then, you better start thinking abotu that early
[21:37:32] <yurik>	 but yes, it should be in the back of our minds, but shouldn't be fully speced until later
[21:38:02] <yurik>	 robla, i think there is another task that formalizes how the system works
[21:38:05] * yurik looks
[21:39:13] * robla waits patiently
[21:40:23] <DanielK_WMDE>	 how about directly transclusing a table into a wiki page? how would that work? do we newed that? or do we rely on lua for that?
[21:40:29] <yurik>	 robla, i think its in https://www.mediawiki.org/wiki/Extension:JsonConfig/Tabular
[21:40:55] <yurik>	 DanielK_WMDE, even though i do have it implemented (as a template expansion), i don't think its a usecase
[21:41:02] <yurik>	 simply because there is really no big reason for it
[21:41:18] * robla notes that the Extension:JsonConfig points to T120452
[21:41:18] <yurik>	 it is always very usage dependent - e.g. show a list generated from a table
[21:41:36] <DanielK_WMDE>	 yea, you'd always want som custom stuff anyway
[21:41:39] <robla>	 yurik: is T120452 the right Phab task?
[21:41:40] <stashbot>	 T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
[21:42:35] <DanielK_WMDE>	 can we remove the formats from the title? i think they are misleading now.
[21:43:09] <yurik>	 DanielK_WMDE, agree, but please keep in mind that as part of this discussion i would really like geojson (maps overlays) to be agreed on as well
[21:43:20] <yurik>	 actually geojson is much simpler than tabular
[21:43:33] <yurik>	 it is a well established format, and we are already heavily using it in maps
[21:43:36] <robla>	 yurik: what Phab task do you want to declare as an ArchCom-RFC?
[21:43:55] <yurik>	 robla, that one is fine i think - we some refining to the title and description
[21:43:55] <cscott>	 I wonder if extensions are really the right way to select data type?
[21:44:06] <yurik>	 cscott, what do you mean?
[21:44:22] <cscott>	 since there is some discussion of types, for instance, it might be that we start with a very simple "json" but later have a more typeful "json" with the date-type figured out, etc.
[21:44:31] <cscott>	 .json is going to get overloaded quickly
[21:44:40] <cscott>	 mime types would be much nicer
[21:44:47] <cscott>	 but then that begs the question of where they get stored
[21:44:51] <yurik>	 cscott, i actually don't want .json - it will be heavily misused from the begining, no?
[21:44:58] <cscott>	 still, storing the data type separately from the data/article name is not a bad thing.
[21:44:59] <yurik>	 and we won't be able to do some proper editing
[21:45:15] <DanielK_WMDE>	 cscott: internally, it will be represented as a content model id. the extension is one way to indicate that.
[21:45:17] <yurik>	 if we from the begining define a rigid structure, we can add useful tools
[21:45:33] <yurik>	 so for tabular, VE can have a nice editor of values (like a spreadsheet)
[21:45:39] <SMalyshev>	 .json is way too generic... I think it'd be better if tables and geojson had their own spaces
[21:45:40] <yurik>	 actually it won't even be a VE on commons
[21:45:40] <cscott>	 DanielK_WMDE: will we be able to eventually just associate a mime type with the content?
[21:45:42] <DanielK_WMDE>	 hm... will geo-shapes and tables live in the same namespace? with different suffixes/extensions?
[21:45:53] <robla>	 yurik: If that becomes an ArchCom-RFC, then you won't be the assignee, and Danny Horn will be the author.  Is that the desired outcome?
[21:45:56] <yurik>	 DanielK_WMDE, yes
[21:46:05] <cscott>	 i don't mind Data: as the namespace.  i'd rather have that than GeoJson: Tables: etc etc
[21:46:14] <yurik>	 robla, i hear you, ok,  i will create a new task
[21:46:37] <yurik>	 DanielK_WMDE, example:    Data:Don Qixote Trip in Spain.geojson
[21:46:37] <DanielK_WMDE>	 cscott: a content model id, not a mime type. the mime type specifies a serialization format, like json or xml. that's also stored, but kind of redundant. the important info is what model/vocabulary/scheme the data is using.
[21:46:45] <cscott>	 fwiw Scribunto/JS has this same issue -- there's no way to specify which *language* the module is in, in the Module: namespace.
[21:46:49] <MaxSem>	 I agree with cscott
[21:46:51] <DanielK_WMDE>	 cscott: we already do that. that's how contenthandler works.
[21:46:57] <brion>	 my one concern about separating type is that if a table changes type, will that break usage? :)
[21:47:11] <yurik>	 that's why we from the begining define extensions
[21:47:13] <robla>	 #info yurik agrees to create a Phab task for use as an ArchCom-RFC
[21:47:19] <brion>	 (eg if you change an image from .png to .svg you can still use it the same from wiki side, but for tables it may matter more)
[21:47:22] <cscott>	 brion: possibly, but that's no different from a rename breaking usage, or any other edit breaking usage.
[21:47:28] <yurik>	 JsonConfig will be set up to only allow pages that match REGEX
[21:47:32] <brion>	 *nod*
[21:47:36] <DanielK_WMDE>	 cscott: you could indeed use a file extension to indicate whether a modules is JS or Lua. Just add .js or .lua
[21:47:40] <brion>	 and we really should rename JsonConfig ;)
[21:47:43] <yurik>	 so it will be  Data:.*\.tab
[21:47:54] * cscott is not a fan of file name extensions
[21:47:55] <yurik>	 no other pages will be creatable in the data namespace
[21:48:00] <cscott>	 not i18n friendly
[21:48:04] <DanielK_WMDE>	 cscott: internally, that would just define the content model to use when creating the page
[21:48:06] <cscott>	 not human friendly, really
[21:48:18] <yurik>	 cscott, the only other option is to have multiple namespaces - and the community (and i personally) really hate that
[21:48:35] <brion>	 well, the other option is to have some sort of content model selection in the creation process
[21:48:40] <DanielK_WMDE>	 i kind of like to have that info in the title, cscott
[21:48:41] <brion>	 which implies UI etc
[21:48:44] <cscott>	 no, i'm just saying that the content model should be defined separately (as DanielK_WMDE indicates is already the case under the covers) and not rely on filename extensions
[21:48:45] <yurik>	 brion, sure, that can also work
[21:49:15] <cscott>	 DanielK_WMDE: but the info in the title doesn't mean anything unless you speak english -- or "hacker english" at least
[21:49:28] <cscott>	 and "geojson" doesn't really mean anything to even english speakers
[21:49:41] <DanielK_WMDE>	 cscott: i like to do both. we *can* handle different models without any indicator in the title, but it's *nice* to have that indicator there. we already do this for .css and .js in the MediaWiki and User namespaces
[21:49:56] <yurik>	 cscott, brion, we could create an elaborate system for model selection - is that an absolute blocker/requirement?   I really feel that since data will be very technically oriented, people will actually find it better usable
[21:50:01] <cscott>	 DanielK_WMDE: that will probably have to be good enough for now.
[21:50:02] <yurik>	 just like we have File:Blah.json
[21:50:16] <yurik>	 exactly
[21:50:21] <yurik>	 i really like that indicator
[21:50:44] <TimStarling>	 "geojson" hopefully means something to the people who are writing lua modules
[21:50:45] <cscott>	 yurik: i'd just like it clear during the document/evangelization process that filename extensions may be a convenient *shortcut* for specifying the data type, they are only a stopgap and not strictly speaking required.  (especially if your native language is not english)
[21:50:47] <brion>	 i'm happy enough with extensions given the existing ecosystem
[21:51:09] <cscott>	 hopefully we'll eventually have more robust article metadata editors, so you can just directly edit the content model
[21:51:12] <yurik>	 remember that we are targeting a very tech savvy community with this until a nice editor system is in place. And when it is, I wouldn't mind a VE to edit the data remotely, without even switching to commons (like we do in Wikidata)
[21:51:28] <brion>	 mmmm, spreadsheet editor
[21:51:30] <cscott>	 brion: and i'm lobbying against them based on where i'd like to see the ecosystem eventually go. ;)
[21:51:31] <robla>	 cscott: file extensions and file types are tied up with one another, despite years of standards bodies trying to make that not be true
[21:51:33] <yurik>	 brion, exactly
[21:51:37] <DanielK_WMDE>	 i notice we are getting close to the end of the meeting.
[21:51:46] <DanielK_WMDE>	 are there any thoughts or comemnts about geojson?
[21:51:53] <yurik>	 brion, T134618
[21:51:53] <stashbot>	 T134618: Implement spreadsheet-like cell editing for tabular data - https://phabricator.wikimedia.org/T134618
[21:51:54] <DanielK_WMDE>	 yurik: how do youo render geo shapes?
[21:51:56] <cscott>	 "tech savvy community" == we systematically exclude potential community members who are not tech savvy
[21:51:59] <cscott>	 that's what i hear, at least
[21:52:04] <brion>	 and if you really want to have fun with file extensions <-> type, try dealing with video containers vs codecs! </runs away>
[21:52:19] <yurik>	 DanielK_WMDE, easy - you just put that geojson inside the <mapframe>...</> wikitext element :)
[21:52:34] <robla>	 brion: amen
[21:53:06] <brion>	 cscott: a legit concern, yes
[21:53:15] <DanielK_WMDE>	 yurik: so there is a hard dependency on the maps extension?
[21:53:16] <yurik>	 cscott, i am by no mean trying to exclude them, but rather understand the users.  Non-tech savvy community is the ones that will provide the most value (simply because there is probably a bigger nontechsavy community there), but we should make it nicer and easier for them.
[21:53:17] <brion>	 usability will become a bigger concern once there are tools built up on top of this system
[21:53:26] <yurik>	 DanielK_WMDE, when supporting geojson as storage - yes
[21:53:29] <cscott>	 so long as the file extensions aren't baked hard into the design, i'm happy.
[21:53:32] <brion>	 eg if you already have graphing/table-formatting templates+lua modules ready to use
[21:53:34] <DanielK_WMDE>	 yurik: what do you do if it's not there? just show json as text?
[21:53:36] <brion>	 and a good editor
[21:53:41] <DanielK_WMDE>	 would be ok-ish, i guess
[21:53:42] <cscott>	 just like i'm happy so long as we can *eventually* enable shadow namespaces or instantcommons on this
[21:53:44] <yurik>	 DanielK_WMDE, we could - as a backup
[21:54:03] <yurik>	 cscott, i am having very big doubts about shadow namespaces to be honest
[21:54:05] <DanielK_WMDE>	 cscott: +1
[21:54:06] <robla>	 I think the file extension issue needs to go to wikitech-l
[21:54:09] <yurik>	 but that's a separe discussion :)
[21:54:22] <cscott>	 yurik: instantcommons then.  or data: namespaces on every wiki.  what you will.
[21:54:55] <yurik>	 cscott, i'm not against it, just doubting the long term viability of it ;)
[21:55:10] <cscott>	 i have faith in kartik ;)
[21:55:11] <yurik>	 but again, we can totally support it if we decide that's the way forward
[21:55:18] <DanielK_WMDE>	 can we confirm that geojson is good to go? i have no objection, but i also know next to nothing about it
[21:55:32] <robla>	 so yurik, thanks for bringing this conversation up on wikitech-l generally.  I think there's a lot more to discuss here...and I'm not sure how to do it
[21:55:33] <DanielK_WMDE>	 is anyone around who aqctually knows something about geojson?
[21:55:36] <cscott>	 sure.  that's all i'm lobbying for at the moment.  leave space for the future, don't do anything that would make it impossible later.
[21:55:39] <yurik>	 DanielK_WMDE, https://www.mediawiki.org/wiki/Help:Extension:Kartographer
[21:55:40] <brion>	 so the alternative on the extension is probably "don't enforce an extension, have everything in the Data: namespace be this tabular format _for now_"
[21:55:42] <MaxSem>	 DanielK_WMDE, /me
[21:55:43] <yurik>	 it has a geojson sample
[21:55:43] <robla>	 DanielK_WMDE: I don't want to confirm anything in this meeting
[21:55:51] <brion>	 with an eventual UI/API extension for picking different content model
[21:56:14] <SMalyshev>	 DanielK_WMDE: well, I know a little about it... nothing that would prevent us from having it on wiki as format :)
[21:56:17] <DanielK_WMDE>	 robla: ok, check that there are no objections at this time ;)
[21:56:28] <yurik>	 brion, i'm not too happy about that - i would much rather say "for now, lets only allow pages in the Data: that match the extension"
[21:56:39] <yurik>	 this way we can put geojson there as well
[21:56:43] <yurik>	 and other formats
[21:56:48] <brion>	 is geojson ready to go?
[21:56:54] <yurik>	 brion, yep
[21:56:57] <brion>	 ah fun
[21:56:57] <yurik>	 it is much easier
[21:57:02] <DanielK_WMDE>	 robla: as in humming ;)
[21:57:06] * robla doesn't feel like he understands what's being proposed to have had a chance to object
[21:57:08] * aude and soem other people implemented geojson content handler in zurihc
[21:57:14] <aude>	 2 years ago?
[21:57:14] <yurik>	 geojson is very straight forward - we already have it as part of kartograhper ext
[21:57:27] <cscott>	 i'd say "the content model of the page is defined at page creation type by the extension.  but nothing after that point tries to parse the article title for an extension"
[21:57:37] <yurik>	 cscott, agree
[21:57:43] <aude>	 not sure exactly how it would work now, but think it's not too complex
[21:57:53] <DanielK_WMDE>	 cscott: yes, absolutely.
[21:57:58] <brion>	 cscott: that seems sensible yeah
[21:58:04] <cscott>	 it also potentially means you could work around the need for an extension by sneaky renames. ;)
[21:58:10] <brion>	 and allows for the future to drop the extension at creation time
[21:58:12] <yurik>	 cscott, the only limitation - jsonconfig will not allow renaming if the target page name does not match the original regex
[21:58:13] <DanielK_WMDE>	 that's how it works for .js and friends
[21:58:25] <cscott>	 in lieu of having a proper direct edit mechanism for the content model
[21:58:31] <yurik>	 yep
[21:58:55] <cscott>	 yurik: yeah, i'm okay with the rename limitation for now.  i just don't want to code to have regexp matches against the page title scattered everywhere.
[21:58:59] <robla>	 we're running out of time.  very good discussion; I think I know how to pull open questions out, but I'm not volunteering to do it.
[21:59:13] <aude>	 not sure we need hard dependency on kartographer
[21:59:15] <yurik>	 cscott, oh, thats not there .  The content id is stored with the page
[21:59:18] <brion>	 for the .js/.css subpages we also just have predictable naming which is what the exts are for, something not relevant for the primary item
[21:59:20] <DanielK_WMDE>	 yurik: people got upset when they couldn't rename a misnamed foo.jd to foo.js, because the content model mismatched ;) so now they can re-decalre the content. a bit scary...
[21:59:22] <yurik>	 aude, its a soft dep
[21:59:36] <aude>	 yurik: or some generic stuff could be seaprated and used by both things
[21:59:38] <yurik>	 just like many extensions depend on syntax highlighter
[21:59:38] <robla>	 yurik: where should people who are interested continue this discussion?
[22:00:47] <yurik>	 robla, i guess I should create a new task "deploy" ?
[22:00:52] * robla plans to type "#endmeeting" by 22:05 UTC
[22:00:53] <yurik>	 as we discussed earlier ?
[22:01:05] <cscott>	 (brion: .js/.css subpages are a bit weird since browsers and webservers still do content-type sniffing based on url extension and other factors; that shouldn't be relevant to the data namespace which is for internal mediawiki use, not for directly serving to web browsers)
[22:01:11] <robla>	 yurik: could you file a quick placeholder task?
[22:01:19] <yurik>	 sec
[22:01:32] * robla wishs Phab allowed reassigning the submitter
[22:01:50] <brion>	 cscott: when we serve them as JS/CSS content it's through RL's load.php; their URLs don't end in .js or .css at all :)
[22:02:52] <yurik>	 https://phabricator.wikimedia.org/T137929
[22:03:00] <robla>	 thanks!
[22:03:22] <robla>	 #info conversation will continue at https://phabricator.wikimedia.org/T137929
[22:03:24] <yurik>	 https://phabricator.wikimedia.org/T137930
[22:03:31] <yurik>	 robla, ^ geojson
[22:03:37] <yurik>	 should there be one common one?
[22:03:51] <yurik>	 that discusses the underlying tech? like extensions, etc
[22:03:56] <robla>	 #link https://phabricator.wikimedia.org/T137930 geojson
[22:04:08] <aude>	 thanks yurik :)
[22:04:21] <aude>	 suppose maybe we can also talk at SOTM US :)
[22:04:36] <yurik>	 ok, if needed, i will create another task later
[22:04:41] <robla>	 let's treat T137929 as the parent task
[22:04:41] <stashbot>	 T137929: Enable shared tabular data storage on a shared wiki - https://phabricator.wikimedia.org/T137929
[22:04:54] <robla>	 ok....let's end the meeting
[22:04:57] <yurik>	 :)
[22:05:04] <robla>	 thanks all!
[22:05:08] <yurik>	 thanks robla!
[22:05:08] <robla>	 #endmeeting
[22:05:09] <wm-labs-meetbot`>	 Meeting ended Wed Jun 15 22:05:08 2016 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:05:09] <wm-labs-meetbot`>	 Minutes:        https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-21.02.html
[22:05:09] <wm-labs-meetbot`>	 Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-21.02.txt
[22:05:09] <wm-labs-meetbot`>	 Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-21.02.wiki
[22:05:10] <wm-labs-meetbot`>	 Log:            https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-15-21.02.log.html
[22:05:11] <yurik>	 and thanks everyone!
[22:05:29] * yurik should probably re-read most of what has been said
[22:05:38] <yurik>	 DanielK_WMDE, tomorrow?
[22:05:58] <yurik>	 if anyone feels they want to parttake, ping me, will add them to the hangout
[22:07:16] <yurik>	 wow, the moment meeting ended, the channel just went dead
[22:11:17] <DanielK_WMDE>	 yurik: i'm a bit swamped right now. wikimanian would have been a good opportunity. if i look at my calendar, the second week of july looks good :-o
[22:11:41] <robla>	 yurik: these tend to be very timeboxed conversations
[22:11:45] <DanielK_WMDE>	 you can ping me tzomorrow, but i can't promise that i'll have time
[22:11:46] <yurik>	 DanielK_WMDE, second week of july is my wedding... no go :)
[22:11:59] <DanielK_WMDE>	 haaa! congratulations! awesome!
[22:12:06] <yurik>	 thx ):
[22:12:08] <yurik>	 :)
[22:12:14] <yurik>	 that was a Freudian typo
[22:12:39] * robla also congratulates yurik and ignores the Freudian typo
[22:12:40] <DanielK_WMDE>	 hehehe :D
[22:12:47] <yurik>	 :D
[22:12:49] <yurik>	 thx
[22:13:01] <yurik>	 i will try to sort through all the needed tasks
[22:13:45] <yurik>	 it should be main, tabular, geojson, plus enablement task that depends on each discussion
[22:13:56] <yurik>	 bleh, don't want too many tasks, but will see
[22:17:32] <SMalyshev>	 many small tasks is OK imho, if they all are well-defined unit of work
[22:17:55] <yurik>	 cross-dependencies and cross-linking is sometimes challenging
[22:37:04] <Platonides>	 congrats yurik
[22:59:58] <yurik>	 thx Platonides