[00:06:59] Ironholds: edits per country split by mobile / desktop running for all wikipedias for all of july!
[02:06:35] lzia, stop working!
[02:06:52] Ironholds, not working. not sure why this logged me in
[02:06:56] I'll be out of here in 5 min
[02:06:58] :D
[02:07:01] and you, too
[02:07:23] I am doing personal code ;p
[02:07:28] rewriting my image composition library to avoid OOP
[03:13:34] where can I get page view data per article?
[03:15:00] OrenBochman, the webstatscollector dumps are your best bet, I think
[03:15:11] see the links at the bottom of http://stats.grok.se/
[03:36:34] thanks
[03:38:03] no APIs to get these, I guess
[03:59:36] OrenBochman1, not at the moment! But there will be.
[03:59:48] As soon as we have the new PV definitions implemented :)
[06:04:44] evening Pine :)
[06:10:18] Hi Ironholds
[06:10:43] Oh and there's ragesoss, working 24/7
[06:11:00] as opposed to me? ;p
[06:11:05] Ironholds: ragesoss and I met on Sunday
[06:11:13] cool!
[06:11:20] I'm aware that you work 24/7
[06:11:40] I also noticed that it seems WMF hired an entire department to replace you as engineering liaison
[06:13:32] Ironholds: ^ I conclude from that information that you should be paid like 4 people
[06:13:51] I am! If we assume 4 people are paid... but I do get to do fun research these days, so I can't complain.
[06:14:10] and I don't have to live in the bay area!
[06:15:00] Is that a good thing?
[06:15:06] no
[06:15:09] it's terrible
[06:16:20] legoktm: the bay area is terrible?
[06:16:26] the bay area is indeed terrible.
[06:16:29] noooo
[06:16:31] well, it has a lot of good people
[06:16:36] Ironholds leaving the bay area is terrible!
[06:16:37] but it's overpriced and there's no weather variation
[06:16:42] legoktm, I'll be back!
[06:16:44] ...in January
[06:16:46] ...for two weeks
[06:16:47] the weather is a feature, not a bug!
[06:16:52] .....if I'm no longer scared of flying.
[06:19:07] :D
[06:20:35] By the way, Ironholds, how was Wikimania?
[06:20:53] I /think/ it was good but I didn't get to actually go to any of the sessions
[06:21:02] I spent the entire hackathon and conference preparing for my presentation on the Sunday morning
[06:21:17] and then passed out because I'd been awake for 26 hours. and woke up for the closing party
[06:21:19] fail :(
[06:22:23] :(
[06:22:43] Hi halfak!
[06:22:57] Hey Pine
[06:23:02] It's 11 PM, which means everyone is starting work, it seems
[06:23:09] or has never left
[06:23:20] hey, it's 2am for me
[06:23:28] east coast!
[06:23:42] I thought you were in Wales
[06:23:57] I was in Wales until... let's work this out
[06:24:10] okay, Wales I left in April 2013, moved to SF in September 2013
[06:24:18] moved to Boston... well, actually on Monday.
[06:24:31] Oh ok
[06:24:37] You are well travelled
[06:24:41] you find me in a room one of my friends used to sublet that contains a mattress, two pillows and all of my crap.
[06:25:00] you think that's bad, I did a road trip across the country for 3 weeks in June
[06:25:11] 19 days, 28 states, 8,000 miles. Wonderful fun, NEVER DOING IT AGAIN.
[06:25:15] Heh
[06:26:20] alright, my MR task is launched, my SQL task is launched, and both are in screen
[06:26:29] I'm going to go to bed. Talk soon guys :)
[06:26:34] Good night! :)
[06:26:38] halfak, I'll hopefully have a useful gdoc for you about the session stuff soon
[06:26:45] getting all my stuff organised is... time-consuming :/
[06:26:51] but I'll at least have the dataset!
[06:39:31] Ack. Sorry guys. Got side-tracked.
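(For anyone following OrenBochman's question above: a minimal sketch of pulling one article's counts out of a raw webstatscollector file, assuming the standard four-field pagecounts line layout of project code, URL-encoded title, view count, and bytes transferred. The filename is illustrative.)

```python
import gzip
from urllib.parse import unquote

# Illustrative filename: one hourly webstatscollector file, as published
# under dumps.wikimedia.org/other/pagecounts-raw/
PAGECOUNTS = "pagecounts-20140701-000000.gz"

def article_views(path, project, title):
    """Sum the view counts recorded for one article in a pagecounts file.

    Assumed line format: <project> <url-encoded title> <views> <bytes>
    """
    total = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            fields = line.rstrip("\n").split(" ")
            if (len(fields) == 4 and fields[0] == project
                    and unquote(fields[1]) == title):
                total += int(fields[2])
    return total

print(article_views(PAGECOUNTS, "en", "Main_Page"))
```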
[06:39:35] It's 8:30AM here
[06:43:08] halfak: you are in the UK?
[06:45:36] Germany today.
[06:45:42] Was in France yesterday
[06:46:53] You are well travelled
[06:47:05] Did Lila send you on a speaking tour?
[06:59:48] Negative. This one I set up myself.
[07:00:21] I know a few people in labs across Europe working on wiki stuff and there are a few weeks in between Wikimania and WikiSym
[07:00:24] Pine, ^
[07:00:34] * halfak is slow because he's going through his morning email too.
[07:00:37] sorry about that.
[07:07:33] That's ok
[10:02:26] Awww
[10:02:33] Halfak left
[10:03:08] legoktm: need to ask him about releasing the country edits data
[10:04:04] Halfak always leaves :'(
[10:04:51] Yeah
[10:04:55] Busy man
[10:05:52] EdSaperia: I just generated data for edits per country per wiki for September, split by mobile and desktop :) wondering what to do with it.
[10:06:24] Sell it?
[10:06:34] Make an infographic?
[10:08:04] EdSaperia: probably an infographic, I suppose. I'm not researchery enough to do actual research...
[10:08:37] http://qed.econ.queensu.ca/working_papers/papers/qed_wp_1083.pdf
[10:09:02] * YuviPanda clicks
[10:21:29] is jorn an alias of jorm?
[10:28:59] EdSaperia: nah, sorry, it's joern
[10:29:15] joern is an alias of jorm?
[10:29:44] don't know, but it's my first name ;)
[10:46:48] joerm
[10:52:46] EdSaperia: hahaha
[10:54:29] halfak: hey! Is it OK to publicly quote numbers on 'edits per country'? I think it should be, since you guys are already doing it and there is no private data there
[10:59:08–11:00:22] <a long run of keyboard-mashing lines omitted>
[11:00:38] ll OK gah
[11:00:48] Cat
[11:00:55] I swear
[11:01:00] It was the cat
[11:01:02] Ok?
[12:03:25] how come your cat hits return so often? ;)
[12:05:56] s
[12:39:30] Heh
[12:41:53] who knows anything about the wikidata db structure?
[12:45:47] Ironholds: legoktm I suspect
[12:45:57] iff he's awake
[12:46:05] * Ironholds stares intently at legoktm until he wakes up. Or until I go out.
[13:04:27] * ragesoss waves at Pine.
[13:04:38] not quite 24/7, as it turns out.
[13:05:38] just 23/7?
[13:06:04] well, it was 6 hours ago when Pine left the ping I just saw.
[13:13:28] Anyone have access to this? http://www.inderscience.com/info/inarticle.php?artid=64056
[15:17:21] yes hello I am here
[15:17:25] but no Ironholds :<
[15:20:46] legoktm, is this re NPF?
[15:21:01] no, something about wikidata's db
[15:21:09] Oh. n/m then.
[15:21:23] oh, but he did also ping me about NPF
[15:21:31] I woke up to https://en.wikipedia.org/wiki/User_talk:Legoktm#NPP
[15:23:54] heh. I've been talking to Ironholds and a few page patrollers about NPF. I've got an experiment that I'd like to run that I think has the potential to have a big impact.
[15:25:04] TL;DR: The median time between edits is about 30 minutes. I want to delay the display of a small random sample of new page creations for 30 minutes so that, if a second edit is going to happen, it doesn't get edit-conflicted by a CSD template.
[15:25:31] I think it will both reduce page patroller workload and improve the newcomer page creation experience.
[15:26:14] I have a few people signed on to help me make sure that we don't let spam and other nasty things go unchallenged for 30 minutes.
[15:26:23] o/ Ironholds
[15:26:25] ^
[15:26:40] legoktm, ^
[15:26:48] hey halfak :)
[15:27:07] legoktm, yeah, wikidata
[15:27:26] Thought experiment: I have a Q-code. How in hell's name do I get the gender tag associated with that Q-code?
[15:27:48] gender property, you mean?
[15:27:48] the db structure appears to have been put together by a lunatic, sort of generally cramming concepts into a standard mediawiki db
[15:27:55] a lunatic who does not know how to name tables
[15:27:55] yep
[15:28:27] you can't get it out of a database yet
[15:28:38] ...what.
[15:28:41] you're shitting me
[15:28:46] it LIVES in the REVISION TEXT?!
[15:28:55] :)
[15:29:00] https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=jsonfm <-- search for "P21"
[15:29:02] ...WHY DID WE FUND THIS.
[15:29:10] yeah, except I have 1 million pages ;p
[15:29:19] but you can use magnus's thingy
[15:29:26] the query interface is being reviewed still
[15:30:09] http://wdq.wmflabs.org/
[15:30:30] you could talk to him about getting access to it
[15:30:37] or just grab his scripts and build it yourself
[15:30:57] yeah, it's not generating the API query that's my problem, it's scaling it to... circa 1.2m requests.
[15:31:16] from inside the cluster I doubt it'll be a problem
[15:31:20] you can get 50 items at a time
[15:31:36] so that's only 20k requests
[15:31:48] s'true!
[15:31:48] I think it's 50; it might be 500 if you have apihighlimits
[15:31:52] Wait. Wikidata is stored in rev_text?
[15:31:54] WTF
[15:31:58] halfak, I KNOW RIGHT.
[15:32:02] WHY WOULD YOU INTENTIONALLY DO THAT.
[15:32:04] Serious WAT
[15:32:17] because it's a smart design decision
[15:32:19] that's sort of completely anti- the entire purpose of wikidata.
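(A minimal sketch of the batched lookup legoktm describes: 50 ids per wbgetentities request, then reading each item's P21 claim. The endpoint, the 50-id limit, and P21 come from the conversation above; the JSON traversal is an assumption based on the API's usual claim structure.)

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def gender_items(qids):
    """Map each Q-id to the item its P21 (sex or gender) claim points at."""
    results = {}
    for i in range(0, len(qids), 50):  # 50 ids per request without apihighlimits
        batch = qids[i:i + 50]
        reply = requests.get(API, params={
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "claims",
            "format": "json",
        }).json()
        for qid, entity in reply.get("entities", {}).items():
            for claim in entity.get("claims", {}).get("P21", []):
                snak = claim["mainsnak"]
                if snak.get("snaktype") == "value":
                    # e.g. Q6581097 (male) or Q6581072 (female)
                    results[qid] = "Q%d" % snak["datavalue"]["value"]["numeric-id"]
                    break
    return results

print(gender_items(["Q42"]))  # Douglas Adams -> {'Q42': 'Q6581097'}
```

At 50 items per call, the ~1.2m pages Ironholds mentions work out to roughly 24k requests, in the same ballpark as the "only 20k requests" estimate above.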
[15:32:29] in the sense that they didn't need to reinvent everything?
[15:32:48] basically
[15:32:57] they did have to write ContentHandler for it, but yeah
[15:33:02] sure, but...
[15:33:15] okay. the purpose of wikidata is to act as a repository for other purposes
[15:33:37] it's not a site in and of itself. Or, it's not PRIMARILY a site. It's primarily a structured way of storing data.
[15:33:42] once WikibaseQuery is reviewed and deployed, you'll have your nice database setup :)
[15:34:02] if you have prioritised ease of implementation over ease of access to that degree, you have misunderstood why you're building what you're building.
[15:34:35] the data is easy to access
[15:34:51] in small chunks, on web-connected machines
[15:34:55] it's just not in a queryable format, because that's pretty hard to do on wikidata's scale.
[15:35:05] You can download daily dumps in JSON
[15:35:06] Na.
[15:35:16] and diff files are provided too
[15:35:17] People wrote SPARQL to handle this
[15:35:20] On big machines
[15:35:23] See DBpedia
[15:35:55] +1 for dumps.
[15:35:58] hmm, interesting.
[15:36:21] Oliver, if you want to process it and filter out some items by gender, that might be the best way right now.
[15:36:23] Also, the Wikidata team would have gotten this done like months ago if Magnus hadn't written his own query thing :P
[15:36:31] halfak, the daily JSON dumps?
[15:36:35] yeah.
[15:36:42] yeah, I'll just run from stat3. bah.
[15:36:47] * Ironholds doesn't like change! :P
[15:37:31] oh, dennyvrandecic is in here :)
[15:37:36] you can blame him ;)
[15:38:13] halfak: reason not to use SPARQL on top of MW - see semantic mediawiki :)
[15:38:31] Not sure how that's an argument against SPARQL
[15:38:44] Is semantic mediawiki made of RDF triples?
[15:38:45] it's an argument against MW, more like
[15:39:09] so this was the way something like WikiData was going to work on top of MW
[15:39:09] Ahh. Yeah. WikiData shouldn't be MediaWiki.
[15:39:27] Also, it shouldn't be called "WikiData"
[15:39:46] heh
[15:39:50] too late :P
[15:39:59] Yeah
[15:40:00] It should be called "SemanticWiki" because it stores semantic things.
[15:40:09] Data is too general.
[15:42:40] no, it should be light green
[15:43:10] * Ironholds ducks
[15:44:38] IT'S NOT WikiData!!!
[15:44:40] Wikidata*
[15:45:16] halfak: https://upload.wikimedia.org/wikipedia/mediawiki/6/69/Hesaidsemanticga2.jpg
[15:46:41] heh.
[16:35:08] halfak, Ironholds, I'll be in the Hangout in a few min. I have a tech problem with the webcam
[16:35:12] not sure where DarTar is.
[16:36:14] DarTar is in a conf call
[16:36:25] coming!
[16:36:32] sorry, was writing a paper draft
[16:46:08] Ironholds: the problem was the flexibility of allowing users to create new properties
[16:46:30] * Ironholds headscratches
[16:46:53] how else would you do that? users can create new properties, these can be used on any topic. you can either create a new table for each property
[16:46:58] or have a big-ass key-pair
[16:47:16] (I mean, I assume. Me no DBA.)
[16:47:25] key-value pair
[16:47:37] that would not scale to wikidata size
[16:47:54] we have that for categorylinks and templatelinks and it seems to work pretty well.
[16:48:09] because there you don't have a property in the middle
[16:48:14] it would be one table per property
[16:48:15] Ditto user_properties, which also contains user-created keys. But I'm probably missing something vital
[16:48:19] ahhh. Ew. Okay, fair point ;p
[16:48:25] It makes sense now.
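(And a sketch of the dump-processing route halfak suggests for the gender filtering, under the assumption that the JSON dump is laid out as one entity object per line inside one enclosing JSON array; the filename is illustrative.)

```python
import gzip
import json

DUMP = "wikidata-entities.json.gz"  # illustrative local copy of a daily dump

def entities_with_gender(path):
    """Stream the dump and yield (item id, P21 target) pairs."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")  # entity lines carry trailing commas
            if not line.startswith("{"):
                continue  # skip the enclosing "[" / "]" lines
            entity = json.loads(line)
            for claim in entity.get("claims", {}).get("P21", []):
                snak = claim["mainsnak"]
                if snak.get("snaktype") == "value":
                    yield entity["id"], "Q%d" % snak["datavalue"]["value"]["numeric-id"]
                    break

# e.g. tally items by gender without holding the dump in memory:
# from collections import Counter
# print(Counter(g for _, g in entities_with_gender(DUMP)))
```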
[16:48:41] it is a terrible design, but we had a lot of smart db people come in
[16:48:50] and some NoSQL developers, people from Mongo and others
[16:49:07] some semweb developers, all helping us with coming up with something decent
[16:49:28] gotcha. So it's a least-bad-option kinda problem.
[16:49:31] it sucks, but it was the only thing we were sure would scale for the expected size
[16:49:50] unfortunately
[16:49:57] Ah well. Iterations! :D
[16:50:04] yep. :)
[16:50:49] I do apologize for the mess that it is, but I still wouldn't know how to get it off the ground better
[16:53:42] building on top of MediaWiki gave the advantage of history, of flexibility, and of letting users use already-known tools, and ops were less uncomfortable than they would have been deploying a whole stack made outside
[17:30:47] hey halfak, I’m running a bit late, 5 mins?
[17:31:03] Sounds good. Stuck in the last meeting anyway.
[17:41:10] hey halfak, ready here, how are things on your end?
[17:41:15] I’ll have to split in 20
[17:42:19] Are you in the call? I don't see one on the event.
[17:42:25] DarTar, ^
[17:42:50] no, I don’t have the link
[17:42:54] and can’t add one :(
[17:43:01] I just added one.
[21:53:31] yo Ironholds
[23:04:17] DarTar: hey
[23:04:30] hey YuviPanda
[23:04:39] DarTar: is it OK to make aggregate info on 'edits per country' public?
[23:04:54] (I already have it)
[23:05:01] And I'd assume it is...
[23:05:36] errr no :(
[23:05:41] And you guys kinda already did that with the mobile trends...
[23:05:46] it really depends on the granularity
[23:05:54] yeah, that stuff was super-aggregate:
[23:05:58] all projects combined,
[23:06:02] 90 days
[23:06:10] YuviPanda: just 'number of edits from country', 30 days
[23:06:36] Err
[23:06:42] DarTar: ^
[23:06:52] I could make it 90 days
[23:07:07] can you send a line to the analytics list? We do maintain dashboards about the geographic breakdown of edits but had to make them private because of privacy concerns
[23:07:21] What concerns?
[23:07:28] I'll email the list tomorrow, yeah
[23:07:28] deanonymization
[23:07:35] by combining geodata with RC
[23:07:40] We can threshold them
[23:07:43] when breaking down edits by project
[23:07:57] So anything under, say, 1000 just gets mentioned as that
[23:08:07] <1000
[23:08:10] yeah, the other issue is that we’re trying to come up with a general strategy to deal with geo/temporal aggregates of traffic and editing data
[23:08:32] (trying to get an expert on geo-anonymization on board)
[23:08:40] Right, but I hope that doesn't block this :)
[23:09:01] it shouldn’t block it, but we should err on the safe side until we have the general solution
[23:09:05] I could even do like just 'top 20 countries'
[23:09:17] start a thread on analytics, I think we can take it from there :)
[23:09:23] I’d love to make more geodata available
[23:09:24] True. I'll make sure I pass the specifics by the analytics list
[23:09:41] the mobile trend analysis was pretty shocking to me
[23:09:46] Why?
[23:09:56] because of the dominance of North America
[23:10:04] which is obviously caused by the dominance of enwiki
[23:10:15] but I was not expecting the data to be so skewed
[23:10:30] Heh
[23:10:39] we’re actually planning on releasing the raw aggregate data too
[23:10:42] Erik Zachte touched upon this a little as well, I think
[23:10:44] Niiice
[23:10:45] so it’s a good time to start the discussion
[23:10:53] yeah
[23:11:32] Exciting times, etc.
[23:11:34] if the declining trend in desktop traffic from North America is also confirmed, this could become a little concerning
[23:11:41] yesss
[23:11:42] Yeah
[23:11:57] DarTar, ooh
[23:12:06] do you think IP blocks move around the world?
[23:12:12] by which I mean, could it be fun for me to geolocate the referer data too?
[23:12:19] The most interesting thing for me was how wikidata has 10x as many edits from toollabs as from the next highest country
[23:12:44] Ironholds: hmm, I could see some server balancing affect the way in which referred traffic is served
[23:13:03] yeah, fair
[23:13:53] YuviPanda: I think when we release the data ppl will start asking quite a lot of questions (as you can’t necessarily answer them just looking at a geoplot)
[23:14:44] also Ironholds, I forgot who asked (was it Jessie?) if we had evidence of hourly fluctuations in the effectiveness of Growth experiments
[23:15:02] time/geo are hugely important dimensions we’ve never seriously explored in Product
[23:15:24] Yeah
[23:15:31] partly because for most features we can’t target by time or geography
[23:15:47] I don't even know how much research we do in general with product
[23:16:03] *cron: switch off all CTAs when country X goes to sleep*
[23:16:45] YuviPanda: ad-hoc data analysis? A ton. Research that could inform design, not that much
[23:17:14] product design or *strategy*, that is
[23:19:36] DarTar: yeah, the problem is perhaps the unstructured and ephemeral nature of ad hoc research
[23:25:32] DarTar, what features?
[23:25:35] logging hits cu_changes too ;p
[23:25:50] give me a timestamp and an IP address and I can tell you anything you want.
[23:26:23] aside from CN, we don’t have the ability to *run* features on specific slices of the reader/editor population
[23:26:36] collecting data is a different story
[23:26:40] s'true
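(A minimal sketch of the thresholding YuviPanda proposes above: suppress any per-country count below a cutoff before publication, so that rare-country rows are harder to join against RecentChanges to single out individual editors. The cutoff and the sample figures are illustrative, not real data.)

```python
THRESHOLD = 1000  # per the "<1000" suggestion above; pick to taste

def publishable(edits_by_country):
    """Replace small per-country edit counts with a floor bucket."""
    return {country: (count if count >= THRESHOLD else "<%d" % THRESHOLD)
            for country, count in edits_by_country.items()}

# illustrative numbers only:
print(publishable({"US": 52341, "IS": 317}))
# {'US': 52341, 'IS': '<1000'}
```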