[05:06:12] any researchers awake? halfak, you appear to be but that may be non-standard /away settings, I dunno ;p
[12:34:14] Sorry Ironholds
[12:34:36] Will need to figure out my auto-away in my client.
[14:54:36] halfak, heh, no problem.
[14:54:47] I was going to ask if you can imagine a situation where a researcher would be writing to a db other than staging
[14:54:52] I installed a plugin. it should solve the problem.
[14:54:57] (adding write capabilities to my utilities library)
[14:55:06] Ironholds, for LabsDB.
[14:55:17] aha. from stat2/3?
[14:55:20] External researchers.
[14:55:26] gotcha
[14:55:42] Are we WMF researchers the only intended users?
[14:56:44] at the moment? That may change but afaik we're the only people dumb enou-er, I mean, enlightened enough to treat R as a viable programming language.
[15:00:23] heh. Lots of researchers out there using R. You can tell by their attractive ggplot-based graphs.
[15:00:32] But fair enough. :)
[15:00:56] fair!
[15:01:01] I'll see what I can do then
[15:01:14] I spent my weekend rewriting the python connector to accept text and tsvs as well as JSON
[15:01:21] that way random encoding problems can be handled nicely
[15:06:59] * Ironholds whistles, streams 29m rows into mysql, what could possibly go wrong
[15:20:41] halfak, wanna see something cool?
[15:21:06] SELECT * FROM staging.referer_data LIMIT 5;
[15:21:31] a year of referer tracking data, geolocated and parsed out and spiders identified, all stuck into mysql where people can query it :D
[15:23:33] Are the pageviews bucketed?
[15:25:09] define bucketed?
[15:25:47] that is, grouped? Yep.
[15:25:54] It's a key-value - well, a key-key-key-key-key-value
[15:26:33] Bucketed as in you can't see less than 1000 page views.
[15:27:45] oh. Well, sampled data
[15:27:55] so 1000 pageviews == 1 pageview, approximated upwards
[15:29:12] Oh! I see.
[15:29:18] Highly sampled then.
[15:29:36] yeah, 1:1000; it was the best I could do to get the historical data :/
[15:29:48] I mean, unsampled we could do but it would only let us look at variations over the last month and a bit.
[15:29:52] Makes sense.
[15:30:19] How far back does this go?
[15:30:42] 8 August 2013!
[15:30:54] so it lets us take in the drop in December/January of 2013/14
[15:31:05] also, appropriately we don't have any data before my 21st birthday.
[15:31:13] This seemed somewhat serendipitous to me.
[15:31:18] Cool. It makes me a little sad that we can't go back to the deployment of the knowledge panel.
[15:32:51] But yeah. Covering the dip is important.
[15:33:21] yeah :(
[15:33:43] if we had our traffic aggregates broken down by country or region, I've actually worked out a way to investigate the knowledge panel.
[15:33:47] but I dunno if that exists that far back.
[15:34:15] (basically: we take advantage of different deployment dates in different regions/languages/national google variants to treat it as a natural experiment)
[15:35:15] oooh
[15:37:13] I quite liked it, but I am unsure as to whether that data is available. I guess there are high-level by-region numbers up on wikistats? I don't know if that would be sufficient though.
[15:37:46] Hmmm... Might be.
[15:37:58] I wonder if we could officially dedicate time to exploring this next quarter.
[15:38:57] * halfak goes back to his sanctioned activities around data QA
[15:39:36] hehe. Good luck!
[15:39:43] I'm going to try and work out what in god's name apps users are doing
[16:00:48] hey tnegrin. We have a 1:1 right now, but there's no hangout.
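
A minimal sketch of how the sampled referer table above might be queried from Python, scaling counts back up by the 1:1000 sampling rate mentioned in the log. Only the staging.referer_data table and the sampling rate come from the discussion; the host, the ~/.my.cnf credentials file, and the referer_class / pageviews column names are illustrative assumptions.

    import pymysql

    SAMPLING_FACTOR = 1000  # rows are 1:1000 sampled, so scale counts up

    # host is a placeholder; credentials file location is an assumption
    conn = pymysql.connect(
        host="db-host.example",
        db="staging",
        read_default_file="~/.my.cnf",
    )
    with conn.cursor() as cur:
        # Column names are assumed for illustration; the real table is a
        # multi-key ("key-key-key-key-key-value") layout.
        cur.execute(
            "SELECT referer_class, SUM(pageviews) "
            "FROM referer_data GROUP BY referer_class"
        )
        for referer_class, sampled_views in cur.fetchall():
            # 1 stored pageview ~= 1000 real pageviews, approximated upwards
            print(referer_class, sampled_views * SAMPLING_FACTOR)
    conn.close()
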
[16:00:55] Just drop me a call when you are ready.
[16:01:03] kk -- just finishing up with erik z
[17:00:46] ewulczyn: you should also auto-join #wikimedia-analytics ;)
[18:53:09] Hey guys. I got a surprising result while I was doing some background work on counting unique editors.
[18:53:09] https://meta.wikimedia.org/wiki/Research:Daily_unique_editors
[18:53:29] Note that all 5 wikis I looked at look like they are taking a nose-dive in daily unique registered editors.
[19:02:53] Hey Ironholds, how much work time do you get on your laptop when you use the battery you linked to?
[19:03:00] Poweradd™ Pilot Pro 32000mAh
[19:03:15] * halfak is considering doing some programming on his trip this weekend.
[19:08:02] halfak: the nosedive in es, fr, it seems to coincide with the wikidata launch
[19:08:25] Ahh! Interesting. I'll do the same analysis for wikidata. :)
[19:08:34] i wonder what the curve would look like if you removed all edits that only concerned the language links
[19:08:49] That'd be a trick. Hmm.
[19:08:53] i.e. maybe the dip would not be there
[19:09:07] wikidata removed several 100m lines of wikitext
[19:09:14] which don't need to be maintained anymore
[19:09:20] partially they were maintained by bots
[19:09:33] but there were still human edits involved
[19:09:37] halfak, bah, sorry! tabbed away
[19:09:42] it could be an explanation
[19:09:50] I got about 2 loads out of it, but the X1 isn't renowned as power-efficient so you may get more
[19:10:01] not for en and de though
[19:10:19] * halfak has an X1
[19:10:57] dennyvrandecic_, I suspect those managing language links manually were likely to make other edits though.
[19:11:14] no, not necessarily
[19:11:24] they might have ru or he as their home wiki
[19:11:28] heh
[19:11:36] and then go to enwiki to enter the link to the new article they created
[19:11:46] or to frwiki, or whatever language they speak
[19:12:31] so, yes, they would make other edits, but not necessarily in that language
[19:14:42] just a theory, though, obviously :)
[19:15:16] Ahh yeah. Fair point.
[19:15:27] It would be great if we could just query cross-wiki already. :S
[19:15:42] Well... we can, just that it's ad-hoc ATM
[19:17:23] Yeah... that's a big jump in wikidata. Now to fix my loess model so it doesn't look idiotic.
[19:22:34] Added the wikidata plot. https://meta.wikimedia.org/wiki/Research:Daily_unique_editors#Italian_Wikipedia
[19:23:24] Pretty steep growth around the deployment. The wiggles leading up to 2012 are due to some strangely sparse datapoints that I assume came during merges with old timestamps.
[19:23:55] Hmm.. Something must be up. The bot filter doesn't seem to be working.
[19:24:02] i assume the wiggles come from pages that were imported from other wikis
[19:24:06] with their history imported as well
[19:24:12] Agreed
[19:24:14] e.g. templates from enwiki
[19:24:29] Yup. My thoughts exactly.
[19:24:38] but the rise after deployment *could* explain the losses on fr, it, es
[19:24:54] Maybe. I'd have to track people cross-wiki to be sure.
[19:25:09] even if you do you might not get the right answer
[19:25:16] ?
[19:25:36] language-link editing patterns changed a lot
[19:25:52] by looking at people crosswiki you might not capture it all
[19:26:00] Sure, but saying "you might not get the right answer" is true of any study.
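
For context, a rough sketch of the kind of per-wiki, per-day unique registered editor count being discussed above, assuming the MediaWiki schema of that era (revision.rev_user, rev_timestamp, user_groups) on a single wiki's replica. The host, credentials file, and the simple 'bot' group exclusion (standing in for the bot filter mentioned above) are assumptions, not the exact query behind the meta page.

    import pymysql

    QUERY = """
        SELECT LEFT(rev_timestamp, 8) AS day,
               COUNT(DISTINCT rev_user) AS unique_registered_editors
        FROM revision
        WHERE rev_user > 0  -- registered accounts only
          AND rev_user NOT IN (
              SELECT ug_user FROM user_groups WHERE ug_group = 'bot'
          )
        GROUP BY day
        ORDER BY day
    """

    # host is a placeholder; credentials file location is an assumption
    conn = pymysql.connect(
        host="db-host.example",
        db="enwiki",
        read_default_file="~/.my.cnf",
    )
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for day, editors in cur.fetchall():
            print(day, editors)
    conn.close()
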
[19:26:05] :D
[19:26:28] ok, i might suggest an alternative which might lead you closer to the truth
[19:26:52] which is to analyze every edit, and tag those that are only dealing with langlinks
[19:27:13] and remove those and see if there is still a dip
[19:27:27] it's only, what, 2b edits or so ;)
[19:28:24] Pretty easy to just flag edits that add or remove a langlink where the affected text outside of langlinks is below some threshold.
[19:28:46] i would hope so
[19:28:52] However, time. Time is the killer here.
[19:28:55] and edits that sort langlinks and similar stuff
[19:28:56] Got some hours to spare?
[19:29:06] :D
[19:29:17] let me check my calendar
[19:29:22] sorry, not right now
[19:29:25] ;)
[19:29:53] I am not saying this has to be done, just offering a possible theory for the dip
[19:30:08] we won't know until we actually do the calculation
[19:30:21] Yup.
[19:30:40] one argument against the explanation could be: "why does it keep falling?" i.e. shouldn't it have been a steep decline, and then stabilized at the lower level?
[19:30:43] Maybe we should file it as a research idea so I can try to pass it off to some of the people studying cross-wiki behavior.
[19:30:55] dennyvrandecic_, good point.
[19:31:01] that would be good
[19:31:45] I need to give computermacguyver a ping. haven't seen him around this channel for a while.
[19:31:54] I suppose Nettrom might know someone too.
[19:32:18] See the discussion of drops in daily unique editors.
[19:32:30] graphs here: https://meta.wikimedia.org/wiki/Research:Daily_unique_editors
[19:34:46] i'll add a sentence there
[19:35:11] Thanks dennyvrandecic_
[19:37:18] done
[19:44:41] halfak: I didn’t get a chance to comment on the Unique Editors stuff but it’s super-interesting (yay, research as a by-product of fine-tuning measurements ;) ). Shall we take some time to go through this before our 1:1 on Wed?
[19:45:14] Sure. I'm free after our 1:1:1 (lool) today.
[19:45:33] k cool, lemme set up something
[19:48:32] halfak: 4.30 PT ok? I have a quick check-in with Chip at 4 (I can reschedule it if needed)
[19:49:16] Works for me.
[19:49:36] Thanks for checking. that'd be too late some days.
[19:51:57] invite sent
[19:51:57] ewulczyn: lunch?
[19:52:38] DarTar & ewulczyn, did you get access to stat3 and the dbs yet?
[20:31:52] halfak: I believe ewulczyn’s ops request is still pending. Interestingly, I created the ticket and permissions were updated so now I can’t follow up on this request via RT (but only when people respond and CC me), amazing!
[20:32:23] lol wat.
[20:32:30] Why does rt have sub-permissions?
[20:32:51] halfak, because RT was designed by people trying to make it as difficult to contact Ops as possible
[20:33:05] I guess for super-sensitive requests? Like ewulczyn being an undercover German spy
[20:33:15] that's literally the only explanation I can come up with for RT that doesn't assume incompetence: it was built by people who felt overworked.
[20:33:23] DarTar! Now the whole internet knows
[20:33:33] damn
[20:33:38] Ironholds, not that unreasonable
[20:33:42] DarTar, that's silly, you and I both know he works for DGSE
[20:33:46] how do I redact an IRC log?
[20:33:47] halfak, yeah, I sort of wish-wait
[20:33:51] guys, guys
[20:33:56] can we use RT for research requests?
[20:34:05] and refuse to accept requests that come via email? Please?
[20:34:05] ha ha
[20:34:08] wm-bot4: Delete everything DarTar said in the last hour.
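
The langlink-only tagging idea discussed above (flag edits where, once interlanguage links are stripped, the text is essentially unchanged) could look roughly like the sketch below. The regex, the SequenceMatcher fallback, and the 0.05 threshold are illustrative assumptions, not an agreed-on implementation.

    import re
    from difflib import SequenceMatcher

    # Rough pattern for interlanguage links such as [[fr:Paris]] or [[:de:Berlin]].
    # A real implementation would use the wiki's actual language-code list and
    # exclude lowercase namespace prefixes like [[file:...]].
    LANGLINK_RE = re.compile(r"\[\[\s*:?[a-z]{2,12}(?:-[a-z]+)*\s*:[^\]]*\]\]")

    def strip_langlinks(wikitext):
        """Drop interlanguage links and collapse whitespace."""
        return re.sub(r"\s+", " ", LANGLINK_RE.sub("", wikitext)).strip()

    def is_langlink_only_edit(old_text, new_text, threshold=0.05):
        """Flag an edit whose changes are (almost) entirely langlink
        additions, removals, or re-sorting. The threshold is a guess,
        not a value from the discussion."""
        old_stripped = strip_langlinks(old_text)
        new_stripped = strip_langlinks(new_text)
        if old_stripped == new_stripped:
            return True
        similarity = SequenceMatcher(None, old_stripped, new_stripped).ratio()
        return (1.0 - similarity) < threshold
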
[20:34:28] thx halfak
[20:34:34] Exactly what I was thinking Ironholds
[20:34:39] Ironholds: it’s not a totally crazy idea
[20:34:50] It's kind of crazy
[20:35:01] and there’s still a BZ component for data requests or something
[20:35:06] (under analytics)
[20:35:17] huh. That'd be an interesting way to manage it.
[20:37:11] huh
[20:37:17] I mean, it'd be more transparent to existing workflows, which is nice.
[20:37:38] I love Trello but it's really bleh to work with a lot of the time, simply because it's...not linear.
[20:37:48] That is, it seems to be designed around the workload as a whole, not individual cards.
[20:38:04] and I find my work style doesn't match that.
[20:38:34] ping ewulczyn
[20:40:10] Ironholds: our new scrum master will magically help clear up that mess for us
[20:40:21] I think our Trello board is a huge mess though, Ironholds. It could be better.
[20:40:23] it’s just a matter of weeks (or days)
[20:40:30] We hired someone?
[20:40:35] halfak: amen to that
[20:40:36] will our new scrum master also make it so we can push back on silly requests? ;p
[20:40:56] I don't remember participating in an interview.
[20:40:57] btw, halfak thanks for the round of cleanup
[20:41:09] no we haven’t hired or interviewed anyone
[20:41:19] I wish we had someone to start tomorrow
[20:41:22] My offer is still open to clean it all and remake it in my image.
[20:41:31] Ironholds: yes, that’s part of the job
[20:42:00] good good
[20:42:06] in that case, can I have the job?
[20:42:10] If only there was a giant UNDO button that would put the cards back if we didn't like it.
[20:42:12] I LOVE saying no to people.
[20:42:16] It's like my favourite thing.
[20:42:30] no shit
[22:01:38] DarTar, tnegrin: getting bit by the non-automatic VOIP call bug
[22:04:23] halfak: running a few mins late, tnegrin is wrapping up with kevin
[22:04:48] k.
[22:09:44] halfak: try now
[23:16:26] o/ Soni_WP
[23:26:08] halfak: 10-15 mins maybe?
[23:26:19] OK
[23:27:16] cool thx
[23:31:25] Bah! VOIP problem
[23:31:30] Stupid google
[23:31:35] Everyone wants a hangout