[13:55:49] Ironholds: I hadn't realized that apps was the only blocker to getting pageid evrtywher...
[13:56:03] If you file a bug and point me to it I might be able to help next week
[15:01:57] good morning guys
[15:02:56] Have the proxy servers changed? I have http(s)://webproxy.eqiad.wmnet:8080 as the proxy servers setup right now
[19:48:02] hola nuria. if you are around, I have a question about hadoop fs -ls /wmf/data/archive/pagecounts-all-sites/ Do we filter bots for the pageview definition used to collect this data? I checked https://wikitech.wikimedia.org/wiki/Analytics/Data/Pagecounts-all-sites but this hasn't been mentioned explicitly in the page.
[19:51:27] leila, no, we don't filter bots
[19:51:35] thanks Ironholds. hi. :-)
[19:51:41] it's the same as https://git.wikimedia.org/blob/analytics%2Frefinery.git/37056b31c6a5ae3363647b785d1acee66b09ce58/oozie%2Fwebstats%2Finsert_hourly_pagecounts%2Finsert_hourly_pagecounts.hql but rolled up, basically
[19:51:44] heyo :)
[19:52:23] do we have 3 different definitions of pageview Ironholds? webstatscollector, this other one, and the one you developed?
[19:54:20] webstatscollector == "this other one"
[19:54:47] there was the legacy definition WSC used. Then Christian turned it into a hive query because...because nobody wants to maintain what came before that hive query
[19:54:59] (also, we're switching to hive/hadoop)
[19:55:05] and then there's the new one
[19:55:50] ow I see. thanks, Ironholds.
[20:02:39] leila, np :). Always happy to help
[22:59:11] hey again lzia :). How goes?
[23:03:38] hey Ironholds. goes okay. trying to fix couple of sqoop codes
[23:04:14] neat!
[23:05:18] what are you up to, Ironholds?
[23:07:01] I am tidying and corresponding with some OII people I'm finishing up a paper with :)
[23:07:11] and trying to brainstorm on what my next work should be.
[23:07:33] because as soon as I get this paper wrapped up I am out of new papers. Maybe correcting Taha's study?
[23:10:02] I see.
[23:10:18] * Ironholds shrugs
[23:10:31] I need something to do in my spare time. Feeling kinda...out of enthusiasm at the moment, though.
[23:10:37] this sounds exciting. I'm not sure what you want to do with Taha's study. I wouldn't touch his work. ;-)
[23:11:05] * Ironholds snorts
[23:11:09] I want to show that his work is WRONG
[23:11:17] http://www.plosone.org/article/metrics/info:doi/10.1371/journal.pone.0030091#citedHeader
[23:11:20] hahaha! good luck to you, man! that should be fun
[23:11:26] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030091 rather
[23:11:33] I can do better!
[23:12:04] his methodology is wrong!
[23:12:06] and so are his results!
[23:12:06] well, I would be surprised if you couldn't, given that they didn't have access to non-public data
[23:12:09] yup
[23:12:22] I'm already doing some circadian stuff with Scott to look at the mobile/desktop distinction but
[23:12:55] their methodology looked okay to me. but again, they had much more limited data.
[23:12:57] oh god, there are /pie charts/
[23:13:09] leila, assuming every enwiki editor lives on the east coast is, to me, problematic ;p
[23:13:28] wait, sorry
[23:13:28] Chicago
[23:13:33] they assume every enwiki editor lives in Chicago
[23:14:27] I have to read the paper again, when I read it, I thought they had done a good job with the data they had.
[23:14:48] the reality is that at some point, you have to make some assumptions to make your system simpler
[23:15:21] the nice thing is that they have published their work, so if you can do better, you can build on their ideas, or come up with completely new ideas, and move the understanding a bit more.
[23:15:44] yup!
[23:15:48] and also a load of people have cited them
[23:15:57] so I have a wide range of papers to point to as evidence that this is interesting :D
[23:16:02] they will be my stalking horse *purses fingers*
[23:16:14] ;-)
[23:17:26] okay, back to sqoop.
[23:20:01] have fun!