[12:54:22] mooorning
[12:54:26] average, are you around?
[13:05:22] drdee: yes
[13:05:26] right here
[13:05:43] cool!
[13:05:48] what is the status of 353?
[13:05:57] it's in progress
[13:06:38] can you be more specific :)
[13:06:52] I can
[13:07:01] well, first off I need to geocode all the data
[13:07:11] 353 is a list of bugs; before fixing them i think we need to figure out where we are and what needs to happen
[13:07:13] then, I need to set a cron job to geocode it for me for each day from now on
[13:07:38] so wikistats can work on that data
[13:07:48] one step back, before fixing things
[13:07:52] ok
[13:08:18] let's first make sure we understand all the things that need to happen so we can make a plan and we know what we are getting ourselves into
[13:08:46] hi Snaps
[13:08:58] drdee: I agree
[13:09:04] i am starting to work on adding some content to the mingle cards, starting with https://mingle.corp.wikimedia.org/projects/analytics/cards/545
[13:09:21] please have a look and let me know what type of information you would like me to add
[13:12:27] Hi drdee! I'm not quite sure I follow #545. The broker failover and replication mechanisms are not exposed through the client API. So not sure what role you mean rdkafka would play in this?
[13:13:01] drdee: librdkafka will support migrating to another broker of course, without losing any messages (unless the rdkafka machine goes down as well).
[13:15:05] i might be misunderstanding :)
[13:15:10] you are going to add kafka 0.8 support to librdkafka, right?
[13:16:17] average: i added a column 'todo' to card 353, maybe you can add details for each bug in that column?
[13:16:22] Snaps: ^^
[13:16:27] drdee: yes
[13:17:54] drdee: yes
[13:18:55] but isn't one of the major reasons for kafka 0.8 the support for replication?
[13:19:12] and kafka 0.8 protocol is incompatible with 0.7, right?
[13:19:27] drdee: yes, on both questions :)
[13:20:00] lets rewind a bit. 0.8 adds support for replication, etc. This is all controlled by an internal protocol between the brokers and is not directly exposed to the clients (producers/consumers).
[13:20:29] However, the producer may dictate the number of replications it expects per message sent, and that should indeed be configurable on the producer side.
[13:20:53] ok
[13:21:15] But that replication factor is all the producer side needs to know about replication.
[13:21:54] right, i added it more as a reason why kafka 0.8 support is important, did not mean to imply that it required code
[13:22:10] drdee, ah, okay. Sorry about the confusion :)
[13:22:34] np!
[13:22:34] the coding part is adding support for the new protocol,
[13:23:03] and i was wondering, would it make sense to make it configurable to run librdkafka in 0.7 and 0.8 mode
[13:23:08] ?
[13:23:44] Preferably not, but it's really up to you guys if you are planning on a slow migration.
[13:25:12] i think it depends on how fast kafka 0.8 is released :)
[13:25:27] there is no fixed timeline for that, right?
[13:27:02] They are fixing up the last couple of issues
[13:27:10] I'd say the beta will be out within 2 weeks.
[13:27:36] And there's quite a demand for it, so it will probably get a fair bit of testing right away.
[13:28:04] awesome!
[13:28:16] so what details should I add to card 545?
[13:32:18] * Make the per-message replication factor configurable (possibly per topic..)
[13:33:14] * Configurable list of initial brokers
[13:55:28] drdee: It would be great if someone could find the time to have a quick glance at the remaining questions from my last mail :)
[13:55:44] yes! i will give it a bump today!
[14:07:51] just bumped it
[14:09:03] thanks
[14:24:26] MORNING ottomata, milimetric
[14:24:32] Yiiihaaaaa!!!!
[14:24:35] mooorniiing
[14:24:37] morning :)
[14:24:50] i am all PUMPED!
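[Editor's note] Snaps' point above — that the required-acks count is all the producer needs to know about replication — can be sketched as a toy model. This is plain Python with invented class names, not librdkafka's actual API; in Kafka 0.8 the corresponding producer setting is `request.required.acks` (0 = fire-and-forget, 1 = leader only, N = wait for N replicas).

```python
# Toy model of producer-side acks: the producer only states how many
# replica acknowledgements it requires per message; broker-to-broker
# replication stays invisible to the client, exactly as Snaps describes.
# All names here are illustrative, not librdkafka's real API.

class Broker:
    def __init__(self, replica_count):
        self.replica_count = replica_count
        self.log = []

    def append(self, message, required_acks):
        """Store a message and report how many replicas acknowledged it."""
        self.log.append(message)
        # A broker can never acknowledge more replicas than exist.
        return min(required_acks, self.replica_count)

class Producer:
    def __init__(self, broker, required_acks=1):
        # Mirrors Kafka 0.8's request.required.acks semantics.
        self.broker = broker
        self.required_acks = required_acks

    def produce(self, message):
        acked = self.broker.append(message, self.required_acks)
        if self.required_acks > 0 and acked < self.required_acks:
            raise RuntimeError("not enough replicas acknowledged")
        return acked

broker = Broker(replica_count=3)
producer = Producer(broker, required_acks=2)
print(producer.produce("hello"))  # 2 replicas acknowledged
```

This also shows why the card's "per-message replication factor, possibly per topic" item is purely producer-side configuration: only `required_acks` changes, never the broker logic.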
[14:26:05] * brion passes drdee more coffee
[14:26:55] * drdee is drinking coffee and now HE REALLY FEELS ****PUMPED****
[14:27:31] YEAAAAAH LETS ANALYZE SOME TICS
[14:28:01] * brion sneaks back to mobile channel before he unleashes The Hulk or something
[15:43:55] New patchset: coren; "New implementation of log2udp" [analytics/log2udp2] (master) - https://gerrit.wikimedia.org/r/58449
[15:44:51] average: can you do a code review on https://gerrit.wikimedia.org/r/#/c/61049/ ?
[15:49:37] New patchset: coren; "New implementation of log2udp" [analytics/log2udp2] (master) - https://gerrit.wikimedia.org/r/58449
[15:50:24] New review: coren; "Verified to build and run." [analytics/log2udp2] (master); V: 2 - https://gerrit.wikimedia.org/r/58449
[15:53:11] drdee: yes
[15:53:38] am i in any way actually needed for leadership scrum?
[15:54:14] i think you are excused
[15:54:32] cool.
[15:54:44] i figured, since i haven't had to answer a single question in a week
[16:09:51] is kafka 0.8 released?
[16:10:36] close
[16:10:55] yeah i just noticed they have a link to setup instructions
[16:11:00] no official release yet
[16:11:10] but according to Snaps very soon
[16:11:39] hm, cool
[16:45:37] morning rounce123
[16:45:42] Morning
[16:45:48] how are you today?
[16:45:57] GOOD!
[16:46:06] i am in the hangout
[16:46:14] Ok will join now
[16:48:02] I just joined which hangout are you on?
[16:48:57] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[16:49:37] rounce123: that's our default hangout for scrum activities
[17:19:29] hey kraigparkinson: i don't have access to the 'archive' folder on google docs, can you please give me access?
[17:20:50] i updated https://mingle.corp.wikimedia.org/projects/analytics/cards/466
[17:21:14] dunno whether people want to close that, or move it to another status so we remember there's followup work
[17:21:39] drdee, I think it has to do with Sharing ownership of the folder.
[17:31:47] drdee, think it's best for adrian to take it over...
[17:31:59] ok
[17:34:26] drdee, coming to the Analysis/Opportunities hangout?
[17:34:57] yes
[17:56:32] milimetric: is there a way for me to test card 388?
[17:57:04] no, but you can see the work
[17:57:20] it's on a branch called feature/csv_upload
[17:57:28] trying to find it in gerrit...
[17:57:45] k
[17:57:47] https://gerrit.wikimedia.org/r/gitweb?p=analytics/E3Analysis.git;a=shortlog;h=refs/heads/feature/csv_upload
[18:27:16] drdee: histogram for march is running
[18:27:18] nice!
[18:27:26] erosen, drdee: did either of you request the analytics/user_metrics repo? I'm about to
[18:27:33] no tyet
[18:27:35] not yet
[18:27:37] no need to request
[18:27:42] we can make it ourselves
[18:30:27] k, i'm making it (http://www.mediawiki.org/wiki/Git/Creating_new_repositories)
[18:31:21] dschoon, when/where is your lunch and learn today?
[18:31:36] kraigparkinson: people wanted me to move it to weds.
[18:31:40] i'm writing the message now.
[18:31:54] Ah, OK. Thanks. Please add me to the invitation.
[18:34:20] to today's earlier chat, I just pinged tfinc about helping us clear #92. Said he's planning to get on it today.
[18:34:27] yup
[18:34:32] gracias :)
[18:35:50] uh drdee, what "group" are we a part of in gerrit?
[18:35:57] 'analytics'
[18:36:10] also: stats
[18:36:10] that's the "parent" but is it also the "group"?
[18:36:13] ...
[18:36:19] oh, sorry
[18:36:21] gerrit. my bad.
[18:36:23] just analytics.
[18:36:27] kraigparkinson: can we add the most up-to-date link for the stats in https://mingle.corp.wikimedia.org/projects/analytics/cards/92 ?
[18:36:43] this is like - security gorup
[18:36:43] *group
[18:36:57] i'll add that tfinc
[18:37:01] tfinc: sorry, that's my fault
[18:37:07] i updated the job on friday.
[18:37:09] the new link is http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_platform-daily.tsv
[18:37:12] i'll update the card.
[18:37:34] thanks, guys.
[18:37:34] done
[18:37:50] milimetric 1 sec
[18:37:59] dschoon: thanks for adding it to the card. any other place is likely to get lost
[18:38:07] agreed.
[18:38:13] i think i've done that for the other cards.
[18:38:29] there's no public dataset for the sessions job as it's too big
[18:38:40] milimetric: you only need 'rights inherited from' and that should be Analytics
[18:38:45] and the project name
[18:38:55] when i cleaned things up, tfinc, i tried to move everything for you guys under http://stats.wikimedia.org/kraken-public/webrequest/mobile/
[18:39:03] to help with exactly this problem
[18:39:06] and don't check 'create initial empty commit'
[18:39:11] hopefully people will continue with that going forward.
[18:39:18] oh, you're not allowed to use the interface apparently drdee
[18:39:19] and don't check 'only serve as parent...'
[18:39:25] read this: http://www.mediawiki.org/wiki/Git/Creating_new_repositories
[18:39:27] ???
[18:39:29] yeah
[18:39:33] * milimetric shoots himself
[18:39:35] that used to be fine
[18:43:02] drdee, in ops meeting, they are asking about stat1002
[18:43:11] what about it?
[18:43:45] well, they were talking about all the users on stat1, i mentioned that i was just waiting to get stat1002 stuff prioritized and then we'd get the data off of it
[18:43:53] any idea when you are thinking of scheduling that?
[18:43:53] drdee: http://localhost:8888/beeswax/table/default/tmp_mobile_session_histogram_march
[18:45:10] swap it with oxygen upgrade?
[18:50:48] ottomata ^^
[18:51:30] milimetric; i would just use the UI, i never had problems; just make sure you call the repo 'analytics/name_of_repo'
[18:51:35] and make sure it inherits from the right group
[18:52:26] mk
[18:53:56] mk/
[18:53:57] ?
[18:56:20] ok
[19:04:47] ottomata, milimetric, drdee: first draft of hadoop tools page https://www.mediawiki.org/wiki/Analytics/Infrastructure/Hadoop_Tools
[19:05:20] NIIICE!
[19:06:03] cool dschoon, i'll take a look in a bit
[19:13:08] drdee: does http://stats.wikimedia.org/EN/TablesPageViewsMonthlyMobile.htm include any of the API requests ?
[19:14:27] i'm going to try to brain dump my disparate notes and TODOs about the cluster
[19:14:31] i'm putting ideas in https://www.mediawiki.org/wiki/Analytics/Research
[19:14:39] and TODOs in a file in the kraken repo
[19:15:03] i have to use instead of the updated report as http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_platform-daily.tsv reports on 4/2013 but https://mingle.corp.wikimedia.org/projects/analytics/cards/60 doesn't have 4/2013 yet
[19:15:42] dschoon: how long would it take to run https://mingle.corp.wikimedia.org/projects/analytics/cards/92 for 3/2013 ?
[19:15:55] tfinc: about 3 hours.
[19:16:01] (i believe)
[19:16:04] that way i could compare more up to date reports
[19:16:32] i can kick it off after lunch.
[19:16:48] yes please. it wonder overwrite the existing report right ?
[19:16:50] wont*
[19:16:59] i'll ensure it doesn't.
[19:17:06] thanks. i'll step out for lunch then
[19:17:23] tfinc: oh
[19:17:24] tfinc: no, webstatscollector only counts URLs that contain 'wiki'
[19:17:26] sorry, there's a problem
[19:17:35] drdee: thanks
[19:17:38] we did not import sampled data going back before 4/14
[19:17:50] that's why i didn't launch a backfill job last friday
[19:17:54] ahh ok
[19:18:01] i'll have to compare to the old report then
[19:18:02] you'll have to get somebody to do that first.
[19:18:06] mk.
[19:18:18] what would it take to import the data ?
[19:18:18] brb lunch
[19:18:27] ottomata is the man for that
[19:18:32] k
[19:18:33] brb
[19:45:50] back
[19:49:11] average: around?
[19:54:01] drdee: i have no idea what bucketing criterion that stupid thing uses.
[19:54:12] it is certainly not equal-width bins!
[19:54:18] nor equal heights!
[19:54:34] use the source luke!
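[Editor's note] drdee's earlier remark that webstatscollector only counts URLs containing 'wiki' is a coarse substring filter. A minimal Python sketch of that rule, assuming nothing beyond what is said in the channel (the real tool is written in C, reads the UDP log stream, and does more parsing than this):

```python
# Minimal sketch of webstatscollector's coarse filter as described above:
# a request is only counted as a page view if its URL contains the
# substring 'wiki'. Illustrative only; the function name is made up.

def is_counted(url):
    return 'wiki' in url

requests = [
    "http://en.wikipedia.org/wiki/Kafka",
    "http://en.m.wikipedia.org/wiki/Main_Page",
    "http://example.org/somepage",
]
counted = [u for u in requests if is_counted(u)]
print(len(counted))  # only the two Wikipedia URLs match
```

This is why drdee answers tfinc's question the way he does: a filter this coarse will not distinguish API requests from regular page views.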
[19:58:37] i don't care enough :)
[19:58:41] just saying that the results aren't that helpful
[19:59:00] the bucket for 2.8896 has 1178071450 in it
[19:59:02] can you give me the raw output, instead of the binned output?
[19:59:18] dunno.
[19:59:22] what do we want the output to be?
[19:59:48] num_pageviews_in_session, times_seen?
[20:00:05] yes
[20:00:30] sure.
[20:01:42] https://gist.github.com/anonymous/4de22c13fcd0949ed042
[20:01:44] that look right?
[20:02:24] the where clause has two different time formats
[20:02:35] but yes that looks good
[20:03:03] time format is a string
[20:03:08] so it's using lexical sort anyway
[20:03:21] aight
[20:03:30] it's running.
[20:03:38] http://localhost:8888/beeswax/watch/201?context=design%3A115
[20:04:15] 1274 mappers, 381 reducers
[20:04:33] 2 jobs
[20:05:17] should take around 15 minutes
[20:06:43] http://www.mediawiki.org/wiki/Meetings/2013-05-01
[20:15:43] drdee: http://localhost:8888/beeswax/table/default/tmp_mobile_session_pageviews_freq_march
[20:15:46] job is done
[20:16:13] neat!
[20:17:38] i'll make a gdoc and graph it.
[20:19:03] New review: Milimetric; "This looks good to me but I'm not 100% sure whether the new sql will generate the three-column CSV y..." [analytics/limn-mobile-data] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/60614
[20:19:03] Change merged: Milimetric; [analytics/limn-mobile-data] (master) - https://gerrit.wikimedia.org/r/60614
[20:21:41] drdee: https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Ai_u2wTiMldddGFwYzg0SWp6Z3JkR2UyWWV1ZWhWUlE#gid=0
[20:22:34] those spikes are interesting
[20:22:43] isn't pageviews => sessions?
[20:22:53] i bet they're due to some sort of double-counting
[20:23:00] that's very regular
[20:23:29] SELECT pageviews, count(session_start) as times_seen
[20:23:48] it's num_pageviews_in_session, times_seen
[20:23:59] where times_seen is num sessions
[20:24:40] drdee: http://localhost:8888/beeswax/execute/115
[20:26:52] i suspect the double-counts happen at job boundaries
[20:26:55] but it's still weird.
[20:36:35] guys, did everyone read the link that Leslie had sent a while back? https://wikitech.wikimedia.org/wiki/Server_access_responsibilities#Handling_sensitive_data
[20:37:18] https://wikitech.wikimedia.org/wiki/Server_access_responsibilities
[20:49:04] i did.
[20:49:20] as i have no more meetings today, i'm going to wfh
[20:49:23] back in 20
[21:04:23] drdee: you around for the x-cs meeting?
[21:04:41] https://plus.google.com/hangouts/_/ef9fa6467ec7808b624e500fbeaf4c2ac3bb247b
[21:05:50] i'm in it
[21:05:52] on my way
[21:42:40] [travis-ci] develop/ba0718f (#131 by milimetric): The build is still failing. http://travis-ci.org/wikimedia/limn/builds/6740683
[21:47:59] drdee: looking at the monthly report draft, I imagine you'll add a couple of lines for UMAPI? Ping me if you need input or feedback
[22:25:37] drdee: i responded about the app numbers
[22:25:43] overall they look fine but iOS seems high
[22:26:13] iOS too high or android too low?
[22:26:23] iOS too high given our install base
[22:26:32] ko
[22:26:33] ok
[22:26:44] but overall it's looking ok
[22:27:28] i suggest we do a quick investigation to see whether there is an obvious issue, if there is then we fix it right away. else we close #92 and open a new card and slot that asap
[22:27:45] sounds good?
[22:28:34] tfinc ^^
[22:28:42] yup
[22:28:57] awesome!
[22:29:03] milimetric ^^
[22:29:15] the data has already proved useful in our product planning decisions. thank you
[22:30:17] very glad to hear that!
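[Editor's note] The output shape dschoon and drdee settle on above — (num_pageviews_in_session, times_seen), where times_seen is the number of sessions — is a plain frequency count over sessions, i.e. a `GROUP BY pageviews` with `count(session_start)`. The Hive query in the gist can be mimicked in a few lines of Python; the session records here are invented sample data, not the cluster's actual output:

```python
from collections import Counter

# Mimics SELECT pageviews, count(session_start) AS times_seen ... GROUP BY
# pageviews: for each session we know how many page views it contained, and
# we count how many sessions fall on each pageview count. Sample data is
# made up for illustration.
sessions = [
    {"session_start": "2013-03-01T00:00:01", "pageviews": 1},
    {"session_start": "2013-03-01T00:02:11", "pageviews": 3},
    {"session_start": "2013-03-01T00:05:42", "pageviews": 1},
    {"session_start": "2013-03-01T00:07:09", "pageviews": 7},
]

times_seen = Counter(s["pageviews"] for s in sessions)
for pageviews, count in sorted(times_seen.items()):
    print(pageviews, count)  # 1 appears twice; 3 and 7 once each
```

The double-counting drdee suspects at job boundaries would show up here as the same session contributing two records to `sessions`, inflating `times_seen` at regular intervals — consistent with the periodic spikes in the spreadsheet.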
[22:33:21] drdee: now comes the next version of that card 'Page View Metrics report for Un-Official Wikipedia Mobile Apps' what detail can i put into the card to make it actionable ?
[22:33:48] s/version/sibling
[22:34:22] sweet there is already a card for it https://mingle.corp.wikimedia.org/projects/analytics/cards/503
[22:34:29] yup
[22:34:57] drdee: is there anything missing from the story to make it actionable ?
[22:35:32] yes, a list with user agent strings for non-official wikipedia apps
[22:36:00] drdee: i can find some. but invariably there will be new ones that show up that we don't know about
[22:36:04] i wonder what we can do about those
[22:36:58] me too :)
[22:37:16] i mean we will always miss stuff, that will be inevitable
[22:37:43] drdee: sure, but we need a way to catch new apps as they show up.
[22:37:54] i can make a list but it'll be out of date as soon as I hit save
[22:38:20] right, so i suggest a wiki page where we collect user agent strings for non-official wikipedia apps
[22:38:35] which wiki do you want it on?
[22:38:50] and then once in a while we rerun it for older data to adjust for the fact that it took us a while to find the UA
[22:39:16] mediawiki is fine; there is a similar mediawiki page for the official WMF apps
[22:39:31] pass me the link and i'll branch off of it
[22:41:37] http://www.mediawiki.org/wiki/Mobile/User_agents
[22:54:04] drdee: ok catalog is there now http://www.mediawiki.org/wiki/Mobile/User_agents#Un-Official_Apps . next i see two ways of getting the ua's
[22:54:13] 1 - mine our logs given those ua strings
[22:54:42] 2 - install every app, tunnel through a laptop, and sniff the traffic
[22:57:59] brion has an interesting suggestion "it's probably not too hard to build an automated report that slurps every UA that claims to be 'iOS', sort em, and display them for humans to check if they mention an app name"
[23:01:49] sniffing should catch em as long as they don't use https
[23:04:02] drdee: thoughts ?
[23:14:11] tfinc: like that idea!
[23:14:18] feed the UAs into captcha! :P
[23:15:11] but only those that are 30 characters or longer
[23:15:17] and contain control characters
[23:16:02] drdee: like which idea ? i mentioned three different ones. brions ?
[23:17:09] why does colloquy crash so often?
[23:17:21] whoever said something to me, i missed it
[23:18:07] drdee: like which idea ? i mentioned three different ones. brions ?
[23:18:29] brions
[23:22:27] brion: sadness so far this is Wikipanion - User-Agent: Mozilla/5.0.(iPod;.CPU.iPhone.OS.6_1_3.like.Mac.OS.X).AppleWebKit/536.26(KHTML,.like.Gecko).Mobile/10B329
[23:22:55] d'oh
[23:23:17] oh wait nm
[23:23:17] tfinc: on the plus side, we can distinguish that from Safari
[23:23:17] that was an image request
[23:23:28] aaaah yeah check if the page is different just in case
[23:23:49] User-Agent: Wikipanion/1.7.8.3.CFNetwork/609.1.4.Darwin/13.0.0.. :D
[23:23:52] \o/
[23:24:08] but anyway, worst case we can have a block of "generic third-party app" for anything that doesn't customize its ua
[23:29:12] brion: fun Wikitap goes through a proxy
[23:29:58] WikiWiretap
[23:35:22] New review: Tim Starling; "Another buffer overflow, see inline for details." [analytics/log2udp2] (master) C: -1; - https://gerrit.wikimedia.org/r/58449
[23:55:03] brion: drdee : not a bad first pass http://www.mediawiki.org/wiki/Mobile/User_agents#iOS
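[Editor's note] Brion's idea — slurp every UA that claims to be iOS, sort them, and show a human the ones that might name an app — could look like the sketch below. The heuristic regexes are guesses for illustration, not a vetted classifier; Wikipanion's custom UA string (quoted above) ends up in the human-review bucket, while stock Mobile Safari does not, and anything undetected falls into tfinc's "generic third-party app" block by default.

```python
import re
from collections import Counter

# Sketch of brion's automated UA report: tally every user agent that
# claims to be iOS, then split stock Safari/WebKit strings from possible
# third-party apps for a human to review. Regexes are rough guesses.

STOCK_IOS = re.compile(r'^Mozilla/.*\((iPhone|iPod|iPad);.*OS \d', re.I)
CLAIMS_IOS = re.compile(r'iPhone|iPod|iPad|CFNetwork|Darwin', re.I)

def ios_ua_report(user_agents):
    counts = Counter(ua for ua in user_agents if CLAIMS_IOS.search(ua))
    stock, candidates = [], []
    for ua, n in sorted(counts.items()):
        (stock if STOCK_IOS.match(ua) else candidates).append((ua, n))
    return stock, candidates

uas = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26",
    "Wikipanion/1.7.8.3 CFNetwork/609.1.4 Darwin/13.0.0",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",
]
stock, candidates = ios_ua_report(uas)
print([ua for ua, _ in candidates])  # Wikipanion surfaces for human review
```

Rerunning a report like this periodically over older data, as drdee suggests, would catch apps whose UA strings were only identified later.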