[03:28:30] ori-l: hey man
[03:28:34] ori-l: what timezone are you on ?
[03:29:04] wikimedia foundation time
[03:29:10] early 90s, php just taking off
[03:29:20] perl the latest cool thing
[03:29:51] ori-l: haha
[03:30:06] ori-l: isn't it late there right now ?
[03:30:25] 8:30pm
[03:34:04] ori-l: oh, interesting
[03:35:09] you?
[03:37:16] ori-l: 09:40am here
[03:37:19] ori-l: GMT+3
[03:39:04] where are you?
[03:43:38] ori-l: .ro in eastern europe
[03:43:57] oh, cool
[03:44:00] where in romania?
[03:51:36] ori-l: near hungary
[03:51:47] ori-l: are you originally from turkey ?
[03:55:24] no, israel
[03:55:27] my wife is bulgarian
[03:55:34] ori-l: :)
[03:58:00] ori-l: right now I have some stuff to work on but..
[03:58:13] ori-l: Ori, I was reading about Kraken, cause Erik sent me a link about it
[03:58:27] ori-l: how does Krkane affect E3 and ClickTracking ?
[03:58:33] ori-l: Kraken
[03:58:46] ori-l: I mean, I know Kraken will change a lot of stuff
[03:59:03] ori-l: and I was wondering what things it will make obsolete and what not
[03:59:31] ori-l: the projects are always evolving and maybe some become obsolete and some new ones pop up
[04:00:16] i'm not sure. we're using redis in e3 and it's a very good platform for getting something going fast, and it'll also scale up to a point
[04:00:27] but not beyond it :)
[04:00:49] i think the future will be that all of this data goes into hadoop instead
[04:00:57] the bits that will probably stay are the front-end ones
[04:01:00] ori-l: I'm working on two projects in analytics/webstatscollector and analytics/wikistats
[04:01:12] ori-l: is analytics different from E3 or ClickTracking ?
[04:01:27] analytics and e3 are teams, clicktracking is an extension
[04:01:53] ori-l: it's different right ? cause in analytics we're analysing logs like squid for example. but you guys in E3 want to analyse user usage behaviours right ?
[04:03:24] i think the difference is in outlook
[04:03:49] there needs to be some non-dinky way of gathering basic statistics and making it available for analysis
[04:04:10] that's a big investment and it's going to take some time to productionize
[04:05:11] in the meantime, there are various small things we can do
[12:30:07] morning average_drifter
[12:30:25] drdee: hello !
[12:30:35] let's work today in labs on build1
[12:30:50] and get webstatscollector to completely work and then build a debian package
[12:31:00] so we can deploy it today or tomorrow
[12:31:20] drdee: I propose to use it for periodic deployment. My connection with it is rather slow, and vim moves like a snail on it for example (the machine might be fast but my connection is not..)
[12:31:37] drdee: Yes, I can do the .deb packages today for example
[12:31:42] ok
[12:31:50] or we can use mikogo
[12:32:00] but let's focus on getting this finished
[12:32:24] yes
[12:35:11] drdee: I have some things that I need to talk to you about on skype
[12:35:19] ok
[12:36:47] why did you drop the unsigned long serial from collector?
[12:37:02] why did you drop the debug clause in Makefile.am?
[13:26:06] morning ottomata
[13:26:58] morning!
[14:06:01] ottomata, if you feel like looking at some code ;) : https://gerrit.wikimedia.org/r/#/c/25195/
[14:06:39] brb
[14:29:33] heading to cafe, brb
[14:36:36] average_drifter, are you on linux?
[14:37:11] milimetric: always... alll..ways
[14:37:13] :)
[14:38:17] :) ok, well google hangouts screen sharing wasn't working for me and it was a known bug
[14:38:28] sudo apt-get install libxrandr-dev will fix it right up
[14:39:10] oh cool
[14:39:31] I have an iPad, hangouts work ok there
[14:39:41] (I don't like Apple, my father gave it to me as a present)
[14:39:48] haha :) nice
[14:40:24] how'd you get mikogo working on Ubuntu (or you running a different distro?)
[14:41:30] I tried downloading and running and it complained about libGLU and libX something not being found (they were in the /usr/lib/x86_64 directory when I searched)
[14:42:37] morning milimetric
[14:42:45] howdy drdee
[14:43:36] milimetric: just dpkg -i mikogo.deb
[14:44:17] ah, cool, thanks, didn't know they had a .deb
[14:44:22] milimetric: I'm using Ubuntu 10.10 (because a lot of stuff in newer versions is bugging me)
[14:44:36] milimetric: oh sorry, they have a .tar.gz
[14:44:40] milimetric: sorry about that, forgot
[14:44:44] milimetric: but it's up on their website
[14:45:02] ah! wise man, I should've done that. I used to use Ubuntu until 10.10 and I was very happy. I went windows for a couple of years and everything's awful
[14:45:19] drdee, btw, I reviewed that, not sure if you guys saw it
[14:45:43] ottomata, thanks! and i fixed it
[14:46:58] average_drifter: almost ready with webstatscollector?
[14:47:22] drdee: yes, I'm on it, I'll have a review soon
[14:47:34] ok
[15:39:08] ottomata, what is the syntax for cidr range of 1 address in udp-filter
[15:41:09] just the address
[15:55:51] i get Could not initialize cidr filter: Invalid argument
[16:04:33] ottomata, and what is the syntax for a cidr range?
[16:05:51] IP/netmask
[16:06:02] wait, invalid arg?
[16:06:06] what's your command?
[16:14:28] lunchtime, brb
[16:48:53] you guys smoke? I wanna give up smoking
[16:50:01] nope
[16:50:06] you should
[16:57:22] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[16:59:36] drdee: should we have asserts instead of print_error ?
[17:02:56] depends :)
[17:03:16] drdee: so basically if we have a print_error that's like critical and exit(-1) follows for example
[17:03:22] ottomata, dschoon ^^
[17:03:32] yeah that would be a good case for assert
[17:03:45] but if it's more a debugging statement then you can leave it as-is
[17:03:51] ok
[17:03:55] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:03:56] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:03:58] hehe
[17:04:03] you beat me :)
[17:05:45] drdee: I've got a question about serial. It's used for debugging purposes, I understand, but is there any code somewhere that counts lost messages ?
[17:06:02] drdee: not really important, but would be interesting to know..
[17:06:03] not sure :)
[17:09:20] ok
[17:17:10] hey Diederik said to come back erosen
[17:17:47] sorry i missed scrum.
[17:17:49] we still going?
[17:17:55] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:18:10] great
[17:23:07] fixed demo friday meeting
[17:27:52] oh guys, i am going to miss stand up tomorrow, s'ok?
[17:31:59] we will miss you!
[17:34:45] ottomata, what is the cidr range for 0.0.0.0-255.255.255.0 ?
[17:34:57] or, how can you derive a cidr range?
[17:38:10] you can google for cidr calculators
[17:38:12] or um
[17:38:23] do you just want to match all ipv4 address?
[17:38:34] why 255.255.255.0?
[17:39:05] if so:
[17:39:06] 0.0.0.0/0
[18:10:29] ottomata: https://github.com/mozilla-metrics/akela/blob/master/examples/geoip.pig
[18:11:51] drdee: review around the corner
[18:11:57] sweet
[18:12:36] drdee, NICE!
[18:12:38] geoip in java
[18:12:42] yeah, that's pretty sweet.
[18:12:48] i'm trying to do it with pig streaming, but it isn't working
[18:12:50] it should though...
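Note on the CIDR question at 17:34: besides an online calculator, a start/end address pair can be turned into CIDR blocks with Python's standard `ipaddress` module. This is only a sketch of the general technique; the helper name `range_to_cidrs` is made up for illustration, and nothing here is specific to udp-filter, whose accepted syntax is only what ottomata describes above (a bare address, or IP/netmask).

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks covering first..last inclusive.

    Hypothetical helper for illustration; not part of udp-filter.
    """
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range yields the minimal set of networks for the range
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    # The whole IPv4 space collapses to a single block
    print(range_to_cidrs("0.0.0.0", "255.255.255.255"))
    # In standard CIDR notation a single host is a /32
    print(range_to_cidrs("10.0.0.5", "10.0.0.5"))
    # The range asked about does not align to one block, so it needs many
    print(len(range_to_cidrs("0.0.0.0", "255.255.255.0")))
```

A range ending at 255.255.255.0 rather than 255.255.255.255 cannot be written as one block, which is presumably why the reply suggests 0.0.0.0/0 for "match all ipv4".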
[18:12:55] more importantly: geoip as a hive UDF
[18:12:57] er, pig
[18:38:21] ottomata: http://www.menupages.com/restaurants/dos-toros-taqueria/ in case you are back in NYC and miss SF burritos
[19:01:49] drdee: https://gerrit.wikimedia.org/r/#/c/25169/
[19:02:17] drdee: I wasn't able to --enable-debug to enable the DEBUG variable inside Makefile.am
[19:02:35] drdee: I tried to use it
[19:04:02] i just use DEBUG=1
[19:04:12] okay i am gonna merge this change
[19:04:27] and then we can start test in labs or on your box
[19:05:00] drdee: yes
[19:05:08] i merged it
[19:05:19] see https://gerrit.wikimedia.org/r/#/c/25169/
[19:05:49] drdee: alright, shall we do mikogo/skype for the debianization ?
[19:06:04] yeah totally
[19:06:09] cool
[19:06:22] i can use a quick break, can you use one as well?
[19:06:55] now also run git pull on your local repo on your master branch
[19:07:14] oh ok
[19:10:42] git review is a piece of s****, i am getting this error:
[19:10:43] Had trouble running git log --decorate --oneline build --not remotes/gerrit/master
[19:10:43] fatal: ambiguous argument 'build': both revision and filename
[19:10:44] Use '--' to separate filenames from revisions
[19:10:51] ??????
[19:59:35] grumble grumble
[19:59:39] nothing working yet
[19:59:45] i have tried two different routes
[19:59:58] it is really hard to figure out why things don't work with pig
[20:07:59] ottomata: https://gerrit.wikimedia.org/r/25408 but not super necessary
[20:22:14] back
[20:39:25] ottomata, dschoon, milimetric, (erosen), can everybody take some time to update / close / open asana tasks today?
[20:40:06] i just did when we talked about it in stand up
[20:40:08] drdee!
[20:40:09] I think I got it!
[20:40:20] pig geocode and group by country code
[20:40:30] SWEEEEEEEEEEEET
[20:40:44] can you put the pig script on gist?
[20:41:09] one sec, gonna commit it to kraken, lemme just make sure my cleanup works
[20:41:23] k
[20:41:32] or github is fine as well
[20:41:33] drdee: yeah, i'll look later in the day
[20:42:40] drdee, should I keep asana updated with day to day minutia? Right now I'm just working on the nebulous d3 issue but I could update Asana with what specifically I'm doing as I do it
[20:42:56] milimetric; not too granular
[20:43:13] more like at the feature / bug / task level
[20:43:24] but shouldn't I update that in github?
[20:43:42] yeah the issues should be updated as well
[20:43:46] in github
[20:43:53] oh you mean you'd like both?
[20:43:59] in your case :(
[20:44:03] ah, I see
[20:44:28] but there are also things you do on limn that won't go into github but should go into asana
[20:44:44] agreed
[20:44:54] limn features / bugs => both asana / github (for now)
[20:45:05] limn 'tasks' broadly defined => asana
[20:45:30] k. maybe we can figure out a way to sync git back to asana
[20:46:55] drdee
[20:46:55] https://github.com/wmf-analytics/kraken/blob/master/src/pig/geocode_and_group_by_country.pig
[20:47:09] word
[20:47:21] there are probably ways to make it more efficient
[20:47:30] but it runs relatively fast on a single day log file
[20:47:34] nice, based on the mozilla staff
[20:47:35] i'd love to load in peter's data and try it
[20:47:37] yeah
[20:47:39] thanks for the link
[20:47:42] welcome
[20:47:50] i have a meeting in 10, drdee, so i won't get to it until later
[20:47:51] just copy the files that are on locke
[20:47:54] just fyi
[20:48:03] right on
[20:48:04] hmm, that's only from a day though
[20:48:10] stat1 then?
[20:48:13] there must be more
[20:48:24] naw, jeff copies them off elsewhere manually
[20:48:28] i can ask jeff
[20:48:29] aarrghhh
[20:48:34] i did
[20:48:42] mark needs to change the netapp
[20:48:47] oh!
[20:48:49] there are a bunch on locke
[20:48:55] so the analytics machines can fetch the data from there
[20:48:57] ohhh awesome!
[20:49:11] ah but only from the last few days
[20:49:14] he rotates really often
[20:49:21] :[
[20:49:24] every 15 mins it looks like
[20:49:27] ??
[20:49:28] why
[20:49:36] doesn't matter
[20:49:44] i dunno, there was some issue with space or load or something a long time ago
[20:49:45] and he does this
[20:49:50] I think I have a todo to work with him to clean this up
[20:49:52] but meh
[20:49:54] so…. we gotta ping mark or someone who can change the netapp
[20:49:58] (no clue what the net app is)
[20:50:05] but then you can fetch 2.3Tb of data
[20:50:07] why? i have root, can't I just copy them?
[20:50:11] you can try
[20:50:21] but apparently it is more complicated
[20:50:27] but sure give it a shot
[20:51:04] maybe you have to go like net app => stat1 => analytics
[20:51:16] just guessing
[21:17:57] hey dschoon
[21:18:01] i gotta run reallllly soon
[21:18:10] my pig job is working on smallish sets of data
[21:18:18] but I'm trying to run it on a month of sampled data at the moment
[21:20:45] would love if you could take a min to check it out
[21:20:55] i'm having heap size problems, even if I increase -Xmx
[21:21:03] -Xmx100000m
[21:21:08] if you cd into my homedir on an04
[21:21:11] you can run
[21:21:26] export PIG_HEAPSIZE=100000 && dse pig -p input=/user/otto/logs/sampled-1000-201205 -p output=/user/otto/test0/sampled-1000-201205-country_count -f ./geocode_and_group_by_country.pig
[21:21:34] and see what happens
[21:55:21] ottomata, you should at least set both Xms and Xmx, and to the same value (this prevents the JVM from growing and shrinking the heap)
[21:55:45] that probably won't fix the problem
[21:55:51] but it's a good habit
[21:56:12] and that way you discover immediately if there is enough memory available
[22:02:38] back
[22:22:14] drdee, he left, but yeah.
[22:22:29] also, setting the heap to 100,000M definitely will not fix it.
[22:22:36] well, actually
[22:22:45] i guess these machines *do* have 100G+ memory
[22:22:46] heh
[22:22:50] wow.
[22:23:01] i guess that is a totally unreasonable but true fact
[22:23:39] hey - how many dell c2100's did analytics get?
[22:24:03] 25, but i can double-check
[22:24:16] yeah, please do
[22:24:43] C2100: analytics1011-analytics1022
[22:24:50] R310: 23-27
[22:24:59] Cisco: 01-10
[22:25:09] but
[22:25:17] i think those are just the ones that are set up.
[22:25:20] let me check my email.
[22:30:46] i'm having trouble finding the actual procurement request.
[22:32:11] checking with RobH
[22:33:28] binasher: 10 C2100s
[22:33:51] we also have 5 R310s, but afaik we have no issues with those (yet)
[22:34:44] (apparently the number had magically changed in my head at some point. we have 25 machines total, but only 10x C2100s.)
[22:34:50] (which is rather embarrassing.)
[22:36:37] here's some good general advice
[22:36:51] if you're looking for a euphoric feeling and your usual hobbies leave you down
[22:37:06] mess around with d3 js for about two hours or so
[22:37:25] you're welcome
[22:40:04] hey dschoon, you know the log scale on this graph: http://reportcard.wmflabs.org/graphs/sample_graph/
[22:40:13] hehe
[22:40:23] d3 is so great.
[22:40:31] I got d3 to do that, soo easy, but the grid lines, how does dygraphs know to do that?
[22:40:47] like how's it know to paint more lines around the data
[22:40:59] dygraph uses canvas.
[22:41:02] so only god knows.
[22:41:05] lol
[22:41:12] no but I mean is that an option you specified or..
[22:41:21] yeah, if you look in options, it's gridlines
[22:41:28] but you don't specify the intervals or the numbers
[22:41:34] ok, gotcha
[22:41:38] that's what i was wondering
[22:41:39] it just picks some numbers based on the data and the scale.
[22:41:48] how's it goin otherwise?
[22:41:53] cool - it seems pretty good at it. Was that something people liked?
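Note on the log-scale gridlines question at 22:40: one common way to "pick some numbers based on the data and the scale" is to place a line at 1 through 9 times each power of ten that falls inside the visible range. The sketch below shows that generic approach in Python; it is an assumption about how such ticks can be chosen, not a description of what dygraphs or d3 actually do, and `log_gridlines` is a made-up name.

```python
import math


def log_gridlines(lo, hi):
    """Gridline positions at 1..9 times each power of ten within [lo, hi].

    Generic illustration only; not the dygraphs or d3 algorithm.
    """
    if lo <= 0 or hi < lo:
        raise ValueError("a log scale needs 0 < lo <= hi")
    first_decade = math.floor(math.log10(lo))
    last_decade = math.floor(math.log10(hi))
    lines = []
    for exp in range(first_decade, last_decade + 1):
        for mult in range(1, 10):
            value = mult * 10.0 ** exp
            # keep only the positions that fall inside the data range
            if lo <= value <= hi:
                lines.append(value)
    return lines


if __name__ == "__main__":
    # Lines bunch up just below each power of ten, which is the familiar
    # look of a log-scale grid
    print(log_gridlines(5, 300))
```

Because the candidate positions come from the data range itself, no intervals have to be specified by hand, which matches the behaviour described in the chat.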
[22:43:08] oh, in general, I spent like another two hours banging my head against a virtual wall (I never do real walls) and then I just scrapped it all and am old school injecting html and CSS right from client.co
[22:43:43] so far i got a legend, the lines, color scale, and little dots along the lines
[22:43:50] it looks almost exactly the same
[22:44:08] so i have axes, zooming, data display, and aggregates left
[22:44:15] prolly polish that off tomorrow easy
[22:44:20] in a couple hours
[22:44:30] and then maybe I can pick your brain
[22:45:06] sorry (typing is hard to stop after I just coded a bunch :) I'm off to patrol my neighborhood for the night - see you morrow
[22:45:19] *ndo*
[22:45:22] no worries
[22:45:27] cheers
[22:45:28] sounds good