[04:38:56] !log test
[11:36:02] morning milimetric
[11:36:15] howdy
[11:36:39] early bird
[11:37:40] i am rerunning the zero job based on the oozified pig script and with geocoding enabled
[11:38:28] cool, good stuff
[11:39:05] spending some time to make those dashboards read data straight from kraken might be worthwhile
[11:39:45] i know evan usually changes things up a lot for his gp dashboards, so it wouldn't work for those
[11:39:55] but i think the zero ones are expected to be quite stable
[11:41:38] yup
[12:11:43] drdee: hi ! :)
[12:11:52] hey!
[12:11:54] whats'up?
[12:11:55] drdee: I've pulled from kraken repo
[12:12:00] great
[12:12:03] drdee: where is the pageview definition again please ?
[12:12:05] any comments?
[12:12:09] which one?
[12:12:09] yes, I'm preparing the review
[12:12:14] uhm, the one for new mobile pagevies
[12:12:16] *pageviews
[12:12:49] https://raw.github.com/wikimedia/metrics/master/pageviews/new_mobile_pageviews_report/pageview_definition.png
[12:12:55] that's the spec
[12:12:59] the implementation is here:
[12:13:01] oh, I mean in the implementation
[12:13:02] ok
[12:13:55] I see isWikistatsMobileReportPageview , isWebstatscollectorPageview , isPageview , isWebstatscollectorPageview
[12:14:00] https://github.com/wikimedia/kraken/blob/master/kraken-generic/src/main/java/org/wikimedia/analytics/kraken/pageview/Pageview.java
[12:14:02] line 154
[12:14:19] ok, so isWikistatsMobileReportPageview
[12:14:21] nice, LiAnna's csv upload problem helped me find a very useful bug in User Metrics
[12:14:29] what's the bug?
[12:14:36] average: yes that's correct
[12:15:00] line 154-161 is where the magic happens :)
[12:17:03] milimetric: question: how can you determine what the most optimal ordering of multiple criteria is? so you can put the criteria that catches the most hits first?
[12:17:42] i missed the context
[12:17:47] regular expressions?
[12:18:32] (the bug was not doing .strip() on usernames)
[12:21:36] oh I see what you mean drdee - most optimal in my opinion is this:
[12:22:17] order in increasing complexity (execution time) and decreasing probability of positive match
[12:22:47] basically, sort of like highest expected value goes first
[12:28:06] right but how do you determine that empirically?
[12:29:03] through experiments ?
[12:29:17] like running on a sample and seeing which criteria discards the most ?
[12:33:05] mmmmm but isn't there a tool that can do this automatically? this must be a common question
[12:33:21] is this for Pig ?
[12:37:26] what is?
[12:37:43] Snaps: when do you think the code review process can start?
[12:37:56] tonight!
[12:38:04] of librdkafka
[12:38:19] I just finished up the last things yesterday but didnt have time to push it.
[12:38:21] i don't know of any automated tool that does that drdee, because how would the tool know all possible datasets?
[12:38:30] that should also allow Faidon to debianize the hell out of it.
[12:38:37] unfortunately in programming sometimes you gotta follow your gut :)
[12:38:41] well you would specify an input dataset
[12:38:57] so you can only make the ordering optimal to a particular input dataset
[12:39:41] Snaps: that sounds great! any blockers / questions on your side?
[12:40:19] drdee: I need Mark's input on performance bottlenecks before getting any further with varnishkafka.
[12:40:35] what exactly do you need?
[12:41:58] I need to know if the bottleneck is in the vanilla parts of varnishncsa, or with your modifications, and if so, what they are.
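[editor's note: the ordering question above (12:17–12:39) comes down to running the candidate criteria over a sample and measuring how often each one matches, then putting cheap, high-yield checks first. Below is a minimal Pig sketch of that measurement step, not the actual Kraken job: the relation name log_fields echoes the scripts discussed later in the log, but the field names (uri, http_method) and the two example criteria are hypothetical placeholders.]

    -- log_fields is assumed to be LOADed already, as in the zero_carrier.pig script
    -- take a 1% sample of the web request log
    sampled = SAMPLE log_fields 0.01;
    -- apply each candidate criterion to the sample independently
    hits_mobile = FILTER sampled BY uri MATCHES '.*\\.m\\.wikipedia\\.org.*';
    hits_get    = FILTER sampled BY http_method == 'GET';
    -- count how many sampled records each criterion matches
    all_grp     = GROUP sampled ALL;
    mobile_grp  = GROUP hits_mobile ALL;
    get_grp     = GROUP hits_get ALL;
    total_count  = FOREACH all_grp    GENERATE COUNT(sampled);
    mobile_count = FOREACH mobile_grp GENERATE COUNT(hits_mobile);
    get_count    = FOREACH get_grp    GENERATE COUNT(hits_get);
    DUMP total_count;
    DUMP mobile_count;
    DUMP get_count;

[Whichever criterion handles the largest share of the sample for the least work goes first, per the expected-value heuristic described above; the measurement only holds for the particular input dataset it was run on, as noted in the thread.]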
[12:42:36] I cant really simulate the required loads on varnish locally on my system.
[12:43:15] would labs allow you to simulate the required load?
[12:45:24] Probably, hoping it wont affect other VMs too badly.
[12:46:01] Thing is that Mark was going to do some varnish stuff anyway, so he'd provide some real world profiling when he was at it.
[12:46:54] are you now blocked or can you work on other things
[12:46:54] ?
[12:47:49] Thats not a blocker, no. I'll work on the non-performance related parts in the meantime.
[12:50:54] cool
[12:57:39] erosen, milimetric: https://gist.github.com/dvanliere/1d141fd3561f3d7a2dc8
[12:58:00] cool, that looks very reasonable
[12:58:14] that's pageview count (naive) not webrequest
[12:59:07] i really hope we can close this card soon
[13:12:38] erosen, milimetric: please review https://github.com/wikimedia/kraken/pull/6
[13:15:09] shouldn't there be some zero filter? does the flatten zero somehow filter?
[13:15:28] oh i see, it doesn't filter
[13:15:38] but then there'll just be one big group of junk?
[13:16:04] brb
[13:16:53] log_fields = FILTER log_fields
[13:16:54] BY ( (x_cs IS NOT NULL) AND (x_cs != '') AND (x_cs != '-')
[13:17:53] btdubs, we should change the x_cs name to x_analytics
[13:18:01] but we'll have to change it in all scripts
[13:18:14] not now :)
[13:19:09] sho sho
[14:20:38] ottomata, milimetric; do you guys feel okay with the new zero_carrier.pig script? if yes, let's merge it and run it
[14:20:51] oh new one?
[14:21:52] fine with me :)
[14:22:00] ottomata: check https://github.com/wikimedia/kraken/pull/6
[14:22:05] so the thing I mentioned above
[14:22:17] see it
[14:22:18] the x_cs not in ('', '-') is not enough
[14:22:26] there will be x_cs for alpha & beta like you said
[14:22:34] so the script will have a useless bucket
[14:22:36] but that's ok...
[14:22:41] we should probably check if x_cs contains 'x-cs=..'
[14:22:42] or whatever
[14:22:47] it just gets more than zero stuff
[14:22:57] that happens in the Zero pig udf
[14:23:14] right, but it's not *filtering* by it
[14:23:27] it's just putting into buckets according to the ZERO udf
[14:23:56] it's a small point, and I don't think it matters
[14:23:59] other than that it looks good
[14:24:15] okay
[14:24:26] yeah, and remember that x_cs isn't really x_cs anymore
[14:24:27] it is x_analytics
[14:24:32] which may have other content in it
[14:24:38] other than X-CS related stuff
[14:24:49] so checking for non empty X-CS won't be accurate
[14:25:09] please look also at the Zero pig udf
[14:26:14] where?
[14:26:28] https://github.com/wikimedia/kraken/blob/master/kraken-pig/src/main/java/org/wikimedia/analytics/kraken/pig/Zero.java
[14:26:33] yes
[14:27:34] but that's empty...
[14:28:00] what???
[14:29:14] on the travis branch it's not empty https://github.com/wikimedia/kraken/blob/travis/kraken-pig/src/main/java/org/wikimedia/analytics/kraken/pig/Zero.java
[14:29:21] it's back
[14:30:51] ottomata: can you restart the zero_carrier job on kraken?
[14:31:33] its stopped?
[14:32:03] dunno, but there is a new version now ;0
[14:32:06] oh
[14:32:08] oh oh
[14:32:09] hm, ok
[14:32:12] but yeah, don't we need to filter?
[14:32:25] i see how your zero returns stuff if x_cs is set in the field
[14:32:31] but doesn't the pig script need to filter on that result?
[14:32:36] like milimetric said?
[14:33:59] sure, i gotta run but feel free to add an extra filter
[14:34:41] nope. i dunno what is going on here
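[editor's note: the "extra filter stmt" drdee mentions pushing a few lines below is not quoted in the log. Below is a minimal sketch of the stricter check the discussion above asks for, assuming the field named x_cs carries the whole X-Analytics header and that zero-rated requests are tagged with an 'x-cs=' key, as described in the thread; the actual statement lives in zero_carrier.pig in the kraken repo.]

    -- keep only requests whose X-Analytics field actually carries an X-CS key,
    -- instead of merely being non-NULL, non-empty and non-'-'
    log_fields = FILTER log_fields
        BY (x_cs IS NOT NULL) AND (INDEXOF(x_cs, 'x-cs=', 0) >= 0);

[Whether this key check belongs in the script or inside the Zero UDF is the open question in the thread above; the filter only needs to agree with whatever the UDF treats as a zero request.]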
[14:35:19] i don't want to have to backfill the job after I restart it
[14:39:44] heading home from cafe, be back in a bit
[14:41:00] milimetric: pushed extra filter stmt
[14:41:17] can you ask ottomata to relaunch it from oozie?
[14:44:55] yeah, i'll keep an eye out for when he's back drdee
[14:45:30] ty
[15:16:41] haiiaa average
[15:16:42] you there?
[15:22:29] hey ottomata, drdee asked me to ask you to relaunch his new zero script from oozie
[15:23:12] milimetric: ja could do but reallllyyyyy is it ready? erosen wants it
[15:23:32] the numbers looked ok to all of us
[15:23:40] it's at least ready enough to show amit
[15:23:45] hokay!
[15:23:47] if you say
[15:23:48] but i'm not sure if erosen wanted to do so offline
[15:23:59] hm
[15:23:59] hm
[15:24:01] how much work is it to relaunch the job?
[15:24:14] its not too hard, does it need to be backfilled? or just start from
[15:24:15] now
[15:24:19] backfilling is the annoying part
[15:25:06] let's just start from now, and if it's good we can backfill later?
[15:25:17] we know for sure what's there is broken anyway
[15:25:46] ok
[17:28:30] drdee hallo?
[18:01:35] is there a hangout?
[18:02:21] https://plus.google.com/hangouts/_/5b70172d0f7418695ff6d98f3cb53bbb7097e020
[18:02:56] is this the showcase ?
[18:31:05] hey Snaps: did you push code for review?
[18:57:31] New patchset: Stefan.petrea; "Fixing problem with cronjob [IN PROGRESS]" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/67010
[19:01:42] New patchset: Milimetric; "fixed upload capitalization and whitespace stripping" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/67136
[19:05:16] New review: Diederik; "Looks good." [analytics/user-metrics] (master) C: 1; - https://gerrit.wikimedia.org/r/67136
[19:05:34] New review: Diederik; "Looks good." [analytics/user-metrics] (master); V: 1 - https://gerrit.wikimedia.org/r/67136
[19:06:22] drdee, let's talk about that pig script once we are done here
[19:06:27] i can put it up, just had a q for you
[19:06:33] k
[19:39:57] making coffee
[19:40:10] * average is doing same
[19:40:16] *the same
[19:40:23] * drdee wants to do the same
[19:59:50] average!
[19:59:53] oh wait
[19:59:53] first
[19:59:55] drdee!
[20:00:01] your about new zero pig scripts!
[20:00:06] yaa
[20:00:10] 1 sec
[20:00:18] did you modify both carrier and country scripts?
[20:00:18] and 2.
[20:00:25] ottomata: I'm here
[20:00:26] does there need to be a new pig udf jar uploaded?
[20:00:30] ottomata: what's up ?
[20:00:37] was going to get into dclass with you
[20:00:45] hangout ?
[20:01:03] sure
[20:01:12] ottomata: just zero_carrier script
[20:01:29] the new jars are in an10:/home/diederik/
[20:01:37] i am in hangout
[20:02:09] batcave?
[20:02:10] are you sure?
[20:02:18] i am in batcave
[20:03:26] drdee which hangout?
[20:03:32] the regular one
[20:03:41] i am lying
[20:03:42] 1 sec
[20:03:44] drdee , ottomata should I join the same one ?
[20:04:02] ja
[20:04:09] i am here
[20:04:09] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[20:14:32] drdee come back!
[20:14:46] drdee: come back, I'll be lurking there for a long time
[20:15:02] 1 sec watching average's review
[20:15:13] average: http://www.wiggy.net/presentations/2001/DebianWalkThrough/handouts/handouts.html#DEBHELPER
[20:24:36] drdee: yes, I just pushed 0.8-wip branch to github.
[20:26:55] https://github.com/wikimedia/kraken/blob/master/oozie/mobile/zero/carrier_country/properties/carrier-coordinator.properties
[20:32:45] Snaps: ty! when can you push the first varnishkafka code?
[20:33:52] drdee: I'm aiming for friday
[20:34:39] do you mind if i tweet about librdkafka 0.8-wip?
[20:34:47] no, please do :)
[20:34:50] k
[20:36:28] done
[20:38:02] sweet!
[20:39:50] vanilla varnishncsa is built from within varnish, depending on some APIs that are not available in libvarnishapi. My goal is to have varnishkafka buildable from outside the varnish code base, simply depending on libvarnish-dev et al.
[20:40:18] Does that sound okay with you guys?
[20:40:50] that sounds reasonable to me, but please check with paravoid (faidon on irc; wikimedia-operations) and mark as well
[20:41:00] ottomata: ^^
[20:41:24] okay, will do
[20:42:16] paravoid is currently online
[20:44:40] that sounds reasonable too, and probably preferable
[20:45:02] better to not have to have the full varnish source downloaded to build your thang
[20:45:05] cool!
[20:52:40] librdkafka is BSD licensed. Is that a concern for you guys? I think drdee mentioned LGPL at some point.
[20:53:24] 20:46 < paravoid> I think you should import master as it is into gerrit, fork into a wikimedia branch with your changes
[20:53:27] 20:46 < paravoid> and have a debian branch on top of wikimedia
[20:53:30] 20:47 < paravoid> or you can just have upstream's master + debian branch and put your changes into debian/patches/
[20:53:48] ottomata: you and paravoid are on the same page on the wikimedia upstream branch
[20:54:19] from 2 days ago
[20:54:31] if I recall
[20:54:47] haha awesome!
[20:54:49] good to know!
[20:54:49] :)
[20:54:57] we both made the exact same suggestions!
[20:55:02] :D
[20:55:24] ottomata: I'm all clear except for the debian/patches/ , but it's a long night..
[20:55:37] have to get this sorted
[20:55:46] yeah don't worry about it
[20:55:51] if you are doing your own branch its no problem
[20:55:55] the patches are more complicated
[20:58:10] * drdee mumbles let's try to avoid that
[20:59:11] * average agrees
[21:27:50] ottomata:
[21:28:38] ottomata: git clean -d -f -x; autoheader; aclocal; libtoolize --automake ; automake --foreign --add-missing ; autoconf ; ./configure; make;
[21:29:11] ottomata: but first git clone ssh://@gerrit.wikimedia.org:29418/analytics/dclass ; git checkout wikimedia
[21:30:27] cool ok!
[21:30:41] ottomata: please let me know if it works
[21:34:38] k few mins
[21:38:41] hey average
[21:38:49] i have a bunch of symlinks into /usr/local/Cellar
[21:38:51] whazzat?
[21:38:57] lrwxrwx--- 1 vagrant www-data 66 Jun 5 21:31 config.guess -> /usr/local/Cellar/automake/1.12.2/share/automake-1.12/config.guess
[21:39:37] isn't Cellar used by homebrew?
[21:39:55] ottomata: oh you're on OSX
[21:40:04] or on vagrant ?
[21:40:08] vagrant
[21:40:24] so you're running inside an Ubuntu ?
[21:40:28] yes
[21:40:38] what is Cellar ?
[21:40:48] dunno it came from you!
[21:40:53] oh
[21:41:07] I don't have a /usr/local/Cellar
[21:41:10] oh they came from upstream
[21:41:21] ahmm, i do on os x
[21:41:29] master has them symlinked
[21:41:39] are they now symlinked to Cellar on yours?
[21:41:51] let me check, sec
[21:42:10] user@user-K56CM:~/wikistats/libdclass$ stat config.guess File: 'config.guess' -> '/usr/share/automake-1.11/config.guess'
[21:42:36] oh wait
[21:42:39] forgot something
[21:46:35] average
[21:46:36] java-wikimedia/dclass-wrapper.c:4:17: fatal error: jni.h: No such file or directory
[21:46:36] compilation terminated.
[21:46:36] make[1]: *** [libdclassjni_la-dclass-wrapper.lo] Error 1
[21:47:05] ottomata: before that, can you please delete the remote master please ?
[21:47:31] I need to push a new master instead (the one from the real upstream)
[21:47:55] oh, hmmmmMmmmm
[21:48:01] i can try...
[21:49:28] (btw, i have to leave in 10 mins)
[21:49:35] ok
[21:50:10] user@user-K56CM:~/wikistats/libdclass$ git push --force origin :master
[21:50:13] remote: Branch refs/heads/master:
[21:50:15] remote: You need 'Push' rights with the 'Force Push'
[21:50:18] remote: flag set to delete references.
[21:51:07] yeah
[21:51:09] are you sure you want to force push?
[21:51:14] i can't either
[21:51:20] i can delete the repo and we can start again
[21:51:24] but i can't delete master
[21:51:28] ottomata: yes please
[21:52:16] I have everything locally so I can push, no problem
[21:52:24] ok cool
[21:52:31] recreated with an empty repo
[21:52:38] i betcha you can just push to the same url
[21:53:15] ok
[21:53:16] doing
[21:54:49] ottomata: ok pushed master and wikimedia
[21:55:42] git clean -fxd will remove any stuff that's not in the repo
[21:57:28] AHH i know why Cellar was there
[21:57:37] i accidentally typed that command you gave me in my os x prompt
[21:57:41] and autoconf installs those files
[21:57:49] they aren't there on clean clone
[21:58:00] ok , average
[21:58:01] still this
[21:58:01] java-wikimedia/dclass-wrapper.c:4:17: fatal error: jni.h: No such file or directory
[21:58:01] compilation terminated.
[21:58:01] make[1]: *** [libdclassjni_la-dclass-wrapper.lo] Error 1
[21:58:17] ottomata: you got jdk 1.6 on that vagrant ?
[21:59:20] yes
[21:59:31] oh
[21:59:32] maybe just jre
[22:01:55] ok i just had to set JAVA_HOME
[22:02:03] i think it worked!
[22:02:31] got a buncha .o files
[22:02:51] dclassjavalinux64.so
[22:02:52] ?
[22:02:59] ko
[22:03:00] ok
[22:03:02] look in .libs/ ?
[22:03:02] it looks good i think
[22:03:03] i have to run
[22:03:07] ok
[22:03:10] ja lots of stuff in there
[22:05:41] cool
[22:05:43] ottomata: thanks
[22:06:55] yup!