[12:53:23] morning everyone
[12:53:37] Change merged: Milimetric; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62098
[13:16:49] moorning!
[13:43:55] mooorning
[13:52:35] morrnnigngi
[13:59:05] sbt sbt sbt!
[14:05:24] restarting compy
[14:09:46] there he is! mr ooooooottttttooooooomaaaaaataaaaaa
[14:15:41] one more restart, trying to fix my ssh-agent
[14:17:38] milimetric
[14:17:40] around?
[14:17:46] yea
[14:17:49] average, around?
[14:17:51] working on the umapi stuff
[14:17:55] to the bat cave?
[15:02:25] New patchset: Milimetric; "fixing bugs from DarTar's feedback" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62155
[15:02:43] bugs fixed ^ should I self-merge drdee?
[15:02:55] try git-review :)
[15:02:57] well, someone else can review i guess
[15:03:00] (seriously)
[15:03:15] no i mean, the patch is submitted
[15:03:26] oh i see the link
[15:03:26] great!
[15:03:34] let's add erosen as reviewer
[15:03:38] i spent 55 minutes fixing the bugs and 30 minutes fighting with gerrit, then giving up and re-cloning
[15:04:46] milimetric, drdee: thanks for this - I'm leaving home in a moment, will follow up on this + reply on feedback once I'm in the office
[15:04:47] New review: Diederik; "Ok." [analytics/user-metrics] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/62155
[15:05:11] i gave it +1 looks good to e
[15:05:23] you read fast :)
[15:05:36] or trust me too much
[15:05:46] i saw that you fixed the things from the email
[15:12:09] milimetric, you got a sec for the batcave?
[15:13:03] ya, brt
[15:29:45] ottomata: can you kill your udp2log process on locke: otto 1199 0.0 0.0 20744 2000 ? S May02 0:01 /usr/bin/udp2log -p 8420 --config-file ./udp2log.conf --recv-queue=16384
[15:30:12] hmm
[15:30:29] weird
[15:30:30] done
[15:33:14] New review: Erosen; "(1 comment)" [analytics/user-metrics] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/62155
[15:38:37] ottomata: drdee: x_cs spike?
[15:38:47] sure!
[15:38:59] i'm looking into this sbt stuff now, would rather help you at the moment :)
[15:39:49] let's do it
[15:39:52] i will join in a few minutes
[15:40:16] k
[15:40:32] ottomata: do you want to take a look at this ipython notebook i made on stat1
[15:40:51] sure
[15:41:43] * erosen checking ssh command
[15:41:57] k
[15:41:59] try this: ssh -N stat1 -L 7000:localhost:7000
[15:42:37] and then open up localhost:7000 and hopefully you'll see an ipython notebook browser
[15:42:47] click on the x_cs_forensics notebook
[15:43:15] there
[15:43:26] great
[15:43:49] the other data source of interest are the counts which david made from kraken
[15:44:34] which he put on the mingle card here: https://mingle.corp.wikimedia.org/projects/analytics/attachments/297
[15:44:53] hmm, can we count just zero tagged vs not zero tagged?
[15:44:56] we aren't checking for validity here
[15:45:03] just whether or not it gets tagged
[15:45:08] might make it easier to read and see problems
[15:45:18] sure
[15:45:33] that was part of my goal with the whole ipython notebook thing (which may or may not be that useful)
[15:45:58] if you replace fields = [….] with just fields = ['x_cs']
[15:46:01] that should simplifiy things
[15:47:27] also it hink we should drop hostname from the grouping as well
[15:47:35] we are pretty sure there aren't any host specific problems, i think
[15:47:47] fair,
[15:48:04] but really we should just be checking off each theory, one by one and marking it somewhere
[15:49:09] also, i think we should be using dtac thailand now
[15:49:46] where are the field names defined?
[15:50:25] hangout?
[15:50:40] mime_type
[15:50:41] :)
[15:51:01] ?
[15:51:10] i call it content-type
[15:51:12] :)
[15:51:14] fair
[15:51:32] hehe, i don't actually know anything about how the web works
[15:52:07] ok, hangout is tough atm, i'd have to change locations
[15:52:11] which I want to do soon anyway
[15:52:13] that's fine
[15:52:15] should I do that now or you wanna just chat?
[15:52:20] did you figure out how to change the fields
[15:52:23] yeah
[15:53:49] ok, you want to look at thailand?
[15:54:24] so
[15:54:24] for thailand, they have to be zero domans, right?
[15:54:27] i think so
[15:54:28] yes
[15:54:33] k
[15:56:22] ok so
[15:56:26] i'm looking at may 02
[15:56:31] 126472 lines
[15:56:36] 814 matches for zero.wiki
[15:56:37] is that right?
[15:56:47] dtac thailand
[15:57:51] let me check
[15:58:58] ottomata: which days are you running?
[16:01:39] just may 02
[16:01:51] zero-dtac-thailand.tab.log-20130502
[16:04:16] most of the untagged reqs there are from 302s
[16:04:25] but there are 200s as well
[16:05:49] 1% of reqs are not tagged
[16:06:06] .06% of untagged reqs are 302s
[16:06:07] but are you checking that they are m. or zero. domains?
[16:06:12] just zero
[16:06:13] not m.
[16:06:17] m. not valid for thaland
[16:06:22] aah
[16:06:23] good
[16:06:25] (I'm using awk)
[16:07:05] k
[16:07:22] i can't find any consistency though
[16:07:25] as to why these are untagged
[16:08:00] when I run against the whole month of march, I get zero untagged requets to zero. domains...
[16:08:48] oh weird
[16:08:50] but you get some in may?
[16:09:33] may is good to...
[16:09:46] oh maybe my check is bad
[16:10:05] oo
[16:10:12] i was filtering on the request being for an article
[16:10:19] may 2 you get 0 untagged zero domains?
[16:10:22] oh
[16:11:01] k, running on just may 2
[16:11:21] drdee: ottomata: Just sent you guys some email about reproducing data loss.
[16:11:28] can i share screen with you, or will that also overload your connection?
[16:12:07] sure, actually, i'm going to go sit outside, just dont' want to disturb my coworkers here
[16:12:09] will grab headphones
[16:12:19] gotcha
[16:12:33] it's fine to just have video, if you prefer
[16:13:56] in standup hangout
[16:14:01] man there is zero shade back here
[16:14:06] i can't really see my screen anyway :p
[16:14:10] hehe
[16:14:19] let's do screen share no audio inside?
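[editor's note] The ad-hoc awk counting discussed above (untagged requests to zero. domains, broken down by HTTP status) amounts to a small tallying loop. The sketch below is hypothetical: the field positions and the use of "-" for a missing X-CS tag are assumptions, not the actual squid/varnish log layout, so indices would need adjusting against the real format.

```python
from collections import Counter

def count_untagged_zero(lines, host_field, xcs_field, status_field):
    """Count requests to zero.* domains lacking an X-CS tag, grouped by status.

    Field indices are hypothetical; "-" standing for "no X-CS tag" is an
    assumption about the log format, not a documented convention.
    """
    untagged = Counter()
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) <= max(host_field, xcs_field, status_field):
            continue  # malformed line, skip
        if 'zero.' not in parts[host_field]:
            continue  # only zero. domains count (m. is not valid for dtac Thailand)
        total += 1
        if parts[xcs_field] in ('-', ''):
            # untagged: record the numeric status, e.g. TCP_MISS/302 -> 302
            untagged[parts[status_field].split('/')[-1]] += 1
    return total, untagged
```

With sample lines shaped like `host status domain xcs`, the function returns the total zero-domain request count and a Counter of untagged requests per status code, which mirrors the "302s vs 200s" breakdown above.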
[16:14:31] let's try thist first
[16:14:38] k
[16:32:47] drdee, rounce123: which hangout do you want to use?
[16:32:53] https://plus.google.com/hangouts/_/69179906f1bf2473b03a00a77eafe0e5f14e98f1
[16:33:00] otto and erosen are in the scrum one..
[16:33:05] yup
[16:51:26] erosen: logging.exception(str(e)) seemed to get me both the traceback and the exception message
[16:51:33] you're saying just do logging.exception()?
[16:51:33] interesting
[16:51:37] yeah
[16:51:40] i like it, done
[16:54:14] well, instead I changed it from str(e) to a descriptive unique message to help debug
[16:54:26] it looks like either way the stack trace is getting dumped: http://docs.python.org/2.6/library/logging.html#logging.exception
[16:56:14] New patchset: Milimetric; "exception fix" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62165
[16:56:58] Change abandoned: Milimetric; "gerrit fail" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62165
[16:57:44] erosen: ha, didn't know you were using ipython nbs on stat1, we're experimenting the same on vanadium (to access eventlogging data)
[16:58:06] yeah, i heard ori-l was looking into that
[16:58:22] works well, minus the lack of multi-user support issue
[16:58:45] let's chat about that, would be great to join forces
[16:59:47] oh god
[16:59:55] GERRRIT!!!!!
[17:00:14] i have now officially spent more time dealing with gerrit than fixing code
[17:01:36] milimetric, average: scrum
[17:02:19] milimetric ^^
[17:12:08] average, around?
[17:12:30] drdee: I am here, yes
[17:12:48] millimetric, average: wanna quickly talk about 356?
[17:13:04] sure
[17:13:32] ottomata, is this done: https://mingle.corp.wikimedia.org/projects/analytics/cards/385 ?
[17:14:17] ottomata, how busy are you this afternoon?
[17:14:58] New patchset: Milimetric; "fixing bugs from DarTar's feedback" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62155
[17:15:16] finally god
[17:15:32] milimetric, hangout?
[17:16:23] TypeError: exception() takes at least 1 argument (0 given)
[17:16:37] average: hangout?
[17:17:15] which one ?
[17:17:31] the regular one
[17:17:33] ok
[18:05:27] erosen: got a moment?
[18:05:40] not right now
[18:05:42] in a meeting
[18:05:57] we can chat at noon?
[18:06:00] sure
[18:06:07] are you in the office?
[18:06:08] milimetric: what's up with vagrant these days?
[18:06:18] which vagrant
[18:06:24] ori's mediawiki one?
[18:06:25] erosen: ye[
[18:06:35] cool
[18:29:32] udp-filter is not compatible with file format in 2011 because it had IPs of the form
[18:29:40] 88.88.88.88|US
[18:30:00] udp-filter is now splitting on tabs instead of spaces
[18:30:37] ok, skip 2011
[18:30:50] so I have to write some code to detect the delimiter first
[18:31:06] so I can tell udp-filter what the delimiter is with -F
[18:31:14] no :)
[18:31:31] just determine the first file that has not 88.88.88.88|US
[18:32:12] these are two separate problems
[18:32:26] drdee: I understood your solution for 88.88.88.88|US and I will use it
[18:32:32] drdee: but now there's another one, the separator
[18:32:44] which is different pre/post 1st feb 2013
[18:32:49] right
[18:33:08] so pre 1st feb 2013, has space as delimiter
[18:33:40] yes, I'll have to adapt the script, to use this information
[18:34:24] remember that card I wrote about heterogenous data ? it was hard for me to describe it properly
[18:34:29] but this is what I wanted to put in it
[18:34:39] we have heterogenous data formats across time
[18:38:28] ottomata: I'm around if we need to discuss anything today (since I hear you're off next week).
[18:39:20] average: you are making it too complicated
[18:39:36] find the files that are space delimited and are not yet geocoded
[18:40:09] write a bash script to geocode those files
[18:40:20] yes
[18:46:10] xyzram: in a meeting uhhh, not sure
[18:46:19] drdee wants me to just try your new stuff
[18:46:32] i'll try to build it on analytics1008 and run some stuff there
[18:46:39] are you working in a branch?
[18:50:12] ottomata: No, it is just standalone code in ~ram/udp2log-local/src built with a trivial Makefile
[18:53:30] can you commit it to a branch of udp-filter repo?
[18:55:27] xyzram: ^?
[18:55:37] ok, sure.
[18:55:56] danke
[18:57:36] Just "git review" as for other projects OK or is there some other preferred way ?
[18:57:50] i think you can make a branch and just push to it
[18:57:55] we'll do review when we merge back into master
[18:58:00] git push ?
[18:58:07] git push origin
[18:58:09] i think will doi t
[18:58:20] Ok, thanks.
[19:00:42] mwalker: ping
[19:01:40] erosen: pong! I was wondering, now that I have stat1 access; a) how do I get onto stat1, and then b) that counting script you wrote; I was going to play around with it for that list of URLs so I was wondering if you had any pointers (/paths I needed to know to get to the log files)
[19:01:59] sure, want to chat in person?
[19:02:17] can do -- are you on 3 or 6?
[19:02:37] 6
[19:02:56] i can come to you, or you to me
[19:03:46] you can also meet midway at in between floor 4 and 5 in the elevator and test the wifi network
[19:09:56] xyzram: lemme know the branch if it works
[19:13:58] drdee: I wrote a script to find out what you wrote above
[19:14:07] cool!
[19:14:10] it's now running
[19:14:20] it detected 6 files so far
[19:14:31] there are ~400 more to process..
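[editor's note] The format-detection problem average and drdee debate above (space vs. tab delimiters across the Feb 2013 boundary, plus the old 88.88.88.88|US inline-geocoded IP form) only needs the first line of each file, as drdee points out. A hypothetical sketch of that heuristic -- not the actual wikistats script:

```python
def detect_format(first_line):
    """Guess a log file's delimiter and whether IPs carry inline |CC geocoding.

    Heuristic sketch: if the first line contains any tab, assume tab-delimited
    (post Feb 2013); otherwise assume space-delimited. An inline-geocoded IP
    (the old 88.88.88.88|US form) shows up as a '|' in the first field.
    The assumption that the IP is the first field is hypothetical.
    """
    delimiter = '\t' if '\t' in first_line else ' '
    first_field = first_line.split(delimiter, 1)[0]
    inline_geo = '|' in first_field
    return delimiter, inline_geo
```

Reading only one line per file turns a six-hour full-parse run into a near-instant scan over the ~400 files mentioned above.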
[19:17:30] ottomata: branch multiplexor pushed
[19:17:56] dankeee
[19:18:28] je vous en prie (not sure what that says actually)
[19:19:52] Just realized: You might have to hack one line of the code since the names of
[19:20:05] output files are automatically generated with: sprintf( cmd_buf, "%s >> out_%d.log", args.cmd, i );
[19:25:56] milimetric: i meant with vagrant not working for you
[19:28:00] away for a bit, back in about an hour.
[19:29:41] oh, I got it to work yesterday, jeremyb_
[19:29:57] average figured out the right versions of virtualbox and vagrant to use on Ubuntu
[19:30:18] I sent them to ori-l, and they're: Vagrant 1.1.2, VirtualBox 4.2.8
[19:30:31] any other combination I tried (higher or lower) did not work for me
[19:39:21] $ dpkg -l virtualbox | fgrep virtualbox; gem list vagrant | fgrep vagrant
[19:39:24] ii virtualbox 4.1.18-dfsg-2+deb7u1 amd64 x86 virtualization solution - base binaries
[19:39:28] vagrant (1.2.1)
[19:39:30] worksforme
[19:39:31] but i'm debian not ubuntu
[19:49:43] drdee: ETA 6h for that script
[19:49:49] drdee: and after that I can run the geocoding..
[19:49:50] :|
[19:54:06] https://gist.github.com/wsdookadr/fef34b127d4300935e5e
[19:54:15] this is what's running now on stat1
[21:06:37] heyaa drdee
[21:06:49] and xyzram
[21:06:53] hi
[21:07:13] afaict, we can use new udp-filter to anon and geocode produce unsampled mobile into kafka, yay
[21:07:18] and i think we can do it with a single producer
[21:07:19] hey
[21:07:31] WOOOT
[21:07:35] that's great news
[21:07:35] !
[21:07:45] without needing multiplexor ?
[21:08:09] what do you mean by "new" ?
[21:08:12] your thing
[21:08:20] i just tried the version with your branch
[21:08:22] Oh, interesting, great news!
[21:08:29] i mean, we never tried it with the old one either
[21:08:30] :)
[21:08:37] but I just tried yours and it seems good
[21:08:46] great to hear.
[21:09:06] you guys want to build a .deb next week so that when I get back we can start doing so?
[21:09:44] yes, we will do that
[21:10:01] ram, thank yooouuu so much! that's really awesome!
[21:10:16] looking forward to you joining our scrum meetings, 10AM PST
[21:10:18] welcome, glad I could help.
[21:10:50] Yes, I'll be there Monday though I may miss Tuesday due to some personal stuff.
[21:10:54] ok, np
[21:13:33] oh awesome!
[21:15:52] milimetric: have the umapi fixes been deployed?
[21:16:07] not yet
[21:16:13] blocked?
[21:18:46] hm, no, i'm just waiting for review
[21:18:51] you guys said i shouldn't self merge :)
[21:23:35] oink
[21:23:39] ok i will merge
[21:23:40] 1 sec
[21:24:20] New review: Diederik; "Ok." [analytics/user-metrics] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/62155
[21:24:24] Change merged: Diederik; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62155
[21:24:45] milimetric ^^
[21:25:19] thanks
[21:25:37] so now it'll be deployed whenever puppet runs
[21:25:40] or whenever someone pulls it
[21:25:49] yes so that is within the next 30 minutes
[21:26:35] ottomata, what's the name of the new branch in udp-filter?
[21:26:47] and if it is working, shouldn't we merge it into master?
[21:26:55] and do a code review?
[21:26:57] sure, ja
[21:26:58] totally
[21:26:59] xyzram ^^
[21:27:39] xyzram: can you submit a patch to gerrit that merges your branch?
[21:27:43] Branch name is 'multiplexor'
[21:27:53] k
[21:28:08] can you merge and push to gerrit for review?
[21:28:30] ok
[21:28:38] ty
[21:34:51] drdee: I'm getting an error with git review: ! [remote rejected] HEAD -> refs/publish/master/add-multiplexor (no new changes)
[21:34:59] sweet!
[21:35:12] milimetric is really good in fixing these problems :)
[21:35:27] but i think you have to merge the branch locally
[21:35:31] and submit that as a patch
[21:36:05] I did that: created a new branch called add-multiplexor, merged multiplexor and I still get this error.
[21:36:19] git checkout master
[21:36:24] git merge multiplexor
[21:36:36] Oh, merge into master ?
[21:36:39] yes
[21:37:12] For the other projects I always create a separate feature branch and "git review" that.
[21:37:21] Why is it different here ?
[21:37:48] because you pushed directly your feature branch
[21:37:56] you can do that
[21:37:58] actually
[21:38:02] for udp-filter
[21:38:03] but yes in general that's a better practise
[21:38:04] it probably doesn't matter
[21:38:07] doing into master is good
[21:38:21] but xyzram, in this case what you would want is not a feature branch, but a topic branch of master
[21:38:22] so
[21:38:34] git checkout -b add-multiplexor origin/master
[21:38:39] But then if I want to do an independent change I have to reset hard on master.
[21:38:47] nope
[21:38:49] its a local branch
[21:38:55] that gets submitted for review to the remote master
[21:39:10] so you can make as many of those local topic branches as you want
[21:39:20] and each one tracks the origin/master remote
[21:39:36] whenever you git pull it pulls from origin/master
[21:40:03] that way you can do separate work locally on different branches, but still use the same remote for all of them (makes the review side easier)
[21:40:25] so
[21:40:45] average: your script https://gist.github.com/wsdookadr/fef34b127d4300935e5e is nifty but you don't have to parse the entire file, the first line of every file will tell you what you need to know
[21:40:50] git checkout -b multiplexor-merge origin/master
[21:40:51] git pull
[21:40:51] git merge multiplexor
[21:40:51] git review
[21:41:05] i think that would do it
[21:41:23] or you could just merge into master (and not do any more work til the review is done), but i think the topic branch way is better
[21:41:24] I think I did something similar: git checkout master
[21:41:43] git checkout -b add-multiplexor
[21:41:59] git merge multiplexor
[21:42:03] git review
[21:42:09] and got that error.
[21:42:30] maybe you pushed against origin/master
[21:42:46] don't think so, lemme see
[21:42:52] let met chck
[21:43:17] This is how I pushed: git push origin multiplexor
[21:44:04] hmmm
[21:44:09] but did you create the remote branch?
[21:44:28] No, not even sure how to
[21:44:54] multiplexor was my local branch
[21:45:22] That was, I think, your recommendation earlier today :-)
[21:45:31] yes :)
[21:45:39] okay so master does not have your patch yet: https://gerrit.wikimedia.org/r/gitweb?p=analytics%2Fudp-filters.git;a=shortlog;h=refs%2Fheads%2Fmaster
[21:45:51] and there is a remote branch called multiplexor
[21:45:55] so that's all good
[21:46:03] ja, xyzram git push origin multiplexor created that
[21:47:01] Seems like git review wants to create a new branch but finds there is already an identical branch in place.
[21:47:16] So it says, "no changes"
[21:47:50] should I try to git review it?
[21:47:54] we can delete the remote branch, would that solve it?
[21:47:58] don't thikn so
[21:48:16] xyzram: git branch -a
[21:48:24] hmm no
[21:48:31] ottomata: yes see if git review works for you.
[21:48:37] k
[21:48:43] New patchset: Ottomata; "Added src/multiplexor.c which runs multiple copies of a filter." [analytics/udp-filters] (master) - https://gerrit.wikimedia.org/r/62198
[21:48:51] :D
[21:48:54] hilarious
[21:49:09] oh!, uhhhhh
[21:49:10] Ok, so looks like that worked.
[21:49:15] was I supposed to use multiplexor to test stuff today?
[21:49:17] i just used udp-filter
[21:49:42] i don't even know what this is, i thoguth you were just doing performance improvement work on udp-filter
[21:51:19] drdee?
[21:51:35] It is performance work but doesn't modify udp-filter directly; it adds a shim which is a standalone program that multiplexes to multiple copies of udp-filter.
[21:51:54] i am confused ottomata; this is what you said:
[21:51:58] xyzram: without needing multiplexor ?
[21:51:59] [5:08pm] xyzram: what do you mean by "new" ?
[21:52:00] [5:08pm] ottomata: your thing
[21:52:00] [5:08pm] ottomata: i just tried the version with your branch
[21:52:13] so i assumed you used the multiplexor branch
[21:52:13] * robla tries to follow along too
[21:52:15] i did
[21:52:24] So did you test by just using udp-filter or did you use multiplexor as a filter ?
[21:52:26] i compiled from multiplexor branch
[21:52:33] and then used udp-filter
[21:52:37] no, i didn't know t here was a new binary
[21:52:44] No but you also need to use the new binary.
[21:53:07] So, looks like your testing shows that the multiplexor is not needed !
[21:53:24] That udp-filter alone is adequate.
[21:54:14] at least for the mobile stream
[21:54:36] ottomata, can you test the multiplexor on the full stream?
[21:54:38] how do I use multiplexor? pipes?
[21:54:56] sure, but guys, this is why I said we should try it first, we never tried it :p
[21:54:58] pipe 1 ./src/multiplexor -proc 2 -cmd "/usr/bin/udp-filter ....options...."
[21:55:15] Suitably modified.
[21:55:28] it would be nice to have a use case that fails, and then see if multiplexor helps
[21:55:32] That's why I posted the comment here about the output files.
[21:55:35] yeah
[21:56:16] Exactly, I've been trying to find a realistic case that fails for days and could make no progress.
[21:56:17] -proc is the number of threads?
[21:56:32] Number of processes.
[21:56:36] k
[21:56:39] It creates child processes.
[21:56:45] all identical
[21:56:53] except for the output files.
[21:57:22] let's add these notes to a file :)
[21:57:25] so this is cool, cause this will let me shard the stream on a single machine without examining content
[21:57:29] So I created a synthetic failure case where I added a delay loop for my testing.
[21:57:30] if we need to
[21:57:39] basically its what i'm doing right now on 4 of the analytics nodes
[21:57:41] with awk
[21:57:57] this lets us use a single node's multiple processors to produce
[21:58:06] hmmm
[21:58:07] yes
[21:58:09] wait, curious
[21:58:11] how does this output?
[21:58:20] can I do
[21:58:27] forks a shell which redirects output.
[21:58:31] pipe 1 multplexor … | kafka-producer?
[21:59:11] Right now that part is in the C code and it writes to a generated file name.
[21:59:55] new test imminent
[22:00:53] so it only outputs to files?
[22:01:21] average: what do you mean? and we need to brain bounce more often, you need to keep me or milimetric in the loop, we need to communicate more to overcome the distributedness of the team
[22:01:41] Right now yes, I can add additional parameters for writing to another process if needed.
[22:02:06] yeah, the i think the point of this is to produce to kafka
[22:02:21] which right now we are doing via the shell producer which reads from stdin
[22:04:06] so that will double the number of processes since each child will have its own output process.
[22:04:33] we could use librdkafka. heheh drdee :)
[22:04:51] or could you unmultiplex?
[22:05:10] Will "-o /path/to/cmd" be adequate ?
[22:05:31] https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_example.c#L146
[22:06:41] i guess so?
[22:08:04] I'll need to look at that code carefully and also read about kafka ...
[22:08:36] well, i think we should all talk about what we are trying to accomplish here
[22:08:39] right?
[22:08:48] Agreed,
[22:09:01] if the point is to anon + geocode logs mobile logs that we produce into kafka
[22:09:06] bikeshed color contribution: "-c" (for command) seems more appropriate than "-o" (which is often used to output to file in other utilities)
[22:09:42] sure, I'll use -c
[22:09:46] if we can already do that with udp-filter as is
[22:10:05] then no point of multiplexor
[22:10:05] should we continue to work on this?
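[editor's note] The multiplexor described above -- a parent that fans the input stream out to N identical child filters, each with its own generated out_N.log -- can be imitated in a few lines. The real multiplexor.c forks udp-filter children from C; this Python stand-in is a simplified sketch (round-robin sharding is inferred from ottomata's "shard the stream" remark, and is an assumption about the exact distribution policy):

```python
import os
import subprocess
import tempfile

def multiplex(lines, nproc, cmd):
    """Fan input lines out round-robin to nproc copies of cmd.

    Each child's stdout goes to out_<i>.log in a temp dir, mimicking the
    sprintf("%s >> out_%d.log", ...) naming in the real multiplexor.
    Returns the list of output file paths.
    """
    tmpdir = tempfile.mkdtemp()
    procs, files, paths = [], [], []
    for i in range(nproc):
        path = os.path.join(tmpdir, 'out_%d.log' % i)
        f = open(path, 'w')
        # each child runs an identical copy of the filter command
        procs.append(subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=f))
        files.append(f)
        paths.append(path)
    for n, line in enumerate(lines):
        procs[n % nproc].stdin.write(line.encode())  # shard by line number
    for p in procs:
        p.stdin.close()  # EOF lets each child drain and exit
        p.wait()
    for f in files:
        f.close()
    return paths
```

With `cmd=['cat']` as a trivial stand-in filter, two children each receive every other line, which is the per-filter parallelism the chat is weighing against udp2log's serial fan-out.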
[22:10:16] ottomata: probably not
[22:10:49] it's good to have this in our back pocket, but we don't have to use it right right right now
[22:10:59] i mean, multiplexor is cool and all, but udp2log itself is a multiplexor, it just doesn't support sharding
[22:11:06] for a single filter
[22:11:13] multiplexor looks like that's basically what it does
[22:11:26] Right, and I understood CPU consumption in a single filter was a bottleneck
[22:11:34] allows multiple procs to run the same code on the same input
[22:11:39] That's why I wrote it.
[22:12:13] this might be real useful if/when we want to use udp2log to import the full unsampled firehose
[22:12:22] not sure
[22:12:28] one other minor benefit that the multiplexor might provide is a little bit of output buffering from the central udp2log instance
[22:13:19] i.e. if the multiplexor is taking packets quickly from udp2log, then it's the multiplexor getting blocked by a slow filter, rather than udp2log
[22:13:31] Yes, but if the output processes are sufficiently slow, the buffering will ultimately not help
[22:13:46] sure....minor benefit :)
[22:13:51] right.
[22:13:54] can we use the multiplexor on oxygen and spin up the other zero filters that we had to disable?
[22:13:58] it'd deal with a small spike
[22:14:08] exactly.
[22:14:19] a baby pope :D
[22:14:51] mui pequino
[22:14:55] (sp?)
[22:15:21] drdee, i think it woudlnt' only because there are lots of filters already, so we're already running into more filters than there are processor
[22:15:22] s
[22:15:39] 8 procs there
[22:15:55] ok, just checking
[22:16:22] one thing Tim suggested is that perhaps we should allocate of the 24 core Hadoop boxes to this
[22:16:30] But are all cores busy ?
[22:16:38] we are running 17 filters there
[22:16:46] how many cores ?
[22:16:48] 8
[22:17:05] looking at htop right now, not all cores are busy
[22:17:45] exactly, I'm seeing around 4 underutilized.
[22:17:56] so there it could help.
[22:18:23] hm, wonder why it isnt' using more procs though, udp2log is basically a multiplexor for its configured filters
[22:18:24] so maybe?
[22:18:27] i dunno
[22:18:29] I seem to recall there was some design problem that kept it from using all of the cores efficiently
[22:18:33] hm
[22:18:58] oh....I think I know what it is.
[22:19:08] all it takes is one of the filters to max out
[22:19:31] it blocks the central process
[22:19:40] If one or more filters are slow enough that is cannot consume the input stream data is lost.
[22:19:48] right, exactly
[22:20:18] That's the scenario where the multiplexor can help.
[22:21:15] even a chained udp2log instance might help on oxygen
[22:21:26] ...if you get the config right
[22:22:06] udp2log is consuming around 40% of CPU, one instance of udp-filter is using 12% all else is in single digits.
[22:22:41] the other scenario that you can hit with udp2log is that there may not be one single "slow" filter, but the accumulation of blocking from the other processes can still add up
[22:23:25] what do you mean by "accumulation of blocking" ?
[22:23:45] well, it runs each filter in serial, then starts over again
[22:23:49] right?
[22:24:03] yes
[22:24:41] maybe one of the filters gets behind a little and blocks, but not enough on its own to cause problems
[22:24:56] but the central process has to get through them all
[22:25:09] there is some code that checks that all output pipes are writable or something like that -- I need to dig into the code ...
[22:25:33] right, ok, that's possible.
[22:26:07] 2 cores are completly idle.
[22:26:20] almost
[22:28:10] seems like the consensus is to set this aside for now until we have a concrete failure scenario ?
[22:28:31] yeah, at least for now
[22:30:27] at some point, when Andrew is not on the verge of going on vacation, and we've got a demand to turn something back on on oxygen, that seems like a fine time to revisit this.
[22:30:52] ok
[22:32:13] robla: IIRC one of your concerns was that parallelizing the work over multiple cores could violate assumptions made by various analytic scripts that incoming requests are processed serially in the order they were received
[22:32:29] worth checking with erik zachte if that is the case (it may well be)
[22:34:07] ori-l: that is true in the general case. I think the nice thing about the multiplexor is that it's something we can do on a per-filter basis
[22:34:44] geocoding is a great, non-order dependent thing
[22:35:20] hrm, yeah. and probably among the most expensive filters.
[22:35:34] but you're right that we need to be careful not to just assume we can slap it on any filter
[22:36:55] Do we have numeric data on how expensive each filter is ?
[22:37:05] no
[22:37:32] I think we should -- it is hard to design a solution without hard numeric data for this type of problem.
[22:37:54] that's true, and it's something that could be done without touching production
[22:38:25] I ran a single geo coding filter on the full unsampled stream and saw no loss of data.
[22:38:35] on locke.
[22:39:18] how did you check?
[22:39:27] Next time we revisit this issue, that is the first thing I would tackle -- also it is not just the type of filter the arguments also make a difference.
[22:39:51] The longer the list of IP addresses for example the longer it takes
[22:40:12] yes, but those are very a-typical filters
[22:40:17] we are not trying to optimize those
[22:40:27] I ran /bin/cat concurrently to a file and wrote a script to analyze the sequence numbers in that file.
[22:40:28] we will deprecate them and use the X-CS header
[22:41:06] how about country list in geocoding ?
[22:41:17] also a-typical
[22:41:27] I think Ram's point still stands. command line options matter
[22:41:58] yes of course but we don't need to investigate that
[22:42:13] there is a clear alternative for the wikipedia zero filters by using the X-CS header
[22:42:24] we are waiting for evan to give the go
[22:42:31] anon + geo is what we are trying to do for kraken, right?
[22:42:35] yes
[22:42:35] so benching that would be uesful
[22:42:38] that
[22:42:41] I think it would really help if we have a list of "typical" filter options we want to use on a wiki page somewhere.
[22:42:46] that's the use case
[22:42:50] gotta run. sorry for crashing. i feel a cosmic yearning whenever udp logging is mentioned on irc.
[22:42:57] xyzram: 'typical' is hard
[22:42:57] i
[22:43:05] anon + geo is the typical filter option we are interested in right now
[22:43:09] but atypical is easy ?
[22:43:12] i've argued against that, as there are a ton of them and they change often enough
[22:43:26] but anon + geo is what we are trying to make awesome right now
[22:43:33] so we can store anoned data in kraken
[22:43:40] no atypical is hard and are exceptions,
[22:43:46] the kraken stuff is a must have
[22:43:58] i meant 'typical is hard to define'
[22:44:00] not hard tod o
[22:44:04] Ok that's a good start; let's characterize those with hard numbers next time we revisit.
[22:45:44] Ok but given an option looks like we are able to say with certainty that it is "atypical" .
[22:46:23] So let's list _all_ the options, identify those we can as "atypical"; what remains is "typical" for our purposes.
[22:46:36] i rather not do this :)
[22:46:42] ok
[22:46:43] the use case is this:
[22:47:07] make sure that udp-filter can geocode at country level and anonymize the unsampeld mobile varnish stream
[22:47:17] if we succeed in doing that then we are done
[22:47:44] that's the minimum viable feature we are looking for
[22:48:23] That's great, so did ottomata's test today show that we can already do that ?
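[editor's note] Ram's loss check above -- running /bin/cat concurrently on the stream and analyzing the sequence numbers in the output file -- boils down to looking for gaps in a monotonically increasing counter. A minimal sketch of that gap analysis (extracting the sequence numbers from the log lines is assumed to have happened already; the exact field position varies by stream):

```python
def count_lost(seqs):
    """Given the sequence numbers seen, in order, return how many were skipped.

    udp2log stamps lines with an increasing per-source counter, so any jump
    larger than 1 between consecutive observed values means dropped lines.
    Wraparound and out-of-order delivery are ignored in this sketch.
    """
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        if cur > prev + 1:
            lost += cur - prev - 1
    return lost
```

A run with no loss returns 0; a jump from 3 to 7 counts the three missing lines in between.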
[22:48:51] ottomata ^^
[22:51:51] i think so yes
[22:52:40] okay aweseom!
[22:54:44] so when you get back, we should enable the anonymization and geocoding
[22:54:50] on the mobile streams
[22:55:04] and reimport the old data and put them through the udp-filter as well
[22:55:19] mmk, we can probably do that with hadoop streaming, eh? :)
[22:55:26] yes
[22:55:28] i think so
[22:55:37] that will break the existing geocoding jobs though, need to be careful about those
[22:55:56] well easy to fix,
[22:56:03] disable geocoding function in pig
[22:56:13] and point to the new country field
[22:56:16] well, i mean, the jobs sort by stuff too
[22:56:17] eyah
[22:56:25] also, i think i'd like to run this along side of the existing stuff for a while…hmmm, i'll leave the one on an09 running while I'm gone ig uess
[22:56:30] it won't hurt, its just going into kafka
[22:56:46] sounds good
[22:56:56] drdee: I'm in the hangout
[23:04:49] laterz gusy!
[23:06:01] laterz for me too byyyeyeye
[23:12:39] New patchset: Stefan.petrea; "Fix for mingle 356" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/62206
[23:13:28] New patchset: Stefan.petrea; "Fix for mingle 356" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/62206
[23:13:47] ciao otto
[23:13:54] bye drdee