[00:16:20] Eloquence: you mean app instrumentation-type stuff? "how many times did a user do X?" [00:16:35] yeah [00:16:54] basically using whatever the most mature version of clicktracking is right now [00:17:31] proper funnel analysis would be nice, I know E3's been doing some early experimentation with that [01:03:46] Eloquence: alolita and siebrand sent me a few pointers for i18n tool usage metrics, I created a card here and told them I'll review it next week: https://trello.com/c/Ml6lC4rg [01:05:01] are you user "Erik Moeller" on Trello or is that a sock? [01:05:43] cool. I looked at that wiki page. it basically asks for the moon right now ... but I think just knowing what technical means they have to perform basic analysis would be useful to them at this stage. [01:06:15] that's a-me. [01:06:37] yes, Alolita says they should narrow this down to the 4-5 main metrics they care about and see how much effort it would take to get them [01:06:53] k sending you an invite so I can assign you some tasks :p [01:06:57] hee [01:07:50] you should have a trello invite in your inbox already [01:10:48] nite everyone :) [01:11:22] night dan [01:16:17] average_drifter, can you push the udp-filters fix? [01:16:47] Eloquence, happy to talk to Alolita, will schedule a meeting with her and ori-l [01:38:38] drdee: do you happen to know if there are any statistics on the growth of IPv6? [01:38:56] possibly filtered by Text/HTML requests and then separated by browser? [01:40:22] walker, that's not too hard i guess, will try to spin up something tomorrow [01:44:01] drdee: i don't have anything meeting-worthy yet [01:44:40] drdee: awesome! what we're most interested in is the change between Nov 2011 and Sep 2012 if it's available [01:57:29] ori-l, i understand, i do wanna talk with alolita to have a better understanding of her needs, if you want i can invite you but i can also do it by myself [02:07:55] hey drdee, did you see my message earlier? [02:08:09] i am afraid not [02:08:22] I just said I finished the script for day, month, year [02:10:56] drdee: yeah, i don't think i'd have anything to present that she isn't already familiar with. she signed off on the early E3 logging plans, and sadly not a whole lot has changed. [02:11:12] ok, i'll just talk with her then [02:11:33] louisdang, great! can you paste the gist link one more time? [02:12:26] alright drdee one sec [02:14:32] drdee: https://gist.github.com/3869339 [02:14:53] only has code now. I didn't record my tests last time so I'll run them again [02:16:51] I couldn't get the convert date UDF in akela to work right so I just parsed the date in a stupid way. I can try and figure out how to use the UDF later. [02:24:00] drdee: I'm also not sure how you wanted the data stored and formatted [02:27:24] output is fine as csv file [02:27:37] yeah, the date handling seems like something that we can work on [02:32:13] drdee: posted some sample output https://gist.github.com/3869339 [02:32:39] drdee: I'll work on the output tomorrow [02:33:26] awesome! maybe we can have a quick Skype call tomorrow and talk a bit more about longer term plans [02:33:49] that'd be nice [02:35:15] drdee: I have a quiz tomorrow from 12:30-3:30 but I should be available all day aside from that [02:35:36] Pacific time [12:13:56] gooooooood morning everyone [12:14:02] dschoon? but it'd be 5am [12:14:19] sometimes sleeping is hard. [12:14:23] goodness gracious! [12:14:46] you working or just zombieing around? [12:15:10] hey now. i slept for ...a bit. [12:58:06] hey dr [12:58:09] drdee ?
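[Editor's note] The day/month/year script louisdang describes above ([02:08:22]–[02:27:37]) parses dates without the Akela date-conversion UDF and writes CSV. A minimal sketch of a job of that shape; the paths, field names, and log layout here are assumptions, not the contents of the linked gist:

```pig
-- Hypothetical per-day request count, assuming a space-delimited squid
-- log whose third field is an ISO timestamp like 2012-10-10T23:59:59.
LOG = LOAD '/path/to/sampled.log' USING PigStorage(' ')
      AS (host:chararray, seq:long, timestamp:chararray);

-- "Parse the date in a stupid way": the first 10 chars are YYYY-MM-DD,
-- so SUBSTRING sidesteps the date-conversion UDF entirely.
DAYS    = FOREACH LOG GENERATE SUBSTRING(timestamp, 0, 10) AS day;
GROUPED = GROUP DAYS BY day;
COUNTS  = FOREACH GROUPED GENERATE group AS day, COUNT(DAYS) AS hits;

-- drdee asked for CSV, so store with a comma delimiter.
STORE COUNTS INTO '/output/day_counts' USING PigStorage(',');
```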
[13:06:24] morning everyone [13:09:33] morning [13:16:48] morning otto [13:16:52] hey ottomata [13:17:31] morning [13:20:35] ottomata: I have these lines on a local vm in /etc/apt/sources.list [13:20:37] ottomata: https://gist.github.com/9dd3e2e5c53813fd1e4b [13:20:46] ottomata: they are the same lines build1 and build2 have [13:20:52] ottomata: yet when I do aptitude search libcidr [13:20:56] ottomata: or aptitude search libanon [13:20:59] ottomata: I don't get any results [13:21:14] ottomata: but build1 and build2 both have entries for libcidr and libanon on them [13:21:28] ottomata: I must be missing something, but I don't know what [13:21:38] how about in /etc/apt/sources.list.d/wikimedia.list [13:21:39] ? [13:21:46] oh ? [13:22:42] ottomata: just a moment, trying [13:23:04] W: GPG error: http://apt.wikimedia.org precise-wikimedia Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 09DBD9F93F6CD44A [13:23:36] ottomata: do I need a key ? I see there's a wikimedia.key file in /etc/apt [13:23:42] ottomata: do I need that key ? [13:23:59] bwer, usually that doesn't keep you from installing things [13:24:01] just gives you an error [13:24:19] lemme fix my labs login stuff... [13:24:21] ottomata: I just hit aptitude search again and libcidr and libanon aren't there [13:24:28] ok [13:28:34] i'm not sure why I can't log into labs, but in the meantime, you can always manually download the .debs from the link I gave you and dpkg -i them [13:29:36] ok [13:36:23] psshhhh why can't I get into labs anymore!? [13:46:57] ottomata: erm, I can login to it [13:47:09] ottomata: I'm logged in right now into both build1 and 2 [13:47:23] mornin [13:47:48] ottomata: i can still get into reportcard2 [13:47:50] yeah, i can't even get into bastion.wmflabs.org [13:47:52] i tried just this moment. [13:48:25] debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa-wmf [13:48:25] debug1: Server accepts key: pkalg ssh-rsa blen 279 [13:48:25] Connection closed by 208.80.153.207 [13:48:46] if you can get into bastion [13:48:48] can you ls [13:49:06] ls /public/keys/otto/.ssh [13:49:37] dsc@bastion1:~$ ls /public/keys/otto/.ssh [13:49:37] ls: cannot access /public/keys/otto/.ssh: Permission denied [13:52:02] hmk [13:52:09] /public/keys is shared to build1 right ? [13:52:10] root@i-000002b3:/public/keys/otto# ls .ssh/ [13:52:11] authorized_keys [13:52:19] ja i guess so [13:52:26] but I can't even talk to build1 [13:52:26] heh [13:52:31] cat authorized_keys [13:52:33] heh. [13:52:33] if you can [13:52:39] I can talk to it [13:52:42] apparently i can't sudo there, and will be reported :) [13:52:49] ha, on bastion no [13:53:07] hey drdee [13:53:10] ottomata: pm-ed you with the output [13:53:27] can't cat -- perms again [13:54:07] aye [13:54:38] morning guys [13:54:54] drdee: I can git review the new switch -t [13:54:57] drdee: but the tests are failing [13:55:02] drdee: should I still git review ? [13:55:13] drdee: or work on getting the tests up to par and then git review ? [13:55:19] we first need to fix the -p stuff [13:55:23] yes that's better [13:56:11] average_drifter, I dunno what's wrong, I'll ask for help in #wikimedia-labs later, when ryan lane is online [13:56:21] for now i'd just say install the .debs you need directly [13:57:27] ottomata: if you try `ssh -vi ~/.ssh/id_rsa-wmf otto@10.4.0.54` [13:57:29] does that work?
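[Editor's note] The NO_PUBKEY warning above just means apt cannot verify the repo signature; importing the key whose ID appears in the error message silences it, with ottomata's manual .deb install as the fallback. A rough sketch; the package URL and version in the fallback are hypothetical:

```sh
# Import the apt.wikimedia.org signing key named in the GPG error,
# then refresh the index and search again. Note build1/build2 are
# amd64, so a 32-bit VM won't see amd64-only packages regardless.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 09DBD9F93F6CD44A
sudo apt-get update
aptitude search libcidr
aptitude search libanon

# Fallback: fetch and install the .debs directly (hypothetical URL/version).
wget http://example.org/pool/libcidr0_1.0-1_amd64.deb
sudo dpkg -i libcidr0_1.0-1_amd64.deb
```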
[13:57:49] er [13:57:55] obv that's a private address [13:58:16] my machine believes bastion is 208.80.153.207 [13:58:27] ja me too [13:58:32] mk [13:58:40] check the perms on your key? [13:58:50] on my local? [13:58:53] i can log into other machines fine with it [13:58:59] huh [14:00:37] morning dschoon, ottomata, milimetric [14:00:45] morning drdizzle [14:00:47] morning [14:00:48] dschoon, early bird! [14:00:53] indeed [14:00:53] oh yeah! [14:00:57] sleep is hard. [14:01:07] big data hard? [14:01:14] hm. maybe. [14:01:28] hey dee :) [14:02:55] ottomata: ok got things working [14:03:08] ottomata: I think there are only packages for 64-bit [14:03:15] ottomata: that was why I didn't see it [14:03:35] i have to pick up a bed to accommodate anna's mom who arrives this weekend, so i am back in 90-ish minutes [14:06:33] oh average_drifter, if you can help erik z with some git stuff that would also be super cool [14:07:01] the labs machines aren't 64 bit? [14:10:11] drdee: yeah of course [14:10:45] drdee: talked with Erik today, about git, I really see he's all into it reading the git pro book and stuff [14:11:18] drdee: I'm on skype all the time so we can treat every problem as we go [14:11:34] drdee: I understand he's doing a migration right now [14:12:15] ottomata: they are 64 bit, my local vagrant vm was not, but I quickly switched to a ubuntu precise 64 vm locally [14:12:20] ottomata: all fine now [14:12:31] ottomata: btw have you checked out vagrant ? it's awesome [14:12:51] ottomata: http://vagrantup.com [14:13:15] ahhh [14:13:18] cool [14:13:29] i thought you were running aptitude search on build1? [14:13:55] no, on the local vm I've set up [14:14:15] I can package faster on it, and then switch to build1/2 when I get a working package [14:25:04] yeah, I dev locally too [14:25:11] but try to use the same OS and arch as in prod [15:21:18] backy [15:29:18] ottomata, so why doesn't the labs bastion work? [15:29:34] i mean the proxy actually [15:29:41] you can log in to bastion directly [15:29:49] annoying [15:31:36] no i can't [15:31:40] i can't log into bastion.wmflabs.org [15:31:43] i don't know why [15:33:18] that does work for me [15:37:18] drdee, fyi, i am taking down our puppetmaster on an01 [15:37:21] puppet will not work for now [15:37:27] k [16:05:04] ottomata: https://labsconsole.wikimedia.org/wiki/Help:Access#Connection_closed_by_remote_host [16:05:31] didn't you have to do some recovery due to your compy problems recently? [16:05:36] have you logged into labs since then? [16:05:54] could it have changed your local UID/GID which might make something weird happen? [16:06:10] and/or what's the email address on the key? does it match the one you've uploaded? [16:08:50] hm [16:08:54] i don't think i've logged into labs since then [16:09:18] but hm, i did a full restore from a backup though [16:09:30] so anything that was stored on my / should be the same [16:10:08] email address on the key? [16:10:17] this bit? [16:10:17] otto@klein.local [16:10:19] it's the same [16:14:25] ottomata: same as the one on labs? [16:14:40] ja [16:14:44] the key is the same [16:14:52] also: in my experience, a "full" restore won't necessarily preserve UID/GID [16:14:56] it'll preserve the names and all that
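[Editor's note] On the local-VM packaging workflow average_drifter describes ([14:12:15]–[14:25:11]): with Vagrant of that era, a 64-bit Precise box matching the build hosts could be brought up roughly like this. The box URL is the then-standard public one; treat the exact commands as a sketch:

```sh
# Fetch a 64-bit Ubuntu Precise base box and boot a local VM,
# matching the OS/arch of build1/build2 as ottomata recommends.
vagrant box add precise64 http://files.vagrantup.com/precise64.box
vagrant init precise64
vagrant up
vagrant ssh
```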
[16:15:05] your user [16:15:21] like, your default group might have changed or some bullshit [16:15:30] which would cause ssh to fail your key [16:15:40] those messages can often be non-existent or cryptic [16:16:04] if you go into Accounts and right-click your user, there's an "advanced" dialog [16:16:12] where you can see the actual nuts and bolts without using dscl [16:17:58] accounts on os x? [16:17:59] dscl? [16:20:19] ottomata, why don't you run ssh with -vvv and paste the output in a gist [16:21:37] https://gist.github.com/d7b8c9c8d85adfb91cb9 [16:22:31] man dscl :) [16:23:48] oh shit [16:23:49] i bet i know. [16:23:57] hay ottomata -- run `ssh -V` [16:24:18] dsc ~ ❥ ssh -V [16:24:18] OpenSSH_5.9p1, OpenSSL 0.9.8r 8 Feb 2011 [16:24:21] drdee , ottomata can I haz buildbot ? [16:24:24] drdee: please review https://gerrit.wikimedia.org/r/27729 [16:24:36] OpenSSH_5.9p1, OpenSSL 0.9.8r 8 Feb 2011 [16:25:09] drdee: now looking at that --debug you told me about yesterday [16:25:16] ok [16:25:34] drat. [16:25:41] next! [16:25:44] line 12 is suspicious [16:25:46] debug3: Could not load "/Users/otto/.ssh/id_rsa-wmf" as a RSA1 public key [16:25:58] hmm that is [16:25:58] 21:29 yup [16:26:02] 21:30 so i rather have ./collector --debug then [16:26:03] didn't notice that before [16:26:04] 21:30 ./configure DEBUG=1 [16:26:23] try ssh -2 [16:27:02] i was already googling for line 11 [16:27:06] the key should specify though. [16:27:43] and yes, ottomata, the accounts pane in OSX, if you right-click on your entry at the left, you can see default GID [16:27:47] ah nope [16:27:48] All that this means is that your id_rsa file is not an RSA1 public key, which is a good thing since RSA1 public keys are only used for protocol version 1 of SSH and are mostly a thing of the past. So this is really not something to worry about. [16:27:57] heh [16:28:05] i'm mountain lion [16:28:10] I don't have 'accounts' I have Users & Groups [16:28:13] and right click doesn't do anything [16:28:29] oh, have to unlock... [16:28:42] ok, but ssh doesn't use that stuff, does it? [16:28:53] i see UUID [16:33:07] it doesn't show default GID? [16:33:38] average_drifter: changeset merged [16:35:58] nope UUID [16:40:42] drdee: thanks :) [16:40:45] average_drifter, better to replace strcpy with strncpy [16:40:57] drdee: coming up in the next git review [16:41:07] drdee: but can I ask about the --debug for collector [16:41:33] drdee: I'm looking over what we talked about yesterday, I pasted above [16:41:46] drdee: so the collector should have a --debug ? [16:41:52] drdee: what should it print in that [16:41:54] drdee: ? [16:42:33] oh yeah there are some #if DEBUG in the code of collector.c [16:42:49] so instead of using the DEBUG=1 from configure we just add a switch to the collector [16:49:49] yes [16:50:13] and the printf statements should only be effective when the --debug option is turned on [16:50:32] I was able to build the udp-filters .deb package locally [16:50:43] inside a ubuntu precise 64bit vm [16:50:48] but only as root [16:50:48] but about priorities, has the -p issue in udp-filters been fixed?
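[Editor's note] A compact version of the key-debugging checklist scattered through the exchange above ([16:05:04]–[16:28:42]): verify local key permissions, compare the fingerprint against the uploaded key, and capture verbose client output. The host and key path come from the log; the rest is a sketch (and note the "Could not load as RSA1" line is harmless, per the pasted explanation):

```sh
# The private key must be readable only by you, or ssh may refuse it.
ls -l ~/.ssh/id_rsa-wmf              # expect -rw------- (chmod 600 if not)

# Fingerprint of the public half, to compare with what labs has on file.
ssh-keygen -lf ~/.ssh/id_rsa-wmf.pub

# Force exactly this identity and log the full negotiation.
ssh -vvv -o IdentitiesOnly=yes -i ~/.ssh/id_rsa-wmf \
    otto@bastion.wmflabs.org 2> ssh-debug.log
```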
[16:50:57] that's priority 1 [16:51:01] because as a normal user, I was getting some weird permission denied errors [16:51:02] the --debug is priority 2 [16:51:12] ok, the -p issue [16:51:19] you shouldn't build debian packages as root [16:51:25] uhm, I have to re-read the backlog on that [16:51:51] I know, I'm in a vm locally so that's not an issue but the weird thing is [16:51:59] it was trying to write manpages to /usr [16:52:03] while building the package [16:52:05] and I'm not sure why [16:55:57] ok, so I prepended some stuff to the debianize.sh script: rm -f configure Makefile ; aclocal ; autoreconf ; autoconf; automake ; [16:56:03] and now the debianization works fine [16:56:11] that's coming up in my next git review as well [16:57:57] k [16:58:44] average_drifter: and if the --debug is turned on then the collector should dump the db every minute, else every hour [16:59:06] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [17:22:58] drdee: I've read the blog post about search logs. Have you considered filtering the logs by only showing queries that appeared, say, 100 times? Or just the top 10000 queries if that's too risky? [17:24:20] yeah, we are considering that [17:24:29] but right now a bit swamped with other stuff [17:24:42] i see [17:24:46] aggregation is really the only good answer, but it's unclear if the risk is worth the reward [17:24:51] aggregation is really the only good answer, but it's unclear if the risk is worth the reward [17:24:53] er [17:25:01] iPad sucks [17:25:06] the rewards are huuuuugue :D [17:25:17] a wiki zeitgeist [17:25:24] the g+ app decided it wasn't going to work just before I wanted to join the meeting (so I switched to my desktop) [17:25:40] ottomata, you wanna look at my pig script? [17:26:01] drdee: you guys were talking about Go and FFI [17:26:04] ja sure, i'm super hungry, but will put off my satiation just for you [17:26:12] an01:/home/diederik/status_count.pig [17:26:26] drdee: so you want to write some kind of C/C++ bindings for Go so that you can use it as a module for .. something [17:26:32] average_drifter: yes we were [17:26:38] but I got in so late that I didn't get the whole context [17:26:47] we were talking about the kafka go producer [17:27:10] we are discussing how to stream all traffic data realtime into the analytics cluster [17:27:28] and we are looking into kafka as one solution [17:27:58] ottomata, so i think that version actually compiles but has no results [17:28:28] hmmm, i don't think you can group a bag by a field in a different bag [17:28:44] how would you do it? [17:29:36] average_drifter: going back to udp-filter, have you been able to replicate the -p problem? [17:30:01] you should probably include the http_status in the first bag [17:30:19] in the site bag? [17:30:25] or whatever, hangon [17:31:00] yeah, cgo [17:31:01] um [17:31:05] i think RegexExtract [17:31:12] extracts matches from the regex [17:31:12] and you `import "C"`, which i find kind of hilarious [17:31:14] but you aren't matching anything [17:31:23] no parens in your regex [17:31:27] drdee: I need to re-read the backlog [17:31:30] drdee: erm, last night [17:31:32] FLATTEN (RegexExtract(uri, '\\.m\\.', 1)) as site:chararray; [17:31:38] drdee: we decided that it wasn't a udp-filter bug [17:31:40] what do you want to extract? [17:32:03] drdee: what should I do to replicate it ?
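[Editor's note] On the collector --debug flag discussed above ([16:41:07]–[16:58:44]): the idea is to replace the compile-time `#if DEBUG` blocks with a runtime switch, so the debug printfs and the faster dump interval only activate when --debug is passed. A minimal sketch, not the real collector.c; all names here are assumptions:

```c
#include <stdio.h>
#include <string.h>

/* Runtime debug flag, replacing "./configure DEBUG=1" / #if DEBUG. */
static int debug = 0;

/* With --debug the collector dumps its db every minute, else hourly,
 * per drdee's spec at [16:58:44]. */
static int dump_interval(void) { return debug ? 60 : 3600; }

int main(int argc, char **argv) {
    int i;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--debug") == 0)
            debug = 1;
    }
    if (debug)
        fprintf(stderr, "collector: debug on, dumping every %d s\n",
                dump_interval());
    /* ... collector main loop would run here ... */
    return 0;
}
```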
[17:32:04] yeah, drdee, that was a problem with log2udp relay prepending a seq number [17:32:14] aruggh two discussions :) [17:32:26] 01:44 hey, there is a bug with the -p option in udp-filter [17:32:27] 01:44 version 0.2.6 does work [17:32:28] 01:45 while 0.3.14 doesn't [17:32:30] hehe [17:32:33] pig: so i need a breakdown of status codes by mobile vs non-mobile site [17:32:47] drdee: so basically -p works on 0.2.6 but not in 0.3.14 [17:32:50] average_drifter: the -p issue is independent of the seq issue [17:33:02] filtering on paths doesn't work right now in version 0.3.14 [17:33:48] ottomata, so that FLATTEN should extract '.m.' so we know whether it's a mobile site or not [17:34:00] ok, i think you need parens around your regex [17:34:02] not sure , but i think so [17:34:12] also, you can keep the http_status in the same bag with the site [17:34:20] how? [17:34:22] maybe this? [17:34:23] SITE = FOREACH LOG_FIELDS GENERATE FLATTEN (RegexExtract(uri, '(\\.m\\.)', 1)) as site:chararray, FLATTEN (RegexExtract(http_status, '.*(\\d\\d\\d).*', 1)) as status:chararray; [17:35:36] ottomata, what is the command to sample? [17:35:51] ? [17:35:54] oh from relay? [17:36:01] no within pig [17:36:05] sample? [17:36:32] job runs but it fails [17:40:34] ok, gotcha [17:41:14] https://gist.github.com/3880457 [17:41:48] super duper ty [17:42:18] ja so, you don't need to extract the .m. part, all you want to know is if .m. is in the uri [17:42:24] right [17:43:00] so from LOG_FIELDS, generate a bag containing {canonical, status} [17:43:03] then group and count [17:43:22] and how can you run an additional reducer so you have 1 final output file? [17:43:36] instead of all the parts? [17:43:38] why not just do [17:43:52] drdee: are you using udp-filters in conjunction with pig and kafka ? [17:43:53] hadoop fs -cat /path/to/part* | sort > outfile.txt [17:44:20] average_drifter: no, we just use udp-filters as a temp solution to collect data [17:44:41] drdee , ottomata please post a dataflow diagram when you have time about how all this stuff fits together (I have a narrow view of it atm which is good because I'm focusing on what I'm doing but I'd be curious to find out more) [17:44:45] average_drifter: https://www.mediawiki.org/wiki/Analytics/Kraken [17:44:51] ottomata, but that doesn't give total counts [17:45:00] that just concatenates the files [17:45:00] total counts? [17:45:04] ? [17:45:24] yes you get the same keys [17:45:28] do you want to have group by canonical,status, or a total count? [17:45:38] oh, you shouldn't have duplicate keys across files [17:45:46] the same keys go to the same reducer [17:45:49] right? [17:45:49] oh right [17:45:51] okay [17:45:51] got it [17:45:53] yes [17:46:32] average_drifter: the dataflow diagram is https://upload.wikimedia.org/wikipedia/mediawiki/3/38/Kraken_flow_diagram.png [17:47:18] oh btw, re kafka go stuff, in my chat with Robla yesterday, I explained that bit, and explained why we were disappointed in the C stuff [17:47:26] well, C++ stuff [17:47:31] he and i also discussed it. [17:47:35] but he wasn't opposed to us implementing zookeeper there [17:49:05] ottomata: chat available anywhere ? [17:53:43] brb heading into office [17:57:10] drdee: trying to replicate [17:57:18] drdee: I did a 0.2.6 checkout [17:57:47] drdee: I used the logfile I got from ottomata last night and did cat main.log | ./udp-filter -p "/wiki/Main_Page" [17:57:52] and I didn't get anything in 0.2.6 [17:58:11] there was no output for what I ran just above [18:03:38] chat?
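[Editor's note] The Pig thread above ([17:32:33]–[17:45:53]) converges on ottomata's suggestion: tag each request as mobile or desktop, keep the status code in the same bag, then group by both and count. A sketch of that shape; field names are assumptions and this is not the real an01:/home/diederik/status_count.pig:

```pig
-- Assumes LOG_FIELDS already holds parsed log lines with uri and
-- http_status fields.
SITE = FOREACH LOG_FIELDS GENERATE
    ((uri MATCHES '.*\\.m\\..*') ? 'mobile' : 'desktop') AS site:chararray,
    REGEX_EXTRACT(http_status, '.*?(\\d\\d\\d).*', 1)    AS status:chararray;

GROUPED = GROUP SITE BY (site, status);
COUNTS  = FOREACH GROUPED GENERATE FLATTEN(group), COUNT(SITE) AS hits;

-- Expected output shape (see [19:37:37]): (desktop,200,157) etc.
STORE COUNTS INTO '/tmp/status_by_site';
```

To merge the reducer part-files into one sorted output, ottomata's one-liner from [17:43:53] applies: `hadoop fs -cat /path/to/part* | sort > outfile.txt`.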
naw it was our weekly 1on1 check-in meeting [18:03:49] running out to get food and run an errand, be back in a bit [18:10:45] average_drifter: don't use main.log [18:10:52] or if you use that one add this: [18:10:53] 23:31 < average_drifter> drdee: please tell me how to replicate the bug [18:10:56] 23:31 < drdee> just run [18:10:59] 23:31 < drdee> udp-filter -d en.wikipedia.org and it should capture all url's with the domain en.wikipedia.org [18:11:04] this was from yesterday [18:11:11] so it was -d and not -p ? [18:11:12] yes that's true if you use the right input file [18:11:16] no it's -p [18:11:24] ok, what input file should I use ? [18:11:33] if you use main.log add this after cat [18:11:33] cut -d ' ' -f2-15 | [18:11:44] this will drop the sequence number [18:11:59] if you use example.log from udp-filters source code then you don't need the cut command [18:12:01] ok, and then -p with what argument ? [18:12:05] yes [18:14:04] ok, I can use the example.log [18:14:41] ok got it [18:14:58] user@garage:~/wikistats/udp-filters$ cat example.log | ./udp-filter -p "/wiki/Main_Page" [18:15:01] sq18.wikimedia.org 1715898 1169499304.066 0 216.38.130.161 TCP_MEM_HIT/200 13208 GET http://en.wikipedia.org/wiki/Main_Page NONE/- text/html - - Mozilla/4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20.NET%20CLR%201.1.4322) [18:15:05] user@garage:~/wikistats/udp-filters$ [18:15:05] drdee: so 0.2.6 is outputting this [18:15:15] now I'm going back to 0.3.14 to see what's going on there [18:15:27] ok [18:39:05] average_drifter, been able to replicate the bug? [18:40:31] drdee: yes [18:40:34] drdee: bug confirmed [18:40:53] ok [18:49:42] on my way [19:15:17] hey hoooo [19:15:42] haay gurl [19:21:54] i'm back in labs [19:22:01] apparently since I'm in ops now I have to use a different bastion! [19:23:34] obviously [19:24:15] lol [19:24:25] is that bastion-private? [19:25:04] i like how it's like, "now that you're special, we've moved you to the Executive Suite" ...which is half the size, drafty, and facing a wall [19:25:21] bastion-restricted! [19:25:41] right! [19:25:50] milimetric: you continued playing with git flow? [19:25:52] bastion-biohazard [19:26:18] i'm using it but not very actively since i've been on the same feature branch for a couple of weeks [19:26:25] drdee^ [19:26:41] your test stuff just got merged [19:26:52] https://gerrit.wikimedia.org/r/26476 [19:31:38] ottomata, i am getting weird http status codes in the pig script [19:31:51] basically every number between 0 and 999 [19:32:46] hmmmm [19:33:10] ok one sec.. [19:35:19] I'm confused drdee [19:35:34] milimetric…. about the merge? yeah me too [19:35:39] I can just push to the reportcard-data repo? [19:35:45] that's why it merged, I just did a push [19:35:46] yes [19:35:54] oh [19:36:00] i was changing test stuff to do with limn [19:36:13] uh, that makes Gerrit sort of useless doesn't it? [19:36:35] like, I shouldn't be allowed to circumvent it [19:37:36] weird, drdee, it is looking pretty normal to me on my small sample: [19:37:37] (desktop,302,3) [19:37:37] (desktop,304,35) [19:37:37] (desktop,200,157) [19:37:37] (desktop,404,3) [19:37:37] (desktop,301,2) [19:38:47] ottomata, https://gist.github.com/1234e25fd0e2722e7d25 [19:42:35] which pig script are you running? [19:43:00] home/diederik/status_count.pig [19:51:20] i made a small fix to the mobile regex stuff [19:51:50] maybe there are some lines with an extra space somewhere...
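[Editor's note] The -p repro recipe from [18:10:45]–[18:15:15], condensed: main.log carries a log2udp sequence number as its first field, which must be cut away before udp-filter sees the line, while example.log from the udp-filters source needs no preprocessing:

```sh
# main.log: drop the prepended sequence number (field 1) first.
cat main.log | cut -d ' ' -f2-15 | ./udp-filter -p "/wiki/Main_Page"

# example.log ships in the udp-filters source in the expected format.
cat example.log | ./udp-filter -p "/wiki/Main_Page"
```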
[19:55:07] drdee: was wondering if there had been any headway with the science project to determine banner load times on enwiki main page? [19:55:25] yes we are working hard on it, fixing a bug in udp-filter [19:56:08] no in 0.3.14 [19:56:28] ah; nifty! :) [19:57:17] average_drifter: can you give a status update on the -p issue? [20:00:39] hey I'm hanging out with Rob [20:00:40] https://plus.google.com/hangouts/_/4d17c1bee0c30e050921f0c1d83773c393267237 [20:01:58] drdee: yea, workin on it [20:02:10] drdee: debugging, gdb, print statements [20:02:30] k [20:03:46] drdee, dschoon, ottomata - are we having happy demo friday? [20:03:55] 'cause we're in the hangout [20:04:04] coming [20:05:25] erosen, where you at? [20:05:31] oh yeahhhhh! [20:22:10] drdee, what did you change in the regex? [20:23:05] drdee: [20:23:17] drdee: I'm having some doubts if I can confirm the -p issue [20:23:38] drdee: do you want to do a hangout so I can show you my screen (looks like it's easier than mikogo) [20:23:48] ottomata, in the status code it used to be [20:23:53] drdee: I mean, at first I was able to confirm it, but when trying to reproduce it again I can't [20:24:00] .*(\\d\\d\\d).* [20:24:06] i dropped the final .* [20:24:09] drdee: if we just go to g+ hangout we can have a look together [20:24:19] because that is always a space [20:24:23] average_drifter sure [20:24:28] send me an invite [20:25:23] drdee: invited, does it arrive in your email ? [20:25:48] just paste the link in ir [20:25:49] c [20:25:52] no email yet [20:26:02] https://plus.google.com/hangouts/_/4d17c1bee0c30e050921f0c1d83773c393267237 [20:34:46] mwalker: udp-filter issue fixed [20:34:52] setting up two filters right now [20:38:36] sorry! [20:38:39] got distracted [20:38:39] ottomata, is it possible to run two udp-filter instances on an11? [20:38:46] right :D [20:39:06] yeah sure [20:39:18] from the same stream? [20:39:25] or just in general? [20:39:41] drdee: haha! fantastic :D [20:39:45] from the same stream on an11 [20:40:20] not really [20:40:29] you just want to test it? [20:40:32] it seems that when i launch two instances of udp-filter [20:40:36] the first one gets killed [20:40:47] i just want to run two filters side-by-side for 10 minutes [20:40:51] with netcat? [20:40:54] yeah you can't do that [20:41:02] mmmmm [20:41:11] is there a quick hack possible? [20:41:14] you could set up a udp2log instance [20:41:18] sure! write a socket server :) [20:41:29] ahem, a *quick* hack [20:41:29] and then use your different udp-filters as filters in the config file [20:41:43] why do you need them side by side? [20:41:43] right [20:41:57] can you just dump a buncha data for a few minutes into a file [20:42:05] and then run them each on the file? [20:42:06] because one is capturing hits on main page enwiki [20:42:19] and the other is capturing banners with the referer being main page enwiki [20:42:39] you can't do that in the same filter? [20:42:40] then you can join on ip address and then see the response time [20:42:57] sort of but you get some additional hits as well [20:43:06] so then we have to resort to your negative grep [20:43:13] right, which is bad? [20:43:14] ok okok [20:43:16] will set up udp2log [20:43:18] no not really [20:43:19] gimme a few minutes [20:44:56] k [20:51:12] ottomata, pig job still has weird http status codes [20:51:48] rats!
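[Editor's note] On running two filters side by side ([20:38:39]–[20:43:19]): udp2log fans one UDP stream out to multiple filter processes via its config file, which is why ottomata reaches for it instead of two standalone udp-filter instances. A sketch of what a two-filter /etc/udp2log config might look like; the exact udp-filter arguments and output paths here are assumptions:

```
# /etc/udp2log -- each "pipe <sampling-factor> <command>" line gets its
# own copy of the stream; a factor of 1 means unsampled.
pipe 1 /usr/bin/udp-filter -p /wiki/Main_Page >> /a/log/main_page.log
pipe 1 /usr/bin/udp-filter -p Special:BannerLoader >> /a/log/banner_from_main_page.log
```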
[20:52:12] also the counts seem way too low [20:52:37] not just seem, they are too low [20:53:00] i can reproduce this problem with main.log [20:53:01] aya, sorry about the demo friday fail on my part [20:53:09] when I count it via a script on the cli [20:53:15] i can see the good results [20:53:20] which do not match the results I get in pig [20:53:23] was eating lunch with jessie [20:53:24] how did you change the regex? [20:53:49] drdee: .*(\\d\\d\\d).* [20:53:50] [4:24pm] drdee: i dropped the final .* [20:54:04] but that didn't seem to make any change at all [20:55:07] mmmm [20:55:54] yeah i tried that too :p [20:55:55] heh [20:56:16] totally weird, i had thought it was something to do with the grouping, but the values are bad at the regex extraction [20:56:24] i even tried using the built-in extractor rather than piggybank [20:56:25] same results [20:56:26] weird [20:56:27] um [20:56:32] should I work on that or udp2log on an11? [20:56:42] let's do udp2log first [20:56:44] k [20:59:01] ok drdee [20:59:11] don't bother with the udp2log init script [20:59:13] just run it manually [20:59:15] you can edit [20:59:20] /etc/udp2log [20:59:25] and add whatever filters you want there [20:59:36] then just run [20:59:37] udp2log [20:59:44] and it will start up and load up your filters and do its thang [20:59:47] cool! [20:59:51] you can ctrl-c udp2log when you want it to stop [21:00:34] so no sequence number dropping? [21:00:52] just pipe | udp_filter blbalbla? [21:02:12] can you quickly verify /etc/udp2log? [21:02:43] average_drifter: debianize.sh changeset approved [21:03:49] drdee: https://gist.github.com/3881509 [21:04:05] when you asked about teeing a UDP stream [21:04:14] drdee: thanks [21:04:16] i vaguely remembered it was easy in node [21:04:21] i *think* that does it. [21:04:51] ottomata: ^^ curious if that works [21:07:23] ottomata, is it adding the sequence number again? [21:07:55] ummmmm, yes [21:08:04] seq number is added by log2udp relay from oxygen [21:08:50] hmm, drdee, i just realized that the pig stuff wasn't working for me because I was running on my main.log example [21:09:03] :) [21:09:30] and it has seq numbers [21:09:34] so my regex was working on ipaddy [21:09:37] is that happening to you? [21:09:49] no because i am using the sampled log files [21:10:01] hhmmm yeah when I did a sampled file I didn't have the problem [21:11:17] something totally different, but i have been seeing this user agent Twisted%20PageGetter hitting the enwiki homepage multiple times per second for at least 24 hours [21:11:49] ok, try this one [21:12:09] mwalker, filters are running [21:12:21] https://gist.github.com/3881563 [21:12:41] you also dropped the ',1' ? [21:12:55] different regex func [21:13:00] we don't need piggybank for this [21:13:08] k [21:16:25] mwalker, got 2 files ready for you [21:16:29] put them on fenari? [21:19:46] mwalker: main_page.log banner_from_main_page.log on fenari in my home folder [21:19:57] Jeff_Green ^^ also relevant for you [21:25:19] mwalker, Jeff_Green: ping [21:26:19] drdee: https://gerrit.wikimedia.org/r/27823 [21:26:23] drdee: debug flag for collector [21:26:36] sweet sweet [21:27:11] I am going to put a turkey in the oven now, I will be back in about 25m [21:27:37] then I can hack on the Perl scripts while the turkey does its thing in the oven [21:27:49] that man can do way too many things at once. [21:32:44] average_drifter: review ready: https://gerrit.wikimedia.org/r/#/c/27823/ [21:33:01] you are cooking at 11:30 PM? [21:33:49] it's weird I know..
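[Editor's note] dschoon's node gist for teeing a UDP stream ([21:03:49]) isn't reproduced in the log. The usual shape of such a tee is a dgram socket that re-sends each datagram to every downstream consumer; this sketch is a guess at the approach, not the gist's contents, and the ports/hosts are made up:

```js
// Hypothetical UDP tee: listen on one port, relay every datagram
// unchanged to each configured target.
var dgram = require('dgram');

var LISTEN_PORT = 8420;
var targets = [
  { host: '127.0.0.1', port: 8421 },
  { host: '127.0.0.1', port: 8422 }
];

var sock = dgram.createSocket('udp4');

sock.on('message', function (msg, rinfo) {
  targets.forEach(function (t) {
    // Re-emit the raw datagram to each consumer.
    sock.send(msg, 0, msg.length, t.port, t.host);
  });
});

sock.bind(LISTEN_PORT);
```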
[21:34:15] got fed up with ordering .. so I just put it in the oven, low heat, return in 3h and it's done [21:34:26] man, let's call it a day, go out drink beer [21:34:29] etc [21:34:46] as always, I agree with drdee [21:45:54] drdee: ok, fixed the problems [21:46:21] https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-snc7/417249_527435447283420_307228456_n.jpg <== this is not a bad beer [21:46:38] it isn't the best, kind of average.. like me :) [21:46:44] is that the bottle you are drinking now? [21:47:04] drdee: yes it is [21:47:12] :D [21:52:23] merged https://gerrit.wikimedia.org/r/#/c/27823/ [21:54:36] drdee, did that pig script work better? [21:55:06] nope [21:55:14] and it broke the mobile stuff again [21:55:35] drdee: pong! [21:55:39] '\\.m\\.' ==> .*('\\.m\\.') [21:55:59] mwalker files are ready on fenari [21:56:43] ottomata ^^ [21:57:07] huh? [21:57:36] oh probably need the stars [21:57:39] yup [21:57:40] drdee /home/diederik/main_page.log and /home/diederik/banner_from_main_page.log ? [21:57:43] but not the parens [21:57:50] mwalker: yes [21:57:54] did you still have the crazy status codes though? [21:57:58] yes [21:57:59] awesome possums! thank you so much [21:58:33] made it is an issue with lines having more / fewer than 14 lines [21:58:37] i mean fields [21:58:46] stupid autocorrect [21:58:50] made == maybe [21:59:36] drdee, was that for me? [21:59:45] yes [21:59:50] i was trying to say [21:59:59] 'maybe it is an issue with lines having more / fewer than 14 fields' [22:01:24] hmmmmmmmmmmmmm could be, that is a lot of lines though [22:01:31] more shouldn't be a problem [22:01:46] have we reproduced this problem with a smaller sample set? [22:01:56] no [22:01:58] i ran this on a day of sampled logs and the http statuses looked good [22:02:07] me too looked good as well [22:02:13] so maybe there is just one file that is causing issues [22:02:34] hm [22:02:44] how about not grepping the status code [22:02:54] but just output the entire status field [22:03:04] you will get a couple of more keys [22:03:08] sure, you can try that [22:03:12] would be interesting [22:03:13] because of the diff between varnish and squid [22:03:20] i tried that on smaller sample sets and it was fine [22:03:25] but then we know for sure whether it's a regex issue or a line issue [22:03:58] yah maybe, i'm heading out pretty soon [22:04:00] might as well try it though [22:05:12] ok running sample job now [22:06:54] sample job looks good, now running real job [22:07:35] if it's a line issue then we should ramp up the tab delimiter conversion project [22:10:05] you guys are going to be testing the changes right? with the wikistats stuff? [22:10:13] making sure the changed format works? [22:21:48] dschoon, milimetric: any progress on the datasource tag idea for limn? [22:22:03] i assume you've both been busy with other things, but just curious [22:22:04] haven't worked on it at all, erosen. been doing the d3 stuff. [22:22:06] yeah. [22:22:11] but it's def on a list [22:22:14] cool [22:22:24] the more you pester, the more likely it happens in a timely manner :) [22:22:28] yup [22:22:30] so no offense taken! [22:22:34] cool [22:22:43] while i'm at it... [22:22:50] remember we talked about the sort of datasource [22:23:02] this won't matter if tags are implemented, maybe [22:23:21] but i remember we looked into the way it worked and decided that it had to do with the file names [22:23:34] is that what you recall? [22:24:25] sort -> sorting [22:29:04] dschoon: sorry to pester but ^^ [22:29:19] ah yes.
[22:29:36] because the filenames were not the same as the IDs or the presentation names [22:29:46] so the datasource sort is stable for filenames [22:29:55] oh, and IDs [22:30:14] because it's var/data/datasources/rc/$ID.json [22:30:24] i guess my issue is that it doesn't seem to be sorting on file name or presentation name [22:30:28] i'll look at the file now [22:30:29] one sec [22:32:02] dschoon: which site is this in? [22:32:18] hmm. [22:32:21] i'll come over. [22:32:30] appreciate it [22:41:31] ottomata, found the problem with the pig script [22:43:02] apparently the status_code field and the response time field get tangled up sometimes [22:43:22] so then it grabs the first 3 digits from the response time [22:43:27] ughh [22:43:37] nice 'feature' for next week [22:52:21] drdee, busy? Do you have anything I can work on this weekend? [22:53:22] yes, there is one thing [22:53:51] it would be super super super awesome if you could write in java a pig UDF that would determine whether an ip address is IP4 or IP6 [22:54:13] ideally, put this in package org.wikimedia.analytics.kraken or something similar [22:54:29] that's the plan! [22:54:39] ok I'm vaguely familiar with ipv4 vs ipv6, can I learn that in time? [22:54:44] to make the UDF [22:55:06] so if i supply 127.0.0.1 it should return IP4 or 4 or something [22:55:23] don't have to explain it now [22:55:29] if i supply 2001:0db8:85a3:0042:0000:8a2e:0370:7334 then it should return ip6 [22:55:36] ok [22:55:41] that's all you have to know regarding the ip stuff [22:55:50] just read some wiki pages ;) about the formats [22:55:55] I see [22:56:08] most work will be to get a dev env with the pig source [22:56:44] ok [22:56:49] Can I use ottomata's repo? [22:56:59] on github [22:57:02] use wmf-analytics [22:57:06] and make a new repo [22:57:12] ok [22:57:20] what is your github user account? [22:57:24] louisdang [22:57:24] i will add you to the group [22:57:28] ok [22:57:44] i gotta go quickly but email me if you have any questions [22:57:51] ok [22:58:30] see you later [22:58:41] and have a good weekend [22:58:42] look at libcidr for notational support [22:58:51] (as inspiration) [23:00:09] average_drifter, DarTar, dschoon, erosen, robla, ori-l, walker, louisdang, all have an amazing weekend! [23:00:28] milimetric as well of course! [23:00:43] drdee: likewise to you! good weekend [23:06:05] drdee: you too!
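[Editor's note] The UDF drdee requests at [22:53:51] boils down to a one-field EvalFunc. A sketch under the package name he suggests; the class name and the simple string heuristics are assumptions, not whatever louisdang ended up writing (libcidr, as he notes, is the reference for fuller notation support):

```java
package org.wikimedia.analytics.kraken;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Classifies an IP address string as "ipv4" or "ipv6".
 * Hypothetical sketch of the requested UDF.
 */
public class IpVersion extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String ip = input.get(0).toString().trim();
        // Colons only occur in IPv6 text form, e.g. 2001:0db8::8a2e.
        if (ip.indexOf(':') >= 0) {
            return "ipv6";
        }
        // Dotted quad: four groups of 1-3 digits, e.g. 127.0.0.1.
        if (ip.matches("(\\d{1,3}\\.){3}\\d{1,3}")) {
            return "ipv4";
        }
        return null; // unrecognized format
    }
}
```

In Pig it would be registered and invoked roughly as: `REGISTER kraken-udfs.jar; DEFINE ipVersion org.wikimedia.analytics.kraken.IpVersion(); V = FOREACH LOG GENERATE ipVersion(ip);` (jar name hypothetical).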