[00:15:04] heading home a bit early -- more housemate interviews tonight, but i'll be online and working afterward.
[09:34:24] morning
[09:34:30] got some problems connecting to stat1
[09:34:57] ok got it
[12:34:41] good morning analytics :)
[12:44:19] morning milimetric
[12:44:41] got a friend who's using command-T
[12:44:41] morning average
[12:44:49] saw you were talking about command-T yesterday
[12:44:50] oh yea, great plugin
[12:45:09] I think it's written using Ruby and has some C++ bindings as well
[12:45:17] the only advice I have is be careful installing it because it'll significantly change your life :)
[12:45:23] I use NerdTREE and ack for locating files and opening them and stuff
[12:45:36] really ? for the better ?
[12:45:36] yep, I use NerdTREE too, that's awesome as well
[12:46:05] well, yea for the better except it'll make you stop caring where stuff is so you have to be careful
[12:46:19] I wish command-T had a grep/ack functionality
[12:46:33] like, you can find /www/something/blah/what.co by typing wwwblhwco
[12:46:40] that's the first misunderstanding I had about it. I thought it was searching in files, whereas it was actually searching in filenames..
[12:46:56] yeah, grep would be cool but you can bind :grep to something like \r
[12:47:44] would it show the results in command-T ?
[12:47:50] nono, I mean separately
[12:47:58] command t is pretty specialized at file finding
[12:48:31] it's just really fast so it helps save a lot of time actually
[12:50:35] does it find files containing stuff ?
[12:50:51] like I give it a search pattern, can it find files containing a string of bytes inside them(not in their names)
[13:09:14] average_drifter no, command t only searches filenames. It's mapped to \t so I was saying you can map \r to :grep and search in files that way.
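[Editor's aside] The Command-T behavior described above (typing `wwwblhwco` to find `/www/something/blah/what.co`) is ordered-subsequence matching over filenames. A minimal sketch of that idea in Python follows; this is not Command-T's actual Ruby/C implementation, just an illustration of the matching rule, and `fuzzy_match` is an invented name:

```python
def fuzzy_match(query, path):
    """True if the characters of `query` appear in `path` in order
    (not necessarily adjacently), ignoring case."""
    it = iter(path.lower())
    # `ch in it` advances the iterator until it finds ch (or runs out),
    # which is exactly the ordered-subsequence test.
    return all(ch in it for ch in query.lower())

paths = [
    "/www/something/blah/what.co",
    "/var/log/syslog",
]
matches = [p for p in paths if fuzzy_match("wwwblhwco", p)]
# matches contains only "/www/something/blah/what.co"
```

Real fuzzy finders also rank candidates (shorter paths and consecutive-character runs score higher); the subsequence test above is only the filtering step.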
I can send you a vimrc line if you want
[13:45:38] hey drdee
[13:45:56] hiya
[13:57:55] hey ottomata, milimetric
[13:58:02] howdy
[13:58:38] merrnin
[14:09:16] average_drifter: time to merge wikstats stuff?
[14:10:37] drdee: not yet, workin on it
[14:13:15] anyone know why grep --max-count=10 still prints over 10 lines?
[15:30:28] hey drdee
[15:30:44] hey louisdang
[15:31:50] I'm ready to work on that test data
[15:32:01] cool
[15:32:03] 1 seec
[16:00:49] drdee: I have a question, would it be possible for me to get one of the .gz in /a/squid/archive/sampled and anonymize it with udp-filters(perhaps also removing some other filters) and use it locally ?
[16:01:16] s/filters/data/
[16:02:05] you already have access to stat1, right?
[16:02:19] yes
[16:05:06] drdee: yes
[16:06:33] so……you can do that
[16:07:36] ok
[16:58:04] morning all
[16:58:18] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:01:05] ottomata, erosen ^^
[17:36:59] brb a sec
[17:46:27] k
[17:51:35] hey ottomata, any idea why proxy traffic is so slow?
[18:01:10] hmmm, nope
[18:01:21] it is though, hm
[18:12:43] Do we have a gig or so of reqlogs somewhere?
[18:12:50] drdee, ottomata ^^
[18:13:07] I don't really care about their content or timeframe. Just that they're normal web requests
[18:13:11] yes
[18:13:15] ...where?
[18:13:15] both on stat1 and on hdfs
[18:13:39] stat1:/a/squid/archive/sampled
[18:13:46] hdfs:/user/otto/logs (i belief)
[18:13:49] great.
[18:13:56] thanks
[18:14:18] i'm going to stuff them into avro records and see how much space is saved
[18:14:21] hdfs:/user/otto/logs/sampled-1000
[18:14:22] i think
[18:14:45] ah on
[18:14:46] no
[18:14:49] /user/otto/logs/sampled
[18:14:51] that's them
[18:14:54] great.
[18:15:01] thanks
[18:16:21] ottomata: did you kill the vpn?
[18:16:29] or at least, the part that lets me access an01 via vpn?
[18:16:53] something is wonky with the network
[18:16:56] yes, that killed hadoop, that eth0:0 IP
[18:16:58] sigh.
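[Editor's aside] The grep question earlier in the log ("why does grep --max-count=10 still print over 10 lines?") has a known answer: GNU grep's `-m`/`--max-count` limit applies per input file, not to the total output, so a run over several files can print more than the limit. A small Python sketch of that per-file semantics (the function name and sample data are invented for illustration):

```python
def grep_max_count(pattern, files, max_count=10):
    """Emulate `grep --max-count`: stop matching within each FILE after
    `max_count` hits. The limit is per file, so `grep -m 10 pat *.log`
    can still print more than 10 lines overall."""
    out = []
    for name, lines in files.items():
        matched = 0
        for line in lines:
            if pattern in line:
                out.append(f"{name}:{line}")
                matched += 1
                if matched == max_count:
                    break  # done with this file, but the next file starts fresh
    return out

files = {
    "a.log": ["hit"] * 15,
    "b.log": ["hit"] * 15,
}
print(len(grep_max_count("hit", files, max_count=10)))  # prints 20, not 10
```

With a single input file (or `cat *.log | grep -m 10 pat`, which makes grep see one stream) the output really is capped at 10 lines.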
[18:17:00] so I killed that
[18:17:02] okay.
[18:17:10] the vpn should work though
[18:17:10] proxy traffic is suuuuper slow
[18:17:10] i guess i will not get these files from hdfs then
[18:17:10] but yeah, things are being weird for me too
[18:17:14] you can
[18:17:27] dschoon
[18:17:27] not if i can't get to them :P
[18:17:27] i have them on an01 too
[18:17:29] on the filesystem
[18:17:34] where?
[18:17:36] /a/logs/sampled
[18:18:06] cool
[18:18:07] ty
[18:18:22] things are def weird, i'm going to take down the vpn for now, not sure if it is a problem…but it might be
[18:18:31] k
[18:19:15] drdee, I turned off vpn and restarted apache
[18:19:22] seems a little happier now
[18:19:38] yes much faster
[18:39:29] drdee: can you give me some feedback on potential names for the wiki-wide statistics module I'm making?
[18:39:31] it is currently called wikistats
[18:39:32] b
[18:39:37] ut I know that is no good
[18:39:52] what does it do?
[18:40:00] i mean, concretely
[18:40:00] however, I think other names would be deceptive
[18:40:00] ya
[18:40:20] given a list of wikipedia language codes and specifications for which statistics to compute for each of those languages, gives you a csv with all of that information
[18:40:23] wiki thermo :)
[18:40:30] ?
[18:40:39] thermometer
[18:40:44] gotcha
[18:40:49] i was thinking wikitrix
[18:40:51] hat's obviouslyy bad
[18:40:52] but then i saw wikilytics
[18:40:59] gotcha
[18:41:02] dschoon *hates* that
[18:41:05] i think wikilytics would be great for this
[18:41:14] sure go for it
[18:41:18] i just refused to let diederik name kraken "wikilytics"
[18:41:29] yeah, i saw a erik z file with the name wikilytics
[18:41:29] hehe
[18:41:30] which trivializes the giant hardware expense :)
[18:42:01] another angle worth considering is the fact that it allows you compare one language to another
[18:42:14] drdee: have you heard of wikilytics elsewhere?
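[Editor's aside] The module being named above is described concretely at [18:40:20]: language codes plus a spec of which statistics to compute go in, one CSV with all of it comes out. A sketch of that shape in Python follows; every name and number here is an invented placeholder, not the real wikistats/wikilytics code:

```python
import csv
import io

# Placeholder statistics, keyed by name; real ones would query a wiki.
# The numbers are deliberately fake.
STATS = {
    "article_count": lambda lang: {"en": 123, "de": 45}.get(lang, 0),
}

def build_csv(languages, stat_names):
    """Compute each requested statistic for each language code and
    return everything as one CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["language"] + stat_names)
    for lang in languages:
        writer.writerow([lang] + [STATS[s](lang) for s in stat_names])
    return buf.getvalue()

report = build_csv(["en", "de"], ["article_count"])
# one header row, then one row per language
```

Because every language row shares the same columns, the output also supports the cross-language comparison mentioned at [18:42:01].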
[18:42:29] afaik, i am the only who used it
[18:42:39] cool
[18:42:44] i used it late 2010, for my first assignment
[18:42:44] at wmf
[18:42:47] gotcha
[18:42:59] would you be okay with me appropriating it?
[18:44:20] drdee: got a sec?
[18:44:44] erosen, totally
[18:44:45] dschoon sure
[18:44:54] cool
[18:45:07] i PM'd you the logline
[18:45:14] since it's just spam in here
[18:45:17] i keep wanting to mention it to people but can't share it until it gets a new name
[18:49:03] haha
[18:49:14] there are only two hard things in computer science:
[18:49:17] 1. cache invalidation
[18:49:19] 2. naming things
[18:52:36] 3. estimations
[18:53:35] 4. off-by-one errors
[18:54:04] optional :)
[18:55:43] hoi drdee
[18:56:04] uploaded the complete traffic stats for Talk pages and FeedbackPage pages at https://trello.com/c/gOKrxqjf
[18:56:05] yo
[18:56:13] still collecting main namespace data
[18:56:18] cool
[18:56:52] short answer is: nobody ever visits the per-article feedback page
[18:56:54] ever
[18:57:57] it's good to have the full dataset though, as some community members have concerns about exposing unmoderated feedback to readers
[18:58:19] i spend most work on getting the timestamp from the file into the dataset itself
[18:58:37] that works now
[18:58:41] sweet
[18:58:48] so hopefully i can start the job soonish
[18:59:04] have you guys seen https://twitter.com/ReaderMeter/status/261366665944653826 ? Started from a stackoverflow thread
[19:14:17] heh
[19:14:22] yeah
[21:01:19] any of you guys still free to talk analytics and Parse.ly? :)
[21:01:35] I realize the meeting invite had location, "The Webz" -- I assume that's here
[21:01:51] DarTar that's awesome
[21:02:08] I'll totally add it to Limn when I'm not so under the gun
[21:02:23] we should make it the default for the reportcard
[21:03:03] milimetric: as a d3 expert maybe you have an answer to this?
https://twitter.com/ReaderMeter/status/261368247079800834
[21:03:27] :) maybe, give me a sec
[21:04:28] hm, I don't know of any off the top of my head. The closest thing is kdirstat / windirstat
[21:04:51] so there's probably some yummy code there that would help if one were to tackle it
[21:05:54] for something not as fancy, have you seen this drilldown?
[21:05:55] http://bl.ocks.org/d/1283663/
[21:08:13] amontalenti: i received from you a 'declined' message hence i assumed it was canceled
[21:08:27] Oh drdee, didn't realize a decine went out, my mistake.
[21:08:54] *decline; we can schedule for another time then
[21:09:01] ooh! DarTar there's a layout you can use to get pretty close in d3: https://github.com/mbostock/d3/wiki/Treemap-Layout
[21:09:09] send me an invite :)
[21:09:16] drdee: will do, what time zone are you?
[21:09:34] EST
[21:09:41] ah, that's easy, us too
[21:09:46] OK, will send you one now
[21:09:48] k
[21:10:27] milimetric: I did see that in the gallery, not quite like what I was looking for
[21:11:37] yeah, to get xkcd-ness you'd have to nest one of those layouts inside itself recursively and tweak the spacing the same way the line interpolator was tweaked to make the xkcd line graph
[21:11:49] but I'll give it a try and see how it looks like
[21:12:32] there's a fun pure CSS implementation that I came across on github, if you look up those tweets
[21:12:39] hm, thanks though something good to think about - letting people add a human (or human-like) touch to the graphs
[21:13:00] k - back to deployment
[21:42:19] yayyyyyyy kafka hadoop consumer for pixel.php is back up
[21:42:27] woo
[21:42:49] requests to somethign like this:
[21:42:49] http://analytics1001.wikimedia.org/pixel.php?topic=pixel.php&messages=message1,message2,message3
[21:42:52] logged to pixel.php topic
[21:43:04] and hourly that topic is consumed into hadoop
[21:43:04] did you make port 80 public?
[21:43:06] in /user/otto/pixel/logs
[21:43:08] no
[21:43:20] hmmm, port 80 public I guess would be ok, huh
[21:43:21] okay. so i guess you have to go through the HTTP proxy?
[21:43:21] hmmm
[21:43:25] yeah, you can curl with auth
[21:43:35] but lemme make 80 public, that should be fine
[21:43:35] heh
[21:43:50] i was thinking about testing it
[21:43:59] and i figured we could stick a pixel-on-load call on the reportcard
[21:44:07] there we go
[21:44:08] since we don't really have meaningful traffic logs :)
[21:44:13] hole for 80 punched in firewall :)
[21:44:20] woo
[21:44:21] sure!
[21:44:23] do it
[21:44:36] i'm signing off so very soon
[21:44:36] totally will
[21:44:36] but glad that all works
[21:44:36] aiight
[21:44:41] got a nice little puppet-kafka module going too
[21:44:42] :)
[21:44:51] https://github.com/wmf-analytics/puppet-kafka
[21:52:50] awesome ottomata!
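[Editor's aside] The pixel.php logging endpoint quoted above takes a Kafka topic and a comma-joined list of messages as query parameters; that topic is then consumed into Hadoop hourly. A small Python sketch of building such a request URL follows; `pixel_url` is a hypothetical helper, and actually sending the request would still need the proxy/auth discussed in the log:

```python
from urllib.parse import urlencode

def pixel_url(host, topic, messages):
    """Build a pixel.php logging URL in the form shown in the log:
    messages are comma-joined and tagged with the target Kafka topic."""
    # safe="," keeps the commas literal, matching the URL quoted in the log
    query = urlencode({"topic": topic, "messages": ",".join(messages)}, safe=",")
    return f"http://{host}/pixel.php?{query}"

url = pixel_url("analytics1001.wikimedia.org", "pixel.php",
                ["message1", "message2", "message3"])
# url == "http://analytics1001.wikimedia.org/pixel.php?topic=pixel.php&messages=message1,message2,message3"
```

Since port 80 was opened in the firewall at [21:44:13], a plain GET of this URL (e.g. from a pixel-on-load call on the reportcard) would be enough to land a record in the topic.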