[00:43:19] ping average_drifter [01:32:15] here [01:32:36] drdee: have you tried running the build script / [01:32:36] ? [01:43:07] no [01:43:13] but i trust that it works [01:43:57] send an email to ottomata and me with instructions for ottomata where to find the debian packages so that he can install them tomorrow [01:44:09] what's the progress with the editor version of wikistats? [01:52:43] finishing them up [01:55:56] couldn't, perhaps, that script be called "Makefile"? [01:55:58] just curious. [02:03:26] dschoon: which ? [02:03:43] "build" [02:05:04] there is already a Makefile [02:05:31] dschoon: https://gist.github.com/511891d05a0da84bef5b [02:05:33] dschoon: this is the script [02:05:38] dschoon: it doesn't do what a regular Makefile does [02:05:58] dschoon: it builds packages for different distributions of Ubuntu [02:07:15] ah [02:07:17] my bad :) [02:07:34] why not have a "make packages" target then? [02:08:09] dschoon: it can be done with a Makefile also , yes [02:08:29] dschoon: but it is not specific to one single project, it's a higher-level script [02:10:29] i don't think we should spend much more time on this, it's time to deploy it and move on to the next project [02:10:50] yes [02:15:15] yeah. no big. [02:15:17] ship it. [08:04:08] average_drifter (or others): do we have numbers for active editors per country yet? [08:04:34] i seem to recall this was planned as a feature for (the interactive part of) http://reportcard.wmflabs.org/graphs/active_editors [11:41:31] HaeB: hello, I'm working on it, want to get it ready asap [11:43:43] cool, looking forward to it! (even if i don't need it any more right now - a journalist had asked about the number for one country, but i just gave him some other interesting numbers) [14:24:15] good morning [15:00:56] morning [15:04:43] average_drifter, ottomata, milimetric [15:06:04] hey drdee [15:12:24] hey [15:12:38] wanna start with working on the editors version of wikistats [15:12:40] ? [15:14:43] drdee: I'm working on them in /home/spetrea on stat1 [15:14:53] drdee: I have configured part of it, still have some problems with paths [15:15:13] drdee: I want to get it done asap, I talked with HaeB , he told me he needs it as well [15:15:47] k [15:15:53] hey ottomata [15:16:59] drdee: sending an e-mail to Erik about detecting prod/test environment (which is currently being done by checking -e /home/ezachte and such) and proposing to replace with checking /etc/wikimedia-realm [15:17:38] morning [15:17:42] drdee: I went to #wikimedia-labs and asked what is a reliable way to get the hostname of a machine on labs and they told me /etc/wmflabs-instancename [15:17:42] well, i need other thing much more urgently ;) - but yes, this is the stuff that the press wans to know often [15:18:05] drdee: if there was such a thing for stat1 and all machines wikistats is running on, then I can use that [15:18:10] drdee: are webstats collected already for blog.wikimedia.org and wikimediafoundation.org? :) [15:18:11] drdee: is there a #wikimedia-prod ? [15:18:30] HaeB: those are part of our release of udp-filters which is ongoing [15:18:46] HaeB: packages are ready but they need to be deployed [15:18:53] ah cool [15:18:59] HaeB; yes we are deploying today [15:19:13] ottomata, can we start with rolling out the new version of webstatscollector? [15:19:44] also, fix all the path issues in wikistats [15:19:54] 1) no paths should be hardcoded in wikistats [15:19:55] drdee: equivalent of #wikimedia-labs but for production environments [15:20:15] oh probably #wikimedia-ops [15:20:44] 2) production vs debug mode should be a command line parameter. not determined in the code by the hostname [15:21:51] (re previous topic: once stats like editors/country have been available for a month or so, that should make for a really nice blog post to highlight the results of your work) [15:22:01] totally [15:22:27] hah, drdee [15:22:30] yes! [15:24:22] i have to do some errants for anna,be back in a hour but if you and average_drifter can get this running that would be really awesome [15:24:42] stat1 is the machine where it will run, right? [15:25:16] once we have confidence that the new udp-filter is working fine we should also upgrade udp-filter on locke, oxygen and emery as that would fix the X-Forwarded-for issue [15:26:04] and finally update the server log config: tab separator, accept language header and append 'new_format' to the log file names so we can easily recognize the old and new format [15:26:30] if we can get these 3 things done today then a whole bunch of people will be happy and then we can go back to kraken tomorrow [15:26:52] ok [15:26:58] sounds good to me! [15:27:01] average_drifter [15:27:07] uhhhh, what can I do? [15:27:07] yes yes [15:27:11] reading [15:28:01] ottomata: can we go for a quick test with the static binaries ? [15:28:08] The .deb packages were built with the script /home/diederik/build. You can find .deb packages for both lucid and precise in [15:28:18] /home/diederik/lucid [15:28:19] /home/diederik/precise [15:28:31] average_drifter: which VM? build1 or build2 [15:28:54] sure [15:29:36] drdee: they share the /home so it's the same [15:29:57] aarrgghhhhhh i keep forgetting that [15:32:23] ottomata: on build1 /home/diederik/wikistats/webstatscollector/collector-static [15:32:25] ok so, i see them [15:32:32] oh [15:32:33] static? [15:32:36] what about [15:32:41] ottomata: on build1 /home/diederik/wikistats/udp-filters/udp-filter-static [15:32:49] /home/diederik/lucid/ [15:33:10] ottomata: you can use packages if you want [15:33:26] oh isee [15:33:28] that's the binary [15:33:29] ok [15:33:31] drdee [15:33:35] the packages need to be named like this: [15:33:38] http://apt.wikimedia.org/wikimedia/pool/main/u/udp-filter/ [15:34:27] average_drifter ^^ [15:39:56] ottomata: oh ok, I can add the trailing ~lucid or ~precise [15:41:06] ok, yeah, and I think that needs to be in the changelog version name too [15:41:08] so [15:41:33] for example [15:41:37] for libcidr, I did this [15:41:38] https://github.com/wmf-analytics/libcidr/blob/debian/1.2.0-1precise-wikimedia/debian/changelog [15:42:06] i could recommend modifying changelog with regular changes in master [15:42:06] and then [15:42:13] when you are ready to build packages for different distributions [15:42:20] create a branch with the version and the dist name [15:42:27] and modify changelog there with the dist name [15:42:29] and build the package [15:42:43] oh, changelogs are under control, I can change that quickly with a cmdline param [15:42:44] then, create a new branch from master for the other dist, and do the same, [15:42:46] ok [15:42:50] then you got it [15:45:23] ottomata, quick question: what are these kafka jobs on hadoop, like: http://analytics1001.wikimedia.org:8088/cluster/app/application_1352758471717_0001 [15:45:52] man, drdee, y u no like vpn? [15:45:56] i have to change all of the urls you send me :p [15:46:01] those are the hourly pixel.php imports [15:46:15] i use the new proxy server :D [15:46:21] pshhhhh [15:46:28] but sory [15:46:37] hehe, s'opk [15:46:37] s'ok [15:46:59] is that data stored? [15:47:15] yeah, if anyone actually sends any [15:47:19] D: [15:47:21] i mean :D [15:47:38] right now its kidna dumb, i think its creating files every hour even if there is no data [15:48:24] /user/otto/pixel/logs [15:48:54] aight [15:49:09] we should also talk about the file structure on hdfs [15:49:15] like: [15:49:27] /traffic/ [15:49:45] /db// [15:49:52] and for traffic' [15:50:00] nawwwww why in /? [15:50:13] /traffic// [15:50:17] etc [15:50:21] just a suggestion [15:50:21] ok [15:50:41] it should not be in a user directory i think [15:50:41] i'm all for that, except keeping it / [15:50:42] we can google and see what others do, or just mimick unix [15:50:46] mimic [15:50:46] sounds good [15:51:52] ok, so help me understsand one more time what needs to be done to deploy this new stuff [15:51:57] you want to test this stuff on locke first? [15:52:02] or stat1? [15:52:07] stat1 [15:52:09] alongside of the currently scirpts [15:52:16] current versions [15:52:17] right? [15:52:19] i would like to keep the current webstatscollector running [15:52:21] ok [15:52:21] on locke [15:52:25] and have the new version on stat1 [15:52:29] the changes are quite big [15:52:29] can't we just verify [15:52:33] by taking a sampled log file [15:52:41] and piping that through the old and the new [15:52:44] and compare the results? [15:52:48] do we need to run it live? [15:52:53] good ide [15:52:54] a [15:53:18] you would have to do that on locke with the old version [15:53:22] and on stat1 with the new version [15:53:33] ok cool, so we should be able to do that on stat1, or build1/2, without using the .debs yet [15:53:42] the new version supports new domains [15:53:50] ok [15:53:51] so there will be differences between locke and stat1 version [15:54:11] ok, but you know what they should be, right? [15:54:14] but for enwiki for example there should be no differences [15:54:15] ok [15:54:24] well for the blog we have no comparison data [15:54:44] so whatever it says we have to assume it's correct unless it's so low / high that we know it doesn't make sense [15:55:02] so, remind me real quick. udp2log -> filter -> collector -> file [15:55:03] ? [15:55:21] so I can do [15:55:32] sampled-file | filter -> collector >> file [15:55:33] ? [15:55:58] udp2log -> filter -> log2udp -> collector -> file [15:56:25] so cat -> filter -> log2dup -> collector -> file [15:56:39] you would have to start the new collector in 'debug' mode (ask stefan) [15:56:48] so it will write the file every n minutes [15:56:58] the old version of collector only writes every 60 minutes [15:57:11] so it's a bit complicated to test [15:57:24] so you need the new version of collector because that has a new debug mode [15:57:30] again stefan knows all of this [15:57:36] average_drifter ^^ [15:57:50] so cat -> udp-filter -> log2dup -> collector -> file [15:57:54] ok [15:58:00] so for the old one (which I am doing now [15:58:01] ) [15:58:06] i have to wait 60 minutes? [15:58:11] yes [15:58:26] and I have no idea how collector is being run on locke :p [15:58:26] ha [15:58:29] like what is supervising it [15:58:33] and i am not sure if you can specify the output location is configurable [15:58:37] afaik someone started it in a screen [15:58:42] average_drifter ^^ [15:58:59] okay i have to do some errants really now [15:59:04] back in 90 [15:59:35] k [15:59:38] errands [15:59:43] you can do errants too if you want [15:59:43] hah [16:00:38] drdee, real quick, before you go. who worked on setting up webstatscollector before the analytics team existed? [16:01:06] probably domas [16:01:22] hm [16:01:31] we gotta bite this bullet once :D [16:01:31] that woulda been a long time ago then, right? [16:01:34] yup [16:01:37] at least 3 years [16:01:52] that's why i want to do stuff first on stat1 [16:01:53] b [16:01:54] e [16:01:55] c [16:01:57] because it's so brittle [16:02:00] really going now [16:02:03] okbye [16:03:10] ottomata: what's that field in changelog [16:03:16] ottomata: when you do lucid-wikimedia [16:03:20] ottomata: what's that called ? [16:03:20] ah it daemonizes itself! [16:03:57] i think 'distribution-name' [16:03:59] but i'm not sure [16:21:38] ok, collector is running, waiting to dump some data on build1 [16:22:03] once we have that output, we should be able to do the same thing that I just did with the same file but with the new filter (udp-filter?) and collector [16:22:08] and compare the results [17:17:30] growl, sorry, my irc quit! it does that sometimes [17:17:35] average_drifter, let me knwo if you need anything [17:30:58] ottomata: --distribution paramter added to git2deblogs [17:31:25] now updating the "build" script so it knows to give a distribution name dependin gon the machine it's running on [17:32:26] ottomata: using lsb_release to get the distro name [17:32:55] cool [17:42:03] ottomata: what do you think about using git2deblogs on libanon and libcidr ? [17:42:18] ottomata: it would allow us to automatically set all the stuff [17:49:42] yo [17:50:03] ottomata: ah it daemonizes itself! [17:50:04] yes [17:54:42] ls [17:55:04] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [17:56:28] milimetric i am in [18:00:04] average_drifter, re: git2deblogs, don't know what that is, but I guess? [18:00:19] do we need to build new versions of those? [18:02:34] ottomata https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [18:04:58] ottomata: it's just something that converts your git logs to debian/changelog [18:05:08] ottomata: we're using it for udp-filters and webstatscollector [18:05:51] ottomata: do you manually update your debian/changelog for libcidr and libanon ? [18:07:17] yes, but i mean, these are 3rd party libs [18:07:19] we didn't write them [18:07:26] i just created debian packaging for them [18:13:35] ottomata, let me know when I can compare the old and new webstatscollector output [18:13:45] i;ll take care of that [18:14:22] cool [18:14:24] yeah they are there [18:14:27] on build1 [18:14:31] /home/otto/webstats/dumps [18:14:39] aight [18:14:45] Reminder: Update Analytics Roadmap! [18:14:58] That's the annoying thing I forgot to mention :) [18:15:01] did that! [18:15:05] sweet! [18:15:07] ha, doesn't look like it worked correctly [18:15:15] drdee, i gotta change locations [18:15:16] * average_drifter worries [18:15:22] i'll be back on in 30 mins [18:15:23] k [18:15:40] i swear to god i leave for work at the same time every day [18:15:56] i have no idea how muni sometimes gets me here 15m early, and sometimes 10m late. [18:17:05] ottomata, quick question [18:17:26] /home/otto/webstats/dumps is the old version? [18:17:40] drdee: whitelist appears to be working for the office now. [18:17:51] i didn't get a password prompt for http://analytics1001.wikimedia.org:8088/cluster [18:18:05] yes, old version [18:18:07] haven't done new [18:18:13] but the file in /home/otto/ is the file I used [18:18:16] to do taht [18:18:26] i'll kill my version of collector [18:18:28] and then you can run the new versions [18:19:37] * average_drifter is talking to Erik about wiki edits [18:20:20] dschoon: cool! [18:21:16] ottomata, what was the command line you used for the first test/ [18:21:16] ? [18:24:32] i did [18:24:46] cat file | filter > file.filter [18:24:49] bin/collector [18:25:22] cat file.filter | log2udp -h 127.0.0.1 -p 3815 [18:25:27] then wait an hour [18:25:47] ok, be back in 30 [18:27:41] average_drifter: can you help me? [18:29:36] drdee: yes [18:29:41] drdee: talking with Erik on skype [18:29:46] oh ok [18:29:48] drdee: trying to invite you [18:29:54] ty [18:29:57] drdee: we are discussing some of the geoip [18:30:04] loop me in :) [18:30:15] drdee: we're trying to but I don't have the option on my ipad [18:30:20] drdee: Erik's trying to [18:30:24] ok [18:30:25] maybe I'm dumb but setting the proxy and entering the password doesn't work for me in FF or Chrome [18:30:54] mmmm [18:46:19] anyone know where we're running the support software for the menagerie of hadoop-related systems? (Hive's MySQL, whatever crap Hue requires, etc) [18:49:12] dschoon: an1001 [18:49:44] Yeah, that's what I figured. Gotta move that. [18:53:44] ottomata and i discussed that yesterday, we could use stat1001 or stat1 (the machine that is in eqiad) [18:53:59] hm. [18:54:00] interesting. [18:54:20] i was thinking a R720, since we also need a secondary NN [18:54:30] and they could coexist [18:55:12] does Oozie require a MySQL? [18:55:26] I forget which services have external deps. [18:59:01] oozie, yes it uses the hive metastore [18:59:05] and that's mysql [18:59:13] R720 is fine with em as well [19:05:17] ok back [19:06:21] ottomata, i think you did something wrong with running filter [19:06:36] maybe parameters ? [19:06:40] when i just run filter the output is okay [19:06:50] no this is using the old version [19:07:04] (using filter not udp-filter [19:07:17] hm [19:07:25] is the filter output wrong? or the collector output? [19:07:29] ottomata, average_drifter: check //home/otto/webstats/filtered [19:07:34] that is the correct output of filter [19:07:37] i saved the filter output in a file, so we coudl check that [19:07:42] ottomata, I sent a reply on the hardware thread [19:07:46] lmk what you think [19:08:00] your output (sampled-1000.log-20121112-10000lines.filtered) is incorrect [19:08:05] hmmm [19:08:13] all I did was cat | file | filter [19:08:20] cat file | filter > file.filtered [19:08:24] dschoon, ok cool [19:08:27] will read in a sec, thanks [19:08:32] i did the same :) [19:08:33] we left out a few allocations [19:08:51] didn't you accidentally run udp-filter? [19:09:57] me? no [19:10:05] ohhhh, but i didn' do bin/filter [19:10:08] i did filter, hmmmmm [19:10:52] anyway, ok, do you want me to run it again or ahve you already done it? [19:12:10] i just ran it and i am waiting for collector to output the file ;) [19:12:26] * average_drifter is trying to think what happens if the webstatscollector is broadcasting udp filters and because it's UDP it might deliver some messages to filter and some to udp-filter but maybe not the same to both because UDP is not guaranteed to 100% deliver [19:12:32] ok cool [19:12:44] heh [19:12:51] yeah, but this is just localhost tests right now [19:12:54] so the tests should be valid [19:12:59] oh alright [19:13:01] * dschoon is trying *not* to think about webstatscollector. [19:13:01] i don't think we're going to lose packets on lo [19:13:24] average_drifter: where are the binaries of the new webstatscollector ? [19:14:02] drdee: /home/diederik/wikistats/webstatscollector [19:14:07] thx [19:14:26] http://stackoverflow.com/a/2662405/827519 [19:14:52] "If you are losing packets over the loopback interface after sending only 6 or 7 packets, then it sounds like maybe your receive buffer is too small. You can increase the size with setsockopt using the SO_RCVBUF option. However, if you are sending 1500 bytes, then if this is indeed the issue, it means the receive buffer is only about 9K (or more likely 8K, however that seems a rather small default). I believe on Win [19:16:20] man it takes forveeeever to build a raid array on 10 2TB disks! [19:23:50] brb [19:38:21] um, drdee, i don't think we have to wait an hour [19:38:25] we should be able to snd a sigalrm [19:39:43] working with average_drifter right now on comparing files [19:39:47] ok cool [19:39:48] so you can play with karekn [19:40:01] ok, welp, i'm waiting for long running things to finish! [19:40:28] change the server logs? [19:42:06] ? [19:45:44] replace space with tab, add accept-language header? [19:47:05] ahhh, hm, i can do that, sure, wikistats ready for that? [19:53:38] so, drdee, to make those changes, I need an ops person willing to babysit and make it happen with me [19:53:51] might be tough to convince them to do this now that fundraising is about to start being really active [19:53:54] what do you think? [19:54:22] better to do it before its in full swing [19:54:30] it'll just become a harder sell as time goes on [19:54:54] agree and we definitely need tab separator before kafka and storm are switched on [19:59:00] why do we def need it before kafka and storm are switched on? [19:59:10] i mean, it'll be nicer, but won't really make that much of a difference, right? [19:59:43] it will be a huge difference because else we have to do a lot of processing of the fields and as it's hard realtime that sucks [19:59:47] but [20:00:01] fundraiser is doing a large trial this week [20:00:06] so better not disturb them [20:00:19] i just remembered that [20:00:49] ottomata ^^ [20:02:06] yeah, that's w hy I was asking, really, Jeff was talking about tha tin ops room just now [20:02:17] can't we just make the scripts split on space or tab? [20:02:22] \s [20:02:23] ? [20:02:23] no [20:02:34] because we don't know how spaces there are in a field [20:02:45] the user agents string fucks us over [20:02:48] for now we just do what we have been doing [20:02:50] how often though? [20:02:51] that's why we need tab [20:02:52] not that often [20:02:55] right? [20:02:58] often often enough [20:03:05] yeah but, it won't stop us from working [20:03:09] i just mean, its not a blocker [20:03:10] its a problem [20:03:11] and we should first upgrade udp-filter [20:03:11]