[14:22:28] morning!
[14:28:07] Morning ottomata, you are not taking your day off?
[14:28:39] average_drifter: It's a US holiday today, Veterans Day
[14:28:46] Morning milimetric
[14:29:02] it is!?
[14:29:28] Yes, the office is closed today
[14:29:32] cool!
[14:29:36] then no?
[14:29:36] no!
[14:30:21] I think it was officially yesterday but WMF decided to close today
[14:31:29] Is milimetric around?
[15:02:28] hehe
[15:02:30] I'm here
[15:58:55] good late morning guys
[16:09:12] Hey milimetric!
[16:13:54] drdee_, I think I am stoopiding myself looking at the inputcount snappy stuff on an26 we set up
[16:15:05] i'm trying to compare hourly bytecounts with and without snappy
[16:15:36] the snappy hourly byte counts come in at about 1693464529 bytes (1.5GB)
[16:15:49] i'm running an hourly count without snappy now too
[16:15:54] but, I just ran a per-minute count, and scaled up
[16:15:57] Ok
[16:16:18] and it came in at about 1.5GB as well
[16:16:21] without snappy compression
[16:17:30] Mmmmmm that's odd
[16:17:56] Are we sure that snappy is working?
[16:18:13] i guess not then, hm?
[16:18:30] hmm, we have a non-sampled file on stat1, i'll copy it over, uncompress it, then snappy compress it
[16:18:32] and just compare the file sizes
[16:18:44] Or it is a very lousy compressor :)
[16:18:49] Sounds good
[16:26:52] naw, snappy works fo sho:
[16:26:52] 2.0G sampled-1.head5M.log
[16:26:52] 371M sampled-1.head5M.log.gz
[16:26:53] 684M sampled-1.head5M.log.snz
[16:27:00] maybe my cli doesn't?
[16:33:18] Dunno :) still on the road
[16:54:29] cool, snappy has a node.js binding. Not sure how we could use it but nice. Is it really as fast as they advertise, ottomata?
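The gzip-vs-snappy size comparison described above (2.0G raw, 371M gzip, 684M snappy) can be reproduced programmatically. This is a minimal sketch: gzip is from the standard library, while the `snappy` import is the third-party python-snappy package, which may not be installed, so it is treated as optional here.

```python
# Sketch: compare gzip vs snappy compression ratios on sample data,
# mirroring the manual file-size comparison in the log above.
import gzip


def gzip_ratio(data: bytes) -> float:
    """Return compressed_size / original_size using stdlib gzip."""
    return len(gzip.compress(data)) / len(data)


def snappy_ratio(data: bytes):
    """Return compressed_size / original_size using snappy,
    or None if the third-party python-snappy package is unavailable."""
    try:
        import snappy  # third-party; assumed installed, may not be
    except ImportError:
        return None
    return len(snappy.compress(data)) / len(data)


if __name__ == "__main__":
    # Log-like repetitive text compresses well; expect gzip to compress
    # tighter than snappy, which trades ratio for speed.
    sample = b"2012-11-12 16:26:52 GET /wiki/Main_Page 200\n" * 10_000
    print(f"gzip ratio:   {gzip_ratio(sample):.3f}")
    r = snappy_ratio(sample)
    print(f"snappy ratio: {r:.3f}" if r is not None else "snappy not installed")
```

This matches the pattern seen in the log: gzip (371M) beats snappy (684M) on size, while snappy's appeal is throughput and Hadoop-friendliness.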
[16:54:46] haven't been doing time comparisons
[16:55:05] we want to use it because it's faster than gz, but also because it plays nice with hadoop
[16:55:30] at 250MB/s like they say, it'd make my disk the bottleneck
[16:55:31] i think you can decompress portions of the file at a time (based on a preset block size)
[16:55:47] cool
[17:08:21] back
[17:09:58] ottomata, like your email about the hardware!
[17:10:12] one thing to consider is on what machine to run mysql
[17:10:34] for hue/hive, etc?
[17:10:38] or whatever?
[17:10:40] yes
[17:10:41] we might wanna use stat1 for that as the
[17:10:51] nawww, let's just put that on the namenodes, no?
[17:10:57] mysql contains instrumentation data that should be easily accessible to analysts
[17:10:59] or on the ZKs
[17:11:04] ?
[17:11:08] instrumentation data?
[17:11:11] like the hive stats stuff?
[17:11:12] for hive
[17:11:13] yes
[17:11:19] hm
[17:11:23] else we have to give analysts access to one of those machines
[17:11:27] it's just a consideration
[17:11:37] that's true, we can maybe put the hive stats stuff on stat1
[17:11:40] is stat1 in eqiad?
[17:11:51] i think so
[17:11:53] ok cool
[17:11:55] else stat1001 is in eqiad
[17:12:01] stat1001 def is
[17:12:14] they are in separate dc's
[17:12:20] so probably stat1001 is then a better choice
[17:12:21] for now i'd like to keep the gui interface mysql databases (oozie, etc.) in the cluster
[17:12:26] k
[17:12:28] i don't mind pointing the hive stats db elsewhere though
[17:12:33] yeah it's not urgent / high priority
[17:12:39] naw, stat1001 is for web stuff
[17:12:39] hm
[17:12:55] would rather have hive stats in kraken and grant mysql access
[17:13:02] we can grant mysql access without giving shell access
[17:13:11] ok
[17:13:15] hey louisdang
[17:13:17] what's up?
[17:13:20] hey
[17:14:11] still haven't figured out how to call Hue's controllers from Java
[17:15:22] why don't you write your stuff in python?
[17:15:28] hue is python as well
[17:16:03] I need admin access to directly access Hue's controller
[17:16:16] or "view" in django I guess
[17:17:37] why don't you install hue in labs so you have access to the source code and you have root as well
[17:17:45] so you can tinker as much as you need
[17:19:51] ottomata, you wanna start experimenting with High Availability mode as well
[17:19:54] ?
[17:20:48] yeah need to do that soon, want to see what dschoon thinks about namenodes on the 310s
[17:21:26] the namenodes need a lot of memory, right? since they keep track of all files on HDFS
[17:28:03] hmm
[17:28:06] from cloudera
[17:28:06] A good rule of thumb is to assume 1GB of namenode memory for every one million blocks stored in the distributed file system.
[17:28:42] okay we should be fine then
[17:30:42] yeah, rough calc: if we have 200TB of space and a 256MB block size, we'll have the ability for about 819200 blocks total (i'm not sure if replication factor matters here)
[17:31:00] but, either way, 8GB should be good i think
[17:31:02] at least for now
[17:51:37] ping average_drifter
[17:59:16] good morning, friends
[17:59:21] yoyoo
[17:59:25] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:59:25] reminder: today is a holiday. you are not at work.
[17:59:31] heh
[17:59:37] ottomata and i have opted to work
[17:59:42] and maybe milimetric as well
[17:59:52] :)
[17:59:55] indeed. and i am in the office.
[18:00:03] no work?
[18:00:04] why?
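The rough namenode-memory calculation above can be sanity-checked in a few lines. This assumes a 256MB HDFS block size (which is what makes the 819200 figure work out) and Cloudera's rule of thumb of roughly 1GB of namenode heap per one million blocks; whether the replication factor matters is left open here, as in the log.

```python
# Sanity check of the namenode heap estimate discussed above.
CAPACITY_TB = 200            # total HDFS capacity
BLOCK_SIZE_MB = 256          # assumed dfs.blocksize
GB_PER_MILLION_BLOCKS = 1    # Cloudera rule of thumb

# Convert TB -> MB, then divide by block size to get the block ceiling.
blocks = CAPACITY_TB * 1024 * 1024 // BLOCK_SIZE_MB
heap_gb = blocks / 1_000_000 * GB_PER_MILLION_BLOCKS

print(f"max blocks: {blocks}")                      # 819200, as in the log
print(f"namenode heap needed: ~{heap_gb:.2f} GB")   # ~0.82 GB, well under 8GB
```

So the proposed 8GB of namenode memory leaves roughly a 10x margin at this capacity.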
[18:00:12] US Veterans Day
[18:04:18] https://www.mediawiki.org/wiki/Analytics/Roadmap
[18:04:53] once that's updated, I'll copy it to https://www.mediawiki.org/wiki/Roadmap#Analytics
[18:05:55] we also have a bunch of tasks on the kraken page https://www.mediawiki.org/wiki/Analytics/Kraken
[18:06:15] (Due on Thursday)
[18:26:35] brb
[18:32:52] ottomata, you can also add the NoProxy directive in the proxy conf, that way people do not have to set up that stuff in their browser
[18:35:56] you can use the directive to whitelist ip addresses / domain names
[18:56:26] I took an orbit of the office.
[18:56:37] There are exactly two of us here. Myself and Zach.
[19:02:58] git commit patch mode!
[19:02:59] http://neowork.com/2012/11/11/improve-your-git-commits-using-patch-mode
[19:03:02] i did not know that!
[19:04:08] interesting!
[19:04:36] cool indeed
[19:49:21] ahhh, why are the GeoIP files committed to the kraken repo?
[19:49:35] I suspect louisdang.
[19:50:46] ottomata, dschoon it's the public files
[19:51:05] they're already in the .deb, i believe.
[19:51:13] hmm, ok
[19:51:14] also: it's usually bad to commit data files to source
[19:51:17] but yeah
[19:51:21] I mean the free dbs on Maxmind's website, not the licensed ones
[19:51:23] the repo is much larger than it was before
[19:51:33] oh ok
[19:51:36] right, but your tests should probably just try to dl those files if they aren't present
[19:51:44] ok
[19:51:50] and it's really annoying to take files out of git once they are there
[19:52:01] near impossible (gotta mess with weird git internals)
[19:52:41] hey ottomata
[19:52:43] ottomata, anything I can do?
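The NoProxy suggestion above might look roughly like this in an Apache mod_proxy forward-proxy config. This is a hedged sketch: the upstream host and subnet are placeholders, not the actual Kraken proxy configuration, and NoProxy only takes effect in setups that forward via ProxyRemote.

```apache
# Sketch only: placeholder hosts/subnets, not the real Kraken config.
ProxyRequests On

# Forward everything else to an upstream proxy (hypothetical host)
ProxyRemote * http://upstream-proxy.example.org:8080

# Hosts/subnets matched here are served directly, so users need no
# per-browser proxy exceptions for them
NoProxy .wikimedia.org 10.64.0.0/16
```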
[19:52:53] ottomata: http://packages.garage-coding.com/
[19:52:54] drdee: hey
[19:53:06] got a test package repository up with reprepro
[19:53:10] good old git filter-branch
[19:53:18] everyone's least-favourite utility knife
[19:53:22] tryin to build packages for them
[19:53:30] aight
[19:53:50] louisdang, ja, this isn't a hurry, but it would be good if you could use filter-branch (or whatever) to figure out how to remove it from the history
[19:53:53] http://stackoverflow.com/questions/5300458/how-to-remove-large-file-permanently-for-the-whole-team
[19:53:57] maybe this is helpful?
[19:54:07] ottomata, ok
[19:54:21] if you need it for your tests, maybe you can make your test scripts download the files from maxmind on demand
[19:54:25] thanks
[19:54:32] hey average_drifter
[19:54:39] nice, cool
[19:54:49] fyi, https://gist.github.com/2925497
[19:55:04] just found out that build1 and build2 share /home..
[19:55:15] oh yeah!
[19:55:15] nice
[19:55:30] recall i had to figure this out for limn
[19:55:42] nice, dschoon
[19:55:46] because someone had accidentally checked in several hundred megs of CSVs at some point.
[19:56:00] i do not recommend you run those without changing them.
[19:56:07] it's more notes than anything else.
[19:56:16] esp git-rewrite-history.sh
[19:57:23] the important command is:
[19:57:28] git filter-branch --tree-filter "rm -rf DELETE_PAT" --prune-empty --tag-name-filter cat -- --all
[19:57:43] that removes everything matching the glob at DELETE_PAT from your repo's history
[19:57:47] it is gone *forever*
[19:57:52] so back it up if you care about it
[20:30:09] drdee: please review https://gerrit.wikimedia.org/r/#/c/33120/
[20:30:14] added static binaries build
[20:30:30] aight
[20:36:11] drdee: got to push another commit
[20:36:29] had a small flag problem
[20:36:34] k
[20:36:36] push
[20:38:19] ok pushed
[20:39:16] ok now we have static binaries that we can use on any machine without any dependencies, for testing purposes
[20:39:26] for both webstatscollector and udp-filters
[20:42:13] cool
[20:42:24] one question, why are you still compiling filter.c?
[20:43:32] maybe just for comparison with udp-filters
[20:43:38] but I can take it out of the Makefile.am
[20:43:55] should I also delete filter.c ?
[20:46:03] never mind, leave it for now
[20:46:24] ottomata, ready to roll out the new webstatscollector ?
[20:46:48] average_drifter: merged
[20:51:14] kinda, kind of in the middle of something atm, but soon
[20:51:25] what needs to be done right now?
[20:52:07] ottomata: got some questions about dupload
[20:52:23] dupload....
[20:52:28] heheh, ok, don't know it but ok!
[20:52:54] so dupload apparently is used to upload new packages (or new package versions) to a debian repo
[20:53:12] but I built the packages on build1 and build2 so they are not signed
[20:53:15] (had no key there)
[20:53:21] i can sign them
[20:53:40] that would be good, you can also just create a key on those machines and modify the changelog so the author line matches your key's name
[20:53:50] we don't use dupload to put the packages in the wikimedia apt repo
[20:53:58] we copy .debs in with reprepro manually
[20:55:07] drdee: ok, let me just see if I got it correctly, I'll ping you when I have everything set up for upload to the wikimedia repo
[20:55:14] k
[20:56:19] we have 3 packages * 2 arches * 4 files/package = 24 files
[20:56:25] actually 4 packages
[20:56:36] 4 files/package = .dsc, .changes, .tar.gz, .deb
[20:56:47] 2 arches = amd64, i386
[20:57:14] ottomata: I think also 2 distros, like Ubuntu precise and Ubuntu lucid, right?
[20:57:30] right
[20:57:34] forget the i386
[20:57:54] ok
[21:48:03] ottomata, question about hadoop, there are 5 apps in this weird nether state: http://analytics1001.wikimedia.org:8088/cluster/apps
[21:48:10] any clue what happened?
[21:49:43] oo, weird, dunno
[21:59:07] how do we kill those jobs?
[21:59:22] i can't determine the job id
[21:59:32] i've tried but was not successful
[22:08:37] restart resourcemanager? :D
[22:11:34] using khadoop?
[22:14:19] hmm, naw just service
[22:14:21] on an01
[22:14:27] sudo service hadoop-yarn-resourcemanager restart
[22:14:31] heyyyyyyy
[22:14:31] hey
[22:14:33] drdee, dschoon:
[22:14:35] http://analytics1001.eqiad.wmnet:50070/dfsnodelist.jsp?whatNodes=LIVE
[22:15:19] BAM BAM BAM
[22:15:30] SUPER SUPER SUPER
[22:15:40] Rock'N'Roll
[22:18:02] ottomata, fixed small typo in khadoop and pushed to github
[22:18:17] now you can restart the resource manager using khadoop
[22:20:28] nice
[22:21:01] cool, i'm out for the eve
[22:21:04] actually going night fishing tonight
[22:21:06] way too cold
[22:21:09] my butt is going to freeze
[22:21:10] ungghhhh
[22:21:17] in the kayak
[22:23:03] have FUN!
[22:26:22] haha
[22:26:23] nice
[23:09:49] ok packages built
[23:09:52] but unsigned
[23:10:02] drdee: I wrote a script called build
[23:10:12] drdee: it acts on the wikistats directory in /home/spetrea
[23:10:38] drdee: if you run it on build1 then the script will make packages and put them in $HOME/lucid/ (since build1 is lucid)
[23:10:57] drdee: if you run it on build2 it will do all the work and put packages in $HOME/precise/ (build2 is precise)
[23:12:37] ls
[23:13:11] brb 30m
[23:14:52] ok