[00:04:07] so the kraken-pig, kraken-generic, and kraken-dclass jars that I need are in HDFS. So I can't access them in pig -x local mode. [00:05:07] But I would rather not work with pig in normal mode because it's very slow. [00:05:20] drdee, if you have any suggestion, I would love to hear ^ [00:05:36] and by very slow I mean it's taken me 3 hours to get absolutely nowhere [00:05:49] you can put the jars in your home folder [00:05:58] and use that for development [00:06:27] and if you want you can show me your console, maybe i can give some tips [00:06:50] oh yea, otto fixed my ssh so now I can do this maybe [00:06:52] k, i'll try [00:11:54] ok, I give up - how do I create the SNAPSHOT jars drdee? [00:12:07] mvn clean;mvn package; [00:12:13] and then copy them to an10 [00:12:26] i work on an02 on dschoon's instruction [00:12:35] doesn't matter [00:12:38] that's fine as well [00:12:58] cool, thanks, it compiled :) [00:13:49] awesome! [00:15:52] back [00:34:56] dschoon, how'd you build dclass jni thing? 
[00:35:04] ah yeah [00:35:26] you have to check out a different branch [00:35:31] i didn't realize that at first [00:36:19] it's called branch package [00:36:35] oh i see [00:36:43] and then init the submodules [00:36:44] and package [00:36:45] k [00:37:09] beyond that, i recall i had to also copy the dtrees into a system folder [00:37:14] for me, it was: sudo cp -R dtrees/* /usr/share/libdclass/dtrees/ [00:37:20] but it might be different for linux [00:37:39] yep, that makes sense, i'll put them where the maven error says [00:38:09] follow this in as far as it makes sense :: https://github.com/wikimedia/dClass/blob/package/README [00:38:13] that's for OSX [00:38:21] feel free to add Ubuntu specific instructions [00:39:14] glibtoolize [00:39:14] aclocal [00:39:14] autoheader [00:39:14] autoconf [00:39:14] automake --add-missing [00:39:16] ./configure [00:39:19] make [00:39:29] hello, this is how you compile the .so for the dclass lib [00:39:37] oh ok [00:39:43] thx average_drifter [00:39:43] but first, you must checkout the package branch from this repo [00:39:54] https://github.com/wikimedia/dClass/tree/package [00:39:55] i got that [00:40:23] i don't have glibtoolize... [00:40:29] milimetric: are you on a Mac ? [00:40:35] linux [00:40:36] i think for ubuntu it is libtoolize [00:40:57] yes [00:41:23] milimetric: sudo aptitude install libtool [00:41:27] oooh ok [00:41:39] general question: how do you guys know that? just search for libtoolize? [00:42:18] tinkering a lot :) [00:42:21] yes [00:42:23] hm, the submodule is not initializing average_drifter [00:42:28] but we are getting closer! [00:42:31] * milimetric is the worst tinkerer of all time [00:42:36] milimetric: you don't need the submodule [00:42:38] k [00:45:03] you can also consider just using the .deb http://garage-coding.com/releases/libdclass-dev/ [00:45:12] ugh [00:45:14] but there probably is a reason you're compiling it.. [00:45:37] drdee, were you aware that / was full on an03, an09, an26? 
[00:45:49] milimetric: ^^ [00:45:52] if no, i am emailing ot [00:45:53] otto [00:46:28] no, i was not :( but notpeter was setting up disk space monitoring today so we should also have our boxes monitored [00:46:30] ok, i solved that dclass problem, now another test fails [00:46:40] shoot [00:46:42] Failed tests: testIpad2(org.wikimedia.analytics.kraken.pig.UserAgentClassifierTest): expected: but was: [00:46:42] testIpod(org.wikimedia.analytics.kraken.pig.UserAgentClassifierTest): expected: but was: [00:47:22] milimetric: yes, that's because you need the openddr file in place [00:47:29] yep. [00:47:35] k, searching [00:47:37] what i said above [00:47:39] drdee: can milimetric use the .deb ? [00:47:44] copying the dtree files to the appropriate places [00:47:48] milimetric: can you use the deb ? [00:47:52] no idea bud :) [00:47:55] it does everything for you [00:47:56] i'm just trying to compile the jars [00:48:13] milimetric: are you on 32bit ? [00:48:20] the kraken-pig-blah-SNAPSHOT [00:48:26] deb should work if right architecture [00:48:38] milimetric: uname -a please ? [00:48:40] well, it matters what the analytics machines are on, not my local right? [00:48:43] i'm on 64 [00:48:46] great ! [00:48:48] wget http://garage-coding.com/releases/libdclass-dev/libdclass-dev_2.0.12_amd64.deb [00:48:54] dpkg -i libdclass*.deb [00:49:00] milimetric: did you copy the dtree files? [00:49:35] average_drifter: does the deb contain the dtree files? (IIRC i think it does) [00:49:35] milimetric: ^^ [00:49:39] drdee: yes [00:49:41] nope, nobody told me about dtree files [00:49:42] it contains everything needed [00:49:54] * milimetric is forever amazed at how people figure this out on their own [00:50:04] try the deb milimetric :) [00:50:04] trying! [00:50:04] :) [00:50:07] and, fwiw, i *did* tell you above. 
[00:50:24] average_drifter: put the deb on github.com/wikimedia/dclass in files section [00:50:32] drdee: ok [00:50:32] that is sort of useful :) [00:51:09] ok, build failure #3: [00:51:09] Tests in error: [00:51:10] testExec1(org.wikimedia.analytics.kraken.pig.GeoIpLookupTest): /usr/share/GeoIP/GeoIPCity.dat (No such file or directory) [00:51:33] i rsync'd them from the cluster. [00:51:34] sudo aptitude install libgeoip-dev libgeoip1 [00:51:37] milimetric: ^^ [00:51:44] no I saw you just told me dschoon, I meant before like before today [00:52:23] and I had already switched to trying the .deb when you said [00:52:53] mm same error after that average_drifter [00:53:05] you still need to copy the dat files from the cluster [00:53:07] i don't have that geoip database though [00:53:11] oh rsync it [00:53:11] k [00:53:16] milimetric: sudo su; cd / ; updatedb ; locate GeoIPCity.dat [00:53:27] no [00:53:31] i don't think i have it on my local box [00:53:33] milimetric: listen to drdee [00:53:44] just copy them from /usr/share/GeoIP/ on the cluster [00:53:47] k [00:53:57] copy all 4 dat files [00:54:25] when you say "cluster", do you mean any an** machine? [00:54:27] or... [00:54:29] yes [00:54:38] or at least any an1* machine [00:55:33] permission denied drdee [00:55:36] ? [00:55:55] try scp [00:56:16] rsync -Cavz an11:/usr/share/GeoIP/\*.dat ./ [00:56:46] I'm doing scp dandreescu@analytics1010.eqiad.wmnet:/usr/share/GeoIP/* /usr/share/GeoIP/ [00:57:10] try dschoon's command [00:57:30] yep, that works [00:57:53] is that because the rsync module has different permissions? [00:57:57] thanks dschoon [00:58:24] scp doesn't necessarily use your ssh config correctly [00:58:34] rsync is always better. [00:58:42] scp is more or less obsolete. [00:58:48] k so never use scp, got it [01:00:06] scp is ok too if you have this in your ~/.ssh/config [01:00:30] drdee: do you have any idea what's special about an03, an09? 
[01:00:39] even an26 has nothing listed in https://www.mediawiki.org/wiki/Analytics/Kraken/Infrastructure [01:00:42] this all worries me [01:01:03] no, i used the same hostname for scp and rsync average_drifter and scp said permission denied [01:01:20] i need to use dandreescu@analytics10**.eqiad.wmnet because my username is different on those machines [01:01:23] Host bastion1.pmtpa.wmflabs Hostname bastion.wmflabs.org ProxyCommand none [01:01:26] so maybe rsync gets that [01:01:26] Host bastion1.eqiad.wmflabs Hostname bastion2.wmflabs.org ProxyCommand none [01:01:29] Host *.pmtpa.wmflabs ProxyCommand ssh -a -W %h:%p bastion1.pmtpa.wmflabs [01:01:32] Host *.eqiad.wmflabs ProxyCommand ssh -a -W %h:%p bastion1.eqiad.wmflabs [01:01:35] Host *.wmflabs User spetrea [01:01:37] average_drifter: we know. we all have the appropriate proxy commands. [01:02:00] yep, i have all those, i think rsync must be smarter about my bad username situation - it's gonna be fixed at some point anyway [01:02:07] (still copying the .dat files) [01:02:17] https://gist.github.com/wsdookadr/fc50039b332fab2a85fd [01:02:42] dschoon: ah, ok [01:03:28] shit [01:03:34] drdee: they're all udp2log receivers [01:05:22] drdee: i think we should probably text otto [01:05:36] yes i agree [01:05:41] there's a chance we'll lose mobile data otherwise [01:05:44] okay. i'll do it. [01:06:03] milimetric: got the dat files? [01:06:17] nope, still copying [01:06:40] k [01:09:01] no response on phone [01:09:02] i also texted [01:09:10] :( [01:09:11] ^^ kraigparkinson, drdee [01:09:27] otto not responding? [01:09:28] i double-checked his number on https://office.wikimedia.org/wiki/Contact_list [01:09:36] correct [01:09:43] i believe there are 4 udp2log instances running in total on an03-an06 [01:09:57] the number is correct. I got a text from him.... 
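average_drifter's ~/.ssh/config paste above, reassembled into config-file layout for readability (hostnames and the User line are exactly as pasted; this is his labs bastion proxy setup, not necessarily what the analytics production hosts need):

```
Host bastion1.pmtpa.wmflabs
    Hostname bastion.wmflabs.org
    ProxyCommand none

Host bastion1.eqiad.wmflabs
    Hostname bastion2.wmflabs.org
    ProxyCommand none

Host *.pmtpa.wmflabs
    ProxyCommand ssh -a -W %h:%p bastion1.pmtpa.wmflabs

Host *.eqiad.wmflabs
    ProxyCommand ssh -a -W %h:%p bastion1.eqiad.wmflabs

Host *.wmflabs
    User spetrea
```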
[01:10:07] but no clue why / would fill up [01:10:10] dsc ~/w/w/k/tmp/dotfiles ❥ dsh -g kka -- pgrep -fl udp2log [01:10:10] an03: 25474 /usr/bin/udp2log --config-file=/etc/udp2log/webrequest --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=524288 [01:10:11] an04: 1685 /usr/bin/udp2log --config-file=/etc/udp2log/webrequest --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=524288 [01:10:11] not recently, but last week. [01:10:12] an05: 1516 /usr/bin/udp2log --config-file=/etc/udp2log/webrequest --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=524288 [01:10:14] an06: 30376 /usr/bin/udp2log --config-file=/etc/udp2log/webrequest --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=524288 [01:10:16] an09: 24170 /usr/bin/udp2log --config-file=/etc/udp2log --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=131072 [01:10:18] an08: 6562 /usr/bin/udp2log --config-file=/etc/udp2log --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=16384 [01:10:20] an26: 32283 /usr/bin/udp2log --config-file=/etc/udp2log --daemon -p 8420 --multicast 233.58.59.1 --recv-queue=524288 [01:10:29] so no. [01:10:37] all the listed boxes are udp2log receivers [01:10:46] and i checked on an26, it's looking for cp1044 [01:10:54] mmmmmm [01:10:55] that's a mobile varnish box [01:10:57] yup [01:11:04] have you mentioned this in the wikimedia-operations channel? [01:11:11] no. [01:11:16] they might be able to find someone that can help us in the short run, no? [01:11:20] only otto admins our boxes. [01:11:29] jesus. [01:11:34] txt from otto: [01:11:37] still worth asking. got the same advice from robla [01:11:47] "Hm. Ok Seder dinner ahhh" [01:11:49] (during the packetloss fun) [01:12:25] I deny everything. wait, what am I denying? [01:12:52] we're texting -- i'm going to call [01:12:56] lol, robla, I was suggesting that dschoon pings #wikimedia-operations to get help on the boxes with disk full warnings. [01:13:47] assuming ottomata is otherwise detained. 
[01:14:03] he says / won't affect udp2log [01:14:23] still weird that they are filling up [01:14:50] disks filling up never ends well [01:14:51] i will look into it [01:14:55] crisis is less critical [01:15:05] which machine is filling up, and which partition? [01:16:02] / on an03, an09 and an26 [01:18:04] root partitions filling up really never ends well. are those boxes expendable? [01:18:11] milimetric: were you able to build the jars? [01:18:31] they are not. [01:19:05] yeah, some help from the other opsen may be in order [01:19:13] i think that is wise. [01:21:29] yes drdee, jars are built! [01:21:35] AWESOME! [01:21:39] congrats [01:21:45] from now on it's easy breezy [01:21:45] ops is ignoring me. [01:22:09] * average_drifter is happy if you're happy [01:22:23] :) [01:22:35] thank you very much for all your help dschoon, average_drifter, drdee [01:24:16] ugh [01:24:47] /var/lib/hadoop is 700G [01:24:50] this is a conf bug [01:25:14] logs? [01:25:20] Leslie must be AFK at the moment [01:46:41] okay, i think we're okay now [01:46:47] only an03 was handling customer data [01:49:00] robla, drdee [01:49:02] ^^^ [01:49:35] k [01:50:09] * robla skims #wm-ops backlog [01:51:13] dschoon: great, glad to hear. [01:51:40] is an03 sorta limping along, or is it pretty much fixed? [01:51:43] we'll look into the root cause tomorrow [01:52:05] an03 has 1.5G free [01:52:11] sounds reasonable [01:52:15] this hasn't changed in the last 10m [01:52:42] i've kept otto up to date [01:53:04] I wonder if that torrent server I was running there had anything to do with it [01:53:11] >:( [01:53:16] whut? [01:53:29] ok...I'll stop :) [01:53:40] how many times have we told you? [01:53:49] an21-24 are the porn boxes? [01:53:50] INFORMATION JUST WANTS TO BE FREE [01:53:55] jesus. [01:54:10] time for a goddamn glass of wine. [01:54:36] yes, sounds like it. enjoy! [03:49:54] that your doing, ottomata? 
[03:50:31] i see disk on an09 and an26 resolving itself magically [03:54:24] yes [03:54:29] so [03:54:46] i was using those when I was hunting down udp2log packet loss problems (that turned out not to exist) [03:55:10] in order to capture short bursts of logs, I wrote unsampled logs to disk by turning udp2log on and then off again [03:55:18] looks like udp2log started back up [03:55:24] i've edited the config files there to turn it off [03:55:26] as for an03, i don't know [03:55:28] looking now [03:57:30] what did you delete from an03 to free up space? [03:58:32] ahh [03:58:42] an03? [03:58:49] ja [03:58:49] ottomata: i think mutante gzipped a log file [03:59:01] i recommend reading over the ops channel later [03:59:14] all the chat about it was there [03:59:25] plus, uh, your phone :) [03:59:43] i'm sure #ops will make more sense to you [04:00:13] and as a reminder, i deleted /var/lib/hadoop/data on an03 (because you ok'd it!), even tho that was silly [04:00:21] i agree it doesn't matter, though [04:00:40] an03 definitely doesn't show up as a datanode [04:00:48] milimetric: were you asking about profiling code in nodejs ? [04:00:55] dschoon: were you interested in this ? [04:01:05] nope. [04:01:08] ok [04:01:12] ottomata: lmk if you have other qs [04:02:09] hey average_drifter not recently [04:02:21] but I was trying to profile knockout js code [04:03:04] http://substack.net/heatwave_node_knockout_2011 [04:04:14] heh, that's funny [04:04:31] dschoon, we should get kraken webrequest loss into ganglia! :) [04:04:34] that would be real nice [04:04:37] it's the client side code that needs optimizing in limn though [04:04:40] unfortunately [04:04:54] more importantly, we should get / freespace into ganglia :P [04:04:59] ottomata: can I help with getting that into ganglia ? 
[04:05:08] i don't give any shits about aggregate disk [04:05:37] that should be in nagios, afaik / should be in nagios, but nagios alerts from the analytics cluster never worked right [04:05:44] yeah. [04:05:46] but i mean [04:06:00] the ganglia disk_free metric appears to be the aggregate of / + jbod [04:06:05] https://icinga.wikimedia.org/icinga/ [04:06:06] we should split them into /, jbod [04:06:12] + jbod?! [04:06:28] either that, or the numbers here make 0 sense: [04:06:41] aww, i can't link to icinga? [04:06:48] there we go [04:06:49] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=analytics1003&service=Disk+space [04:07:13] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=analytics10&mreg%5B%5D=disk_free>ype=line&glegend=show&aggregate=1 [04:07:24] that number means nothing to me [04:07:41] i mean, i'm sure i'm wrong and it means SOMETHING [04:07:50] but clearly the worker nodes are including more than just / [04:08:20] hm [04:08:21] http://ganglia.wikimedia.org/latest/graph.php?r=4hr&z=xlarge&c=Analytics+cluster+eqiad&h=analytics1003.eqiad.wmnet&v=1670.212&m=disk_free&jr=&js=&vl=GB&ti=Disk+Space+Available [04:08:29] i'm going to go back to beating spelunky now [04:08:33] we can talk more tomorrow [04:08:41] ooook bye [13:20:59] morning [13:50:33] mornin! [14:06:04] mooooorning [14:32:28] heya drdee [14:32:51] erggh, 1 sec... [14:33:17] better [14:33:21] yeah, drdee, you there? [14:37:28] yo [14:37:49] any idea what caused the full root partitions? 
[14:38:43] on 9 and 26, yes [14:39:00] errant udp2log procs leftover from the packet loss sleuthing, i turned them off, but you know what happens [14:39:07] i commented out the lines in udp2log config files there now [14:39:10] on 03, i don't know [14:39:23] but, i callllled you, [14:39:34] oh, and 09 and 26 do nothing right now [14:39:39] k [14:39:39] 03 is one of the 4 udp2log producers [14:39:43] so that is a worry if it busts [14:39:45] but ja [14:39:47] but [14:39:47] so [14:39:50] yup [14:39:55] the 5xx log on locke is huge in the last few days [14:40:10] historically it is approximately 5MB per day [14:40:24] on march 20-22, 50-100MB [14:40:25] and now [14:40:33] march 23-25, 350-550MB [14:40:42] that's compressed [14:41:00] how often does your SSH session freeze up? [14:41:03] i see mostly requests like this: [14:41:04] http://commons.wikimedia.org/w/index.php?title=MediaWiki:Filepage.css&action=raw&maxage=2678400&usemsgcache=yes&ctype=text%2Fcss&smaxage=2678400 [14:41:08] milimetric, not very often [14:41:20] mine stutters a lot but once every hour or so it freezes and I can't get control back, I have to quit [14:42:10] ottomata, i am looking, 1 sec [14:43:15] btw, I just moved 5xx over to gadolinium using udp-filter [14:43:54] k [14:45:33] it seems mostly to be the Mediawiki:Filepage.css file, definitely worth mentioning this in #mediawiki and #wikimedia-operations [14:48:53] yeah [14:48:55] ok [14:56:59] ok, drdee, in other news [14:57:25] all filters except for webstatscollector are on gadolinium now [14:57:40] that one is annoying, so many unknowns with that one [14:57:55] like what? [14:58:15] like, how do the files get to dumps.wikimedia.org? how is collector launched? [14:58:29] the first question we did answer a while back ...... [14:58:39] let me poke apergos about that (again) [14:59:05] ok…i poked around for it yesterday but didn't find it [14:59:35] just asked apergos [14:59:45] about the collector [15:00:14] no init.d no nothing? 
[15:01:14] ... happened again... [15:02:09] what happened? ssh timeout? [15:03:35] ja nothing [15:03:38] that i can see [15:03:54] drdee, can we just leave locke as the webstatscollector machine? :p [15:05:16] from apergos [15:05:16] pagecounts go via cron from spanoshot1 [15:05:33] snapshot 1 that is [15:05:41] snapshot! [15:05:49] yup [15:05:54] probably not in puppet though [15:06:04] ha, totally not [15:07:45] ok cool, I see it [15:07:48] I can make that work [15:09:33] drdee, we renamed webstatscollector master branch, right? [15:09:39] the one I should be looking at is master, not time_travel? [15:11:56] yes [15:11:59] look at master [15:12:59] k [15:19:45] drdee: I sent an e-mail about the csv [15:19:53] drdee: I still haven't found how to run WikiReports.pl [15:19:54] to erik? [15:19:59] drdee: to Erik, you, and Kraig [15:20:05] ok ty [15:25:02] ottomata1, drdee, so my SSH resets after every major dump statement I make. [15:25:02] or every 20 minutes [15:25:02] and by resets I mean I lose my work, have to kill the window and open a new one [15:25:03] but! I've got a pig script ready for oozie [15:25:18] that's annoying and weird! [15:25:24] well that is real annoying [15:25:31] if you have connection probs [15:25:35] work in a screen [15:25:41] that will help at least with not losing your work [15:26:12] but! I've got a pig script ready for oozie :) [15:26:29] so I'm reading your oozie tutorial again andrew and will ask you as I have questions [15:27:22] drdee, is this good enough for output? 
[15:27:24] (2013-03-25,-,185698) [15:27:24] (2013-03-25,Android,314583) [15:27:24] (2013-03-25,iPhone OS,566736) [15:27:24] (2013-03-25,Symbian OS,13242) [15:27:24] (2013-03-25,Windows CE,6) [15:27:25] (2013-03-25,BlackBerry ,105) [15:27:25] (2013-03-25,BlackBerry OS,144) [15:27:26] (2013-03-25,Windows Phone,6) [15:27:26] (2013-03-25,Windows Phone OS,17513) [15:27:27] (2013-03-25,Windows Mobile OS,366) [15:27:27] (2013-03-25,Linux Smartphone OS,9) [15:27:28] (2013-03-25,BREW,10) [15:28:50] I think tomasz wants it at the platform level, you can use the parentId property i believe [15:28:55] see https://mingle.corp.wikimedia.org/projects/analytics/cards/92 [15:29:39] oh right, doh [15:29:51] i got wrapped up in the pig stuff, forgot what I was doing :) [15:33:35] milimetric: if your ssh breaks every now and then.. try GNU screen or tmux [15:33:48] milimetric: that will allow you to continue where you left off when your ssh breaks [15:34:29] :) honestly, knowing how hard setting that up probably is, I'd rather just let it break every 20 minutes for now [15:34:50] thanks though, I'm sure that would help [15:38:55] it's not hard :) [15:38:57] wget http://garage-coding.com/.screenrc [15:39:03] put it in your $HOME [15:39:13] then you can do F9 => New Window [15:39:26] F10 => Detach [15:39:29] F12 => Quit [15:39:44] ottomata: did you improve the packet loss script by any chance? (https://mingle.corp.wikimedia.org/projects/analytics/cards/442) [15:39:55] CTRL+A P => previous window [15:40:01] CTRL+A N => next window [15:40:19] screen -x to attach to the screen [15:40:25] that's all [15:40:36] no [15:40:43] drdee, didn't know i was supposed to ? [15:40:56] not supposed to just checking if you worked on it [15:41:00] emit an event to ganglia?? [15:41:16] as part of all the work that you have been doing for gadolinium [15:41:20] ok [15:43:28] drdee, webstats collector always outputs to cwd dumps/, right? 
[15:44:17] yes i believe so, we add a conf param but that code never got deployed and is now rousting in the old_master branch [15:44:26] right [15:56:39] woa! If I wait about 10 minutes it gets unstuck [15:57:32] is diskfullageddon over? :-) [15:59:27] hey ottomata [15:59:43] why is the blog.sh script not updated on stat1 while the gerrit changeset has been merged [16:00:04] bwerrrr [16:00:07] where is it? [16:00:34] https://gerrit.wikimedia.org/r/#/c/55390/ [16:00:36] oh because of this [16:00:36] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find resource 'Class[Nrpe::Packages]' for relationship on 'Nrpe::Check[check_ [16:00:41] ergh [16:00:45] drdee, looking at your UserMetrics cards now. [16:00:50] something leslie and peter were working on, ghmmm [16:00:58] grumble [16:00:59] k [16:01:54] kraigparkinson: ok [16:02:37] will ask [16:03:49] how do I add progress information to a card on mingle ? [16:03:53] kraigparkinson: ^^ [16:04:12] as in change the status? or add information for additional context? [16:04:24] additional context, progress information [16:04:33] like.. where we're at right now [16:05:47] Can add as a comment, or create a new header (h1. Implementation Notes) at the bottom of the card and add details [16:21:47] drdee, I think #419 is weak. We should either get a really smart rationale for it or change the class of service to standard. [16:22:06] * drdee is looking [16:22:39] yes i agree, it seems to me that the admin by default can see all cohorts [16:23:05] yes, otherwise we're getting into user access issues that I think we should defer. [16:23:22] i certainly hope so [16:25:22] kraigparkinson: could you add a google hangout link to the Analytics Leadership Scrum meetings? [16:25:44] sure, but I figured we were going to use the stand-up scrum and roll through… that OK? [16:26:04] er scrum hangout. 
[16:27:17] sure, but can you give me modify rights to the event so I can add the scrum hangout link [16:27:17] ? [16:27:35] sure [16:28:00] done [16:37:08] drdee: I am currently blocked [16:37:30] I'm updating mingle about the progress on #60 [16:39:23] ok, look at 353 and detail what needs to happen to get wikistats run from master again [16:39:39] ok [16:39:40] it's not running from master right now but from a commit sometime back in october [16:45:14] drdee [16:45:15] https://gerrit.wikimedia.org/r/#/c/55900/ [16:45:57] any objections? [16:46:03] * drdee is looking [16:46:29] no, average_drifter might be able to help out [16:46:55] well, its good, i just want a quick and dirty deb [16:47:05] i've got it working [16:47:17] (mornin) [16:47:24] if i could install make and g++ and libdb-dev etc. on gadolinium without puppetizing, i would just clone and compile there [16:47:31] but, since people might get upset about that [16:47:40] i figured a quick and nasty deb package is better than any of that anyway [16:48:03] so, if you've got no objections, i'm going to merge, build, put in apt, and puppetize webstats collector [16:48:13] morning dschoon! [16:48:17] hihi [16:48:42] dschoon, we should really move the different stats in public zero into their own hierarchies [16:48:52] having multiple stats inside a single time bucket is weird [16:48:56] makes things weird for pig [16:48:59] and for limnification [16:55:45] hierarchies and cohorts [16:55:52] * average_drifter is confuzzled [16:58:23] ottomata: it's not bad, actually [16:58:26] you can glob [16:59:15] not with pig easily, also, concat_sort just uses the whole hierarchy [16:59:16] ottomata , average_drifter: scrum [16:59:23] on my way [16:59:28] average_drifter why are you confuzzled? [17:01:19] drdee: I dunno what cohorts are [17:01:30] scrum! :) [17:06:02] i mean, pig supports globs [17:06:43] ottomata: data = LOAD '/wmf/public/webrequest/zero_carrier_country/2013/03/*/*/carrier/*' AS ... 
[17:06:51] that does what you think it does [17:14:47] ja but its real annoying, you have to customize everything then [17:14:57] and, actually, i'm not sure if that works in oozie? [17:15:03] i've had trouble with wildcards [17:15:34] but still, how is one to know what cohorts(?) are in the hierarchy without going to the leaf dir [17:15:42] cohorts should be toplevel [17:16:35] any of these rooms good: [17:16:36] Add 6Flr - R62 - Jaucourt, Add 6Flr - R63 - Geoffrin, Add 6Flr - R64 - Minor, Add 6Flr - R67 - Inversion Table [17:16:57] 6Flr - R62 - Jaucourt 6Flr - R63 - Geoffrin 6Flr - R64 - Minor 6Flr - R67 - Inversion Table [17:19:29] ahh [17:19:30] fair. [17:32:56] omw [18:03:39] drdee, btw, puppet just ran on stat1, the blog script got updated [18:04:10] thx ottomataa [18:29:25] hey milimetric, can you merge https://gerrit.wikimedia.org/r/#/c/55924/? [18:30:36] looking YuviPanda [18:32:43] brb lunch [18:33:41] thanks milimetric [18:35:14] growl, my rebuild filter is segfaulting :( : ( :( [18:35:20] webstats filter [18:35:39] done YuviPanda [18:35:40] can i help? [18:35:45] milimetric: thank you :) [18:35:48] maybe? [18:35:50] I'll deploy :) [18:35:55] one comment - limnpy seems to generate annoying spaces at the end of the lines [18:36:03] milimetric: oh? [18:36:04] in the graph definition json files [18:36:13] i added the comment in gerrit [18:36:14] milimetric: I didn't generate this with limnpy, i just copied a prior graph and added [18:36:15] okay [18:36:20] oh gotcha [18:36:20] I'll clean them up [18:36:28] i had removed it from the existing graphs... hm [18:36:29] drdee, how do I debug filter? :/ [18:36:44] average_drifter: can you help ottomata with filter [18:36:54] it's segfaulting. did you push your latest fixes [18:36:54] ? [18:36:59] hm Program received signal SIGSEGV, Segmentation fault. 
[18:36:59] 0x000000000040092f in replace_space () [18:37:27] i believe that average_drifter fixed that one [18:37:30] maybe he forgot to push [18:37:35] so the next time this runs YuviPanda, it should automatically pick up your new script and generate the target csv. But yeah, you still need to deploy to get the graph up on kripke [18:37:42] yup [18:37:47] i'm doing some cosmetic changes too [18:37:50] (changing labels and stuff) [18:37:56] cool, good [18:37:59] so will deploy after that [18:38:12] and I haven't forgotten about documentation on nodeType(s) [18:38:31] I've been busy with looking at User Metrics API and Pig though [18:38:41] oh but maybe that's because I'm just inputting whatever [18:38:47] ottomata: can you show me what code segfaulted please? [18:38:48] it doesn't break with the entries-with-urls-with-spaces-2013-02-10.txt file [18:39:03] (back but eating) [18:39:08] milimetric: :) okay! Looking forward to it [18:39:11] I think we're a bit de-sync-ed on what branch we must use and stuff [18:39:38] average_drifter, i'm using master HEAD [18:39:39] we need to get udp-filters into CI again [18:39:42] milimetric: you need to hit the Publish and Submit button in gerrit [18:39:46] and this is webstatscollector [18:39:47] kraigparkinson, drdee: we did not discuss https://mingle.corp.wikimedia.org/projects/analytics/cards/469 but it seems to be in a sprint [18:39:49] filter.c [18:40:02] ottomata: webstatscollector, I'm going to pull [18:40:03] (just noting) [18:40:07] ottomata: can you provide me with segfaulting input ? [18:40:10] please ? [18:40:11] :) [18:40:23] don't have it yet! [18:40:25] haha, trying to find it myself too [18:40:33] it doesn't segfault on a tail of the sampled file [18:40:33] so hm [18:40:53] drdee, is that one we can punt? [18:41:06] drdee: yes [18:41:24] I think YuviPanda's still making changes drdee. But you're saying after +2-ing it, I still have to click "Submit Patch Set 1"? 
[18:41:36] milimetric: yes, you have to hit 'submit' [18:41:44] i think we need to discuss which features we are going to slot in the next 2-3 weeks [18:41:44] gerrit is not the brightest kid in the class, I think [18:41:55] gotcha, hitting [18:42:01] it seems that we have already more than enough in Queued for Dev [18:42:37] drdee, shall we chat about that this afternoon? [18:43:08] yes! would love to [18:44:39] ottomata: did it segfault on today's data ? [18:44:43] ottomata: or some other day ? [18:45:38] drdee, have you had the chance to look at the big ole list of defects and figure out roughly if/when they should go into the backlog or not? [18:45:47] ottomata: can you update https://www.mediawiki.org/wiki/Analytics/Kraken/Infrastructure with the current roles of the machines? [18:46:06] i realized last night i didn't know which were critical to the mobile features, and which were not [18:46:33] (as there were six or eight udp2log receivers, i guess many were just for diagnostic purposes?) [18:47:15] kraigparkinson: yes i have been looking at them many times :( but not sure what to do with them [18:47:41] huh. [18:47:41] ok, then let's chat about that in addition to the next 3-4 sprints of stories. [18:47:47] what a great word, dia-gnosis [18:47:50] apart-knowing [18:47:52] average_drifter, found it [18:47:52] drdee, I'll send an invitation for 1pm pacific. [18:48:08] drdee ^^ :) [18:48:15] k [18:48:26] https://gist.github.com/ottomata/5248039 [18:48:30] ottomata: looking [18:48:33] its very looooong [18:48:58] ottomata that's CRAZY! [18:49:09] ottomata, could you look at the Puppet/Redis thing again when you have a chance? https://gerrit.wikimedia.org/r/#/c/54970/ [18:49:10] i mean, there might be others [18:49:16] it segfaults on the live stream over few seconds [18:49:19] thats just one example I found [18:49:34] heh [18:49:46] > buffer size for the line? [18:49:51] or greater than the buffer for the URL? 
[18:50:00] probably the latter [18:50:10] line buffer is 64k [18:50:24] amusingly, IE6 has a URL buffer that is only 2k [18:50:36] ottomata: I'll create a test with that [18:50:37] it just drops everything after 2048 [18:51:01] 0x000000000040092f in replace_space (url=0x0) at filter.c:95 [18:51:01] 95 int len = strlen(url); [18:52:04] 4453 characters, it is. [18:52:06] url is 4455 chars [18:52:10] :) [18:52:14] JINX [18:52:46] i wonder if i can find it on the referer page [18:52:54] wait [18:52:55] kraigparkinson: can we do it after quarterly review meeting prep? i need to attend GLAM sprint at 1pm [18:53:14] it appears to be a URL appended with the *page content* [18:53:15] sure. can we push to 30 min after that? [18:53:16] http://www.invertia.com/mercados/bolsa/indices/mdo-continuo/resultados-ib011continu/4T12 [18:53:25] yes [18:53:37] OHH [18:53:57] "select text to get the definition with wikipedia" [18:53:58] ok. Every time I hear you mention GLAM, I have this mental image of you wearing a feather boa. [18:54:00] ah-hahahaha [18:54:15] think hotpants, kraigparkinson [18:54:17] hotpants [18:54:27] brbrb [18:54:27] dschoon, re infrastructure, done. [18:54:31] I'm going to stab my eyes with hot pokers over lunch. [18:56:30] average_drifter: did erik answer your questions? [18:56:36] and is it clear what needs to happen for 353? [18:59:27] drdee, I'm changing the values kraken assigns to wmf_mobile_app. For example, "Wikimedia App Android" to "Android" as card 92 says [18:59:30] drdee: first question, no [18:59:31] that ok? [18:59:37] yes! [18:59:40] drdee: I am still reading 353 and thinking about it [18:59:49] no need to think about it :) [19:00:09] you need to try to run wikistats from master and document the issues in card #353 [19:00:27] it's just about documenting what is not working [19:00:39] drdee: ok, I will do that [19:01:07] ottomata: is webstatscollector urgent? should I have a look on it, write a test, patch and send you a deb ? 
[19:08:44] back [19:08:47] ty, ottomata [19:09:10] ottomata: i recall you did some stuff for an09 an26 last night [19:09:31] what was the root cause again? something about test instances of udp2log? [19:09:35] oh, right [19:09:36] unsampled [19:09:39] for seq stuff [19:09:51] yeah [19:09:57] on an03 i dunno though [19:10:00] hm [19:10:07] that's a little worrisome [19:11:19] we gzipped kafka.log, iirc [19:11:28] do we have logrotate or something around that? [19:11:30] rather [19:11:32] it's java, obv [19:11:42] log4j has built-in rotation support [19:11:55] could you check what our conf looks like? [19:12:07] average_drifter [19:12:10] don't worry about sending me a .deb [19:12:19] but if it was quick to fix the segfault that would be very helpful [19:12:31] this is one of the last pieces in deploying gadolinium [19:12:37] if you can fix the segfault and push to master that will be good enough [19:14:35] ok [19:14:38] dschoon, an03 [19:14:49] /etc/kafka/log4j.properties [19:14:53] ok [19:15:18] ja [19:15:22] so, no rotator [19:15:27] i'll find the conf for that [19:16:24] log4j.appender.WHATEVER=org.apache.log4j.DailyRollingFileAppender [19:16:42] ok, will add [19:16:49] ottomata: check out log4j.properties for hadoop [19:16:55] and you'll see there's a lot more than just the class [19:17:06] max file size, backup schedules, etc [19:28:26] drdee, pushed the pig script I've been working on along with the kraken updates. Running it on 15 minutes of data gives this result: [19:28:26] (2013-03-25,Firefox OS,1) [19:28:27] (2013-03-25,BlackBerry PlayBook,5) [19:28:27] (2013-03-25,Android,47) [19:28:27] (2013-03-25,iOS,564719) [19:28:35] doesn't sound right... [19:28:45] 1. cool, grats! [19:28:50] I've gotta run pick up my car from the shop (drunk kid ran into it the other night) [19:28:55] 2. aiight.
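The DailyRollingFileAppender class mentioned above needs a few companion settings before anything actually rotates. A sketch of what the entry in /etc/kafka/log4j.properties could look like; the appender name, file path, and patterns here are assumptions for illustration, not the deployed config:

```properties
# Hypothetical rotating appender for /etc/kafka/log4j.properties --
# DailyRollingFileAppender rolls the file on the DatePattern boundary
# (here: once per day, suffixing the old file with '.yyyy-MM-dd').
log4j.rootLogger=INFO, kafkaAppender
log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kafkaAppender.File=/var/log/kafka/kafka.log
log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

Note that DailyRollingFileAppender rotates purely by date; the max-file-size and backup-count knobs mentioned for the hadoop conf belong to org.apache.log4j.RollingFileAppender instead.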
i need to catch up on my scripts, so i can't look now [19:28:56] lame [19:28:58] wow [19:28:59] thx dschoon, and thx for all the help [19:29:00] that sucks [19:29:02] np [19:29:08] once i'm caught up, we can dig into it [19:29:16] yeah numbers seem a bit off :) [19:29:26] cool, maybe set up oozie [19:29:47] well, I used the probably false assumption written here about the UA string: http://www.mediawiki.org/wiki/Mobile/User_agents [19:30:47] which is that the iOS string is just Mozilla 5.0 (anything but Safari)* iPhone (anything but Safari)* [19:30:58] but yeah, i'll be back in an hour or so [19:54:38] ottomata: do we have an oozie-site.xml? [19:55:01] also, could we get the rest of the conf dirs into the kraken project? [19:55:39] and what machine is blessed for conf updates? [19:56:43] okay, answered my first question. it's all in /etc/oozie [19:57:20] kraigparkinson: we can do the release planning in 5 minutes if you want [19:57:39] dschoon, jajjaa [19:57:40] good idea [19:58:07] i think they are: /etc/{oozie,hive,pig,sqoop,hbase,zookeeper} [19:58:10] yeah [19:58:12] they are [19:58:18] well, hbase wasn't puppetized [19:58:23] dunno about the others [19:58:29] i have local copies, randomly pulled down from a server [19:58:30] but yeah [19:58:33] they all have useful stuff [19:58:43] i never did the symlink on an10, so we can choose any machine to be the conf deployer [20:05:05] ottomata: do an08 an09 do anything atm? [20:05:11] btw, the namenode is pretty low on space [20:05:16] like, <10% free [20:05:17] iirc [20:05:31] back [20:06:10] no [20:06:28] /dev/md2 99G 603M 93G 1% /var/lib/hadoop/name [20:06:30] looks ok to me dschoon [20:06:35] /dev/md0 19G 14G 3.5G 81% / [20:06:37] has 3.5G [20:06:48] this is just the default partitioning scheme ops gives us [20:06:50] small / dirs [20:06:59] if we expect things to write data we're supposed to create new partitions for them [20:12:43] oh [20:12:46] my bad.
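Since the quoted rule excludes "Safari" both before and after "iPhone", it collapses to a simple substring test: the UA starts with "Mozilla/5.0", mentions "iPhone", and never mentions "Safari". A sketch of that predicate; the function name is hypothetical, and kraken's real check is a Java regex inside a Pig UDF, not this C:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate implementing the wiki-page rule quoted above:
 *   Mozilla 5.0 (anything but Safari)* iPhone (anything but Safari)*
 * "Anything but Safari" on both sides of "iPhone" means the token
 * "Safari" must not appear anywhere in the string. */
bool is_ios_app_ua(const char *ua)
{
    if (ua == NULL)
        return false;
    if (strncmp(ua, "Mozilla/5.0", strlen("Mozilla/5.0")) != 0)
        return false;
    return strstr(ua, "iPhone") != NULL && strstr(ua, "Safari") == NULL;
}
```

Note this also matches any non-Safari iPhone browser, which is consistent with the worry voiced later in the session about accidentally grabbing regular iOS pageviews.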
[20:12:52] i was probably looking at something else [20:12:59] ottomata: the env variable 'OOZIE_URL' is used as the default value for the '-oozie' option [20:13:05] * dschoon adds to .profile [20:13:40] haha, yeah [20:13:55] on my to-do list to make that all streamlined (oozie-env.sh?) [20:15:59] argh [20:16:00] Coordinator Rerun will not re-read the coordinator.xml in hdfs. [20:16:12] what does the -env.sh script do? [20:18:14] it's used by the oozie daemons to set some env stuff, but I betcha it could be used by the oozie cli too, dunno [20:18:17] that's why it's on my todo list [20:18:58] gotcha [20:19:01] so uh [20:19:13] ottomata, do you have any idea how to get oozie to refresh a coordinator.xml file? [20:19:38] because i'd like to update the extant jobs (as they have a pause time that matters for re-materializing the runs) [20:19:47] and if i kill-readd, i lose the pause info [20:20:02] i could work around it, but this seems so basic... [20:20:06] yeah, i don't think you can [20:20:08] you have to restart them [20:20:13] that's what i tried to do the other day [20:20:20] when i was messing with the oozie mysql db [20:20:20] and http://oozie.apache.org/docs/3.3.1/DG_CoordinatorRerun.html says no [20:20:24] boo [20:20:31] yeah, you can do it with workflows [20:20:35] but not coordinators [20:20:42] i am now reading about bundles [20:20:47] in case that lets me do it [20:22:16] yeah, it seems like bundles are what we want [20:22:22] oh? [20:22:27] they represent a coord+properties [20:22:53] and you can actually define any number of coordinators with a bundle [20:23:05] oo reading, haven't read this!
[20:23:18] back :) [20:24:18] it seems like it'd reread the coordinator definition(s) whenever it runs the bundle [20:24:29] http://oozie.apache.org/docs/3.3.1/BundleFunctionalSpec.html [20:25:11] oh [20:25:13] never mind [20:25:21] it says it just triggers coordinator rerun [20:25:43] meh [20:25:48] yeah [20:26:00] not quite sure what they are for…i guess just another abstraction? [20:26:04] yeah. [20:26:14] like if you have several related jobs with different frequencies [20:26:16] and time steps [20:26:37] like if i have an hourly job, and then a monthly rollup that runs less frequently [20:26:46] okay. [20:26:53] i guess i'll just delete the current coordinators [20:27:10] and resubmit with the correct start times after fixing their paths [20:30:19] ottomata: could you email me SMTP info to use for job-fail notifications? [20:31:45] ha [20:31:46] uhhh [20:31:56] sometimes I think you think I know more than I do :) [20:32:52] ah, i think this is all I've got [20:32:52] mchenry.wikimedia.org [20:32:57] is our outgoing smtphost [20:33:47] hm, that is actually in oozie-site.xml [20:33:57] okay. [20:34:03] <property> [20:34:04] <name>oozie.email.smtp.host</name> [20:34:04] <value>mchenry.wikimedia.org</value> [20:34:04] </property> [20:34:04] <property> [20:34:04] <name>oozie.email.smtp.port</name> [20:34:05] <value>25</value> [20:34:05] </property> [20:41:35] mk [20:54:01] hey milimetric [20:54:03] can you merge https://gerrit.wikimedia.org/r/55957 [20:54:08] minor copy editing patch [20:54:18] k [20:54:42] ottomata: ages ago, did you say dsync didn't work for you? [20:55:16] because it seems to work fine for me... [20:55:23] done [20:55:44] i haven't tried in months [20:55:47] dsync -m an02 -m an03 -v -- ./file* ./foo HOST:blah/ [20:55:56] i hadn't either [20:55:58] milimetric: thanks :) [20:56:01] but it worked right now when i tried it! [20:56:04] nice!
[20:56:47] i'll note you basically need to read the help [20:56:53] as it's a weird mashup of dsh and rsync [20:58:19] drdee: Erik answered [20:58:29] ok [20:58:31] drdee: looks like I have a formatting problem which I need to fix [20:58:36] ok [21:00:58] milimetric: I deployed to http://mobile-reportcard-dev.wmflabs.org/ but seeing no changes at all... [21:02:13] cool, ok, so this *looks* like it's still serving the old files: http://mobile-reportcard-dev.wmflabs.org/graphs/unique-uploaders.json?pretty=1 [21:02:21] let's take a look on kripke and see what's actually there [21:02:34] milimetric: hmm, okay [21:03:29] mmm, yeah, so the json wasn't updated [21:03:34] did the deployment succeed? [21:03:43] what command did you use? [21:03:47] YuviPanda: ^ [21:03:52] milimetric: fab mobile_dev deploy [21:04:01] and did you get any errors? [21:04:04] milimetric: no [21:04:13] i can pastebin the output if you want [21:04:14] let me try it then [21:04:18] no it's cool [21:05:01] are you going to delete the github repo anytime? [21:05:27] milimetric: http://pastebin.com/9Y2VXpRk [21:05:39] milimetric: i am unsure if i have delete rights [21:05:46] but yes, we should delete it to avoid confusion [21:05:50] milimetric: i'll ask brion to delete it now? [21:06:12] hm? [21:06:14] i'll delete it YuviPanda [21:06:17] ah okay [21:06:23] yay [21:08:05] ottomata: kindly check on that disk alert for an10 that just fired [21:08:20] we kinda need the namenode, i guess? [21:09:39] YuviPanda: The deployer didn't correctly handle *changing* the repository after initial deployment. So it was still doing git pull from the existing directory [21:09:48] aah! [21:09:50] right [21:09:50] ottomata: nm, it's not real [21:09:52] you're welcome to try to fix that, but it's not high priority [21:10:01] i'll delete the old directories in mobile and mobile_dev [21:10:04] and redeploy [21:10:09] milimetric: okay, thanks!
[21:11:29] oh try it now YuviPanda [21:12:24] milimetric: thanks :) [21:13:23] milimetric: <3 :) [21:13:44] heh, glad you're happy [21:13:54] it looks much better with your labels btw [21:14:12] YuviPanda, while I have you, I wanted to ask a question [21:14:18] sure! [21:14:27] do you have any sense of about how many "page views" you're expecting to see from each mobile app per day? [21:14:34] or per hour or anything [21:14:42] dschoon, what's up? [21:14:48] milimetric: not more than whatever was available in the older ua tables [21:14:51] i'm working on the pig script that's tracking this and I want a sanity check [21:14:56] milimetric: I do have data on how many users have the app installed [21:15:07] yeah, I had that as well [21:15:24] 6,512,000 [21:15:28] other than that...nope. [21:15:34] k, thanks [21:15:53] :) [21:36:25] anyone using sublime? [21:36:41] any tree navigator plugins for it? [21:44:09] i use it occasionally, but not really [21:44:22] average_drifter, any news on the segfault? :D [21:45:21] ottomata: I'm working on some wikistats thing to finish it and have something for tomorrow to show [21:45:46] ottomata: after I finish that (1h), I will get back to webstatscollector and fix that [21:47:46] ok cool, danke [22:11:03] kraigparkinson: can't work on slide deck, google app drive is unreachable :( [22:15:57] drdee: pushed an update to the iOS mobile app regex [22:16:06] nice! [22:16:09] these are the results now: [22:16:09] (2013-03-25,iOS,47533) [22:16:09] (2013-03-25,Android,47) [22:16:09] (2013-03-25,Firefox OS,1) [22:16:09] (2013-03-25,BlackBerry PlayBook,5) [22:16:14] mmmmm [22:16:16] I don't think they make too much sense still... [22:16:18] BUT [22:16:37] I'm pretty confident the regex is implementing the rule as laid out on that wiki page [22:17:00] SO, I think we can deliver this to them, along with the regex we used.
[22:17:15] and tell them that until that bug is fixed, we can't be super confident that we're not grabbing some regular iOS pageviews [22:17:32] so this data is for 15 minutes [22:17:48] ok [22:17:56] and for comparison, the total number of log fields that satisfy IS_PAGEVIEW is over 1 million [22:18:06] but still, both Android and Firefox look suspiciously low [22:18:18] what if you comment out the is_pageview [22:18:23] and just do raw web requests? [22:18:25] well, those regexes had "VERIFIED" next to them in the code [22:18:30] except the Android one which I changed [22:18:45] ok, I'll give that a shot in a bit. Gonna run out for 10 min. [22:18:53] i verified them :) [22:19:10] oh gotcha [22:19:21] it could also be an issue with the is_pageview function [22:55:16] back [22:56:01] milimetric: you might want to try a few other things [22:56:11] - count the total number of pageviews in the dataset [22:56:21] a bit over a million [22:56:32] - if using MATCHES, ensure your regex matches the whole line [22:56:39] (don't use ^$) [22:57:00] btw without the IS_PAGEVIEW it's: [22:57:00] (2013-03-25,iOS,77577) [22:57:00] (2013-03-25,Android,503) [22:57:00] (2013-03-25,Firefox OS,3) [22:57:00] (2013-03-25,BlackBerry PlayBook,49) [22:57:01] - also dump counts for the total number of iOS devices, android devices, etc [22:57:39] right, kraken was using patternInstance.matcher(UAstring) [22:57:47] so i left that [22:57:47] oh, right [22:57:48] it's in a UDF [22:57:50] aiight. [22:58:08] yeah, I'm compiling the jars as I change them so thanks again for that help last night to all you guys [22:58:13] it's very useful [22:58:35] but we're gonna talk to Brion and Yuvi via email and ask them what they think based on what we found so far [22:58:43] but yeah, i agree with what drdee said earlier, that it sounds like time to reach out to brion -- if we had some rough idea of installs, active devices, etc [22:58:45] that'd be great [22:58:50] word. [22:59:00] np [22:59:02] !
[22:59:34] i'm about to maybe break everything, so i'll let you go first with testing [22:59:43] lmk when you're done [23:01:17] ^^ milimetric [23:01:46] we know there are over 6 million installs [23:01:52] but don't know about active devices [23:02:00] i think that's the point of this feature :) [23:06:53] heh [23:06:56] hokay. [23:07:18] lmk when you're done enough that you won't be pissed if i monopolize the cluster for a while [23:07:28] tomorrow i can help you set up the oozie configs [23:07:30] ^^ milimetric [23:07:36] (unless you've already done it) [23:08:03] i haven't done the oozie stuff [23:08:08] um [23:08:14] as far as the cluster, give me one more shot? [23:08:14] :) [23:08:15] coolio [23:08:16] sure [23:08:18] making the regexes more generic [23:08:23] should be like... 10 mn. [23:08:24] *min [23:08:33] k [23:08:50] i have acquired new oozie-fu that has enabled some cool changes to my other jobs [23:08:55] i'll share tomorrow! [23:10:49] k, job's running [23:11:12] :) oozie-fu seems like some of the highest level fu one can gather [23:11:27] with proper oozie-fu I believe you can even beat crane-monkey style [23:12:15] i'll admit there are some things that could be desired [23:12:40] but it pwns cron pretty hardcore [23:19:31] ok, i'm done with the analysis except i'm trying to count the elements in a simple bag [23:19:43] dschoon, how do I do this? [23:19:45] i can only find references to counting groups [23:19:52] and I figured it out before but it's not intuitive [23:20:00] don't want to keep you waiting [23:20:03] no worries [23:20:18] you aren't counting a projection? 
[23:20:40] matching_log_fields = FILTER log_fields BY ( [23:20:40] (GET_DAY(timestamp) MATCHES '.*') [23:20:40] AND IS_PAGEVIEW(uri, referer, user_agent, http_status, remote_addr, content_type, request_method) [23:20:40] ); [23:20:48] i wanted to count matching_log_fields [23:20:57] learned it yesterday and forgot :) [23:21:11] ah [23:21:18] then you can count any field, really [23:21:23] yes [23:21:48] how? [23:21:49] :) [23:21:57] result = FOREACH matching_log_fields GENERATE COUNT(*) as num; [23:22:07] oh, derf [23:22:09] foreach matching_log_fields!! [23:22:09] sorry, I forgot [23:22:19] there's COUNT_ALL [23:22:21] I believe [23:22:37] there's count_star... no count_all I can find [23:22:56] http://pig.apache.org/docs/r0.11.0/func.html#count-star [23:23:41] "Use the COUNT_STAR function to compute the number of elements in a bag" [23:23:44] ^^ milimetric [23:24:00] yeah, it says that on count too [23:24:08] but i literally can't figure out how to use it in the simplest case [23:24:18] result = FOREACH matching_log_fields GENERATE COUNT($0) as num; [23:24:28] basically you can put any valid field where $0 is [23:25:01] because it's just adding 1 to the count every time it's passed something [23:25:15] COUNT_STAR will do it even for nulls [23:25:18] whereas COUNT will not [23:25:26] i don't think you have nulls, so it doesn't matter [23:25:36] though [23:25:39] you could go: [23:25:47] nope that doesn't work [23:25:53] result = FOREACH matching_log_fields GENERATE COUNT($0) as num, COUNT_STAR($0) as star_num, ; [23:25:59] even though matching_log_fields seemed flat, it isn't: [23:25:59] matching_log_fields: {kafka_byte_offset: double,hostname: chararray,sequence: long,timestamp: chararray,request_time: chararray,remote_addr: chararray,http_status: chararray,bytes_sent: long,request_method: chararray,uri: chararray,proxy_host: chararray,content_type: chararray,referer: chararray,x_forwarded_for: chararray,user_agent: chararray,accept_language: chararray,x_cs: 
chararray} [23:26:08] oh no that is flat [23:26:21] fine, use kafka_byte_offset? [23:26:26] trying [23:26:47] 2013-03-26 23:26:31,185 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: [23:26:47] Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast. [23:26:55] i mean, it might be the case that you'll need a GROUP matching_log_fields ALL first [23:27:15] this makes very little sense from a set operation perspective [23:27:21] COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts [23:27:22] like... it's a flat bag [23:27:23] so yeah. [23:27:29] right. [23:28:09] result = FOREACH (GROUP matching_log_fields ALL) GENERATE COUNT($1) as num, COUNT_STAR($1) as star_num; [23:28:12] that should work. [23:28:18] now that i, you know, actually read the docs [23:29:29] yep, this is what i learned yesterday. My brain rejects this outright :) [23:29:35] i will forget it in about 30 seconds [23:30:04] it's just making the types match up. [23:30:12] it needs a bag. [23:30:16] so we give it a bag. [23:31:14] matching_log_fields is a bag [23:32:01] but the target of COUNT is not. [23:32:58] after all, you're saying "for each row in matching_log_fields..." [23:33:04] and each row isn't a bag [23:33:12] lol [23:33:14] that's insane [23:33:16] it's a tuple! [23:33:19] why are we saying "for each row" [23:33:19] yes well. [23:33:22] in the first place? [23:33:26] because COUNT is an expression [23:33:37] and there's no statement that just evaluates an expression across the whole input [23:33:57] the language is designed to handle the record-stream [23:34:10] so you have to trampoline your record-stream into a bag [23:34:12] *shrug* [23:34:19] you think this is silly, go write some haskell :) [23:35:08] matching_log_fields IS A BAG!
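Pulling the session's debugging together, the global-count idiom in Pig looks roughly like this, assuming matching_log_fields is the filtered relation from above (note the syntax is GROUP alias ALL, with no BY, and that after grouping the second field is the bag COUNT can consume):

```pig
-- Recap of the global-count idiom discussed above.
-- GROUP ... ALL collapses the whole relation into a single group
-- whose second field is a bag, which is what COUNT/COUNT_STAR expect.
grouped = GROUP matching_log_fields ALL;
result  = FOREACH grouped GENERATE
              COUNT(matching_log_fields)      AS num,       -- skips records whose first field is null
              COUNT_STAR(matching_log_fields) AS star_num;  -- counts every record, nulls included
DUMP result;
```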
[23:35:57] i'm just saying, this seems like a poorly thought out part of pig [23:36:00] sql has the same problem [23:36:38] k, signing off for the night [23:36:45] i'm done with the cluster, thanks for the help and patience [23:36:47] as always [23:36:54] dschoon ^ [23:37:07] wor [23:37:08] word [23:37:13] i shall commence brutalizing things [23:37:15] cheers