[14:57:16] morning average_drifter, milimetric
[14:59:38] morning drdee
[15:00:37] yoyo
[15:00:53] google's spanner is awesome and scary
[15:01:03] TrueTime API :D
[15:07:25] ping average_drifter
[15:08:47] morning
[15:08:49] drdee: hey drdee
[15:10:12] jenkins stuff finished?
[15:10:32] yes, jenkins works ok
[15:10:49] time to debug webstatscollector?
[15:10:59] yes
[15:11:43] do you already have some comparison datasets?
[15:13:02] drdee: no, but I'm producing them now
[15:13:09] copied source code to build1
[15:13:30] ok
[15:19:51] Connection to bastion.wmflabs.org closed by remote host.
[15:19:52] Write failed: Broken pipe
[15:19:57] my connection to bastion keeps breaking
[15:20:04] hey ottomata ! :)
[15:20:08] my connection to bastion keeps breaking
[15:20:11] anyone else experiencing this ?
[15:20:52] yes i saw an email about this today as well
[15:24:59] are you still having the issue?
[15:25:55] hey guys
[15:25:56] hmm
[15:25:59] isn't there another bastion?
[15:26:00] hm
[15:27:36] ah yes, but it doesn't have access to labs
[15:30:55] brb 20m
[16:50:25] ottomatta: i'm looking into the saudi telecom issue from a while back and I can't seem to find any IPs that aren't saudi-telecom
[16:50:41] do you know the correct ip ranges?
[16:51:04] https://office.wikimedia.org/wiki/Partner_IP_Ranges#Saudi_Telecom_.28ST.29
[16:54:53] brb coffee
[16:55:37] ottomata: ^^
[16:56:20] as in, you think those ranges are too large?
[16:56:27] i don't know anything about the ranges beyond this page
[16:56:49] basically I am filtering the files for anything that doesn't match the ip ranges on the linked page
[16:56:52] and nothing is being filtered
[16:56:56] they are all actually in the range
[16:57:11] those are large ranges
[16:57:45] i guess the first thing to check is whether the page I'm looking at is up to date
[16:57:48] you mean all zero reqs are in those ranges?
[16:58:00] it should be, this is the one I build the udp-filter lines from
[16:58:07] amit tries to keep it up to date, afaik
[16:58:08] all of the ones which were filtered as saudi-telecom prior to (and after) 10/16
[16:59:19] yeah…..so, wait, i'm confused, that's how it should be, right?
[16:59:26] all of the IPs in the saudi logs are in those ranges
[16:59:28] right?
[17:00:20] i guess I was under the impression that something changed on 10/16
[17:00:26] that the original filter was too broad
[17:00:38] and that you fixed it and replaced the filter with a narrower range on 10/16
[17:00:45] is that not what happened?
[17:02:11] ottomata: what was the update that you referred to in the e-mail: Date: Tue Oct 16 10:39:36 2012 -0400
[17:02:11] Updating zero filters for Saudi Telecom and Tata India
[17:03:36] ha, um, here's the workflow:
[17:03:40] amit changes the wiki page
[17:03:44] emails me and tells me he changed it
[17:03:47] then I change the udp-filter
[17:03:55] that date is from the git commit log when I changed it
[17:04:02] gotcha, so you just **added** it
[17:04:20] like you updated the udp-filter
[17:04:28] but not the saudi-telecom specific range?
[17:04:41] um
[17:04:55] i make it so the udp-filters match what is on this page
[17:05:00] yeah
[17:05:01] actually
[17:05:06] i *think* I noticed
[17:05:09] that what it was before
[17:05:12] was very redundant
[17:05:13] https://office.wikimedia.org/w/index.php?title=Partner_IP_Ranges&oldid=79897#Saudi_Telecom_.28ST.29
[17:05:40] gotcha
[17:05:56] interesting
[17:06:10] actually, this is easier to read:
[17:06:11] https://office.wikimedia.org/w/index.php?title=Partner_IP_Ranges&oldid=80056#Saudi_Telecom_.28ST.29
[17:06:14] (w/o quotes)
[17:06:33] yeah
[17:06:55] well (probably as you expected) it appears to have been filtering the same ips
[17:07:03] so the change in traffic has to be something different
[17:08:19] even though it really seems like it must have to do with the filters
[17:08:21] oh you are comparing the filters between the changes?
[17:08:25] cause the traffic literally doubles
[17:08:30] yeah
[17:08:35] or rather the filtered files
[17:08:41] yeah, so this range:
[17:08:42] and trying to see what is in one but not the other
[17:08:42] "84.235.72.0"/22;
[17:08:43] is huge
[17:08:46] and was on the previous list
[17:08:51] 84.235.72.1 - 84.235.75.254
[17:09:04] interesting
[17:09:09] so it includes everything that is also listed
[17:09:23] ▪ "84.235.72.32"/27;
[17:09:23] ▪ "84.235.73.110";
[17:09:23] ▪ "84.235.73.160"/28;
[17:09:23] ▪ "84.235.73.208"/28;
[17:09:23] ▪ "84.235.73.224"/28;
[17:09:23] ▪ "84.235.73.240"/28;
[17:09:23] ▪ "84.235.74.0"/29;
[17:09:24] ▪ "84.235.74.14";
[17:09:24] ▪ "84.235.75.80"/28;
[17:09:40] so, since those ranges were all included in the /22 range
[17:09:41] I removed them
[17:09:58] same with
[17:09:59] 212.118.140.0/22;
[17:10:06] which is 212.118.140.1 - 212.118.143.254
[17:10:15] which includes:
[17:10:26] ▪ 212.118.140.16/28;
[17:10:26] ▪ 212.118.140.200/29;
[17:10:26] ▪ 212.118.142.80/28;
[17:10:26] ▪ 212.118.142.96/28;
[17:10:26] ▪ 212.118.143.32/28;
[17:10:27] ▪ 212.118.143.248/29;
[17:10:30] so I removed those
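The redundancy argument above (every removed entry already sits inside one of the two /22 ranges) can be checked mechanically. This is only a minimal sketch using Python's standard `ipaddress` module; the variable and function names are illustrative and are not taken from udp-filter, and only ranges quoted in the conversation are used.

```python
import ipaddress

# Broad ranges that were kept (quoted in the conversation above).
kept = [ipaddress.ip_network(n) for n in ("84.235.72.0/22", "212.118.140.0/22")]

# Narrower entries that were removed as redundant.
# A bare address such as 84.235.73.110 is parsed as a /32 network.
removed = [
    ipaddress.ip_network(n)
    for n in (
        "84.235.72.32/27", "84.235.73.110", "84.235.73.160/28",
        "84.235.73.208/28", "84.235.73.224/28", "84.235.73.240/28",
        "84.235.74.0/29", "84.235.74.14", "84.235.75.80/28",
        "212.118.140.16/28", "212.118.140.200/29", "212.118.142.80/28",
        "212.118.142.96/28", "212.118.143.32/28", "212.118.143.248/29",
    )
]


def covered(net, broad):
    """Return True if `net` lies entirely inside any network in `broad`."""
    return any(net.subnet_of(b) for b in broad)


# Entries that are NOT covered by a kept range; an empty list means
# dropping them cannot change which addresses the filter matches.
not_covered = [str(n) for n in removed if not covered(n, kept)]
print(not_covered)
```

An empty result supports the claim that the cleanup left the set of matched addresses unchanged. `subnet_of` needs Python 3.7 or newer.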
[17:10:47] yeah, seems reasonable
[17:10:56] that leaves:
[17:10:56] 212.118.140.0/22;
[17:10:56] 84.235.72.0/22;
[17:10:56] 84.235.94.240/28;
[17:10:56] 212.215.128.0/17;
[17:11:00] which, by the way
[17:11:02] are huge ranges
[17:11:10] /17 is huge
[17:11:19] that's 32,766
[17:11:22] IP addies
[17:11:54] i'm actually a little fuzzy on my cidr notation/ip range stuff
[17:12:28] so, there is a really nice tool that came with the libcidr package I built for udp-filter
[17:12:30] on stat1
[17:12:32] check it out
[17:12:33] cidrcalc
[17:12:37] nice
[17:12:44] $ cidrcalc 212.215.128.0/17;
[17:12:44] Address: 212.215.128.0
[17:12:44] Netmask: 255.255.128.0 (/17)
[17:12:44] Wildcard: 0.0.127.255
[17:12:44] Network: 212.215.128.0/17
[17:12:45] Broadcast: 212.215.255.255
[17:12:45] Hosts: 212.215.128.1 - 212.215.255.254
[17:12:46] NumHosts: 32,766
[17:12:51] i guess I just thought that the cidr notation gives the number of routing bits
[17:13:05] and the more routing bits the more ip addrs
[17:13:19] yeah, but it's backwards, it is the bitmask for the network
[17:13:20] so
[17:13:23] aah
[17:13:29] 32 - (cidr number) == bits for ips
[17:13:33] cool
[17:13:40] i started to suspect that was what was going on
[17:13:43] 32 − 17 = 15
[17:13:49] 2^15 = 32768
[17:14:29] you can't use .0 or .255, so subtract 2, and you've got 32,766
[17:14:36] so, the smaller the /XX number
[17:14:38] the more IPs
[17:15:14] anyway, yeah, those were all redundant, so that change shouldn't have made a difference in the actual filtered traffic
[17:15:49] not sure how you are testing, but a good way would be to take a log file from before the change
[17:16:00] and run it through udp-filter with the different ip ranges
[17:16:07] and just count the number of lines
[17:16:11] should be the same
[17:16:21] eah
[17:16:23] yeah
[17:16:27] i'm doing roughly that
[17:20:10] hmmmm, dschoon, does analytics1002 have fewer processors than the other ciscos???
[17:21:12] i don't think so.
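Returning to the CIDR arithmetic worked through above (32 - 17 = 15 host bits, 2^15 = 32768 addresses, minus 2), the same numbers can be reproduced with Python's standard `ipaddress` module. This is a sketch, not a replacement for `cidrcalc`; `cidr_summary` is a hypothetical helper name, and it assumes IPv4 with a prefix of /30 or shorter. The two addresses subtracted are the network address and the broadcast address of the block, which is what "you can't use .0 or .255" is shorthand for.

```python
import ipaddress


def cidr_summary(cidr):
    """Summarise an IPv4 CIDR block (assumes a prefix length of /30 or shorter)."""
    net = ipaddress.ip_network(cidr)
    host_bits = 32 - net.prefixlen   # e.g. 32 - 17 = 15
    total = 2 ** host_bits           # e.g. 2^15 = 32768
    return {
        "netmask": str(net.netmask),
        "wildcard": str(net.hostmask),
        "broadcast": str(net.broadcast_address),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "total": total,
        # Network and broadcast addresses are not assignable to hosts.
        "usable": total - 2,
    }


summary = cidr_summary("212.215.128.0/17")
for key, value in summary.items():
    print(f"{key}: {value}")
```

For `212.215.128.0/17` the values line up with the `cidrcalc` output quoted above, and for `84.235.72.0/22` the host range comes out as 84.235.72.1 - 84.235.75.254.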
[17:21:20] (i can't imagine why it would)
[17:21:54] in point of fact, it instead appears to be gifted with extra, magic silicon, allowing it to randomly manifest intermittent device errors.
[17:23:58] $ cat /proc/cpuinfo | grep processor | wc -l
[17:23:58] 12
[17:24:01] that's an02
[17:24:13] $ cat /proc/cpuinfo | grep processor | wc -l
[17:24:13] 24
[17:24:16] that's an04
[17:36:41] buh.
[17:36:53] talking to robh
[17:36:55] it's a bios setting
[17:40:50] fucking ssh-keyscan
[17:40:54] i hate this process
[17:41:59] ?
[17:42:18] an01: 24
[17:42:18] an09: 24
[17:42:18] an05: 24
[17:42:19] an03: 24
[17:42:21] an08: 24
[17:42:23] an02: 12
[17:42:25] an06: 24
[17:42:27] an04: 24
[17:42:29] an10: 24
[17:42:29] yeah
[17:42:31] an07: ssh_exchange_identification: Connection closed by remote host
[17:42:37] a07 is still down
[17:42:59] (i assume after our most recent network reshuffle, all the ciscos but an01 got new hostkeys. hence a lot of typing "yes")
[17:43:05] so yeah. an02. 12. wtf.
[17:45:59] yeah, it's bios
[17:46:01] CPUs are identical.
[17:46:01] they all have 12
[17:46:03] yeah.
[17:46:11] there is a hyperthreading bios setting that causes it to report 24
[17:46:15] robh recommends we turn that off
[17:46:15] model name : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
[17:46:17] so i am trying
[17:46:17] all around.
[17:46:21] ...why?
[17:46:29] why turn it off, that is?
[17:46:37] i suspect that is a very poor plan for us.
[17:46:50] we want compute more than we want anything else.
[17:47:04] and since there's always going to be iowait, i'm totes fine with hyperthreading
[17:47:18] apparently hyperthreading doesn't really do much for most apps?
[17:47:45] truth. but it does for compute intensive apps :)
[17:47:55] esp ones that are going to hammer IO
[17:47:59] like, say, hadoop mappers.
[17:48:14] but it's worth looking into. i'm sure cloudera has something to say.
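The 12 versus 24 discrepancy above comes from counting `processor` entries, which are logical CPUs, so hyperthreading doubles the number. A rough sketch of how logical processors and physical cores could be told apart from the same file: `count_cpus` is a hypothetical helper, and it assumes the x86 Linux layout in which each `processor` stanza also carries `physical id` and `core id` fields (some platforms and virtual machines omit them). The sample text is made up for illustration and is not output from the analytics hosts.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text.

    Assumes x86-style stanzas in which "physical id" appears before
    "core id" within each "processor" block.
    """
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between stanzas
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None  # new stanza, forget the previous socket
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # A core is identified by its socket plus its core id.
            cores.add((physical_id, value))
    return logical, len(cores)


# Illustrative sample: two logical processors sharing one physical core,
# which is how a hyperthreaded core shows up.
SAMPLE = (
    "processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0\n\n"
    "processor\t: 1\nphysical id\t: 0\ncore id\t\t: 0\n"
)
print(count_cpus(SAMPLE))
```

On a real host the text would come from reading `/proc/cpuinfo`. With hyperthreading enabled the first number should be twice the second, and with it disabled the two should match.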
[17:53:26] http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/ in a benchmark they say they turned on hyperthreading
[17:58:02] yeah, saw that too
[17:58:11] these are the ciscos though, so for storm or whatever
[17:58:14] i'll go ahead and turn it on here too
[17:58:19] hyperthreading is off on the dells right now
[17:58:28] we should run a bench, then turn it on, and run the bench again
[17:58:55] do we have benchmark jobs yet?
[17:59:10] there are the easy benches that ship with hadoop
[17:59:12] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[18:30:17] ping average_drifter
[18:38:27] weeee, cool!
[18:38:38] dschoon, add storm.analytics.wikimedia.org to that line in your hosts file
[18:38:39] and
[18:38:40] http://storm.analytics.wikimedia.org/
[18:38:44] hot!
[18:39:08] gotta lunchy and move locs
[19:03:19] ping average_drifter
[19:13:58] brb, forgot about breakfast
[19:25:03] drdee_: ok, I managed to unzip, compile udp-filters and webstatscollector on build1, copied static binaries to stat1 /home/spetrea/webstats_debugging , setting up a run for them now
[19:30:18] k
[20:08:38] ottomata: https://copynightnovember.eventbrite.com/ is tonight in case you want to go
[20:09:16] Also tonight: The Masters of Social Gastronomy
[20:09:57] part of http://brooklynbrainery.com/
[20:24:23] ottomata: i fixed http://storm.analytics.wikimedia.org/
[20:24:42] $storm.home wasn't on the classpath. i fixed it in puppet as well (in theory)
[20:24:47] ottomata: free tonight? https://copynightnovember.eventbrite.com/ & http://brooklynbrainery.com/ Masters of Social Gastronomy
[20:25:51] hiaaa, naw can't, i'm finally unpacking my stuff and moving into my new room
[20:26:18] ottomata: oh, congrats on the new place, what neighborhood?
[20:26:41] dschoon, nice!
[20:26:47] same neighb, crown heights
[20:26:50] a block from where I used to be
[20:27:02] dschoon, yeah that looks like a good fix
[20:27:17] https://github.com/wmf-analytics/puppet-storm/commit/5d93a353691a726fb8c55af69b9bca2697ac083d
[20:27:18] rather
[20:27:33] ottomata: I tried http://www.wixlounge.com/ and it's fine, although I wouldn't want to make conference calls from there -- if you get there between like 9am and 10:15am you're fine in terms of getting a seat, and it's free
[20:27:44] ah cool
[20:27:47] bbl foods
[20:28:07] * sumanah goes to try to concentrate on OPW people in #mediawiki
[20:29:12] one thing that bugs about puppet.
[20:29:16] oh?
[20:29:19] it seems like i have to be root to do anything.
[20:29:27] like, i have to be root to *look* at things
[20:29:43] "no user serviceable parts within"
[20:29:45] the manifests you mean?
[20:29:47] ?
[20:30:29] and all the files it checks out, the config templates, their results...
[20:31:33] ?
[20:31:44] do you mean things in /etc/puppet, or things that puppet does?
[20:31:56] like, /etc/default/storm
[20:31:56] ?
[20:32:03] the latter.
[20:32:11] i was trying to turn up debugging to see the 404s.
[20:32:16] easy, i think.
[20:32:17] -rw-r--r-- 1 root root 442 Nov 27 20:14 /etc/default/storm
[20:32:24] java makes this shockingly sane.
[20:32:49] you don't need to be root to look at that
[20:32:52] but yeah, to change it
[20:32:52] while there may be global config in /etc/storm, surely there is local config elsewhere!
[20:33:07] ah! i have hunted down /var/lib/storm! maybe here? no.
[20:33:14] what are you looking for/
[20:33:14] ?
[20:33:22] no no.
[20:33:24] i figured it out.
[20:33:32] the only configs I know of are in /etc/storm and /etc/default/storm*
[20:33:35] this is my needlessly melodramatic narrative version of what happened
[20:33:44] oh. ha
[20:34:06] so. i get that the current arrangement of files is very unix-y
[20:34:22] but it took a long while to find anything because it looks nothing like what a storm distro looks like
[20:35:32] /usr/lib/storm/{conf,log4j} feature pretty heavily in the docs
[20:35:36] and they don't exist in our setup
[20:35:51] (because conf is /etc/storm and we've dumped log4j into it)
[20:36:07] is the /etc/default/storm* thing standard with upstart? i've never seen that before.
[20:36:37] hmm, usually those are symlinked
[20:36:47] /usr/lib/storm conf etc
[20:36:51] i should add that
[20:36:53] yeah.
[20:36:57] i was going to suggest this
[20:36:58] the default thing is for init scripts, yeah
[20:37:02] usually those set env vars
[20:37:15] because i totally get that there's a tension between "package mode" and "tarball mode"
[20:37:31] i think homebrew does a pretty great job walking the line
[20:38:01] fyi, this setup originally comes from the .deb
[20:38:01] https://github.com/wmf-analytics/storm-deb-packaging
[20:38:10] which I forked from https://github.com/phobos182/storm-deb-packaging
[20:38:54] everything gets unpacked to its own directory a la $HOMEBREW_PREFIX//
[20:39:11] even cloudera uses /etc/default
[20:39:28] then all the relevant files are symlinked to whatever weird locations the OS package manager thinks are groovy
[20:39:44] (which at least makes it reasonable to figure out where they came from)
[20:39:46] *nod*
[20:40:21] i just haven't seen it before. init scripts are a bikeshed i've always found cosmically boring.
[20:41:41] ooo, i missed the 404s before, cool, now it is prettier
[20:42:07] dear dschoon, what would it take for me to have a working java dev env? :)
[20:42:47] i think the most useful answer is "a limn release that doesn't shame us"
[20:43:15] ha
[20:44:09] but that means at least 2 more weeks of waiting, can you tell ottomata what's left to do so maybe he can look into it (regarding setting up nexus and the maven repo?)
[20:44:17] but i'm happy to bring you up to speed if you want to dive in
[20:44:24] maybe ... thursday.
[20:44:26] or friday.
[20:45:54] and no, drdee_. next metrics meeting is dec 6, which is <10 days away
[20:46:05] * dschoon frets, fusses
[20:46:08] ok, i need food. brb
[20:46:12] (i am bad at that)
[21:08:39] (back)
[21:43:55] this is a very cool kickstarter project (IMHO) http://www.kickstarter.com/projects/holo/holo-magazine
[22:11:13] drdee_: ok, found the problem
[22:11:22] drdee_: it was 9th field
[22:11:24] what was it?
[22:11:26] drdee_: the url field
[22:11:33] drdee_: uhm, I need to enlarge it
[22:11:34] buffer too small?
[22:11:36] yes
[22:11:42] how large is it right now?
[22:12:15] there was a char[] which was 2000 and I up-ed it to 5000
[22:12:28] but still the output would be now
[22:12:44] how long was the url that was causing the buffer overflow?
[22:12:48] https://gist.github.com/4b72b898869fbfc6a251
[22:13:21] drdee_: the url that caused this was 2851 characters long
[22:14:32] it came from the Persian language, from Farsi
[22:14:53] it is always the iranians :D
[22:15:20] is it an actual article title or is it a 404?
[22:15:33] trying it
[22:17:24] "The requested page title was invalid, empty, or an incorrectly linked the title Myanzbany or Myanvykyay. May contain one or more characters that can not be used in titles"
[22:26:21] k
[22:26:59] maybe you can do a quick shell script over all sampled logs in stat1 and find the largest url to make sure that the current buffer is large enough
[22:52:55] drdee_: so far 7420 https://gist.github.com/4df242c6b38f9efded6f
[22:53:03] 7420 is the biggest url so far
[22:53:13] ok
[22:53:29] so let's put the url buffer to 10k as well
[22:53:33] script just found one with 7823, we prolly need to truncate them
[22:53:39] oh ok
[22:54:08] but on the other hand this is an extra reason to filter out 404's
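The scan suggested above (find the largest URL across the sampled logs so the buffer can be sized) could look roughly like the sketch below. The conversation only says the URL is the 9th field; the space separator, the function names and the sample lines are assumptions made for illustration. Lengths are measured in UTF-8 bytes because a C `char[]` buffer is sized in bytes and needs one extra byte for the terminating NUL.

```python
def longest_field(lines, field_index=8, sep=" "):
    """Return (byte_length, line_number) of the longest value in one field.

    field_index=8 is the 9th field (0-based). The separator is an
    assumption about the log format, not something confirmed above.
    """
    best_len, best_line = 0, None
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= field_index:
            continue  # short or malformed line, skip it
        n = len(fields[field_index].encode("utf-8"))
        if n > best_len:
            best_len, best_line = n, lineno
    return best_len, best_line


def fits(url, buffer_size):
    """True if `url` plus a terminating NUL fits in a buffer of this size."""
    return len(url.encode("utf-8")) + 1 <= buffer_size


# Made-up sample lines; only the 9th field matters here.
sample = [
    "a b c d e f g h http://x/1 j",
    "a b c d e f g h http://x/12345 j",
    "short line",
]
print(longest_field(sample))
```

Run over each sampled log file, the largest byte length found gives a lower bound for the buffer size. Any URL for which `fits` is false would have to be truncated or dropped before being copied.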