[00:01:39] i await the milimetric
[00:02:58] i sing the body electric
[00:06:57] i collect the stamps eclectic
[00:22:26] [travis-ci] master/f07c32b (#107 by dsc): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5703040
[00:32:33] i really ought to fix that ;)
[00:36:06] I'm gonna start saying the same thing every time I see that message
[00:36:15] i really ought to fix that
[00:39:09] i really spent some time looking into how to fix it and not sure if it's something we can fix
[00:39:18] maybe use our own jenkins server? :D
[00:42:42] ok, I'll turn off the travis hook then. There's no sense wasting resources until we fix it
[00:44:07] k, turned it off on github drdee. see? told you I'd fix it :)
[00:44:59] disagree, at least we were notified of someone pushing code
[00:49:11] ah, well for that github has an IRC service hook
[00:49:12] enabling now
[00:50:05] thx
[13:58:20] morning
[13:58:41] morning!
[14:03:34] morning all
[14:06:50] i updated 398 drdee
[14:06:52] ottomata: any luck getting my mess sorted?
[14:07:09] i remember we found the same problem a while back but it wasn't a priority to get me and kraken on speaking terms
[14:08:03] what's your mess? i just responded to your latest email
[14:08:18] if your labs account is different than your shell account, hue is probably not going to work for you :/
[14:08:35] we can create a manual dandreescu account in hue...
[14:08:35] that should be fine
[14:08:40] it would be more proper to fix labs now
[14:09:05] i would say let's fix this once and for all
[14:14:05] doh, missed your email ottomata, my fault
[14:14:18] i'll talk to labs about changing my username
[14:14:21] but i'm a little confused
[14:14:31] wouldn't that be on a different authentication scheme anyway?
[14:14:40] these accounts can't be glued together by name alone...
[14:14:46] that'd be a crazy security hole
[14:14:49] yup
[14:14:52] that's what hadoop does
[14:15:00] ...
[14:15:17] dude but then right now someone could get a labs account called "dandreescu" and take over hue
[14:15:53] http://blog.cloudera.com/blog/2012/03/authorization-and-authentication-in-hadoop/
[14:16:00] k, i'll read
[14:16:00] yes
[14:16:20] the worst is:
[14:20:37] (pmed)
[14:40:26] lol
[14:40:33] the worst part is "pmed"
[14:55:37] I have a question
[14:56:07] hehe
[15:33:45] ottomata: i am okay with disabling the ssl stream for logging and ideally we would enable x-proto-for soonish
[15:34:11] i'm for it, can I let you and K communicate and push that through?
[15:34:19] about pmtpa upload, if that's no longer used then let's not log it either
[15:34:23] you sure can!
[15:34:26] at your service :D
[15:34:34] do we need to notify any downstream consumers that we are removing ssl logs?
[15:34:53] i will do a track and trace :)
[15:35:22] uhhh, ok!
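The 14:14-14:16 exchange above is about Hadoop's default "simple" authentication, the topic of the linked Cloudera post: the cluster trusts whatever username the client asserts, so an account that merely shares a name with a shell account could act as that user in Hue. A minimal sketch of that trust model using Hadoop's own API, purely illustrative (this is not the Hue code path, and "dandreescu" is just the name from the chat):

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class SimpleAuthSketch {
        public static void main(String[] args) throws Exception {
            final Configuration conf = new Configuration();
            // Under hadoop.security.authentication=simple (the default), the cluster
            // accepts the client-asserted username; no credentials are checked.
            UserGroupInformation ugi = UserGroupInformation.createRemoteUser("dandreescu");
            FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
                public FileSystem run() throws Exception {
                    return FileSystem.get(conf);
                }
            });
            // Subsequent filesystem calls are performed as that user.
            System.out.println(fs.getHomeDirectory());
        }
    }

Kerberos, the subject of the Cloudera post, is what actually ties an asserted name to a credential.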
[15:56:59] drdee: run for 2012-08 => 2013-02 started
[15:57:12] drdee: code refactored so discarding rules are in their own methods
[15:57:21] drdee: dropped command-line arguments in favor of json configuration files
[15:57:43] drdee: reports expected today
[15:59:49] ok, ty
[16:08:58] gm all
[16:19:24] drdee: ETA 6h starting 18 minutes ago
[16:20:29] tu
[16:20:31] ty
[16:21:47] you're welcome
[16:22:07] now I'm working on finding a way to export data to wikistats
[16:22:18] I'm a bit behind schedule but I'll get it done
[16:22:34] I should think about getting some new tasks in mingle
[16:22:59] i will think about that :)
[16:23:03] I feel good this week, everything's good for me so I'm getting back on track with things
[16:23:11] drdee: thanks
[16:39:07] mornin
[16:42:01] ottomata: i'm getting chewed out in labs
[16:42:03] care to help?
[16:42:04] :)
[16:43:18] just joined, what's up?
[17:00:00] scrummmm
[17:00:04] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[17:00:09] ottomata, drdee, milimetric
[17:00:35] gr, i did the not-join thing again
[17:20:31] on the topic of writing pig scripts
[17:20:46] early on, i found this very helpful http://blog.linkedin.com/2010/07/01/linkedin-apache-pig/
[17:34:32] dschoon: re zero dashboard, can we chat
[17:34:40] k
[17:34:41] yes!
[17:35:04] basically, the dashboards mostly work, but I can't tell certain things without more data
[17:35:13] is it possible to run the job for the last month?
[17:35:19] or is it extremely slow?
[17:36:18] no, it's snappy.
[17:36:31] let me see what I can do
[17:36:34] that would be great
[17:37:31] so there's data from 3/14-3/22 erosen
[17:37:40] processed, that is
[17:37:51] coo;
[17:37:52] tha
[17:37:59] is what it looked like
[17:38:03] I'm wondering if there is more raw data?
[17:38:24] name a time for the start :)
[17:38:47] initial-instance="2013-03-14T00:00Z" timezone="Universal"
[17:40:28] ^^ erosen
[17:40:44] as old as we have basically
[17:41:08] okay.
[17:41:34] ottomata: quick q
[17:42:32] ottomata: later will you have a few minutes to braindump the setup of the currently-disabled import streams (events, all100)?
[17:43:49] ottomata: question two!
[17:44:07] do you have an objection to me deleting /wmf/raw/webrequest/webrequest-wikipedia-mobile/*/_original_dir_name.txt ?
[17:44:27] naw that's fine
[17:44:29] i'm pretty sure they'll get picked up by any jobs that run across data before 1/4
[17:45:04] it is done.
[17:50:43] kraigparkinson: i am in the hangout
[17:50:46] headed to cafe, be back in a bit
[17:50:54] ok
[18:04:31] pig syntax in IntelliJ:
[18:04:43] https://twitter.com/rjurney/status/201052603365343232
[18:10:54] i could never figure out how to install it.
[18:10:56] i tried.
[18:11:27] i gave up after 15m or so.
[18:12:12] i guess i should publish my textmate pig bundle
[18:13:00] (the existing ones had crappy syntax highlighting and the wrong comments)
[18:16:52] :)
[18:17:08] dschoon if you ever have a problem with anything, you should probably just ask me
[18:17:17] chances are that instead of difficult it's just so easy you're not seeing it
[18:17:37] installation instructions: download three-little-piggies.jar from that github link
[18:17:47] save in your idea/plugins directory
[18:17:52] restart intellij
[18:17:54] i didn't see a jar...
[18:18:05] ah, it's not in the repo
[18:18:08] https://github.com/brandonkearby/three-little-piggies/blob/master/three-little-piggies.jar
[18:18:13] it's in the repo
[18:18:23] weird.
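(An aside on the 17:44 marker-file cleanup above: the deletion was presumably done from the hdfs command line, but if it ever needs to be scripted, a rough sketch against the Hadoop FileSystem API would look like this. The glob is copied from the chat; the class name is made up.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MarkerFileCleanup {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path pattern = new Path(
                "/wmf/raw/webrequest/webrequest-wikipedia-mobile/*/_original_dir_name.txt");
            FileStatus[] matches = fs.globStatus(pattern); // one marker per imported directory
            if (matches != null) {
                for (FileStatus status : matches) {
                    fs.delete(status.getPath(), false); // false: not recursive
                }
            }
        }
    }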
[18:18:28] anyway
[18:18:35] ahhh
[18:18:38] i was looking at https://github.com/11xor6/three-little-piggies
[18:19:11] heh, this is why forks are confusing
[18:27:38] dschoon: any progress on new reports?
[18:27:53] not yet. was helping dan.
[18:27:56] dschoon: also, I have noticed a few oddities about the data, which we should talk about before closing the card
[18:27:57] k
[18:27:59] almost done, though
[18:28:01] k
[18:28:04] go for it :)
[18:28:22] basically, some countries are missing and some providers are missing
[18:34:01] that means they're missing from the data
[18:34:02] ^^ erosen
[18:34:08] yeah
[18:34:13] (we talked about this)
[18:34:19] i figured it wasn't really your problem
[18:34:23] *nod*
[18:34:26] dschoon: did we talk about this?
[18:34:30] you had said your scripts could fill in zeros?
[18:34:32] yeah
[18:34:36] but I think this is different
[18:34:42] because entire carriers are missing
[18:34:45] hmm.
[18:34:47] which I hadn't yet discovered
[18:34:48] okay.
[18:34:53] interesting.
[18:34:56] you in the office?
[18:34:59] dschoon: yup
[18:35:08] cool, i'll come up and we can chat
[18:35:22] great
[18:35:36] (btw, I've got a lunch thing at noon)
[18:37:03] aiight
[18:37:07] find me after
[18:37:09] erosen
[18:37:13] dschoon: sounds good
[18:37:26] btw
[18:37:26] https://github.com/wikimedia/kraken/blob/master/kraken-generic/src/main/resources/mcc_mnc.json
[18:37:32] that file lists all the known carriers
[18:37:33] ^^ erosen
[18:38:56] dschoon: interesting, it seems that we haven't dealt with the tata-india case.
[18:39:09] drdee: do you know more about this?
[18:39:29] (ja, he's the man who created it)
[18:39:35] i haven't touched the datafile
[18:39:41] btw, the logic is straightforward https://github.com/wikimedia/kraken/blob/master/kraken-pig/src/main/java/org/wikimedia/analytics/kraken/pig/Zero.java#L87
[18:40:03] gotcha
[18:40:04] the tata case is that the mnc-mcc has a *
[18:40:51] (it turns the json file into a map by MCC_MNC, and just checks it for the value we get from X-CS)
[18:41:23] drdee: what do you mean?
[18:41:32] erosen: ^^ see above
[18:41:42] there is one carrier, i believe tata
[18:41:50] that has about 40 mnc-mcc codes
[18:42:04] dschoon: yeah, that is the logic I expected, but tata is a special case
[18:42:15] for each province in india it has its own mnc-mcc code
[18:42:23] so what i did
[18:42:24] why aren't they listed in that file?
[18:42:29] drdee: where does the fancy matching actually happen?
[18:42:30] 1 sec
[18:42:36] slow down
[18:43:02] * drdee is looking in puppet
[18:43:40] drdee: didn't mean to overwhelm, why don't we have a chat in the afternoon. does that work for you?
[18:43:49] just 1 sec :)
[18:44:09] erosen: fwiw, if the MCC_MNC lookup doesn't work, I still emit the code
[18:44:21] interesting...
[18:44:25] good
[18:44:28] so in puppet
[18:44:30] that's line 94.
[18:44:35] it does
[18:44:36] if (req.http.X-Subdomain == "ZERO") {
[18:44:37] set req.http.X-Carrier = "TATA";
[18:44:38] /* MCC-MNC not clear from http://en.wikipedia.org/wiki/Mobile_country_code */
[18:44:39] set req.http.X-CS = "405-0*";
[18:44:50] see templates/varnish/mobile-frontend.inc.vcl.erb
[18:45:07] line 354
[18:45:12] uhhh
[18:45:18] if that got merged it is brand new
[18:45:25] someone committed that today
[18:45:27] agreed
[18:45:32] it used to check for IPs, not for X-Subdomain
[18:45:43] additionally, there are no entries for tata in https://github.com/wikimedia/kraken/blob/master/kraken-generic/src/main/resources/mcc_mnc.json
[18:45:45] this is the important line
[18:45:46] set req.http.X-CS = "405-0*";
[18:45:49] which means it won't be matched anyway.
[18:46:04] how old is this requirement?
[18:46:06] it could also be that mccmnc.com did not have the tata entry
[18:46:10] cause it's not on the card.
[18:46:20] erosen scraped it from that site
[18:46:32] hm
[18:46:35] yeah
[18:46:57] the site mcc-mnc.com has carriers for the tata codes
[18:47:02] they just don't call the carriers tata
[18:47:47] there are only two entries in the JSON file that start with 405-0:
[18:47:58] Reliance and Fascel
[18:48:31] so even if we had the data -- which we don't -- it wouldn't be matched by pig
[18:48:51] yeah, that seems to be the case
[18:49:02] so we need to figure out tata's mcc-mnc code :)
[18:49:12] hm, wait.
[18:49:20] oh.
[18:50:58] dschoon: btw, these are the carrier names that show up in pig output
[18:50:59] https://gist.github.com/embr/37c2d637da7f0e515f59
[18:52:13] git blame sees 405-0* (line 267) committed on Jan 23
[18:52:32] ottomata, if I see the commit in my local copy, that means it has been merged, right?
[18:53:15] er, erosen
[18:53:27] that looks like all manner of crap.
[18:53:32] yeah
[18:53:33] lots of accept-lang
[18:53:47] i think my shell command is right: cut uses tab as delimiter
[18:53:53] ohh
[18:53:57] but there could be blanks.
[18:54:00] two tabs in a row
[18:54:02] is one delimiter
[18:54:05] maybe
[18:54:09] that's my guess.
[18:54:35] but yeah.
[18:54:41] that's an impressive quantity of crap
[18:54:46] dschoon: but those fields aren't present in the pig output
[18:54:47] let me see what I get
[18:54:52] right
[18:54:54] so they have to be getting in there in the udf / pig
[18:54:59] yep.
[18:55:00] true
[18:55:10] and that list is too small, right?
[18:55:28] for which thing?
[18:56:11] about how many carriers should there be?
[18:56:23] around 10
[18:56:24] it's not way off
[18:56:45] we only set x-cs headers for things which are in the varnish config
[18:57:36] hm
[19:12:04] so yeah. i can confirm that result, erosen
[19:12:10] I'll look into it.
[19:30:39] erosen: there are exactly 71 crap X-CS entries in the output.
[19:49:08] lunch time.
[19:49:15] erosen: https://gist.github.com/dsc/5224205 is running, i expect it'll take a while.
[19:49:53] on the plus side, i managed to get it to do that in a single MR job.
[19:50:03] on the minus side, it's 20k splits :)
[19:50:06] er, 30k
[19:50:37] Total input paths (combined) to process : 30107
[19:51:06] progress: http://analytics1010.eqiad.wmnet:8088/proxy/application_1363811768346_2731/
[19:51:46] bbl
[19:57:54] be back in a min
[20:08:19] hm. apparently this job is going to take 4.5h.
[20:19:05] back
[20:20:52] me too!
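To restate the 18:40-18:48 findings in code: the lookup described above is an exact map lookup keyed by the MCC-MNC string, so the wildcard value "405-0*" that varnish sets for TATA can never hit an entry. A simplified sketch of that failure mode, not the actual Zero.java, with placeholder keys and carrier names:

    import java.util.HashMap;
    import java.util.Map;

    public class XcsLookupSketch {
        public static void main(String[] args) {
            // Stand-in for the map built from mcc_mnc.json ("MCC-MNC" -> carrier);
            // the entries here are invented for illustration.
            Map<String, String> carriers = new HashMap<String, String>();
            carriers.put("405-03", "Reliance");
            carriers.put("405-05", "Fascel");

            String xcs = "405-0*"; // what varnish sets when X-Subdomain == "ZERO"

            // An exact get() is all the described logic does, so the wildcard never matches.
            System.out.println(carriers.get(xcs)); // null

            // Matching it would take either a literal "405-0*" entry or a prefix scan:
            String prefix = xcs.substring(0, xcs.length() - 1);
            for (Map.Entry<String, String> e : carriers.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    System.out.println("prefix match: " + e.getValue());
                }
            }
        }
    }

Either fix, a dedicated JSON entry for the wildcard code or a prefix-aware lookup, would close the gap the chat identifies.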
[20:21:51] ottomata: http://localhost:8888/jobbrowser/jobs/?state=all&text=&user=
[20:21:55] is this going to be a problem?
[20:22:37] like, will those jobs be blocked until my X-CS script finishes in 4h?
[20:23:55] (due to our scheduler?)
[20:24:04] i can kill it and restart with a lower parallelism...
[20:24:08] ^^ ottomata
[20:24:13] ees good question
[20:24:19] i do not know for sure, but I am going to venture a yes
[20:26:49] i guess we'll watch for another cycle
[20:26:56] and if they back up, i'll kill it
[20:27:05] reminder: we should switch to FairScheduler :P
[20:27:54] yeah i wonder if that is supported now
[20:27:58] there was some cdh4 problem with that I think
[20:27:59] drdee?
[20:28:01] ^^
[20:28:02] lame.
[20:28:24] yoyo
[20:28:38] afaik, still not compatible
[20:28:58] yeah, ottomata, it seems that two runs of Kafka consumers are now waiting
[20:28:59] but probably good idea to check with cdh4 docs
[20:29:02] i'll kill my job.
[20:29:08] 4.2 might have changed things
[20:32:02] yay, ottomata it's clearing the queue.
[20:32:52] oh its yarn
[20:38:05] hmm, it might be supported now
[20:38:14] The fair share scheduler functionality has been ported in part in CDH4.1. It's missing the Web UI (with metrics), making it difficult to use. We are porting the Web UI as part of CDH4.2.
[20:51:11] interesting
[20:51:26] btw, ottomata i restarted my job with half the input and parallel=5
[20:52:56] we'll see if it blocks all other jobs again
[20:53:16] mm ok
[20:55:53] (in fact, we shall see in 5m)
[21:02:34] sigh
[21:02:46] i guess i need to force parallel to a small number?
[21:05:16] parallel only applies to reducers, i think
[21:05:33] i think there's no way to limit the number of mappers, except by input
[21:05:44] input data size
[21:06:30] changing the hdfs block size
[21:07:06] ottomata is right about parallel, that only applies to reducers
[21:07:23] you can also specify mappers as a jobconf property but it's a hint
[21:08:12] ottomata: how long do you want to wait with merging 55394
[21:08:13] ?
[21:08:51] ok i see your comment
[21:09:24] i think maybe monday is fine
[21:09:26] let's do it then?
[21:15:51] yeah
[21:16:09] but the problem is that if you don't limit reducers, they will take up all the nodes
[21:16:16] the mappers are never the problem.
[21:16:38] i switched to parallel 2 and the kafka jobs are fine now
[21:17:10] ah ok
[21:17:11] cool!
[21:20:23] ottomata, yeah let's wait until monday or how about a 5pm friday 'deployment', i know you love those ;)
[21:20:33] hah, i wouldn't mind with this one
[21:20:34] super easy
[21:20:40] very low risk
[21:20:46] technically low risk
[21:20:51] let's wait
[21:26:43] well, i wanna wait for feedback but ja
[21:37:25] drdee, yt?
[21:38:46] ya
[21:40:40] are you going to be OK getting all the cards in the backlog and in analysis to ready for tech review by Tuesday?
[21:42:12] that's a pace of roughly three cards per day between now and then.
[21:46:18] you mean this list: https://mingle.corp.wikimedia.org/projects/analytics/cards/list?filters%5B%5D=%5BType%5D%5Bis%5D%5BFeature%5D&filters%5B%5D=%5BDevelopment+Status%5D%5Bis%5D%5BBacklog%5D&filters%5B%5D=%5BDevelopment+Status%5D%5Bis%5D%5BIn+Analysis%5D&filters%5B%5D=%5BRelease+Schedule+-+Sprint%5D%5Bis%5D%5B%28Next+Sprint%29%5D&page=1&style=list&tab=All
[21:47:07] Yup. Same cards as I saw in this team favourite: https://mingle.corp.wikimedia.org/projects/analytics/cards?favorite_id=758&view=%3EWIP+-+Feature+Analysis
[21:47:48] i like it the manual way :D
[21:47:58] luddite
[21:48:06] :)
[21:51:20] drdee, so? think you can do it in the time allotted
[21:51:52] i am considering pushing 236 and 259 back to another sprint in the future as they seem to become less urgent now we are not using limn so much; unless dschoon feels very strongly about these two cards
[21:52:30] dschoon: https://mingle.corp.wikimedia.org/projects/analytics/cards/236 and https://mingle.corp.wikimedia.org/projects/analytics/cards/259, are these still urgent?
[21:52:53] They're in the Intangible class of service, so that's fair. If it's possible to do one of them, we should.
[21:52:59] kraigparkinson: but yes that sounds doable
[21:53:08] It would be good to have at least one Intangible in per sprint.
[21:53:14] ok
[21:59:44] welp.
[21:59:44] xcs job blocked everything again.
[21:59:45] i think i just cannot run it.
[21:59:50] not without blocking all other jobs.
[22:00:03] we'll need to talk about this on monday
[22:01:26] drdee: 236 is important.
[22:01:32] and i would argue it is not intangible.
[22:02:14] unless we migrate kripke to precise, all customer limn dashboards go away if gluster dies and we restart the box.
[22:02:20] ^^ kraigparkinson
[22:02:42] as for 259, you'd have to ask milimetric
[22:03:37] aighty guys, good weekend!
[22:03:41] ohh and kraigparkinson i believe that https://mingle.corp.wikimedia.org/projects/analytics/cards?favorite_id=758&view=%3EWIP+-+Feature+Analysis should include Defects as well, or not?
[22:03:51] 236 is a quality of service issue, right?
[22:04:09] drdee, yes it should! I'll update the view unless you're ready to do it right now.
[22:04:16] an expected value of quality of service, i suppose
[22:10:45] Yes, quality of service certainly is valuable to customers. Just trying to clarify it was a quality of service requirement over, say, design/tech debt
[22:11:47] i added detail to the card
[22:11:52] good weekend drdee, I reviewed your sql
[22:13:08] 259 is certainly a priority for a lot of people, but I think it would be a very big feature. We can have a conversation about it, but it can wait a week or two in my opinion.
[22:13:12] kraigparkinson: ^^
[22:13:54] ciao drdee
[22:14:33] erosen: so, i don't think i can easily run my X-CS job, but i'm pretty sure the answer is that those garbage values are *really* in the X-CS header
[22:14:39] as in, garbage is being submitted
[22:14:44] i *think*
[22:14:52] interesting...
[22:15:07] however, accept_lang is the field immediately before x_cs in web requests
[22:15:14] yeah
[22:15:20] seems like a delimiter problem.
[22:15:22] i wanted to dump out the lines that the junk came from
[22:15:37] but apparently running my job across all the data blocks all other jobs
[22:15:48] which is a much ...huger... problem.
[22:15:49] giant.
[22:15:51] astronomically bad.
[22:15:59] which i expect we will talk about on monday
[22:16:16] ya
[22:16:19] well that's fine
[22:16:21] i'll let amit know
[22:16:26] as there's not much point in having 10 workers if we can only do one thing at a time
[22:16:28] well
[22:16:31] in the interim
[22:16:51] i can modify the zero job to drop unknown carriers
[22:17:01] hm
[22:17:02] actually
[22:17:25] i'll leave it as it is now, but add a step afterward in the workflow that filters the results for valid carriers
[22:17:30] agreed with dschoon that the parallelism thing seems weird
[22:17:34] scheduler doesn't help with that?
[22:17:42] i think it will.
[22:17:52] so we need to prioritize changing the scheduler
[22:17:54] k, we should probably earmark a card with that :)
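On the "delimiter problem" hypothesis (18:53 and 22:15 above): one way an Accept-Language value can end up in the X-CS column is a parser that splits on runs of whitespace instead of strictly on single tabs. A toy illustration only; the field layout below is invented and this is not the real log format or the real loader:

    public class TabSplitSketch {
        public static void main(String[] args) {
            // Tab-separated toy record: the user agent contains spaces, x_cs is empty.
            String line = "127.0.0.1\tMozilla/5.0 (iPhone)\ten-US,en;q=0.8\t";
            //              ip         user_agent              accept_lang     x_cs

            String[] strict = line.split("\t", -1); // one column per tab, empty fields kept
            String[] sloppy = line.split("\\s+");   // whitespace runs collapse, the UA splits apart

            System.out.println(strict[3]); // ""               -> x_cs is simply empty
            System.out.println(sloppy[3]); // "en-US,en;q=0.8" -> accept_lang shifted into the x_cs slot
        }
    }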
[22:18:14] anyway, erosen. leaving both datasets will let us monitor the garbage, but give you something decent to work with
[22:18:16] yep.
[22:18:19] doing it now.
[22:18:31] k
[22:18:32] great
[22:22:30] kraigparkinson, drdee, ottomata: https://mingle.corp.wikimedia.org/projects/analytics/cards/447
[22:22:33] important!
[22:23:24] i'm sure that we could kludge things with mapreduce.job.mappers or something
[22:34:35] ottomata: allo
[22:34:39] are you done for the day?
[22:35:01] if no, https://mingle.corp.wikimedia.org/projects/analytics/cards/447
[22:35:32] just wanted to bounce thoughts around quickly
[22:37:26] hey, i'm half done, not really doing anything else
[22:37:27] but i'm here
[22:37:30] bounce away!
[22:38:01] ottomata: i replied on https://gerrit.wikimedia.org/r/#/c/54116/ btw
[22:38:18] aiight, i'll gchat, ottomata
[22:40:22] oh ori-l, i think Dan needed to do the same thing, and I think it's already merged, ergghhh
[22:40:39] :(
[22:40:53] ok, you may abandon the patch if that's true
[22:40:58] thanks anyhow
[22:41:06] well, his is really specific
[22:41:38] stat1:/a/limn-public-data is hourly synced to stat1001:/a/limn-public-data
[23:03:16] good times
[23:27:52] average_drifter: any progress on the country level report
[23:28:00] average_drifter: amit is quite anxious to get something
[23:33:08] !log restarting hadoop setting yarn.nodemanager.resource.memory-mb to 16G
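A footnote on the 21:05 PARALLEL exchange and the "kludge things with mapreduce.job.mappers" idea above: at the MapReduce level the reducer count is an explicit setting (which is all Pig's PARALLEL controls), while the mapper count only falls out of the number of input splits; the hint property is actually spelled mapreduce.job.maps, and most input formats ignore it. A hedged sketch, with an invented class name and arbitrary values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class JobKnobsSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "x-cs-audit");

            // Reducers: an explicit knob, the thing parallel=2 ends up setting.
            job.setNumReduceTasks(2);

            // Mappers: no direct knob; the count follows from the number of input splits,
            // so the only real lever is split size (or the underlying HDFS block size).
            FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);

            // The hint exists, but it is only a hint and most input formats ignore it.
            job.getConfiguration().setInt("mapreduce.job.maps", 10);
        }
    }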