[13:49:35] good morning
[13:52:41] good morning milimetric
[14:23:00] moooorning!!!!!!
[14:24:11] morning!
[14:24:17] it is time for a 3rd pair of socks!
[14:24:57] indeed!
[14:34:38] so, unsampled packet loss monitoring doesn't look very helpful
[14:34:38] http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad&h=analytics1009.eqiad.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[14:34:55] the m means thousandths
[14:35:00] so over here
[14:35:00] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Analytics%20cluster%20eqiad&h=analytics1009.eqiad.wmnet&v=0.01253&m=packet_loss_90th&r=hour&z=default&jr=&js=&st=1359037834&vl=%25&z=large
[14:35:59] so that hovers between .005 and .05 %
[14:36:14] whereas the sampled packet loss monitoring hovers around 5%
[14:36:33] now, a big difference there is that there is nothing else going on on analytics1009
[14:36:46] and on analytics1003 (where there is sampled packet loss monitoring), there's all those filters running
[14:40:56] i don't understand how this could be though, if I am getting the same numbers writing to a file as I am in hdfs and only have 0.05% loss
[14:41:02] very confusing
[14:41:53] how about running the packet loss python script against your local saved files
[14:41:54] a
[14:42:00] and compare the numbers to see if they match
[14:42:12] that at least both methods report the same packet loss
[14:43:07] hmmmmmmMmmmmm
[14:43:11] interesting, yeah I think I can do that
[14:43:17] good idea
[14:58:49] drdee, i'm looking at packet loss for those 4 hosts in the mobile logs as computed by packet-loss.cpp
[14:58:53] i'm saving all the results, but as it goes
[14:58:57] it looks pretty typical
[14:59:03] about 5 or 6% packet loss all around
[14:59:11] which is what I computed with my sleuthing manually as well
[15:08:38] ok
[15:14:41] EROSEN: http://analytics1010.eqiad.wmnet:19888/jobhistory/job/job_1355947335321_9786/mapreduce/job/job_1355947335321_9786
[15:14:43] FINISHED
[15:14:45] DONE
[15:14:46] DEAL
[15:14:48] IT WORKS
[15:14:49] niiiiice
[15:14:55] terribly slow
[15:15:00] 22 hours
[15:15:00] ya
[15:15:22] so please have a look at the data :)
[15:15:29] what was the output dir called?
[15:15:38] i didn't see anything in wikihadoop9
[15:15:57] data is on /user/diederik/wikihadoop02
[15:16:08] gotcha
[15:16:16] and this was the working command
[15:16:18] hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -libjars wikihadoop-0.2-CDH4.jar -D mapred.reduce.tasks=5 -file revision_differ.py -file diff_match_patch.py -file xml_simulator.py -input /user/diederik/arwiki-20130120-pages-meta-history.xml.bz2 -output /user/diederik/wikihadoop02 -inputformat org.wikimedia.wikihadoop.StreamWikiDumpInputFormat -mapper "revision_differ.py"
[15:16:20] lots of unicode hehe
[15:16:45] with the sys.path.append('.'), right?
[15:16:48] hey erosen, how do I make a limn graph? :)
[15:16:50] I have limnified data
[15:16:58] I just want to see a graph of it now
[15:17:10] if you want all of the rows in the DataSource to be graphed
[15:17:22] sure
[15:17:24] you can just call ds.write_graph()
[15:17:26] oh but
[15:17:27] i mean
[15:17:29] i don't have an instance
[15:17:30] haha
[15:17:33] like how do you see it
[15:17:35] yup
[15:17:38] i just have datafiles and datasources
[15:17:39] ...
[15:17:40] now what?
[15:17:41] haha
[15:17:44] can I use an existing instance?
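
An aside on drdee's cross-check above (run the packet loss python script over the locally saved files and compare with what kraken reports): it boils down to counting gaps in udp2log sequence numbers per host. A minimal sketch, assuming (hypothetically) that field 0 holds the hostname and field 1 the sequence number; the real log layout may differ:

    # Sketch: estimate packet loss from udp2log sequence numbers.
    # Assumes (hypothetically) field 0 = hostname, field 1 = sequence number.
    import sys
    from collections import defaultdict

    def seq_gap_loss(lines):
        seqs = defaultdict(list)
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue
            seqs[fields[0]].append(int(fields[1]))
        for host, seen in sorted(seqs.items()):
            expected = max(seen) - min(seen) + 1
            lost = expected - len(seen)
            print('%s: %d/%d missing (%.2f%%)' % (host, lost, expected, 100.0 * lost / expected))

    if __name__ == '__main__':
        seq_gap_loss(sys.stdin)
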
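On the streaming command above: the -file flags ship revision_differ.py and its helper modules into each task's working directory, which is why erosen's sys.path.append('.') matters — without it the mapper can't import the co-shipped diff_match_patch. A hypothetical skeleton of the mapper's shape (not the actual revision_differ.py):

    #!/usr/bin/env python
    # Hypothetical mapper skeleton for the streaming job above.
    # Hadoop streaming copies every -file artifact into the task's working
    # directory, so '.' must be on sys.path before importing co-shipped modules.
    import sys
    sys.path.append('.')

    import diff_match_patch  # shipped alongside via -file

    def main():
        # StreamWikiDumpInputFormat hands the mapper page/revision chunks on
        # stdin; the real revision_differ.py diffs consecutive revisions here.
        for chunk in sys.stdin:
            sys.stdout.write(chunk)  # placeholder passthrough

    if __name__ == '__main__':
        main()
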
[15:17:55] milimetric has the remote source working
[15:17:59] it works for the continent graph
[15:18:03] hmm
[15:18:07] can't I just put my files somewhere that they are accessible?
[15:18:14] and then somehow create a graph in one of the existing limn instances?
[15:18:15] this gets into the question of whether to use the new limn or old limn
[15:18:20] ha
[15:18:22] yeah, in theory
[15:18:23] yep
[15:18:31] I think you should use the new limn
[15:18:32] oh milimetric can tell me!
[15:18:35] and you just have to talk to milimetric
[15:18:36] yeah
[15:18:41] hehe
[15:18:42] :)
[15:18:43] milimetric, can I just do something on dev-reportcard?
[15:19:05] maybe without affecting the dashboard?
[15:19:05] yep
[15:19:18] wanna hangout and I can show you where to commit graphs/datasources?
[15:19:30] we can do it together
[15:19:31] i don't wanna commit!
[15:19:44] i just want to drop data somewhere public and then see a graph :)
[15:19:58] i think you can only use datasources remotely
[15:20:00] well, the graph definition has to go somewhere unfortunately
[15:20:06] not graph json files
[15:20:10] yeah
[15:20:40] erosen: with the sys.path.append('.'), right?
[15:20:42] yes
[15:20:45] but it's pretty easy ottomata, I can just do it for you
[15:20:59] yeah ok
[15:21:15] http://analytics1001.wikimedia.org:81/limn/
[15:22:02] k, i'll put it in a graph on dev-reportcard
[15:22:05] erosen: i will load the diffs into the revdiff db
[15:22:07] coooooooool
[15:22:10] can't wait till it's easier to do this
[15:22:17] word
[15:22:20] dschoon used to have a graph editor
[15:22:21] no good anymore?
[15:22:57] drdee: when are you doing this, let me know about the steps required, if you don't mind
[15:23:22] erosen, now?
[15:23:26] k
[15:23:47] ottomata, can you puppetize libdclassjni-dev and pypy on all an machines?
[15:24:22] sure
[15:25:14] if there are debs!
[15:27:21] seems like what we need: http://packages.debian.org/experimental/pypy-lib
[15:27:36] pypy i got
[15:27:43] you need lib?
[15:27:46] pypy-lib
[15:27:55] don't see libdclassjin-dev
[15:27:59] hmm
[15:28:05] good point
[15:28:13] this is probably the one: http://packages.debian.org/experimental/pypy
[15:28:23] my bad
[15:28:31] libdclassjin-dev is on stat1:/home/spetrea/releases/
[15:28:47] we need pypy binary
[15:29:14] erosen: building jar for revdiffdb
[15:29:25] k
[15:29:29] need to make some small changes
[15:29:48] to the pom?
[15:33:46] it isn't called jin-dev
[15:33:47] libdclass-dev
[15:33:51] right drdee?
[15:34:14] yup
[15:35:22] that should be installed on all hadoop nodes
[15:35:36] yup
[15:35:38] and pypy coming up
[15:35:56] awesome and thank you! and this will also be puppetized?
[15:35:58] yup
[15:36:16] libdclass-dev should go into the wikimedia apt repo
[15:36:31] it's in the kraken apt repo
[15:36:36] not in wikimedia
[15:36:40] do you need it in wikimedia?
[15:36:45] oh ok
[15:36:49] dunno
[15:36:51] maybe not
[15:38:42] erosen: built jar, now copying files to local fs
[15:38:48] from hdfs
[15:38:53] nice
[15:39:05] revdiff should be able to read straight from hdfs but it can't :)
[15:39:17] hmm
[15:39:17] weird
[15:39:24] ottomata: Exception in thread "main" org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
[15:39:31] on an01 :D :D :D
[15:39:32] uh ohhhhh
[15:39:36] that's me
[15:39:52] or actually erosen ;)
[15:39:59] on an01?
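
For context on "datafiles and datasources" above: a limn datafile is just the CSV, the datasource is a small descriptor pointing at it, and the graph definition references the datasource — which is why the graph JSON still "has to go somewhere". An illustrative sketch of the first two pieces; the descriptor fields here are assumptions, not limn's exact schema:

    # Illustrative only: a CSV datafile plus a YAML-ish datasource descriptor.
    # The descriptor fields are assumptions, not limn's exact schema.
    import csv

    rows = [('2013-01-23', 5.1), ('2013-01-24', 5.6)]

    with open('mobile_packet_loss.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(('date', 'loss_pct'))
        writer.writerows(rows)

    with open('mobile_packet_loss.yaml', 'w') as f:
        f.write('id: mobile_packet_loss\n')
        f.write('name: Mobile packet loss\n')
        f.write('format: csv\n')
        f.write('url: /data/datafiles/mobile_packet_loss.csv\n')
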
[15:40:06] hmm
[15:40:07] i don't see any disks full
[15:40:23] i'm not doing anything on an01 right now
[15:40:27] diederik you are using 25% of /
[15:40:38] i can make some space if we realize where the issue is
[15:41:30] (it's deploying ottomata - had to change the date format but other than that it was good)
[15:41:50] hmmm, yeah ok, erosen, when I limnified, i thought it would change the date format to what limn expects
[15:42:04] no it's ok
[15:42:06] it is supposed to....
[15:42:10] it left it in this format:
[15:42:10] 2013-01-23T01:21:01
[15:42:11] that ok?
[15:42:14] but maybe the new date format is different somehow?
[15:42:15] hmm
[15:42:15] it is now :)
[15:42:16] weird
[15:42:27] not sure what the old one did
[15:42:37] but I made a regex for the new one that looks like this:
[15:42:41] i'm getting off the train now, so i'll be walking for a few min
[15:42:50] YYYY-MM-DD.HH:mm
[15:42:56] hmm
[15:42:59] so the dashes are mandatory but the T can be anything
[15:43:08] sorry! bbib
[15:43:09] oh ok
[15:43:16] i see, you changed what limn expects, not my file?
[15:43:41] yep
[15:43:51] the customer is always right
[15:43:51] cool
[15:43:57] maybe while you're at it
[15:43:59] make the regex do
[15:45:31] YYYY[-/]mm[-/]dd.HH[:-\.]MM[:-\.]SS
[15:45:35] but whatteevvaah
[15:46:22] no prob, will do
[15:46:50] in the meantime, feast thy eyes
[15:46:52] http://dev-reportcard.wmflabs.org/graphs/mobile_packet_loss
[15:46:58] (still bugs on xaxis)
[15:47:14] zat is crazy looking
[15:47:26] what is the time period?
[15:47:27] cool smoothing!
[15:47:48] two days' worth
[15:47:50] jan 23 and 24
[15:47:57] on an09?
[15:48:01] definitely bugs on callout :(
[15:48:26] no
[15:48:27] an01
[15:48:30] ok
[15:48:45] this is directly from the file I used to compute the seq gaps for udp2log data yesterday
[15:49:00] but isn't a 5% loss always the case?
[15:49:01] so, it's pretty uniform
[15:49:10] yup, that's what we are saying here
[15:49:19] but it is due to the network
[15:49:21] i mean also on locke/emery
[15:49:23] not to local buffers
[15:49:23] right
[15:49:26] which is not what we thought
[15:49:34] which also means
[15:49:35] I just made it YYYY.MM.DD.HH.mm.ss ottomata/erosen
[15:49:41] that partitioning the stream will not help
[15:50:23] so basically, here's what I think drdee
[15:50:24] so i think we should start storing the data for mobile, keep monitoring this, maybe using a pig / oozie / limn workflow? and as long as it hovers around 4-7% we are good
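
milimetric's tolerant pattern above (YYYY[-/]mm[-/]dd.HH[:-\.]MM[:-\.]SS — mandatory date separators, any single character between date and time) translates to roughly the following; a sketch, not limn's actual source:

    import re

    # From the YYYY[-/]mm[-/]dd.HH[:-.]MM[:-.]SS shorthand above: the date
    # separators must be - or /, while the '.' between date and time matches
    # any single character (so 'T' or ' ' both pass).
    TS = re.compile(r'^\d{4}[-/]\d{2}[-/]\d{2}.\d{2}[:\-.]\d{2}[:\-.]\d{2}$')

    for ts in ('2013-01-23T01:21:01', '2013/01/23 01.21.01', '20130123012101'):
        print('%s -> %s' % (ts, bool(TS.match(ts))))
    # the first two match, the last does not
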
[15:50:32] well, i mean
[15:50:39] we have this number in ganglia already
[15:50:49] it isn't unsampled packet loss numbers, but it is the same report
[15:50:57] aight
[15:51:06] i still think partitioning is worthwhile
[15:51:07] but this means we can't do funnel analysis with kraken
[15:51:14] i don't think it is worth it right now
[15:51:19] it won't help anything
[15:51:24] it might be nice organizationally
[15:51:32] it will reduce computation time significantly
[15:51:43] but for mobile we aren't having a problem
[15:51:58] we saw yesterday that all of the data that gets to the machines also gets into kraken
[15:51:59] because we can store traffic by source in separate folders
[15:52:08] so there is less filtering to be done
[15:52:20] but we're already doing that
[15:52:22] but filtering on our nodes
[15:52:26] ok
[15:52:31] i mean yes, our nodes will have to do less work
[15:52:36] we could probably use fewer of them to do it
[15:52:39] so that would be cool
[15:52:49] but
[15:52:51] at the moment
[15:52:55] funnel: we will use event logging data
[15:52:57] we're getting everything we can get for mobile data
[15:53:02] that is ideal for sure
[15:53:10] yeah, let me rephrase that
[15:53:13] we can't use webrequest data for funnel
[15:53:18] right
[15:53:25] but, if we are thinking about it
[15:53:31] well you could
[15:53:32] i bet we can turn off all other filters
[15:53:34] webrequest 100
[15:53:38] whatever else is there
[15:53:40] and just leave mobile
[15:53:43] since we are only working on mobile right now
[15:53:44] right?
[15:53:52] mobile and blog :)
[15:53:57] yeah blog is its own stream now
[15:53:58] so that's cool
[15:54:08] let's do it!
[16:00:04] erosen, do you use the unsampled zero logs in kraken?
[16:00:46] oh he's walking
[16:00:49] couldn't that just be part of the mobile stream
[16:00:56] probably so yeah!
[16:01:05] all zero requests are also to m.wikipedia domains?
[16:01:15] yes
[16:01:17] cool
[16:03:27] nope, sorry
[16:04:00] this is the regex: (^([a-zA-Z0-9-]+)\.zero|^zero)\.([a-zA-Z0-9-]+)\.org
[16:04:33] better to ask erosen probably
[16:05:19] oh it has to have zero in it?
[16:05:24] ok i'll wait for erosen
[16:05:27] yes
[16:05:44] maybe mobile logs should include zero domain
[16:05:50] it isn't zero.m.wikipedia
[16:05:50] ?
[16:06:18] i thought so but now i am not sure anymore
[16:08:38] ottomata: all zero requests are also to m.wikipedia domains? -- I don't think so
[16:08:41] but i'm not sure
[16:09:39] my q to you is!
[16:09:50] are you using the unsampled zero request logs in kraken?
[16:09:55] no
[16:10:05] basically i am waiting for x-carrier
[16:10:09] yeah man
[16:10:10] HAAHAHAHAHAHAHH
[16:10:16] are you trolling us?
[16:10:18] hehe
[16:10:29] but you wanna do zero in kraken, right?
[16:10:38] please say yes :)
[16:10:43] yeah for sure
[16:10:58] i mean I am already doing it, just hackily with the 1:10 sampled files
[16:11:38] apparently colloquy doesn't like hackily
[16:11:49] i got Orange Congo code from Amit
[16:12:18] ok cool
[16:12:40] i'm going to go ahead and merge the X-CS change now
[16:13:11] do you still need the tata code?
[16:13:21] sort of
[16:13:25] there are 40 tata codes in india
[16:13:32] each for their own geography
[16:13:36] so i have now as code
[16:13:51] 405-0*
[16:14:38] interesting
[16:14:53] just to clarify, these headers will stick around in the logs?
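
The zero regex quoted above only matches hostnames that actually carry a zero label, which is why plain m.wikipedia traffic from zero users slips past it. A quick check of the pattern:

    import re

    # The udp-filter zero pattern quoted above.
    ZERO = re.compile(r'(^([a-zA-Z0-9-]+)\.zero|^zero)\.([a-zA-Z0-9-]+)\.org')

    for host in ('zero.wikipedia.org',
                 'en.zero.wikipedia.org',
                 'en.m.wikipedia.org',   # mobile site, but no zero label
                 'en.wikipedia.org'):    # desktop
        print('%s -> %s' % (host, bool(ZERO.search(host))))
    # only the first two match
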
[16:15:36] yes
[16:16:49] k
[16:21:08] ottomata, the squids / nginx also need to log the CS field or set it to '-'
[16:21:16] else we have an inconsistent number of fields
[16:21:17] yeah well
[16:21:21] that is already happening
[16:21:30] we already deployed the x-carrier header to all frontends, ja?
[16:21:35] so, i changed the nginx one
[16:21:41] but deploying squid is more annoying (it isn't in puppet)
[16:21:46] so, i just left it at x-carrier
[16:21:54] both squid and nginx don't set this header anyway
[16:21:59] so the field will always be -
[16:22:00] form them
[16:22:01] from*
[16:22:02] ok
[16:22:11] when we deploy the changes that do change squid
[16:22:13] i'll fix it there too
[16:22:19] but asher might go bananas when he sees carrier :)
[16:22:23] (cookies and tabs)
[16:22:25] it's already there man
[16:22:29] he helped me deploy it originally
[16:22:34] and didn't say anything at the time
[16:22:34] aight
[16:22:41] about cookies and tabs
[16:22:43] first tab
[16:22:45] s
[16:22:48] fooo suuure
[16:22:51] yeah
[16:22:57] i'll get the patch in today
[16:23:27] because we need to figure out if it's possible to log only specific key/value pairs from a cookie and not the entire cookie
[16:23:39] sorry, ottomata you mean we won't get to the point where all the requests have been tagged?
[16:23:54] until something additional happens?
[16:23:54] eh?
[16:24:03] with x-cs?
[16:24:15] both squid and nginx don't set this header anyway so the field will always be -
[16:24:18] yes
[16:24:25] but all mobile requests are in varnish
[16:24:30] so no biggie
[16:24:31] hmm
[16:24:42] maybe, but a lot of the requests are not to the mobile site
[16:24:49] we are logging - in squid and nginx just so that the number of fields is consistent
[16:24:54] a lot of the zero requests?
[16:24:56] yeah
[16:25:02] then yup, those will not be tagged
[16:25:20] i mean it isn't clear that this is important, but the current logs are 90% main site requests
[16:25:21] ok
[16:25:31] 90% desktop?
[16:25:33] i think it will be fine, i'll talk with amit
[16:25:34] yeah
[16:25:35] ?
[16:25:37] haha
[16:25:42] well, for the mobile logs
[16:25:47] * drdee is confused
[16:25:54] we are only importing those that match m.wikipedia domains
[16:25:55] i mean they mostly only care about the mobile site visits
[16:26:09] well, if you are trying to know who is using the zero program
[16:26:17] and 90% of visits from zero clients are to desktop
[16:26:21] and we filter on the X-CS header
[16:26:25] you'll be missing 90% of your requests
[16:26:30] indeed
[16:26:31] yeah
[16:26:31] right now doing it by IP address will get them all
[16:26:46] but as i said this should still be okay because amit usually only cares about the mobile site requests
[16:26:52] are you sure that 90% goes to desktop?
[16:27:09] i'll show you a graph, one sec
[16:27:10] and why is this the case?
[16:27:24] nobody knows
[16:27:28] I keep asking
[16:27:51] i think it is either that nobody makes bugzilla requests when their phone goes to the main site
[16:28:00] or that there is a lot of dongle use
[16:28:19] or very broken redirection
[16:28:57] or that
[16:32:54] here is an example: http://global-dev.wmflabs.org/graphs/orange_kenya_versions
[16:33:36] X: desktop, M: m., Z: zero.
[16:34:00] yup
[16:34:35] those counts come from the current udp-filters
[16:47:54] i just looked at 10000 lines from tim-brasil
[16:47:57] hardly any m. domains
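
The X/M/Z split behind graphs like orange_kenya_versions classifies each request host into desktop, mobile, or zero site versions. A sketch of the idea (not the actual udp-filters code):

    # Sketch of the X/M/Z site-version split (not the actual udp-filters code).
    def site_version(host):
        labels = host.split('.')
        if 'zero' in labels:
            return 'Z'  # zero site
        if 'm' in labels:
            return 'M'  # mobile site
        return 'X'      # desktop / main site

    assert site_version('en.zero.wikipedia.org') == 'Z'
    assert site_version('en.m.wikipedia.org') == 'M'
    assert site_version('en.wikipedia.org') == 'X'
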
[16:50:03] or maybe mobile is highly over-rated in the global south :)
[16:50:19] i should say mobile internet
[16:52:16] drdee, well, if the IPs we are filtering on are supposed to be from mobile phones
[16:52:23] you'd think they would get directed to m. no matter what
[16:52:37] drdee, do you know if we have an RT for the tab change?
[16:53:08] 1 sec
[16:53:50] maybe not ottomata, probably a good idea to open one ;)
[17:00:17] ok ja
[17:13:29] poop drdee, packet-loss.cpp splits manually on space
[17:13:35] we'll need to deploy a new udp2log deb
[17:14:09] ugggh
[17:14:46] why is it splitting?
[17:16:01] it needs seq and timestamp
[17:16:03] from the log lines
[17:16:15] also, the udp-filter version we have deployed doesn't have the -F flag option yet
[17:16:25] so we need to deploy a new udp-filter too
[17:16:45] okay so let's do that first
[17:17:04] ok, should we just deploy a new one with the default as \t?
[17:17:14] i will ask stefan to build the deb
[17:17:32] let's make it a parameter
[17:17:40] ok, but which version?
[17:17:46] you guys have a lot of changes to udp-filter
[17:17:47] right?
[17:17:54] do we want the webstats changes?
[17:17:56] probably not, right?
[17:17:57] no
[17:17:59] it is already a parameter
[17:18:02] i will go back in time
[17:18:20] and make a new package with just the -F flag
[17:18:22] you should probably go to 67dcf53f0ee4214e24f75528ecee6ba37e0e58ca
[17:18:25] ok
[17:18:26] ty
[17:18:30] or actually
[17:18:34] 639d66215c5217f37c25ccf2d56053b2892d7356
[17:20:21] i'll work on the udp2log packet loss change
[17:20:28] awesome
[17:29:24] graph editor, ottomata erosen?
[17:29:48] oh, was that about limn?
[17:29:58] for some reason i immediately thought of graph theory
[17:30:01] heh
[17:54:32] lunchtime! aaahhhh 6 mins til standup
[17:54:33] scarftime
[18:04:44] guys
[18:04:46] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[18:04:50] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[18:05:07] whhaa where'd you get that one?
[18:05:12] dunno
[18:28:55] erosen: http://gp.wmflabs.org/
[18:29:42] nice thanks
[18:48:23] drdee: no luck sshing to oxygen. Is it easy to request / grant access? i just need it so I can see up to date logs so I can check in on the x-carrier status
[18:48:45] okidoki
[18:53:31] hey drdee, and anyone else: the mobile stats we're compiling might benefit from the general mobile stats that akamai just released: http://www.engadget.com/2012/08/09/akamai-peak-internet-speeds-jumped-25-percent-year-to-year-in-q1/
[19:06:18] hey erosen
[19:06:32] HEY
[19:06:35] hey*
[19:06:36] sup
[19:06:56] just saw your message - can you expand a bit?
[19:07:20] what I meant by read-only is that user:research can run queries on log + prod + staging
[19:07:24] ah sorry. question is whether you know if asher created a read-only account
[19:07:48] but it has write permissions on staging
[19:07:58] correct
[19:08:06] nothing changed to staging
[19:08:19] but if you want to write into prod, we should talk :)
[19:08:22] the goal for asaf was that he would have read-only access to staging alone
[19:08:34] he doesn't need prod
[19:08:54] hmm can't he just use research?
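
Why packet-loss.cpp cares about the separator: it pulls seq and timestamp out of each log line by position, so a hard-coded split on space breaks as soon as the stream moves to tabs. In Python terms, the parameterized separator being discussed (like udp-filter's -F flag) looks like the following; the field positions are assumptions for illustration:

    # Sketch: pull seq + timestamp out of a udp2log line with a configurable
    # separator (the -F idea). Field positions (1 = seq, 2 = timestamp) are
    # assumptions for illustration.
    def parse_line(line, sep='\t'):
        fields = line.rstrip('\n').split(sep)
        return int(fields[1]), fields[2]

    space_line = 'cp1001.eqiad.wmnet 12345 2013-01-25T14:34:38 ...'
    tab_line = space_line.replace(' ', '\t')

    # same record, either delimiter
    assert parse_line(space_line, sep=' ') == parse_line(tab_line, sep='\t')
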
[19:09:03] or you think he's going to drop your tables ;)
[19:09:06] yeah, it will work
[19:09:08] i don't think so
[19:09:16] k cool
[19:09:24] it just seemed like the best option
[19:09:43] but given the difficulty of adding an account, it is probably fine to leave him with write access
[19:10:21] I think we could ask for a shared read-only account, what ops is not willing to support/maintain is personal user credentials
[19:10:45] yeah
[19:10:49] that is all i was thinking
[19:10:59] ok, I'll bring this up with py
[19:10:59] it could be useful for scripts which are meant to be consumers only
[19:11:03] as a basic protection
[19:11:26] peter youngmeister?
[19:11:29] yup
[19:11:45] k, just checking--not yet on initial terms ;)
[19:16:06] ottomata: ping
[19:16:46] ottomata: why isn't https://github.com/wmf-analytics under our official git repo, e.g.?
[19:17:46] good question!
[19:18:04] i think because wikimedia is a bit more official and clean
[19:18:08] wmf-analytics is messier
[19:18:29] also, when we started it, it was just mobile people using wikimedia, and github was more controversial
[19:18:46] but we can move it, can't we?
[19:23:11] http://stat1.wikimedia.org/spetrea/new_pageview_mobile_reports/r19-parallel/pageviews.html
[19:23:18] new mobile pageview report
[19:23:39] didn't render the /wiki/ separated from /w/api.php yet
[19:23:42] but it is computed
[19:24:17] also have a few days with very low counts which are suspicious, so I have to check those too
[19:25:33] 174G of data in 118m
[19:25:40] ok
[19:26:51] so we are still about 500M page views too low :(
[19:27:22] drdee: you're comparing to wikistats right?
[19:27:27] ye
[19:27:28] s
[19:27:29] can you give me a file with
[19:27:35] discarded log lines
[19:27:36] ?
[19:28:44] so the actual discarded lines not just their count
[19:28:47] I need to write code for that
[19:28:54] yup
[19:28:56] and then do another run, and then I'll have a .gz for you
[19:29:07] perfect, but that should be quite easy, right?
[19:29:09] but it will be quite big..
[19:29:16] or just sample that
[19:29:22] or just run it for a day
[19:29:31] patrick and i just talked in the office
[19:29:35] i just need to look at some data to see what is being thrown away
[19:29:46] drdee: ok, will do a run for a day now
[19:29:56] just to put things here as well for the remote peeps, i said they're separate because a lot of the projects were forks
[19:30:19] it's more of a sandbox for experiments and collaboration than it is "wikimedia projects"
[19:30:42] so when we were experimenting with scribe, we forked it to fix something that was only an issue in our setup
[19:30:53] putting that under wikimedia is misleading
[19:35:59] dschoon, about the negative packet loss numbers, those are just confidence intervals, and if the mean is close to 0 then you could get a negative confidence interval, so i don't think this is a real issue
[19:36:08] no
[19:36:20] it's expressed as (value +/- interval)
[19:36:26] the *value* is often negative
[19:37:22] that should be easily traceable in the source code
[19:37:27] yep
[19:37:33] reading it over
[19:37:49] cool
[19:37:58] it is, in fact, what sent me there :)
[19:38:16] dschoon: https://github.com/wikimedia-incubator
[19:38:22] cool
[19:38:22] dschoon: you're an admin
[19:38:43] hmm.
[19:39:00] it still seems like a different purpose than what we're using wmf-analytics for
[19:39:10] i agree that if we had private repos, it wouldn't be an issue
[19:39:40] dschoon: do you want private repos on wikimedia-incubator?
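
On the negative numbers: if loss is reported as mean +/- a confidence interval, a mean near zero with any variance yields a band that dips below zero — though dschoon's point stands that the reported *value* itself going negative needs a look at the code. The arithmetic side, as a sketch:

    # Why a loss figure near 0% can print negative: mean +/- interval.
    import math

    samples = [0.02, -0.01, 0.00, 0.03, -0.02]  # per-interval loss %, noisy around 0
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half_width = 1.96 * math.sqrt(var / n)  # ~95% confidence interval

    print('%.3f +/- %.3f %%' % (mean, half_width))
    # the band (mean - half_width, mean + half_width) straddles zero
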
[19:39:47] otherwise, we really don't want to suggest we're working on a fork of, say, thrift
[19:40:01] i dunno. this has worked fine so far to disambiguate things
[19:40:09] i'll update the description on the group, for starters
[19:40:27] and we can move our experimental projects into the incubator (like sqoopy and gerrit-stats)
[19:41:45] dschoon: okay cool
[19:42:09] won't happen right away. i'll make a note.
[20:30:41] brb lunch
[21:23:06] back
[21:24:54] drdee, ottomata1 asher says varnish (including our patches) will not extract cookie KV-pairs
[21:25:02] yeah didn't think so
[21:25:07] nginx will do it
[21:25:29] ok
[21:26:04] but squid won't.
[21:26:10] so we'd have to patch both squid and varnish.
[21:26:40] and he says that it would need to be mindful of CPU use.
[21:27:00] to me, this sounds like a great way for us to do something awful to production in a degenerate case
[21:27:10] and i'd much rather postprocess our values out of the cookie string
[21:29:02] i agree
[21:29:08] they should just event log it anyway :)
[21:29:16] yep.
[21:29:29] is that actually on the table?
[21:29:39] i thought that wasn't going to happen due to noscript or whatever
[21:29:52] yes and the dep on jquery
[21:30:05] so it would only work on smartphones
[21:30:18] to be more precise it would only work on js enabled phones
[21:30:47] right.
[21:31:16] the takeaway, then, is that we should just log the full Cookie header
[21:31:29] that also has performance degradation issues
[21:31:37] it will make udp packets much bigger
[21:31:40] i agree with the concerns about data retention, so we could preprocess the files to throw away the irrelevant values
[21:31:47] i don't think that's a major concern.
[21:31:51] it totally is
[21:32:00] are we close to capacity with net?
[21:32:17] i read a bunch of whitepapers about UDP packet loss against packet size
[21:32:20] we couldn't log the x-carrier header (avg 10 characters) and had to go with the X-CS carrier (5 characters)
[21:32:26] and so long as you're under the MTU, it has no effect
[21:32:35] what.
[21:32:37] why??
[21:32:51] because a single udp packet is 1400 bytes large
[21:32:58] already!?
[21:33:07] and then it would cross that and you would have multiple packets for a single log line
[21:33:11] yes.
[21:33:15] i realize.
[21:33:19] and that also is a problem
[21:33:21] :D
[21:33:32] as 1500 is the MTU.
[21:33:34] so.
[21:33:39] that is a huge problem.
[21:33:57] i will discuss.
[21:34:03] aight
[21:35:07] also please poke asher/patrick about the tab separator issue, i haven't heard back from them
[21:35:18] sure. what's the issue?
[21:35:28] that they're not responding? heh
[21:35:33] yup :)
[21:35:37] k
[21:35:40] what do we need?
[21:35:47] their feedback / approval
[21:35:52] ...of?
[21:35:52] to go ahead with the change
[21:35:54] ah.
[21:36:02] to deploy the sep-change to prod?
[21:36:02] moving from space as delimiter to tab
[21:36:14] deploy is scheduled for feb 1st
[21:36:34] read the RFC on to engineering :D
[21:36:38] there is an RT
[21:36:55] https://rt.wikimedia.org/Ticket/Display.html?id=4400
[21:37:07] ty
[21:55:38] woo hoo bar graphs
[21:55:39] http://dev-reportcard.wmflabs.org/graphs/kbar
[21:55:57] !log Limn released a new version to dev with bar graph support
[21:55:59] Logged the message, Master
[21:58:09] woo!
[21:58:15] dan is the man!
[21:58:18] !!
[21:58:25] heh
[21:58:31] we should really get rid of that initial animation
[21:59:21] milimetric, does it also support timeseries with multiple columns?
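
drdee's 1400-byte figure against the 1500-byte ethernet MTU is the crux of the Cookie-logging objection: appending the full header can push one log line past the MTU, so a single line needs multiple datagrams and both fragmentation and loss get worse. A back-of-envelope check with illustrative sizes:

    # Back-of-envelope: does appending the full Cookie header push a udp2log
    # datagram past the ethernet MTU? Sizes are illustrative.
    MTU = 1500
    IP_UDP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte UDP header

    line = 'x' * 1400    # a typical log line, per the discussion above
    cookie = 'x' * 300   # a not-unusual Cookie header

    for payload in (line, line + '\t' + cookie):
        fits = len(payload) + IP_UDP_OVERHEAD <= MTU
        print('%d bytes -> %s' % (len(payload), 'fits' if fits else 'fragments'))
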
[21:59:44] initial animation rocks!
[21:59:47] don't touch it
[21:59:49] :P
[21:59:55] hehe
[21:59:56] * milimetric likes cartoons
[22:00:04] no timeseries for you!
[22:00:06] come back 2 sprint
[22:00:25] kk
[22:00:29] stacked bar charts would really be a different node I think
[22:00:40] i think you might have broken maps
[22:00:43] http://dev-reportcard.wmflabs.org/graphs/editors_by_geo
[22:00:43] I'm not sure how to marry the two
[22:00:46] oh no
[22:00:48] i forgot to check that
[22:01:00] i was excited the deploy would let me show that off :)
[22:01:26] ah, easy fix
[22:01:59] k, fixed locally
[22:02:06] ooooh! infobox
[22:02:11] dschoon is the man :)
[22:02:20] dude, you should !bang! log that
[22:19:53] ottomata, i figured i should upload the python thing i wrote to calc the diffs and percents
[22:19:53] https://gist.github.com/4628582
[22:23:51] milimetric: did you redeploy?
[22:24:06] nope, just pushed
[22:24:14] because i'm redeploying soon
[22:24:17] k
[22:24:22] when I get the annotation node box to pop up
[22:25:20] dschoon
[22:25:23] awesome!
[22:25:27] what does that take in?
[22:25:29] a seq file?
[22:25:36] yaml
[22:25:39] i have this bad feeling that trackHover is very expensive compared to on 'mouseover' but I'm loving how easy it is to program with it :)
[22:25:48] yaml?
[22:26:11] basically the output
[22:26:16] you usually print to stdout
[22:26:16] :)
[22:26:18] ohhhhhh
[22:26:19] ok
[22:26:21] i changed your script
[22:26:21] haha
[22:26:24] i guess that is kinda yaml
[22:26:24] haha
[22:26:25] nice
[22:26:30] it is literally yaml :)
[22:26:31] i love how yaml is kinda what you would write anyway
[22:26:34] yes!
[22:26:38] i was not thinking yaml at all when I wrote that
[22:26:39] same as markdown
[22:26:42] that's why i love them both
[22:27:01] nice thanks
[22:27:13] i'm still sleuthing this issue with leslie right now
[22:27:22] we're about to compare udp2log loss vs tcpdump loss
[22:30:57] cool.
[23:00:42] laters all!
[23:41:34] ok dClass lib works in the pig unit test environment, it passes the assert; now it still fails on kraken but this is pretty good news AFAIC
[23:52:13] drdee: pong
[23:52:17] good news :)
[23:52:27] drdee: does the unit test feature threads?
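
The gist above isn't reproduced here; a hypothetical sketch of the pattern being described — compute diffs and percents, print to stdout, and the output happens to be valid YAML:

    # Hypothetical sketch of the gist's pattern: diff two count files and
    # print YAML-compatible key/value output to stdout.
    import sys

    def load(path):
        counts = {}
        with open(path) as f:
            for line in f:
                key, value = line.split()
                counts[key] = int(value)
        return counts

    a, b = load(sys.argv[1]), load(sys.argv[2])
    for key in sorted(set(a) & set(b)):
        diff = b[key] - a[key]
        print('%s:' % key)
        print('  diff: %d' % diff)
        if a[key]:
            print('  percent: %.2f' % (100.0 * diff / a[key]))
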