[08:30:59] "LZO is fast to the point that it can speed up some Hadoop tasks by reducing the amount of I/O" http://www.schmitztech.com/tech/archives/5 [13:16:37] False [13:20:47] False [13:21:18] False [13:21:48] False [13:22:03] False [13:22:50] False [13:37:35] yo qchris [13:37:43] two great emails about the bugs [13:37:43] Good morning drdee [13:38:14] mooooorning [13:38:18] bug 54783: should that be taken to the ops list and see if we can have single canonical trusted x-forwarded-for list? [13:38:56] bug 54782 / comment 17: that sounds like a new bug that we should file or not? [13:39:03] moooorning ottomata [13:40:12] drdee: bug 54783 no idea. If you take it to a public mailing list ... I cannot handle yet more communication, so be sure you can find someone that can drive the discussion. [13:40:38] bug 54782: To me the bug is just about the problem of which requests we log for mobile. [13:41:11] So the discussion perfectly fits the bug from my point of view. [13:42:03] And eventually, we'll probably also discover the missing forwards to m. and zero. domains for mobile devices in that bug. [13:42:35] If you prefer to factor out some part of this discussion, please be bold :-) [13:43:46] (But I'd prefer to keep it in the bug, as we do not know what the desired outcome is. Maybe it's just that the documentation is wrong and we do not want to count such requests at all) [13:46:29] 54783: added faidon to CC let's see if we can have him chime in on the bug [14:34:00] Morning, Snaps, you there? [14:34:22] ottomata: yes siree [14:36:02] are the varnishkafka sequence numbers ints or longs? [14:36:39] False [14:36:39] uint64_t [14:36:45] Truth [14:36:49] ha [14:36:50] hm [14:36:55] False [14:36:56] False [14:37:33] so i'm noticing more 0s than I would expect in varnishkafka logs [14:37:52] for example, if I select * where sequence < 1000 [14:38:55] you mean entries with seqid=0? [14:39:10] yes [14:39:17] ok check this [14:39:22] I did that query, and got 3 results [14:39:26] all with seqid = 0 [14:39:30] and 2 even from the same node [14:39:45] neither with dt (request timestamp) set [14:40:11] https://gist.github.com/ottomata/242ec1905ec04ca05387 [14:40:57] varnishkafka has not been restarted [14:41:20] those requests were imported on october 17 [14:41:29] varnishkafka has been running since the 15th [14:42:31] huhm [14:43:09] those are extra large log lines [14:43:41] each one more than 2048 bytes (in that output anyway) [14:43:58] myeah, maybe the logline is prematurely reset or rendered when a field gets too large. I'll need to make a test case [14:44:17] oh yeah, and I wanted to ask you, what does happen when the line is too long? [14:44:32] does varnishkafka drop the message? or just truncate whichever field is longest? [14:45:27] its more complicated than that, the scratch buffer works as a memory request backend, so it depends on the tag being parsed and how it requests memory, and how it responds to memory request failures [14:45:44] so I dont have an answer, which I should have, and I better get one [14:46:33] hm, ok [14:46:47] I'll have a stab tonight [14:47:14] but we need to agree on one thing first: what should we do when the logline gets big? [14:47:27] truncate field, truncate logline, skip logline, expand logline to fit, .. [14:50:37] i think truncate field is fine [14:50:44] do you know which field is causing the problem? [14:50:48] can you just truncate the longest field? 
[14:51:05] i mean, expand logline to fit would be awesome
[14:51:14] but that sounds much more complicated, right?
[14:52:03] it's a bit more complicated, but more importantly it is costly, and these loglines are really extreme outliers from what I understand, so it would be wrong to optimize for them
[14:54:07] yeah
[14:54:08] we are fine with truncating the offending field
[14:54:09] if you can detect which one is the problem
[14:54:20] usually the problem is caused by junk in the query string, or junk in the user agent
[14:54:27] the rest of the request is useful though
[14:55:36] DarTar, drdee, I've edited card https://mingle.corp.wikimedia.org/projects/analytics/cards/699, could you please take a look and let me know if I've simplified it too far?
[14:55:41] okay, I'll look into it tonight
[14:56:28] It was describing a bunch of features that are not metric-specific so I removed those. And it was specifying features that don't make sense with what's possible today with wikimetrics.
[14:56:40] oh Snaps, also, I'm trying to get word from Ops folks on the varnishkafka UDP thing
[14:56:46] but if you are dying for things to do to varnishkafka
[14:56:53] adding that would be useful in the long run anyway
[14:57:02] So if we want to get #699 done first, we have to simplify it like this. Otherwise, we have to do some other cards in preparation
[14:57:02] milimetric hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699
[14:57:39] ottomata: hehe, okay. good
[14:57:48] cool thanks
[14:58:02] in the meantime also, milimetric, there seem to be nasty extra 0 sequence numbers in the logs
[14:58:05] for unknown reasons right now
[14:58:20] hm
[14:58:24] which means that the total / (max - min) approach won't work
[14:58:42] oo
[14:58:42] you could just say where sequence <> 0
[14:58:46] yeah
[14:58:48] just thought of that too!
[14:58:49] hehe
[14:59:03] good old "ignore" approach :)
[15:14:12] heya qchris, question for you….
[15:14:19] * qchris hides
[15:14:23] if I pushed JsonStringMessageDecoder upstream...
[15:14:29] Yes.
[15:19:06] ottomata: did freenode swallow the line after "if I pushed JsonStringMessageDecoder upstream..."?
[15:38:30] oh sorry
[15:38:31] ha now
[15:38:33] no
[15:38:44] yeah, if i pushed it upstream, there wouldn't be any java code in kraken, right?
[15:38:46] what would we do then?
[15:39:13] awessoooome, milimetric, this query is working better,
[15:39:23] this is saying 1 missing seq from amsterdam in 2 days
[15:39:26] but maybe that's ok?
[15:39:42] cool
[15:40:06] hm, yeah, doesn't sound awful
[15:40:34] i mean as long as we have these numbers and they don't go over some threshold, we can always say what level of confidence we have in whatever analysis we do
[15:44:22] ottomata: That should not be a problem. We could use maven to just produce the jar with all dependencies inside.
[15:44:47] ottomata: And if we do not like that, we could switch to the vanilla camus-example shaded jar.
[15:47:38] !card 818
[15:47:39] https://mingle.corp.wikimedia.org/projects/analytics/cards/818
[15:55:36] ottomata: is that 1 missing seq or seq run?
[15:56:34] one missing seq
[15:57:35] qchris: cool
[15:57:56] yeah I like producing the jar with the dependencies in kraken
[15:57:57] that's good
[15:58:14] Do you plan to push the tests upstream as well?
[15:58:33] Do I need to prepare them to better fit their environment?
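(Sketching the policy the channel just agreed on, truncate the offending field rather than drop the logline: the snippet below is illustrative Python, not varnishkafka's actual C scratch-buffer code, and the dict-of-fields model, separator accounting, and 2048-byte limit are all assumptions.)

    def truncate_to_fit(fields, limit=2048, marker="..."):
        """Shrink the single longest field until the rendered logline fits.

        fields: dict mapping field name -> string value (illustrative model).
        """
        def rendered_length(fs):
            # Assume one separator character between adjacent fields.
            return sum(len(v) for v in fs.values()) + max(len(fs) - 1, 0)

        while rendered_length(fields) > limit:
            longest = max(fields, key=lambda name: len(fields[name]))
            excess = rendered_length(fields) - limit
            keep = len(fields[longest]) - excess - len(marker)
            if keep <= 0:
                fields[longest] = marker  # cannot shrink enough; stop here
                break
            fields[longest] = fields[longest][:keep] + marker
        return fields

(Targeting the longest field matches the observation above that the offenders are usually junk query strings or user agents, while the rest of the request stays intact.)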
[16:00:22] oo
[16:01:12] (CR) Milimetric: [C: 2 V: 2] Abstract dependencies to separate requirements.txt file [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/90682 (owner: Diederik)
[16:01:53] yeah hm, looks like they are just using basic junit?
[16:02:32] (PS1) Milimetric: fixing broken setup.py [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/90915
[16:02:40] (CR) Milimetric: [C: 2 V: 2] fixing broken setup.py [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/90915 (owner: Milimetric)
[16:02:50] qchris: dunno, what do you think we should do about that?
[16:03:22] I never talked to upstream, so I have no idea.
[16:03:44] Judging by their code/test-coverage, I think they'd rather include it without tests.
[16:03:59] ha, yeah
[16:04:02] i mean they do have a couple of tests
[16:04:05] but not many, eh?
[16:04:10] Yes.
[16:04:22] But if they ask for tests, we can bring them over easily.
[16:04:38] Even without the LoggingMockingTestCase stuff.
[16:04:39] hmm, ok I'll ask them
[16:04:41] oh ok
[16:04:57] if you can do it without those dependencies, I'm sure they will welcome the tests
[16:05:11] We'll have to put the code in the TestCase directly.
[16:05:17] That looks awful.
[16:05:22] But we can do it.
[16:05:30] The code looks way cleaner without tests.
[16:05:59] So I'd go without tests and not even mention them.
[16:06:11] But be sure we can bring them over, if they require them.
[16:06:14] (For whatever reason)
[16:07:12] Getting the code upstream would be great. Then we can spare ourselves discussions about robustness, and defer to upstream :-D
[16:17:31] haha, ok
[16:17:37] i'm sure they will not require the tests
[16:17:43] but you made them so pretty!
[16:17:54] i'd be sad to see them go
[16:22:28] Hahaha.
[16:23:02] If we want to keep them for the aesthetics ... we can keep them in kraken-etl :-)
[16:23:18] Maybe that would even be good?
[16:23:53] But if upstream accepts the code, we should probably just nuke them.
[16:24:04] them = the tests :-)
[16:40:58] yeah
[16:49:27] milimetric: should we pull the stuff we've been working on out into its own repo and use a submodule?
[16:49:46] not sure.
[16:49:55] I'd like to be able to share hive-partitioner
[16:54:55] yeah, I'm for it
[16:54:56] if anyone can use our stuff, we should make it easy to do that
[16:57:42] well, anyone can for sure use hive-partitioner, but probably not pagecount-importer, right?
[17:07:06] pagecount importer would only be useful as an example script, not as a library of any kind
[17:07:21] but yeah, hive-partitioner is important
[17:15:41] TRUE!
[17:55:44] wow this count where seq = 0 query is taking a looong time
[17:55:53] longer than the count missing seq queries
[17:56:26] hmm, milimetric, what if I factor out hive-partitioner and util.py and the tests
[17:56:31] into its own repo
[17:56:47] and then we can use a submodule in kraken, and make pagecount-importer use it
[17:57:08] hmm
[17:57:11] what should this repo be called?
[17:57:22] dataset utils
[17:57:26] dataset-utils
[17:57:31] hadoop-utils
[17:57:34] hadoop-dataset-utils
[17:57:35] hm
[17:57:45] ahhh I dunno....
[17:57:56] maybe we should just leave it in kraken and reorganize kraken-etl properly
[18:20:51] ooo good news, number of missing seqs == number of 0s
[18:20:52] woot!
[18:20:56] 0 data loss in 4 days!
[18:21:09] (i hope I am not jumping to conclusions too soon as usual!)
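(The arithmetic behind the "0 data loss" conclusion above, as a sketch. It assumes per-host sequence stats have already been pulled from Hive over rows with sequence <> 0; the function name and example numbers are illustrative, not the real query output.)

    def sequence_loss(min_seq, max_seq, received, zeros):
        """Estimate dropped messages for one host from varnishkafka seq numbers.

        min_seq, max_seq and received are computed over rows with sequence <> 0,
        since the bogus seq=0 entries would break the max - min arithmetic.
        """
        expected = max_seq - min_seq + 1  # messages that should exist in the span
        missing = expected - received     # gaps in the sequence
        # The sanity check from above: if every "missing" sequence number is
        # matched by a bogus zero-sequence row, nothing was actually dropped.
        return missing, missing - zeros

    # Example: the span implies 3 missing messages, and 3 rows arrived with
    # seq=0, so the estimated real loss is 0.
    missing, lost = sequence_loss(min_seq=12, max_seq=1000011, received=999997, zeros=3)
    assert (missing, lost) == (3, 0)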
[18:50:06] ottomata: so, for the client-side stats stuff -- it'd be a new service, and it'll take some time before people use it. initially it'd mostly be me. so if it turned out that varnishkafka was dropping some data, it would not be critical at all
[18:50:11] that's why i was wondering if it might be a good test case
[18:50:12] It's never done that before
[18:50:34] where do you need varnishkafka to run?
[18:50:40] boom CommanderData is right again ;)
[18:50:46] ottomata: bits varnishes
[18:50:55] hm, ok.
[18:51:10] it can probably replace the eventlogging hadoop varnishncsa instance, i don't think you guys are actually using it
[18:51:21] i bet that would be fine by mark, since it'd be killing a varnishncsa instance and replacing it with varnishkafka
[18:51:22] yeah we aren't using it
[18:51:31] so, varnishkafka hasn't been deployed in full production yet. I just now sent out an email proposing that we do so for mobile hosts
[18:51:48] If mark is cool with deploying to bits I'd be fine with that too
[18:52:16] in the meantime, I bet he'd let us test this out with varnishkafka on a single bits instance
[18:52:29] that's how i've been testing for mobile (albeit on an upload instance)
[18:52:45] is it puppetized?
[18:52:51] there is a puppet module
[18:52:54] but it isn't used yet
[18:53:00] because there isn't an official deb yet
[18:53:03] we have debs but not in apt
[18:53:15] we've been waiting to get the stable version of varnishkafka out the door
[18:53:19] and there are still pending changes atm i think
[18:54:24] ottomata: awesome! really cool. that's at ~150 gig per day?
[18:54:33] ottomata: yes, I will see to those changes now
[18:54:35] https://github.com/wikimedia/operations-puppet-varnishkafka
[18:55:38] Snaps: that's 356.7 (uncompressed, I think) in hdfs on oct 20
[18:55:48] 356.7 G
[18:56:32] okay
[18:57:11] where is this meeting today?
[18:58:36] dooooo what meeting?
[18:58:51] this platform/analytics planning the next 20 years of your work meeting
[18:58:58] s/20 years/3-6 months/
[18:59:04] greg-g: 3Flr - R34 - Browne (M)
[18:59:11] thankya kindly ori-l
[18:59:17] small room
[18:59:22] I am not invited! (but maybe that is good? :p)
[20:09:39] qchris: what was the etherpad with tech debt?
[20:10:02] Let me find that email for you ...
[20:12:09] Ha. I cannot find it either :-)
[20:13:57] drdee: http://etherpad.wikimedia.org/p/analytics-tech-debt
[20:14:31] ty
[20:14:38] i am gonna minglify it ;)
[20:14:43] False
[20:14:59] True
[20:18:38] wm-bot2: You pestered us long enough with your False comments ...
[20:18:44] @rss-off
[20:18:44] Rss feed has been disabled on channel
[20:21:53] drdee, tnegrin: sorry, I don't think I can make it to the backlog grooming session today either
[20:22:20] let me know if there's anything I can help with
[20:22:28] qchris no :( i liked the rss feed for mingle
[20:22:52] DarTar that's ok, have you spoken with Dan about #699?
[20:22:52] drdee hopes that DarTar will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699
[20:23:00] drdee: wm-bot2 is reporting "False" all day periodically.
[20:23:11] we're going to talk about it some time this afternoon
[20:23:21] or whenever works for him
[20:23:22] drdee: Just turn it on again :-)
[20:23:52] drdee: You've admin rights.
[20:24:03] true :)
[20:24:09] drdee: (It just annoyed me to read "False" again and again)
[20:24:17] yes that's annoying
[20:28:20] why is it doing that?
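(For scale, the daily volume quoted above works out to a modest per-second rate. A back-of-the-envelope sketch; the ~400-byte average logline is a pure assumption for illustration, not a measured figure.)

    # 356.7 G uncompressed per day in HDFS, per the figure above.
    GB = 1024 ** 3
    daily_bytes = 356.7 * GB
    per_second = daily_bytes / 86400                       # seconds per day
    print("%.1f MB/s sustained" % (per_second / 2 ** 20))  # ~4.2 MB/s

    # Assuming ~400 bytes per logline (illustrative guess):
    print("~%d messages/s" % (per_second / 400))           # ~11000 msg/s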
[20:28:21] No one told me so I was forced to assume which way to do that
[20:28:42] one bot responding for another bot :D this is getting fun
[21:14:29] hey ottomata: bastion's down and therefore all limn instances + wikimetrics
[21:15:48] ha, well well
[21:15:59] they're down because bastion is down?
[21:17:30] that's……….not good :)
[21:17:36] nope
[21:17:46] definitely falls under the "not good" category of random things
[21:20:38] i'm out for a bit, back on for some time this evening
[21:25:17] likewise, I'm out
[21:25:26] later