[10:30:14] ottomata: !!
[10:30:15] hello
[10:30:20] you working today?
[10:31:14] milimetric: quick q: what is the status of the support for datasource definitions embedded in the datasource?
[10:31:29] i saw something like it on the reportcard, but it looked hardcoded into the description...
[10:31:44] the metric defs?
[10:32:03] yeah
[10:32:17] so there is a list of them defined in a metric-defs code file
[10:32:22] and you reference them by id
[10:32:25] interesting...
[10:32:34] is it dashboard / limn-instance specific?
[10:32:42] no
[10:32:45] k
[10:32:54] they're in src/ so they're global
[10:33:00] k
[10:33:13] yoyooyo
[10:33:18] yup working today
[10:33:19] :) hey
[10:33:21] milimetric: that might still be worth it
[10:33:27] i'm just replying to y'alls email
[10:33:34] yeah, the metric defs are awesome
[10:33:45] if you have defs that don't exist, i'll definitely show you how to add them
[10:33:46] ottomata: great, any time for some zero backfilling fun?
[10:34:10] maybe even some rebuilding of the jars ….. :)
[10:34:23] oof, yeah probably
[10:34:26] guess what doods
[10:34:31] wha.?
[10:34:31] i'm working on the webrequest loss monitoring.
[10:34:32] what dood?
[10:34:36] since I started it back up on 5-21
[10:34:38] oh nice :)
[10:34:43] annnnnd
[10:34:44] looks like there are crazy duplicates in the logs from then on right nowp
[10:34:46] now
[10:34:47] but.... that's not on the WIP!
[10:34:50] you're gonna get in trouble
[10:34:51] it is!
[10:34:55] oh rly
[10:35:00] https://mingle.corp.wikimedia.org/projects/analytics/cards/715
[10:35:04] oho reeaaaaally
[10:35:09] oh yeah! expedited
[10:35:33] yeah so, duplicates.
[10:35:36] yeah
[10:35:48] that shouldn't be that difficult to deal with right?
[10:35:54] ok, so, here's what i saw
[10:36:00] guess not, but we should be aware
[10:36:04] we can probably remove them
[10:36:06] isn't this like a perfect MR job?
[10:36:12] starting like the 21st, I see triple the numbers for mobile apps
[10:36:22] erosen: re zero stuff, yes, gimme an hour or two though?
[10:36:27] i think I can get webrequest loss stats into ganglia
[10:36:32] ottomata: sounds good
[10:36:33] which would allow me to generate alerts
[10:37:03] ottomata check out the 21st - now in this file: http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_platform-daily.tsv
[10:37:11] ottomata: can you give me a suggestion for rebuilding the jar with the mcc-mnc.json file in it?
[10:37:53] milimetric good catch the other day, i guess the app popularity wasn't skyrocketing like we had expected. ...
[10:38:09] so that's the sampled logs though otto, did you expect that to be affected too?
[10:43:57] i haven't checked sampled, and I think not
[10:44:03] but i could be wrong
[10:44:07] i don't know why there are duplicates yet
[10:44:12] the duplicates are actually in kafka
[10:44:20] which means that somehow udp2log producers are sending them there
[10:44:36] i suspected there were somehow multiple producer procs running on each udp2log instance, but now that I think about this it doesn't make any sense
[10:44:50] also, depending on how the duplicates are created, it might be harder to detect in sampled logs
[10:45:27] if they are actually created by multiple producers, then for sampled that would just mean double the log lines, not necessarily duplicate seq numbers
[10:45:30] actually
[10:45:31] a quick check
[10:45:35] is just to look at filesizes
[10:45:44] thus far it's pretty much 100% duplication (i think)
[10:45:45] so
[10:46:00] if the filesizes for imports after 05-21 are ~ double what they were before 05-15
[10:46:06] then you can assume there are duplicates
[10:46:09] i gotta run and get lunch
[10:46:11] back in a bit
[10:47:26] looks kinda like triple to me, not double
[10:47:35] but bon appetit and we'll talk after :)
[10:47:57] I'm still kinda zombie-mode
[11:33:21] ok back...
[11:52:27] woohoo
[11:52:28] it's working
[11:52:28] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Analytics%20cluster%20eqiad&h=analytics1027.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1369741866&v=-99.070396&m=webrequest_loss_average&vl=%25&ti=Average%20Loss%20Percentage&z=large
[11:58:23] nice! although i don't understand the y axis :)
[12:06:10] it's −99% right now
[12:06:20] because of duplicate log lines
[12:06:35] we've had lots of duplicates since this was turned back on on may 21
[12:06:38] trying to understand why
[12:06:43] there are actual duplicates in the multicast stream...
[12:07:12] so the percent loss comes from
[12:07:25] sequence count actual / sequence count should be
[12:07:28] in this case
[12:07:32] oh
[12:07:59] sorry no
[12:08:15] (seq count should be - seq count actual) / (seq count should be)
[12:08:25] in this case, seq count actual is 2x the amount it should be
[12:09:02] so this is almost (−100 / 100 ) * 100.0 % == −100%
[12:12:29] yeah totally
[12:12:38] duplicates
[12:12:39] hmm
[12:20:28] ok
[12:20:38] the duplicates are in the multicast stream.
[12:20:43] now i'm confused
[12:20:48] drdee
[12:20:55] share in my confusion.
[12:35:33] give me a sec
[12:36:05] ottomata: hangout?
[12:36:15] one sec
[12:41:17] drdee, so what I know so far
[12:41:27] since may 21 (when we fixed the acl blocking multicast traffic)
[12:41:36] (i am in the hangout)
[12:41:38] we've had duplicates in the logs
[12:41:41] ahhhh fine
[12:53:35] hi average, welcome back!
[12:53:39] hi drdee , thanks
[13:03:18] we will do our scrum meeting today at 600PM European Time
[13:04:44] are you working on #716?
[13:10:54] I will be attending scrum today
[13:11:30] there are multiple european timezones http://www.timeanddate.com/library/abbreviations/timezones/eu/
[13:11:35] drdee: which one ?
[13:11:45] drdee: amsterdam time ?
[13:11:45] yes amsterdam time
[13:11:50] ok
[13:11:57] about 716, I have read the ticket
[13:12:25] last time I approached Ops with a request for importing the package they said they want to be able to build it themselves
[13:12:46] I will migrate github => gerrit though
[13:14:17] talk to ottomata on how to do this
[13:14:28] ok
[13:54:29] eh boys, want to do standup now?
[13:54:31] drdee?
[13:55:55] let's wait for adrian
[14:02:39] oh he's going to make it at 6?
[14:07:45] i think / hope so
[14:07:58] we will do it at 6 for sure
[14:56:15] ottomata: what did we conclude about meeting time?
[14:56:29] 6pm
[14:56:34] in 1 hour
[14:56:35] great
[14:56:52] just didn't know if the "let's do it now" idea caught on and left the e-mail thread behind
[15:00:32] ottomata: https://plus.google.com/hangouts/_/87c708c932764edefe229ccf574c5665570c019b
[15:01:06] ottomata: varnishkafka demo ^^
[15:17:39] erosen, I found even more celery good advice: http://docs.celeryproject.org/en/latest/userguide/tasks.html#tips-and-best-practices
[15:18:00] ottomata: did you have a chance to look at the zero job backfilling
[15:18:31] milimetric: /me reading now
[15:19:08] milimetric: I actually tried out the multiple inheritance approach and ran into the craziest error ever
[15:19:22] oh yea? :)
[15:19:29] which opened the door to metaclass programming, a route which i'm sure should never be taken
[15:20:14] milimetric: basically you can't do multiple inheritance when the two parent classes use different metaclasses
[15:20:57] in the spirit of your vim stackoverflow answer, this answer seems to say it all: http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python
[15:21:37] oh so in our case SQLAlchemy and Celery Tasks don't have the same metaclass?
[15:21:51] erosen, no, wasn't sure what you meant for zero backfilling
[15:21:53] what's up?
[15:22:11] basically, can you backfill the country and carrier jobs further into the past?
[15:22:18] they currently start at april 1
[15:22:31] but IIRC we have data going back to mid Feb right?
[15:22:42] oh yup, should be able to do that juuust fine, ja think so...
[15:22:42] can I show you how to do it?
[15:22:42] :)
[15:22:45] please
[15:22:50] data goes back to feb 1st
[15:23:02] I would genuinely like to know more, but I feel pretty unfamiliar with oozie
[15:23:06] okok ok
[15:23:08] hangout?
[15:23:10] ya
[15:23:12] we got some time before standup
[15:23:15] yup
[15:23:15] k batcave
[15:23:19] one sec grabbing powercord
[15:27:40] brb
[15:44:46] guys are we ready to scrum?
[15:46:07] yeah, i'm ready
[15:46:08] erosen: ok, so maybe a simpler approach is better here. Instead of trying to force our way into this __call__ / inheritance approach, why not just have a task() method that's the @celery.task? What does making these objects callable really gain us besides a bit of elegance?
[15:46:22] milimetric, average, rounce123: ^^^
[15:46:22] btw - metaclass answer was great
[15:46:37] I am
[15:46:40] I answered... am I offline?
[15:46:48] join the hangout :)
[15:48:39] average, are you around?
[16:08:27] ottomata: for later: http://localhost:19888/jobhistory/logs/analytics1013:8041/container_1369058864829_3453_01_000002/attempt_1369058864829_3453_m_000000_0/erosen/stdout/?start=0
[16:48:48] doh i just realized why DISTINCT wasn't working for log files
[16:48:54] the kafka offset is different, duhhhhhh
[16:48:56] so not distinct
[16:48:56] ok ok
[16:48:57] cool
[16:48:59] can fix then
[16:50:13] you could throw away the kafka offset and then do distinct
[16:54:24] can someone create a gerrit repo named "dclass" please ?
[16:56:00] yeah
[16:56:28] average: i gotcha...
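Note on the [16:48:48] to [16:50:13] exchange: the suggestion there is to drop the kafka offset before applying DISTINCT, since the offset differs between otherwise identical log lines. A minimal sketch of that idea in Python follows. The record layout is an assumption made only for illustration (tab-separated fields with the offset in the first column); the log does not say how the imported lines are actually laid out, and the function name is hypothetical.

```python
def dedupe_ignoring_offset(lines, offset_field=0, sep="\t"):
    """Return lines with duplicates removed, comparing every field
    except the one at index `offset_field` (assumed kafka offset)."""
    seen = set()
    unique = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Build the comparison key from all fields but the offset.
        key = tuple(f for i, f in enumerate(fields) if i != offset_field)
        if key not in seen:
            seen.add(key)
            unique.append(line)  # keep the first occurrence, offset intact
    return unique


# Hypothetical records: offset, host, sequence number, request.
records = [
    "1001\tcp1001\t42\tGET /a",
    "1002\tcp1001\t42\tGET /a",  # same request, different offset
    "1003\tcp1001\t43\tGET /b",
]

# A plain DISTINCT sees three different lines because of the offset.
plain_distinct_count = len(set(records))
deduped = dedupe_ignoring_offset(records)
```

The first occurrence of each record is kept with its offset, so the output stays traceable to a position in the stream.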
[16:57:08] also, aveage
[16:57:10] average
[16:57:12] you should read this
[16:57:12] http://honk.sigxcpu.org/projects/git-buildpackage/manual-html/gbp.html
[16:57:30] okey dokey
[16:57:31] here you go
[16:57:32] https://gerrit.wikimedia.org/r/#/admin/projects/analytics/dclass
[16:58:39] wow git-dch already does what git2deblogs does :|
[16:59:09] well, I'll read it then
[16:59:19] ok, I'll be doing background reading on it
[16:59:24] ottomata: thank you for creating the repo
[16:59:29] I'll push stuff into it quite soon
[17:00:22] ottomata: back
[17:27:11] erosen1, you got time to talk? I know you're not available later but wanted to quickly touch base on what I'm going to try next
[17:40:42] milimetric: you around still?
[17:40:46] yep
[17:40:49] hangout?
[17:40:58] yup
[18:26:42] brb 2h
[18:26:59] rather bb in 2h
[18:47:21] milimetric, around?
[19:25:54] [travis-ci] develop/76aafee (#137 by milimetric): The build has errored. http://travis-ci.org/wikimedia/limn/builds/7574881
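Note on the loss formula given at [12:08:15]: the sketch below works through (seq count should be - seq count actual) / (seq count should be) to show why fully duplicated log lines produce a reading near −100%. It is only an illustration of the formula as stated in the chat, not the monitoring code itself. It assumes that sequence numbers from one host are consecutive integers, so the expected count is max - min + 1; the function name is hypothetical.

```python
def loss_percent(seq_numbers):
    """Percent loss from the sequence numbers seen for one host.

    expected: how many lines there should be (assumed max - min + 1)
    actual:   how many lines were really seen, duplicates included
    """
    if not seq_numbers:
        raise ValueError("need at least one sequence number")
    expected = max(seq_numbers) - min(seq_numbers) + 1
    actual = len(seq_numbers)
    return 100.0 * (expected - actual) / expected


clean = list(range(1, 101))                    # 100 lines, nothing missing
duplicated = clean * 2                         # every line delivered twice
lossy = [n for n in clean if not 41 <= n <= 50]  # 10 lines dropped

print(loss_percent(clean))       # 0.0
print(loss_percent(duplicated))  # -100.0
print(loss_percent(lossy))       # 10.0
```

A negative value therefore means more lines arrived than the sequence range accounts for, which is how the graph flagged the duplicates.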