[13:50:05] moorning!
[13:50:15] drdee, is that varnishkafka check-in supposed to be daily?
[13:50:22] moorning!
[13:50:47] yup
[13:51:28] has kripke been salvaged?
[13:53:29] yup! i think dan has some work to do making the limn instances on limn0 work properly
[13:53:34] i think right now you can't save graphs or something
[13:53:41] but
[13:53:41] the ip has been moved
[13:53:57] so nexus and all the limn instances are hosted on limn0 now
[13:54:01] i'd like to move nexus elsewhere
[13:54:05] puppetized, etc.
[13:54:10] but for now I just copied it
[13:57:23] yup, can't we reuse someone else's puppet manifest for this?
[13:57:28] probably
[13:57:30] haven't looked
[13:57:34] it's real simple though
[13:57:38] but
[13:57:40] dunno about a .deb
[13:57:45] that would make the puppet stuff more difficult
[13:57:46] as usual
[13:58:01] i'd be fine with not puppetizing the installation
[13:58:13] we could host this on stat1001, no probs
[13:58:38] i'm going to take hadoop down for a sec
[13:58:42] need to restart an10
[13:58:45] well nexus does need a of diskspace
[13:58:51] need a lot
[13:58:53] i meant
[13:58:54] ok
[13:59:03] 4.1T avail on stat1
[13:59:10] stat1001
[13:59:25] also, stat1002 is ready again, with new disks
[13:59:27] i can work on that too
[13:59:43] going to finish this kernel upgrade stuff, i'll need to reboot udp2log hosts too
[13:59:58] i might wait for oxygen until we do the precise upgrade
[14:00:07] then i'm going to work on hadoop cdh4
[14:00:11] then maybe stat1002
[14:00:13] then kafka
[14:00:15] s'ok?
[14:00:56] !log stopping hadoop and restarting analytics1010 (namenode) for kernel upgrade
[14:00:58] Logged the message, Master
[14:09:11] sounds great!
[14:28:40] New review: Diederik; "Ok." [analytics/wikistats] (master) C: 2; - https://gerrit.wikimedia.org/r/62209
[14:28:40] Change merged: Diederik; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/62209
[14:29:03] New review: Diederik; "Ok." [analytics/wikistats] (master) C: 2; - https://gerrit.wikimedia.org/r/62206
[14:29:04] Change merged: Diederik; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/62206
[15:00:05] drdee, whachu think
[15:00:29] can I reboot gadolinium? the multicast stream would be offline for maybe 5 mins or less
[15:54:18] !log rebooted gadolinium to upgrade linux kernel
[15:54:20] Logged the message, Master
[16:01:48] ottomata: got your mail in duplicate fyi.
[16:02:47] yeah, i accidentally sent the first with my personal gmail
[16:02:51] resent from wmf addy
[16:07:48] ahhh
[16:07:52] well i got both
[16:37:02] ottomata, can we do scrum now?
[16:37:16] erosen: scrum now?
[16:37:22] it's just the three of us
[16:37:22] sure
[16:37:24] works for me
[16:38:13] sure
[16:45:35] drdee
[16:45:36] !
[16:45:38] we are waiting for youuu
[16:46:49] sorrryyyyyy
[17:32:20] ottomata: re 244, looks like what I suspected is happening: the country counts are only being generated from requests which are tagged with x-cs
[17:32:23] ottomata: see https://github.com/wikimedia/kraken/blob/master/pig/webrequest_zero_hour_carrier_country.pig#L34
[17:32:28] i think the fix is simple
[17:32:34] but it will mean we need to rerun things
[17:32:53] oh
[17:32:54] yes
[17:32:57] that's correct
[17:33:06] was something different supposed to happen?
[17:33:20] well the country counts are meant to be the total number of requests from that country
[17:33:22] i thought we weren't going to trust zero stats in kraken until we sorted out the x-analytics problems
[17:33:26] oh
[17:33:30] so that we can assess the relative share which a particular provider has
[17:33:51] huh, ok so different jobs
[17:33:54] well, regarding the x-analytics problems, we think they don't exist...
[17:34:01] right ok
[17:34:05] well they don't need to be different jobs
[17:34:17] i think we can just reorder the pig script
[17:34:22] hm
[17:34:25] and generate the country counts prior to the filter command
[17:34:27] maybe ...?
[17:35:02] might make the job take longer
[17:35:05] but should work
[17:35:10] also, i just noticed this:
[17:35:11] i'm happy to split the job up
[17:35:13] carrier_count = FOREACH (GROUP log_fields BY (date_bucket, language, project, site_version, country, carrier))
[17:35:14] is that ok?
[17:35:22] country,carrier?
[17:35:29] that means that if there are different countries for a single carrier
[17:35:35] they will be counted differently in the carrier_count
[17:35:37] is that ok?
[17:35:40] yeah
[17:35:42] ok
[17:35:53] I do a groupby on the data before I use it
[17:35:58] ok
[17:35:59] so it is always okay to have an extra field in there
[17:36:10] hm
[17:36:32] what vexes you?
[17:36:46] naw, got it, was going to ask another q but i got it
[17:36:48] ok so hm
[17:37:09] i wonder if we just change the pig script, if the job will just start using it
[17:37:29] seems likely
[17:37:48] we can manually run it on a smaller date range
[17:37:57] and then switch out the file which oozie uses
[17:39:17] eh?
[17:39:34] gonna try this...
[17:40:25] hmmm actually
[17:40:30] i think this should be a different job
[17:40:36] this is full mobile country counts then
[17:40:38] right?
[17:40:46] not necessarily anything specific to zero
[17:41:08] yup
[17:41:08] exactly
[17:41:38] do you still want this bit of the filter?
[17:41:38] - host matches *.wikipedia.org
[17:41:38] - request is a pageview
[17:41:50] for country count?
[17:41:53] hmm
[17:42:06] from a greedy "more info is better" perspective
[17:42:13] i think we should keep the project in the output
[17:42:25] yeah i think so too
[17:42:26] but
[17:42:27] by project i mean {'wikipedia', 'meta.wikimedia', 'wikisource'}
[17:42:34] right now those are being filtered out
[17:42:38] but it is fine for now if it is easier
[17:42:38] New patchset: Erik Zachte; "Collect and use countable namespaces via api" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/64598
[17:42:42] yeah
[17:42:45] - host matches *.wikipedia.org
[17:42:54] naw, it's the same if I make it a different job
[17:43:00] what about pageview?
[17:43:04] - request is a pageview
[17:43:11] yeah
[17:43:14] that part should stay
[17:43:30] we are trying to use that definition more broadly
[17:44:11] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/64598
[17:44:33] k
[17:44:38] so
[17:44:44] filter for pageviews
[17:44:48] group by same fields
[17:44:49] output
[17:45:01] yup
[17:45:01] hm
[17:45:02] ok
[17:45:08] i think i will keep this at the same location, as a zero job
[17:45:11] but I will make it another job
[17:45:12] check this out: https://mingle.corp.wikimedia.org/projects/analytics/cards/244
[17:45:21] the output format is pretty well specified
[17:45:24] in the example
[17:47:49] ottomata: are you working on this?
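A minimal sketch of what the separate country-count job agreed on above might look like: keep the pageview filter, drop the x-cs filter so counts cover all requests from a country, and group on the same fields as the quoted carrier_count line. The loader, paths, schema, and the is_pageview field are illustrative assumptions, not the actual kraken script:

    -- hypothetical sketch of the mobile country-count job; loader and
    -- schema are assumptions -- only the grouping mirrors the quoted line
    log_fields = LOAD '$input' USING PigStorage('\t')
        AS (date_bucket:chararray, language:chararray, project:chararray,
            site_version:chararray, country:chararray, carrier:chararray,
            is_pageview:int);

    -- keep the pageview filter, but no x-cs filter: country counts should
    -- reflect the total number of requests from each country
    pageviews = FILTER log_fields BY is_pageview == 1;

    -- same grouping as the carrier_count line above; the extra carrier
    -- field is fine because the consumer groups the data again before use
    country_count = FOREACH (GROUP pageviews BY
            (date_bucket, language, project, site_version, country, carrier))
        GENERATE FLATTEN(group), COUNT(pageviews) AS num_requests;

    STORE country_count INTO '$output';

Grouping before any host filter also keeps the project field in the output, as discussed above.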
[17:47:58] ottomata: in general, what is your pig dev env like?
[17:48:04] yeah, working on it
[17:48:18] in general: i don't have one, pig testing I do from an02
[17:48:20] in grunt usually
[17:48:28] we hope to have one in labs eventually
[17:48:28] gotcha
[17:48:42] well then it would need the data, right?
[17:48:48] i do have a vagrant vm that I will keep tweaking too, i'm using it for puppet dev, but could also use it for general kraken dev
[17:48:58] it would need fake data
[17:48:58] that's roughly what I would do--just checking
[17:48:59] ya
[17:52:20] ottomata: there is another issue which we may want to fix at the same time: the x-cs code -> name lookup is still a bit rough
[17:52:37] and the tata-india stuff is missing for some reason
[17:53:10] ungh, that's the udf?
[17:53:19] not sure
[17:53:31] i've been trying to find it for the last few minutes
[17:53:49] i think when we designed the system we created a json file which represents the mapping
[17:53:58] and that then gets loaded in some UDF
[17:54:01] but I don't know which one
[17:54:13] Zero.java
[17:54:19] yeah, where is that?
[17:54:27] aah, not in pig
[17:54:27] kraken-pig/...
[17:54:59] https://github.com/wikimedia/kraken/blob/master/kraken-pig/src/main/java/org/wikimedia/analytics/kraken/pig/Zero.java
[17:55:21] ja
[17:56:50] hmm
[17:57:03] do you know where that UDF expects the actual file to live?
[17:57:19] ?
[17:57:35] this?
[17:57:35] mccMncMap = converter.construct("org.wikimedia.analytics.kraken.schemas.MccMnc", "mcc_mnc.json", "getMCC_MNC");
[17:57:36] https://github.com/wikimedia/kraken/blob/master/kraken-pig/src/main/java/org/wikimedia/analytics/kraken/pig/Zero.java#L61
[17:57:38] hehe
[17:57:38] yeah
[17:57:49] drdee maybe does
[17:57:51] hey, does someone know where I can find the source code of http://test-reportcard.wmflabs.org/graphs/active_editors_target (I don't mean Limn, but the datasources and graphs)?
[17:57:51] what is the pwd in that case?
[17:57:52] i've never looked at this
[17:58:27] jgonera: you should be able to access them at http://test-reportcard.wmflabs.org/graphs/active_editors_target.json
[17:59:12] and http://test-reportcard.wmflabs.org/datasources/rc_active_editors_target_aligned.json
[17:59:24] erosen, i think it's in kraken.jar
[17:59:41] jgonera: add ?pretty=True to have it pretty printed like this: http://test-reportcard.wmflabs.org/datasources/rc_active_editors_target_aligned.json?pretty=True
[17:59:47] ottomata: interesting
[18:01:04] erosen, thanks, I think I've found what I needed
[18:01:09] cool
[18:11:42] erosen, do you know if I can make something like this: instead of bars from consecutive columns in the data table being drawn on top of one another in z-axis, draw them on top of one another in y-axis?
[18:11:48] this probably sounds confusing
[18:12:20] hehe
[18:12:22] a little
[18:12:32] i think i know what you mean though
[18:12:36] and I think the answer is yes
[18:12:50] does the idea of a "stacked" chart sort of fit what you are thinking?
[18:12:57] hm, let's say we have a table with three columns: date, apples, oranges, with the following rows: (1/1/2013, 5, 10), (2/1/2013, 7, 4)
[18:13:01] I guess
[18:13:05] so that the plotted value of a particular line is the cumulative value of all the lines beneath it?
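Returning to the Zero UDF thread above: the log establishes the class path, the ZERO(x_cs) call shape (quoted a bit further down), and that the mcc_mnc.json mapping ships inside the jar. A rough sketch of how that lookup is wired up from a Pig script — the jar name, the no-argument DEFINE, and the input schema are all assumptions:

    -- jar name and DEFINE signature are guesses; the class path and the
    -- ZERO(x_cs) call are taken from the log
    REGISTER kraken-pig.jar;
    DEFINE ZERO org.wikimedia.analytics.kraken.pig.Zero();

    log_fields = LOAD '$input' USING PigStorage('\t')
        AS (x_cs:chararray);

    -- Zero resolves an X-CS code to a (carrier, carrier_iso) tuple via
    -- the mcc_mnc.json mapping it loads through JsonToClassConverter
    carriers = FOREACH log_fields
        GENERATE FLATTEN(ZERO(x_cs)) AS (carrier:chararray, carrier_iso:chararray);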
[18:13:29] yeah, I'd just prefer bars to lines though ;)
[18:15:04] yeah that is possible
[18:15:09] so in the apples and oranges example, the oranges bar would be stacked on top of the apples bar
[18:15:09] i believe
[18:15:27] finding an example
[18:15:34] lunchtime! back in a bit
[18:15:34] thanks
[18:15:51] jgonera: like this: http://gp.wmflabs.org/graphs/grants_count_by_global_south?
[18:16:51] erosen, not really; there, if "Number of Grants by Global South, north" is set to 13 it will completely cover "Number of Grants by Global South, South"
[18:17:04] really?
[18:17:08] interesting...
[18:17:18] in other words, this is based on the fact that we can predict which column has the higher value
[18:17:33] i see
[18:17:36] very true
[18:17:45] well the order actually doesn't matter in the datasource
[18:17:52] but i believe it is possible to make it stacked
[18:18:35] it matters in the graph JSON file, I checked it, Limn doesn't automatically determine which column has the lowest value
[18:18:41] and in my case, this can change over time anyway
[18:18:43] this might be more useful http://reportcard.wmflabs.org/graphs/pageviews_mobile_target_stacked
[18:19:58] hah, this might be it! ;) "stack": {
[18:19:58] "enabled": true
[18:19:58] }
[18:20:08] thanks, I'll try to play with it
[18:20:15] yeah i think that's the trick
[18:22:40] yep, that works, for some reason though the y-axis range is not auto-adjusted now, but I'll try to figure this out
[18:22:53] yeah, i think that is a known bug
[18:23:15] do you know if I can somehow show the cumulative value in the legend when I hover the bars?
[18:23:17] oh
[18:23:24] not sure
[18:24:02] really milimetric is the one to ask
[18:24:21] he isn't around today, but he should be around later this week
[18:24:46] ok, I'll ask him when I see him, the stacked bars already solve a lot for me, thanks ;)
[18:24:51] great
[18:57:53] baack
[19:07:50] erosen, do you need/want historical for this new job?
[19:07:53] zero_country?
[19:08:11] the existing zero/country/ data is inaccurate, right?
[19:09:35] ottomata: yes, existing data is inaccurate
[19:09:44] i will eventually need new data
[19:09:47] k
[19:09:49] but let's just do a sample
[19:09:52] like the last month
[19:09:54] is that easy?
[19:10:06] yeah, think so
[19:10:30] you mean april or may?
[19:15:02] how about april
[19:15:11] ottomata: sorry, didn't see this ^^ till now
[19:15:44] ok
[19:15:45] s'ok
[19:15:53] this is going to take a bit of playing and tweaking
[19:15:58] oozie is slow going for dev :)
[19:16:07] i'm not sure what you want done to Zero.java though
[19:16:48] i'm building a new mcc-mnc.json file
[19:16:55] and then we can use that one
[19:17:06] we can work on the two issues separately
[19:31:04] ottomata: is it simple to disable the translation of x_cs to carrier name?
[19:32:03] ?
[19:32:06] i know nothing!
[19:32:08] hehe
[19:32:17] this line: FLATTEN(ZERO(x_cs)) AS (carrier:chararray, carrier_iso:chararray)
[19:32:26] you want to remove that?
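Stepping back to the stacking thread that concluded above: the option jgonera found lives in the graph's JSON definition (the /graphs/<name>.json files mentioned earlier). Only the stack block is taken verbatim from the log; the surrounding keys and nesting are a guess at where it sits in a Limn graph file:

    {
        "name": "apples_and_oranges",
        "options": {
            "stack": {
                "enabled": true
            }
        }
    }

With stacking enabled, in the apples/oranges example the 1/1/2013 oranges bar would be drawn from y=5 to y=15 on top of the apples bar instead of in front of it — which is also presumably why the y-axis range looked wrong above: it apparently isn't recomputed against the stacked totals.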
[19:32:27] yeah totally
[19:32:29] ideally we could just extract the x_cs line
[19:32:31] do you want to just group by x-cs
[19:32:33] yeah no probs
[19:32:43] i meant ideally we could extract the x_cs **code**
[19:32:55] which I think the zero udf does
[19:33:03] i'll find a better line to send you
[19:35:59] hm ok
[19:36:00] oh
[19:36:12] k
[19:36:15] okay, nvm it is too much work i think
[19:37:23] we could replace this line: https://github.com/wikimedia/kraken/blob/master/kraken-pig/src/main/java/org/wikimedia/analytics/kraken/pig/Zero.java#L91 with this: getMCC_MNC()
[19:37:28] but i don't really like that solution
[19:37:48] i just can't decide whether the number -> name lookup should happen in kraken or python
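If the lookup does move out of kraken, the change discussed in this last exchange is small: drop the ZERO call and group on the raw code, deferring number -> name translation to the downstream python. A sketch, with relation and field names assumed to match the earlier carrier_count line:

    -- before (quoted above): resolve carrier names inside the Pig job
    --   FLATTEN(ZERO(x_cs)) AS (carrier:chararray, carrier_iso:chararray)

    -- after: carry the raw X-CS code through and group on it directly;
    -- the code -> carrier-name lookup then happens downstream in python
    carrier_count = FOREACH (GROUP log_fields BY
            (date_bucket, language, project, site_version, country, x_cs))
        GENERATE FLATTEN(group), COUNT(log_fields) AS num_requests;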