[01:31:22] (PS4) Terrrydactyl: [WIP] Add ability to tag a cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/133091
[07:34:40] (CR) Gilles: "This can be merged now, FYI. I've made sure that the new tables for the updated schema versions exist already." [analytics/multimedia] - https://gerrit.wikimedia.org/r/134065 (owner: Gilles)
[08:57:24] (CR) Nuria: Refactor cohort methods into service (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134480 (owner: Milimetric)
[09:21:29] (CR) Nuria: "Please add pertinent tests and handling of empty tags. Validation of ui input needs to happen at controller level (at least)." (4 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/133091 (owner: Terrrydactyl)
[10:55:50] (CR) Nuria: [WIP] Add cohort class hierarchy, refactor CohortService (4 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743 (owner: Milimetric)
[13:31:32] qchris: https://gerrit.wikimedia.org/r/#/c/133425/
[13:31:33] better?
[13:31:43] Just reviewing your commits.
[13:32:12] We're having the same nits as with kraken-deploy back then.
[13:32:24] But I'll merge and provide follow-up commits.
[13:36:23] (PS1) QChris: Limit git-fat to artifacts subdirectory [analytics/refinery] - https://gerrit.wikimedia.org/r/134818
[13:36:29] (PS1) QChris: Whitespace cleanup for gitfat configuration [analytics/refinery] - https://gerrit.wikimedia.org/r/134819
[13:37:27] :)
[13:37:34] ah i forgot about that artifacts/ limitation
[13:38:07] (CR) Ottomata: [C: 2] Limit git-fat to artifacts subdirectory [analytics/refinery] - https://gerrit.wikimedia.org/r/134818 (owner: QChris)
[13:38:12] (CR) Ottomata: [C: 2] Whitespace cleanup for gitfat configuration [analytics/refinery] - https://gerrit.wikimedia.org/r/134819 (owner: QChris)
[13:38:29] (CR) QChris: [C: 2 V: 2] "Way better :-)" [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[13:40:43] (CR) Ottomata: [V: 2] Limit git-fat to artifacts subdirectory [analytics/refinery] - https://gerrit.wikimedia.org/r/134818 (owner: QChris)
[13:40:51] (CR) Ottomata: [V: 2] Whitespace cleanup for gitfat configuration [analytics/refinery] - https://gerrit.wikimedia.org/r/134819 (owner: QChris)
[13:41:20] ah! i didn't save my drafts for the review of the other commit
[13:41:35] (CR) Ottomata: Adding bin/sequence-file wrapper for refinery-tools (7 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/133519 (owner: Ottomata)
[13:41:55] Me presses F5 :-D
[13:42:40] (PS4) Ottomata: Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[13:43:23] (CR) QChris: [C: -1] Add bin/sequence-file wrapper for refinery-tools (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/133519 (owner: Ottomata)
[13:44:04] DUH!
[13:44:14] https://gerrit.wikimedia.org/r/#/c/133520/
[13:44:15] I see ... it's dependent on the refinery/source change.
[13:44:20] Yes.
[13:45:04] I looked over that yesterday, and it's fine. Let me just check that the angle brackets for parameters also work nicely on the java libraries for parameter passing.
[13:45:28] hmm, ok!
[13:46:11] well, actually, I doubt it would make a difference
[13:46:25] docopt is the only one where it would matter
[13:46:31] as it actually parses the documentation
[13:46:47] But e.g.: args4j does the opposite.
[13:46:54] It reads code and prints documentation.
[13:46:58] OH
[13:47:01] ok let's check it then
[13:47:52] results in
[13:47:52] "-wrong" is not a valid option
[13:47:52] -file FILE : Sets a file if that is present
[13:47:52] -name VAL : Sets a name
[13:47:54] http://args4j.kohsuke.org/sample.html
[13:47:58] hmm
[13:48:02] those are for option docs
[13:48:10] but those are arguments to the options
[13:48:11] so
[13:48:13] capital it is?
[13:48:27] I guess/hope we can adjust that in args4j.
[13:48:40] You want this merged before standup?
[13:49:35] not necessarily, but why not eh?
[13:49:39] We'll have to supply metavar with angle brackets. But that should be ok for now.
[13:49:41] i'm changing to capitals...
[13:49:44] ?
[13:49:45] Nono.
[13:49:47] no?
[13:49:49] Leave the angle ones.
[13:49:52] ok...
[13:49:53] That's fine.
[13:51:55] (CR) QChris: [C: 2 V: 2] "LGTM." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/133520 (owner: Ottomata)
[13:55:09] yay :)
[13:55:41] I will get you an artifact shortly, and we can merge the other one then
[13:55:57] Coolio.
[13:59:03] qchris, what is 'shim'?
[13:59:11] "Maybe something like “percentage_helper”, “percentage_shim”, “shim”, ..."
[13:59:43] something you force between two things in order to make it work.
[13:59:54] haha
[13:59:55] hahaha
[14:00:01] yes, that is the definition i know too!
[14:00:41] https://gerrit.wikimedia.org/r/#/c/134377/
[14:13:40] (CR) Milimetric: [WIP] Add cohort class hierarchy, refactor CohortService (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743 (owner: Milimetric)
[14:21:18] qchris, milimetric I totally can use the aggregates within the same query
[14:21:22] dunno why I thought I couldn't...
[14:21:27] so, ja, I think we can just use one query
[14:21:27] :-D
[14:21:36] EVEN with the percentage :p
[14:21:39] Yay for simplicity
[14:22:04] Is percentage what we are interested in?
[14:22:13] (rounding errors, cancellation, ...)
[14:22:22] i think both are good
[14:22:24] difference and percentage
[14:22:30] because the volume of requests is different for different hosts
[14:22:48] so, if i'm going to graph this, it's nice to have it uhh...normalized? :p
[14:22:55] even though totally denormalizes the table :p
[14:23:17] evil denormalized tables :-D but ok.
[14:28:55] (CR) Ottomata: Add hive/webrequest/presence.hql to help monitor webrequest loss and duplication (12 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/134377 (owner: Ottomata)
[14:29:12] (PS2) Ottomata: Add hive/webrequest/presence.hql to help monitor webrequest loss and duplication [analytics/refinery] - https://gerrit.wikimedia.org/r/134377
[14:54:20] qchris, hm, maybe we should clean up poms, and import some more stuff into refinery/source before we make a release
[14:54:25] meaning, before I add the artifact
[14:54:33] for refinery-tools
[14:55:00] The artifacts built with the current poms should be fine.
[14:55:14] It's just that we can trim some things.
[14:55:29] But if you prefer, we can do that beforehand.
[14:56:19] About importing more stuff ... sure, we can do. But the current version would already allow us to
[14:56:29] interact with Snappy SequenceFiles.
[14:56:37] That would be helpful already.
[15:01:05] yeah, i guess i'm just trying to avoid making more releases than necessary
[15:02:39] Agreed. But we'd need an artifact for https://gerrit.wikimedia.org/r/#/c/133519/
[15:02:43] wouldn't me?
[15:02:48] s/me/we/
[15:03:14] yes
[15:03:23] i'm saying we'd just wait to merge that too
[15:03:32] Ok.
[15:03:39] Fine by me :-)
[15:03:50] (Just thought you want all those changes merged)
[15:04:25] * qchris_meeting marks today on the calendar "ottomata hitting the brakes" :-D
[15:04:29] haha
[15:04:31] * qchris_meeting has never seen that before :-D
[15:04:37] well, i mean, let's get the source repo all nice
[15:04:38] we need to:
[15:05:07] - fix up poms
[15:05:08] - bring in nuria's ua parser udfs
[15:05:08] - geocoding udfs too (?)
[15:05:34] Not sure where to draw the line. What about X-Forwarded-For handling?
[15:07:50] Psshhh
[15:08:14] Geocoding without X-Forwarded-For handling is pointless :-)
[15:08:26] Because you would not know which IP to geocode.
[15:09:36] PSsh, this is for prototype only!
[15:10:03] * qchris_meeting accepts "Psssh" :-)
[15:10:03] haha
[16:00:14] (PS3) Ottomata: Add hive/webrequest/presence.hql to help monitor webrequest loss and duplication [analytics/refinery] - https://gerrit.wikimedia.org/r/134377
[16:02:57] (CR) Nuria: [WIP] Add cohort class hierarchy, refactor CohortService (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743 (owner: Milimetric)
[16:06:17] running out to grab lunch, back in a bit
[18:43:56] (CR) MarkTraceur: [C: 2] Take sampling factor into account [analytics/multimedia] - https://gerrit.wikimedia.org/r/134065 (owner: Gilles)
[19:15:11] (PS3) Milimetric: [WIP] Add cohort class hierarchy, refactor CohortService [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743
[19:58:06] heya DarTar, can you join #wikimedia-operations for a sec
[19:58:14] i'm asking mutante about the research group thing
[19:58:18] Ironholds: yt?
[19:58:58] ottomata, I am!
[19:59:16] i'm running that query right now too!
[19:59:19] just to find the error
[19:59:29] so, i find it very very strange that it takes a while to actually submit that query
[19:59:35] i experimented with different versions of it
[19:59:46] and noticed the more partitions i worked with, the longer it would take to submit
[20:00:03] which i find very strange...as if the hive client has to process a bunch of metadata before submitting the job?
[20:00:05] dunno...
[20:00:11] yeah, I have no idea.
[20:00:14] The query is actually now running
[20:00:19] I'll let you know if it explodes
[20:00:35] wait mine is too!
[20:00:35] but it's 53/4 percent of the way through the subquery without balking.
[20:00:38] That makes me optimistic.
[20:00:46] oh of the mapper?
[20:00:51] the only question is whether it'll hit 100/100 and/or die on the second MR task
[20:00:56] 53 map, 4 reduce
[20:01:09] oh, hm, mine has been running for like an hour
[20:01:11] and it is only at 2%
[20:01:17] i will kill mine then
[20:01:20] probably because it's running next to mine
[20:01:21] good luck!
[20:01:21] if you are going to get the error output from yours
[20:01:26] gaha
[20:01:28] ha
[20:01:42] yeah, it's pretty much impossible, I've found, to elegantly kill queries from the terminal or run multiple MR tasks at once.
[20:01:59] I tried, a few weeks ago, running 3 tasks simultaneously. Bad idea.
[20:01:59] They were actually running a deficit on the reduce portion.
[20:02:04] That is: it took more than a second to add a second of CPU time.
[20:02:16] ha, hm
[20:02:24] you can kill queries!
[20:02:32] yarn application -kill application_1387838787660_12241
[20:02:37] ahhh
[20:02:47] ctrl+cing is not something hive likes
[20:02:50] just killed mine
[20:02:51] yeah, it doesn't
[20:02:57] because at that point hive isn't controlling the job anymore
[20:03:01] it's a distributed hadoop yarn application
[20:03:21] the only reason you get output at all is because your hive process is still connected to the job somehow
[20:03:24] maybe it just polls, dunno
[20:03:36] ahhhhh
[20:03:43] okay! We should absolutely note that somewhere.
[20:03:46] so if you ctrl-c, you just kill the interface
[20:03:52] I need the breathing space to write fuller hive documentation
[20:04:05] i'll add that to the hive page
[20:08:05] Ironholds: https://wikitech.wikimedia.org/wiki/Analytics/Kraken/Hive#Killing_a_running_query
[20:08:11] danke!
[20:37:49] (PS4) Milimetric: [WIP] Add cohort class hierarchy, refactor CohortService [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743
[21:41:08] (PS5) Milimetric: [WIP] Add cohort class hierarchy, refactor CohortService [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743
[21:42:14] (PS1) Milimetric: Fix bad file_manager creation [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134951
[21:44:05] (PS6) Milimetric: [WIP] Add cohort class hierarchy, refactor CohortService [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743
[22:20:16] (PS7) Milimetric: [WIP] Add cohort class hierarchy, refactor CohortService [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134743
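Note on the 20:02 exchange about killing a running query: since Ctrl-C only stops the hive client and the job keeps running as a YARN application, the kill has to go through `yarn application -kill <application id>`. A minimal Python sketch of that step follows, assuming the application ID can be found somewhere in text you already have (client output, a pasted log line). The helper names `find_application_ids` and `kill_command` and the sample text are hypothetical; only the `yarn application -kill` command and the example ID come from the log. The sketch builds and prints the command and does not run it.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>,
# e.g. application_1387838787660_12241 from the log above.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def find_application_ids(text):
    """Return the distinct application IDs found in text, in order of first appearance."""
    seen = []
    for app_id in APP_ID_RE.findall(text):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def kill_command(app_id):
    """Build the argument list for `yarn application -kill <app_id>`."""
    if not APP_ID_RE.fullmatch(app_id):
        raise ValueError("not a YARN application id: %r" % (app_id,))
    return ["yarn", "application", "-kill", app_id]


if __name__ == "__main__":
    # Hypothetical sample text; real client output will look different.
    sample = "query submitted as application_1387838787660_12241"
    for app_id in find_application_ids(sample):
        print(" ".join(kill_command(app_id)))
```

To actually kill the job, the returned list could be passed to `subprocess.run(cmd, check=True)` on a host where the `yarn` client is installed and configured.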