[05:55:18] (CR) Gilles: [C: 2] Code hygiene: pass flake8 python linter [analytics/multimedia] - https://gerrit.wikimedia.org/r/134128 (owner: Hashar)
[05:55:50] (CR) Gilles: [V: 2] Code hygiene: pass flake8 python linter [analytics/multimedia] - https://gerrit.wikimedia.org/r/134128 (owner: Hashar)
[05:56:23] (CR) Gilles: [C: 2 V: 2] Tox environment to run flake8 python linter [analytics/multimedia] - https://gerrit.wikimedia.org/r/134144 (owner: Hashar)
[08:01:07] (CR) Hashar: "recheck" [analytics/multimedia] - https://gerrit.wikimedia.org/r/134065 (owner: Gilles)
[09:02:42] (PS1) Hashar: Tox configuration to run flake8 (python linter) [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134314
[09:05:06] (CR) Hashar: "recheck" [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134314 (owner: Hashar)
[09:07:31] (CR) Hashar: "The job fails because the repository has a bunch of linting errors :-]" [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134314 (owner: Hashar)
[09:30:36] (PS1) Hashar: Lint: allow commenting of code [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134320
[09:30:38] (PS1) Hashar: Lint: pass pep8 checks [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134321
[09:30:40] (PS1) Hashar: Lint: remove unused imports [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134322
[09:31:28] (CR) Hashar: "The flake8 Jenkins job is now passing \O/" [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134322 (owner: Hashar)
[11:42:13] (CR) Ottomata: Adding refinery-tools and pom.xml content (6 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/133520 (owner: Ottomata)
[11:43:38] (CR) Ottomata: "I did not try to run any mvn commands for this commit. It works fine in the next one though, as the parent pom is no longer empty, and it" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/133257 (owner: Ottomata)
[11:43:53] (CR) Ottomata: "OOF, thanks for catching that..." [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[11:44:13] (PS2) Ottomata: Adding refinery-tools and pom.xml content [analytics/refinery/source] - https://gerrit.wikimedia.org/r/133520
[11:45:03] (CR) Ottomata: "For safety's sake, I'm going to abandon this and create a new commit." [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[11:46:30] (CR) Ottomata: Initial repository layout (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[11:46:47] (Abandoned) Ottomata: Initial repository layout [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[11:49:14] (Restored) Ottomata: Initial repository layout [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[11:54:05] (PS3) Ottomata: Initial repository layout [analytics/refinery] - https://gerrit.wikimedia.org/r/133425
[11:56:43] (CR) Ottomata: "Ok, qchris, how's that?" [analytics/refinery] - https://gerrit.wikimedia.org/r/133425 (owner: Ottomata)
[11:59:19] (PS2) Ottomata: Adding bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[12:02:32] (PS1) Hashar: Tox configuration to run flake8 (python linter) [analytics/geowiki] - https://gerrit.wikimedia.org/r/134334
[12:09:21] (PS3) Ottomata: Adding bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[12:10:11] (CR) Hashar: "recheck" [analytics/geowiki] - https://gerrit.wikimedia.org/r/134334 (owner: Hashar)
[12:13:35] (CR) Ottomata: [C: 2 V: 2] "Merging this. (some) poms are added in the next commit." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/133257 (owner: Ottomata)
[12:15:46] (PS1) Hashar: Lint setup.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134339
[12:15:48] (PS1) Hashar: Replace git URL to git.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/134340
[12:19:01] (PS2) Hashar: Tox configuration to run flake8 (python linter) [analytics/geowiki] - https://gerrit.wikimedia.org/r/134334
[12:19:25] (CR) Hashar: "Ignore lines being too long and comment starting with '# '" [analytics/geowiki] - https://gerrit.wikimedia.org/r/134334 (owner: Hashar)
[12:26:42] (PS1) Hashar: Lint geowiki/mysql_config.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134342
[12:32:49] (PS1) Hashar: Lint geowiki/geo_coding.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134345
[12:39:52] (PS1) Hashar: Lint geowiki/process_data.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134347
[12:41:49] (PS2) Hashar: Lint setup.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134339
[12:41:57] (PS2) Hashar: Replace git URL to git.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/134340
[12:43:36] I have added flake8 jobs to some of your repositories (geowiki, wp-zero and multimedia) :D
[13:34:08] thanks hashar, saw that
[13:34:34] I have submitted a bunch of follow-up patches to pass flake8
[13:35:21] s/submitted/proposed/
[14:13:46] (PS1) Milimetric: Clean up imports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134356
[14:13:48] (PS1) Milimetric: Fix type-o [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134357
[15:05:21] Ironholds: helloooo
[15:05:27] what's up?
[16:12:39] (PS1) Ottomata: Add hive/webrequest/presence.hql to help monitor webrequest loss and duplication [analytics/refinery] - https://gerrit.wikimedia.org/r/134377
[16:50:39] ottomata, yo :)
[16:50:44] sorry, was/sort of am in standup.
[16:50:48] What's occurring?
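Hashar's actual tox/flake8 patches aren't reproduced in the log. A hypothetical sketch of what such a configuration typically looks like, based on the review comment about ignoring long lines and comment style (section contents and ignore codes are assumptions, not the actual change):

```ini
; Hypothetical tox.ini sketch: a "flake8" environment that a Jenkins job
; can invoke with `tox -e flake8`. The ignore list mirrors the comment
; "Ignore lines being too long and comment starting with '# '", which
; would correspond to pycodestyle codes E501 and E265.
[tox]
envlist = flake8
skipsdist = True

[testenv:flake8]
deps = flake8
commands = flake8

[flake8]
ignore = E501, E265
```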
[16:51:43] oh was just pinging about your hive probs
[16:51:48] i'm running your other query now
[16:51:55] you are trying to return 5 million results though...
[16:55:51] wait, Ironholds, why the subquery there?
[16:56:31] ottomata, because I want 5m randomly selected rows
[16:56:46] and, yes. Should I not be? ;)
[16:57:02] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
[16:57:50] neat!
[16:58:10] and WHERE clauses still apply?
[16:58:17] i've never done it, but i guess so!
[16:58:20] the only bugbear is it doesn't allow a raw count
[16:59:44] wait, no, it does! cool!
[17:01:05] okay, let's see if that helps.
[17:01:16] it shouldn't, because a query very much like this has worked before, but.
[17:01:31] ok, yeah i'm not saying there isn't a problem
[17:01:39] you've gotten hive to return you 5M rows on the CLI before?
[17:04:23] yerp
[17:04:53] the mobile sessions work
[17:05:10] augh. ottomata, we're on Cloudera 4, aren't we.
[17:05:33] yup
[17:05:36] why?
[17:06:04] support for TABLESAMPLE(N ROWS) is CDH5 up.
[17:06:33] okay, retrying with a smaller number to see the class of problem we're dealing with. Let's see if that helps.
[17:07:35] ah but Ironholds we have hive 0.10.0
[17:07:45] ah
[17:07:47] the JIRA says Fix Version/s:
[17:07:47] https://issues.apache.org/jira/browse/HIVE/fixforversion/12323587
[17:07:53] but the apache wiki says 0.10.0
[17:07:54] ah well
[17:08:24] okay, it's giving me the same heap space error on 1m rows.
[17:08:31] And that number I /know/ I've used before ;p
[17:08:44] in fact, more than 1m, because it was all observations associated with 500k IPs.
[17:13:29] HMMmmmMMm
[17:20:42] huh!
[17:20:46] you know what I think it might be?
[17:20:57] the rlikes are causing it to go "bah, I don't have enough capacity for this! die! die!"
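The actual query isn't shown in the log; a hypothetical sketch of the sampling approaches being discussed, based on the linked Hive wiki page (table name `webrequest` and the row counts come from the conversation, everything else is illustrative):

```sql
-- rand()-based approach, roughly the "subquery" style Ironholds describes:
-- shuffle the rows pseudo-randomly, then cut off at 5M.
SELECT * FROM (
  SELECT * FROM webrequest
  DISTRIBUTE BY rand() SORT BY rand()
) shuffled
LIMIT 5000000;

-- TABLESAMPLE(N ROWS), as documented on the wiki page; per the discussion
-- its availability on Hive 0.10.0 / CDH4 is unclear (the JIRA and the
-- wiki disagree).
SELECT count(*) FROM webrequest TABLESAMPLE(1000000 ROWS) t;

-- Bucketized sampling with rand() is the older alternative and also
-- appears on that wiki page; here it keeps roughly 1/100 of the rows.
SELECT * FROM webrequest TABLESAMPLE(BUCKET 1 OUT OF 100 ON rand()) t;
```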
[17:21:12] the first entries in the traceback are:
[17:21:13] at java.util.Arrays.copyOf(Arrays.java:2367)
[17:21:13] at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
[17:21:13] at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
[17:21:13] at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
[17:21:14] at java.lang.StringBuffer.append(StringBuffer.java:237)
[17:21:15] at
[17:21:17] etc etc
[17:22:09] Oliver, I got it a bit farther
[17:22:09] by setting
[17:22:09] (CR) Milimetric: [V: 1] Add hive/webrequest/presence.hql to help monitor webrequest loss and duplication (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/134377 (owner: Ottomata)
[17:22:09] export HADOOP_HEAPSIZE=1024
[17:22:21] but, not sure, there are some other weird things too
[17:22:25] hrm
[17:22:50] if we can't MapReduce regexes we're kind of boned for...a lot of stuff.
[17:27:49] ironholds: we did a bunch of queries with regexes for the glam folks before
[17:28:07] hmhmnmnmn.
[17:28:19] so, then, nothing in this query has not been done before
[17:28:22] it's just exploding.
[17:28:27] because...god hates me?
[17:32:02] i think god hates you, and I think there is a little naughty directory somewhere it shouldn't be in the data
[17:32:09] Job Submission failed with exception 'java.io.FileNotFoundException(Path is not a file: /wmf/data/external/webrequest/webrequest_bits/hourly/2014/05/15/08/08
[17:32:19] we will see...trying again
[17:40:19] ironholds: you can check out the glam selects here: https://commons.m.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot
[17:43:46] (PS1) Terrrydactyl: Capitalized User for dummy users. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134395
[17:44:04] Ironholds:
[17:44:06] try it now
[17:44:13] first, without setting HADOOP_HEAPSIZE
[17:44:15] if that fails you
[17:44:16] then set it
[17:44:28] export HADOOP_HEAPSIZE=1024; hive ...
[17:44:47] you could do larger than that if you need
[17:44:48] i think it might run now...
[17:45:08] i'm going to run for a while, maybe someone will be in paris to play polo with me...doubtful since it rained today
[17:45:09] laaters!
[17:48:37] (PS1) Nuria: Can upload names with utf-8 characters via "Paste Usernames" textbox [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134397 (https://bugzilla.wikimedia.org/64893)
[17:49:11] good luck ottomata
[17:50:04] ottomata, totally
[17:53:37] (CR) Nuria: [C: 2] Clean up imports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134356 (owner: Milimetric)
[18:01:21] (CR) Milimetric: Can upload names with utf-8 characters via "Paste Usernames" textbox (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134397 (https://bugzilla.wikimedia.org/64893) (owner: Nuria)
[18:03:27] (CR) Nuria: [C: 2] Fix type-o [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134357 (owner: Milimetric)
[18:03:40] (CR) Milimetric: [V: 2] Capitalized User for dummy users. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134395 (owner: Terrrydactyl)
[18:03:50] (PS2) Milimetric: Capitalized User for dummy users. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134395 (owner: Terrrydactyl)
[18:03:58] (CR) Milimetric: [C: 2] Capitalized User for dummy users. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134395 (owner: Terrrydactyl)
[18:05:51] ottomata, failed without the heapsize change
[18:05:52] trying again
[18:15:34] ottomata, well, it's at least /running/
[18:15:40] the slowest damn thing I've ever seen, but it runs ;p
[20:05:48] (PS1) Milimetric: Refactor cohort methods into service [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/134480
[21:16:48] ottomata, can you unabandon https://gerrit.wikimedia.org/r/#/c/49678/ ? It looks like it is no longer blocked, and a few people are interested.
[21:44:42] (CR) Gergő Tisza: [C: 1] Take sampling factor into account [analytics/multimedia] - https://gerrit.wikimedia.org/r/134065 (owner: Gilles)
[22:42:21] (PS3) Terrrydactyl: [WIP] Add ability to tag a cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/133091
[22:42:39] (CR) jenkins-bot: [V: -1] [WIP] Add ability to tag a cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/133091 (owner: Terrrydactyl)
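The HADOOP_HEAPSIZE workaround from the conversation, spelled out as a shell fragment. The traceback shown earlier (Arrays.copyOf / StringBuffer.append) suggests the out-of-memory happens client-side while building a large string, which is why raising the client JVM heap helps; `query.hql` is a hypothetical script name, and 1024 is simply the value tried in the log:

```shell
# HADOOP_HEAPSIZE is read by the hadoop/hive launcher scripts and sets
# the client JVM's maximum heap in MB. Try the query without it first,
# then with it, as suggested in the log; larger values are fine if needed.
export HADOOP_HEAPSIZE=1024
hive -f query.hql
```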