[07:55:13] (PS8) Nuria: Add autocomplete to tags on tag insertion [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl)
[08:06:38] (PS9) Nuria: Add autocomplete to tags on tag insertion [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl)
[08:27:07] (CR) Nuria: Refactor Aggregation constants (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146520 (owner: Milimetric)
[08:34:24] (CR) Nuria: "Is this change supposed to be backwards compatible with our prior format? I do not think it needs to be (and from my tests it is not) but " (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146521 (https://bugzilla.wikimedia.org/67822) (owner: Milimetric)
[08:42:10] (CR) Nuria: "This is so much better. It will be about 1 order of magnitude of file size savings. The dashboard will be a lot happier. Now we only need " [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146521 (https://bugzilla.wikimedia.org/67822) (owner: Milimetric)
[09:47:01] Analytics / EventLogging: UniversalLanguageSelector-tofu logging too much data - https://bugzilla.wikimedia.org/67463#c1 (nuria) Language team was logging data to that table at a very high rate without being aware they were doing so. Since their intention was to run an experiment for couple weeks they...
[12:22:57] (CR) QChris: Avoid pickle max recursion while serializing chain (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146297 (https://bugzilla.wikimedia.org/67823) (owner: Milimetric)
[14:23:33] (CR) Ottomata: Document 'refinery-' prefix for bin directory (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/145976 (owner: QChris)
[14:27:57] (PS2) Ottomata: Document naming conventions [analytics/refinery] - https://gerrit.wikimedia.org/r/145421 (owner: QChris)
[14:28:04] (CR) Ottomata: [C: 2 V: 2] Document naming conventions [analytics/refinery] - https://gerrit.wikimedia.org/r/145421 (owner: QChris)
[14:34:32] (PS1) Ottomata: s/kraken/refinery in refinery-tools/README.txt [analytics/refinery/source] - https://gerrit.wikimedia.org/r/146778
[14:34:34] (PS1) Ottomata: Build refinery version 0.0.1 against 2.3.0-cdh5.0.2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/146779
[14:34:52] (CR) Ottomata: [C: 2 V: 2] s/kraken/refinery in refinery-tools/README.txt [analytics/refinery/source] - https://gerrit.wikimedia.org/r/146778 (owner: Ottomata)
[14:35:08] qchris: https://gerrit.wikimedia.org/r/#/c/146779/
[14:36:18] ottomata: Looks good. Let me try to build it.
[14:39:01] (CR) QChris: [C: 1 V: -1] "Building for me fails with" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/146779 (owner: Ottomata)
[14:48:28] (CR) Milimetric: Add autocomplete to tags on tag insertion (4 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl)
[14:48:55] nuria: ^ review posted with note for custom bindings
[14:49:03] k
[14:50:00] will grab it once i am done watching steve sanderson presentation
[14:53:01] (CR) Milimetric: Avoid pickle max recursion while serializing chain (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146297 (https://bugzilla.wikimedia.org/67823) (owner: Milimetric)
[14:53:19] hmm, qchris, i'm having trouble with a snappy dep building now
[14:53:25] i didn't before...
[14:55:15] :-/
[15:01:21] ah needed to add it to archiva
[15:01:25] there we go
[15:01:27] ok try now qchris
[15:03:53] Build worked. Let me check the cdh part.
[15:04:51] (CR) QChris: [C: 2 V: 2] Build refinery version 0.0.1 against 2.3.0-cdh5.0.2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/146779 (owner: Ottomata)
[15:05:56] danke!
[15:18:25] (PS1) Ottomata: Add refinery-tools-0.0.1.jar artifact and symlink [analytics/refinery] - https://gerrit.wikimedia.org/r/146787
[15:18:31] qchris ^ :)
[15:18:46] Hahaha. I waited for that one :-D
[15:18:52] haha, i knew you were :)
[15:18:55] also, once that is done
[15:18:56] https://gerrit.wikimedia.org/r/#/c/133519/
[15:18:56] :)
[15:20:01] Jajajaja ... you're just sending CR requests, so we cannot prepare for the ETL design discussion, and you have an easy game convincing us of your approach :-P
[15:20:18] oh haha
[15:20:21] are you prepping?
[15:20:22] i am not!
[15:20:42] No. I am not preparing ... I am reviewing code :-P
[15:20:43] i want hive to work, and i want to be able to test stuff, and i don't want to have to manually copy the tools.jar to do so :)
[15:20:47] just doing things right!
[15:20:47] haha
[15:20:49] good!
[15:20:54] these are easy peasy reviews though
[15:20:58] just close your eyes and hit submit!
[15:24:18] ll
[15:24:29] Sorry.
[15:26:38] eh?
[15:27:07] Sorry. Got pinged typing away 'll'.
[15:27:37] The 'll' was not meant for IRC.
[15:29:36] (CR) QChris: [C: 2 V: 2] Add refinery-tools-0.0.1.jar artifact and symlink [analytics/refinery] - https://gerrit.wikimedia.org/r/146787 (owner: Ottomata)
[15:32:24] (PS5) Ottomata: Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[15:35:34] (PS2) Milimetric: Refactor All Enums to a separate file [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146520
[15:38:59] (PS6) Ottomata: Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[15:42:07] (CR) Nuria: [C: 2] "This looks a lot better. Thank you. I think this can be merged." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146520 (owner: Milimetric)
[15:42:45] grr, having rebasing issues, sorry nuria, I'm trying to get the other patch in
[15:43:21] the 146520 will be merged by jenkins i think
[15:45:22] (CR) Milimetric: [V: 2] "jenkins looks a little stuck" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146520 (owner: Milimetric)
[15:46:47] (PS4) Milimetric: Make coalesced report format more efficient [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146521 (https://bugzilla.wikimedia.org/67822)
[15:47:44] ok, better
[17:16:11] (CR) BryanDavis: "recheck" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/146520 (owner: Milimetric)
[17:53:37] qchris: https://gerrit.wikimedia.org/r/#/c/133519/ :D
[17:54:18] ottomata: :-D
[17:54:38] also, ok, i'm not sure what's wrong with my sequence file error
[17:54:43] where are you testing, in labs? vagrant?
[17:54:51] In labs.
[17:54:52] how are you putting data in?
[17:54:57] which machines? mine?
[17:55:00] Through my cluster-scripts.
[17:55:09] No. The worker nodes.
[17:55:12] qchris-worker?
[17:55:21] qchris-worker2 typically.
[17:55:30] But I am just bootstrapping the cluster again ...
[17:55:36] So it's currently not up.
[17:55:47] oh
[17:55:55] ok, i noticed that earlier, was like, but hive and hdfs aren't even here!
[17:56:26] Yep. ... But getting distracted by meetings and gerrit stuff :-D
[17:58:59] aye
[17:59:11] i'm running them on my cluster :)
[18:00:04] So how can I help debug?
[18:00:18] Like trying to get data in from analytics1011?
[18:00:49] Or could you post the error message?
[18:02:23] https://gist.github.com/ottomata/5444e31ef204501cceba
[18:02:31] that's just from a select count(*) ...
[18:03:40] oh yay that error!
[18:03:49] you've seen that Ironholds?
[18:04:16] yeah, I got it running over files that did not exist.
[18:04:27] it's worrying how many hadoop error messages I can interpret now
[18:04:39] (old query with wrong month=foo parameter, pointing at partitions that were no longer containing files)
[18:04:45] I know that error.
[18:04:57] Let me find it in my notes ...
[18:07:25] HMMMM
[18:07:31] so that maybe means the location is wrong?!
[18:08:16] ottomata ... there is no table wmf on the cluster ... you're using otto?
[18:08:46] yes, but i'm also considering renaming the 'wmf' database to 'external'
[18:08:50] so there is an external db right now
[18:08:52] but i'm using otto
[18:09:10] Ok.
[18:10:10] webrequest_text/hourly ... shouldn't that be hourly/webrequest_text
[18:11:47] no
[18:11:54] /wmf/data/external/webrequest/webrequest_text/hourly/2014/07/16/18/webrequest_text.18.10.23116.3434569163.1405533600000
[18:11:59] ok.
[18:13:32] this is how i created a partition
[18:13:32] alter table webrequest add partition (webrequest_source='text',year=2014,month=07,day=15,hour=14) location '/wmf/data/external/webrequest/webrequest_text/hourly/2014/07/15/14';
[18:13:40] sorry, i'm looking at the external.webrequest table now
[18:13:45] since that's the one I really want to work
[18:13:56] Ok ...
switching over.
[18:16:34] 'select * from webrequest where year > 0 limit 1;' works for me on that table.
[18:18:09] ottomata: Could you please create me a home directory in hdfs?
[18:18:12] I am getting
[18:18:17] ja
[18:18:22] Permission denied: user=qchris, access=WRITE, inode="/user":hdfs:hadoop:drwxrwxr-x
[18:18:51] Thanks.
[18:18:53] i haven't remade everyone's user dirs yet because I think we want to put some thought into the stats user
[18:18:55] but ja, there you go
[18:19:05] Thanks.
[18:20:03] ottomata: Even 'select count(*) from webrequest where year > 0 limit 1;' worked for me.
[18:20:11] What query is giving you the error?
[18:20:44] you know, i got that to work sometimes too, maybe it always works that way on the webrequest table
[18:20:46] lemme see
[18:22:00] "sometimes" ... that sounds fun.
[18:22:28] ja ok, count works, it wasn't working in labs or somewhere else, moving back to a real query, checking...
[18:22:33] ja
[18:22:34] SELECT
[18:22:34] referer,
[18:22:34] COUNT(DISTINCT ip) AS hits
[18:22:34] FROM
[18:22:34] webrequest
[18:22:34] WHERE
[18:22:35] webrequest_source = 'mobile' AND year = 2014 AND month = 07 AND day = 14 AND hour = 14
[18:22:35] GROUP BY referer
[18:22:36] ORDER BY hits DESC
[18:22:36] LIMIT 50;
[18:22:43] try that
[18:22:49] * qchris tries
[18:23:06] DAY = 14
[18:23:10] THERE IS NO DAY = 14 partition
[18:23:11] !!!
[18:23:15] GRRRRR am i that blind?!
[18:23:37] Ha.
[18:23:55] AH
[18:23:55] no, phew
[18:23:57] it still errors
[18:23:59] with day = 15
[18:24:03] same error
[18:24:30] i'm going to explicitly add another partition and use it, just to be sure
[18:26:35] ottomata: Quoting does the trick. When quoting the partitions it starts working.
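[Editor's note: for reference, the failing and working query forms from the exchange above, side by side. Table, columns, and partition values are taken from the log; that unquoted integer partition values trigger the serialization error was observed on this particular labs/CDH setup, not a general Hive guarantee.]

```sql
-- Failing form on this setup: partition predicates as bare int literals.
SELECT referer, COUNT(DISTINCT ip) AS hits
FROM webrequest
WHERE webrequest_source = 'mobile'
  AND year = 2014 AND month = 07 AND day = 15 AND hour = 15
GROUP BY referer
ORDER BY hits DESC
LIMIT 50;

-- Working form: the same query with the partition values quoted as strings.
SELECT referer, COUNT(DISTINCT ip) AS hits
FROM webrequest
WHERE webrequest_source = 'mobile'
  AND year = '2014' AND month = '07' AND day = '15' AND hour = '15'
GROUP BY referer
ORDER BY hits DESC
LIMIT 50;
```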
[18:26:48] in the query or in the add partitions statement
[18:26:48] like:
[18:26:51] SELECT referer, COUNT(DISTINCT ip) AS hits FROM webrequest WHERE webrequest_source = 'mobile' AND year = '2014' AND month = '07' AND day = '15' AND hour = '15' GROUP BY referer ORDER BY hits DESC LIMIT 50;
[18:27:10] in the query.
[18:29:16] MMMMM
[18:29:30] it does work
[18:29:33] \o/
[18:29:34] welllllLLLLL ok then!
[18:29:40] quotes it is!
[18:29:50] :-D
[18:29:53] thank you qchris, not sure why that didn't work when I tried it before, maybe my labs or otto table was all weird anyway
[18:30:25] yw
[18:31:02] well, good!
[18:31:24] qchris, you ok with the 'external' database name? I think it's better, and will allow us to keep the raw external data separate from what we intend people to use
[18:41:06] (PS1) Ottomata: Import webrequest_upload into HDFS as well [analytics/refinery] - https://gerrit.wikimedia.org/r/146842
[18:41:22] (CR) Ottomata: [C: 2 V: 2] Import webrequest_upload into HDFS as well [analytics/refinery] - https://gerrit.wikimedia.org/r/146842 (owner: Ottomata)
[18:41:39] qchriiiis, gimme some pluses on https://gerrit.wikimedia.org/r/#/c/133519/ :D
[18:41:42] i want to deploy!
[18:41:48] (PS7) Ottomata: Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[18:44:28] sorry, ottomata.
[18:45:06] About the "external" database name ... not sure. What do we want to convey there?
[18:45:26] To me, it's more like "raw"
[18:45:40] because ... being "external" is not really descriptive ... is it?
[18:47:06] Maybe "wmf-raw"?
[18:47:13] ha, i had raw once, milimetric didn't lik eit
[18:47:14] liek it
[18:47:18] (oh cant' type)
[18:47:31] Does milimetric like "external"?
[18:47:39] that's why 'external' is in the file path
[18:48:03] er?
[18:48:06] (I thought 'external' in the path meant 'this data is used in external tables')
[18:48:16] I didn't like raw?
[18:48:27] milimetric: likes 'raw'?
[18:48:57] wait, what does raw refer to?
[18:49:03] untouched from camus?
[18:49:11] yes (I hope)
[18:49:20] then yeah, I think that's great
[18:49:43] I think I didn't like "raw" to be something else, like, after geocoding or sanitizing
[18:50:11] Cool.
[18:50:24] 'cause then it's like maybe medium-rare or rare but not raw :)
[18:50:24] Makes sense.
[18:50:30] :-D
[18:50:46] ummm, ok raw then is fine with me?
[18:50:50] database name:
[18:50:54] wmf-raw? or just raw?
[18:51:12] i will switch path to /wmf/data/raw...
[18:51:33] What will the database for processed data be called?
[18:51:33] i think just 'raw' is fine
[18:51:58] wmf
[18:51:58] ?
[18:52:13] It is not too obvious that 'raw' will get turned into 'wmf'.
[18:52:18] haha
[18:52:21] refined
[18:52:22] ?
[18:52:22] hah
[18:52:28] Hence, 'wmf-raw'.
[18:52:35] ok ok
[18:52:39] But yes ... raw -> refined would work.
[18:52:39] wmf-raw is fine with me
[18:53:07] i like 'wmf-raw' and 'wmf' better, s'ok?
[18:53:12] Sure.
[19:09:12] (PS1) Ottomata: Import webrequest data via camus into /wmf/data/raw directory [analytics/refinery] - https://gerrit.wikimedia.org/r/146846
[19:13:52] (CR) Ottomata: [C: 2 V: 2] Import webrequest data via camus into /wmf/data/raw directory [analytics/refinery] - https://gerrit.wikimedia.org/r/146846 (owner: Ottomata)
[19:14:36] doodee dooo, wouldn't qchris love to review this one?
[19:14:39] https://gerrit.wikimedia.org/r/#/c/133519/
[19:14:58] sorry qchris, just want to do a deploy and it'd be nice to have this go along with it so I don't have to do another one later
[19:15:02] :-D
[19:15:15] i'd deploy this, redo the database, paths, etc., start importing upload, and bam, good to go for this part
[19:15:50] if you think there is a lot more discussion on sequence-file tool then we can postpone and I can do another deploy later i guess
[19:16:06] I do not think that there'll be much more discussion.
[19:16:14] Let me just go through it ...
[19:16:19] Doing it right now.
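[Editor's note: the database layout agreed on above could look roughly like the following. This is a hypothetical sketch, not the deployed config: Hive database identifiers cannot contain a hyphen, so 'wmf-raw' would in practice need an underscore, and the COMMENT/LOCATION values are assumptions based on the discussion.]

```sql
-- Raw, untouched-from-Camus data (the 'wmf-raw' of the discussion,
-- spelled with an underscore because hyphens are not legal in Hive
-- identifiers):
CREATE DATABASE IF NOT EXISTS wmf_raw
  COMMENT 'Raw webrequest data as imported via Camus'
  LOCATION '/wmf/data/raw';

-- Processed ("refined") data intended for general use:
CREATE DATABASE IF NOT EXISTS wmf
  COMMENT 'Refined data derived from wmf_raw'
  LOCATION '/wmf/data/wmf';
```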
[19:16:53] '# --jar default should be in ../../artifacts/ directory'
[19:16:58] Looks like it should be
[19:17:02] # --jar default should be in ../artifacts/ directory
[19:17:04] oh in the comment
[19:17:04] ok
[19:17:38] (PS8) Ottomata: Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[19:18:19] PS7 and PS8 agree for me.
[19:18:20] anything else?
[19:18:28] that was a rebase
[19:18:34] Oh. Ok.
[19:18:51] So the '../../artifacts' is the only thing that I see.
[19:18:53] will submit PS9 if nothing else
[19:18:53] ok
[19:19:04] (PS9) Ottomata: Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519
[19:19:32] (CR) QChris: [C: 2 V: 2] Add bin/sequence-file wrapper for refinery-tools [analytics/refinery] - https://gerrit.wikimedia.org/r/133519 (owner: Ottomata)
[19:20:09] thank you!
[19:20:22] Anything else to merge before you can deploy?
[19:20:27] naw think that's it
[19:20:30] Cool.
[19:27:51] (PS1) Ottomata: Fix typo in camus/camus.webrequest.properties [analytics/refinery] - https://gerrit.wikimedia.org/r/146852
[19:28:05] (CR) Ottomata: [C: 2 V: 2] Fix typo in camus/camus.webrequest.properties [analytics/refinery] - https://gerrit.wikimedia.org/r/146852 (owner: Ottomata)
[19:33:49] qchris, i know I am Mr. interrupter today, but i'm just curious!
[19:33:55] how's the add partition reworking going?
[19:34:17] Hahaha.
[19:34:27] It's not gonna happen today.
[19:34:46] Sorry.
[19:34:54] (Although it is simple)
[19:34:55] oh i know
[19:34:59] i'm just curious about it
[19:35:05] did bundles end up making sense?
[19:35:31] Yes. But we might need to use the same coordinator 4 times.
[19:35:40] But that might change once everything is in place.
[19:36:00] like, instead of a bundle?
[19:36:16] No. Like the bundle injecting
[19:36:19] hm aye
[19:36:23] k
[19:36:26] the same coordinator with 4 different properties.
[19:36:55] Let me rephrase that ...
[19:37:22] "Like the bundle injecting the same coordinator 4 times. Each time with a slightly modified properties file"
[19:37:53] aye
[19:38:14] we could also just submit 4 different coordinators with different properties files manually too, but the bundle makes it nicer to admin via the cli, right?
[19:38:18] groups the 4 jobs together?
[19:38:39] hm, since the partitions need to be strings, i wonder if I should create the table partitioned by strings
[19:38:42] PARTITIONED BY (
[19:38:42] `webrequest_source` string,
[19:38:42] `year` int,
[19:38:42] `month` int,
[19:38:42] `day` int,
[19:38:42] `hour` int)
[19:38:42] hm
[19:38:51] i guess it works like that...
[19:39:22] Right, the bundle bundles them together.
[19:39:45] I am not sure about the strings vs int partitions.
[19:39:53] ints feel right here ...
[19:40:10] the fact that we somewhat have to treat them as strings is bad though.
[19:40:35] But I did not hunt the issue down.
[19:40:49] aye, don't worry, ok, i'll mess with it
[19:40:50] Not sure ... maybe it is on purpose, maybe a hive issue, maybe a cdh issue.
[19:41:07] I'd just keep ints for now, and quote them when querying.
[19:55:23] ok, if I define the partition columns as strings, and add the partitions as strings
[19:55:30] then querying via strings and via ints works
[19:55:36] so I think that is the better way to go
[20:01:13] (PS1) Ottomata: Add webrequest table schema [analytics/refinery] - https://gerrit.wikimedia.org/r/146878
[21:12:27] leila: You should register leilatretikov.
[21:12:45] hahaha!
[21:12:53] seriously! ;-)
[21:12:53] :P
[21:13:00] leila: you should make him put money in the jar
[21:13:01] and she should register LilaZia
[21:13:14] why? did he do the bad thing?
[21:13:16] I'm not mistaking anything! I'm fully cognizant of the difference.
[21:13:18] ah
[21:13:20] right.
[21:13:31] oh, I'm in -analytics.
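[Editor's note: a minimal sketch of the string-partitioned schema settled on above, which makes both quoted and unquoted partition predicates work. The partition columns, table name, and /wmf/data/raw path come from the discussion; the non-partition columns, the storage format, and the exact directory layout are placeholders, not the actual refinery schema from change 146878.]

```sql
-- Hypothetical sketch: partition columns declared as strings.
CREATE EXTERNAL TABLE webrequest (
  ip      string,   -- placeholder column
  referer string    -- placeholder column
)
PARTITIONED BY (
  `webrequest_source` string,
  `year`  string,
  `month` string,
  `day`   string,
  `hour`  string
)
STORED AS SEQUENCEFILE      -- assumption: Camus writes sequence files
LOCATION '/wmf/data/raw/webrequest';

-- Partitions then get added as strings as well:
ALTER TABLE webrequest ADD PARTITION
  (webrequest_source='text', year='2014', month='07', day='15', hour='14')
  LOCATION '/wmf/data/raw/webrequest/webrequest_text/hourly/2014/07/15/14';
```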
[21:13:33] nevermind then :)
[21:13:47] What
[23:14:04] Analytics / Refinery: Story: AnalyticsEng has daily PageView counts - https://bugzilla.wikimedia.org/67804#c1 (Kevin Leduc) Following discussions and note taking here: http://etherpad.wikimedia.org/p/analytics-design Requirements: * Use the cluster * Sample Report (tsv) | Hour | Project | Total Pagev...
[23:14:34] Analytics / Refinery: Story: AnalyticsEng has hourly PageView counts - https://bugzilla.wikimedia.org/67804 (Kevin Leduc)