[13:03:35] mooooorning
[13:06:35] Gooooood morning drdee
[13:27:17] mooorning
[13:27:22] drdee, i know I've asked this question before
[13:27:30] but when you and average were working on udp-filter
[13:27:40] what was the longest buffer size you needed to handle the urls and user agents?
[13:27:50] i believe 64k
[13:29:53] Snaps: ^ :p
[13:30:00] that's for the whole log line though
[13:30:33] is your scratch buffer for a single field?
[13:37:36] yes that's the whole line
[13:37:52] check filter.c :)
[13:39:23] https://git.wikimedia.org/blob/analytics%2Fwebstatscollector/62eaf03bab1ea515a42f809ecda315b959f8a36c/filter.c
[13:53:22] danke
[14:03:58] whop
[14:04:07] what kind of fricked up request would require 64k?!
[14:04:22] (PS6) Milimetric: Still WIP, almost there. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871 (owner: Ottomata)
[14:04:31] haha
[14:04:42] people do crazy things with wikipedia
[14:05:34] uploading their private pictures as X-Holiday: seems to be one of them.
[14:05:46] so this needs a proper fix in vk then.
[14:09:01] but line 19 in filter.c says 4K
[14:09:11] (which is what vk uses now)
[14:10:14] we could do some quick queries to check if 4k is sufficient
[14:10:57] neato
[14:30:25] ping qchris
[14:30:32] pong milimetric
[14:30:37] batcave?
[14:30:45] Sure. Coming.
[14:30:55] collaborative design on pageview MVP
[14:31:01] drdee: ^
[14:32:42] 1 sec
[14:49:54] oof milimetric, i dunno how to merge our commits now
[14:49:57] since they are actually one commit
[14:49:58] hm
[14:50:00] hmmmm
[14:50:12] oh
[14:50:17] did i mess up your work?
[14:50:24] dunno, no, i mean, i know how to resolve the conflicts
[14:50:25] hmmmmmmm
[14:50:27] oh
[14:50:37] maybe if I just get yours and then unstash my changes instead of merging, hm
[14:51:19] yeahhhh this will do it
[14:52:31] sorry i'm in batcave with christian and diederik
[14:56:06] s'ok no problem, i think i got it
[14:56:07] i had to apply a stash instead of a merge, so I could be sure it wouldn't try to create a new commit
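(For reference, the stash-instead-of-merge flow described above might look roughly like this. It assumes git-review is in use; the change number 89871 comes from the gerrit links in this log, and everything else is illustrative.)

```bash
# Hypothetical reconstruction of the flow milimetric describes above.
git stash                  # set aside local edits to the shared commit
git review -d 89871        # fetch the latest patchset of the gerrit change
git stash apply            # replay the edits on top of it; no merge, no new commit
git commit --amend         # fold them back into the single shared commit
git review                 # upload the result as the next patchset
```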
[15:29:07] (PS7) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[15:29:13] ok milimetric
[15:29:14] woot!
[15:29:23] (PS8) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[15:31:19] (PS9) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[15:31:53] oo I might have changed something that would have broken your thing, interesting.
[15:32:00] fixing.
[15:33:28] hmm
[15:36:30] (CR) Ottomata: "(2 comments)" [analytics/kraken] - https://gerrit.wikimedia.org/r/89871 (owner: Ottomata)
[16:18:45] heading over to Fabian's place for the rest of the day, back in a bit
[16:18:54] milimetric: check out my review!
[17:22:26] http://dumps.wikimedia.org/other/pagecounts-ez/merged/2013/2013-08/
[17:24:49] https://gist.github.com/ottomata/7028878
[17:38:22] ping DarTar
[17:38:28] hey
[17:38:28] I'm in the batcave if you want to talk
[17:38:36] sweet
[18:12:15] milimetric: still eating, back soon
[18:12:20] np
[18:12:21] milimetric: if I run a simple query like SELECT * FROM milimetric_pagecounts_daily WHERE page = 'London' LIMIT 1; I get a permission error, can you run it?
[18:12:24] i'm in hangout
[18:12:46] trying DarTar, sec
[18:19:41] hey ottomata, I think people need home folders created in HDFS under /user in order to run jobs
[18:19:48] could you please make one for DarTar?
[18:19:54] Otherwise they get permission denied
[18:20:01] I know you're eating :)
[18:35:17] hey ja
[18:35:17] so
[18:35:27] what we need to do is make sure people can all read from wherever the import is
[18:35:31] which we will figure out shortly
[18:35:41] anything global I want to import into /wmf/raw
[18:35:48] but we can figure out the group perms
[18:35:49] pretty sure
[18:36:46] wrote up some basic instructions here: https://office.wikimedia.org/wiki/Data_access#Hadoop
[18:36:56] milimetric, ottomata ^
[18:37:11] oh awesome, thank you
[18:37:49] ottomata: I think the error DarTar is getting makes sense. It is denying him WRITE to /user when he runs a simple select
[18:38:00] that to me means he needs a home folder
[18:38:21] if that's the case, we should probably make it part of the "get people set up with access" flow
[18:42:16] oh
[18:42:17] yes
[18:42:18] totally
[18:42:20] he needs a homedir
[18:42:25] yeah
[18:44:37] try now
[18:44:54] milimetric: DarTar
[18:45:04] hey
[18:45:13] k giving it a try
[18:45:43] DarTar, fyi there is also this page
[18:45:43] https://wikitech.wikimedia.org/wiki/Analytics/Kraken/Access
[18:46:08] milimetric: hangout?
[18:46:16] it worketh \o/
[18:46:38] yeah I should puppetize that or something :)
[18:46:45] thanks folks
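(The home-directory fix applied above presumably amounts to something like the following sketch; the username and group are illustrative, not taken from the log.)

```bash
# Create an HDFS home directory so the user can run Hive/MapReduce jobs,
# then hand ownership over to them; run as the HDFS superuser.
sudo -u hdfs hadoop fs -mkdir /user/dartar
sudo -u hdfs hadoop fs -chown dartar:dartar /user/dartar
```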
[18:46:57] coming ottomata, one sec
[18:47:19] ottomata: I'll add/merge the instructions to the office wiki, that's going to be the main go-to point for internal folks who need to get data access
[18:47:29] (for lack of a better place)
[18:47:36] ok cool
[18:48:38] I'm stoked that you beat me to this DarTar. :) I'm hoping to pick up some hadoop exploration today too.
[18:48:43] ha ha
[18:49:03] milimetric: just realized that we should try and normalize the project strings so they match the canonical project identifiers, when importing the pv dumps
[18:49:41] i.e. the various fr.b, fr.q etc
[18:57:21] DarTar: Do you have a link to a good intro to Hive?
[18:57:32] hang on
[18:57:48] kk
[18:57:58] https://github.com/Prokopp/the-free-hive-book
[18:58:51] thanks
[19:00:31] halfak: check pm
[19:42:12] drdee: when you said 64k, did you mean that, or 4k?
[19:44:18] (PS10) Milimetric: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871 (owner: Ottomata)
[19:44:20] 64k for the entire log line, 4k for the user agent, but i checked with the team members and we are not aware of formal limitations to the size of the user agent string
[19:44:49] but the filter.c thingie was for the entire logline, and it was 4k
[19:44:57] fwiu
[19:49:26] ottomata: Are you the person to bug about getting packages installed on stat1?
[19:49:39] suurrrre
[19:49:43] yeah what's up? stat1?
[19:49:55] Hokay. Is IRC fine or is there a more apt channel?
[19:50:02] IRC is cool
[19:50:15] i mean, RT is the official way
[19:50:18] OK, I'm looking for "convert". Apt tells me that it lives in the "imagemagick" package.
[19:50:22] Snaps: did you get access to any of our servers?
[19:50:26] actually, if you make an RT ticket and then just poke me with a link, that would be most ideal
[19:50:33] then there's a little bit of a paper trail
[19:50:37] Oh sure. Let me do that.
[19:53:01] drdee: nope. I use team ottomata for all my server management needs :|
[19:53:11] sensible :D
[19:53:32] maybe we upped it to 64k in a different branch :(
[19:53:44] anyways.. how about collecting some data on it?
[19:54:56] and take 3 standard deviations plus the average user-agent length as the max length and after that we just cut off?
[19:55:59] yep, I think that's a good idea (off the udp pipeline, not kafka)
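(A sketch of the quick check discussed above: compute the mean user-agent length plus three standard deviations, and the longest whole line, over a sample of log lines, to judge whether a 4k buffer is enough. It assumes a tab-separated format with the user agent as the last field and a hypothetical sample.tsv; adjust to the real log layout.)

```bash
# Report mean, mean + 3*sd of user-agent length, and the longest line.
# "sample.tsv" and the field layout are assumptions, not from the log.
awk -F'\t' '{
    ua = length($NF)                       # user agent assumed to be the last field
    sum += ua; sumsq += ua * ua
    if (length($0) > maxline) maxline = length($0)
} END {
    mean = sum / NR
    var = sumsq / NR - mean * mean         # guard against float rounding below zero
    sd = sqrt(var > 0 ? var : 0)
    printf "mean UA: %.0f  mean+3sd: %.0f  longest line: %d\n", mean, mean + 3 * sd, maxline
}' sample.tsv
```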
[19:58:04] halfak: ottomata, shouldn't be a big deal, we already install imagemagick in several places in puppet, is this for "misc::statistics::plotting"?
[19:58:15] then it would be just a one-liner
[19:58:25] or mutante could do it for you too :p
[19:58:25] :)
[19:58:57] if you tell me an existing class it belongs to, np :)
[19:59:23] just don't want to put it on stat1 as a node.. that would be ugly
[19:59:31] hmm low latency sql project: http://tajo.incubator.apache.org/
[19:59:36] drdee: ^
[20:01:17] mutante: I can do it, no worries
[20:01:26] I'd just add it to misc::statistics::packages
[20:01:57] brb
[20:03:17] ottomata: review? https://gerrit.wikimedia.org/r/#/c/90427/
[20:03:49] done, halfak, looks like it's done :p :)
[20:03:58] mutante: i didn't merge, shall I?
[20:04:17] man, looks like I am a little trigger happy on the emoticons
[20:04:20] yea, i don't see any issues with it, it's already used in all those other places
[20:04:25] k
[20:04:28] That was fast! Thanks
[20:04:55] running puppet
[20:05:27] halfak: if you want to request others and feel like it, you can also make patches like that
[20:05:45] and then just have them reviewed via gerrit
[20:06:54] mutante: Yeah. That would be cool. However, I seem to run in the wrong circles to learn how to do that stuff. Unless there's a guide I'm missing somewhere.
[20:08:00] the hard part is usually knowing _where_ to put it, but if there is an existing role that installs packages you usually just have to add the package name to a list
[20:08:08] yea, no worries
[20:08:11] What's a "role"?
[20:08:30] a special class in puppet that defines a certain type of server
[20:08:47] we apply roles to "nodes" (the actual servers)
[20:08:56] so things can be reused on more than one machine
[20:08:57] Where does the config live that I'd be patching?
[20:09:11] in the "operations/puppet" repo
[20:10:09] there is a directory called "manifests" with a file "site.pp"
[20:10:30] that tells you which roles are on which servers
[20:10:37] What does "submitting a patch" mean? I thought we were in distributed version control land.
[20:10:59] it means uploading it to gerrit.wikimedia.org
[20:11:01] for code review
[20:11:16] So, we don't use pull requests?
[20:11:19] after passing that it's merged
[20:11:25] into the actual git repo in production
[20:11:32] no, it's not like github
[20:11:42] you send something into it and wait for reviews
[20:11:48] and then it gets merged.. or not
[20:12:42] https://wikitech.wikimedia.org/wiki/Help:Git
[20:13:43] halfak: it is kinda like a pull request, except there's no remote repository that your commit goes to before the 'pull request' is created
[20:13:55] the gerrit changeset is like a purgatory for a commit
[20:13:57] The word "patch" doesn't occur on that page.
[20:13:59] before it goes to remote heaven
[20:14:25] "changeset" also fails a Ctrl-F test.
[20:14:28] https://wikitech.wikimedia.org/wiki/Help:Git#Making_changes
[20:14:31] halfak: s/patch/change
[20:14:34] see particularly step 4 there
[20:14:52] so
[20:15:08] a changeset in gerrit might go through multiple patchsets before a commit makes its way to the remote
[20:15:24] if you already sent a change for review
[20:15:35] and then you want to amend something (before it's merged)
[20:15:43] here we go
[20:15:43] you get "patch set 2"
[20:15:45] actually
[20:15:48] halfak: this page is better
[20:15:48] http://www.mediawiki.org/wiki/Gerrit
[20:16:56] Seems like an awful lot of trouble to get a package installed. :\ I'm happy for the docs though. Thanks for pointing me to them.
[20:17:42] Theoretically, us research folk may make use of gerrit one day. For now, we do code review and merges in the land of round corners and gradients. ... or we don't do code review.
[20:17:46] true, as usual setting it all up for the first time is...
[20:17:51] but it's just once
[20:18:11] don't worry about it, a package every couple months is all :)
[20:18:16] if
[20:18:19] milimetric: done with my meeting, whenever you are back
[20:19:51] halfak: convert is now on stat1
[20:22:08] Excellent. It looks like I ran into a dependency issue for converting SVG to PNG. :( I'm looking into it now.
[20:23:38] (PS11) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[20:26:47] k ottomata, back
[20:26:50] jumping into hangout
[20:27:06] k
[20:29:52] (PS12) Milimetric: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871 (owner: Ottomata)
[20:36:44] (PS13) Milimetric: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871 (owner: Ottomata)
[21:00:47] (PS14) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[21:04:54] (PS15) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[21:05:41] (PS16) Ottomata: Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871
[21:05:48] (CR) Ottomata: [C: 2 V: 2] Big commit containing complete hive-partitioner and pagecounts importer. [analytics/kraken] - https://gerrit.wikimedia.org/r/89871 (owner: Ottomata)
[21:09:20] (PS1) Milimetric: dry run is usually -n [analytics/kraken] - https://gerrit.wikimedia.org/r/90437
[21:09:39] (CR) Milimetric: [C: 2 V: 2] dry run is usually -n [analytics/kraken] - https://gerrit.wikimedia.org/r/90437 (owner: Milimetric)
[21:21:51] (PS1) Ottomata: Fixing options, not double logging. [analytics/kraken] - https://gerrit.wikimedia.org/r/90438
[21:22:12] (CR) Ottomata: [C: 2 V: 2] Fixing options, not double logging. [analytics/kraken] - https://gerrit.wikimedia.org/r/90438 (owner: Ottomata)
[21:39:08] (PS1) Milimetric: verbose mode [analytics/kraken] - https://gerrit.wikimedia.org/r/90442
[21:46:29] (CR) Ottomata: [C: 2 V: 2] verbose mode [analytics/kraken] - https://gerrit.wikimedia.org/r/90442 (owner: Milimetric)
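(To round off the gerrit walkthrough halfak got above, the full change/patchset cycle might look roughly like this. The repo name comes from the log; the clone URL, commit message, and use of git-review are illustrative assumptions.)

```bash
# Sketch of the gerrit change/patchset cycle described earlier.
git clone ssh://USER@gerrit.wikimedia.org:29418/operations/puppet
cd puppet
# ...edit, e.g. add a package name to an existing class...
git commit -a -m "Install imagemagick for stat1"
git review               # uploads the change for review: patch set 1
# a reviewer asks for tweaks; amend the same commit instead of adding a new one:
git commit -a --amend
git review               # same change, now patch set 2; merged once approved
```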