[01:06:37] halfak: around?
[03:54:26] Hey ori. Just got back.
[04:00:33] halfak: heyo. i enabled module storage across the cluster and was going to send the announcement soon
[04:00:59] is there anything else i should mention other than the 100ms figure?
[04:02:36] Sadly, no. I haven't had time to sit down with the docs again.
[04:02:43] ori: ^
[04:02:56] I was planning to get to that on Friday. Is that too late?
[04:04:13] yeah, I enabled it, so I have to send something :/ Is the number for Chrome something I could generate myself? I'm cherry-picking a stat that I hope is especially positive, but I don't think there is anything wrong with that if I say so outright, right?
[04:08:28] I want to spend some more time with that before I sign off.
[04:08:44] I'll load up my R environment in a sec.
[04:10:39] Or maybe a minute. This dataset is large.
[04:10:43] :D
[04:26:30] I don't remember this taking 20 minutes the last time I tried to pull it in.
[04:29:17] halfak: are you loading a local copy or fetching across the network?
[04:31:25] over network. I almost always load over sshfs for security, but I'm considering rsyncing it.
[04:35:29] the data has very low privacy implications, i think it's ok to just python -m SimpleHTTPServer on stat1 and wget on your pc
[04:35:42] if the user-agents were scrubbed it would probably meet the criteria for public release
[04:36:19] Meh. Just got it.
[04:36:42] Scoping out chrome's load distribution to convince myself that I believe it.
[04:36:55] it = the large difference
[04:39:07] * ori nods
[04:51:14] OK. I know what I'm looking at. Can you tell me what you'd like to be able to say?
[04:51:44] It looks like chrome performs well (could be related to the bandwidth of chrome users).
[04:52:33] But I don't see a substantial difference to the benefit that chrome received over other browsers.
[04:53:37] ah, okay. I thought it may be substantial. I suppose if it's not an interesting datapoint then there's nothing recommending it for a special mention
[04:54:05] I think the real effect is due to the bandwidth of chrome users.
[04:55:38] I'll email the table that I think tells the story for your reference.
[04:56:28] OK! I really appreciate that, halfak
[05:36:39] halfak: thanks again. good night
[14:56:32] I guess no standup today...
[14:58:27] csalvia: but why ?
[14:58:56] oh , people are at the architecture summit
[14:59:06] We can have standup without milimetrics :-)
[15:00:03] it's just not the same
[15:01:08] coming!
[16:13:43] qchris_away: when/if you come back, could you help stefan and I try a query?
[16:13:54] orrr
[16:13:55] actually
[16:14:00] csalvia, you could help us just as well
[16:14:12] we just need someone who is not me or stefan to try to run a hive query
[17:03:24] ottomata: Sure.
[17:03:30] What do I have to do?
[17:03:49] ok, log into analytics1026
[17:04:01] run hive
[17:04:02] hive
[17:04:14] and just run this query
[17:04:15] https://gist.github.com/ottomata/8579834
[17:04:44] on which host?
[17:05:02] ottomata: ^
[17:06:35] qchris: analytics1026.eqiad.wmnet
[17:06:46] Ok. Trying ...
[17:07:28] java.io.FileNotFoundException: /home/spetrea/sanitizing.py (Permission denied)
[17:07:30] average: ^
[17:07:46] oh
[17:07:58] hmmm yeah spetrea can you make that file world readable?
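The exception above is Hive failing to read the transform script out of spetrea's home directory when another user runs the query. A minimal fix, sketched below on the assumption that /home/spetrea itself is already world-traversable, is to make the script world-readable before retrying; the chmod 777 used a little later in the log also works but is broader than needed.

```bash
# Sketch only: a+r is enough for other users' Hive jobs to read the script.
chmod a+r /home/spetrea/sanitizing.py

# Sanity check before asking qchris to retry the query from the gist.
ls -l /home/spetrea/sanitizing.py
```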
[17:08:07] average
[17:08:08] ^
[17:08:15] omg
[17:08:16] moment
[17:08:54] qchris: cp /tmp/sanitizing.py /home/qchris/
[17:09:07] hmm
[17:09:11] no no
[17:09:16] wait wait
[17:09:21] ok try nw
[17:09:23] try now
[17:09:40] I've chmod 777 /home/spetrea/sanitizing.py
[17:09:42] so it should be ok now
[17:10:18] Too late ... I started with the /tmp one when you said "try now"
[17:10:26] :D
[17:10:31] Retrying with the one in your home now
[17:11:52] Both jobs still running ...
[17:12:51] Does it make sense for me to wait for the result ... or are they long-running jobs?
[17:13:57] long-running
[17:14:03] Ok. Thanks.
[17:14:06] can you put them in a screen ? then you can just leave them there
[17:14:24] I just leave them running. It's ok.
[17:14:33] But I do not wait for the output then :-)
[17:14:36] qchris, can i kill your first one?
[17:14:51] Like Ctrl-C?
[17:14:55] Or some other method?
[17:14:56] yarn application -kill
[17:15:22] ctrl C will just kill the submitting process
[17:15:27] not the job in hadoop
[17:15:28] you could do it too!
[17:15:29] yarn application -kill application_1387838787660_0926
[17:15:30] :0
[17:15:43] haha, spetrea
[17:15:44] done.
[17:15:47] qchris is running this job just fine
[17:15:49] it is just you!
[17:16:01] average :)
[17:16:01] hah
[17:16:09] hive doesn't like me
[17:16:14] I like you!
[17:16:18] :)
[17:16:21] :))
[17:16:22] Meh... I'm not hive :-(
[17:16:30] I'm not good for anything
[17:16:31] :-(((
[17:16:40] hehe
[17:16:43] qchris: you don't have to wait for the result
[17:16:47] this is what it did for me too
[17:16:51] it just fails for average
[17:17:15] That's not ok from hive!
[17:17:23] Hey hive... We're all equal!
[17:17:30] Well ottomata is root ...
[17:17:48] Anarchy!
[17:18:09] "all men are created equal but some are more equal than others"
[17:18:38] -- some person sometime somewhere
[17:19:02] j/k
[17:19:03] hmm
[17:19:51] spetrea@analytics1026:~$ groups qchris
[17:19:51] qchris : wikidev stats
[17:19:51] spetrea@analytics1026:~$ groups otto
[17:19:51] otto : wikidev stats
[17:19:51] spetrea@analytics1026:~$ groups spetrea
[17:19:54] spetrea : wikidev stats
[17:19:56] we're all in the same groups
[17:21:27] trying to find out what hdfs groups I'm in
[17:23:08] ottomata: how can I find out what hdfs groups I'm in ?
[17:23:17] apparently there is no "hdfs dfs -groups spetrea"
[17:24:28] hdfs groups spetrea
[17:24:55] they are the same groups as you are in on the namenode
[17:24:58] analytics1010
[17:25:03] which should be the same everywhere
[17:25:09] (on analytics nodes)
[17:25:31] ottomata: is there a list of all the nodes in the analytics cluster ?
[17:26:23] I mean the hadoop cluster
[17:28:10] yeah i suppose, um
[17:28:17] you can click on the nodes link in the 8088 interface
[17:28:23] or really
[17:28:27] analytics1010-analytics1020
[17:28:28] is it
[17:28:59] ottomata: I found this GNU parallel thing, so I'm gonna use it for this
[17:29:04] it's a bit silly but I'll try
[17:31:53] to do what?
[17:38:08] for X in $(seq 10 20); do echo "== analytics10$X =="; ssh -oStrictHostKeyChecking=no spetrea@analytics10$X.eqiad.wmnet "/usr/bin/hdfs groups spetrea" 2>/dev/null ; done
[17:38:28] supposedly if I piped that through like big-loop | parallel
[17:38:41] it would do all the stuff in parallel.. if it would work, but it doesn't
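For reference, the serial loop above can be handed to GNU parallel fairly directly; the sketch below assumes `parallel` is installed on the submitting host, and, as ottomata points out a few lines later, `hdfs groups` talks to the namenode anyway, so the output is the same no matter which node runs it.

```bash
# Hypothetical sketch: the same per-host check as the for-loop, fanned out by
# GNU parallel. {} is replaced by each number read from stdin; -j0 runs as
# many jobs concurrently as possible.
seq 10 20 | parallel -j0 '
  echo "== analytics10{} ==";
  ssh -oStrictHostKeyChecking=no spetrea@analytics10{}.eqiad.wmnet \
      "/usr/bin/hdfs groups spetrea" 2>/dev/null
'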
[17:38:45] so I just run it like this
[17:41:48] so apparently I have the same hdfs permissions on all of them
[17:41:50] uhm
[17:42:24] ottomata: what if I backed up my stuff and then you wiped out my user and then re-created it ?
[17:43:00] you can use dsh
[17:43:01] it's real easy
[17:43:02] but
[17:43:08] but yeah
[17:43:09] average
[17:43:16] that command doesn't actually run on a particular host
[17:43:23] hdfs is a client interface for hadoop
[17:43:25] it talks to the namenode
[17:43:39] oh , damn I forgot, hdfs is the same thing on all of them
[17:43:42] yeah
[17:44:11] hmm, average, yeah we can do that, i don't want to do it for your cli user
[17:44:15] but i can do it for your hdfs user if you like
[17:44:21] basically just wipe your home directory
[17:44:24] in hdfs
[17:44:31] and your tmp dir maybe too
[17:44:43] yeah, that'd be cool
[17:44:55] ok, you ready for me to do that
[17:45:02] i can delete your /user/spetrea directory?
[17:45:07] yes
[17:47:09] ok done
[17:47:12] i guess try again average?
[17:47:21] trying
[17:53:16] ok, job started, no errors so far, mappers at 15% , still running
[17:53:32] will get back with result
[18:08:19] ottomata, average: The job ran through for me without problems. I guess we're interested in the output of the query, are we?
[18:08:30] s/interested/not interested/
[18:10:55] naw
[18:10:58] qchris: you got 5 integers right ?
[18:11:01] i've already run it successfully
[18:11:12] Yup. 5 integers.
[18:11:25] qchris: non-zero integers ?
[18:11:32] average: yes.
[18:13:06] ok
[18:14:58] would someone mind just +2ing https://gerrit.wikimedia.org/r/#/c/108633/ when they have a moment? ;)
[18:18:24] milimetric: Hi.
[18:18:36] hi Gloria
[18:19:03] milimetric: Can I assign https://bugzilla.wikimedia.org/show_bug.cgi?id=42259 to you? :-)
[18:20:03] yes Gloria, but probably the better person to talk to is Toby Negrin
[18:20:30] so, the analytics team is working actively on this
[18:20:42] but basically, very roughly, the idea is this:
[18:20:58] the data behind stats.grok.se is inaccurate for many reasons
[18:21:07] it lacks mobile, doesn't count /w/index.php views, etc.
[18:21:29] and there are no plans to fix these problems in webstatscollector (which creates that data)
[18:21:29] milimetric: Perfect is the enemy of the good.
[18:21:44] definitely, agreed
[18:22:09] All stats suck.
[18:22:54] basically, the way we're tracking this is here: https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/113
[18:23:15] and the history has been: we've sucked at prioritizing this
[18:23:39] so long story short, we're prioritizing this as basically our #2 priority
[18:23:48] there's a current effort to enhance wikimetrics to be more generic
[18:24:37] so immediately after that and probably before we finish that, we'll be focusing on pageviews
[18:24:48] so right now, we have the full unsampled log of pageviews in a database in prod
[18:24:59] we have to 1. anonymize the IP addresses and 2. aggregate
[18:25:29] so Gloria, assign it to me, and bug me as often as you like, but that's the rough plan
[18:25:58] Already assigned. ;-)
[18:26:06] Fuck you, Mingle.
[18:26:28] Where is wikimetrics?
[18:27:38] https://meta.wikimedia.org/wiki/Meta:Ma%C3%B1ana are my notes on the history of analytics, BTW.
[18:27:39] metrics.wmflabs.org
[18:27:52] and I agree with you on Mingle, I'm actively trying to find a way to replace it at the foundation
[18:28:03] heh, thanks ;)
[18:28:13] indeed; Mingle is terrible.
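To make the "anonymize the IP addresses and aggregate" plan milimetric describes above slightly more concrete, here is a purely illustrative sketch. The table and column names are invented for illustration, not the real schema of the unsampled log; the point is only the shape of the step, where counting per page and per day means the client IP never has to appear in anything published.

```bash
# Hypothetical sketch: pageview_log, page_title, dt and client_ip are made-up
# names. The aggregation simply never selects client_ip.
hive -e "
  SELECT page_title,
         to_date(dt)  AS view_day,
         COUNT(*)     AS views
  FROM   pageview_log
  GROUP  BY page_title, to_date(dt);
"
```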
[18:28:20] (don't tell fundraising I said that, they love it.)
[18:28:23] so, to be clear, all I've said above is *the way I see things*
[18:28:30] not in any way official position of the analytics team
[18:28:35] which is controlled solely by Toby
[18:28:40] :D
[18:28:41] :-)
[18:29:58] nuh uh
[18:30:03] qchris has serious veto powers
[18:30:07] we all heard them that one time
[18:30:09] Hahahaha :-P
[18:30:10] :)
[18:30:24] we all have veto powers
[18:30:28] and we all have strong opinions
[18:30:32] as I've shown above
[18:30:33] I won't veto milimetric if he says how he sees things.
[18:30:44] but Toby controls the "official"
[18:30:51] thanks qchris
[18:35:29] Ironholds: i can't just +2 that
[18:35:44] sorry, i will add my thoughts in review
[18:47:15] milimetric: Nemo says you're a dirty cookie-licker.
[18:47:18] For the record.
[18:47:34] * Nemo_bis slaps Gloria
[18:47:41] Now the other cheek.
[18:47:45] * Nemo_bis slaps Gloria
[18:47:56] ty
[18:48:00] no more? :o
[18:48:17] It's early still.
[18:48:20] what a sober Gloria today
[18:51:12] it worked !!!
[18:51:14] it worked !!!!!
[18:51:20] it Wooowowowoowooorked !
[18:51:27] \o/
[18:52:00] \o/ :)
[18:52:09] I'm confused
[18:52:14] But I'll slap both of you if you don't behave
[18:53:50] ottomata, replied :)
[18:55:24] Ironholds: thanks
[18:55:40] yeah i think installing a client on stat1002 will be at least easier from a thought level standpoint
[18:55:42] np! Thanks for being your usual helpful self.
[18:55:47] i.e. if they say yes, we just do it
[18:55:55] da; it makes sense as the long-term end goal, and if it can be done in the short-term, yay
[18:56:07] otherwise, we have to think more broadly as a team as to what we want to install on the cluster
[18:56:11] my worry is that if they drag their heels I'll end up with grumpy mobile people asking for session data and/or their head.
[18:56:16] *my
[18:56:23] ha yeah
[19:00:50] but that's a problem for if and when they drag their heels ;p
[19:02:02] maybe i'll submit a patch for it :p
[19:02:09] to help push it along
[19:35:08] ok nuria
[19:35:09] hiiii
[19:35:12] wanna hangout and try some stuff?