[13:51:45] milimetric: Am I right to assume that there is no standup today?
[13:52:06] qchris: I'm working and I'll be at standup
[13:52:12] Ok.
[13:52:21] but since nuria is not around, I'm fine if you skip
[13:52:30] of course, it's also an international holiday
[13:52:57] and more importantly - international labor day, so skipping work is almost mandatory, right?
[13:53:03] Hahaha.
[13:53:07] No worries.
[13:53:29] I'll see if the google machine wants to work on labor day.
[13:59:26] (PS1) Milimetric: Use One Box instead of s1 [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/130833
[14:14:15] (CR) Milimetric: "A note about the down-side of this: replication lag is good on the One Box now. But as people start hammering it, the replication lag may" [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/130833 (owner: Milimetric)
[15:04:46] (PS1) Gilles: Add the 95th percentile [analytics/multimedia] - https://gerrit.wikimedia.org/r/130842
[15:08:31] (PS1) Gilles: Add the 95th percentile [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130843
[15:14:52] yo qchris
[15:14:53] yt?
[15:14:57] yup.
[15:15:05] Wassup?
[15:15:33] thinking about this hdfs_sync thing
[15:15:49] and deploying kraken (oozie, hive, scripts, etc.) to hdfs
[15:16:03] i know we talked about versioning and snapshots etc. in hdfs
[15:16:15] just wanted to think about it with you some more
[15:16:20] Sure.
[15:16:43] Irc / trap?
[15:16:57] /wmf/kraken/0.0.2/same/file/hierachy/as/in/kraken/repo
[15:16:58] ?
[15:17:17] not everything would be synced, of course
[15:17:23] but things like the oozie/ dir i suppose would?
[15:17:37] not sure if it really makes sense to have an oozie dir...
[15:17:44] maybe just oozie xml files in kraken-etl?
[15:17:52] We currently have an oozie dir IIRC.
[15:17:54] Let me check
[15:17:57] we do now
[15:17:57] yes
[15:18:02] and there are some things in there that might make sense
[15:18:06] reusable things
[15:18:16] Yes.
[15:18:18] (PS1) Milimetric: Fix for May - Total editors [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/130845
[15:18:22] the hive partitioner workflow is going to be reusable
[15:18:25] the coordinator probably not
[15:18:33] (CR) Milimetric: [C: 2 V: 2] Fix for May - Total editors [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/130845 (owner: Milimetric)
[15:18:53] So in which repo would the coordinator live? kraken-deploy?
[15:18:59] nono, in kraken repo
[15:19:07] kraken deploy just contains artifacts and kraken as a submodule
[15:19:10] OK.
[15:19:13] we're mainly just talking about kraken right now
[15:19:34] 1. how to organize oozie and hive and scripts inside of kraken
[15:19:34] and
[15:19:34] 2. where to sync/deploy these files into hdfs
[15:19:39] maybe based on date?
[15:19:47] too bad hdfs doesn't have symlinks :/
[15:19:53] It has :-)
[15:19:59] it does...?
[15:20:02] But they get disabled in CDH-5.0.0
[15:20:04] :-(
[15:20:06] OH pff
[15:20:17] Hadoop 10200 or something.
[15:20:34] whoa this jira is long
[15:20:44] https://issues.apache.org/jira/browse/HADOOP-10020
[15:20:55] (Sorry, got a 0 and a 2 flipped)
[15:20:58] and old
[15:20:58] hm
[15:21:04] oh i'm looking at the one that added it
[15:21:20] ah rats
[15:21:46] Who wants to update CDH5 (and disable symlinks)?
[15:21:52] me!
[15:21:55] oh wait
[15:21:56] yes me!
[15:22:01] :-D
[15:22:09] i don't see an hdfs dfs -ln command
[15:22:23] oh well, anyway
[15:22:23] It is not exported to the cli IIRC.
[15:22:27] let's say we don't have it :)
[15:22:28] ah ok
[15:22:37] so...how to sync to hdfs?
[15:22:38] Ok. No symlinks.
[15:22:58] /wmf/kraken//...everything we need here in the same hierarchy?
[15:23:15] Looks good.
[15:23:22] since there is no hdfs 'rsync', the way i've been syncing is just -rm -R && -put
[15:23:51] hmmm..? https://github.com/alexholmes/hsync
[15:23:52] "rm -R" would leave us in a bad state for a very short amount of time.
[15:24:04] Cannot we use "force put"?
[15:24:47] * qchris looks at hsync
[15:25:44] I am probably just stupid ... but where is the code?
[15:25:52] I can only see the README.md
[15:25:56] oh ha
[15:25:56] uhhhh
[15:26:07] hahah
[15:26:10] iTS JUST A README
[15:26:13] GOOGLE BAIT
[15:26:15] :-D
[15:26:18] Hahahaha
[15:26:31] But a good readme :-)
[15:26:39] yeah!
[15:26:50] basically rsync --help > README.md
[15:26:50] haha
[15:27:48] Did you try distcp?
[15:29:36] no...
[15:29:51] The hsync readme links to it.
[15:30:13] "hadoop distcp"
[15:30:24] hm, that's within hdfs though, right?
[15:30:25] is available on hadoop-master0
[15:31:00] not local fs -> hdfs
[15:31:08] Oh.
[15:32:09] Well ... cannot we keep just (force) putting the files to hdfs?
[15:32:41] hey kevinator
[15:32:49] is howie in the office?
[15:32:54] Hi
[15:33:02] hello
[15:33:06] I’m still at home :-) I don’t know
[15:33:14] kk
[15:33:30] oops, right, it’s early in SF
[15:34:31] yeah, we can do that qchris, and that is an argument for keeping all oozie and hive stuff in their own directories
[15:34:42] then there is just a small list of directories that need force put
[15:35:45] What would be the alternative to keeping oozie and hive stuff in their own directories?
[15:41:23] kraken-etl/oozie
[15:41:32] like, oozie configs for particular things
[15:41:38] kraken-etl/webrequest/{oozie,hive}
[15:41:39] dunno
[15:42:14] Whatever hierarchy we pick will be bad for some aspect.
[15:42:35] As we are versioning the files, it should not be much of a problem.
[15:42:54] We can relayout the files (if we need to) with the next version, and we're done.
[15:43:45] The separation did not seem to cause too many problems up to now, so I'd just stick to it for now, and change upon need.
[15:44:41] hm ok
[15:44:42] ok cool
[15:44:58] so /wmf/kraken//...
[15:45:07] ARE YOU TRAPPING ME?!
[15:45:11] IT'S A TRAP!
[15:45:13] yes.
[15:45:19] and, i'll make hdfs_sync copy the current DATE to /current/ or something?
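A minimal sketch of the "force put" idea from the discussion above, assuming the `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` shell commands are available on the deploying host. The function name `build_sync_commands`, the local root `kraken`, and the directory list are hypothetical; the version `0.0.2` and the base path `/wmf/kraken` are taken from the example path given at 15:16:57. The sketch only builds the command lines and does not run them.

```python
from typing import List


def build_sync_commands(
    local_root: str,
    version: str,
    subdirs: List[str],
    hdfs_base: str = "/wmf/kraken",  # base path as mentioned in the log
) -> List[List[str]]:
    """Build argv lists that copy selected local dirs into a versioned HDFS dir."""
    target = f"{hdfs_base}/{version}"
    # Create the versioned target directory; -p makes this a no-op if it exists.
    commands = [["hdfs", "dfs", "-mkdir", "-p", target]]
    for subdir in subdirs:
        # -f asks put to overwrite existing destination files, so no
        # separate "-rm -R" step (and no window with missing files) is needed.
        commands.append(
            ["hdfs", "dfs", "-put", "-f", f"{local_root}/{subdir}", target]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical invocation: only the oozie and hive directories are synced.
    for argv in build_sync_commands("kraken", "0.0.2", ["oozie", "hive"]):
        print(" ".join(argv))
```

Note that overwriting in place does not remove files that were deleted locally; deploying each release into its own versioned directory, as discussed, sidesteps that.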
[15:45:20] maybe?
[15:45:27] it'll be a copy, but meh?
[15:45:36] Are we sold on the date?
[15:45:45] no
[22:19:11] (random/late comment on kraken: I was drinking this on Saturday evening. Good stuff! http://www.choosy-beggars.com/wp-content/uploads/2011/08/kraken_bottle.jpg )
[22:28:08] milimetric, does stat1 still exist?
[22:35:03] hi haeb
[22:35:12] hi
[22:35:25] Ironholds: you there?
[22:35:39] so james and i are experiencing slow load times on the blog (>30sec), intermittently
[22:36:26] and we're wondering if it's a traffic spike (in which case we would need to nag ops to check whether the caching plugin is working properly)
[22:37:09] only if there is a quick way to check like the last hour
[22:37:15] i can check
[22:37:16] * Jamesofur is actually getting a lot of 503's too
[22:37:22] let me see
[22:37:47] db1047 is not caught up yet, but db1048 (the master) has all the latest data, and iirc so does one of the other analytics databases
[22:37:58] i can't share the credentials for db1048, unfortunately, but i can run a query on there for you
[22:38:04] i'll just get the count of hits for the last hour
[22:38:06] oh right, yes, you said that (503s), i didnt get these though
[22:38:46] that would be great
[22:39:47] ori, yep
[22:39:51] braindead, but present
[22:40:59] Ironholds: could you hook up HaeB with access to the analytics slave that has eventlogging data and is caught up?
[22:41:14] i am querying db1048, but using rw credentials that i can't share
[22:41:55] shouldn't he go through opsen?
[22:42:15] i'm not sure what the current protocol is, to be honest
[22:43:11] HaeB:
[22:43:19] mysql:root@localhost [log]> select max(`timestamp`) from WikimediaBlogVisit_5308166 \G
[22:43:19] *************************** 1.
row ***************************
[22:43:19] max(`timestamp`): 20140501224116
[22:43:21] 1 row in set (0.00 sec)
[22:43:24] mysql:root@localhost [log]> select count(*) from WikimediaBlogVisit_5308166 where `timestamp` > 20140501214116 \G
[22:43:25] *************************** 1. row ***************************
[22:43:27] count(*): 190
[22:43:29] 1 row in set (0.48 sec)
[22:43:35] so, 190 in the last hour
[22:43:45] thanks!
[22:43:54] that sounds... odd
[22:44:00] HaeB: is the blog hosted on our infrastructure still?
[22:44:05] so obviously the problem is more in the other direction
[22:44:07] I've done more refreshes than that
[22:44:14] the blog is fubar
[22:44:17] yes
[22:44:19] i just loaded it, lots of 503s
[22:44:28] i'm just chatting with paravoid so i'll poke him about it
[22:44:30] (yes to: still wmf-hosted)
[22:44:54] ping is very fast btw (64 bytes from holmium.wikimedia.org (208.80.154.49): icmp_seq=1 ttl=52 time=73.9 ms)
[22:49:52] HaeB: a significant portion of requests are 503ing
[22:50:17] because the request for the javascript code has to succeed for the eventlogging code to load, it undoubtedly means that we're undercounting
[22:51:36] counting only successful views, you mean?
[22:52:33] it depends on how you qualify a successful view.
my first attempt to load the blog was partly successful: the request for the page itself succeeded but none of the css / js requests did
[22:52:38] so it appeared, but unstyled
[22:53:00] and no event was logged, because the js code didn't load
[23:06:45] Hey analytics
[23:06:47] http://multimedia-metrics.wmflabs.org/dashboards/mmv#
[23:06:52] Y u no click and drag to zoom
[23:06:55] milimetric: ^^
[23:36:35] (CR) MarkTraceur: "Typo in the commit message" (1 comment) [analytics/multimedia] - https://gerrit.wikimedia.org/r/130842 (owner: Gilles)
[23:41:55] (CR) MarkTraceur: [C: 2 V: 2] Add the 95th percentile [analytics/multimedia] - https://gerrit.wikimedia.org/r/130842 (owner: Gilles)
[23:46:24] (CR) MarkTraceur: [C: 2 V: 2] Add the 95th percentile [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130843 (owner: Gilles)
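The "last hour" count queried at 22:43 relies on subtracting one hour from the latest 14-digit timestamp (20140501224116 becomes 20140501214116). A small sketch of that arithmetic, assuming the timestamps follow the `YYYYMMDDHHMMSS` layout they appear to have in the log. The helper names `window_start` and `count_query` are hypothetical, and the query string is only built, not executed.

```python
from datetime import datetime, timedelta

# Assumed layout of the 14-digit timestamps seen in the log.
TS_FORMAT = "%Y%m%d%H%M%S"


def window_start(latest: str, window: timedelta = timedelta(hours=1)) -> str:
    """Return the timestamp `window` before `latest`, in the same 14-digit form."""
    return (datetime.strptime(latest, TS_FORMAT) - window).strftime(TS_FORMAT)


def count_query(table: str, latest: str) -> str:
    """Build a count query for rows newer than one hour before `latest`.

    The table name is interpolated directly, so it must come from a
    trusted source rather than from user input.
    """
    return (
        f"select count(*) from {table} "
        f"where `timestamp` > {window_start(latest)}"
    )


if __name__ == "__main__":
    print(count_query("WikimediaBlogVisit_5308166", "20140501224116"))
```

As noted in the discussion, any count obtained this way only covers page views where the logging JavaScript actually loaded, so it undercounts while requests are failing with 503s.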