[08:13:01] (CR) Nuria: "Retested on dev and things are working with cohort, report and symlink creation" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/143040 (https://bugzilla.wikimedia.org/66087) (owner: Milimetric)
[08:13:13] (CR) Nuria: [C: 2] Add pretty symlink for WikimetricsBot [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/143040 (https://bugzilla.wikimedia.org/66087) (owner: Milimetric)
[08:13:24] (Merged) jenkins-bot: Add pretty symlink for WikimetricsBot [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/143040 (https://bugzilla.wikimedia.org/66087) (owner: Milimetric)
[08:16:05] (CR) Nuria: "I actually think we should not merge these changes until we have done some load testing and know the report creation rate we can sustain." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/142007 (https://bugzilla.wikimedia.org/66841) (owner: Milimetric)
[10:19:46] all right qchris: newly registered metric runs on dev work well: https://metrics-dev.wmflabs.org/static/public/datafiles/NewlyRegistered/
[10:20:19] k
[10:20:42] "502 Bad Gateway"
[10:20:56] nuria: ^ is what the above url gives for me.
[10:21:16] really?
[10:21:20] yup.
[10:21:21] works like a charm
[10:21:27] for me
[10:21:36] seems like a labs availability issue
[10:21:58] Wait ... it works in wget, but not in firefox.
[10:22:12] haha
[10:22:40] Now it works in firefox. ... Maybe an nginx hiccup?
[10:23:18] ya
[10:23:37] i suspect the load we can sustain there is nothing
[11:23:20] morning qchris
[11:23:29] so, thanks so much for fixing labs
[11:23:30] milimetric: Hi
[11:23:38] yw
[11:23:50] secondly, I was going to test my oozie flow and coordinators
[11:24:01] Andrew said CDH5 should be up and running
[11:24:08] and I know you were going to look at labs
[11:24:21] I read him saying that you can use his "e" cluster.
[11:24:30] yea
[11:24:48] so of course I'm not sure what that means and I was thinking you did :)
[11:25:31] So if you want to test oozie, just log into one of the instances, like "hadoop-e-worker0"
[11:25:44] and you can use it as if it were a machine of our production cluster.
[11:25:56] Hive & Co should be available.
[11:26:02] k, trying
[11:26:10] IIRC he suggested an instance to use ... let me read the backlog.
[11:29:07] Cannot find ottomata's suggestion ... but "hadoop-e-worker0" has client roles for oozie and hive. So that instance should work.
[11:41:57] argh, qchris: can I ask you questions when I get stuck? or would that be annoying at the moment?
[11:43:27] milimetric: Sure. Ask questions. Not sure though if I can help. But first ... lunch.
[11:43:31] Sorry.
[11:51:27] :( sigh, this is super sad, but I give up, I can't find it
[11:51:33] I need the CREATE TABLE for webrequest
[11:51:39] but the cluster's down so I can't get it there
[11:51:54] and I searched all over puppet and couldn't find it there
[11:57:06] milimetric: if you're looking for cdh5 on puppet, you should look into the cdh5 submodule, it's not in the main repo
[11:57:25] oh, I know, i looked through:
[11:57:37] operations/puppet/cdh (new version of ../../cdh4)
[11:57:46] operations/puppet/varnishkafka
[11:57:49] operations/puppet/kafka
[11:58:10] but yeah, those were all long shots as I wouldn't expect this to be in there
[11:59:10] i would just always grab it off the cluster so now that it's down I'm all confused
[12:00:05] * milimetric doesn't understand ops work, he thinks it's all magic and sorcery
[12:11:50] closest I got so far:
[12:11:51] # We are only using hive-partition to add partitions to the pagecounts table.
[12:11:51] # The webrequest table is using Oozie.
[12:11:51] $tables = 'pagecounts'
[12:42:34] milimetric: Are you sure the webrequest table is created through puppet? I thought this was done by hand ...
[12:42:43] qchris: no clue :)
[12:43:00] I was hoping it was done in some way I could find, so I can get the schema
[12:43:03] So if you need the table in a labs cluster, just create it by hand.
[12:43:15] right, but I don't know the schema
[12:43:24] For the oozie stuff, you only need some columns.
[12:43:42] mmm, you're saying just ignore the other stuff.... mmm, ok
[12:44:06] I'd do it that way ... but then again ... maybe I am just sloppy.
[12:45:49] milimetric: I know that https://wikitech.wikimedia.org/wiki/Analytics/Kraken/Hive/Schemas
[12:45:58] is outdated, but it might serve as a start.
[12:46:27] (outdated, because at least the webrequest_source column is missing)
[12:47:13] no, I'm dumb, I can just create the columns I need to test and not be nitpicky about stuff that doesn't matter
[12:48:35] http://dpaste.com/3YKS2ZP
[12:48:50] ^ is the latest version I have locally. It is from 2014-05-29
[12:49:14] I assume that it is still accurate (except the location setting)
[12:49:17] milimetric: ^
[12:49:41] thanks :)
[12:50:11] yw
[13:22:35] qchris: I get permission denied trying to insert into a table in hadoop
[13:22:52] also - hive syntax for inserting ad-hoc is insane :)
[13:23:11] As which user? "milimetric"?
[13:23:16] yes
[13:23:17] Or "hive" or "hdfs"?
[13:23:19] Ok.
[13:23:26] should i become another user?
[13:23:35] Not sure ... might be.
[13:24:10] Let me check the labs instance ...
[13:26:52] ok qchris, hdfs user works
[13:26:55] (sudo su hdfs)
[13:26:58] ok.
[13:27:00] so last problem
[13:27:15] inserting ad-hoc in hive requires at least one row to exist in at least one table somewhere
[13:27:27] is there an easier way other than creating an external table?
[13:27:31] yup. That sucks big time.
[13:27:42] I typically create my own database.
[13:27:49] Create a simple table in that database.
[13:27:51] right, i created wmf
[13:27:54] then webrequest in there
[13:27:55] With one column.
[13:28:02] Add a single row to that table.
[13:28:03] oh ok, and you make that table external
[13:28:17] And then, you can use that table to insert into another table.
[13:28:25] how do you "add a single row"
[13:29:04] insert either from a local file, or select from a different table (hence the single-rowed separate table I typically create)
[13:29:22] (you can also insert from a file on hdfs, but that's even stranger)
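A minimal sketch of the single-row bootstrap trick qchris describes above, assuming a Hive version without INSERT ... VALUES support (which is what makes ad-hoc inserts so painful here). The database, file, and column names are illustrative; the real webrequest schema is the one in the dpaste above:

    -- one-time setup: a scratch database with a table holding exactly one row
    CREATE DATABASE IF NOT EXISTS scratch;
    USE scratch;
    CREATE TABLE one_row (dummy INT);
    -- seed it from a one-line local file, e.g. created with: echo 1 > /tmp/one.txt
    LOAD DATA LOCAL INPATH '/tmp/one.txt' INTO TABLE one_row;
    -- from then on, ad-hoc rows go into any table by selecting literals
    -- "from" the one-row table; these target columns are made up for the example
    INSERT INTO TABLE wmf.webrequest
    SELECT 'cp1001', CAST(1 AS BIGINT), '2014-07-04T13:38:00' FROM one_row;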
[13:30:36] Ha ... Stack Overflow even suggests the same thing: http://stackoverflow.com/questions/17425492/hive-insert-query-like-sql
[13:31:17] yea, I'm not sure how to do any of those things (insert from file or insert from file on hdfs)
[13:31:36] select from a different table isn't possible because there is no other table on the cluster
[13:32:19] Ok. I'll bootstrap the table.
[13:33:24] oh no, i see the SO answer has an example
[13:33:36] i got it, sorry i didn't see - the cool answer is buried down the page
[13:33:43] qchris: ^
[13:33:52] Ok.
[13:39:22] milimetric: Going through my test scripts I found a line that matches the schema I dpasted before.
[13:39:31] cool
[13:39:36] You should be able to "load data local" from /tmp/logline.txt on
[13:39:43] hadoop-e-worker0
[13:39:52] (Just copied the line there)
[13:40:14] (It is cleaned of PII, so just go wild with it)
[13:56:00] so weird
[13:56:14] i put demo.txt in /hadoop-e/user/milimetric/ and it won't work from there
[13:56:15] trying yours
[13:56:28] FAILED: SemanticException Line 1:17 Invalid path ''demo.txt'': No files matching path hdfs://hadoop-e/user/hdfs/demo.txt
[13:56:38] hdfs@hadoop-e-worker0:/home/milimetric$ hdfs dfs -ls /hadoop-e/user/milimetric
[13:56:38] Found 1 items
[13:56:38] -rw-r--r-- 3 hdfs hadoop 2 2014-07-04 13:38 /hadoop-e/user/milimetric/demo.txt
[13:57:25] hm, yours doesn't work either, I must be doing something wrong qchris:
[13:57:26] hive (wmf)> load data inpath '/tmp/logline.txt' into table webrequest;
[13:57:26] FAILED: SemanticException Line 1:17 Invalid path ''/tmp/logline.txt'': No files matching path hdfs://hadoop-e/tmp/logline.txt
[13:57:59] You're missing the "local"
[13:58:09] 'You should be able to "load data local" from /tmp/logline.txt on'
[13:58:21] Without it, hive tries to load it from hdfs.
[13:58:50] I'll jump into the standup meeting, and afterwards we'll get the table bootstrapped.
[14:31:26] qchris: what does localhost stand for in:
[14:31:27] oozie job -oozie http://localhost:8080/oozie -config job.properties -run
[14:31:44] typically 127.0.0.1.
[14:31:57] Ubuntu has something slightly different in /etc/hosts IIRC.
[14:32:00] i mean, in our case, is oozie on localhost:8080?
[14:32:53] Ahm. Not sure.
[14:33:25] Telnet fails to connect on localhost:8080
[14:33:32] yeah, apparently nothing's serving there
[14:33:41] hm, so where do we think oozie is here... hmmm
[14:34:22] hadoop-e-master0 is the oozie server.
[14:34:25] milimetric: ^
[14:34:34] ah!
[14:35:05] But ... 8080 for oozie does not sound plausible.
[14:35:45] Default for "-oozie" seems to work just fine.
[14:35:51] So I'd not specify it.
[14:35:57] oh really?
[14:35:58] dammit
[14:36:15] I just ran "oozie jobs" and it gave me nice output.
[14:36:22] sorry then
[14:36:26] no worries.
[14:37:47] hm, no, the -oozie option needs a value:
[14:37:48] milimetric@hadoop-e-master0:~$ oozie job -oozie -config refinery/oozie/webrequest/sequence_stats/workflow.properties -run
[14:37:48] Invalid sub-command: Missing argument for option: oozie
[14:37:48] use 'help [sub-command]' for help details
[14:37:48] milimetric@hadoop-e-master0:~$ oozie job -oozie http://localhost:8080/oozie -config refinery/oozie/webrequest/sequence_stats/workflow.properties -run
[14:37:48] Error: IO_ERROR : java.net.ConnectException: Connection refused
[14:38:58] Sorry, I was too vague :-( You gave "-oozie" but no value for it. What about not passing "-oozie" at all?
[14:39:06] Then oozie will use the default.
[14:39:27] So something like: oozie job -config refinery/oozie/webrequest/sequence_stats/workflow.properties -run
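The Oozie CLI also honors the OOZIE_URL environment variable, so the -oozie flag can be skipped entirely. A sketch, assuming Oozie's stock port 11000 on the master qchris named above (an assumption, not a verified setting of this cluster):

    # point the CLI at the Oozie server once per shell, instead of per command;
    # 11000 is Oozie's default port -- illustrative, not checked against this cluster
    export OOZIE_URL=http://hadoop-e-master0:11000/oozie
    oozie job -config refinery/oozie/webrequest/sequence_stats/workflow.properties -run
    # list jobs to confirm the submission showed up
    oozie jobs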
[14:42:04] yeah, no i'm an idiot
[14:42:05] it's official
[14:42:38] No. The hadoop & ecosystem CLI tools suck. Really.
[14:42:57] Every tool is a bit different.
[14:43:05] :-(
[14:47:24] hm, well maybe we can help improve them at some point in the distant future, as we've certainly felt the pain
[14:47:39] so - now the job didn't error
[14:47:42] but it's just sitting there hanging
[14:47:46] Sounds great \o/
[14:47:59] Oh wait ... hanging :-D
[14:48:02] that can't be right, i mean there's 2 records in the table :)
[14:48:24] For the period of time we're interested in?
[14:48:35] no, like i put only 2 records in the whole table
[14:48:39] :D
[14:48:59] so something's not working if the job's just taking like 10 minutes to start
[14:49:09] it doesn't show up in `oozie jobs` yet
[14:49:33] That's what we love oozie for.
[14:49:50] yep. ok, it's clear to me I'm not making any progress on this. I'm going to switch to wikimetrics
[14:49:58] Haha :-)
[14:50:00] k.
[14:50:07] thanks for all the great help though
[14:50:17] Hey ... you could help me in return ...
[14:50:25] Can you see instances on:
[14:50:38] https://wikitech.wikimedia.org/wiki/Special:NovaInstance
[14:50:54] All projects have looked empty to me for some hours.
[14:52:26] I guess I'll just ask in the labs channel
[15:08:16] qchris: yes, saw the same thing
[15:08:32] Ok. Thanks.
[15:08:41] ah! you pinged me and I got all excited about being productive again :)
[15:09:00] Oh ... wait, "saw" ... does that mean it's back to normal now for you?
[15:09:32] yes qchris, sorry
[15:09:40] i was trying to read in the labs channel whether you had already figured it out
[15:09:46] i'll answer there for others' benefit
[15:10:11] ok. please do. Thanks.
[15:18:07] oh, gerrit question:
[15:18:27] I referenced a commit the other day from analytics/wikimetrics in an ops/puppet wikimetrics change
[15:18:38] and you reverted it I think and connected them some other way?
[15:18:43] or am I misremembering
[15:19:01] I'm about to do the same thing, and just wanted to ask first
[15:19:03] Sorry ... I do not recall. Do you have urls for me?
[15:19:12] it's not important, but if I have:
[15:19:21] wikimetrics change <-- should reference --> puppet change
[15:19:29] I'm about to just put that in the commit message of each one
[15:19:34] is that bad?
[15:19:53] Not necessarily.
[15:20:23] Using the SHA1 of other commits in a commit message is fine if the things are merged already.
[15:20:33] Then git tools can find them automatically.
[15:20:43] I was going to just use a link to the gerrit change
[15:21:01] But for things that are not yet merged, I'd rather use change numbers, because a rebase changes the SHA1 but does not change the change number.
[15:21:14] ok, so change number but not a link to the change?
[15:21:25] like not https://gerrit.wikimedia.org/r/#/c/142007/ but Change: 142007?
[15:21:30] hellooo
[15:21:33] hi nuria
[15:21:33] No, use the link.
[15:21:40] oh ok, will do, thanks!
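A sketch of the cross-referencing style agreed on here, as the puppet change's commit message might carry it; the subject and wording are illustrative, only the linked wikimetrics change is real:

    Add config setting for wikimetrics recurrent-report throttling

    Companion to the wikimetrics change that consumes this setting:
    https://gerrit.wikimedia.org/r/#/c/142007/

The link embeds the change number, so it stays valid across rebases, which is the property qchris is after.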
[15:21:50] Hi nuria
[15:22:04] hello, i thought milimetric was on va-ca-ti-on
[15:22:21] on holiday, rather
[15:23:00] (PS4) Milimetric: Remove limit on recurrent, add throttling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/142007 (https://bugzilla.wikimedia.org/66841)
[15:23:10] :)
[15:23:54] what is this "vacation"?
[15:24:20] no, just kidding, I wanted to at least try to work on the oozie stuff
[15:24:23] haha
[15:24:25] I failed miserably but I at least tried
[15:24:41] but! nuria: perfect timing
[15:24:48] i can take the oozie stuff or do we need andrew for that
[15:24:49] you said yesterday that "365" was kind of arbitrary
[15:24:50] I agree: https://gerrit.wikimedia.org/r/#/c/142007/
[15:25:09] the last patch adds a configuration setting, and I added it to puppet just now
[15:25:43] k let me look at it
[15:25:49] i think we need andrew, nuria, oozie refuses to accept my job
[15:50:41] mmm.. i get a bunch of integrity errors when trying to create that many reports (100) on vagrant, let me see if i can see what is going on
[15:52:57] I tested batches of 60 earlier today w/o issues though
[15:54:26] weird
[15:54:28] what's the error?
[15:54:49] Problem creating child report: (IntegrityError) (1062, "Duplicate entry '23-2014-06-02 00:00:00' for key 'uix_report'") 'INSERT INTO report (created, user_id, queue_result_key, result_key, status, name, show_in_ui, parameters, public, recurrent, recurrent_parent_id) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)' (datetime.datetime(2014, 6, 2, 0, 0), 1L, None, None, 'PENDING', None, 0,
[15:54:49] '{\n "cohort": {\n "id": 8, \n "name": "wiki", \n "size": 0\n }, \n "metric": {\n "name": "NewlyRegistered", \n "end_date": "2014-06-02 00:00:00", \n "start_date": "2014-06-01 00:00:00"\n }, \n "recurrent": false, \n "public": true, \n "name": "wiki - NewlyRegistered"\n}', 1, 0, 23L)
[16:05:53] yeah, that index is on recurrent_parent_id and created, so it sounds like a real problem. But I don't know how it got in that state so it's hard to tell. Maybe the new logic doesn't properly select the old already-finished runs
[16:06:03] what happens when you:
[16:06:26] select (* minus parameters) from report where recurrent_parent_id = 23;
[16:16:49] milimetric, all reports look to be there ...
[16:17:44] right, so it shouldn't try to run any more, and it sounds like your instance IS trying to run more
[16:17:55] I'm finishing up a quick patch and I'll take a look
[16:20:08] also "Problem creating child report: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout"
[16:20:22] this happens when trying to create ~90 reports on vagrant
[16:20:26] (PS1) Milimetric: Fix test that adds existing tag [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/144159 (https://bugzilla.wikimedia.org/66671)
[16:20:31] Analytics / Wikimetrics: test_add_new_tag_utf8 fails - https://bugzilla.wikimedia.org/67166#c2 (Dan Andreescu) NEW>RESO/INV I couldn't replicate this, and I may have been a bit confused when it was first reported, because of another bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=66671. Eith...
[16:20:40] up to that point all works well
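That QueuePool message matches SQLAlchemy's stock pool limits (pool_size=5, max_overflow=10). A sketch of raising them when building the engine; the DSN and numbers are illustrative, not wikimetrics' actual configuration:

    # a sketch, assuming SQLAlchemy: the defaults pool_size=5 and
    # max_overflow=10 are exactly the "size 5 overflow 10" in the error above
    from sqlalchemy import create_engine

    engine = create_engine(
        'mysql://wikimetrics:secret@localhost/wikimetrics',  # hypothetical DSN
        pool_size=20,      # connections kept open in the pool
        max_overflow=40,   # extra connections allowed under burst load
        pool_timeout=60,   # seconds to wait before the "connection timed out" error
    )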
[16:21:30] huh, ok, so nuria, here are the steps I'll try to reproduce, let me know if I'm missing anything:
[16:21:38] rebuild vagrant
[16:21:44] (hold on let me start that)
[16:22:04] you know, the thing is that dev is a better testing ground
[16:22:41] cause in vagrant the limits are going to be different
[16:23:06] so i would test intervals there (i already created batches of 60 in dev w/o issues)
[16:23:39] i changed the queue.yaml in dev to execute in 'debug' mode (and thus the scheduler runs more often)
[16:24:06] we can try the logic there
[16:26:05] ok, I just got done rebuilding vagrant
[16:26:13] seriously, I have no idea how you guys like this :)
[16:26:23] it triplicates the work I have to do every time I touch it
[16:26:26] anyway
[16:26:40] well, nuria, why don't we test in staging?
[16:26:57] sure, it's the same
[16:27:03] ah, vagrant's broken anyway: Could not parse options: invalid option: --hiera_config=/tmp/vagrant-puppet-1/hiera.yaml
[16:27:08] right? dev and staging are identical virt machines
[16:27:22] well, but you already did some other stuff in dev
[16:27:32] i guess we could just wipe the dbs and reset everything
[16:27:36] oh, do you wanna hangout and do it?
[16:27:40] in this case vagrant will be misleading
[16:27:42] sure
[16:27:51] give me 2 mins
[16:33:35] ok, ready
[16:57:47] Analytics / Wikimetrics: Wikimetrics needs to read config files in /etc/wikimetrics/ rather than files in the local repository by default when setting up the config - https://bugzilla.wikimedia.org/67542 (nuria) NEW p:Unprio s:normal a:None Wikimetrics needs to read yaml config files in /etc/wikim...
[17:42:43] (PS5) Milimetric: Remove limit on recurrent, add throttling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/142007 (https://bugzilla.wikimedia.org/66841)
[17:57:04] (PS6) Milimetric: Remove limit on recurrent, add throttling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/142007 (https://bugzilla.wikimedia.org/66841)
[18:04:01] Analytics / EventLogging: Fix per schema counts in graphite - https://bugzilla.wikimedia.org/67073 (Dan Andreescu) PATC>RESO/FIX
[18:04:16] Analytics / Wikimetrics: Story: CronUser runs EEVS metrics on all projects daily - https://bugzilla.wikimedia.org/65946 (Dan Andreescu) PATC>RESO/FIX
[18:04:30] Analytics / Wikimetrics: Path to recurrent report (from dashboard) should have better semantics - https://bugzilla.wikimedia.org/66087 (Dan Andreescu) PATC>RESO/FIX
[18:08:16] Analytics / Wikimetrics: Story: CronUser backfills NewlyRegistered User for recurring reports - https://bugzilla.wikimedia.org/66841#c3 (Dan Andreescu) PATC>RESO/FIX tested in staging, most everything looking good, but we did find a performance issue we're going to follow up on with: https://bugz...
[18:08:18] Analytics / Wikimetrics: Performance of Recurrent Reports - https://bugzilla.wikimedia.org/67543 (Dan Andreescu) NEW p:Unprio s:normal a:None While testing how many instances of recurring reports we could spawn as part of the daily run, we discovered a lack of understanding around wikimetri...
[18:08:48] hey look at that we're on-track: http://sb.wmflabs.org/t/analytics-developers/2014-06-26/
[19:14:03] magically...
[19:34:38] [travis-ci] develop/f709bb6 (#176 by nuria): The build passed. http://travis-ci.org/wikimedia/limn/builds/29165039