[08:59:08] (CR) QChris: [C: 2] Tox configuration to run flake8 (python linter) [analytics/geowiki] - https://gerrit.wikimedia.org/r/134334 (owner: Hashar)
[08:59:18] \O/
[08:59:37] qchris: there are a few more patches for the geowiki and wp-zero repos. I haven't actually tested the code though
[08:59:47] Yes. Finally getting to them.
[08:59:53] Sorry it took me that long :-(
[09:01:46] well, those changes are not going to change the face of the world, and I don't qualify them as urgent :D
[09:03:26] (CR) QChris: [C: 2] Lint setup.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134339 (owner: Hashar)
[09:05:12] (CR) QChris: [C: 2] Replace git URL to git.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/134340 (owner: Hashar)
[09:09:51] (CR) QChris: [C: 2] Lint geowiki/process_data.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134347 (owner: Hashar)
[09:12:31] (CR) QChris: [C: 2] Lint geowiki/mysql_config.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134342 (owner: Hashar)
[09:13:56] (CR) QChris: [C: 2] Lint geowiki/geo_coding.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/134345 (owner: Hashar)
[09:17:16] bah, they don't merge
[09:17:18] Zuul is locked
[09:17:19] hashar: the Zuul page looks unusually busy ... https://integration.wikimedia.org/zuul/
[09:17:21] fixing it
[09:17:30] Oh, same thought :-)
[09:17:31] Thanks
[09:17:39] which gives me an opportunity to debug it
[09:17:51] hmm no
[09:22:30] (CR) QChris: [C: 2] Tox configuration to run flake8 (python linter) [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134314 (owner: Hashar)
[09:22:54] (CR) QChris: [C: 2] Lint: allow commenting of code [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134320 (owner: Hashar)
[09:42:29] bah
[09:42:37] jenkins-bot is not allowed to submit patches :-D
[09:43:08] That's something I can change :-)
[09:43:33] yeah, it is not listed at https://gerrit.wikimedia.org/r/#/admin/projects/analytics,access
[09:43:41] can't remember the exact group name though
[09:44:03] qchris: JenkinsBot is the Gerrit group
[09:44:14] basically me and Zuul ( https://gerrit.wikimedia.org/r/#/admin/groups/14,members )
[09:44:55] (CR) QChris: "recheck" [analytics/wp-zero] - https://gerrit.wikimedia.org/r/134314 (owner: Hashar)
[09:45:05] qchris: that will just rerun the jobs
[09:45:08] not actually merge them
[09:45:20] So I Submit them by hand
[09:45:31] I could potentially make it so that "recheck" submits the jobs if they have been +2ed, but I am not sure how to handle the Zuul conf to do that
[09:45:32] And see if it works for the new changes?
[09:45:36] so yeah, manual submit would do
[09:45:41] Ok.
[09:46:39] all geowiki ones are in :)
[09:46:53] Thanks!
[09:47:08] The two that are remaining from the wp-zero ones will need more time.
[09:47:14] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [analytics/geowiki] - https://gerrit.wikimedia.org/r/136293
[09:47:41] (The code we use in production there is ahead of master, and I'll have to merge them in)
[09:48:24] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [analytics/geowiki] - https://gerrit.wikimedia.org/r/136293 (owner: Hashar)
[09:48:52] yeah, that is fine
[09:48:58] I'll clean up the repos and make flake8 pass.
[09:49:00] I will be happy to rebase the wp-zero ones on top of the prod one
[09:49:16] you can ignore some flake8 checks by editing setup.cfg
[09:49:23] ex:
[09:49:23] ; E265 block comment should start with '# '
[09:49:23] ; E501 line too long (X > 79 characters)
[09:49:23] ignore = E265,E501
[09:49:28] some are rather annoying
[09:49:32] Yup. I saw that one.
[09:49:33] Thanks.
[09:49:49] the reason I push tox / flake8 is to get people used to tox and to running the command
[09:50:08] :-) It certainly helped me.
[09:50:15] hoping that it will create some good habits and eventually get devs to run/write tests
[09:50:39] That would be great.
[09:51:19] one of your repos (I think it is wikimetrics) depends on having a redis server though :-(
[09:51:30] Right.
[09:52:19] What do you suggest to work around that?
[09:52:28] (Tests also need databases)
[09:54:15] I have no clue :-]
[09:54:27] maybe have the test suite boot a redis server in def setUp()
[09:54:33] or mock the responses
[09:54:44] That's music to my ears :-)
[09:55:02] yeah, you taught me about mocking a while back
[09:55:17] Hah :-D
[09:55:20] python has a rather nice mocking library
[09:55:46] Only ... we hardly use it (at least in analytics)
[09:55:54] Having mostly legacy systems around ...
[09:56:02] those repos are a bit short on testing.
[09:59:03] legacy legacy :-(
[10:49:01] (PS1) QChris: Lint scripts/restore_from_files.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136297
[10:49:03] (PS1) QChris: Lint scripts/make_limn_files.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136298
[10:49:05] (PS1) QChris: Lint geowiki/wikipedia_projects.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136299
[10:49:07] (PS1) QChris: Lint geowiki/mysql_config.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136300
[10:49:09] (PS1) QChris: Lint geowiki/geo_coding.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136301
[10:49:12] (PS1) QChris: Remove unused geowiki/format_output.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136302
[12:07:25] (CR) Hashar: [C: 1] "Straightforward :)" [analytics/geowiki] - https://gerrit.wikimedia.org/r/136297 (owner: QChris)
[12:08:52] (CR) Hashar: [C: 1] Lint scripts/make_limn_files.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136298 (owner: QChris)
[12:09:40] (CR) Hashar: [C: 1] Lint geowiki/wikipedia_projects.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136299 (owner: QChris)
[12:09:59] (CR) Hashar: [C: 1] Lint geowiki/mysql_config.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136300 (owner: QChris)
[12:10:36] (CR) Hashar: [C: 1] "Bye bye unused imports and variables." [analytics/geowiki] - https://gerrit.wikimedia.org/r/136301 (owner: QChris)
[12:17:32] (CR) Hashar: [C: 1] Remove unused geowiki/format_output.py [analytics/geowiki] - https://gerrit.wikimedia.org/r/136302 (owner: QChris)
[14:01:31] ottomata: Standup :-)
[14:01:33] on it
[14:01:40] i'm clicking buttons ALREADY
[14:07:10] * hashar listens to button clicks
[14:09:09] you can't listen to them anymore. they are flat
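[Editor's note: the ignore lines hashar pastes at 09:49 typically live in setup.cfg under a [flake8] section. Below is a minimal sketch of the mocking approach he suggests, so the wikimetrics test suite would not need a live redis server. The module path 'wikimetrics.queue.redis_client' is hypothetical, invented for illustration; the real attribute to patch depends on how the code under test imports its redis client. On the Python 2 of the day this was the standalone `mock` package; it became `unittest.mock` in Python 3.3.]

    import unittest
    from unittest import mock  # standalone `mock` package on Python 2

    class ReportTest(unittest.TestCase):
        def setUp(self):
            # Patch the (hypothetical) redis client where the code under
            # test looks it up, instead of booting a real redis server.
            patcher = mock.patch('wikimetrics.queue.redis_client')
            self.redis = patcher.start()
            self.addCleanup(patcher.stop)
            self.redis.get.return_value = b'42'

        def test_reads_cached_value(self):
            # Whatever consults the redis client now sees the canned value.
            self.assertEqual(self.redis.get('report:1'), b'42')

    if __name__ == '__main__':
        unittest.main()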
[15:22:09] (PS1) Milimetric: Update column types for logging table [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136328 (https://bugzilla.wikimedia.org/65944)
[15:26:47] (PS2) Milimetric: Update column types for logging table [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136328 (https://bugzilla.wikimedia.org/65944)
[15:35:04] so qchris, milimetric, python question
[15:35:13] ya
[15:35:24] subprocess.Popen
[15:35:31] DOES behave differently with lists vs strings
[15:35:33] right?
[15:35:38] when you use pipes in your command
[15:36:11] no idea, testing
[15:36:20] Pipes in subprocess are a problem to me, as you cannot detect when one of the commands in the pipeline failed.
[15:36:59] But what kind of different behaviour are you talking about?
[15:37:19] https://gist.github.com/ottomata/c11b4aea1bcb933f381e
[15:37:55] btw ottomata, if you want to see details on something in ipython, add ??
[15:37:56] like:
[15:37:56] subprocess.Popen.__init__??
[15:38:13] ha, cool
[15:39:28] double quoting around sed's expression?
[15:39:38] (line 2)
[15:45:00] ottomata: You're using shell=True
[15:45:01] then:
[15:45:05] "If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself."
[15:45:11] naw, i have to quote it, it's a string
[15:45:21] hmmmm
[15:45:23] lemme try
[15:45:26] That's why the "echo" is treated as the command to run.
[15:45:34] And "hi" is a parameter to the shell.
[15:45:55] naw, same result
[15:46:21] Not for me :-)
[15:46:29] In [36]: p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
[15:46:29] In [37]: stdout, stderr = p.communicate()
[15:46:29] In [38]: stdout
[15:46:30] Out[38]: '\n'
[15:46:33] oops
[15:46:38] and
[15:46:38] command = ['/bin/echo', 'hi', '|', '/usr/bin/env', 'sed', '-e', 's@hi@wee@']
[15:46:40] right?
[15:46:47] Ok.
[15:46:56] That should give an empty line.
[15:47:06] (When using "shell = True")
[15:47:21] In [39]: p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
[15:47:21] In [40]: stdout, stderr = p.communicate()
[15:47:21] In [41]: stdout
[15:47:21] And it gives the empty line for me.
[15:47:22] Out[41]: 'hi | /usr/bin/env sed -e s@hi@wee@\n'
[15:47:28] Right.
[15:47:37] Looks correct to me.
[15:47:52] Popen escapes the pipe.
[15:47:52] ok, so the answer is that Popen does treat list and string commands differently
[15:48:22] the only way I could get Popen to do pipes the way I wanted was to use a string command and to set shell=True
[15:48:39] That's why my review asks to not use pipes.
[15:48:52] pipes in Popen are no good.
[15:49:01] i like most of those; i don't think I want to change how sh works, because i think you should be able to use pipes if you really want to
[15:49:07] but i think for most of your comments we can avoid it
[15:49:09] gonna work on it
[15:49:44] Let me boot the Google machine :-)
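[Editor's note: the Popen exchange above, consolidated into one runnable sketch. It restates what was pasted in-channel, plus the list-with-shell=True behaviour quoted from the docs.]

    import subprocess

    cmd_str = "/bin/echo hi | /usr/bin/env sed -e 's@hi@wee@'"
    cmd_list = ['/bin/echo', 'hi', '|', '/usr/bin/env', 'sed', '-e', 's@hi@wee@']

    # String + shell=True: a shell parses the command, so the pipe works
    # and this prints 'wee'.
    p = subprocess.Popen(cmd_str, stdout=subprocess.PIPE, shell=True)
    print(p.communicate()[0])

    # List + shell=False: no shell is involved; '|' and the sed words are
    # passed to echo as literal arguments, so this prints
    # 'hi | /usr/bin/env sed -e s@hi@wee@'.
    p = subprocess.Popen(cmd_list, stdout=subprocess.PIPE, shell=False)
    print(p.communicate()[0])

    # List + shell=True: per the docs quoted above, the first item is the
    # command string and the remaining items become arguments to the shell
    # itself, so effectively only '/bin/echo' runs and this prints an
    # empty line.
    p = subprocess.Popen(cmd_list, stdout=subprocess.PIPE, shell=True)
    print(p.communicate()[0])

[This also shows why qchris's review discourages pipes: with a single shell=True string, Popen only sees the pipeline's final exit status, so a failure in an earlier command of the pipeline goes undetected.]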
[17:41:34] ottomata, ping
[17:42:40] hiyaa
[17:43:05] ottomata, so, I'm getting claims that the JSON SerDe can't be found from hive.
[17:43:12] "java.io.FileNotFoundException: File does not exist: /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar"
[17:43:21] is it as simple as 'we upgraded to a newer version', or...?
[17:43:47] oooo, i merged a change that qchris made today to make that easier
[17:43:48] lemme see
[17:44:01] aha
[17:44:34] also, the heapsize increase + remembering to limit the partitions it's searching over worked like a charm. Would you like to send an email out about it or should I? I feel like people should probably know how to un-bork it when it borks.
[17:44:50] hmm, ok, i'm not getting that error at all
[17:44:52] how are you getting it?
[17:45:07] hive, ADD JAR /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar;
[17:45:24] and then, well, any query I've tried, both one of my select-random-entries things and a very limited COUNT(*)
[17:45:24] Added resource: /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar
[17:45:32] aha!
[17:45:36] you have to run a query for it to bork ;p
[17:45:36] no i mean
[17:45:38] its working for me
[17:45:38] ok
[17:45:45] try SELECT COUNT(*) FROM wmf.webrequest WHERE year = 2014 AND month = 05 AND day = 14 AND hour = 12;
[17:45:46] i'm running
[17:45:47] show create table
[17:45:48] kk
[17:45:52] which usually borks if it isn't loaded
[17:45:53] try that
[17:45:57] show create table webrequest;
[17:46:10] that works fine.
[17:46:13] ok, running count
[17:46:34] i'm also limiting to webrequest_source='mobile'
[17:46:36] for even smaller data
[17:46:40] kk
[17:46:45] and it runs?
[17:46:48] so far..
[17:46:55] yup, runs fine
[17:47:02] okay, that's just bloody weird.
[17:47:04] on stat2?
[17:47:07] stat1002, ja
[17:47:08] real quick
[17:47:10] try it
[17:47:11] (probably doesn't make a difference, but...)
[17:47:15] without adding any add jar command
[17:47:28] hmm, that failed for me
[17:47:29] !
[17:47:34] but it shouldn't
[17:47:53] ottomata, what fail message?
[17:47:59] java.io.FileNotFoundException: File does not exist: /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar
[17:48:03] that's without adding the jar at all
[17:48:09] yup, same error both times.
[17:48:12] (for me)
[17:48:25] would be expected, but qchris's change is supposed to make that better
[17:48:25] hmmmm
[17:48:45] can you run your count query again?
[17:48:47] with jar added
[17:48:51] did do, borks, same error.
[17:48:57] just now you did?
[17:49:03] This is just silly. Sorry for producing something non-replicatable
[17:49:03] yep.
[17:49:14] hmmm
[17:49:14] third time fer testing, same problem.
[17:49:21] ok, i'm going to su to you and try things
[17:49:31] suuuu tooo youuu
[17:50:52] hmmm
[17:50:58] ok Ironholds, not sure why add jar doesn't work
[17:50:59] that is weird
[17:51:00] but
[17:51:01] if you do
[17:51:07] --auxpath /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar
[17:51:09] on your hive cli
[17:51:11] yup
[17:51:12] instead of doing add jar
[17:51:13] that works
[17:51:18] hive --auxpath /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar
[17:51:26] da. and then -f yada yadda.
[17:51:27] Thanks :)
[17:51:32] but still
[17:51:34] add jar should work
[17:51:35] AND
[17:51:39] it should now work without even specifying that!
[17:51:40] grrr
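[Editor's note: for scripted use, the --auxpath workaround above fits the list-style Popen convention from the earlier discussion. A minimal sketch, using the jar path quoted in-channel and an illustrative query; check_call is just one convenient wrapper.]

    import subprocess

    JAR = '/usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar'
    QUERY = ('SELECT COUNT(*) FROM wmf.webrequest '
             'WHERE year = 2014 AND month = 05 AND day = 14 AND hour = 12;')

    # List form with shell=False: no shell quoting pitfalls, and hive picks
    # up the SerDe jar via --auxpath instead of an in-session ADD JAR.
    subprocess.check_call(['hive', '--auxpath', JAR, '-e', QUERY])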
[17:52:10] I should get a new set of business cards.
[17:52:15] Oliver Keyes, Headacher of Ops
[17:52:36] Breaks Things Faster than a Speeding Patch
[17:52:58] Leaps Reproducible Bugs in a Single Bound
[17:53:12] ha
[17:53:22] unrelated to the bug itself: thanks for the work with qchris on removing the ADD JAR need :)
[17:53:32] the simpler we can make it to run hive queries, the more people will use it.
[17:53:35] yeah
[17:53:36] Wait, people will use it. Crap.
[17:53:41] haha
[17:53:42] My stuff will run so slow!
[17:53:51] yeah, actually, hue in cdh5 might just actually work for us
[17:53:56] so there might be a web ui to hive...
[17:57:53] well, the query is still failing, but for different reasons. progress!
[17:58:20] and the task page 404s
[17:58:21] * Ironholds sighs
[18:42:35] (CR) Ottomata: Add code to auto-drop old hive partitions and remove partition directories (40 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/135128 (owner: Ottomata)
[18:47:22] (PS5) Ottomata: Add code to auto-drop old hive partitions and remove partition directories [analytics/refinery] - https://gerrit.wikimedia.org/r/135128
[19:06:13] Ironholds: I just tested a simple query and it works both on stat1002 and analytics1010, without the ADD JAR and without --auxpath
[19:06:23] Which query are you using to make it misbehave?
[19:06:33] (i.e.: not finding the jar)
[19:06:37] qchris, SELECT COUNT(*) FROM wmf.webrequest WHERE year = 2014 AND month = 05 AND day = 14 AND hour = 12;
[19:06:45] Thanks.
[19:06:46] see above - otto couldn't replicate it, but su'd in as me and could.
[19:06:50] it's very strange.
[19:07:00] Still, the --auxpath trick worked, so it's resolved from a user POV.
[19:07:48] Ironholds: That query fails for me as well.
[19:08:07] Could you try 'select * from wmf.webrequest where year=2014 and month=5 and day=30 and hour=1 limit 1;'
[19:08:29] (I just want to rule out that something is broken with your account)
[19:08:31] that worked for me!
[19:08:38] but count(*) didn't!
[19:09:06] qchris, that works fine.
[19:09:10] Ok.
[19:09:58] ottomata: Could you put the jar into dfs?
[19:10:04] (I lack permission to do so)
[19:10:20] mkdir: Permission denied: user=qchris, access=WRITE, inode="/":hdfs:hadoop:drwxr-xr-x
[19:11:26] you can put it anywhere,
[19:11:29] why put it at /?
[19:11:31] where do you want it?
[19:11:35] you can put it in /user/qchris
[19:11:41] I wanted to put it at /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar
[19:11:53] And /usr does not exist.
[19:11:58] So I wanted to mkdir /usr
[19:12:14] And I cannot mkdir, because I lack write on /
[19:14:06] (My theory is that the plain 'select ... limit 1' does not spawn a map reduce job, hence can be done locally, but the 'select count(*) ...' does need a map reduce job, and wants the file in hdfs)
[19:18:00] hmmmm
[19:18:12] ok, we can test, but we can't keep that file there
[19:18:14] ok qchris?
[19:18:33] ok.
[19:19:05] ok, its there
[19:19:06] try it
[19:19:18] Trying...
[19:19:35] Job is starting ...
[19:19:41] Job is running...
[19:19:49] So it looks like that really was the issue.
[19:19:55] (PS1) Milimetric: Fix datetime parsing problem [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136429
[19:19:57] (PS1) Milimetric: Clean up errant print [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136430
[19:19:59] (PS1) Milimetric: Add Newly Registered Users metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136431 (https://bugzilla.wikimedia.org/65944)
[19:20:32] (CR) jenkins-bot: [V: -1] Add Newly Registered Users metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136431 (https://bugzilla.wikimedia.org/65944) (owner: Milimetric)
[19:20:39] Can we put the jar file in $some_well_known_place in hdfs?
[19:23:04] (PS2) Milimetric: Add Newly Registered Users metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136431 (https://bugzilla.wikimedia.org/65944)
[19:24:17] ok qchris
[19:24:19] yes we can
[19:24:23] and actually need to for oozie stuff
[19:24:23] hmm
[19:24:42] heheh
[19:24:43] /wmf/kraken/artifacts/hcatalog-core-0.5.0-cdh4.3.1.jar
[19:24:45] it's already there :)
[19:25:07] so the hive-site setting is for hdfs then?
[19:28:45] It's for both, I guess?
[19:28:53] I just tried on my labs cluster.
[19:29:01] and forcing the local file system works.
[19:29:09] So I'll prepare a patch for that.
[19:29:38] Or do you prefer to force hdfs?
[19:29:48] (I guess we have to force one of the two)
[19:30:20] (PS1) Milimetric: Fix invalid users display for invalid cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136434
[19:33:32] ah, file:/// qchris?
[19:33:35] hmmmmm
[19:33:36] Yes.
[19:33:41] i think forcing local is better, for now
[19:33:58] it will always be there with cloudera
[19:34:02] the hdfs path, we have to put it there
[19:34:06] and make sure it's there
[19:34:11] so in labs, vagrant, etc.
[19:34:13] extra step
[19:34:36] Ok: https://gerrit.wikimedia.org/r/136435
[19:54:36] Ironholds: ottomata merged the fixup for the auxpath thing.
[19:54:50] Could you try your query again (without auxpath or ADD JAR)?
[19:55:08] (It worked for me, but I wanted to double check that it works for you as well)
[22:58:50] DarTar: hey. csteipp told me you were asked to do analytics regarding some TLS cipher changes, which he talks about in his 3rd comment on https://gerrit.wikimedia.org/r/#/c/132393/1/templates/nginx/nginx.conf.erb
[23:00:35] is the necessary data being collected and available for later analysis, so it doesn't block deployment of that patch?
[23:04:40] jzerebecki: yes, csteipp reached out, but this has been a pretty low priority for my team so far; I don’t have an answer to your question yet
[23:16:26] DarTar: are there any timings, along the path up to the user being able to start reading a page, that are continuously monitored for regressions, or at least observed from time to time?
[23:18:51] no, sorry – I don’t have an ETA yet, but I’ll bring this up with tnegrin so we can see how we can help
[23:20:51] jzerebecki: if I understand csteipp’s request, we’re already collecting the data that is needed, so that should not affect the deployment in any way
[23:23:25] DarTar: nice. please comment to that effect on https://gerrit.wikimedia.org/r/#/c/132393/
[23:23:37] sure
[23:55:52] thank you :)
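[Editor's postscript on the SerDe fix: the change qchris merged at 19:34 (https://gerrit.wikimedia.org/r/136435) forces the local file system by giving the aux-jar path an explicit file:/// scheme in hive-site.xml, so map/reduce tasks stop looking the jar up in HDFS. A hedged sketch of testing that scheme by hand per invocation; --hiveconf and hive.aux.jars.path are real Hive options, but whether an override of this particular property takes effect early enough varies by Hive version, and the merged fix set it permanently in hive-site.xml instead.]

    import subprocess

    # Hypothetical one-off test: pass the file:///-prefixed aux-jar path
    # for a single hive invocation and rerun the query that used to fail.
    subprocess.check_call([
        'hive',
        '--hiveconf',
        'hive.aux.jars.path=file:///usr/lib/hcatalog/share/hcatalog/'
        'hcatalog-core-0.5.0-cdh4.3.1.jar',
        '-e',
        'SELECT COUNT(*) FROM wmf.webrequest '
        'WHERE year = 2014 AND month = 05 AND day = 14 AND hour = 12;',
    ])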