[04:11:03] New patchset: Rfaulk; "fix. pep8" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/71316
[04:11:28] Change merged: Rfaulk; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/71316
[04:23:06] New patchset: Rfaulk; "mod. Assign boolean to is_reverted when no revisions have been made by a user." [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/71317
[04:23:28] Change merged: Rfaulk; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/71317
[12:57:56] hi
[13:17:01] New patchset: Stefan.petrea; "New mobile goes to stat1002 instead of stat1001" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/70929
[17:30:23] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/70929
[17:40:03] milimetric: let me know when you have a chance to chat client side stuff
[17:42:01] will do
[17:42:07] chatting with christian
[17:42:18] milimetric: cool
[17:44:51] ok erosen, all yours
[17:44:55] hangout?
[17:44:58] milimetric: yup
[18:35:35] did you guys even bother to have standup today/
[18:35:36] ?
[18:42:03] tnegrin: yeah, dan, stefan, erik and I did the wall
[18:42:15] cool -- sorry I missed it
[18:42:20] everything ok?
[18:42:28] yup, everything seemed pretty good
[18:42:38] except qchris had some trouble with the hangout
[18:42:48] but dan chatted him on skype
[18:43:16] Yes. Google did not like my machine. Updating it already :-/
[18:44:16] I think that's a good thing these days
[18:44:44] qchris: I had wanted to connect you with Diederik since he's got some tasks he needs to offload
[18:44:50] but it's canada day so he's off
[18:45:15] tnegrin: He told me to think about jars and multiple different versions in hadoop.
[18:45:24] tnegrin: We briefly met :-)
[18:45:49] ok -- that seems reasonable -- it's certainly a problem
[18:45:59] good -- glad you connected
[18:46:25] it's probably useful to d/l a hadoop vm to play with locally
[18:47:39] Any particular suggestion on which image to use?
[18:47:53] cloudera cdh4
[18:48:01] Oh. Sure :-)
[18:49:02] https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
[18:49:56] we use pig and will start using hive a lot more
[18:50:29] impala is interesting but a little too beta right now -- I'd actually like to talk to someone who uses it in production before we try!
[18:53:04] +
[19:38:52] New patchset: Erik Zachte; "Split WikiReportsScripts.pm into ..Html ..Ploticus ..R / Reactivate revert reporting" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71397
[19:38:52] New patchset: Erik Zachte; "Collect revert checksums from sha1's in xml instead of calculating md5's" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71398
[19:38:52] New patchset: Erik Zachte; "fix renaming temp files on partial run" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71399
[19:38:53] New patchset: Erik Zachte; "temp fix to avoid collecting all tranlations on each run, needs further care" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71400
[19:38:53] New patchset: Erik Zachte; "disable monthly ticks on R charts, rendered as small blocks, not essential" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71401
[19:38:53] New patchset: Erik Zachte; "minor tweaks in R chart processing" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71402
[19:38:53] New patchset: Erik Zachte; "Minor tweaks in trace code and report comments (MD5->SHA1)" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71403
[19:40:15] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71397
[19:40:34] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71398
[19:40:51] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71399
[19:50:04] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71403
[19:50:04] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71402
[19:50:04] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71401
[19:50:04] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/71400
[21:46:39] milimetric: bug i filed https://github.com/wikimedia/limn/issues/86
[21:46:50] while setting up a dashboard for James_F
[21:46:51] yep, i saw that YuviPanda
[21:47:02] it was counting ints as NAN
[21:47:10] took out quotes in the source and it's all fine now
[21:47:30] yeah, but I'm not sure it's reasonable to expect Limn to figure out the datatypes like that
[21:47:36] if you put something in quotes, it's a string
[21:47:54] if Limn has to parse that, it has to do so for every cell of every row
[21:47:58] so then that slows it down quite a bit
[21:48:07] well, CSVs and TSVs usually strip quotes before parsing
[21:48:24] that's not a very good idea
[21:48:26] what if someone was trying to quote something explicitly
[21:48:38] or what if some column header has an apostrophe or quote in it
[21:48:44] that's how most languages' default modules work (python, for example)
[21:48:50] they are escaped
[21:48:52] \"
[21:48:59] yeah, i disagree, I've not seen that default
[21:49:15] this is a standard csv/tsv parser too
[21:49:38] well, see for example http://docs.python.org/2/library/csv.html#csv.QUOTE_ALL
[21:49:55] it is buggy then :)
[21:49:59] either way, at least having it documented would help
[21:50:17] csv has tons of options just to figure out how to deal with quotes
[21:50:19] yeah, documentation is good
[21:50:27] but yeah, Limn is not a CSV parser
[21:50:47] well, if it accepts csv it should have one.
[21:50:58] I mean, it's not a flexible parser
[21:50:59] either way, documenting would be good :)
[21:51:04] it won't ever have options or anything like that
[21:51:18] because that's the responsibility of another module, before it gets to Limn
[21:51:25] trust me, it's complicated enough as it is :)
[21:51:42] I'll note that quotes are also part of the RFC
[21:51:43] https://tools.ietf.org/html/rfc4180
[21:51:52] what I would want to do is build a place to cleanse / prepare datasets for Limn
[21:51:59] section 2.5
[21:52:02] that'd be a great place for CSV parsing, etc.
[21:52:11] yeah, but as it stands now using Limn in any form feels very painful :(
[21:52:37] need to dive into code for even simple docs (which nodes do what, what options they have)
[21:53:07] plus surprises like this.
[21:53:25] we're not violating the spec and the spec says nothing about what the program should do with quote-enclosed values
[21:53:29] my point is this
[21:53:53] Limn can't be expected to handle every format that everyone out there wants it to
[21:53:56] err
[21:53:57] that's just not Limn's job
[21:54:22] no I'm just saying that when it says 'it reads CSV' or 'it reads TSV', it should read CSV or TSV
[21:54:44] if not it should at least *say*, 'we support CSVs, but do not quote ints!'
[21:54:52] think about it a second and you'll see that what you're saying is unreasonable
[21:54:59] Microsoft fails at reading quoted values all the time
[21:55:03] for example, long numbers in quotes
[21:55:06] it thinks they're strings
[21:55:20] so if you want them to be numbers you literally have to change it to "=2342342"
[21:55:21] see the second part of my sentence :)
[21:55:37] i agreed with you from the start about documentation
[21:55:48] we are clearly agreeing :)
[21:56:12] I just have a backlog of a hundred things to do before I am allowed to work on Limn documentation
[21:56:30] if you feel strongly about getting it done faster, I can definitely set you up as a contributor
[21:56:31] I understand :)
[21:56:43] apologies if I came across as entitled/grumpy
[21:56:46] not my intention
[21:56:49] no problem :)
[21:57:03] I'm only frustrated that I can't do everything I want
[21:57:08] which I think is a good problem to have :)
[21:57:20] but it does have negative side effects like this
[21:57:20] I think we both are frustrated that you can't do everything you want :D
[21:57:23] yeah
[21:57:47] but yeah, if you'd like to take a shot at documenting this, you're welcome to jump in anytime
[21:58:03] i'm happy to point you around the codebase
[21:58:15] i've this weird aversion to Coco...
[21:58:37] and slight reservations about the entire architecture too (downloading all the tsvs, to the client, for example)
[21:58:39] this is one example of where I'm messing with the built in parser
[21:58:41] https://github.com/wikimedia/limn/blob/master/src/data/dataset/dataset.co#L12
[21:58:42] but I'll take a look when I can
[21:58:45] it had to handle dates more flexibly
[21:58:56] so that line is telling it to handle dates using moment.js
[21:59:07] which is basically the most bad-ass date/time library ever
[21:59:12] I think I'm also confused about what limn exactly is
[21:59:33] ideally, it's just a tool that non-technical users can use to make graphs
[21:59:42] right now it has a couple of serious flaws that prevent it from being that
[22:00:14] and by "make graphs" I mean, people should be able to play with data so as to understand it better
[22:00:26] make correlations, study statistical analysis, etc.
[22:00:40] for now it's just a heavy way to make dashboards
[22:00:53] but the architecture supports some pretty incredible stuff - if we ever get the time to work on it
[22:01:54] milimetric: yeah, but it seems to be currently used just for 'make charts out of CSVs'
[22:02:01] yeah and very heavy for that
[22:02:04] exactly YuviPanda
[22:02:25] so IMO I guess it should decide it is for non-tech users, and go entirely that way...
[22:02:27] we were pretty close to getting Limn in good shape back in February
[22:02:33] but then we were told to stop all work on it
[22:02:44] oh?
[22:02:46] and?
[22:02:46] yep
[22:02:47] why?
[22:02:59] well, we had to demonstrate that we were providing value as a team
[22:03:01] to the organization
[22:03:16] so we did a bit of analysis, took on User Metrics API, and a few other things
[22:03:20] but Limn got left behind
[22:03:49] hmm, so analytics team pivoted from Limn stuff into User Metrics?
[22:03:51] it's pretty close actually, we have to do two main things:
[22:03:54] yep
[22:04:07] 1. change from storing metadata in files to a database
[22:04:15] 2. finish the edit UI
[22:04:36] hmmm
[22:04:36] and 2. is trivial because we're using Knockout so we just need a few simple contextual html forms bound to the properties we're already observing
[22:04:58] so Limn is specifically not going to be built for something like 'I have a bunch of CSVs, and want to make it look pretty!'
[22:05:10] no, that could be one of the use cases
[22:05:14] but that's not its purpose
[22:05:44] but more like 'I am an analyst, want to make charts with data from various sources, so I use limn'
[22:05:50] ideally, it would be: "I have a bunch of CSVs and I want the user to be able to play with all of them at the same time and understand the data"
[22:05:54] hmm
[22:06:16] as an analyst, I'd use Limn to tell a story that people could interact with
[22:06:23] hmm
[22:06:26] so I imagine discussions and annotations going alongside graphs
[22:06:36] and the story of the data unfolding as that happens
[22:06:45] but that being done without having to write code
[22:06:48] is that right?
[22:08:05] just trying to fully understand the rationale behind some of the things Limn does (have all raw data available in JS, for example) that I don't understand yet.
[22:08:50] yep, without writing code is the key part
[22:09:16] does the analytics team have someone working on it full time right now?
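Note on the quoting disagreement above (21:47 to 21:55): the "place to cleanse / prepare datasets" idea could be approximated by a small pre-processing step that runs before a file ever reaches Limn. The sketch below is only an illustration and is not part of Limn; the function name `normalize_csv` and the sample rows are hypothetical. It relies on Python's standard `csv` module, which parses quoted fields along the lines of RFC 4180 and can write them back with quotes only where they are actually needed, so a value such as "42" comes out unquoted.

```python
import csv
import io


def normalize_csv(text):
    """Re-emit CSV text so that only fields that need quotes keep them.

    Hypothetical helper: the reader strips the enclosing quotes while
    parsing, and the writer (QUOTE_MINIMAL) adds quotes back only for
    fields containing a delimiter, a quote character, or a newline.
    """
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Made-up sample: one row quotes its integer, the other does not.
sample = 'date,"edits"\n"2013-07-01","42"\n2013-07-02,17\n'
cleaned = normalize_csv(sample)
print(cleaned)
```

A step like this keeps type guessing out of the charting tool itself: whatever consumes `cleaned` sees the same unquoted integers in every row, while a field that genuinely contains a comma stays quoted.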
[22:28:20] New patchset: Milimetric; "got knockout and wtforms communicating well" [analytics/wikimetrics] (master) - https://gerrit.wikimedia.org/r/71542
[22:28:47] Change merged: Milimetric; [analytics/wikimetrics] (master) - https://gerrit.wikimedia.org/r/71542
[22:29:06] hey erosen, it works ^
[22:29:07] :)
[22:29:14] stupid mistake like I thought
[22:29:21] awesome
[22:29:22] so the $.ajax call was a bit black magic
[22:29:29] i see
[22:29:40] but the simple conclusion there is: pass it a JS object and it'll figure out what to do
[22:29:50] the problem I kept running into was self-inflicted of course
[22:29:50] great
[22:29:57] the BetterBooleanField was working properly
[22:30:00] as usual
[22:30:01] awesome
[22:30:04] but in our form_for_configure.html
[22:30:11] I was doing if f.type == 'BooleanField'
[22:30:24] so I just had to add or f.type == 'BetterBooleanField'
[22:30:31] and all is well
[22:30:36] hehe
[22:30:37] nice
[22:30:47] milimetric: one more q before you leave:
[22:31:00] so that's all in decent shape now and your knockout model should be all set to submit to /jobs/create
[22:31:02] did you run into the UnboundField error at all?
[22:31:09] no
[22:31:11] what does it say
[22:31:16] milimetric: great, I'll work on plugging that together
[22:31:41] basically the new field type that I declared is considered an instance of an UnboundField
[22:32:00] so that when the metric tries to use the namespaces property, it fails with something like:
[22:32:01] TypeError: 'UnboundField' object is not iterable
[22:34:01] silly question: is that field being declared inside a wtf.Form child?
[22:34:20] sounds like for whatever reason it thinks it's not
[22:34:49] once we figure out how to do this, I'm going to go on a spike to figure out how to test it all
[22:35:04] nice
[22:35:05] we probably need to test these forms under WebTest
[22:35:05] it is not
[22:35:10] i guess I could do that
[22:35:12] perhaps that would be easier
[22:35:18] to declare it inside the form
[22:35:20] is that what you did?
[22:35:31] wait, the class or the field?
[22:35:34] no; hmmm
[22:35:37] the field definitely has to be inside the form
[22:35:43] the class should be free to roam wherever it pleases
[22:35:44] oh, the field
[22:35:47] yeah, just not the form
[22:35:51] not the Field class
[22:35:54] yeah
[22:36:10] milimetric: check this out: http://wtforms.simplecodes.com/docs/0.6/fields.html
[22:36:11] we probably should make like a wtforms_utils.py or something
[22:36:15] and add all our nice classes there
[22:36:16] and ctrl-f for "UnboundField"
[22:36:20] yeah
[22:36:34] oh ok
[22:36:48] but if you stick it inside a form, it should pick those things up automagically right?
[22:37:09] yeah, that's what I thought
[22:37:16] i guess just make sure to call the relevant super methods to ensure its automagic is free to happen
[22:37:50] milimetric: i'm not sure where the form instance should appear though
[22:38:32] ok, hangout
[22:38:34] :)
[22:38:38] oh but you dropped out
[22:38:43] yeah
[22:38:46] i'm on my way
[22:43:20] milimetric: hmm, looked through the code - seems to be very little activity right now (~2 commits in last 2 months?). I'll poke at it and see what I can do :) But thanks for clarifying the intended purpose of Limn, though - I think a lot of my frustrations were because of me trying to use Limn for something it wasn't intended for
[22:45:28] np YuviPanda, and yeah, I'm surprised there's even that much activity :)
[22:45:34] actually, be sure to use the develop branch though
[22:45:38] master hasn't been updated in a while
[22:45:47] (longer story I don't have time for now)
[22:45:53] hehe :)
[22:46:29] but I'll see if Limn is really the tool I want. My use case (have csv, want graph) seems to be different than Limn's. Maybe I can knock up some quick things that'll run csvs through matplotlib.
[22:46:38] milimetric: and we should catch up on the longer stories sometime :)
[22:46:54] sure :)
[22:47:01] are you coming to Wikimania?
[22:49:26] i'm not YuviPanda, sadly
[22:49:31] :(
[22:49:38] let's hope I make it for allstaff then :)
[22:49:47] I was in Amsterdam and I thought we could only do one or the other :)
[22:49:53] that'd be awesome
[22:50:00] allstaff is a time for jubilation and Limn stories :)
[22:50:08] k, nite everyone, see y'all tomorrow
[23:04:29] milimetric: :D nite!
[23:43:48] YuviPanda: btw having a csv and wanting to turn it into a chart is exactly the purpose of Limn
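Note on the UnboundField error discussed earlier (22:31 to 22:37): the log never records the actual fix, so the following is only a guess at the mechanism behind "TypeError: 'UnboundField' object is not iterable". It is a simplified, pure-Python model of the unbound/bound field pattern, not wtforms source code, and every name in it (`NamespaceListField`, `MetricForm`, the toy `Field` and `Form` classes) is hypothetical. The idea it illustrates: declaring a field at class level yields only a placeholder, and a real field object with data exists only once a form instance has bound it.

```python
class UnboundField:
    """Placeholder that remembers how to build a field later."""

    def __init__(self, field_class, *args, **kwargs):
        self.field_class = field_class
        self.args = args
        self.kwargs = kwargs

    def bind(self, form, name):
        # Passing _form makes Field.__new__ build a real instance.
        return self.field_class(*self.args, _form=form, _name=name, **self.kwargs)


class Field:
    """Toy base field: unbound when declared, bound by a form instance."""

    def __new__(cls, *args, **kwargs):
        if "_form" in kwargs:
            return super().__new__(cls)
        # Declared outside a form instance: hand back a placeholder.
        return UnboundField(cls, *args, **kwargs)

    def __init__(self, label="", _form=None, _name=None):
        self.label = label
        self.name = _name
        self.data = None

    def process(self, raw):
        self.data = raw


class NamespaceListField(Field):
    """Hypothetical field holding a comma-separated list of integers."""

    def process(self, raw):
        self.data = [int(part) for part in raw.split(",") if part.strip()]

    def __iter__(self):
        return iter(self.data)


class Form:
    """Toy form: binds the placeholders declared directly on its class."""

    def __init__(self, **raw):
        for name, attr in vars(type(self)).items():
            if isinstance(attr, UnboundField):
                field = attr.bind(self, name)
                field.process(raw.get(name, ""))
                setattr(self, name, field)


class MetricForm(Form):
    namespaces = NamespaceListField("Namespaces")


# The class attribute is still a placeholder and cannot be iterated.
print(type(MetricForm.namespaces).__name__)

# A form instance holds the bound field, which can be iterated.
form = MetricForm(namespaces="0, 4")
print(list(form.namespaces))
```

In this model, code that reads `namespaces` from the class (or from an object that never went through form initialisation) gets the placeholder, which matches the symptom in the log; reading it from an initialised form instance gets the bound field.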