[03:31:31] DarTar: Hey! o/
[03:31:50] hello
[03:32:14] DarTar: Do you have a minute to discuss some Popups related stuff?
[03:32:24] sure, what's up
[03:32:27] DarTar: Also, did you get a chance to look at the Trello board?
[03:33:01] I saw the thread, but haven't looked into it in detail
[03:33:07] 1 sec
[03:35:14] back, looking up the thread
[03:35:32] and hi btw, I don't think we've ever talked on IRC :)
[03:39:14] prtksxna: a couple of comments,
[03:39:24] do you want me to reply here or on trello?
[03:41:59] prtksxna: I'll go ahead and post on trello, not sure how long I'll be around tonight
[18:33:56] * halfak is pained by the fact that he can't join log data to non-enwiki dbs. :(
[18:53:10] * milimetric feels halfak's pain and wishes he could grow more pairs of hands
[18:53:36] how are you doing it now halfak?
[18:53:38] or not at all?
[18:54:33] Hi milimetric. Right now, I run scripts that grab data from other wikis, store the data temporarily as a TSV file on stat1 and then use mysqlimport to put it on the "staging" db next to the "log" db so that I can query together.
[18:54:58] As you can imagine, this is painful when I want to write reports that update regularly.
[18:55:10] grr
[18:57:25] halfak: as usual there's no easy answer but there might be a hack if two things are true:
[18:57:25] * average is wondering what the join condition is in this case
[18:57:36] 1. the data from EL is shareable publicly
[18:57:53] 2. the data from labsdb is sufficient to do the analysis
[18:58:08] be them things true?
[19:01:10] halfak: what do you join on?
[19:01:35] sorry guys, brb
[19:34:54] [travis-ci] master/daca84a (#163 by milimetric): The build passed. http://travis-ci.org/wikimedia/limn/builds/18752686
[19:35:51] gr, k, sorry if i missed anything halfak
[20:14:54] milimetric, sorry, you didn't. Just finishing up a meeting.
[20:15:14] oh good, no prob, i'm around
[20:15:23] milimetric, sadly, this EL stuff could not be made public.
[20:15:29] k
[20:15:47] It's instrumented button clicks. Not terribly sensitive, but we said that we'd keep it private.
[20:15:48] well, we're talking about how to sanitize EL data and bake in hints to the schemas themselves
[20:16:22] Yeah. I think this is an example of a schema that wouldn't work for it. :\
[20:16:36] PageCreation, on the other hand, is a great schema for sharing :)
[20:16:39] like field: { name: ip, anonimizeBy: removing }
[20:16:52] or field: { name: browser, anonimizeBy: aggregating }
[20:17:16] but yeah, that's nothing that'll help you now, sorry :(
[20:27:45] I like the hint idea. I think capturing it in the schema is a great idea too. I was discussing a similar thing with dario re. indexing.
[21:26:25] the thing I like about capturing the anonymization settings in the schema, halfak, is that it makes it public and accessible. I think we should still be very careful about automating anonymization tools, but this would be a nice first step
[21:27:08] Totally agree on that point. I'm not as worried about "automated" when it comes from human-generated documentation like this.
[21:27:23] If we were "detecting" fields that ought to be anonymized then I would be more worried.
[21:28:09] It wouldn't be a bad idea to have a little bit of structure in place for reviewing these kinds of things too. I'm imagining something like code review.
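Note on the schema hints discussed above: the `anonimizeBy` idea (at 20:16:39 and 20:16:52) was only floated in chat and no implementation is described in the log. The following is a minimal Python sketch under stated assumptions, not the team's design. The `SCHEMA` layout, the `sanitize` function, the `action` field, and the exact meaning of each hint are hypothetical: `removing` is taken to mean "drop the field from every event", and `aggregating` is taken to mean "drop the per-event value and publish only counts per value". Fields that the schema does not mention are dropped, as a conservative default.

```python
from collections import Counter

# Hypothetical schema carrying human-written sanitization hints, mirroring
# the chat's `field: { name: ip, anonimizeBy: removing }` notation.
SCHEMA = {
    "fields": [
        {"name": "ip", "anonimizeBy": "removing"},
        {"name": "browser", "anonimizeBy": "aggregating"},
        {"name": "action"},  # illustrative field with no hint: kept as-is
    ]
}


def sanitize(events, schema):
    """Apply the schema's hints to a list of event dicts.

    Returns (rows, aggregates): rows hold only the un-hinted fields, and
    aggregates maps each "aggregating" field to a Counter of its values.
    """
    hints = {f["name"]: f.get("anonimizeBy") for f in schema["fields"]}
    unknown = set(hints.values()) - {None, "removing", "aggregating"}
    if unknown:
        raise ValueError(f"unknown anonimizeBy hints: {sorted(unknown)}")

    aggregates = {n: Counter() for n, h in hints.items() if h == "aggregating"}
    rows = []
    for event in events:
        row = {}
        for name, value in event.items():
            # Assumption: anything not declared in the schema is removed.
            hint = hints.get(name, "removing")
            if hint == "aggregating":
                aggregates[name][value] += 1
            elif hint is None:
                row[name] = value
            # hint == "removing": the value is simply not copied
        rows.append(row)
    return rows, aggregates
```

Because the hints live in the schema rather than in the sanitizing code, they can be read and reviewed by people before anything is published, which matches the point made at 21:26:25 and 21:28:09.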
[21:32:22] (PS2) Milimetric: [WIP] Run recurring reports using the scheduler [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/112165
[21:32:24] (PS2) Milimetric: Allow reports to be rerun [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/111914
[21:56:02] zz_prtksxna: I posted a detailed reply on Trello, I need your input on this before we can move on and finalize the schema
[21:56:04] (PS1) Diederik: Output correct Hue tunnel address on console [analytics/kraken] - https://gerrit.wikimedia.org/r/113013
[21:57:52] (CR) Ottomata: [C: 2 V: 2] ":)" [analytics/kraken] - https://gerrit.wikimedia.org/r/113013 (owner: Diederik)
[22:14:43] code review would be awesome
[22:14:47] k, gotta sign off, bbl
[22:31:18] DarTar, halfak: http://www.nature.com/polopoly_fs/1.14700!/menu/main/topColumns/topLeftColumn/pdf/506150a.pdf about p-values
[23:17:29] drdee, thanks. I saw that. I don't see that "hacking" p-values is that much of an issue. I usually have much better reasons to reject a paper.
[23:19:36] Also, the whole plausibility graphic is silly. I'd like to see a method for determining the odds of a hypothesis from "previous experiments, conjectured mechanism and other expert knowledge".