[08:57:32] (PS3) Nemo bis: Typofix in comment [analytics/wikistats] - https://gerrit.wikimedia.org/r/118366
[08:57:52] (CR) jenkins-bot: [V: -1] Typofix in comment [analytics/wikistats] - https://gerrit.wikimedia.org/r/118366 (owner: Nemo bis)
[09:29:08] (PS1) Nemo bis: [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436
[09:29:24] (CR) jenkins-bot: [V: -1] [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436 (owner: Nemo bis)
[09:30:14] (PS2) Nemo bis: [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436
[09:30:29] (CR) jenkins-bot: [V: -1] [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436 (owner: Nemo bis)
[09:46:44] (PS3) Nemo bis: [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436
[09:47:00] (CR) jenkins-bot: [V: -1] [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436 (owner: Nemo bis)
[12:04:56] morning!
[13:28:47] (PS2) Csalvia: Implement security for public reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/118068 (owner: Milimetric)
[17:33:13] nuria: csalvia: gather round :)
[17:33:26] I've got ye a tale ter tell
[17:33:35] it wasn't really a bug, it was a design problem
[17:33:50] the tree looks like this:
[17:33:54] RunReport
[17:34:00] AggregateReport
[17:34:23] MultiProjectMetricReport
[17:34:31] MetricReport
[17:34:45] and show_in_ui is used to hide AggregateReport
[17:34:45] now, of those, the properties as set by the system are:
[17:34:52] mmm, no
[17:34:53] sadly
[17:35:09] RunReport: show_in_ui: False, recurrent: True
[17:35:17] AggregateReport: show_in_ui: True, recurrent: False
[17:35:28] like, recurrent totally makes no sense on AggregateReport of course
[17:35:33] yeah
[17:35:44] damn tree
[17:36:17] there should be a Report table
[17:36:22] and a ReportNode table
[17:36:47] erosen: if you're out there, I think you said that from the first, so you were right! Or I said it... heh :)
[17:37:21] but ....
[17:37:22] the ReportNode table should be what the current Report table is, without show_in_ui, recurrent, recurrent_parent_id, or public
[17:37:32] and Report should keep those fields but lose the celery links
[17:37:38] YES
[17:37:44] ja ja
[17:37:49] yep...
[17:37:49] let's create a card for it
[17:37:58] but - what the heck do we do in the meantime...
[17:38:23] hmmm
[17:38:24] well
[17:38:24] let's talk in batcave
[17:38:32] k
[17:40:55] average, did you ever get my email about r/hive integration through RJDBC
[17:40:56] ?
[18:27:00] added this to the etherpad, nuria & csalvia: * refactor ReportNode tree to make RunReport nodes one-to-one with a single metric/cohort combination, instead of one-to-many. Currently, AggregateReport is one-to-one in this way, but it complicates the tree and makes this feature impossible
[18:27:17] okeis
[18:28:33] ok
[20:49:29] any dartar around?
[20:51:03] matanya: I don't see him at his desk. Will let him know if I see him.
[20:51:42] thanks lzia, tell him he's late with his trello promise please :)
[20:52:27] matanya: I'll tell him you were looking for him instead. ;-)
[20:52:40] works too :P thanks!
[20:52:47] np
[20:59:01] So I'm thinking about doing a query that takes the top 10% of a dataset
[20:59:05] Sorted by a particular field
[20:59:13] http://stackoverflow.com/questions/4741239/select-top-x-or-bottom-percent-for-numeric-values-in-mysql looks scary
[20:59:22] Does anyone have thoughts about this?
[21:00:24] I may need to just run it as a separate query
[21:02:44] rdwrer: what happened to the org chart?
[21:03:28] It's dead while it gets migrated to eqiad
[21:03:35] Slash I've been lazy about bringing it back up
[21:05:53] oh well, you have at least one disappointed user :)
[21:07:05] Noted
[21:07:23] milimetric: Ping if you're not busy; percentiles seem like they're going to be a bitch no matter what
[21:07:54] percentiles rdwrer?
[21:08:06] milimetric: I want to get the top ten percent of load times from the database
[21:08:12] For each type of event
[21:08:30] a bitch in sql you mean, right?
[21:08:34] https://gitorious.org/analytics/multimedia/source/4564339af460aa1fe16cf831e089f77ee782cc50:perf/enwiki.sql
[21:08:37] Yeah
[21:09:14] yes, especially so with mysql
[21:09:16] I want to take that, but get those stats for the top ten percent as well
[21:09:18] rdwrer: the link you shared earlier is doing the percentile and not percent. which one do you want?
[21:09:27] Er; percentile
[21:09:31] aa
[21:09:36] I think?
[21:09:42] heh
[21:09:49] you want the top ten percent it sounds like
[21:10:01] if you wanted the tenth percentile you'd be getting the top 90 percent
[21:10:07] Oh, percent then.
[21:10:09] Sorry
[21:10:24] so, yeah, that's a bitch in sql
[21:10:37] rather, in mysql
[21:10:58] hm, well, hm
[21:11:06] std helps here, right?
[21:11:13] Sort of
[21:11:16] because if you get that, and have the average in a subquery
[21:11:37] then you can get everyone > average + k * std dev
[21:11:44] and that gets you something similar to what you want
[21:11:46] though not exact
[21:11:54] i imagine you just kind of want "the top"
[21:11:54] What's k? (been a while since I did stats)
[21:11:57] rdwrer: am I missing something, or can you order by that variable and just take the first 10% of the rows?
[21:11:59] k is my fudge
[21:12:18] well, yeah, it's 10% by value right rdwrer?
[21:12:21] not by position?
[21:12:28] I think by position
[21:12:41] oh, psh, then what lzia said
[21:12:44] We want 1/10 of the entries
[21:12:50] But even that is hard, requires like three selects
[21:13:00] yeah, then it's by position
[21:13:02] And since I'm staring down inline queries
[21:13:03] nah, just order what you have there and make that a subquery
[21:13:16] milimetric: But I'm selecting times for many different events at once
[21:13:27] then in the outer query limit .... hm, can you limit by a dynamic thing in mysql?
[21:13:41] well, yeah, you'd need to break it up as they're separate things
[21:13:44] So when one of them is always faster, that won't get represented if I understand correctly
[21:13:47] Ugh
[21:13:50] lol
[21:14:00] I hate sql so much, why do I do this to myself
[21:14:13] Mark, you have the most unreasonable, mathematically impossible expectations of technology of anyone I know
[21:14:29] Hah, yup
[21:14:40] get me 1/10 rows such that they are the first 10% in multiple conflicting categories
[21:14:40] :)
[21:14:43] not possible mark!
[21:15:12] First, group these things together like this. Then order each group like this. Then limit those things to this number of things. Then select these fields from the result, and order it by this
[21:15:17] NOT HARD
[21:15:18] no matter how you pulled that data, you'd still have to split it out if you want the top 10 by each kind
[21:15:33] Argleflargle.
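[annotation] The group / order / limit steps listed at 21:15:12 can be sketched outside SQL. This is only a minimal illustration under stated assumptions, not the query anyone in the channel wrote: the `(event, load_ms)` row shape, the function name `top_percent_per_group`, and the reading of "top" as "largest load times" are all hypothetical.

```python
from collections import defaultdict


def top_percent_per_group(rows, percent=10):
    """Return, for each event type, the largest `percent` of load times.

    `rows` is an iterable of (event, load_ms) pairs (an assumed shape).
    Selection is by position within each group, not by value.
    """
    # Step 1: group the load times by event type.
    groups = defaultdict(list)
    for event, load_ms in rows:
        groups[event].append(load_ms)

    result = {}
    for event, times in groups.items():
        # Step 2: order each group, largest first (assumed meaning of "top").
        times.sort(reverse=True)
        # Step 3: limit each group to 10% of its own size, rounding up
        # with integer arithmetic and keeping at least one row.
        keep = max(1, (len(times) * percent + 99) // 100)
        result[event] = times[:keep]
    return result


# Made-up sample data: one event type with 20 rows, one with 10.
sample = [("thumbnail", t) for t in range(1, 21)]
sample += [("fullsize", t) for t in range(100, 110)]
top = top_percent_per_group(sample)
```

Because the limit is computed per group, an event type that is always faster still contributes its own top tenth, which is the concern raised at 21:13:44.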
[21:15:41] OK, I'll start the loooooong trek towards that...later
[21:15:53] wait i forget, can you make temp tables?
[21:16:04] if so, just do that and then go over the table once per category
[21:16:44] if not, maybe pulling down that dataset as is and crunching it in python is an option
[21:17:14] good luck rdwrer, our prayers be with you
[21:17:29] Yeah
[21:17:38] I can't make temp tables, I'm on the research account
[21:22:48] Luckily this card is further in the future than I thought
[22:23:21] (PS17) Milimetric: [WIP] Run recurring reports using the scheduler [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/112165
[22:51:54] hey matanya
[22:53:26] it sounds like you were looking for me? If I am AFK feel free to reach out by mail -> dario@wikiedia.org
[22:58:33] DarTar: yes i have
[22:59:01] hey
[22:59:18] hi, reminder: https://rt.wikimedia.org/Ticket/Display.html?id=6845
[22:59:29] your time passed :)
[23:02:22] so DarTar when you find time, would be nice if you can update that
[23:03:07] yes, I got a reminder this morning :p
[23:03:22] unfortunately I haven't been able to hear back from the people I was supposed to contact, I'll need a few more days
[23:04:08] thanks, at your own pace. i'll poke around in a week.
[23:04:16] sounds good
[23:05:17] night
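[annotation] For comparison, the "average + k * std dev" cutoff suggested at 21:11:37 selects by value rather than by position, so it generally does not return exactly 10% of the rows. The sketch below is an illustration only; the function name `above_cutoff` and the sample data are hypothetical, and it uses the population standard deviation as an assumption about what the SQL aggregate would compute.

```python
import statistics


def above_cutoff(values, k=1.0):
    """Return the values greater than mean + k * (population std dev).

    `k` is the tuning factor ("fudge") from the discussion; a larger k
    keeps fewer rows.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    cutoff = mean + k * std
    return [v for v in values if v > cutoff]


# Made-up, evenly spread load times 1..20.
load_times = list(range(1, 21))
selected = above_cutoff(load_times, k=1.0)
share = len(selected) / len(load_times)
```

With this evenly spread sample and k = 1, the cutoff keeps four of the twenty values, which is 20% rather than 10%. That matches the "similar to what you want, though not exact" caveat: k would have to be tuned per dataset, whereas a positional limit gives the exact fraction.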