[13:37:45] Hey YuviPanda|zzzz. When you come back online, I have an idea to pitch -- an internal version of quarry for teaching people how to work with EventLogging data.
[15:24:34] ewulczyn: https://github.com/wikimedia/operations-puppet/blob/production/templates/udp2log/filters.erbium.erb
[17:55:19] Hey Nettrom, I just got through the psmag coverage of the Han article.
[17:55:22] * halfak barfs
[17:55:28] How was the actual paper?
[17:56:15] Or wait... was this one we couldn't find the PDF for?
[17:57:49] From the abstract: "Experiments demonstrate that our approach generates a good performance." WTF is "good" and how does it compare to past work?
[18:51:04] halfak: yes, this was the one where the PDF was EUR30
[18:51:21] Did you ever get your hands on the PDF?
[18:51:23] nope
[18:51:45] I think someone was going to contact Phoebe or someone else to see if they could get it?
[18:53:22] Hmmm. I'll see if I can find Ocaasi
[19:30:52] halfak: hey! I'm on a bus. But yes, that has been something I have wanted to do for quite a while :)
[19:31:05] halfak: needs a machine to play with tho...
[19:31:32] Indeed. And it would need to be inside so that it could access the internal slaves.
[19:31:56] stat1003 is the current place where a ton of mysql clients are basically doing quarry-like things.
[19:46:26] halfak: yeah, need to get Toby to approve a machine for this. One shall be more than good enough.
[19:47:00] * halfak will take that as a to-do item.
[19:47:13] halfak: quiddity has been asking for this forever as well. Toby is interested too iirc
[19:47:32] * quiddity denies everything.
[19:47:38] >.>
[19:47:51] milimetric, re. cubes, do you think it would be worthwhile to pursue the SQL-based UI in the meantime?
[19:48:38] TL;DR of above: I asked YuviPanda about hosting an internal version of quarry to support ad-hoc analytics.
[19:49:03] e.g. mobile team sets up a schema, then queries the log DB for records.
[19:49:12] halfak: totally, so "cubes" are just paint on top of a data warehouse schema
[19:49:27] what we want to do first and foremost is to establish a proper data warehouse schema
[19:49:36] that is fed by the kind of event bus that you envision
[19:49:48] (for now, just snapshots of data, but eventually, an event bus)
[19:49:53] so much want
[19:50:07] so once we have that schema, we have 4 clear wins that I see right now
[19:50:18] 1.) quarry would have a simpler schema to hit, in addition to the raw tables
[19:50:28] 2.) wikimetrics would have a more optimized table to report metrics from
[19:51:02] * halfak will write a quick proposal to the analytics list about running an internal quarry.
[19:51:19] 3.) we can build cubes on top of such a schema, and tune the underlying db to handle ad-hoc analysis. Saiku seems like a great dashboarding tool built on top of standard OLAP cubes, and there are others
[19:51:40] 4.) we can *finally* unify answers to "hey, where's ____ data?"
[19:52:30] But when will who build it? We always seem to have more urgent things to do before we can get to the important ones...
[19:52:36] We, as in you guys :)
[19:52:48] i just opened up the conversation with sean
[19:52:54] he was open to it when i mentioned it informally
[19:53:02] he said they had hardware they could repurpose
[19:53:06] Nice!
[19:53:13] so the only question is - how do we update such a thing?
[19:53:21] Yup, we have a bunch of spare servers lying around
[19:53:28] replication doesn't quite work, because the tables kind of need to be fed by events
[19:53:38] so i was thinking recent-changes?
[19:53:45] Rcstream?
[19:53:48] but I'd love a little refresher on where Ori's new thing is
[19:53:49] yeah - that
[19:54:04] It got sidetracked when hhvm came along, I think
[19:54:38] oh :(
[19:54:45] Since I assume he won't have time to do it for the next few months, maybe I can take a shot once I'm in ops..
[19:54:45] rcstream doesn't replay, so it can't handle downtime
[19:54:51] well - in theory, we have events, but we squish them into our "state" table design
[19:54:57] It is very close to production levels
[19:55:04] BUT that's the whole point of the project I have been working on with MWEvents.
[19:55:14] The event source shouldn't matter for the event data you get.
[19:55:15] yeah - basically a good event bus is paramount for this
[19:55:22] but we can make do and hobble along until that's set up
[19:55:38] Kafka! Orwell!
[19:56:07] halfak: are you doing mwevents in your spare time or officially?
[19:56:10] ottomata is working on a second kafka system that consumes and produces arbitrary events
[19:56:16] and *that* would be awesome
[19:56:29] oooooOOooOooo
[19:56:40] YuviPanda|zzzz, spare time
[19:56:56] Anything you hear about that I do is spare time :(
[19:57:15] My "official" stuff is usually not worth talking about.
[19:57:18] halfak: except the meetings.
[19:57:26] Oh yeah. I'll tell you about those ;)
[19:57:29] halfak: officially you do meetings :)
[19:58:12] Speaking of which.
[19:58:16] * halfak goes to a meeting
[19:58:17] HAHAHA
[19:58:25] well - the way we've been doing analytics here is kind of crazy. I'm done being quiet about it. We need to stop being so backwards; we have no excuse.
[19:58:28] Good luck halfak
[19:58:50] milimetric, +1 for loud. I like loud.
[19:58:55] * halfak waits in meeting room.
[19:58:57] milimetric: +1
[20:01:28] * quiddity meets in the waiting room.
[20:01:49] quiddity, :D
[20:01:57] This is the way I get work done.
[20:02:15] * halfak is super productive when being stood up for a meeting
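
The exchange above turns on one property the warehouse-feeding event bus needs that rcstream lacks: replay after downtime. A minimal sketch of that idea, assuming a Kafka broker at localhost:9092 and a hypothetical topic name "mediawiki.events" (both illustrative, not the actual WMF setup; requires the kafka-python package):

    import json
    from kafka import KafkaConsumer

    # Kafka retains events on the broker, so a consumer that commits its
    # offsets can resume (replay) from where it left off after an outage.
    consumer = KafkaConsumer(
        "mediawiki.events",                # hypothetical topic name
        bootstrap_servers="localhost:9092",
        group_id="warehouse-loader",       # offsets are committed per consumer group
        enable_auto_commit=False,          # commit only after the event is persisted
        auto_offset_reset="earliest",      # on first run, start from the retained backlog
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        event = message.value
        # ... load the event into the warehouse tables here ...
        consumer.commit()  # after a crash or downtime, consumption resumes from this point

Because the offset is committed only after each event is loaded, a loader that goes down simply picks up where it stopped once it restarts, which is the "handle downtime" property rcstream cannot provide and the reason a Kafka-style bus keeps coming up in the conversation.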