[17:36:24] ...
[18:12:24] SF folks (e.g. brion, ori ) are you planning on using the conf room up on 5 for the E86 meeting that's about to happen?
[18:12:37] robla: i'm remote, will be on hangout
[18:12:44] * robla contemplates heading upstairs
[18:13:14] shouldn't the meeting happen on IRC and not on the hangout?
[18:13:17] * mobrovac confused
[18:13:31] oh maybe i'm confused!
[18:13:34] i'd prefer that, actually
[18:13:39] but i saw that there was a hangout on the invite
[18:13:40] mobrovac: I wasn't planning on joining a hangout
[18:13:43] but maybe it was just the default
[18:13:46] i have to run off toward internet archive soon
[18:13:49] but i'll be on my phone of course ;)
[18:13:54] (for irc)
[18:14:01] <_joe_> so irc it is, ok
[18:14:06] <_joe_> I think that's better
[18:14:06] lol
[18:14:08] ori: I think you're right.... I think the default setting is awful
[18:14:10] unlike the rest of you mobrovac actually corrected me
[18:14:18] instead of just being like "ok ori, you can join that hangout"
[18:14:22] :P
[18:14:38] <_joe_> ori: we wanted you to join an empty party, yes
[18:15:24] <_joe_> ok, I guess it's time to start?
[18:15:35] aham, irc only then
[18:15:37] ?
[18:15:43] _joe_: I think so. who's chairing this?
[18:15:47] <_joe_> nuria: yup
[18:15:50] hiayy
[18:15:53] <_joe_> robla: ah! good question!
[18:16:03] <_joe_> ottomata?
[18:16:08] ori, I added the hangout before I knew what I was doing
[18:16:08] so
[18:16:09] IRC
[18:16:16] yay
[18:16:22] I have never run an RFC meeting, nor have I reallllly been in one.
[18:16:24] <_joe_> robla: I actually assumed you would :}
[18:16:28] hello
[18:16:34] but, i can start, can someone else run this thar meetbot thing?
[18:16:50] * paravoid is here too
[18:16:56] _joe_: I didn't volunteer to run this, and I opposed setting up a one-off
[18:17:05] <_joe_> robla: ok
[18:17:06] hello
[18:17:06] * urandom too
[18:17:15] #topic EventBus | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (Meeting topic: RFC meeting)
[18:17:20] thank sori
[18:17:29] startmeeting first? :)
[18:17:29] I'm not on the bot's list, apparently
[18:17:32] oh
[18:17:41] #startmeeting
[18:17:41] ori: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
[18:17:46] #startmeeting EventBus RFC
[18:17:46] Meeting started Fri Oct 30 18:17:46 2015 UTC and is due to finish in 60 minutes. The chair is ori. Information about MeetBot at http://wiki.debian.org/MeetBot.
[18:17:46] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[18:17:46] The meeting name has been set to 'eventbus_rfc'
[18:17:47] #startmeeting
[18:17:47] gwicke: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
[18:18:07] #topic EventBus | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (Meeting topic: RFC meeting)
[18:18:21] * mobrovac noise
[18:18:22] so, I think most people attending this meeting (except maybe Robla?) have a lot of context already, right?
[18:18:25] thanks ori!
[18:18:25] look i'm like tim starling
[18:18:35] now i just need to be smarter and have a dryer sense of humor
[18:18:41] and maybe ori has a little.
[18:18:44] * robla will catch up....don't worry about context for me
[18:18:46] ayayay
[18:19:21] so, the basic idea is to have a sane way to produce and distribute events to multiple consumers
[18:19:22] k, so, from what I can tell, this RFC is mostly to help us do a couple of things:
[18:19:49] #link https://phabricator.wikimedia.org/T88459
[18:20:14] #info
[18:20:14] this meeting will help us:
[18:20:14] - finalize some implementation decisions
[18:20:14] - get buy in from ops and others
[18:20:22] (maybe that worked, heh?)
[18:21:07] I know that _joe_ in particular isn't sold on the need for an EventBus service at all
[18:21:11] should we talk about that first?
[18:21:14] ottomata: I think the info needs to be a single line
[18:21:21] (and the bot doesn't ack those)
[18:21:36] meh, oook
[18:21:37] he
[18:21:37] h
[18:21:37] <_joe_> I guess it's a good starting point, yes
[18:22:03] #info goals for meeting: 1) finalize some implementaiton decisions, 2) get buy in from ops and others
[18:22:17] Ok, should we talk about use cases that benefit from an event system then?
[18:22:21] ok, for a little bit of context: my main goal here is to standardize the way events are sent around, and the format of those events
[18:22:34] I think we all agree that doing that is good
[18:22:37] one use-case is the fact that restbase needs to update its content based on actions taken on-wiki, right?
[18:22:43] I agree with nuria
[18:22:45] ok
[18:22:47] ori: yes
[18:22:52] <_joe_> nuria: actually I'd like to separate the discussion about "event system" and "implementing a rest proxy in front of kafka"
[18:23:02] we have started to discuss a first set of events needed for that use case at https://phabricator.wikimedia.org/T116247
[18:23:04] yes _joe_ agreed
[18:23:09] <_joe_> the two things are not equivalent
[18:23:13] use cases are agnostic to implementations
[18:23:24] let's talk about the "whys"
[18:23:26] for use cases, there is also: https://etherpad.wikimedia.org/p/scalable_events_system (near the bottom)
[18:23:29] and fate about "hows"
[18:23:30] _joe_: but which are you actually skeptical about? the use-case or the implementation?
[18:23:38] <_joe_> ori: the implementation
[18:23:47] the basic events defs are here - https://github.com/wikimedia/restevent/pull/5
[18:23:51] nuria++ :)
[18:23:54] nuria, paravoid, we can talk about this, but we have had several meetings about the use cases already
[18:24:01] let's recap them anyway
[18:24:05] ok
[18:24:14] they're going to be useful for context and for the discussion that will follow I think
[18:24:14] <_joe_> yes that is useful for the rest of the discussion
[18:24:20] <_joe_> +1
[18:24:31] okay, so one is tracking changes in MediaWiki
[18:24:31] one use case is to be able to join (via event queue) differents sources of info
[18:24:35] currently, there are many ways and formats that MW and other clients send events around
[18:24:51] example: join pageview data and editing data w/o having to use the database records
[18:24:53] eventlogging, RCfeed, RCstream, and a myriad of other direct one offs
[18:25:33] there are also custom extensions creating custom jobs
[18:25:38] ottomata: right, and we would like to have a uniform way in which we can inspect events that represent for example "edits"
[18:25:43] like the RESTBaseUpdateExtension
[18:25:57] ottomata: so would it be fair to say that a successful implementation will unify / replace the systems you just mentioned?
[18:26:02] edits, and everything really. and we want a way to be sure that any particular set of events conforms to a schema
[18:26:11] so downstream consumers can know what they are getting
[18:26:18] without having to know about the upstream producers
[18:26:22] yes ottomata, edits was just a concrete example, right
[18:26:25] yes
[18:26:29] ori: eventually, yes
[18:26:46] ori, eventually, yes
[18:26:48] haha
[18:26:48] in the short term, we have more concrete use cases that we'd like to replace
[18:26:49] <_joe_> sorry can I stop for a second?
[18:26:52] yes.
[18:26:52] * akosiaris kind of worried abut that eventually... seems to be a synonym for never
[18:26:56] yeah
[18:27:04] <_joe_> RCStream is an externally exposed service
[18:27:08] I think EventLogging will be hard to replace in the short term
[18:27:18] <_joe_> it has nothing or little to do with how events are propagated internally
[18:27:19] RCStream would just swap out the backend for this system
[18:27:24] that one is a little easier
[18:27:28] <_joe_> ok
[18:27:30] ori said "unify / replace"
[18:27:36] not "replace"
[18:27:56] <_joe_> ok
[18:27:59] tackle all those systems is not a priority for us this quarter
[18:28:03] I'm with akosiaris on this. When someone wants to introduce new things, they talk about all the existing ad hoc systems that need to be consolidated. But then when the discussion shifts to concrete plans, that tends to become ethereal, qualified.
[18:28:14] yeah, agreed
[18:28:21] *tacking
[18:28:23] we keep adding stuff to our stack
[18:28:25] * gwicke can't type
[18:28:38] It doesn't have to be now, but I think a firm commitment would be nice
[18:28:41] and maybe a timeline
[18:28:41] but rarely go back and consolidating
[18:28:46] ori++
[18:28:46] we are concretely planning to replace the RESTBaseUpdate extension
[18:28:51] <_joe_> I agree that a clear path of migration should be laid out beforehand
[18:28:59] this quarter
[18:29:06] it doesn't have to be this quarter
[18:29:10] I don't know much about RCFeed, and I only know a little about RCStream. from what I can tell, RCStream won't really be that hard to unify into this system.
[18:29:18] most of what we are building is on the producer side of things
[18:29:46] there is a bug about letting consumers catch up on disconnect as well, which will be easier to address with a kafka backend
[18:30:07] <_joe_> we also have i.e. the search update jobs - a ton of jobs that get spawned by edits right now and that we might want to replace with the event bus
[18:30:14] EventLogging requires some more discussion, specifically on the implementatino side
[18:30:25] it is a contender for implementation choice
[18:30:48] _joe_, indeed.
[18:30:50] are those in redis now?
[18:30:54] <_joe_> yes
[18:30:56] aye
[18:30:56] does anyone disagree that "a firm commitment and maybe a timeline" for migrating is needed?
[18:31:02] (paraphrasing ori)
[18:31:06] <_joe_> agreed
[18:31:15] if folks sign up to do the work, then sure
[18:31:18] hm. mostly. :)
[18:31:21] but paravoid depends if priority is to migrate an old system
[18:31:30] we have concrete use cases that take priority
[18:31:40] or to create an event flow that say, i can tap into now
[18:31:45] "folks" are not signing up to do the work of maintaining more systems either though, are they?
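The "letting consumers catch up on disconnect" point is easiest to see with a toy model. The Python sketch below assumes a Kafka-style design in which a topic is an append-only log and every consumer tracks its own read offset, so a consumer that drops off resumes where it stopped, and one consumer's reads never affect another's. `TopicLog` and `poll` are hypothetical names, not Kafka's client API.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory model of one Kafka-style topic: an append-only list of
    messages plus a separate read offset for every consumer."""

    def __init__(self):
        self._messages = []
        # consumer name -> offset of the next message that consumer will read
        self._offsets = defaultdict(int)

    def append(self, message):
        """Append a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer, max_messages=100):
        """Return up to max_messages unread messages for this consumer and
        advance its offset past them."""
        start = self._offsets[consumer]
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    log.append({"type": "edit", "rev": 1})
    print(log.poll("restbase"))  # the first edit
    log.append({"type": "edit", "rev": 2})
    # A consumer that only connects now still sees both events, because
    # reading never removes messages from the log.
    print(log.poll("rcstream"))
    print(log.poll("restbase"))  # only the second edit
```

Real Kafka also deletes old messages after a configurable retention period, so catching up only works within that window; this toy keeps everything in memory.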
[18:31:50] generalizing to cover other use cases is definitely planned, but will require a joint effort
[18:32:11] agreed on the joint effort
[18:32:12] but i see your point
[18:32:15] <_joe_> priority is not to have multiple systems that replicate the same functionality or duplicate part of it indefinitely.
[18:32:24] but someone needs to take ownership and be held accountable for coordinating that effort
[18:32:26] _joe_: yeah, definitely
[18:32:29] paravoid: i think there isn't a system that will work for gwicke's current use case
[18:32:29] right, i see, from ops perspective the value proposition
[18:32:35] so, a new one of some kind is needed
[18:32:39] is to do away with one of the systems we currently have
[18:32:53] if that new one can be made general enough to be able to migrate old systems to it
[18:32:55] then we should do that
[18:32:57] as we can
[18:33:10] otherwise paravoid and _joe_ fill that we are just adding to the problem of too many systems doing teh same thing
[18:33:20] what other system is doing the same thing?
[18:33:21] ok, everyone hang on. let's capture consensus here because i think we have one, let me take a stab at it
[18:33:21] so, first system that's going to be replaced is the RESTBaseUpdateJobs
[18:33:22] <_joe_> nuria: I think that is a valid architectural principle, not just an ops concern
[18:33:23] 8the
[18:33:49] ori: yes, please do
[18:33:51] maintaining systems doesn't solely fall into our hands
[18:33:56] _joe_: right, it shifts the use cases a bit though
[18:34:10] decreasing entropy's good for humans
[18:34:26] * robla is also eager to hear ori's consensus capturing
[18:34:35] - The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist.
[18:34:35] then -seems to me - a good 1st use case is to have an edit stream that can replace current backend for rc stream
[18:35:01] there is a balance to be struck between having work informed by a concrete use case, and making things general and universal
[18:35:01] yeah I think we all agree to this
[18:35:05] o/
[18:35:06] well, nuria, yes, to have an edit stream that can replace backend systems that use an edit stream.
[18:35:09] <_joe_> ori: +1
[18:35:12] like RESTBaseUpdateJobs
[18:35:22] - The work of refactoring / consolidating specific existing implementations should not fall exclusively on any one team; it is a joint responsibility.
[18:35:24] but,
[18:35:32] and eventually RCstream, i thikn RESTBaseUpdateJobs is the first use case that people are really really excited about!
[18:35:34] :)
[18:35:37] agreed too
[18:36:00] - The implementers of this project should assume leadership / ownership of the overall process.
[18:36:08] <_joe_> ori: +1
[18:36:10] +1
[18:36:15] #info: ori> - The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist.
[18:36:16] I would like too that although I'm guessing it may be more contentious
[18:36:25] robla: no, let's use #agreed I think
[18:36:28] if noone disagrees
[18:36:34] (only ori can, as chair)
[18:36:41] I think we should share the burden
[18:36:52] sharing means noone feels responsible
[18:36:52] leadership / ownership is fine
[18:37:17] that doesn't mean we have to implement all the changes
[18:37:24] that is correct
[18:37:24] I'd like to make sure that the implementation for this will be Kafka-agnostic where possible.
[18:37:27] it means we have to own them and poke til they get done
[18:37:29] i'm curious what that leadership/ownership actually looks like
[18:37:32] Perhaps not unlike how RCFeedEngine works rightnow
[18:37:32] mark: I think it depends on what the liability is
[18:37:38] where we have both Redis and UDP implementations
[18:37:40] <_joe_> Krinkle: please let's talk about implementations separately
[18:38:00] if nobody signs up to do the work, I think that teams providing the infrastructure shouldn't be the only ones on the hook
[18:38:03] (Krinkle, we'll get there, but in past discussions everyone has been for that)
[18:38:15] otherwise, we just encourage more one-offs
[18:38:18] gwicke: well, if you're not willing to assume responsibility, I think there is another way
[18:38:39] which is to start by porting a few things that aren't already systems that you guys have written
[18:38:54] as a way of demonstrating the generality and suitability of the design
[18:38:58] if this system works they way I hope it does, i'm happy to have ownership of those things
[18:39:28] so.
[18:39:29] #agree The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist. The work of refactoring / consolidating specific existing implementations should not fall exclusively on any one team; it is a joint responsibility. But: the implementers of this project should assume leadership / ownership of the overall process. (Last clause is actually somewhat contentious; Ottomata agrees but Gwicke / urandom worried about implications.)
[18:39:43] ori, like RCstream?
[18:39:52] <_joe_> yup
[18:39:57] <_joe_> or the jobqueue
[18:39:58] #agreed The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist. The work of refactoring / consolidating specific existing implementations should not fall exclusively on any one team; it is a joint responsibility. But: the implementers of this project should assume leadership / ownership of the overall process.
[18:40:07] #info (Last clause re: responsibility is actually somewhat contentious; Ottomata agrees but Gwicke / urandom worried about implications.)
[18:40:25] _joe_: meaning migrating RCStream to use eventbus
[18:40:27] ?
[18:40:29] jobqueue gwicke is intersted in already, I think.
[18:40:46] (time check: we have 35 minutes left)
[18:40:53] but, if this system does what I think it will, I would have fun porting RCStream (would recruit helpers for sure)
[18:41:04] rcstream is a pretty minor usecase overall
[18:41:12] _joe_: yes, i'd like to see the jobqueue replaced before rcs, if you ask me
[18:41:12] well, i'm just picking one
[18:41:14] <_joe_> yup, should we move to the implementation ideas and questions?
[18:41:14] paravoid: not in terms of analytics,
[18:41:16] but that's a bigger feat
[18:41:39] yeah, paravoid, edit stream in kafka is hugely nice for analytisc stuff
[18:41:40] Yeah, I'd like to avoid a single person both designing and porting systems. It woudl benefit the design and confirm intuitive use if someone else can implement it at least once before we sort of rubberstamp it
[18:41:43] basically, I'm trying to set clear expectations that our primary objectives / use cases are going to take priority over porting everything & the kitchen sink to eventbus right away
[18:41:48] #info _joe_: yes, i'd like to see the jobqueue replaced before rcs, if you ask me rcstream is a pretty minor usecase overall jobqueue gwicke is intersted in already, I think.
[18:42:11] ok, yeah, so with only 30 mins left
[18:42:27] let's move on to some implementation discussion? i really really want to get some things solidified by the time this meeting is done
[18:42:40] +1
[18:42:55] yeah fair enough
[18:42:56] so what do you consider a good 1st deliverable of system gwicke ?
[18:42:58] #info I'd like to avoid a single person both designing and porting systems. It woudl benefit the design and confirm intuitive use if someone else can implement it at least once before we sort of rubberstamp it
[18:43:03] #info services' primary goal for this quarter is change propagation, and things needed in the eventbus for that use case will take precedence over porting other use cases
[18:43:31] nuria: the deliverable is dependent first on settling _joe_'s beef
[18:43:32] :)
[18:43:36] why have a service?
[18:43:37] for the services team, anyway
[18:43:41] is dealing with old stuff going to be /any/ quarter's priority, gwicke?
[18:43:42] why not just clien libs
[18:43:44] client libs
[18:43:49] I think that's fundamentally the problem
[18:43:55] paravoid: not all of it, obviously
[18:44:09] as I said, first one to be replaced is RESTBaseUpdateJob
[18:44:18] _joe_, I'm going to summarize what you've said to me
[18:44:19] what are the next ones ?
[18:44:22] correct me if I say aything wrong
[18:44:23] next step is several classes of job queue use cases
[18:44:34] gwicke: I think the issue is that the license you have from the rest of us to use up engineering resources have to do with the scope of the project
[18:44:37] under the heading of change propagation
[18:44:40] the idea that one team drives forward with the new sexy thing and the rest get to clean up the mess is what some of us are trying to protect against here
[18:44:41] if it's just something for restbase, that's one thing
[18:44:43] _joe_ doesn't see the need for an HTTP REST service proxy in front of Kafka
[18:44:49] if it's something that will improve life for all engineering, that's another
[18:44:58] (or alternatively, dealing with the cost of maintaining more systems)
[18:45:01] i think you're aware of that which is why you started with unification / consolidation of existing systems
[18:45:11] we're just trying to suss out exactly what your commitment is
[18:45:13] it is another system for ops to maintain
[18:45:38] and the Kafka protocol is highly optimized for reliability and scalabilty, so why put HTTP in between
[18:45:43] So, assunming we want to talk implementation, let's look at the restbase usecase. Would that be listening to generic events by MW, or would the extension produce its own topic/events from MW just for this.
[18:45:45] we have been pushing for a generalization exactly to avoid the sprawl of one-off solutions we have seen in the past
[18:45:48] especially since you will probably want a client wrapper over the REST service anyway
[18:45:58] <_joe_> I have a question about the implementation idea: it is not clear if this REST proxy you want to build in front of Kafka will be used only to produce messages to it or also to consume those. Also, in both scenarios, what this proxy would add as a benefit opposed to creating solid libraries that have a direct interface to kafka?
[18:45:58] listening from restbase main, or separate service between MW and restbase that does the deed.
[18:46:03] why not just make client libs that do this, instead of centralizing with a service?
[18:46:03] gwicke: generalization ought to mean more than "I think my solution is best, so plz migrate, kthx"
[18:46:12] gwicke: then the new push needs to show concretely how it makes the situation better, rather than just adding to the burden
[18:46:13] thanks _joe)
[18:46:16] we're having 2-3 conversations at once :(
[18:46:16] _joe_
[18:46:26] yeah! it hought we were going to switch to implementation!
[18:46:28] <_joe_> but I guess the other discussion is not over, ottomata
[18:46:34] <_joe_> sorry guys, go on
[18:46:39] <_joe_> I think that's important
[18:46:42] ottomata called in this RFC, so let's agree that the responsibility / ownership question is unsettled for now and move on to the implementation
[18:46:43] yeah
[18:46:49] Krinkle: the idea is that MW core emits events (edit, delete, revchange, etc) and these are put into the kafka queue, a consumer attached to a specific topic would then consume it
[18:46:53] i'll add a few #info with quotes from the last part of the convo but ignore me
[18:46:57] ori, mark: sure, I'm just saying that it's not okay to say 'you pushed for generalization, so now you get to own everything'
[18:47:03] Krinkle: one of them would be the change prop system whihch would update restbase
[18:47:14] you get to own *coordinating the migration*
[18:47:24] and advocating for resourcing it and all that
[18:47:29] coordinating the migration and discussion, sure
[18:47:34] OK!
[18:47:41] Let's address _joe_'s Q
[18:47:42] :)
[18:47:46] I guess I should had put stars around *own* here too
[18:47:55] #info joe> I have a question about the implementation idea: it is not clear if this REST proxy you want to build in front of Kafka will be used only to produce messages to it or also to consume those. Also, in both scenarios, what this proxy would add as a benefit opposed to creating solid libraries that have a direct interface to kafka?
[18:48:27] in my opinion, there will be no REST proxy consumption, it doesn't make sense :)
[18:48:39] MAYBE some special websockets type services for specific purposes, like RCstream
[18:48:46] but that is an application that uses kafka
[18:48:49] ottomata: agreed
[18:48:50] not a http proxy
[18:48:56] can someone (that proposed it, i.e. gwicke?) recap this REST proxy idea?
[18:49:00] so all consumers will be only speaking the kafka protocol ? or have I misunderstood ?
[18:49:01] the validation / proxy stuff is purely focused on production
[18:49:03] how is this supposed to hook and where?
[18:49:19] and for which use cases (all, or partial? producers, consumers or both?)
[18:49:23] in front of kafka
[18:49:24] paravoid: when mediawiki (or any other system) wants to publish an event, it will POST (or PUT, i dunno?) the event
[18:49:25] yeah, please recap
[18:49:25] to validate messages
[18:49:29] paravoid: main issue is making sure that we don't write garbage to random topics
[18:49:30] rather than talk the kafka protocol directly
[18:49:42] is that fair?
[18:49:48] ori: yes
[18:49:50] kafka messages are just bytes, so we need to agree on format and topic -> format mapping
[18:49:59] the kafka protocol is a little low-level and (afaik) liable to change, so having an HTTP API sounds good to me
[18:50:10] <_joe_> gwicke: then consumers will use the kafka protocol and stream data from it directly?
[18:50:13] also the ability to enforce constraints at that layer is a good one too, agreed
[18:50:13] I imagine in MediaWiki we'll have an abstract class/interface to emit arrays as events by topic.
[18:50:14] I'm not sure I understand
[18:50:17] ori: yeah, that's #1 from me, abstraction
[18:50:17] _joe_: for now, yes
[18:50:21] to me the real point about the proxy is msg validation
[18:50:31] <_joe_> gwicke: the "for now" means what?
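The "abstract class/interface to emit arrays as events by topic" idea, together with the earlier request that the implementation be Kafka-agnostic (the way RCFeedEngine has both Redis and UDP implementations), could look roughly like the Python sketch below. It is illustrative only and is not MediaWiki's (PHP) API: `EventSink`, `InMemorySink`, `JsonLinesSink`, `emit_edit`, the topic name `mediawiki.revision_create` and the event fields are all hypothetical.

```python
import io
import json
from abc import ABC, abstractmethod


class EventSink(ABC):
    """Hypothetical transport-agnostic interface: producers hand over a topic
    name and a plain dict and never see which backend is underneath."""

    @abstractmethod
    def send(self, topic: str, event: dict) -> None:
        ...


class InMemorySink(EventSink):
    """Collects events in a list; useful in tests."""

    def __init__(self):
        self.sent = []

    def send(self, topic, event):
        self.sent.append((topic, event))


class JsonLinesSink(EventSink):
    """Writes one JSON object per line to a text stream; stands in for a real
    transport such as UDP, Redis, an HTTP proxy or a Kafka client."""

    def __init__(self, stream):
        self._stream = stream

    def send(self, topic, event):
        self._stream.write(json.dumps({"topic": topic, "event": event}) + "\n")


def emit_edit(sink: EventSink, title: str, rev_id: int) -> None:
    # Calling code depends only on EventSink, so switching backends means
    # adding a sink class rather than changing every producer.
    sink.send("mediawiki.revision_create", {"title": title, "rev_id": rev_id})


if __name__ == "__main__":
    buf = io.StringIO()
    emit_edit(JsonLinesSink(buf), "Main_Page", 123)
    print(buf.getvalue(), end="")
```

Whether the sink behind this interface talks to an HTTP proxy or to Kafka directly is exactly the open "proxy or not" question; the interface itself does not decide it.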
[18:50:36] aren't we just shifting the problem of emitting valid messages from MW to this proxy?
[18:50:41] it's still software that does this anyway, right?
[18:50:48] _joe_: as ottomata mentioned, we can add higher level things on top, like RCStream
[18:51:01] but gwicke, that is not part of eventbus proxy
[18:51:05] paravoid: sure, but it's a question of where and how you draw the perimeter of "guaranteed valid"
[18:51:07] <_joe_> gwicke: ok, but I am talking "internal" consumers like rcstream
[18:51:16] why is this rest proxy more "trusted" to be talking kafka than mediawiki?
[18:51:23] so, right, paravoid, because there will be many producers
[18:51:26] not just mediawiki
[18:51:27] <_joe_> yeah I don't get that either
[18:51:28] no design automagically corrects nonconformant implementations, but the question is where you handle them
[18:51:37] paravoid: because it's one thing
[18:51:38] in one place
[18:51:39] and all events need to be validated against a schema
[18:51:45] coordinating deploys to many producers is hard
[18:51:46] no one comes unto the father except thru me, all that jazz
[18:51:50] i am the way to truth and light
[18:51:56] well no, because it's software and software will have bugs
[18:52:01] so you need to also handle it on the consumer side as well
[18:52:02] <_joe_> I try to clarify it more: we will need client libraries that will consume said messages and validate those too, right?
[18:52:05] I'm confused about use of proxy here. Are we talking about a kafka consumer that listens for restbase events and does the restbase actions (because presumably restbase doesn't want to include this logic?), or is this a proxy in general between mediawiki and kafka.
[18:52:13] <_joe_> paravoid: heh
[18:52:15] proxy in general between mediawiki and kafka.
[18:52:20] _joe_: no, for consumption no
[18:52:22] Krinkle: producer, not consumer
[18:52:22] paravoid: experience with eventlogging says otherwise fwiw
[18:52:27] I'm not sure why we'd want to talk from MediaWiki to kafka via HTTP, only more points where things can go wrong and have to scale.
[18:52:32] no need to validate, that's done on the producer side
[18:52:33] the valid event stream is guaranteed valid
[18:52:37] Krinkle: lke this
[18:52:37] http://docs.confluent.io/1.0.1/kafka-rest/docs/intro.html
[18:52:38] <_joe_> mobrovac: so you don't validate input????
[18:52:43] (but not consumption)
[18:52:45] Krinkle: it's just a simple end point that takes messages, validates them against the schema configured for each message's topic, and (if successful), writes it to Kafka
[18:52:53] <_joe_> that is wrong, very wrong, as paravoid stated
[18:52:53] having a layer of abstraction decouples all our systems from kafka, lets use alternate implementations where kafka doesn't make sense, lets us upgrade (or replace) kafka transparently
[18:52:58] oh REST proxy is not RESTBase proxy, I thought y'all where abbreviating
[18:53:00] _joe_: for consumers, no since the producer side does that
[18:53:06] _joe_: no need to validate, that's done on the producer side <-- no need to validate by *consumers*
[18:53:10] * urandom can't type either
[18:53:15] yup ori
[18:53:15] fwiw, kafka currently has no authentication, this may or may not change
[18:53:17] because the proxy that all producers have to go through enforces that
[18:53:20] <_joe_> ori: yeah I think that's foolish
[18:53:36] it's very deja vu for me
[18:53:40] <_joe_> clients still have to validate the messages they get.
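The "simple end point" described above (take a message, validate it against the schema configured for its topic, and write it to Kafka only if it passes) can be sketched as a plain Python function. Assumptions, all hypothetical: the `SCHEMAS` registry and its contents, the `handle_produce` name, the HTTP-style status codes, and `produce` standing in for a real Kafka producer call. The validator understands only `required` and per-property `type`, a small subset of JSON Schema; a real service would use a complete JSON Schema validator.

```python
import json

# Hypothetical topic -> schema registry (a tiny subset of JSON Schema).
SCHEMAS = {
    "mediawiki.revision_create": {
        "required": ["title", "rev_id"],
        "properties": {
            "title": {"type": "string"},
            "rev_id": {"type": "integer"},
        },
    },
}

_PY_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}


class ValidationError(Exception):
    pass


def validate(event, schema):
    """Raise ValidationError if event does not satisfy the schema subset."""
    if not isinstance(event, dict):
        raise ValidationError("event must be a JSON object")
    for field in schema.get("required", []):
        if field not in event:
            raise ValidationError(f"missing required field {field!r}")
    for field, spec in schema.get("properties", {}).items():
        if field not in event or "type" not in spec:
            continue
        value = event[field]
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python; JSON Schema treats them as distinct
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValidationError(f"field {field!r} must be a {spec['type']}")


def handle_produce(topic, body, produce):
    """Body of a hypothetical POST handler: parse, validate against the topic's
    schema, and only then pass the bytes to `produce` (a stand-in for a Kafka
    producer's send call). Returns (status, message)."""
    schema = SCHEMAS.get(topic)
    if schema is None:
        return 404, f"unknown topic {topic!r}"
    try:
        event = json.loads(body)
        validate(event, schema)
    except (ValueError, ValidationError) as exc:  # bad JSON or schema mismatch
        return 400, str(exc)
    produce(topic, json.dumps(event).encode("utf-8"))
    return 201, "accepted"
```

Properties the schema does not list are passed through, which mirrors JSON Schema's default behaviour when `additionalProperties` is not set and leaves room for fields that extensions add.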
[18:53:42] it's almost like i wrote something like this at one point
[18:53:45] any kind of software bug accidentally or malicious is able to inject garbage into that "secure"/"validated" topic
[18:53:49] _joe_: yes, true
[18:53:49] for context: the request volume we are talking about here is in the dozens to low hundreds of messages per second
[18:53:53] _joe_: why so?
[18:53:56] ori: :)
[18:54:00] <_joe_> urandom: what paravoid said
[18:54:06] _joe_: we do not validate 150K msgs per second of webrequest data
[18:54:20] saying that consumers will blindly trust that kind of data means that any such action could cause mayhem across multiple independent systems
[18:54:21] <_joe_> ottomata: that is one special application, though
[18:54:25] paravoid: yeah, but if / when that happens, you fix the validation, not the consumers
[18:54:32] paravoid: i don't agree
[18:54:40] ottomata: I assume that Kafka HTTP interface is not a single server but disdtribtued like kafka itself, so MW will still need an array of all the kafka servers, right (which is a good thing iho)
[18:54:42] we'll always need validation and authentication
[18:54:44] ori: no, the point is, that a malicious action (or a bug!) could bypass the proxy
[18:54:46] paravoid: but we don't validate database query results
[18:54:51] ori: unless you say that you *also* consume from this proxy
[18:54:59] Krinkle: yes, or load balanced
[18:55:00] paravoid: _joe_: consumers *can* validate them if they want to, but that has no effect on the system itself
[18:55:01] Though maybe we can abstract the pool of servers via an LVS service IP, not sure if that's an improvement or not
[18:55:02] from a security perspective, the validation should happen as close to the consumer as possible. abstraction often makes things worse
[18:55:05] its just an HTTP post
[18:55:07] paravoid: are you pushing for public key crypto?
[18:55:14] <_joe_> mobrovac: I argue that they need to
[18:55:15] wait what now?
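Krinkle's point that MediaWiki would hold a list of servers (rather than relying on an LVS service IP) implies client-side failover and retries. A minimal Python sketch of that client side, assuming a hypothetical `post(url, payload)` transport that returns on success and raises `OSError` (for example `ConnectionError` or `TimeoutError`) on network failure; `post_with_failover` and its parameters are illustrative names, not an existing MediaWiki or Kafka API.

```python
import time


def post_with_failover(endpoints, payload, post, attempts_per_endpoint=2, backoff_s=0.1):
    """Send payload to the first endpoint that accepts it and return its URL.

    Only transport errors (OSError and subclasses) are retried; any other
    exception, e.g. a validation rejection surfaced by `post`, propagates
    immediately because re-sending the same event would not help.
    """
    errors = []
    for url in endpoints:
        for attempt in range(1, attempts_per_endpoint + 1):
            try:
                post(url, payload)
                return url
            except OSError as exc:
                errors.append(f"{url} attempt {attempt}: {exc!r}")
                time.sleep(backoff_s * attempt)  # linear backoff before the next try
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

Retrying only transport errors is a deliberate choice in this sketch: if the proxy rejects an event as invalid, sending the same bytes again cannot succeed. With an LVS service IP instead, `endpoints` would collapse to a single address and failover would move into the load balancer.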
[18:55:18] HOooookay
[18:55:32] want to take a step back here, there are lots of raised issues.
[18:55:34] all together now!
[18:55:43] * paravoid shuts up and waits for ottomata
[18:55:45] going to try to summarize the issues, and we can tackle them individually
[18:55:53] ottomata: please use #info, #action etc.
[18:55:57] will try
[18:56:11] ottomata: Yeah, if MW has the array, it can presumably ensure delivery if things fail by re-trying. We're not worried about performance as much since this can be done post-send.
[18:56:21] (after the http req ends, MW can live on for some time)
[18:56:35] #info issue 1) Contention over whether events need to be validated on consumption
[18:56:39] (ottomata has the floor)
[18:57:24] #info issue 2) contention over whether produce validation should be done as an HTTP service or as a client lib
[18:57:40] <_joe_> also, contention about the idea such a proxy could/should be used to consume events as well over reliability and complexity concerns
[18:57:55] #info issue 3) scalability - proxy will lessen it, is it worth it?
[18:58:12] ottomata: that doesn't make sense
[18:58:20] it'll add overhead, but won't affect *scalability*
[18:58:22] <_joe_> I would like a firm statement that we won't make 100 systems dependent on a proxy we invent here instead of proven tech that is already production ready
[18:58:35] gwicke: let's discuss these one by one
[18:58:44] gwicke: maybe you are right, should maybe say reliability?
[18:58:51] before we do, can I interject with a meta-comment?
[18:58:58] yes ori
[18:58:59] I have comments on the first, but waiting
[18:59:35] ori, you're chair :P
[18:59:42] #info correction to issue 3) s/scalability/reliability of message production/
[18:59:50] In general have to subject every aspect of one's design to a vote by a committee feels awful. Our default attitude should be that the people who are closest to this problem and have been actively thinkign about it and sketching solutions have the right idea and know what they're doing.
[19:00:08] (time check: 15 minutes remaining)
[19:00:14] agreed, but I don't think we're talking here about every aspect of the design
[19:00:17] I think most of us agree with that, but there's a basic lack of trust, which goes back to the topic of ownership / responsibility.
[19:00:22]
[19:00:30] there are some big issues i'd really like to settle before our 15 minutes are up
[19:00:37] so i'd like to suggest we hold off on issue 1)
[19:00:52] <_joe_> ottomata: agreed
[19:00:54] I just need a clarification on issue (1) though, can I raise it?
[19:00:56] sure
[19:01:16] I read some comments that may have been talking about something else, hence my confusion
[19:01:35] paravoid: the question is what this validation entails
[19:01:37] we're talking about validation of the *protocol*, as in whether it's json or something else, what kind of schema and so on
[19:01:41] hence my question about public key crypto
[19:01:49] What constitutes validation here. Valid JSON syntax? Or elaborate schema's? I don't particularly look forward to requiring schemas in this area. especially becahse those would be backend (e.g. kafka/avro) specific. And probably make deployment harder as MW changes, we need to coordinate updating of the schema etc.
[19:01:53] yeah that :)
[19:02:07] Krenair: currently jsonschema is the way we are moving
[19:02:12] Krinkle: we are going to have schemas
[19:02:19] but yes, schemas are very important to this system
[19:02:21] Krinkle: otherwise it equals free flow data
[19:02:23] more so than the proxy
[19:02:27] or the *data* therein? some people mentioned whether we validate database output, for example
[19:02:28] Krinkle: which is impossible to analyze
[19:02:40] generally, our APIs make a promise to produce data in a specific format, and I don't think that we should make an exception here
[19:02:56] paravoid: we're proposing to validate the json against a schema that corresponds to that topic
[19:03:00] paravoid: we want to validate a message against a schema
[19:03:04] https://en.wikipedia.org/wiki/Robustness_principle
[19:03:15] "Be conservative in what you send, be liberal in what you accept"
[19:03:17] Krinkle: some reading for you: http://www.confluent.io/blog/stream-data-platform-2/ :) ignore the part about avro for now, just the part about schemas in general
[19:03:18] yeah, schema is what I was talking about too
[19:03:20] ottomata: unless you state very explicitly which conundrums you want addressed we won't get to them, this meeting is too rambunctious
[19:03:32] Yeah, but it depends on how strict they are.
[19:03:38] haha
[19:03:40] ok!
[19:03:41] E.g. we will have optional properties and areas where extensions can add properties.
[19:03:45] mmm.. i do not think whether we need schemas is up for discussion really, without them we cannot use the data..can we move on from that?
[19:03:48] paravoid: can I guide discssion now?
[19:03:53] yes please
[19:03:57] ok
[19:04:03] so, here are the two things I want settled
[19:04:04] But yeah, basic validation using jsonschema seems feasible. I like.
[19:04:07] 1. proxy or not.
[19:04:15] 2. if proxy: eventlogging or new NodeJS service
[19:04:41] so, since 2. depends on 1.
[19:04:46] not kafka-rest?
[19:04:46] <_joe_> My take on "proxy or not": as long as it's just for producing messages and not consuming, I'm ok with it
[19:05:22] it seems to be the simplest way to get started
[19:05:31] ok, haha, since _joe_ is the only real objector that I have heard to 1, can we agree on to proxy then?
[19:05:36] proxy for what?
[19:05:38] everything?
[19:05:39] production
[19:05:41] _joe_: correct, that's what we have in mind as well
[19:05:41] <_joe_> proxy for what?
[19:05:45] sure but we still do not have a concrete deliverable do we?
[19:05:46] <_joe_> ok
[19:05:54] is it required for every eventbus producer to use this proxy?
[19:06:05] yes
[19:06:09] so agree that we will have a service that allows for production and validates messages destined for a topic against a specific schema
[19:06:15] (nuria: I don't think anyone disputes that change propagation for RESTBase is the chief immediate need)
[19:06:21] paravoid: initially, yes, but we can reconsider at any point based on what we learn
[19:06:21] no I don't think this is sane as a requirement
[19:06:21] paravoid: i say no.
[19:06:28] Does kafka native support validation?
[19:06:29] paravoid: so that we are sure messages are validated
[19:06:31] but i think we don't have to decide that now
[19:06:34] (Krinkle: no)
[19:06:38] Or is that the main motivation for a prpxy, ok
[19:06:41] yeah I don't agree this would make it "sure" messages are validated either
[19:06:52] Krinkle: it supports no validation or authentication
[19:07:05] ok
[19:07:12] so agree that we will have a service that allows for production and validates messages destined for a topic against a specific schema?
[19:07:14] <_joe_> I don't think mandating everything through a rest system is a good idea. We want to be able to do streaming with something like kafka even when producing
[19:07:24] re: eventlogging or new nodejs service, ottomata / mobrovac / urandom, what do you guys think?
[19:07:27] agree joe, but let's not debate now
[19:07:32] _joe_: we can make exceptions where warranted
[19:07:33] <_joe_> ottomata: agreed
[19:07:42] ok, so
[19:07:43] haha
[19:07:46] I think it just substitutes one protocol for another and possibly enhancing the data in the process, and I'd be okay with that if it was for ease of use etc.
[19:07:47] <_joe_> gwicke: makes sense, thanks
[19:07:49] 7 minutes left :p
[19:07:51] as an optional component
[19:07:55] ottomata has aged 20 years in the last 20 minutes
[19:08:09] ayayay...
[19:08:15] just noting that I do not thing that a central service in whatever implementation for validation is really the best idea
[19:08:17] but definitely when I hear "eventbus" I'm not thinking of "produce with HTTP, consume with kafka", no
[19:08:21] s/thing/think/
[19:08:29] (Please add #action / #info / #agreed , I am having a hard time keeping up)
[19:08:31] i don't think there's consensus on needing a proxy for producing yet
[19:08:34] can I add agreed?
[19:08:35] what if I want to do both?
[19:08:50] <_joe_> I don't think either
[19:08:51] #agreed we will have a service that allows for production and validates messages destined for a topic against a specific schema
[19:08:59] ok, now to the next part
[19:09:00] ^ ottomata: no, I dispute that
[19:09:03] ACK
[19:09:05] haha
[19:09:09] mamma mia
[19:09:16] chaos on a friday night
[19:09:21] <_joe_> mark: it /allows/, it doesn't mandate in otto's words
[19:09:25] #info it's contentious whether that service should be required for *all* use cases
[19:09:41] well, sort of
[19:09:55] Can we require it for all messages produced from MediaWiki?
[19:10:00] maybe
[19:10:01] but, given the lack of auth and validation, we should be careful about which other producers we allow
[19:10:02] I think the deliverable of this project should be a low-level protocol (kafka) + schemas
[19:10:08] can we use 5 minutes to get to the real thing i wnat to discuss?
[19:10:11] and not higher-level components
[19:10:16] paravoid: interseting.
[19:10:41] the protocol, so to speak, that we all agree to talk in, is kafka and this schema spec
[19:10:45] you definitely will not be able to trust message validity if some producers are left to handle that on their own
[19:10:56] I'm arguing that this is needed anyway
[19:10:57] urandom: +1
[19:11:03] <_joe_> with and client libraries that we will need anyways? :)
[19:11:03] paravoid: I would go as far as also saying some (nodejs?) library
[19:11:09] ottomata: Tell us the thing.
[19:11:10] (btw yall, I have several more hours left in my day, so I can be here for a while)
[19:11:13] urandom: you will always anyway have that
[19:11:13] ottomata: tell us the thing
[19:11:26] haha
[19:11:27] akosiaris: i don't understand why that is the cases
[19:11:33] well, if we don't have a proxy
[19:11:37] then my issue is more moot
[19:11:38] so uh
[19:11:41] paravoid: consumers can validate the syntax, but not the accuracy of events
[19:11:43] if we don't have that in agreement...
[19:11:47] mine is about implementation of the proxy
[19:11:49] for the latter, you'd need public crypto
[19:11:56] urandom: bugs ? rogue (not malicious) producers, etc
[19:12:16] akosiaris: the analogy i've used is a database, and the same applies there
[19:12:21] I think regardless of whether or not we use EventLogging, it'd be foolish to discount the experience we have with a system very much like the one that is being proposed here
[19:12:24] <_joe_> gwicke: no you need standard libraries and schemas
[19:12:26] akosiaris: i think it's fair to say we're restricting the producing side to internal components for now
[19:12:28] there are aspects of it that have worked really well
[19:12:30] we don't typically perform such validation on database query results
[19:12:35] every *consumer* by design, will have access to produce to kafka, on that very same topic
[19:12:46] ori: +1
[19:12:46] do you trust every consumer to not mess up and start producing garbage by mistake?
[19:12:55] _joe_: to establish that the message you got is really from a trusted producer, you'd need more than that
[19:13:06] win 2
[19:13:08] paravoid: yeah, i'm in favor of abstracting consumption as well :)
[19:13:12] urandom: you do , if it doesn't abide to table insert fails, schema validation works th same way
[19:13:13] <_joe_> gwicke: to estabslish if it's valid, no
[19:13:19] <_joe_> (valid, not trusted)
[19:13:22] urandom: *if* that's the case, then I might agree yeah
[19:13:24] HI um
[19:13:31] ottomata: hi!
[19:13:32] urandom: in the sense that validation will happen on the consumer again :)
[19:13:33] yall want to do this for a while longer?
[19:13:37] _joe_: so, syntax is alright, but are you going to act on it?
[19:13:42] <_joe_> urandom: I am strongly against abstracting consumption
[19:13:52] _joe_: i know :)
[19:14:11] _joe_: ultimately, you need some level of assurance on where the event comes from
[19:14:11] i'm happy to continue this discussion on TO service or NOT TO service
[19:14:15] if we have more time
[19:14:24] ha, well, meeting is done anyway
[19:14:24] haha
[19:14:32] do we have more time?
[19:14:34] should we restrict the conversation to internal usage only for now and leave the security aspects (external clients) for later?
[19:14:43] I can stick around as a participant, but I think that as chair before the conversation continues we should summarize a few things with #action / #info
[19:14:50] I can go maybe another 15mins but not much more
[19:14:54] <_joe_> #info _joe_: I am strongly against abstracting consumption away from kafka, on reliability concerns and on the architectural principle that proxy-services should only be used to add flexibility to a system
[19:14:56] yeah, me too
[19:14:56] it's late, it's friday, I'm sick
[19:15:00] i have not heard the requirement/use case of external clients yet
[19:15:01] i have to go now
[19:15:14] mobrovac: take care
[19:15:15] byyye mobrovac :)
[19:15:20] mobrovac: ttfn
[19:15:22] * mobrovac waves
[19:15:23] mark: we surely need something like websockets there
[19:15:37] sorry everyone i did not moderate this more effectively, I didn't actually realize I was making myself chair by using #startmeeting, and I didn't anticipate how contentious this is.
[19:15:40] ok, so, if we only have 15mins more, let's say for the sake of argument that we are going to build a rest proxy service for production.
[19:15:40] mark: seeing how hard is to come to agreement on internal consuption i think makes no sense to talk about external
[19:15:56] hehe :)
[19:16:01] you did a great job ori thank you!
[19:16:06] However untrustworthy consumption is in Kafka (no way to prevent consumer from injecting/producing data, basically an SQL user with no way to separate SELCET from INSERT if I understand this right?
[19:16:06] yes, but - the need for it matters a lot for the discussion of library vs service :)
[19:16:12] ottomata: I'm not hearing a lot of enthusiasm for even 15 minutes, and I don't see what it'll accomplish other that just more discussion
[19:16:27] Consuming is gonna be a lot more important in terms of scale and uptime. We probably don't want to abtract that.
[19:16:35] mark: aham...indeed
[19:16:43] ok, robla so what should we do?
[19:17:00] what a ride
[19:17:06] (Krinkle +1)
[19:17:08] just another day in the office
[19:17:11] haha
[19:17:11] folks, I'd like to get some clarity on next steps
[19:17:26] ottomata: I'll put that back to you: what do you want to accomplish in the next few minutes? Just more non-binding discussion?
[19:17:28] we have a ticket open for two boxes per DC
[19:17:39] for Kafka and the potential proxy
[19:17:41] gwicke: I think you have clarity on next steps -- I haven't seen you budge from the intentions you sketched out months ago, tbh.
[19:18:04] ori: like, setting up kafka?
[19:18:11] I want to decide if we should continue to implement the restevent stuff the services team is working on, or just adapt EventLogging to do this
[19:18:21] ottomata: well, I asked you what you, urandom and mobrovac thought about that
[19:18:29] haven't heard an answer yet :)
[19:18:34] me?
[19:18:41] i think we should adapt eventlogging, unless there is really good reason not to
[19:18:46] it already does most of this, and more
[19:19:29] is there a really good reason not to?
[19:19:29] ottomata: that is a pretty key question that i do not think we can answer in 5 mins
[19:19:42] o O ( don't know much about the subject, but judging from the topic and knowing the people, it doesn't sound like something that will be agreed on in the next 11mins )
[19:19:46] https://phabricator.wikimedia.org/T114443#1736288
[19:19:50] but let's not pessimistic, go ahead :)
[19:19:52] ottomata: seemns that less piecess is better
[19:20:01] haha
[19:20:02] paravoid: jajaja
[19:20:09] indeed nuria, paravoid
[19:20:16] ori: some of the reasons are simplicity, reliability and performance
[19:20:18] but imo one of the reasons for this meeting was to decide this.
[19:20:19] :/
[19:20:31] gwicke: please less marketing speak, more eng. speak :)
[19:20:37] gwicke: let's please not talk about perf
[19:20:38] ottomata: you are an optimist!
[19:20:38] <_joe_> performance and reliability? what does that mean?
[19:20:46] gwicke: without knowing functional requirements
[19:20:56] having a simple service that only validates messages & enqueues them is potentially good for reliability
[19:21:12] <_joe_> gwicke: as opposed to what?
[19:21:23] <_joe_> no one wants to build a complex service I guess
[19:21:25] I'd expect something like: "EL is flawed in way X and it's impossible to fix/more work to fix it than create something new and migrate EL to it"
[19:21:26] the benchmark numbers we have gotten in labs indicate a 50% difference in throughput
[19:21:33] (_joe_: having to deal with other people's code and other people's ideas)
[19:21:34] gwicke: isn't that a eventlogging consumer though?
[19:21:42] <_joe_> ori: oh, that!
[19:21:48] gwicke: that's a nice, round number!
[19:21:59] gwicke: let's please, please not talk about perf
[19:22:24] gwicke: let's focus on functional requirements
[19:22:31] gwicke: the comparison is between a very simple nodejs service, and several layers of smart abstraction in eventlogging, that does more than just validate events. i think those abstractions are very useful.
[19:22:41] any solution here has to be horizontally scalable.
[19:22:57] let's not go back to original requirements ("scalable")
[19:23:04] but focus on why EL is insufficient
[19:23:14] someone is proposing to write something to replace it, AIUI?
[19:23:19] overhead has a financial cost, but in this case the volume is small enough for that to not matter too much
[19:23:22] One thing I just wanna throw in here is that we should make sure there is some kind of design aspect that prevents injection from client-side schemas into these EventBus schema's. Not like how statsv and EventLogging currently allow both client and server side to emit any schema.
[19:23:40] paravoid: i think one thing EL would need added( besides a rest proxy ) is ability to retrieve schemas from disk
[19:23:45] has the lack of scheme enforcement the proxy thing woudl provide been an issue with eventlogging now?
[19:23:48] Krinkle: indeed. we are talking about having a more controlled schema repo
[19:23:49] paravoid: we have a simple service that does what we need for the proxy
[19:23:57] that says nothing to me gwicke
[19:23:58] 21:21 < paravoid> I'd expect something like: "EL is flawed in way X and it's impossible to fix/more work to fix it than create something new and migrate EL to it"
[19:23:59] do we have non compliant producers clogging up the works, I haven't figured that out
[19:24:13] chasemp: no, it hasn't but consumption is also controlled by eventloggin
[19:24:36] chasemp: EL also does schema enforcement
[19:24:40] we agreed before that we would migrate old use cases to this new system, no?
[19:24:43] eventlogging has all kinds of code that's specific to log line parsing, SQL table storage for analytics, etc
[19:24:50] Yeah, we'll need to figure out a workflow that scales well for community engineering. E.g. adding an event in MW, the schema goes somewhere, and then in prod we need to pull it in there from, mw core git? Or explicit config repo where we copy them to from mw?
[19:24:53] all of which isn't needed for the simple producer proxy
[19:24:54] so unless you also have code for porting EL to your new service, you can't call this done
[19:24:54] gwicke, it's pretty nicely abstracted
[19:25:08] or replacing exactly what EL is doing
[19:25:11] Krinkle, TBD, but something like that
[19:25:12] <_joe_> gwicke: how much work does it take to move that to an independent layer?
[19:25:13] #endmeeting
[19:25:13] Meeting ended Fri Oct 30 19:25:13 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[19:25:13] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-30-18.17.html
[19:25:13] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-30-18.17.txt
[19:25:13] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-30-18.17.wiki
[19:25:14] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-30-18.17.log.html
[19:25:36] gwicke: does it have what it takes for the producer proxy is the question, not what it has extra
[19:25:40] I thought we said 15mins, not 10, but ok :)
[19:25:59] akosiaris: currently, it lacks static schema support
[19:26:00] (i think ori just wants to get off of the uncomfortable chair)
[19:26:04] paravoid: speed-meetings is set On
[19:26:07] speedy
[19:26:09] there is also no mapping of topics to schemas, afaik
[19:26:15] gwicke: https://gerrit.wikimedia.org/r/#/c/235671/
[19:26:24] just as easy to add as restevent
[19:26:25] Krinkle: right now schemas for MW confluent producers are on a git depot (much WIP project)
[19:26:25] what isn't there, can be added
[19:26:32] https://gerrit.wikimedia.org/r/#/c/235671/7/server/schemas/config.json
[19:26:46] another concern is third party users
[19:27:06] we'd need to package eventlogging separately, and run a separate service
[19:27:21] in mw? doesn't it already work for third party users?
[19:27:27] (agree on package it separately though)
[19:27:42] we have analysts like halfak running eventlogging locally for development
[19:27:50] while with node we get to combine several logical services in a single process
[19:28:08] <_joe_> gwicke: seriously, this is a limitation of python?
[19:28:23] <_joe_> gwicke: we can use uwsgi which does exactly what you want
[19:28:41] <_joe_> also, being for producing, I guess a single-thread eventloop would be enough for us
[19:28:51] _joe_: this PoC uses tornado
[19:28:53] <_joe_> so, I don't see a strong argument here, tbh
[19:29:02] _joe_: main issue is that it would be the only python service in small installs
[19:29:03] it's rather far fetched as an argument
[19:29:11] * halfak is trying to work out the topic from the scrollback
[19:29:15] <_joe_> gwicke: it's not
[19:29:18] Also one thing is taht implementing this on EL now doesn't mean it cannot be separated later if we really see the need
[19:29:18] * halfak is not an analyst :P
[19:29:20] haha, don't do it halfak!
[19:29:21] hahahhahah
[19:29:26] halfak: good luck
[19:29:30] ha
[19:29:31] you'll hurt your eyes!
[19:29:40] halfak: dohhhhh
[19:29:40] * halfak runs back to his editor
[19:29:41] sorry
[19:29:42] o/
[19:29:52] <_joe_> rcstream is a small python service that has worked reliably for 1 year and a half
[19:29:55] <_joe_> for example
[19:30:04] been holding the why nodejs in as well, I haven't figured out where that requirement comes from
[19:30:10] _joe_: *small installs*
[19:30:24] we are talking third party small vm with low memory
[19:30:35] <_joe_> oh third-parties, well
[19:30:36] seems that to prove the value proposition of system the fastest we get it out there the better
[19:30:55] <_joe_> would they ever use this system which is kafka-backed? I don't think so
[19:31:01] and the fastest seems to me is tagging along on EL
[19:31:08] that's a nice thing about EL, _joe_
[19:31:09] we're now well into the realm of irrelevance
[19:31:11] it works without kafka
[19:31:25] <_joe_> mark: :)
[19:31:43] anyway; if analytics commit to supporting & packaging EL for third party use, then I'd be okay with using that
[19:32:00] gwicke: the "Not Invented Here" syndrome?
[19:32:16] mark: I don't think so
[19:32:43] gwicke: can totally package EL for third party use....but isn't it already done?
[19:32:52] i mean, i want to overhaul EL deployment and packaging anyway
[19:32:59] like, debs
[19:33:03] yeah
[19:33:04] i want ot do that
[19:33:13] <_joe_> gwicke: dh_python kinda makes that super-simple
[19:33:13] but, for small third parties, it is easier as is
[19:33:32] in any case, the rest proxy is super simple
[19:33:32] they just get the mw extension like usual, and then run EL
[19:33:33] <_joe_> small third parties should not use this system, let's be realistic
[19:33:35] <_joe_> please
[19:33:46] as long as we have a good api, we can always swap that out at any time
[19:33:50] <_joe_> small third parties will not need the whole mess we do have
[19:34:00] the node implementation is a couple hundred lines total
[19:34:03] _joe_: i agree, i am not sure about small third party use case
[19:34:12] <_joe_> nuria: I think it's irrelevant
[19:34:44] <_joe_> anyways, I really have to go now, sorry.
[19:35:13] me too
[19:35:23] ok, let's close discussion here then.
[19:35:34] have a good weekend everyone
[19:35:41] ... a lot .. more .. to come .. i bet ...
[19:35:42] ciao
[19:35:45] :)
[19:35:45] mark: thanks, you too!
[19:35:47] <_joe_> you too!!
[19:36:10] good evening mark and other European folks!
[19:36:16] thanks all
[19:36:25] thanks for staying up!
[19:40:01] ottomata: what is the timeline for implementing basic topic -> schema mapping support?
[19:41:03] in eventlogging gwicke?
[19:42:29] i'm doing that in my PoC now
[19:42:42] but, doing it nicely is a different thing
[19:43:48] :)
[19:45:33] gwicke: should we chat in #services or hangout and talk about next steps?
[19:53:14] ottomata: sorry, was distracted IRL
[19:53:18] -> #services?
[22:06:27] I read http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/20151030.txt
[22:06:48] I love that it's literally discussing Kafka.
[22:06:59] Surely others appreciate that.
[22:07:27] :)
[22:08:18] It wasn't the most disorganized IRC meeting I've seen, but surely writing a wiki page would be easier...
[22:08:49] or phab tickets. come on by, there are 4 you can read if this is your kinda fun
[22:09:05] https://phabricator.wikimedia.org/tag/eventbus/
[22:09:23] Phabricator isn't that great for collaboration of this kind, IM[EO].
[22:09:37] ottomata: Is there an RFC on mediawiki.org?
[22:09:51] don't think so