[17:36:24] ...
[18:12:24] SF folks (e.g. brion, ori ) are you planning on using the conf room up on 5 for the E86 meeting that's about to happen?
[18:12:37] robla: i'm remote, will be on hangout
[18:12:44] * robla contemplates heading upstairs
[18:13:14] shouldn't the meeting happen on IRC and not on the hangout?
[18:13:17] * mobrovac confused
[18:13:31] oh maybe i'm confused!
[18:13:34] i'd prefer that, actually
[18:13:39] but i saw that there was a hangout on the invite
[18:13:40] mobrovac: I wasn't planning on joining a hangout
[18:13:43] but maybe it was just the default
[18:13:46] i have to run off toward internet archive soon
[18:13:49] but i'll be on my phone of course ;)
[18:13:54] (for irc)
[18:14:01] <_joe_> so irc it is, ok
[18:14:06] <_joe_> I think that's better
[18:14:06] lol
[18:14:08] ori: I think you're right.... I think the default setting is awful
[18:14:10] unlike the rest of you mobrovac actually corrected me
[18:14:18] instead of just being like "ok ori, you can join that hangout"
[18:14:22] :P
[18:14:38] <_joe_> ori: we wanted you to join an empty party, yes
[18:15:24] <_joe_> ok, I guess it's time to start?
[18:15:35] aham, irc only then
[18:15:37] ?
[18:15:43] _joe_: I think so. who's chairing this?
[18:15:47] <_joe_> nuria: yup
[18:15:50] hiayy
[18:15:53] <_joe_> robla: ah! good question!
[18:16:03] <_joe_> ottomata?
[18:16:08] ori, I added the hangout before I knew what I was doing
[18:16:08] so
[18:16:09] IRC
[18:16:16] yay
[18:16:22] I have never run an RFC meeting, nor have I reallllly been in one.
[18:16:24] <_joe_> robla: I actually assumed you would :}
[18:16:28] hello
[18:16:34] but, i can start, can someone else run this thar meetbot thing?
[18:16:50] * paravoid is here too
[18:16:56] _joe_: I didn't volunteer to run this, and I opposed setting up a one-off
[18:17:05] <_joe_> robla: ok
[18:17:06] hello
[18:17:06] * urandom too
[18:17:15] #topic EventBus | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (Meeting topic: RFC meeting)
[18:17:20] thank sori
[18:17:29] startmeeting first? :)
[18:17:29] I'm not on the bot's list, apparently
[18:17:32] oh
[18:17:41] #startmeeting
[18:17:41] ori: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
[18:17:46] #startmeeting EventBus RFC
[18:17:46] Meeting started Fri Oct 30 18:17:46 2015 UTC and is due to finish in 60 minutes. The chair is ori. Information about MeetBot at http://wiki.debian.org/MeetBot.
[18:17:46] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[18:17:46] The meeting name has been set to 'eventbus_rfc'
[18:17:47] #startmeeting
[18:17:47] gwicke: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
[18:18:07] #topic EventBus | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (Meeting topic: RFC meeting)
[18:18:21] * mobrovac noise
[18:18:22] so, I think most people attending this meeting (except maybe Robla?) have a lot of context already, right?
[18:18:25] thanks ori!
[18:18:25] look i'm like tim starling
[18:18:35] now i just need to be smarter and have a dryer sense of humor
[18:18:41] and maybe ori has a little.
[18:18:44] * robla will catch up....don't worry about context for me
[18:18:46] ayayay
[18:19:21] so, the basic idea is to have a sane way to produce and distribute events to multiple consumers
[18:19:22] k, so, from what I can tell, this RFC is mostly to help us do a couple of things:
[18:19:49] #link https://phabricator.wikimedia.org/T88459
[18:20:14] #info
[18:20:14] this meeting will help us:
[18:20:14] - finalize some implementation decisions
[18:20:14] - get buy in from ops and others
[18:20:22] (maybe that worked, heh?)
[18:21:07] I know that _joe_ in particular isn't sold on the need for an EventBus service at all
[18:21:11] should we talk about that first?
[18:21:14] ottomata: I think the info needs to be a single line
[18:21:21] (and the bot doesn't ack those)
[18:21:36] meh, oook
[18:21:37] he
[18:21:37] h
[18:21:37] <_joe_> I guess it's a good starting point, yes
[18:22:03] #info goals for meeting: 1) finalize some implementaiton decisions, 2) get buy in from ops and others
[18:22:17] Ok, should we talk about use cases that benefit from an event system then?
[18:22:21] ok, for a little bit of context: my main goal here is to standardize the way events are sent around, and the format of those events
[18:22:34] I think we all agree that doing that is good
[18:22:37] one use-case is the fact that restbase needs to update its content based on actions taken on-wiki, right?
[18:22:43] I agree with nuria
[18:22:45] ok
[18:22:47] ori: yes
[18:22:52] <_joe_> nuria: actually I'd like to separate the discussion about "event system" and "implementing a rest proxy in front of kafka"
[18:23:02] we have started to discuss a first set of events needed for that use case at https://phabricator.wikimedia.org/T116247
[18:23:04] yes _joe_ agreed
[18:23:09] <_joe_> the two things are not equivalent
[18:23:13] use cases are agnostic to implementations
[18:23:24] let's talk about the "whys"
[18:23:26] for use cases, there is also: https://etherpad.wikimedia.org/p/scalable_events_system (near the bottom)
[18:23:29] and fate about "hows"
[18:23:30] _joe_: but which are you actually skeptical about? the use-case or the implementation?
[18:23:38] <_joe_> ori: the implementation
[18:23:47] the basic events defs are here - https://github.com/wikimedia/restevent/pull/5
[18:23:51] nuria++ :)
[18:23:54] nuria, paravoid, we can talk about this, but we have had several meetings about the use cases already
[18:24:01] let's recap them anyway
[18:24:05] ok
[18:24:14] they're going to be useful for context and for the discussion that will follow I think
[18:24:14] <_joe_> yes that is useful for the rest of the discussion
[18:24:20] <_joe_> +1
[18:24:31] okay, so one is tracking changes in MediaWiki
[18:24:31] one use case is to be able to join (via event queue) differents sources of info
[18:24:35] currently, there are many ways and formats that MW and other clients send events around
[18:24:51] example: join pageview data and editing data w/o having to use the database records
[18:24:53] eventlogging, RCfeed, RCstream, and a myriad of other direct one offs
[18:25:33] there are also custom extensions creating custom jobs
[18:25:38] ottomata: right, and we would like to have a uniform way in which we can inspect events that represent for example "edits"
[18:25:43] like the RESTBaseUpdateExtension
[18:25:57] ottomata: so would it be fair to say that a successful implementation will unify / replace the systems you just mentioned?
[18:26:02] edits, and everything really. and we want a way to be sure that any particular set of events conforms to a schema
[18:26:11] so downstream consumers can know what they are getting
[18:26:18] without having to know about the upstream producers
[18:26:22] yes ottomata, edits was just a concrete example, right
[18:26:25] yes
[18:26:29] ori: eventually, yes
[18:26:46] ori, eventually, yes
[18:26:48] haha
[18:26:48] in the short term, we have more concrete use cases that we'd like to replace
[18:26:49] <_joe_> sorry can I stop for a second?
[18:26:52] yes.
[18:26:52] * akosiaris kind of worried abut that eventually... seems to be a synonym for never
[18:26:56] yeah
[18:27:04] <_joe_> RCStream is an externally exposed service
[18:27:08] I think EventLogging will be hard to replace in the short term
[18:27:18] <_joe_> it has nothing or little to do with how events are propagated internally
[18:27:19] RCStream would just swap out the backend for this system
[18:27:24] that one is a little easier
[18:27:28] <_joe_> ok
[18:27:30] ori said "unify / replace"
[18:27:36] not "replace"
[18:27:56] <_joe_> ok
[18:27:59] tackle all those systems is not a priority for us this quarter
[18:28:03] I'm with akosiaris on this. When someone wants to introduce new things, they talk about all the existing ad hoc systems that need to be consolidated. But then when the discussion shifts to concrete plans, that tends to become ethereal, qualified.
[18:28:14] yeah, agreed
[18:28:21] *tacking
[18:28:23] we keep adding stuff to our stack
[18:28:25] * gwicke can't type
[18:28:38] It doesn't have to be now, but I think a firm commitment would be nice
[18:28:41] and maybe a timeline
[18:28:41] but rarely go back and consolidating
[18:28:46] ori++
[18:28:46] we are concretely planning to replace the RESTBaseUpdate extension
[18:28:51] <_joe_> I agree that a clear path of migration should be laid out beforehand
[18:28:59] this quarter
[18:29:06] it doesn't have to be this quarter
[18:29:10] I don't know much about RCFeed, and I only know a little about RCStream. from what I can tell, RCStream won't really be that hard to unify into this system.
[18:29:18] most of what we are building is on the producer side of things
[18:29:46] there is a bug about letting consumers catch up on disconnect as well, which will be easier to address with a kafka backend
[18:30:07] <_joe_> we also have i.e. the search update jobs - a ton of jobs that get spawned by edits right now and that we might want to replace with the event bus
[18:30:14] EventLogging requires some more discussion, specifically on the implementatino side
[18:30:25] it is a contender for implementation choice
[18:30:48] _joe_, indeed.
[18:30:50] are those in redis now?
[18:30:54] <_joe_> yes
[18:30:56] aye
[18:30:56] does anyone disagree that "a firm commitment and maybe a timeline" for migrating is needed?
[18:31:02] (paraphrasing ori)
[18:31:06] <_joe_> agreed
[18:31:15] if folks sign up to do the work, then sure
[18:31:18] hm. mostly. :)
[18:31:21] but paravoid depends if priority is to migrate an old system
[18:31:30] we have concrete use cases that take priority
[18:31:40] or to create an event flow that say, i can tap into now
[18:31:45] "folks" are not signing up to do the work of maintaining more systems either though, are they?
[18:31:50] generalizing to cover other use cases is definitely planned, but will require a joint effort
[18:32:11] agreed on the joint effort
[18:32:12] but i see your point
[18:32:15] <_joe_> priority is not to have multiple systems that replicate the same functionality or duplicate part of it indefinitely.
[18:32:24] but someone needs to take ownership and be held accountable for coordinating that effort
[18:32:26] _joe_: yeah, definitely
[18:32:29] paravoid: i think there isn't a system that will work for gwicke's current use case
[18:32:29] right, i see, from ops perspective the value proposition
[18:32:35] so, a new one of some kind is needed
[18:32:39] is to do away with one of the systems we currently have
[18:32:53] if that new one can be made general enough to be able to migrate old systems to it
[18:32:55] then we should do that
[18:32:57] as we can
[18:33:10] otherwise paravoid and _joe_ fill that we are just adding to the problem of too many systems doing teh same thing
[18:33:20] what other system is doing the same thing?
[18:33:21] ok, everyone hang on. let's capture consensus here because i think we have one, let me take a stab at it
[18:33:21] so, first system that's going to be replaced is the RESTBaseUpdateJobs
[18:33:22] <_joe_> nuria: I think that is a valid architectural principle, not just an ops concern
[18:33:23] 8the
[18:33:49] ori: yes, please do
[18:33:51] maintaining systems doesn't solely fall into our hands
[18:33:56] _joe_: right, it shifts the use cases a bit though
[18:34:10] decreasing entropy's good for humans
[18:34:26] * robla is also eager to hear ori's consensus capturing
[18:34:35] - The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist.
then -seems to me - a good 1st use case is to have an edit stream that can replace current backend for rc stream
[18:35:01] there is a balance to be struck between having work informed by a concrete use case, and making things general and universal
[18:35:01] yeah I think we all agree to this
[18:35:05] o/
[18:35:06] well, nuria, yes, to have an edit stream that can replace backend systems that use an edit stream.
[18:35:09] <_joe_> ori: +1
[18:35:12] like RESTBaseUpdateJobs
[18:35:22] - The work of refactoring / consolidating specific existing implementations should not fall exclusively on any one team; it is a joint responsibility.
[18:35:24] but,
[18:35:32] and eventually RCstream, i thikn RESTBaseUpdateJobs is the first use case that people are really really excited about!
[18:35:34] :)
[18:35:37] agreed too
[18:36:00] - The implementers of this project should assume leadership / ownership of the overall process.
[18:36:08] <_joe_> ori: +1
[18:36:10] +1
[18:36:15] #info: ori> - The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist.
[18:36:16] I would like too that although I'm guessing it may be more contentious
[18:36:25] robla: no, let's use #agreed I think
[18:36:28] if noone disagrees
[18:36:34] (only ori can, as chair)
[18:36:41] I think we should share the burden
[18:36:52] sharing means noone feels responsible
[18:36:52] leadership / ownership is fine
[18:37:17] that doesn't mean we have to implement all the changes
[18:37:24] that is correct
[18:37:24] I'd like to make sure that the implementation for this will be Kafka-agnostic where possible.
[18:37:27] it means we have to own them and poke til they get done
[18:37:29] i'm curious what that leadership/ownership actually looks like
[18:37:32] Perhaps not unlike how RCFeedEngine works rightnow
[18:37:32] mark: I think it depends on what the liability is
[18:37:38] where we have both Redis and UDP implementations
[18:37:40] <_joe_> Krinkle: please let's talk about implementations separately
[18:38:00] if nobody signs up to do the work, I think that teams providing the infrastructure shouldn't be the only ones on the hook
[18:38:03] (Krinkle, we'll get there, but in past discussions everyone has been for that)
[18:38:15] otherwise, we just encourage more one-offs
[18:38:18] gwicke: well, if you're not willing to assume responsibility, I think there is another way
[18:38:39] which is to start by porting a few things that aren't already systems that you guys have written
[18:38:54] as a way of demonstrating the generality and suitability of the design
[18:38:58] if this system works they way I hope it does, i'm happy to have ownership of those things
[18:39:28] so.
[18:39:29] #agree The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist. The work of refactoring / consolidating specific existing implementations should not fall exclusively on any one team; it is a joint responsibility. But: the implementers of this project should assume leadership / ownership of the overall process. (Last clause is actually somewhat contentious; Ottomata agrees but Gwicke / urandom worried about implications.)
[18:39:43] ori, like RCstream?
[18:39:52] <_joe_> yup
[18:39:57] <_joe_> or the jobqueue
[18:39:58] #agreed The success criterion for this system (and this project) is its ability to unify the set of partial and divergent implementations that currently exist. The work of refactoring / consolidating specific existing implementations should not fall exclusively on any one team; it is a joint responsibility. But: the implementers of this project should assume leadership / ownership of the overall process.
[18:40:07] #info (Last clause re: responsibility is actually somewhat contentious; Ottomata agrees but Gwicke / urandom worried about implications.)
[18:40:25] _joe_: meaning migrating RCStream to use eventbus
[18:40:27] ?
[18:40:29] jobqueue gwicke is intersted in already, I think.
[18:40:46] (time check: we have 35 minutes left)
[18:40:53] but, if this system does what I think it will, I would have fun porting RCStream (would recruit helpers for sure)
[18:41:04] rcstream is a pretty minor usecase overall
[18:41:12] _joe_: yes, i'd like to see the jobqueue replaced before rcs, if you ask me
[18:41:12] well, i'm just picking one
[18:41:14] <_joe_> yup, should we move to the implementation ideas and questions?
[18:41:14] paravoid: not in terms of analytics,
[18:41:16] but that's a bigger feat
[18:41:39] yeah, paravoid, edit stream in kafka is hugely nice for analytisc stuff
[18:41:40] Yeah, I'd like to avoid a single person both designing and porting systems. It woudl benefit the design and confirm intuitive use if someone else can implement it at least once before we sort of rubberstamp it
[18:41:43] basically, I'm trying to set clear expectations that our primary objectives / use cases are going to take priority over porting everything & the kitchen sink to eventbus right away
[18:41:48] #info _joe_: yes, i'd like to see the jobqueue replaced before rcs, if you ask me rcstream is a pretty minor usecase overall jobqueue gwicke is intersted in already, I think.
[18:42:11] ok, yeah, so with only 30 mins left
[18:42:27] let's move on to some implementation discussion? i really really want to get some things solidified by the time this meeting is done
[18:42:40] +1
[18:42:55] yeah fair enough
[18:42:56] so what do you consider a good 1st deliverable of system gwicke ?
[18:42:58] #info I'd like to avoid a single person both designing and porting systems. It woudl benefit the design and confirm intuitive use if someone else can implement it at least once before we sort of rubberstamp it
[18:43:03] #info services' primary goal for this quarter is change propagation, and things needed in the eventbus for that use case will take precedence over porting other use cases
[18:43:31] nuria: the deliverable is dependent first on settling _joe_'s beef
[18:43:32] :)
[18:43:36] why have a service?
[18:43:37] for the services team, anyway
[18:43:41] is dealing with old stuff going to be /any/ quarter's priority, gwicke?
[18:43:42] why not just clien libs
[18:43:44] client libs
[18:43:49] I think that's fundamentally the problem
[18:43:55] paravoid: not all of it, obviously
[18:44:09] as I said, first one to be replaced is RESTBaseUpdateJob
[18:44:18] _joe_, I'm going to summarize what you've said to me
[18:44:19] what are the next ones ?
[18:44:22] correct me if I say aything wrong
[18:44:23] next step is several classes of job queue use cases
[18:44:34] gwicke: I think the issue is that the license you have from the rest of us to use up engineering resources have to do with the scope of the project
[18:44:37] under the heading of change propagation
[18:44:40] the idea that one team drives forward with the new sexy thing and the rest get to clean up the mess is what some of us are trying to protect against here
[18:44:41] if it's just something for restbase, that's one thing
[18:44:43] _joe_ doesn't see the need for an HTTP REST service proxy in front of Kafka
[18:44:49] if it's something that will improve life for all engineering, that's another
[18:44:58] (or alternatively, dealing with the cost of maintaining more systems)
[18:45:01] i think you're aware of that which is why you started with unification / consolidation of existing systems
[18:45:11] we're just trying to suss out exactly what your commitment is
[18:45:13] it is another system for ops to maintain
[18:45:38] and the Kafka protocol is highly optimized for reliability and scalabilty, so why put HTTP in between
[18:45:43] So, assunming we want to talk implementation, let's look at the restbase usecase. Would that be listening to generic events by MW, or would the extension produce its own topic/events from MW just for this.
[18:45:45] we have been pushing for a generalization exactly to avoid the sprawl of one-off solutions we have seen in the past
[18:45:48] especially since you will probably want a client wrapper over the REST service anyway
[18:45:58] <_joe_> I have a question about the implementation idea: it is not clear if this REST proxy you want to build in front of Kafka will be used only to produce messages to it or also to consume those. Also, in both scenarios, what this proxy would add as a benefit opposed to creating solid libraries that have a direct interface to kafka?
listening from restbase main, or separate service between MW and restbase that does the deed.
[18:46:03] why not just make client libs that do this, instead of centralizing with a service?
[18:46:03] gwicke: generalization ought to mean more than "I think my solution is best, so plz migrate, kthx"
[18:46:12] gwicke: then the new push needs to show concretely how it makes the situation better, rather than just adding to the burden
[18:46:13] thanks _joe)
[18:46:16] we're having 2-3 conversations at once :(
[18:46:16] _joe_
[18:46:26] yeah! it hought we were going to switch to implementation!
[18:46:28] <_joe_> but I guess the other discussion is not over, ottomata
[18:46:34] <_joe_> sorry guys, go on
[18:46:39] <_joe_> I think that's important
[18:46:42] ottomata called in this RFC, so let's agree that the responsibility / ownership question is unsettled for now and move on to the implementation
[18:46:43] yeah
[18:46:49] Krinkle: the idea is that MW core emits events (edit, delete, revchange, etc) and these are put into the kafka queue, a consumer attached to a specific topic would then consume it
[18:46:53] i'll add a few #info with quotes from the last part of the convo but ignore me
[18:46:57] ori, mark: sure, I'm just saying that it's not okay to say 'you pushed for generalization, so now you get to own everything'
[18:47:03] Krinkle: one of them would be the change prop system whihch would update restbase
[18:47:14] you get to own *coordinating the migration*
[18:47:24] and advocating for resourcing it and all that
[18:47:29] coordinating the migration and discussion, sure
[18:47:34] OK!
[18:47:41] Let's address _joe_'s Q
[18:47:42] :)
[18:47:46] I guess I should had put stars around *own* here too
[18:47:55] #info joe> I have a question about the implementation idea: it is not clear if this REST proxy you want to build in front of Kafka will be used only to produce messages to it or also to consume those. Also, in both scenarios, what this proxy would add as a benefit opposed to creating solid libraries that have a direct interface to kafka?
[18:48:27] in my opinion, there will be no REST proxy consumption, it doesn't make sense :)
[18:48:39] MAYBE some special websockets type services for specific purposes, like RCstream
[18:48:46] but that is an application that uses kafka
[18:48:49] ottomata: agreed
[18:48:50] not a http proxy
[18:48:56] can someone (that proposed it, i.e. gwicke?) recap this REST proxy idea?
[18:49:00] so all consumers will be only speaking the kafka protocol ? or have I misunderstood ?
[18:49:01] the validation / proxy stuff is purely focused on production
[18:49:03] how is this supposed to hook and where?
[18:49:19] and for which use cases (all, or partial? producers, consumers or both?)
[18:49:23] in front of kafka
[18:49:24] paravoid: when mediawiki (or any other system) wants to publish an event, it will POST (or PUT, i dunno?) the event
[18:49:25] yeah, please recap
[18:49:25] to validate messages
[18:49:29] paravoid: main issue is making sure that we don't write garbage to random topics
[18:49:30] rather than talk the kafka protocol directly
[18:49:42] is that fair?
[18:49:48] ori: yes
[18:49:50] kafka messages are just bytes, so we need to agree on format and topic -> format mapping
[18:49:59] the kafka protocol is a little low-level and (afaik) liable to change, so having an HTTP API sounds good to me
[18:50:10] <_joe_> gwicke: then consumers will use the kafka protocol and stream data from it directly?
[18:50:13] also the ability to enforce constraints at that layer is a good one too, agreed
[18:50:13] I imagine in MediaWiki we'll have an abstract class/interface to emit arrays as events by topic.
[18:50:14] I'm not sure I understand
[18:50:17] ori: yeah, that's #1 from me, abstraction
[18:50:17] _joe_: for now, yes
[18:50:21] to me the real point about the proxy is msg validation
[18:50:31] <_joe_> gwicke: the "for now" means what?
[18:50:36] aren't we just shifting the problem of emitting valid messages from MW to this proxy?
[18:50:41] it's still software that does this anyway, right?
[18:50:48] _joe_: as ottomata mentioned, we can add higher level things on top, like RCStream
[18:51:01] but gwicke, that is not part of eventbus proxy
[18:51:05] paravoid: sure, but it's a question of where and how you draw the perimeter of "guaranteed valid"
[18:51:07] <_joe_> gwicke: ok, but I am talking "internal" consumers like rcstream
[18:51:16] why is this rest proxy more "trusted" to be talking kafka than mediawiki?
[18:51:23] so, right, paravoid, because there will be many producers
[18:51:26] not just mediawiki
[18:51:27] <_joe_> yeah I don't get that either
[18:51:28] no design automagically corrects nonconformant implementations, but the question is where you handle them
[18:51:37] paravoid: because it's one thing
[18:51:38] in one place
[18:51:39] and all events need to be validated against a schema
[18:51:45] coordinating deploys to many producers is hard
[18:51:46] no one comes unto the father except thru me, all that jazz
[18:51:50] i am the way to truth and light
[18:51:56] well no, because it's software and software will have bugs
[18:52:01] so you need to also handle it on the consumer side as well
[18:52:02] <_joe_> I try to clarify it more: we will need client libraries that will consume said messages and validate those too, right?
[18:52:05] I'm confused about use of proxy here. Are we talking about a kafka consumer that listens for restbase events and does the restbase actions (because presumably restbase doesn't want to include this logic?), or is this a proxy in general between mediawiki and kafka.
[18:52:13] <_joe_> paravoid: heh
[18:52:15] proxy in general between mediawiki and kafka.
[18:52:20] _joe_: no, for consumption no
[18:52:22] Krinkle: producer, not consumer
[18:52:22] paravoid: experience with eventlogging says otherwise fwiw
[18:52:27] I'm not sure why we'd want to talk from MediaWiki to kafka via HTTP, only more points where things can go wrong and have to scale.
[18:52:32] no need to validate, that's done on the producer side
[18:52:33] the valid event stream is guaranteed valid
[18:52:37] Krinkle: lke this
[18:52:37] http://docs.confluent.io/1.0.1/kafka-rest/docs/intro.html
[18:52:38] <_joe_> mobrovac: so you don't validate input????
[18:52:43] (but not consumption)
[18:52:45] Krinkle: it's just a simple end point that takes messages, validates them against the schema configured for each message's topic, and (if successful), writes it to Kafka
[18:52:53] <_joe_> that is wrong, very wrong, as paravoid stated
[18:52:53] having a layer of abstraction decouples all our systems from kafka, lets use alternate implementations where kafka doesn't make sense, lets us upgrade (or replace) kafka transparently
[18:52:58] oh REST proxy is not RESTBase proxy, I thought y'all where abbreviating
[18:53:00] _joe_: for consumers, no since the producer side does that
[18:53:06] _joe_: no need to validate, that's done on the producer side <-- no need to validate by *consumers*
[18:53:10] * urandom can't type either
[18:53:15] yup ori
[18:53:15] fwiw, kafka currently has no authentication, this may or may not change
[18:53:17] because the proxy that all producers have to go through enforces that
[18:53:20] <_joe_> ori: yeah I think that's foolish
[18:53:36] it's very deja vu for me
[18:53:40] <_joe_> clients still have to validate the messages they get.
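The produce path described at 18:52:45 (take a message, validate it against the schema configured for its topic, and write it to Kafka only if validation succeeds) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the EventBus implementation: the topic name, the field names, the `SCHEMAS` mapping, and the `send` callable standing in for a real Kafka producer are all hypothetical, and the "schema" is reduced to required fields with expected types.

```python
import json
from typing import Callable, Dict

# Hypothetical topic -> schema mapping. The topic and field names are
# illustrative only; the log does not specify them.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "mediawiki.page_edit": {"title": str, "rev_id": int},
}


class ValidationError(ValueError):
    """Raised when an event does not conform to its topic's schema."""


def validate(topic: str, event: dict) -> None:
    """Check that every field required by the topic's schema is present
    and has the expected type. Unknown topics are rejected outright."""
    schema = SCHEMAS.get(topic)
    if schema is None:
        raise ValidationError(f"no schema configured for topic {topic!r}")
    for field, expected in schema.items():
        if field not in event:
            raise ValidationError(f"missing field {field!r}")
        if not isinstance(event[field], expected):
            raise ValidationError(f"field {field!r} is not {expected.__name__}")


def produce(topic: str, event: dict, send: Callable[[str, bytes], None]) -> None:
    """Validate first; hand the serialized event to `send` only on success.
    `send` is an assumed stand-in for whatever writes bytes to Kafka."""
    validate(topic, event)
    send(topic, json.dumps(event, sort_keys=True).encode("utf-8"))
```

Whether this logic lives behind an HTTP endpoint or inside a client library is exactly the point under dispute in the log; the sketch is neutral on that and only shows the validate-then-write ordering.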
[18:53:42] it's almost like i wrote something like this at one point
[18:53:45] any kind of software bug accidentally or malicious is able to inject garbage into that "secure"/"validated" topic
[18:53:49] _joe_: yes, true
[18:53:49] for context: the request volume we are talking about here is in the dozens to low hundreds of messages per second
[18:53:53] _joe_: why so?
[18:53:56] ori: :)
[18:54:00] <_joe_> urandom: what paravoid said
[18:54:06] _joe_: we do not validate 150K msgs per second of webrequest data
[18:54:20] saying that consumers will blindly trust that kind of data means that any such action could cause mayhem across multiple independent systems
[18:54:21] <_joe_> ottomata: that is one special application, though
[18:54:25] paravoid: yeah, but if / when that happens, you fix the validation, not the consumers
[18:54:32] paravoid: i don't agree
[18:54:40] ottomata: I assume that Kafka HTTP interface is not a single server but disdtribtued like kafka itself, so MW will still need an array of all the kafka servers, right (which is a good thing iho)
[18:54:42] we'll always need validation and authentication
[18:54:44] ori: no, the point is, that a malicious action (or a bug!) could bypass the proxy
[18:54:46] paravoid: but we don't validate database query results
[18:54:51] ori: unless you say that you *also* consume from this proxy
[18:54:59] Krinkle: yes, or load balanced
[18:55:00] paravoid: _joe_: consumers *can* validate them if they want to, but that has no effect on the system itself
[18:55:01] Though maybe we can abstract the pool of servers via an LVS service IP, not sure if that's an improvement or not
[18:55:02] from a security perspective, the validation should happen as close to the consumer as possible. abstraction often makes things worse
[18:55:05] its just an HTTP post
[18:55:07] paravoid: are you pushing for public key crypto?
[18:55:14] <_joe_> mobrovac: I argue that they need to
[18:55:15] wait what now?
[18:55:18] HOooookay
[18:55:32] want to take a step back here, there are lots of raised issues.
[18:55:34] all together now!
[18:55:43] * paravoid shuts up and waits for ottomata
[18:55:45] going to try to summarize the issues, and we can tackle them individually
[18:55:53] ottomata: please use #info, #action etc.
[18:55:57] will try
[18:56:11] ottomata: Yeah, if MW has the array, it can presumably ensure delivery if things fail by re-trying. We're not worried about performance as much since this can be done post-send.
[18:56:21] (after the http req ends, MW can live on for some time)
[18:56:35] #info issue 1) Contention over whether events need to be validated on consumption
[18:56:39] (ottomata has the floor)
[18:57:24] #info issue 2) contention over whether produce validation should be done as an HTTP service or as a client lib
[18:57:40] <_joe_> also, contention about the idea such a proxy could/should be used to consume events as well over reliability and complexity concerns
[18:57:55] #info issue 3) scalability - proxy will lessen it, is it worth it?
[18:58:12] ottomata: that doesn't make sense
[18:58:20] it'll add overhead, but won't affect *scalability*
[18:58:22] <_joe_> I would like a firm statement that we won't make 100 systems dependent on a proxy we invent here instead of proven tech that is already production ready
[18:58:35] gwicke: let's discuss these one by one
[18:58:44] gwicke: maybe you are right, should maybe say reliability?
[18:58:51] before we do, can I interject with a meta-comment?
[18:58:58] yes ori
[18:58:59] I have comments on the first, but waiting
[18:59:35] ori, you're chair :P
[18:59:42] #info correction to issue 3) s/scalability/reliability of message production/
[18:59:50] In general have to subject every aspect of one's design to a vote by a committee feels awful. Our default attitude should be that the people who are closest to this problem and have been actively thinkign about it and sketching solutions have the right idea and know what they're doing.
[19:00:08] (time check: 15 minutes remaining)
[19:00:14] agreed, but I don't think we're talking here about every aspect of the design
[19:00:17] I think most of us agree with that, but there's a basic lack of trust, which goes back to the topic of ownership / responsibility.
[19:00:22]
[19:00:30] there are some big issues i'd really like to settle before our 15 minutes are up
[19:00:37] so i'd like to suggest we hold off on issue 1)
[19:00:52] <_joe_> ottomata: agreed
[19:00:54] I just need a clarification on issue (1) though, can I raise it?
[19:00:56] sure
[19:01:16] I read some comments that may have been talking about something else, hence my confusion
[19:01:35] paravoid: the question is what this validation entails
[19:01:37] we're talking about validation of the *protocol*, as in whether it's json or something else, what kind of schema and so on
[19:01:41] hence my question about public key crypto
[19:01:49] What constitutes validation here. Valid JSON syntax? Or elaborate schema's? I don't particularly look forward to requiring schemas in this area. especially becahse those would be backend (e.g. kafka/avro) specific. And probably make deployment harder as MW changes, we need to coordinate updating of the schema etc.
[19:01:53] yeah that :)
[19:02:07]