[16:31:40] schana: I need to skip the hangout, but we can sync now if you have a sec. I have a WIP patch for the Puppetization, but I'm going to wait for you to package the dependencies as wheels (just bravado and its dependencies)
[16:32:09] ori: I commented on the puppet phab task
[16:32:19] oh, let me catch up
[16:35:45] having the package hosted on PyPI is a no-go. Or rather, we can't deploy it from PyPI. I'd prefer to have the recommendation API cloned to a subdirectory of /srv, rather than the system Python lib path
[16:36:59] I'd also prefer to use wheels only for the dependencies that aren't already packaged (that is to say: bravado and its dependencies)
[16:39:39] running the pip install --use-wheel command won't install anything that's already installed
[16:40:11] so we can 1) install debian-packaged dependencies and then 2) install the wheel'd dependencies
[16:40:47] ori ^
[16:40:59] ok, sounds good then
[16:41:13] should I go ahead and create the repository?
[16:41:19] that would be great
[16:41:59] I'm not sure what sort of environment halfak has for building their wheels or if it's done in an automated fashion
[16:42:23] probably labs
[16:42:38] it'd make sense to create a labs instance and dedicate it for this purpose
[16:42:51] or just use docker :)
[16:43:08] get off my lawn
[16:43:10] (I have no preference)
[16:43:21] docker would be fine
[16:44:15] in general, the more novelty you introduce, the harder it gets to recruit ops to participate in maintenance
[16:44:41] so it's good to spend the novelty allowance where it counts
[16:44:50] so I'd go with labs rather than docker, but it's your call
[16:49:51] ori: I think we're going to need a project in labs
[16:50:58] schana: what's your wikitech username, again?
[16:51:02] nschaaf
[16:51:42] created project recommendation-api, made you admin
[16:51:49] thanks ori
[16:53:10] np, ttyl
[16:59:52] hey schana. shall we still jump in the meeting, or skip it?
[17:00:17] we can talk about wdqs if you want
[17:00:24] sure. :)
[17:32:20] o/
[17:32:35] schana & ori, we use a labs instance.
[17:32:41] It's the same instance that we use to build models
[17:32:53] Like wheels, there's some system specifics in the model files we produce.
[19:13:51] halfak: yt?
[19:13:57] o/
[19:14:02] what's up, ottomata?
[19:14:18] yo so, got a bit of an eventstreams existential crisis, wondering what you think
[19:14:25] you might not care much, but i thought i'd ask
[19:14:25] its ok if you don't
[19:14:26] so
[19:14:36] we are having a websockets vs http streaming response body discussion
[19:14:50] Yeah. been watching the debate go by
[19:14:59] there are pros and cons to both, but i don't have a lot of client side dev experience
[19:15:06] so i don't really know what would be the better user experience
[19:15:29] like, if consuming via websockets in a browser is easier/works better, than just getting a streamed http response
[19:15:53] From my point of view, I'm not sure I mind the difference between web sockets and SSE.
[19:15:59] we can more easily make fancy features with websockets, but a single http request -> stream is very simple for a client to implement
[19:16:02] Is a streamed http response different from SSE?
[19:16:09] SSE uses streamed http response
[19:16:11] its just a special format
[19:16:13] Gotcha.
[19:16:22] What kind of fancy features are we talking about?
[19:16:25] so, SSE vs non SSE isn't a big deal, we can figure that out later
[19:16:34] the only ones that exist now that we'd lose are
[19:16:40] - ability to change filters for the current connection
[19:16:49] - pull based consume instead of push
[19:16:56] kasocki lets you consume in 1 of 2 ways
[19:17:09] either emit('start'), and you are then pushed on('message') events as fast as possible
[19:17:11] or
[19:17:15] emit('consume', cb)
[19:17:19] and cb is called when you get a message
[19:17:41] if we go http, we lose emit('consume'); you will only ever get pushed a stream as fast as possible
[19:19:26] I'm struggling to understand the practical difference. Are we assuming my client is written in a certain language?
[19:20:28] with websockets, sorta. you'd have to use a language that had a socket.io library
[19:20:59] with http + SSE, you'd need a language that could parse SSE format events...although it is text based and very simple
[19:21:03] with http + just JSON blobs
[19:21:10] you'd just be pushed newline delimited JSON objects
[19:21:30] i think if we did http instead of websockets, we'd probably support both http + SSE and http + json blobs
[19:21:33] on different endpoints
[19:23:20] It seems to me that I could mimic emit("consume", cb) in python if I can get an iterable from emit("start")
[19:23:31] I offer this as an example because it seems that I'm missing something.
[19:26:12] ottomata, ^
[19:26:39] FWIW, right now all of these solutions sound good. I can't imagine that I'd want to refilter a socket.io stream
[19:26:43] aye
[19:26:48] I mean, why not just reconnect?
[19:26:52] right
[19:26:54] Isn't it effectively the same
[19:26:58] and the pull based consume isn't that interesting to you, right?
[19:27:08] That's something I'm not quite clear on.
[19:27:28] Essentially I'd be doing event = next(events)?
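[Editor's note: halfak's suggestion above, mimicking emit("consume", cb) on top of a push-style emit("start"), can be sketched in Python. This is an illustrative sketch, not Kasocki's actual API; `subscribe` and `as_pull_iterator` are hypothetical names standing in for a socket.io-style client.]

```python
import queue

def as_pull_iterator(subscribe):
    """Wrap a push-style API (one callback call per message) into a
    pull-style iterator, mimicking emit('consume', cb) on top of
    emit('start'). `subscribe` is a hypothetical function that takes a
    callback and starts pushing messages to it."""
    q = queue.Queue()
    subscribe(q.put)      # every pushed message lands in the queue
    while True:
        yield q.get()     # the caller decides when to take the next one

# Usage with a fake push source standing in for a real client:
def fake_subscribe(cb):
    for msg in ({"rev_id": 1}, {"rev_id": 2}):
        cb(msg)

events = as_pull_iterator(fake_subscribe)
first = next(events)      # -> {"rev_id": 1}
```

The queue is the buffer halfak mentions later: push fills it as fast as the server sends, while the consumer drains it at its own pace.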
[19:27:43] halfak: for pull based
[19:27:48] you get one message for each consume call
[19:27:52] so its more like
[19:28:09] while true:
[19:28:09]     msg = consume()
[19:28:09]     do stuff
[19:28:18] with push based
[19:28:20] its more like
[19:29:16] def processMessage(message):
[19:29:16]     print(message)
[19:29:16] on('message', processMessage)
[19:29:17] startConsuming()
[19:29:23] so, with pull based
[19:29:30] you have control over how fast you are given messages
[19:29:35] but you have to tell the server you want the next message
[19:29:45] with push, you tell the server to start pushing you messages as fast as it can
[19:30:32] Gotcha. I imagine with http-based I can do something like this though:
[19:30:50] events = (json.loads(line) for line in http_response.stream)
[19:30:58] while true:
[19:31:09]     msg = next(events)
[19:31:14]     do stuff
[19:31:31] yes
[19:31:32] def
[19:31:36] that's how that would work
[19:31:57] So, effectively the same? Maybe there's some performance consideration I should be aware of
[19:32:03] e.g. buffering on the server vs. client
[19:33:58] yeah, you mostly foresee working on the CLI anyway, right, not in a browser
[19:34:02] (ha, def, python :p )
[19:34:08] in your case then, it probably doesn't really matter
[19:34:26] i think there are complicated browser implications for all of this that I don't really understand
[19:34:43] Gotcha. Yeah. So now thinking about JS land, this gets weirder, right?
[19:36:40] Seems like the problem with on('message', ...) in JS is that I can't have a guarantee that some past operation finished before I start a new one.
[19:37:17] Whereas with emit("consume", ...), I'd be able to choose when I take the next message.
[19:37:27] I can hardly imagine maintaining that buffer in JS.
[19:46:08] ottomata, ^ sound about right to you?
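[Editor's note: halfak's generator sketch above can be made runnable. This is a minimal sketch: the stream here is simulated with an in-memory buffer, where a real client might instead iterate over lines of a streaming HTTP response.]

```python
import io
import json

def ndjson_events(line_stream):
    """Pull-style iterator over a newline-delimited JSON stream,
    as sketched above: each non-empty line is one event."""
    return (json.loads(line) for line in line_stream if line.strip())

# Simulated HTTP response body; a real client would pass the
# streamed response's line iterator here instead.
body = io.StringIO('{"type": "edit"}\n{"type": "log"}\n')

events = ndjson_events(body)
msg = next(events)   # the caller controls the pace, like consume()
```

Because `next()` only pulls when called, this gives the same flow control as the pull-based consume, with any faster-than-consumed data buffered by the OS/TCP layer rather than in application code.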
[19:48:26] that's right halfak
[19:48:46] also, its not just about js land, its more about browsers and memory buffers and long lived sessions
[19:48:49] and that i don't fully understand
[19:49:14] OK. So, if I were working primarily in JS (happens sometimes), I want the flexibility of socket.io
[19:49:16] it *sounds* to me like websockets would behave a little better and more consistently, since its a lower level TCP socket (usually)
[19:55:40] ottomata, I'm a big fan of going with what we're familiar with too
[19:55:44] Is there demand for http?
[19:56:42] halfak: that's one of my problems, i don't know what real world people would demand
[19:56:53] there are pros and cons to both
[19:57:02] but, i'd prefer to make a good choice now
[19:57:06] If there haven't been complaints about socket.io, I'd go with that.
[19:57:10] so a couple of years from now we don't ask users to change again
[19:57:42] haha, halfak that is the biased conclusion i'd prefer to make, because Kasocki is basically done!
[19:57:43] aha
[19:57:54] i'm having a lot of decision paralysis now
[19:57:59] ottomata, that's a good reason!
[19:58:19] "We've already pursued a technology and it's expensive to go back now -- with little clear advantage."
[19:58:36] there are some big pros to http, some of them on the server side (no need for sticky sessions to do protocol negotiation, fits in with already existing rest api schemes, etc.)
[19:58:54] Fitting the current schemes does not impress me that much
[19:59:07] E.g. most of api.php doesn't fit in the rest API schemes
[19:59:18] halfak: aside q, how important is server side filtering?
[19:59:20] to you?
[19:59:40] ottomata, mostly not a big deal, but I'll be operating close to the source (labs, mostly)
[19:59:59] If I have a client in india receiving events, then I'd say *very* important
[20:03:29] halfak: what types of filters do you think are important
[20:03:31] just wildcards?
[20:03:36] filtering on any field?
[20:03:47] array of possible values to match a field against?
[20:03:48] regexes?
[20:04:36] First, event type, then, it depends on the event.
[20:05:11] event type is no problem, those will be in separate topics
[20:05:11] For revision-create, I want page_namespace, set(user_name/user_id), set(page_title/page_id), category
[20:05:30] ok, so arbitrary fields, but what about the filter values
[20:05:31] e.g.
[20:05:40] do you need to select a range of page_namespaces?
[20:05:46] do you need to match a page_title regex?
[20:06:17] Title regex would be useful, yes. Prefix would be just fine.
[20:07:25] I can't imagine wanting a range of namespaces. There's few of them, so a set would be more useful
[20:50:53] halfak: what about ranges in general, like, where rev_bytes > 100
[20:51:00] sorry
[20:51:01] rev_len
[20:51:02] i guess
[20:51:27] ottomata, I can see it. I'd like to have a basic capacity for it. But it seems like this could get complex fast.
[20:51:37] I could probably live without it.
[21:42:05] 10Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1195035 (10Quiddity) This feature would be incredibly helpful, IIUC. I have two tasks that require checking things across all our projects, and I don't know how else to do it, oth...
[23:43:27] 10Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1195035 (10yuvipanda) @Quiddity having some form of official resources dedicated to it might be helpful. I unfortunately don't think I'll have any bandwidth to be able to look at...
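[Editor's note: the server-side filtering discussion above (exact values, sets of values, title regex/prefix) can be sketched in Python. This is illustrative only; `matches` and the filter shapes are hypothetical names, not an actual EventStreams or Kasocki API.]

```python
import re

def matches(event, filters):
    """Check an event dict against filters of the kinds discussed above:
    a set/list means set membership, a compiled regex means a title-style
    match, anything else means exact equality."""
    for field, cond in filters.items():
        value = event.get(field)
        if isinstance(cond, (set, frozenset, list, tuple)):
            if value not in cond:
                return False
        elif isinstance(cond, re.Pattern):
            if value is None or not cond.match(str(value)):
                return False
        elif value != cond:
            return False
    return True

# A revision-create-style event, filtered on namespace set + title prefix:
event = {"page_namespace": 1, "user_name": "Example", "page_title": "Talk:Foo"}
filters = {"page_namespace": {0, 1}, "page_title": re.compile(r"Talk:")}
ok = matches(event, filters)   # True: namespace is in the set, title has the prefix
```

Range conditions like `rev_len > 100` would need an extra predicate form, which is exactly where halfak notes this "could get complex fast".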