[22:00:15] #startmeeting RFC meeting [22:00:58] #topic RfC: Session storage service interface https://phabricator.wikimedia.org/T206010 [22:01:43] is the bot here, I didn't get an expected response... [22:02:31] <_joe_> uh, no idea [22:02:41] I don't see the bot [22:02:46] <_joe_> me neither [22:02:56] we can pretend it's here and compile the notes manually. [22:02:57] https://wikitech.wikimedia.org/wiki/Tool:Meetbot has some info on how to restart it [22:03:27] I'd do it but I misplaced my SSH keys [22:03:33] I'll do it [22:03:53] merci TimStarling [22:04:35] TimStarling: i was about to, but i'll let you do it - otherwise we'll have two meetbots again :) [22:04:40] #startmeeting RFC meeting [22:04:41] Meeting started Wed Nov 14 22:04:40 2018 UTC and is due to finish in 60 minutes. The chair is KateChapman. Information about MeetBot at http://wiki.debian.org/MeetBot. [22:04:41] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [22:04:41] The meeting name has been set to 'rfc_meeting' [22:04:52] #topic RfC: Session storage service interface https://phabricator.wikimedia.org/T206010 [22:05:10] #link https://phabricator.wikimedia.org/T206010 [22:05:33] who is here to discussion this RFC? [22:05:38] <_joe_> o/ [22:05:41] o/ [22:05:41] o/ [22:05:48] o/ [22:05:51] o/ [22:05:54] o/ [22:05:58] o/ [22:05:58] o/ [22:06:02] o/ [22:06:06] wow, full house this early in the meeting, that's rare! [22:06:30] urandom can you kick things off with a brief summary? [22:06:35] o/ [22:06:41] o/ [22:06:46] KateChapman: sure [22:06:52] <_joe_> and maybe a list of things you want to discuss, too :) [22:07:02] #link https://www.mediawiki.org/wiki/Requests_for_comment/SessionStorageAPI [22:07:05] I think our committee discussion on this was side-tracked by being a bit too ambitious, we have to remember this is in the critical path for multi-DC which is quite urgent [22:07:43] if you look at MW's includes/session/SessionBackend.php, the interface needed by $this->store is quite narrow and could more or less be met by the RFC [22:08:23] the only thing missing is client-side specification of the TTL [22:09:14] TimStarling: i think we also have to take into account that once sessions are separated out, there is nothing that prevents other entities outside mw to access the session storage (given they have the right credentials) [22:09:44] So to summarize: The RfC is for session storage with multi-master replication semantics [22:09:57] meaning, a session created in one DC is valid in the other [22:10:30] <_joe_> the RfC is specifically about its API, right? [22:10:42] Specifically, the RfC is specified as a REST API, and covers just the interface [22:10:48] _joe_: correct [22:10:50] <_joe_> I think a lot of important things are out of scope here [22:11:02] but only with eventual consistency? previously some stronger consistency model was discussed [22:11:04] TimStarling: It would still be nice to allow for more fine grained storage of the session data in the future though, even if we don't use it right away. We could simply say that must not contain "/" since that is reserved for fained grrained access, if needed later. [22:11:34] TimStarling: eventual consistency on writes, strong consistency on delete (logout) [22:11:59] ok [22:12:05] and on reads [22:12:08] _joe_: it's about the API, which includes the contract/semantics. Stuff like TTLs and consistency guarantees are in scope, i think. [22:12:09] and this is the advantage of cassandra over mysql? [22:12:12] how authorization is supposed to work? [22:12:14] _joe_: we went back and forth on what the scope should be, TL;DR if we need to go deeper, it'd probably be another RfC [22:12:22] <_joe_> urandom, clarakosi I have a specific question: why no CompareAndSwap? [22:12:26] _joe_: because it's a lot to go over in one RfC [22:12:37] <_joe_> urandom: I agree, let's limit the scope for now [22:12:50] SMalyshev: none. if you know the session key, you can access the session. [22:13:14] <_joe_> urandom: also, you don't discuss authn and authz at all. I think it should be part of this rfc [22:13:21] duesen: then why api has 401/not authorized responses? [22:13:35] i'd say auth(n|z) is out of scope for this particular rfc [22:13:37] i don't know :) [22:13:54] <_joe_> mobrovac: is it? it's part of the api [22:14:00] security is supposedly half the point of it [22:14:05] <_joe_> at least authn [22:14:06] that looks confusing - if there's no authn/z then there should be no 401 [22:14:19] ok ok ppl, let's rewind a bit [22:14:38] authorization, meaning authorization of mw to set/get/delete sessions isn't covered in the RfC [22:14:44] i don't think urandom finished his initial explanation [22:14:48] heh [22:14:52] <_joe_> ahah ok [22:14:58] <_joe_> sorry :P [22:15:01] mobrovac: didn't I? [22:15:04] I have no idea... [22:15:11] I agree with tgr's comment on talk page that without a threat model security benefits seem questionable [22:15:41] bawolff: talk of threats might have been a bit of a red herring [22:15:51] bawolff: imho, that's intertwined with auth(n|z) which was deemed (at an earlier time) a spearate concern [22:16:23] that was (part of) the justification for using a service, instead of for example, a CassandraBagOStuff [22:16:31] <_joe_> bawolff: it is now possible from a MediaWiki RCE to dump all session tokens in one command. it wouldn't be as this interface doesn't allow enumeration [22:16:39] I get it it's not covered in the RFC but if we have a service that is supposed to tell me I'm sometimes authorized and sometimes not, I'd expect at least some mention - like is it stateful? token? some other way? or is that completely offtopic too? [22:17:12] SMalyshev: it's intentionally omitted from the RfC, but I'd assume certification verification [22:17:16] assumed [22:17:17] we are talking about a storage service, not an authn service, right? [22:17:23] urandom: ok [22:17:38] the 401 is returned if mw is for some reason not authorized [22:17:51] SMalyshev: i think that's a second step hence the 401s in the api, but as of right now, auth(n|z) is out of scope in my view (it's a todo) [22:18:06] correct tgr [22:18:09] <_joe_> tgr: yes, authn in this context is "how can mw be authorized to read/set/delete key on this service" [22:18:21] tgr: session storage service [22:18:44] <_joe_> urandom: I reiterate my question: why not allow CompareAndSwap? [22:18:47] and presumably the answer to that is by IP? [22:18:48] otherwise this looks like standard CRUD REST API... [22:18:50] or perhaps more importantly, how are others prevented from it [22:19:01] tgr: for the time being, i'd say yes [22:19:02] CAS is not needed by SessionBackend [22:19:20] if a user logs out and logs in at the same time, does it really matter which wins? [22:19:24] TimStarling: but i would assume it'd be needed in the future, no? [22:19:26] _joe_: because until very recently, a need was never under discussion [22:19:56] if a session is refreshed to extend its TTL, it doesn't matter if it is done simultaneously by multiple clients [22:20:06] it is the same data in each case [22:20:07] <_joe_> to be clear, our current model is pretty much prone to any race condition [22:20:23] #info _joe_: to be clear, our current model is pretty much prone to any race condition [22:20:24] <_joe_> so even eventual consistency is a net win :) [22:20:29] TimStarling: OAuth (and OATHAuth I think?) set session keys to prevent replay attacks [22:20:44] that needs CAS or something similar [22:20:46] TimStarling: according to Brad there's a need for 'set if unset' kind of CAS in OAuth [22:21:01] #info concerns wrt auth(n|z) when ti comes to the session mgmt service [22:21:27] ok, so there is SessionBackend::addData() etc. which is currently implemented without consistency [22:21:28] <_joe_> #info according to Brad there's a need for 'set if unset' kind of CAS in OAuth [22:21:38] Is this service supposed to be firewalled from all non-MW clients? [22:21:47] SMalyshev: yes, I assume [22:21:51] <_joe_> SMalyshev: can we leave that discussion for later? [22:22:01] SMalyshev: i'd supose yes in the beginning (until auth(n|z) is solved) [22:22:01] in addition to whatever trust mechanism [22:22:09] also it's nice if session changes coming from actions by the same user on different servers do not clobber each other, although it would not be the end of the world [22:22:09] "depth of defense" [22:22:10] <_joe_> I don't like any of the auth method I saw being proposed here FWIW :) [22:22:25] So is this service intended to only host session stuff maintained by AuthManager, or also random other session stuff that other code might add? [22:22:31] _joe_: I am trying not to get too deep into it, but to understand the case I think it makes sense to understand at least the basics. Otherwise we'd be just talking about "what is CRUD"... [22:22:40] RoanKattouw: the latter, imo [22:22:55] I assume it would have to host everything that needs session-like cross-DC behavior? [22:22:59] RoanKattouw: that is what I was confused about, but actually it looks like the random session stuff is embedded inside the AuthManager stuff [22:22:59] I would assume the latter to be scope creep FWIW [22:23:14] maybe at a later date they can be split out [22:23:17] tgr: what other stuff like that is there? [22:23:21] CentralAuth does not use SessionManager for its global sessions, but the use case is the same [22:23:24] <_joe_> so if other software can add his session keys, I'd expect the keys to be namespaced for easier multitenancy? [22:23:34] there should be a single source of truth when it comes to sessions, right? [22:23:35] OAuth also uses sessio storage directly [22:23:45] OATHAuth too, probably? [22:23:56] <_joe_> so yes, let's concentrate on mediawiki [22:24:31] i don't see how including or excluding other stuff apart from mw changes the discussion provided auth(n|z) is in place? [22:24:44] <_joe_> It's still not clear to me if general-cas or set-if-unset are needed, and with what level of consistency [22:24:45] if there is a single source of truth for this, it should accommodate all use cases [22:24:56] _joe_: it's not clear to me either [22:25:04] _joe_: though supporting it would be trivial [22:25:17] <_joe_> I would expect a general interface to storage of sessions to have CAS semantics [22:25:40] <_joe_> urandom: I thought so :) And if that's the case, I would support it into the interface for sure [22:25:41] my preference is to implement neither and rush this thing into production with a minimal feature set reflecting current use cases, so that multi-dc can be unblocked [22:25:42] _joe_: I don't disagree, but please add rationale to that statement :) [22:25:57] also my strong preference is for there to be client-side TTL specification [22:26:05] <_joe_> TimStarling: +1 [22:26:09] TimStarling: in doing so, wouldn't we be paying the price later? [22:26:11] TimStarling: adding it later in a backward compatible manner would also be trivial [22:26:18] TimStarling: as would client supplied TTLs [22:26:33] mobrovac: what price? [22:26:44] TimStarling: how is multi-DC unblocked if CentralAuth global sessions are not cross-DC consistent? [22:27:06] urandom: well, right now we seem to be focused on multi-dc, but there is also another level of being able to access arbitrary segments on sessions [22:27:20] would there be some mechanism for expiring stale sessions? (or this is offtopic too?) [22:27:23] <_joe_> tgr: sessions are currently not consistent (not in the sense you intend the term here) within one DC [22:27:25] #info TimStarling: my preference is to implement neither and rush this thing into production with a minimal feature set reflecting current use cases, so that multi-dc can be unblocked [22:27:26] mobrovac: that is a very recent addition to the discussion that I don't even [22:27:35] SessionBackend is specific to core's idea of a session, including cookie domains, it's not a general data store [22:27:36] So one feature that is requested sometimes is having the ability for users to view/terminate other sessions (like i think gmail lets you). I guess this interface would preclude that, but nobody has ever implemented it so demand cant be that high [22:27:42] TimStarling: I agree, but would like to conider future use cases and accommodate them in the API design, if that isn't too cumbersome. E.g. the input for the set operation needs to have some place to put the CAS info (it does). And the URL schhema schould reserve a way to address things "inside" a session, even if that's not currently supported (would be easy to do) [22:27:56] by stale I don't mean on auth scale but on the scale "some code created the key and forgot about it and it's sitting there for 10 years" [22:28:24] could CentralAuth use a client pool shared with SessionBackend? [22:28:35] <_joe_> SMalyshev: the proposed interface has a server-side TTL [22:28:50] SMalyshev: there would be an auto-expiry feature [22:28:51] you know currently behind SessionBackend we have BagOStuff, which is shared with CentralAuth [22:29:01] so the idea would be to replace BagOStuff with the new thing [22:29:04] ah yes, thanks, found it now. [22:29:16] or you know, extend BagOStuff, then you have an obvious way to provide CAS etc. [22:29:27] since BagOStuff already provides those interfaces [22:29:31] time check: 30 mins [22:29:32] bawolff: allowing one user to terminate another's sessionn without knowing the session id would remove any security benefit from the service separation. [22:29:33] _joe_: aren't the current single-DC sessions just relying on consistent hashing so every key is stored on a single machine? you can't get more consistent than that [22:29:43] [14:27:36] So one feature that is requested sometimes is having the ability for users to view/terminate other sessions (like i think gmail lets you). I guess this interface would preclude that, but nobody has ever implemented it so demand cant be that high <-- the demand is high IMO, it's just people don't know how (generalization) to implement it [22:29:51] bawolff: the one thread this removves is enumerating admin user names and killing their sessions to prevent counter-emsures [22:29:58] *counter-measures [22:30:08] <_joe_> tgr: except if you do concurrent read-and-writes, you are in a last-write-wins scenario [22:30:10] duesen: he means users being table to see their *own* sessions, and kill them [22:30:19] <_joe_> which is exactly what we're proposing here [22:30:26] duesen: so you can log yourself out of a shared computer if you forgot for example. [22:30:28] legoktm: their own on other devices? [22:30:32] Yes [22:30:32] yes [22:30:33] <_joe_> tgr: redis itself offers no strong consistency [22:30:41] #info legoktm: [14:27:36] So one feature that is requested sometimes is having the ability for users to view/terminate other sessions (like i think gmail lets you). I guess this interface would preclude that, but nobody has ever implemented it so demand cant be that high <-- the demand is high IMO, it's just people don't know how (generalization) to implement it [22:30:41] Or at least have a nuclear "log me out of everything everywhere right now" button [22:30:58] duesen: right now when you log out, it kills all of your sessions, which is a pain for people who log in on their phones and potentially public computers [22:31:11] RoanKattouw: current log out == log me out of everything everywhere right now [22:31:14] Gotcha [22:31:17] anyways, I think we're getting a bit off topic [22:31:18] a lot of services force the 'nuclear' logout of all sessions when you perform a password change, too. [22:31:28] my point was that such a feature is definitely wanted [22:31:28] legoktm: oh really? I had the impression this was not the case. [22:31:39] pity this session service doesn't have a name, SessionBagOStuff is a bit too generic [22:31:44] <_joe_> legoktm: ok, I'm not sure this would be possible with this interface? [22:31:46] duesen: you can always log people out by changing the cached value of user_token [22:32:04] _joe_: right, I think bawolff was asking whether that should be something taken into consideration [22:32:05] legoktm: it's not off topic - the current interface would actively prevent any "kill my other sessions" logic. [22:32:05] legoktm: ok, but how does this relate to this specific RfC? [22:32:12] I thought special:logout only killed current session [22:32:15] <_joe_> legoktm: I don't think so [22:32:16] legoktm: it would become impossiblle by design [22:32:20] <_joe_> bawolff: I did too [22:32:35] <_joe_> urandom: see above ^^ this is a valid concern [22:32:42] so in current MediaWiki the user session and the session storage are not quite the same, validating a session involves checking against the user table or a cached version of it [22:32:50] <_joe_> should we keep some user id in the session path? [22:32:58] and we take advantage of that for cross-wiki logout for example [22:33:15] _joe_: that wouldn't be enough, you'd need to index by-user [22:33:27] tgr: ah, right - you can invalidate a session without deleting it, by changing the token. good point. [22:33:27] if want to keep it that way, the session service does not need to be concerned with that [22:33:30] <_joe_> tgr: so the user table has a record keeping track of all session tokens? [22:33:44] I think expiring all user's sessions is important scenario for security - if somebody suspects one of the session IDs may have leaked [22:33:56] if we want to make the session service a source of truth wrt logged-in status - that's a much more complicated discussion [22:34:08] <_joe_> tgr: I don't think we want [22:34:10] tgr: that hover means that an attqacker can do that as well - they can't enumerate sessionn tookens, but they can still kill sessions in bulk [22:34:20] _joe_: the user table has a token, all sessions store a copy, when the token changes the sessions become invalid [22:34:29] <_joe_> oh ok [22:34:31] <_joe_> even better [22:34:33] <_joe_> :) [22:34:47] duesen: yeah, which is why I'm saying the security benefits from this as currently envisioned are minimal [22:35:07] but the main driver seems to be multi-DC anyway [22:35:09] <_joe_> can we say we have a consensus on client-set TTLs to be a desirable feature? [22:35:48] I think that would be good [22:35:48] I agree with tgr that enumerating sessions doesnt seem like a useful attack to an attacker [22:35:54] yes. q: is client TTL still subject to server one-fits-all ttl or replaces it? [22:35:57] _joe_: i would say it depends on whether we want the service to be the source of truth or not, which i tihnk it should be in which case it should be the one setting them [22:36:01] tgr: but multi-dc could also be had by letting mw talk to cassandra directly. the justification for having a separate service is then really just "make things nice and separate". [22:36:13] #info but multi-dc could also be had by letting mw talk to cassandra directly. the justification for having a separate service is then really just "make things nice and separate". [22:36:16] I'm also wondering what people think about leaving room in this API for a future CAS or merge feature [22:36:20] if the service is implemented as a new BagOStuff, which seems to be the easy way, it should honor the BagOStuff contract, and that includes TTL [22:36:21] one reason I don't like server-side TTL is because configuring services is a PITA [22:36:25] <_joe_> mobrovac: but we want the clients to be able to set a LOWER ttl for a specifc session [22:36:51] and adding TTL is easy, so really it seems less effort to support it than not [22:36:52] Right now you can only do set calls that completely overwrite a key, which apparently works with the narrow thing we need this for right now (or at least isn't worse than it), but doesn't seem very future-proof to me [22:36:54] <_joe_> TimStarling: also we might want some sessions (say on private wikis) to last 1 week, and on other wikis to last longer, right? [22:37:08] RoanKattouw: i'd want the spec to reserve room for that, at least say whether it should be doone with etags or with a field in the input JSON or what [22:37:09] _joe_: sure [22:37:15] RoanKattouw: it seems easy to add new methods in the future, which I think urandom has in mind? [22:37:24] RoanKattouw: Petr suggested something as simple as https://www.mediawiki.org/wiki/Requests_for_comment/SessionStorageAPI [22:37:25] <_joe_> so we need client-side TTLs :) [22:37:31] _joe_: but that seems like you want to expire only a part of the session given SUL [22:37:32] well damn, that's not a permalink [22:37:44] RoanKattouw: https://www.mediawiki.org/w/index.php?title=Topic:Uoge9eeznt3nntm0&topic_showPostId=uoggxstpey9zb21k#flow-post-uoggxstpey9zb21k [22:37:47] <_joe_> mobrovac: private wikis are non-SUL most of the time [22:37:59] i.e. use a If-None-Match: * header [22:38:06] <_joe_> urandom: nice! [22:38:20] duesen: yes. Which is not a bad justification. Although if we do want more from the session service in the future, I would probably go for CassandraBagOStuff now and more effort in speccing out what we want later, when it's not blocking multi-DC [22:38:38] as for TTL, I'd add that as an attribute to the JSON payload of the PUT [22:38:44] #link https://www.mediawiki.org/w/index.php?title=Topic:Uoge9eeznt3nntm0&topic_showPostId=uoggxstpey9zb21k#flow-post-uoggxstpey9zb21k [22:38:55] <_joe_> urandom: +1 [22:39:09] but am personally leaning toward doing either/both as a later iteration [22:39:17] if the need is not concrete now [22:39:33] because extending it is so easy, and removing things is awful and hard [22:39:37] If-None-Match would be good, but I think an endpoint like /update that does a merge would be good too [22:39:48] should we go back to discussing whether we need a standalone service at all, or just want CassandraBagOStuff, and then think about a "full" session/login/auth services? [22:39:51] But yeah clearly this is not something we need right this minute [22:39:54] <_joe_> urandom: now I was asking myself if the API shouldn't also define rate limiting semantics, but that's intertwined with the authn/authz discussion [22:39:56] RoanKattouw: that is not very RESTful :) [22:40:15] <_joe_> duesen: no I think we need to create a service, as basic as it might be [22:40:16] I still feel a standalone service is worthwhile for this. but i have trouble justifying it. [22:40:36] RoanKattouw: that begs the question of whether that should be only one endpoint that updates things and that does so based on the session segment that one wants to update [22:40:39] #info <_joe_> I think we need to create a service, as basic as it might be [22:40:44] <_joe_> if we want to be able to expand its properties in the future [22:40:48] duesen: principle of least privilege [22:41:04] *** 20 minutes left *** [22:41:10] duesen: also, I think a Cassandra driver for PHP is going to be a pain [22:41:11] What's wrong with being able to say "this key contains a JSON blob with a bunch of properties, I would like you to change this one property to this one value" as opposed to having to do that on the client side with (possibly repeated) get+ If-None-Match set cycles? [22:41:15] cassandra by itself can't do expiry can it? [22:41:17] _joe_: re CAS, Redis has setnx which is a very limited form of CAS, BagOStuff builds full CAS support on the top of that (using setnx-based locks) AIUI [22:41:28] urandom: I've used cassandra from PHP before, I don't recall it being a pain [22:41:40] I made an ExternalStore subclass using it [22:41:43] TimStarling: Cassandra can do expiry [22:41:45] TimStarling: OK [22:41:46] <_joe_> tgr: yes, but setnx doesn't have strong consistency guarantees. [22:42:06] TimStarling: (OK regarding the driver; I was referring to it's broken status in Debian) [22:42:06] it does if you use it for a binary lock [22:42:10] let's remember we are talking about apis here, not implementation, please [22:42:18] TimStarling: was it an entirely native version, or one with an extension? [22:42:54] mobrovac: if the intent is to get something into production as quickly as possible, I guess it might be relevant that it is easy to implement? :) I certainly get the impression that there is an implementation in mind for the API as proposed [22:43:02] urandom: I'll get back to you on that [22:43:31] cdanis: the RfC is about the API though, there is no mention of "need to have it tomorrow" [22:44:00] urandom: getting back to the API: what do you think of the idea of reserving some separator, like "/", from the part of the URL, so we can make URLs for getting or setting individualk bits of the session later? [22:44:22] urandom: for now, we'd just have to reserve a character for making such pathes, everything else could be figured out later [22:44:49] duesen: that still begs the question of how do we do merges and whether the service should be the one doing them [22:45:09] duesen: we can reserve a character [22:45:14] which influences the api ultimately [22:45:16] <_joe_> urandom: I would propose we add a namespace to the urls of the individual keys [22:45:23] duesen: if the API is just a k/v store then keys can just be opaque [22:45:46] <_joe_> that would help with separating different types of storages and allow different things to store their sessions separately [22:45:51] mobrovac: for now, the answer is: the service doesn't know or care. [22:45:59] _joe_: not sure I follow [22:46:29] TimStarling: sure, but if we want it to become "smart" about the structure of values in the future, the notion of a "path" should be part of the url spec. [22:46:44] duesen: but the answer to that influences the scope of the service and its api [22:46:45] TimStarling: just be reserving a character, we'd keep that option open [22:46:46] <_joe_> urandom: if your url for retreiving a key is /v1/keys/$key vs /v1/$namespace/keys/$key [22:47:03] <_joe_> you can't separate keys with different uses based on the url [22:47:05] BagOStuff uses colon separators, following the memcached convention [22:47:21] _joe_: oh, the URI is prefixed with `sessions` now [22:47:26] mobrovac: my idea is to make it "dumb blobs" for now, and possibly add "smart structured" later, usinng more compley url syntax [22:47:40] _joe_: that's a recent change, the result of our last meeting [22:47:46] <_joe_> urandom: oh, heh, sorry, I completely overlooked that [22:48:11] <_joe_> so /sessions is a namespace in itself, ok [22:48:13] i kind of like the idea of reserving a delimiter so the storage can do updates to individual fields later [22:48:15] _joe_: no, it was literally early today (i'd forgotten) [22:48:30] Are these dumb blobs going to be json (as in please dont make it php serialized stuff) [22:48:43] bawolff: +1gazillion [22:49:00] <_joe_> bawolff: json storing base64-encoded serialized php :P [22:49:11] I don't understand where this field updating is coming from, other than it's obvious other have had some very recent discussions that I missed [22:49:15] don't ruin the party _joe_ :) [22:49:25] but if reserving a character works, that seems easy enough [22:49:27] MW will use a key like enwiki:MWSessions:123456789 [22:49:46] urandom: less chance for write/write collisions, less need for cas or service-side merging of values. [22:49:50] doing s/:/\// seems simple enough [22:50:00] #info i kind of like the idea of reserving a delimiter so the storage can do updates to individual fields later [22:50:08] *** 10 minutes remaining *** (#info is your friend) [22:51:23] urandom: since we have 10 mins, can we have your summary of the open questions and things that have been agreed upon? [22:51:24] <_joe_> urandom: I would expect something like PUT /sessions/v1/keys/$key#field [22:51:36] so I still feel this needs a clearer goal. If it is multi-DC support (we want the session storage to behave like Cassandra, not like Redis), let's replace RedisBagOStuff with CassandraBagOStuff. Easy, well scoped and an API can built on top of it in the future. [22:51:36] #? [22:51:45] <_joe_> any separator :P [22:52:10] urandom: uuuhhh.... the "value" field in the input JASON... does it have to be a string? [22:52:15] Can it be a JSON structure?` [22:52:20] mobrovac: has anything been agreed upon? [22:52:26] If the goal is security and/or exposing authn state outside MediaWiki, those are worthwhile goals but way more complex than most people seem to assume, and the current API is not really useful for them. [22:52:33] "The body of the request is a JSON object with a single string attribute named value. " [22:52:39] duesen: the assumption has been opaque k/v up until (what seems) hours ago [22:53:07] urandom: doesn't *necessarily* translate to blobs in my mind, but ok. [22:53:17] just thinkking that we'## end up with json in a json string [22:53:24] duesen: https://www.mediawiki.org/w/index.php?title=Topic:Uo7dbouu25vfykmg&topic_showPostId=uogfbe4ok0p91d9n#flow-post-uogfbe4ok0p91d9n has some points about why subkeys might be a bad idea [22:53:25] tgr: im ny mind, both are the goals [22:53:41] urandom: it has been ageed that we want a service. REST seems uncontroversial. Cassandra seems uncontroversial. [22:53:46] who is leading this project? [22:53:47] not sure about anything else :) [22:53:53] duesen: would json in a json string be bad? [22:54:02] duesen: see tgr's latest :) [22:54:09] duesen: REST is controversial, tgr keeps saying better to not do that [22:54:14] <_joe_> I have one more question: do we want what we store to be completely arbitrary json? [22:54:20] mobrovac: in that case IMO the RfC needs a threat model (or for exposing authn, a use case) [22:54:33] but apparently we don't have time for that because multi-DC? [22:54:50] urandom seemed interested in PHP->Cassandra drivers as well [22:55:04] I agree that id like to see more explicit requirements stated [22:55:24] if we're talking about exposing authn state to non-MW clients, there's a huge can of worms to open... because we need definitions how authn state is described, etc. [22:55:25] I have some ideas for action items but I'm not sure who to assign them to, that's why I ask [22:55:27] tgr: i see that it may not be worht while, i'm just saying that reserving a separator costs nothing, and keeps the option open. [22:55:33] *** 5 more minutes *** [22:55:35] bawolff: yeah, I'd like that too :| [22:56:03] tgr: i think the assumption was at this point that in v1 auth has been taken care of (which i know is not sufficient) [22:56:12] ok I am going to say mobrovac [22:56:16] hahahaha [22:56:18] ok [22:56:24] _joe_: i didn't see you argue against REST. did i miss something? [22:56:37] <_joe_> me argue against REST? Why? [22:57:06] <_joe_> I think k-v storage is one of the cases where rest can work decently well [22:57:07] ah, no sorry. i got confused. tim said *tgr* was arguing against it. sorry. [22:57:11] we are running out of time [22:57:30] #action mobrovac to update RFC with discussion of unresolved question: cassandra versus REST [22:57:35] Do we have consensus on anything? [22:58:10] objections to "let there be a standalone service storing sessions as blobs for now"? [22:58:33] <_joe_> I am strongly against having php talking to cassandra directly in this case, it will just surrender any possibility of enhancing security in the future. [22:58:57] #info <_joe_> I am strongly against having php talking to cassandra directly in this case, it will just surrender any possibility of enhancing security in the future. [22:59:00] duesen: a CassandraBehindRestBagOStuff does not add anything over a CassandraBagOStuff, is more work, and the API will have to be redone at some point if we really want a standalone authn service [22:59:01] TimStarling: these are two different levels of abstraction, but we can discuss it some other time [22:59:02] <_joe_> a topic I avoided as it was clearly off topic right now. [22:59:08] we just need a document saying that [22:59:28] it's out of time for actually discussing it here, but there is clearly a question of how many levels of abstraction are required [22:59:31] #info duesen: a CassandraBehindRestBagOStuff does not add anything over a CassandraBagOStuff, is more work, and the API will have to be redone at some point if we really want a standalone authn service [23:00:00] #info _joe_: I am strongly against having php talking to cassandra directly in this case, it will just surrender any possibility of enhancing security in the future. [23:00:02] enhancing security would come in the form of replacing the direct connection with a service, presumably, once we have a better idea what that service needs to do [23:00:06] TimStarling: that, i suppose, is very often the question ;) [23:00:07] tgr: I can't argue for or against the latter because I have no idea what an authn service would look like [23:00:43] much like a session service, I imagine, except the sessions would actually be invalidated when the user logs out [23:00:56] which is not something that the current service supports [23:01:00] in a past time, there were discussions of an authn/z service and a session service, but the former was deemed too complex at that point [23:01:12] last change to add to the minutes with a tag before I close things [23:01:25] sorry, by authn service I really meant authn state service [23:01:34] tgr: it's arguable whether a session mgmt service should support that, given that it's a authn/z concern primarily [23:01:47] <_joe_> tgr: so you say that having the possibility to dump the entire sessions table with one command from mediawiki has no security implications compared to not being able to do that [23:02:33] okay, let's continue the discussion on the talk page [23:02:39] <_joe_> I'm not naive, I know there are ways around that, but seriously, if you have business logic between the data and Mediawiki, you can progressively improve that logic [23:02:57] _joe_: +1 [23:02:59] _joe_: session data is not really valuable, it's the ability to create or delete sessions that is, and the service as currently proposed does not stop that [23:03:07] and stopping it is a very hard problem [23:03:09] #endmeeting [23:03:10] Meeting ended Wed Nov 14 23:03:09 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [23:03:10] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-14-22.04.html [23:03:10] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-14-22.04.txt [23:03:10] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-14-22.04.wiki [23:03:10] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-14-22.04.log.html [23:03:24] _joe_: principle of least privilege seems to apply here [23:03:34] <_joe_> session data is not valuable? ok then... [23:03:42] At most there are maybe some privacy implications of dropping session table [23:03:43] * _joe_ off [23:03:43] really the only security impact I can think of is passwords being stored unhashed in sessions for a few seconds [23:03:45] there are always vectors that go beyond what I am able to imagine [23:04:05] passwords are being stored unhashed? [23:04:14] that doesnt sound right [23:04:30] of course, we could lock down access from a CassandraBagOStuff Cassandra-side too [23:04:33] they are needed for the duration of the login process, yes [23:04:42] they're supposed to be encrypted [23:04:54] encrypted, sure [23:04:57] which i guess is technically unhashed... [23:04:59] fyi, this is not being recorded in the meeting minutes [23:05:15] encryption offers no defense against a code exection attack [23:05:17] well its kind of tangential [23:05:32] mobrovac: it was discussed on the talk page before [23:05:39] hashes dont really offer that much either [23:05:52] Once you own the server [23:06:05] and can record all connections [23:06:34] that's an additional escalation from ordinary RCE [23:06:45] you should need root to record all connections [23:09:11] Yes thats true. I was thinking the case where a person got persistent code execution (e.g. replacing a php file or something like that) [23:10:17] I can think of two attacks that the service as currently proposed would prevent, assuming some kind of reflected RCE [23:10:37] one is enumerating sessions and fishing for passwords [23:12:17] We only store the password if its a multistage login right. So only 2FA users would be affected in this scenario? [23:12:23] the other is trying to find an active session being used by a targeted user and then manipulating it to run code on their machine (maybe turn the reflected RCE into one stored in the session, assuming some further vulnerability allows it, and then use that to send the malicious Javascript) [23:13:30] if you are able to steal the session cookie [23:13:41] thats concievable if we were using php serialization. Seems much less likely with json blobs [23:13:44] you can add the malicious javascript on the user common.js [23:13:52] noisy, but effective [23:13:53] so the security benefit is not zero, but those are both fairly unlikely attacks as the sheer volume of session data probably makes them hard to pull of (and you need an RCE to start with, and at that point you can do worse things) [23:14:39] Platonides: but in this scenario the attacker can create new sessions for the user, so they dont need to steal the cookie [23:15:51] if session enumeraction is a real concern, we could only store a hash of the session id so you couldnt make a cookie from a dump of session table [23:16:37] bawolff: yes, it would only apply to multistage login (or normal login with millisecond-exact timing I suppose), that includes 2FA, a password change form being shown, maybe login captchas (I don't remember how they are implemented) [23:16:44] it's so cheap that seems worth doing it [23:16:53] that still leaves the info in the session though. Encrypted password would be a biggie. Im not sure anything else is of interest. Login date from a privacy perspective maybe? [23:17:59] tgr: from what i remember i thought in normal case, it was never persisted to backend unless the process extended beyond one request [23:18:16] but it has been a long time since i looked at the code [23:18:23] session enumeration for the purpose of creating cookies is not a real concern, IMO, the attacker can just as easily create its own session and you cannot defend against that, short of moving the cookie handling completely out of MediaWiki [23:18:53] I guess a session service that talks to Varnish directly could do something like that, but as I said, way more complicated [23:19:24] urandom: I used Uri Hartmann's pure PHP Cassandra client library, not an extension: https://github.com/uri2x/php-cassandra [23:19:54] this was a 2015 experiment which I did not submit to gerrit [23:23:12] * bawolff wasnt paying attention and my battery went to 0% [23:24:37] The other scenario maybe is attacker has short window, doesnt fully know what s/he's doing and exfiltrates as much data as possible for later analysis [23:25:24] but doesnt have time to setup a more active attack with inserting malicious sessions [23:26:23] bawolff: re: AuthManager, in the normal login case the authn session data gets set at the beginning of the request and deleted at the end of the request [23:26:58] why is the password in the session? [23:27:00] I think that means the data does not get written to the session storage, but that could break easily without anyone noticing [23:28:08] TimStarling: i think its so there can be multistage logins where password is only compared at end [23:28:20] TimStarling: all data submitted during the authn process is stored in the session, becase it's the authn framework's job to get the data to the providers, but it has no understanding of what the data contains [23:28:49] yeah, authentication can be spread over multiple requests [23:29:31] seems fixable, with difficulty [23:29:55] authn providers can add new requests to the process (e.g. by showing a 2FA form) and then authn providers coming after that might need the data from the initial request [23:30:50] it's not generically fixable IMO, we could special case passwords as they are usually processed early (at the very least that's the case on WMF servers) [23:31:47] you would have to give the password provider the opportunity to store data to the session early on, which it then receives later [23:31:56] but not in a nice way, you could have multiple authn providers which all use the password (normal MediaWiki authn and CentralAuth for example) [23:31:58] instead of getting the form data it would get its own data [23:33:01] you mean it would hash the data before putting it in the session? [23:33:51] yes, hash with matching salt, that would be fine [23:35:20] or just a boolean to say whether the password was correct [23:36:19] I guess that would work as long as no secondary providers use anything that needs password-like security [23:36:51] Password requirements checks i guess need the raw password [23:37:04] but i guess they could do the same thing [23:37:13] yes, have the same split logic [23:37:26] store the password length or dictionary match status or whatever, then later report on it [23:38:04] * bawolff likes this idea [23:38:33] we would have to change the PrimaryAuthenticationProvider interface, which is a bit of a pain [23:41:30] all providers would have to do the pw check at the start of the request, instead of the process aborting at the first successful / failed one, so more work and DB queries but that seems like a small price to pay [23:42:49] luckily extensions use "extends AbstractPrimaryAuthenticationProvider" rather than "implements PrimaryAuthenticationProvider", that makes b/c possible [23:43:55] tgr: that may be good thing. Reduces timing attack risk of figuring out which part failed [23:44:41] which is probably mostly not sesnsitive but i could imagine some complex scenario where it would be