[11:15:54] how do people feel about cancelling this week's meeting in favor of the goal meeting?
[11:16:00] the alternative would be to reschedule for Thursday
[11:16:45] I lean towards cancelling because it's been a busy week (and will get busier), so I'd like to avoid more meetings etc.
[11:23:40] +1
[11:23:51] (for cancelling)
[11:26:24] +1
[12:41:14] (done)
[12:41:33] herron: would like to see a draft Kafka goal with a few different bullet points
[12:41:51] is that something that you could work on, or do you feel it's still too nebulous?
[12:43:25] moritzm, jbond42: would like to talk about authn/z if you get a chance (if you're responding to the incident, then definitely not now :)
[12:44:00] sure, wfm
[12:44:04] sure, I’ll draft a few items
[12:44:33] yes im free
[12:45:05] herron: thanks :)
[12:45:34] * Audit infrastructure for authentication requirements/protocols
[12:45:45] what does that mean?
[12:45:46] actually
[12:45:51] scratch that, different question first
[12:46:21] so the way I've thought about this problem so far (and it may be an incorrect view, correct me if you feel differently)
[12:46:27] has been that there's a few different parts to it
[12:46:50] currently we have lots of applications: gerrit, icinga, logstash etc. we need to know what they all support, i.e. saml, openid, cas, so we know what products fit our needs
[12:46:56] the first is the "account management" part, where we take over from OpenStackManager/wikitech for that
[12:46:59] sorry carry on
[12:47:26] for users creating their own "developer account", and us managing that etc.
[12:47:55] the second part is the SSO part, where we replace HTTP Basic Auth + custom LDAP binds with something unified, that ideally also supports WebAuthn, 2FA etc.
[12:48:14] and the third part I guess is self-service group management and all that
[12:48:43] are we going to address each of those separately, or do you guys feel we should just kind of attack it from all angles at the same time?
[12:48:46] I can see arguments for both :)
[12:49:55] I think jbond42 is right, maybe we should start with which apps and what each one supports for auth
[12:50:16] the "Design a workflow for administrative changes" from the third sub task is about coming up with a more fine-grained plan (with some high level steps outlined in the gdoc)
[12:50:40] several of the actual workflows will need more input from other teams as well
[12:50:54] is it workflows for LDAP account management?
[12:51:07] I don't fully grasp "administrative changes" :)
[12:53:10] some of the steps are outlined in the gdoc, e.g. a new user gets created in self-service and then e.g. requests access to Superset, then the access could be requested by the user, but the access would be authorised by the analytics team
[12:53:34] I haven't looked at the doc much, I'm sorry :(
[12:53:44] I think we have some core issues: identity management (including SSO), user management and application integration. The data may be in (probably) LDAP and the user management could likely be Striker, however the service integration and identity management pieces are probably going to be a FLOSS solution, so i think we need to get those requirements in place first
[12:53:57] and part of the 3rd bullet is to look at all current workflows and task to the stakeholders involved for a more detailed implementation plan
[12:54:04] ack
[12:54:19] talk, not task
[12:54:43] the goal wording is confusing me
[12:55:01] and I'm trying to figure out why and express it well :)
[12:55:06] so part of it I guess is
[12:55:28] there is talk about "administrative changes", but what about user registration & changes?
[12:55:40] end users registering themselves, resetting their password, etc.?
[12:56:03] none of the three bullet points seem to cover that, right?
[12:56:10] that was more or less implied, but we should probably spell it out by including self-service or so
[12:56:22] yeah
[12:57:36] e.g. "Design a workflow for user permissions (self service and administrative) (,..)" ?
[13:00:31] sec, working on a strawdog :)
[13:01:47] * Audit production and WMCS infrastructure and document all authenticated services and their authentication & authorization capabilities
[13:01:50] * Engage with stakeholders and collect functional and non-functional requirements for identity and access management for web services
[13:01:53] * Evaluate free & open source Identity Management/SSO software solutions against our requirements and deploy test installations of 1-2
[13:01:56] * Build a migration plan from OpenStackManager and Striker towards a unified identity and access management system for developer accounts
[13:01:59] how does that sound?
[13:02:13] sorry if it sounds dismissive, I just didn't feel good about opposing without proposing :)
[13:06:01] sounds good, but let's maybe strike the "and deploy test installations of 1-2", installing the individual solutions will be part of the evaluation of any solution anyway
[13:07:17] are you saying that the deployment of test installs is to be left for next quarter
[13:07:24] or that it's going to happen earlier than that?
[13:10:57] my point is that "deploy test installations" is imprecise; during the tests of the solutions we'll surely install them in WMCS or some other VM, but a PoC of the selected solution in prod is also dependent on the outcome of (1) and partly (4) and will probably only happen after the ground work is done
[13:11:20] oh i was referring to test installs in WMCS, not prod
[13:11:27] but I see the confusion
[13:11:54] i think the test looks good, also i added the bit about installing and the intention was to test, not as a PoC
[13:11:57] we can totally make sure to point out that we won't pick the tools based on slides and websites :-)
[13:12:00] copied the previous wording
[13:12:00] s/test/text/
[13:12:00] * Evaluate free & open source Identity Management/SSO software solutions against our requirements and create a short list of 1-2
[13:12:07] moritzm: yeah that was the point :)
[13:12:22] +1 on the revised wording :-)
[13:12:30] yes lgtm
[13:12:38] final question
[13:12:50] do you guys want to include some wording around the offboarding script
[13:13:00] the stuff from the previous days basically?
[13:13:05] good point
[13:13:38] maybe we can reflect the wording to also cater for the offboarding/removal process
[13:14:17] "(..) requirements for identity and access management (and their revocation) (..) " maybe?
[13:15:08] designing such workflows is certainly a part of it, e.g. I'd like to include Legal in the NDA sign off process
[13:15:32] I think that's implied, but I was wondering if you wanted to include something more short-term
[13:15:50] changing our scripts in the meantime to include some of the stuff we've been discussing the past few days
[13:16:15] I changed the LDAP script this morning, it's only waiting on gerrit being back
[13:16:19] oh ok
[13:16:25] for proper review etc,
[13:16:29] sure yeah
[13:16:32] I wasn't sure if it was that short-term :)
[13:17:10] cool ok
[13:17:18] I guess I'll post it as draft then
[13:17:27] jbond42: do you feel comfortable with me posting the puppet one as well?
[13:17:27] sounds good to me!
[13:17:36] yes im happy with that
[13:17:39] cool thanks
[13:17:42] thanks
[13:17:45] we'll discuss them all tomorrow again
[13:17:51] but we're discussing them at tech-mgmt in a couple of hours
[13:18:11] so I'd like to at least give a heads-up with the draft to my peers
[13:18:11] ack
[13:19:26] more broadly
[13:20:13] I'd like to keep a balance between exploratory goals ("we're going to investigate A, evaluate B and C, plan for D") and tangible goals ("we are going to upgrade X to Y, replace K by L")
[13:20:35] authn/z goal is more in the first category (but would love to find more tangible things to include in it)
[13:20:39] puppet is in the latter, that's good
[13:21:03] Kafka seems to be going towards the former atm, wondering if we can add more tangible deliverables to it herron
[13:21:30] the whole automation goal can probably be made very tangible ("implement X cook books to automate common maintenance tasks")
[13:21:39] which goal is that?
[13:22:04] the one that is yet to be discussed tomorrow :-)
[13:22:10] oh
[13:22:19] do we have a draft or raw thoughts somewhere in the pad?
[13:22:44] I mean there were only ideas floating around, not sure if this will even be a goal
[13:23:12] FYI, related: https://phabricator.wikimedia.org/T203943
[13:23:47] in some ways the DBA automation goal is a tangible realization of the larger automation goal ;)
[13:24:03] nod
[13:24:16] the DBA automation goal is my strawdog, haven't discussed it with anyone
[13:24:30] serviceops drafted a wider mediawiki goal that is in the pad but I think that's unrealistic right now
[13:24:39] it's way too much work given all the other stuff
[13:24:42] it seemed reasonable as written to me
[13:24:48] I know _joe_ wanted to sneak in some Envoy stuff on that goal
[13:24:55] (was talking about line 232)
[13:28:34] paravoid: we could add the hardware for cluster expansion, but will need to decide if we will proceed with the upgrade-ram-in-place approach the analytics hardware planning spreadsheet outlines (https://docs.google.com/spreadsheets/d/1123OTmek4eRriBkZrAjbp06aH0RMmR0e69TMUlVF84s/edit#gid=1814156814) or address it with a different approach/timeline
[13:30:04] uh
[13:30:12] item (4) there we can do
[13:30:18] item (3) (RAM in place) is a bit... meh
[13:30:25] gotta run, interview
[13:30:28] talk to you in an hour :)
[13:30:39] kk
[15:54:31] herron: is there a big issue with having mixed specs
[15:55:43] chaomodus: hey, fai.don and I spoke about it off irc and updated here https://etherpad.wikimedia.org/p/SRE-goals-FQ4-FY1819 (line 110)
[15:56:12] ah hah i just noticed the timestamps :)
[15:57:21] in a nutshell no not a big issue, but since the hosts are nearing retirement age it will feed two birds with one scone. hw refresh and cluster expansion at the same time
[15:57:38] makes sense
[16:19:07] if anyone has review bandwidth: https://gerrit.wikimedia.org/r/c/operations/puppet/+/497421 https://gerrit.wikimedia.org/r/c/operations/software/netbox-reports/+/495267
[17:06:17] anyone have any thought as to where on officewiki my backups of catchpoint configuration should live?
[17:14:28] is there an external monitoring page?
[17:14:37] like a subpage from there seems logical.
[17:14:53] shdubsh: if you would +1 this again, there was a gerrit snafu : https://gerrit.wikimedia.org/r/c/operations/puppet/+/497563
[17:19:02] chaomodus: you got it
[17:20:19] thanks!_ :D
[17:54:10] o/
[17:55:22] bryan mentioned that you (you as in SRE team) are looking for some replacement for ldap/account management stuff/sso solution
[17:57:48] wondering if I can take a look at the phab task
[18:03:09] arturo: just quarterly goal drafts so far, no phab task yet, other than an old one: https://phabricator.wikimedia.org/T179463