[09:48:54] mutante: if i change my icinga name from 'kormat' to 'Kormat' (to reduce potential issues), will anything bad happen?
[09:49:18] or, more realistically, will whatever bad thing that happens be survivable? :)
[09:49:47] <_joe_> kormat: your username should match, also in case, your wikitech username IIRC
[09:49:57] <_joe_> not the shell name
[09:50:04] kormat: most likely it will mean you can still login but won't be allowed to run commands anymore on Icinga web ui
[09:50:33] because the simple auth login is not case-sensitive but the Icinga contact name is
[09:51:02] <_joe_> mutante: so it should /not/ match the case?
[09:53:12] Icinga Classic UI 1.13.4 (Backend 1.13.4) - Logged in as Ayounsi
[09:53:29] if I log in as ayounsi it doesn't accept commands
[09:53:31] _joe_: double checked because we always get it wrong at first try:
[09:53:50] my Icinga name is : dzahn. this matches the "sn" field in LDAP
[09:54:04] and the name in modules/icinga/files/cgi.cfg
[09:54:06] <_joe_> and the sn name is?
[09:54:13] sn: dzahn
[09:54:15] <_joe_> the wikitech username, correct?
[09:54:36] if i log in using https://cas-icinga.wikimedia.org/icinga/, i get 'Kormat'.
[09:55:31] i can login as Dzahn and dzahn on Icinga but as "Dzahn" i am not authorized to run commands
[09:56:24] mutante: you are in cgi.cfg as 'dzahn'. i think that's what controls it, right?
[09:56:27] uid: dzahn cn: Dzahn sn: dzahn
[09:56:36] kormat: correct
[09:56:53] ok. so i'm currently in there as 'kormat', my plan is to change that to 'Kormat', so it matches what cas-icinga uses
[09:57:15] uid: kormat sn: Kormat cn: Kormat
[09:57:33] <_joe_> kormat: I think you're right, actually
[09:57:57] <_joe_> your login capitalization needs to conform to what is in cgi.cfg
[09:58:13] kormat: just try it out and test if you can still run commands on a random server, like a downtime of 5 min or something
[09:58:19] <_joe_> (what a beautiful, expressive name.
It tells you all you need to do about icinga, just from the name)
[09:58:28] (haha)
[09:58:32] ok, will do :)
[09:58:52] we really need to fix these inconsistencies across all accounts at some point
[09:59:04] like realnames in LDAP, full name vs. unix name in logins etc.
[09:59:15] is there a task?
[09:59:16] I remember being impacted very much by that
[09:59:17] :
[09:59:18] :)
[09:59:18] <_joe_> I hope cas brings that?
[09:59:28] but because it only affected me on onboarding
[09:59:37] then it went to the bottom of my priorities :-D
[09:59:46] CAS will help with some of it
[10:00:15] but if the app is broken, is broken (from a certain point of view)
[10:00:51] the app typically isn't broken
[10:00:54] remember when icinga did not have the simple auth in front of it at all and was public? it removed one of these issues and was also like a status page for end users. we just added it because of a security issue years ago but never checked (afaict) if that is since fixed
[10:01:28] but whenever we transition we'll need to change things in app databases etc.
[10:01:34] my Gerrit username is my full name
[10:01:46] so the problem is that ldap attributes are case insensitive when performing the authentication, however the ldap attribute returned, which is ultimately what ends up as the remote_user, preserves case. this is why authentication works with either case but cgi.cfg has to match ldap
[10:01:55] CAS may fix the consistent "login" part of it, but you'll still need to link the login with whatever is in gerrit's database
[10:02:11] what jbond says is what I call broken on icinga
[10:02:24] mutante: https://gerrit.wikimedia.org/r/c/operations/puppet/+/591315 for science
[10:02:24] not cas, not our install, not anything, but icinga model
[10:02:29] but the login is not even part of icinga itself, we just slapped it on there
[10:02:44] but it has users, right?
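A minimal sketch of the mismatch described in the messages above, assuming a login step that ignores case and an authorization step that compares names exactly. All names here (`KNOWN_LOGINS`, `AUTHORIZED_CONTACTS`, `authenticate`, `may_run_commands`, `normalise`) are hypothetical and stand in for the LDAP lookup and the cgi.cfg contact list; none of them is an Icinga or LDAP API.

```python
# Illustrative only: these names and data are assumptions, not Icinga or LDAP APIs.
KNOWN_LOGINS = {"dzahn", "kormat"}         # what the login layer knows about
AUTHORIZED_CONTACTS = {"dzahn", "kormat"}  # stand-in for the names listed in cgi.cfg


def authenticate(typed_name: str) -> bool:
    """Login step: names are compared without regard to case."""
    return typed_name.lower() in {name.lower() for name in KNOWN_LOGINS}


def may_run_commands(remote_user: str) -> bool:
    """Authorization step: exact, case-sensitive match against the contact list."""
    return remote_user in AUTHORIZED_CONTACTS


def normalise(remote_user: str) -> str:
    """Hypothetical remedy: lowercase the name before the authorization check."""
    return remote_user.lower()


# Both spellings log in, but only the exact spelling passes the contact check
# unless the name is normalised first.
for typed in ("dzahn", "Dzahn"):
    print(typed, authenticate(typed), may_run_commands(typed),
          may_run_commands(normalise(typed)))
```

Lowercasing only helps if the contact list is itself stored in lowercase, so it would not cover an entry such as 'Kormat' without also changing that entry.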
[10:03:13] it is the interaction of that and the login system
[10:03:25] i dont think cas will necessarily change this. i think we could do a number of things. * make ldap case sensitive; * normalise case in the ldap database; * downcase the remote_user in the apache config
[10:03:34] yea, contacts and if you are the contact of a service then you have permissions to run commands on it
[10:03:44] and cgi.cfg is just globally overriding that
[10:06:33] I think cas will help for this at least from the point of view of "login in a single place"
[10:06:44] also: death to http basic auth
[10:06:55] I think the short/immediate-term fix would be to update onboarding checklists
[10:07:03] +1
[10:07:22] on what is expected for each field (full name, unix shell name) and where case matters
[10:07:46] if documented, later can be killed/fixed :-D
[10:07:54] half the team logs in to e.g. gerrit using the unix name, the other half using their full name, and case also varies wildly -- let's just pick one and stick to it
[10:09:11] probably needs a task, otherwise people agree here and it never happens :)
[10:09:29] +1
[10:09:38] +1 for XioNoX to create a task
[10:09:58] :-D
[10:12:02] haha, happy to create a placeholder task, but I'm sure there are people with better knowledge of our auth than me 👀
[10:20:09] here is an example ticket about renaming wikitech/LDAP/gerrit user that is open since 2017 for ..reasons . there are more tagged with LDAP ..
https://phabricator.wikimedia.org/T171417
[10:27:13] also my own ticket to rename myself 5 years ago turned into "Change LDAP cn to something more useful" https://phabricator.wikimedia.org/T113792
[10:27:44] hhaha
[10:27:52] a bunch of these at https://phabricator.wikimedia.org/tag/ldap/ -> open tasks
[11:12:41] JFTR, the updated onboarding already covers that in the Icinga step: "Create a patch which adds your Wikimedia Developer user name to authorized_for_system_information, (...)"
[11:16:55] _joe_: ^^ (for what we were talking earlier)
[11:17:21] <_joe_> volans: I'm going to lunch, but moritzm and I are already talking about it
[11:18:59] yeah, I'll fold in the various suggestions later on
[12:49:31] TIL puppet variables are not as immutable as i thought https://phabricator.wikimedia.org/P11029
[12:50:13] jbond42: stop breaking puppet! :D
[12:50:21] :)
[12:50:32] the whole world might collaps
[12:50:35] *collapse
[13:07:36] more puppet madness: https://phabricator.wikimedia.org/P11031.
basically be careful using ruby inline functions `!` in erb
[13:12:36] _joe_: mc1028 traffic abruptly dropped back to reasonable levels at 23:00
[13:15:28] <_joe_> cdanis: I don't want to look into it anymore tbh, but I suspect some scraping then
[13:16:44] <_joe_> I just tested the gutter pool with a small twist: increasing packet loss
[13:17:29] <_joe_> at about 50% packet loss, we start getting timeouts in the client, but we don't immediately failover to the gutter pool because mcrouter doesn't immediately mark the server as TKO
[13:22:38] moritzm: `wmf-update-known-hosts-production` doesn't cover `ns0.wikimedia.org`
[13:24:02] kormat: I generally ssh to authdns1001.wikimedia.org for performing DNS updates
[13:24:13] I normally use ns0.wikimedia.org
[13:24:13] any of the DNS servers will do, though; they all update each other
[13:24:40] sshing to the nsX addresses will probably lose some meaning in the future, as I think there's eventual plans to anycast those IPs amongst all DNS servers
[13:26:06] kormat: yes as that's not a cname
[13:26:06] kormat: AFAIK wmf-update-known-hosts-production only populates with the hostname of puppet agents and no puppet agents have the hostname ns0.wikimedia.org it is just an entry in the wikimedia.org zone
[13:26:28] jbond42: actually it populates also the cnames if you pass the checkout of a dns repo ;)
[13:26:32] like icinga.w.o
[13:26:39] ^ that's what i've done
[13:26:39] TIL, thanks
[13:26:41] but nsX are different
[13:27:00] also what chris said, it might become moot soon~ish for anycast
[13:27:53] any dns auth host will do, we'll cover that example in ~30m during the session
[13:27:58] on how to find them ;)
[13:28:14] and I didn't add it now, had already picked that one for some reason :D
[13:28:41] so, hold your breath :)
[13:28:50] * kormat turns slightly blue
[13:29:15] c'mon, can't hold your breath for 30 minutes? wasn't that part of one of the interviews?
:-P
[13:29:32] we're getting soft
[13:29:37] <_joe_> not for dbas, no
[13:29:42] right
[13:30:03] for dbas it is 15 minutes
[13:30:10] https://www.youtube.com/watch?v=iqEZ7T-wHOo
[13:30:41] <_joe_> we ask the dba candidates to stick their hands in a box full of venomous spiders for 30 minutes, and somewhere in that box there is a trigger for a ton of C4 under their chairs. Blindfolded
[13:30:50] <_joe_> that's more or less their daily job
[13:31:17] based on my first week, that seems pretty accurate
[13:31:35] XDDDDD
[13:32:01] kormat: it's perfectly safe, I wrote the PHP code for the trigger as my first project in the language
[13:32:30] * kormat whimpers
[13:32:46] so is it the time now to let kormat discover the etherpad database or what?
[13:33:07] marostegui: please wait after my session
[13:33:22] I'd like kormat to still have some functioning neuron when attending
[13:33:27] ;)
[13:33:30] hahaha
[13:33:48] volans: : you might want to move the session forward in that case
[13:34:15] few hours/months/centuries?
[13:51:24] ah - "forward" in this context (in english) is ambiguous. i meant to move it to now, in the hope of having any neurons left :) (https://www.theguardian.com/lifeandstyle/2018/apr/13/time-move-meeting-forward-oliver-burkeman)
[13:56:31] lol
[14:01:10] kormat: ambiguities in language? better call volans
[14:20:35] <_joe_> cdanis: lol
[15:22:01] is our CI overloaded/broken? no test result for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/591357/ since 15 mins (way beyond the typical 4-5 mins)
[15:25:55] it's not shown at all on https://integration.wikimedia.org/zuul/ so seems like it was dropped somehow
[15:26:02] See -releng
[15:31:15] https://phabricator.wikimedia.org/T250820
[15:33:53] RhinosF1: thanks
[15:34:52] Np
[15:41:03] I am replacing the mgmt switch on A6 in eqiad shortly,
[16:01:01] what's on that?
[16:03:52] it is just the mgmt switch
[16:04:19] a bunch of hosts were flapping a few days ago and looks like the switch was guilty
[16:05:08] apergos: icinga is building you a list of what mgmt hosts are on it in #-operations right now ;)
[16:06:39] so it is :-D
[16:06:41] ty!
[18:20:58] If anyone feels like load testing our system (meet.wmcloud.org) please join in the meeting
[18:21:10] https://meet.wmcloud.org/testtesttest
[18:34:28] done now
[18:34:37] Thanks everyone who participated