[08:54:38] moritzm, jbond42: I'm unable to idp-login right now, did I do something bad? :)
[08:55:06] looking
[08:55:21] mmmh, taking out and back in the yubikey twice worked
[08:55:33] the weird part is that it was not showing any error
[08:55:41] just back to the login page
[08:55:51] sorry for the ping I guess
[08:56:15] volans: so it accepted the yubikey token but just bounced you to the login page? any error message?
[08:56:23] not on the UI
[08:56:36] accepted is a big word
[08:56:44] it was showing the authenticate device
[08:56:52] so I guess it recognized the serial?
[08:58:05] i sometimes have an issue where my yubikey doesn't recognise it is being pressed and the site obviously stays on the page asking you to press the key. this i think is an issue with my key but wanted to double check it wasn't the same thing
[08:58:47] no I didn't get that, I sometimes get the double press as stated in the task though
[08:58:54] the register device + authenticate device
[08:59:47] ack thanks
[09:04:55] volans: did you login redirected via icinga?
[09:05:16] the first attempt that failed was puppetboard
[09:05:19] twice
[09:05:26] then I tried icinga
[09:05:33] and then took out and back the yubikey
[09:05:43] that was flashing correctly, hence I didn't think about it at first
[09:05:58] so yeah I guess I ended up logging in for icinga in the end
[09:07:39] strange, from the logs it looks like the first one succeeded, the second one looks like you logged in with WHO: Volans,Volans
[09:08:18] yes
[09:08:30] and that's another separate bug I think
[09:08:37] so after the first attempt I got back the login page
[09:08:41] username was already filled
[09:08:44] so I just filled the password
[09:08:49] check remember me
[09:08:54] and hit login
[09:09:07] and it said the username was wrong and was Volans,Volans
[09:10:41] ack i think i have seen that before myself tbh
[09:29:00] volans looking at the logs i see a logout event straight after the login event with a final cause of "No authentication found for ticket TGT-106-*****KDdACBRID0-idp1001, code=INVALID_AUTHN_REQUEST" my best guess at this point is something prevented the ticket making it to memcached. i.e. it issued the ticket but when it came to validate it, there was nothing in memcached so it assumed the
[09:29:06] ticket was invalid and logged you out
[09:29:42] and I guess we don't have any logs of this failure
[09:30:18] still looking but i don't see anything specific at the moment
[09:31:35] maybe related to Monday's failover of the IDP, we had noticed a slight inconsistency in IDP local memcached keys before?
[09:33:29] are those replicated?
[09:35:03] they are via mcrouter, but last week when I compared keys between idp1001 and 2001 I noticed a slight discrepancy, possibly related to transient errors like network/restarts etc.
[09:35:10] not sure it's related, but could be
[09:36:13] I see
[09:39:36] moritzm: possible in the output i see "The recorded
[09:39:38] authentication is from a remember-me request"
[09:39:44] https://phabricator.wikimedia.org/T259110
[09:46:49] fyi i notice the following in syslog so to me it's looking more likely that cas was unable to write the TGT to memcache Jul 29 08:53:01 idp1001 memcached[471]: accept4(): Resource temporarily unavailable
[09:46:53] Jul 29 08:53:25 idp1001 memcached[471]: accept4(): Resource temporarily unavailable
[09:46:56] Jul 29 08:54:11 idp1001 memcached[471]: accept4(): Resource temporarily unavailable
[09:46:59] Jul 29 08:54:35 idp1001 memcached[471]: accept4(): Resource temporarily unavailable
[09:51:45] interesting, quite a few of those accept4() failures in memcached journald
[12:59:07] cdanis: guessing you have already seen this but just in case https://github.com/google/nel-collector
[12:59:34] jbond42: yes thanks :)
[14:04:09] jbond42: when logging via idp to icinga, I got redirected to https://icinga.wikimedia.org/cgi-bin/icinga/tac.cgi?tac_header
[14:04:12] that is an empty page
[14:04:15] with just some headers
[14:12:14] volans: looks like a race with this hack https://gerrit.wikimedia.org/r/c/operations/software/cas-overlay-template/+/609399
[14:12:51] ouch, didn't know we had to hack all this
[14:13:31] it's more hacking around icinga's use of frames
[14:13:54] maybe we just put in a specific hack for icinga after all :S
[14:14:02] ehehe
[17:16:25] volans, chaomodus, do you have some time to chat about the network side of hosts provisioning/changes/etc.?
[17:17:29] * volans here
[17:17:57] more specifically how the switch port config (in netbox) and push to the switches could fit in the current workflow (and what's missing, etc..)
[17:20:23] let's start with the data that homer needs
[17:20:31] is that all coming from netbox? anything in the yaml file?
[17:22:00] all netbox
[17:22:56] is there a high level step by step server provisioning doc?
[17:23:37] I guess it's this step https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Requested_-%3E_Spare_&_Requested_-%3E_Planned
[17:23:54] yes
[17:24:04] er, https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Requested_-%3E_Planned_additional_steps_&_Spare_-%3E_Planned
[17:25:09] I mean that whole page
[17:25:43] I was looking for the network config
[17:26:34] I didn't realize there were that many steps
[17:27:03] maybe networking is not something we should worry about now? (vs. other things to configure automatically from that list)
[17:27:41] you ask us? :D
[17:28:52] hahaha
[17:30:38] if this is the end of the chat that was blazingly fast :D
[17:31:11] I hope not
[17:32:08] I think I'm missing the big picture
[17:32:36] like how DCops create hosts, etc... I remember mentions of a prompt or importing a csv
[17:32:42] which would have the ports info
[17:34:06] if we need dcops we might need to move in their chan
[17:37:22] I thought something was discussed/decided during the January all hands
[17:38:41] about the steps for provisioning related to servers yes
[17:39:01] but mostly focused on the other sides, not too much on the network side IIRC
[17:46:34] ok, I think it will fit the same way in whatever was decided. If it's a prompt it would either ask for how many interfaces and then a vlan for each (or just do 1 interface for now as it's most of the devices), and if it's a CSV import, add a column for it
[17:46:40] it=vlan
[17:48:13] maybe show DCops as well how to check the netbox interface and do spot vlan changes in parallel (for special cases)
[17:48:37] then the question is how to push those changes, who can do a homer run, etc.
[17:49:07] we can either add a homer module to spicerack
[17:49:27] and run that
[17:49:32] but there are a lot of details
[17:49:37] which devices, what if there are other diffs
[17:50:07] yeah of course
[17:50:07] given that we can't push just patches but only the whole config
[17:50:49] we could modify homer to only push parts of the config (eg. the interfaces)
[17:51:17] I thought our junipers didn't support that
[17:51:27] volans: I mean using templates
[17:51:33] only load the interfaces template
[17:52:27] but a switch is mostly interfaces, so it's not a big deal to have more people able to do switch changes, especially if they're framed by homer/netbox/common-sense-when-you-see-the-diff
[17:56:00] ok
[17:56:11] so back to the data
[17:56:18] anything needed in the homer's yaml?
[17:56:25] or it's all defined in netbox?
[17:57:41] volans: net📦
[17:57:59] lol
[17:58:31] ☁️📦
[17:59:07] does cdanis have an irc highlight for all emojis? :)
[17:59:14] 🤷
[18:00:00] rotfl
[18:00:22] cdanis: dunno what's that supposed to be, but I see half of a cloud and a box
[18:00:32] volans: get a better terminal :)
[18:01:12] locally or on the bouncer? :D
[18:01:23] yes
[18:01:51] locally!
the terminal doing the rendering to X doesn't know the proper width of emoji-styled characters, and isn't leaving enough space, although your fontconfig is asking them to be rendered as emojis
[18:03:00] anyway I use ☁️ as a proxy for networks, as there is U+1F5A7 THREE NETWORKED COMPUTERS 🖧 but it isn't a 'proper' emoji that has been Recommended for General Interchange, and is only drawn emoji-like on LG platforms
[18:03:59] three networked computers renders on my irc client
[18:04:03] the other doesn't
[18:04:36] the former was ☁ U+2601 CLOUD
[18:06:19] sorry, originally it was ☁️ U+2601 U+FE0F: CLOUD plus VARIATION SELECTOR-16
[18:08:59] interesting
[18:09:02] the cloud does render
[18:31:49] XioNoX: reserved addresses had been used in real life
[18:31:59] is that ok and we can just update netbox from reality?
[18:32:19] volans: what?
[18:32:52] XioNoX: for example https://netbox.wikimedia.org/ipam/ip-addresses/2517/
[18:32:57] 132.154.80.208.in-addr.arpa. 3600 IN PTR cloudcontrol1004.wikimedia.org.
[18:35:16] XioNoX: the full list should be https://netbox-next.wikimedia.org/ipam/ip-addresses/?status=reserved&assigned_to_interface=True
[18:35:30] that's netbox-next
[18:35:35] where we're trying the import
[18:35:43] and those came up as inconsistencies
[18:37:04] if it's "ok", we just adapt the script to cleanup the description and set them active
[18:37:59] volans: yeah, we're not going to ask people to re-imagine their servers :)
[18:38:04] image*
[18:39:31] just asking because some of them are public IPs and maybe were reserved for a reason
[18:39:47] volans: also the ones I set as reserved for cloudsw are reserved because I'm working on them
[18:40:00] ack
[18:40:06] but they can be active if it's confusing
[18:41:06] volans: I checked them all, it's either ^ or it's part of the ones we mass reserved *just in case*
[18:42:05] yes, but some of the ones we reserved were used :D
[18:42:12] yours are not a problem
[18:42:21] the ones we catch are the ones coming from puppetdb
[18:42:37] assigned to interfaces
[18:44:57] volans: yeah but it's fine
[18:48:29] ack
[18:48:49] we were going in that direction but wanted to confirm, thanks a ot
[18:48:52] *a lot
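A minimal sketch of the cleanup agreed on above: addresses that are marked reserved but turn out to be assigned to an interface get their description cleared and are set active. It is not the actual import script. Plain dictionaries stand in for Netbox objects, and the field names (`address`, `status`, `description`, `assigned_to_interface`), the function name `activate_used_reserved`, and the sample addresses are all hypothetical.

```python
from typing import Dict, List


def activate_used_reserved(records: List[Dict]) -> List[str]:
    """Set reserved-but-in-use addresses to active, in place.

    Returns the addresses that were changed, so the result can be
    reviewed before anything is written back.
    """
    changed = []
    for rec in records:
        # Only touch addresses that are reserved AND bound to an interface;
        # reserved addresses that are not in use are left alone.
        if rec["status"] == "reserved" and rec["assigned_to_interface"]:
            rec["status"] = "active"
            rec["description"] = ""  # drop the stale reservation note
            changed.append(rec["address"])
    return changed


if __name__ == "__main__":
    # Made-up sample data using documentation addresses (hypothetical).
    sample = [
        {"address": "192.0.2.10/32", "status": "reserved",
         "description": "reserved just in case", "assigned_to_interface": True},
        {"address": "192.0.2.11/32", "status": "reserved",
         "description": "work in progress", "assigned_to_interface": False},
    ]
    print(activate_used_reserved(sample))
```

Returning the changed addresses rather than writing anything directly keeps the step reviewable, which matches the "wanted to confirm" approach taken in the discussion.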