[05:11:11] s5 master switched
[08:05:42] elukey: the rune in T424674 "openssl s_client ip:port | grep openssl x509 -text -noout" is I think not right?
[08:05:43] T424674: Migrate Data Persistence Envoy TLS proxy services to the 2026 discovery intermediate - https://phabricator.wikimedia.org/T424674
[08:08:44] openssl s_client apus-fe2005.codfw.wmnet:443 </dev/null | openssl x509 -text -noout | grep CN
[08:08:48] ^-- maybe better?
[08:12:46] Emperor: not sure where it was written, but if I suggested to |grep openssl I was clearly a little but tired :D Yes definitely, | openssl x509 then grep
[08:12:54] *bit
[08:14:19] +1ed your change
[08:14:34] right, let's try it on one apus frontend...
[08:14:42] I am also rolling out the new intermediate to restbase*, I noticed Eric's patch and I just started it
[08:17:29] oh fsck it, just got p.aged, will put this down and restart puppet
[08:20:34] Emperor: I can take care of it if you want
[08:21:01] I've restarted puppet on apus/codfw, the change isn't +2d yet, I'll come back to it (hopefully)
[09:09:18] I won't be around in a few hours, in case you need me for any reason (shouldn't take long, but one never knows)
[09:49:38] federico3: what's the status of the orchestrator issue?
[09:50:24] no updates for now, I don't know what's causing it
[09:50:42] did you see my last comment on the task?
[09:53:21] yes
[09:53:32] and did you issue it?
[09:55:33] no, I'm trying to understand how it works and what the root cause might be, as I'm not familiar with orchestrator's internals and I don't want to break it
[09:56:21] that command is pretty safe to run, but ok I understand
[09:58:42] federico3: I just issued it and it fixed the problem
[09:59:20] how long ago? I've been looking at SELECT * FROM orchestrator.hostname_resolve; and it only contained full FQDNs
[10:37:35] apus tls intermediate rollover done, thanos CR gone in
[13:22:24] I'm seeing a depool hanging for a long time while draining connections on db1236: https://phabricator.wikimedia.org/P91927
[13:23:19] federico3: wikiadmin is used for scripts most of the time, so I am sure if you resolve those IPs it will come from mwmaint, deploy or some other admin hosts
[13:23:29] so probably something not re-loading the config, I'd guess
[13:23:30] Amir1: ^
[13:23:59] yeah, how long is it taking?
[13:24:32] some might take five-ten minutes
[13:24:40] but if it's taking longer, then I can check
[13:34:19] it took 30+ mins but has now cleared
[14:05:10] FYI, cassandra-dev2001 has been evicted from puppetdb since puppet was disabled for too long
[14:06:05] urandom: ^--
[14:23:18] Hi folks, could I get a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1279372 please, to remove 2 drained nodes and make the necessary changes so they can be re-imaged to new-style storage & VLAN?
[14:47:38] !?
[14:48:49] moritzm: you mean the agent was disabled?
[14:48:59] moritzm: what's required to get it back?
[14:59:55] just checked over the serial console: it was disabled by you with "testing k8s service instance startups when a cassandra node is down"
[15:01:07] you can reactivate it with the sre.puppet.renew-cert cookbook
[15:04:02] just an `enable-puppet` should be enough, I don't think it needs a new cert
[15:06:30] I remember disabling it (now), I can't believe I failed to re-enable it (I could have done so minutes afterward).
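
A minimal sketch of the re-enable step discussed above, assuming the standard WMF enable-puppet/run-puppet-agent wrappers; the reason string is the disable message quoted in the log, and the wrapper is assumed to require the same reason to re-enable:

    # re-enable the agent using the original disable reason, then trigger a run
    sudo enable-puppet "testing k8s service instance startups when a cassandra node is down"
    sudo run-puppet-agent
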
[15:17:23] Emperor: {{done}}
[15:22:24] thanks :)
[15:28:36] moritzm: is that cookbook enough for the new key to propagate to https://config-master.wikimedia.org/known_hosts ?
[15:28:53] maybe that takes a bit of time... ?
[15:41:49] it'll show up once puppet has run on config-master1001
[23:19:03] FIRING: SystemdUnitFailed: swift_ring_manager.service on ms-fe2009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
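
For the SystemdUnitFailed alert above, a generic first-response sketch using standard systemd tooling (these commands are assumptions, not taken from the log), run on the affected host ms-fe2009:

    # show the unit's failed state and the most recent log lines
    systemctl status swift_ring_manager.service
    journalctl -u swift_ring_manager.service -n 50
    # once the underlying issue is fixed, a restart clears the failed state
    sudo systemctl restart swift_ring_manager.service
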