[07:32:44] klausman, brouberol, btullis, elukey: I've switched the k8s clusters to use discovery2026 and will start kicking off certificate refreshes on codfw clusters in 10min if there are no objections
[07:33:25] I will stagger the refreshes across 30min per cluster, so we don't have "big bang" expiration time windows
[07:33:54] Super, thanks jayme.
[07:46:25] ack!
[07:57:04] thanks for the heads-up!
[09:44:29] I'm gonna roll https://gerrit.wikimedia.org/r/1277471 which replaces /usr/local/bin/charlie with /usr/bin/charlie. Shouldn't be any problems, just a heads-up.
[09:51:14] We've got an SSL error from opensearch in codfw here: https://airflow-platform-eng.wikimedia.org/dags/spur_download_and_index_anonymous_residential_codfw/grid?dag_run_id=scheduled__2026-04-27T16%3A00%3A00%2B00%3A00&task_id=download_and_index_feed_codfw
[09:52:01] I'm not 100% sure that it's related to this work, but it seems possible. I can roll-restart the cluster, which should pick up any new certificates.
[09:59:23] The istio ingressgateway had the new certificate and was fine, but it also re-encrypts the request and sends it to the upstream opensearch cluster. This might be where it was getting broken. This gave a 503: `curl -I https://opensearch-ipoid.svc.codfw.wmnet:30443/_bulk`
[10:04:29] btullis: from what I recall that's a self-signed cert the opensearch operator issues, right?
[10:05:00] ah, it's not
[10:05:06] Not self-signed. It uses cert-manager.
[10:05:16] but 503 means everything is fine :)
[10:05:18] cert-wise
[10:05:28] Roll-restarting the cluster didn't help.
[10:07:39] Well, it's doing double TLS: it decrypts at the ingressgateway, then re-encrypts to send on to the opensearch cluster itself. It could be this second TLS leg that is having a problem, and then I think istio would be generating the 503 to send back to the client. I'll keep investigating.
[10:08:39] btullis: the opensearch clusters have a reference to the discovery issuer in their chart IIRC
[10:08:45] maybe... but the error from the airflow logs seems different
[10:09:22] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/8f85b131df743fad97527c318a1816264931676d/charts/opensearch-cluster/templates/certificate_wmf.yaml#74
[10:09:26] brouberol: OK, yes. I see this:
[10:09:29] SSLError(HTTPSConnectionPool(host='opensearch-ipoid.svc.codfw.wmnet', port=30443): Max retries exceeded with url: /_bulk (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2393)'))))
[10:09:30] https://www.irccloud.com/pastebin/gINcgkVD/
[10:09:50] hmm, I've never seen this before :/
[10:09:56] brouberol: that's fine. The "issuer" in that regard did not change
[10:10:18] > This error typically occurs during the SSL/TLS handshake when the client fails to authenticate the server, causing the connection to drop unexpectedly (resulting in an "End of File" or EOF).
[10:10:18] https://www.w3tutorials.net/blog/python-sockets-ssl-eof-occurred-in-violation-of-protocol/
[10:10:23] jayme: ack
[10:12:03] I have to step away for a couple of hours, sorry :/
[10:12:27] No worries.
[10:19:29] Oh, I think I might know what it is. It might be unrelated to this cert-manager refresh. Checking something.
[10:21:03] Yes, it's my fault. This gives a 503 too: `curl -I https://opensearch-ipoid.svc.eqiad.wmnet:30443/_bulk` but this doesn't: `curl -I https://opensearch-ipoid.svc.eqiad.wmnet:30443/`
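The `/` versus `/_bulk` comparison above is what separates a certificate problem from a routing problem: if the handshake completes and one path answers while the other returns 503, the certificate served at the NodePort is fine and the breakage is specific to that route. A minimal sketch of the same check, assuming the host and port from the curl commands and a machine that trusts the internal CA through its system trust store; the probe is illustrative, not the script used here:

```python
import http.client
import ssl

HOST, PORT = "opensearch-ipoid.svc.eqiad.wmnet", 30443  # from the curl commands above

ctx = ssl.create_default_context()  # verification against the system trust store
for path in ("/", "/_bulk"):
    conn = http.client.HTTPSConnection(HOST, PORT, context=ctx, timeout=10)
    try:
        conn.request("HEAD", path)  # mirrors `curl -I`
        print(f"{path}: HTTP {conn.getresponse().status}")
    except ssl.SSLError as exc:
        # A failure here would point back at the TLS setup, not the route.
        print(f"{path}: TLS error: {exc}")
    finally:
        conn.close()
```

A clean handshake plus a per-path 503 is consistent with a chart or routing mistake rather than an expired or mis-issued certificate, which matches the conclusion that follows.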
[10:21:26] Therefore, it's a mistake that I made in this patch: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1277486
[10:41:12] I think that this might fix it: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1278389
[10:41:30] Sorry for wasting your time, jayme
[10:41:37] np
[10:41:56] if all you have is openssl, everything looks like a certificate
[10:57:54] ^ drive-by comment, but I suggest we bash this
[13:42:33] jayme: nice work!! <3
[13:43:07] I really hope it is :D
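On the certificate-refresh side of the thread, the same kind of Python probe can confirm which certificate the client-facing endpoint is actually serving after a refresh, without reaching for openssl. A minimal sketch, assuming the codfw host and port quoted earlier and that the internal CA is in the system trust store; the parsing and output format are illustrative:

```python
import socket
import ssl
from datetime import datetime, timezone

HOST, PORT = "opensearch-ipoid.svc.codfw.wmnet", 30443  # host/port from the log

ctx = ssl.create_default_context()  # system trust store; verification stays on
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()  # parsed leaf certificate as a dict

issuer = dict(pair for rdn in cert["issuer"] for pair in rdn)
expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc)
days_left = (expires - datetime.now(timezone.utc)).days
print(f"issuer CN: {issuer.get('commonName')}")
print(f"notAfter:  {cert['notAfter']} ({days_left} days left)")
```

This only shows the leaf certificate presented by the istio ingressgateway; the re-encrypted leg to the upstream opensearch pods has to be checked from inside the cluster.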