[07:27:32] https://geekpython.in/gil-become-optional-in-python Python 3.13 will make the GIL optional
[08:12:05] can I do something to help you, kamila_ ?
[08:12:48] fabfur: I'm not sure what's happening
[08:13:25] me neither, reading some docs
[08:13:35] (I'm not particularly awake yet '^^)
[08:14:03] (ah... can totally understand you)
[08:14:10] did you do anything?
[08:15:05] no, it resolved itself
[08:16:16] probably other alerts are related, let me have a look at some dashboards
[08:17:06] yes, php workers busy is def related, wondering if it was caused by a traffic spike or something
[08:18:35] I see just a small spike of traffic in magru but I need to investigate better
[08:20:18] I don't even think it's related
[12:18:51] Hey folks, I noticed https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=lists
[12:19:08] but checking on list1004's 443 port I see
[12:19:17] Not Before: Jul 14 00:50:01 2024 GMT
[12:19:17] Not After : Oct 12 00:50:00 2024 GMT
[12:19:57] it is probably an old check, but better to double check
[12:23:47] elukey: what's the problem?
[12:24:05] maybe I need more coffee but that looks good?
[12:24:21] vgutierrez: it does, but icinga/alert-manager don't think so :D
[12:24:27] oh the alert is complaining
[12:24:44] I think that's smtp not https that's being checked
[12:25:03] ahhhh
[12:30:55] I've lost my openssl mojo, openssl s_client -connect lists1004:25 -starttls smtp doesn't work as expected :|
[12:33:32] tried this on the node
[12:33:32] elukey@lists1004:~$ /usr/lib/nagios/plugins/check_smtp -F $(hostname -f) -H localhost --starttls -D 20,15
[12:33:35] CRITICAL - Certificate 'lists.wikimedia.org' expires in 12 hours (Tue 13 Aug 2024 12:55:14 AM GMT +0000)
[12:33:37] elukey: quick check on lists puppetization tells me that acme_chief::cert is notifying apache2 but not exim?
[12:34:01] so a manual reload of exim4 should fix it assuming that would trigger a TLS material reload
[12:34:03] vgutierrez: this is a good point, so you suspect that reloading exim should do the trick
[12:34:15] okok yes we are on the same page
[12:34:55] vgutierrez@lists1004:~$ sudo -i openssl x509 -dates -noout -in /etc/acmecerts/lists/live/rsa-2048.crt
[12:34:55] notBefore=Jul 14 00:50:01 2024 GMT
[12:34:56] notAfter=Oct 12 00:50:00 2024 GMT
[12:34:56] exim4 was restarted a month and 10 days ago
[12:35:13] okok it matches with the dates
[12:35:16] modules/profile/templates/exim/exim4.conf.mailman.erb:tls_certificate = /etc/acmecerts/lists/live/rsa-2048.chained.crt
[12:35:20] ok if I restart exim4?
[12:35:28] +1
[12:36:25] OK - Certificate 'lists.wikimedia.org' will expire on Sat 12 Oct 2024 12:50:00 AM GMT +0000.
[12:36:28] bingo :)
[12:36:51] today vgutierrez is on fire
[12:37:20] restarted exim4 also on list2001
[12:37:47] elukey: so modules/profile/manifests/lists.pp:152, I think that adding "puppet_rsc => Service['exim4']" should fix it
[12:38:58] vgutierrez: makes sense, what rsc stands for?
[12:39:04] resource
[12:39:36] from acme_chief::cert source code
[12:39:43] https://www.irccloud.com/pastebin/oVZuU1he/
[12:40:04] assuming that notifying Service['exim4'] is enough of course
[12:40:06] yeah I checked it but it wasn't super clear.. why don't we use puppet_svc ? So restart both exim4 and apache2
[12:40:19] elukey: that wouldn't work?
[12:40:37] puppet_svc is Optional[String] so that would require refactoring acme_chief::cert AFAIK
[12:40:56] unless I'm missing something pretty obvious about puppet handling that Service[$puppet_svc]
[12:41:59] vgutierrez: yes yes got it, but it doesn't seem a huge patch, it is just allowing a list of svcs instead of a single name. Not sure why we have both _svc and _rsc, and maybe not a single list of resources to act on
[12:42:18] just trying to understand the code, not opposing to your patch
[12:42:57] anyway, for the moment it seems the quickest
[12:43:29] puppet_rsc is still needed.. we have some places where we need to notify an Exec[] rather than a Service[]
[12:44:14] and refactoring puppet_svc on acme_chief::cert would require some changes across the repo...
[12:44:17] vgutierrez@carrot:~/wikimedia.org/operations/puppet$ git grep "acme_chief::cert {" |wc -l
[12:44:17] 33
[12:44:22] but yes, totally doable
[12:45:51] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1062003 running PCC :D
[12:54:47] elukey: looks good assuming it's ok to reload/restart exim4 automatically
[13:01:56] vgutierrez: I added Jesse to the code review, so we can get more +1s.. in my opinion leaving things as they are is very dangerous
[13:04:04] yep, i'm just not familiar enough with the service
[13:04:44] same, we do have time though before the next expire
[13:04:55] no rush, we can do it anytime
[13:05:00] thanks for the help!
[13:13:07] (merged, Jesse already reviewed)
[13:13:09] -----
[13:13:37] for the on-callers - I am going to move puppet private's main usage from puppetmaster1001 to puppetserver1001
[13:14:01] I am not copying/moving anything, only merging two changes:
[13:14:05] 1) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052261
[13:14:11] 2) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1061991
[13:14:28] basically 2) moves a timer that commits to /srv/private on puppetmaster1001 to puppetserver1001
[13:14:42] and 1) just disarm pre/post-commit hooks on puppetmaster1001
[13:14:47] easy to revert if needed
[13:15:36] please refrain from making puppet private changes for the next 15/20 mins
[13:16:43] on puppetmaster1001 there is an unstaged requestctl/request-haproxy-actions/, anybody working on it?
[13:25:57] jelto, godog, btullis - I see you logged in from `last`, anything that you are working on?
[13:26:07] otherwise I'll unstage the files
[13:26:25] elukey: not on my end, I've logged out
[13:27:17] I closed my SSH connection, I'm not working on anything in puppet at the moment
[13:27:27] thank youu
[13:27:50] I think for Ben is the same, probably some leftovers for tests
[13:28:47] I moved those staged files under my home dir
[13:41:15] aaand we are done
[13:41:26] going to do some sanity tests
[13:42:10] jhathaway: Can you give https://gerrit.wikimedia.org/r/c/operations/puppet/+/1060919 a glance? It fixes a current breakage on cloud-vps but I'm not sure what it'll do in prod
[13:42:34] elukey: ^
[13:42:37] andrewbogott: yup will do
[13:42:45] thx
[13:47:46] all done folks!
[13:48:10] from now on please use requestctl and commit to /srv/git/private on puppetserver1001
[13:48:17] for any doubt/question/etc.. ping me :)
[13:50:51] elukey: should we change motd on puppetmaster1001 ?
[13:51:03] and/or ping ops-private@? :)
[13:51:16] or disable puppet-merge there?
[13:52:13] wait wait too many things :)
[13:53:01] re: motd, yes we could, we never really had one, plus now the pre-commit hook on puppetmaster1001 is disabled so it shouldn't be a problem (somebody will forget even with the MOTD for sure :D)
[13:53:37] re: ops-private@, I sent an email to all SREs, should be enough, lemme know if there are more people on ops-private@ that I am not aware of
[13:54:18] re: disable puppet-merge, still needed since it is only private for the moment (I know it is annoying)
[14:26:52] elukey: Sorry, I missed the ping. I'm also clear and hadn't made any changes to private puppet today.
[14:27:00] <3
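
(editor's note) The certificate alert discussed between 12:33 and 12:36 came down to comparing a certificate's notAfter date against the warning/critical day thresholds passed as `-D 20,15`. The following is a minimal Python sketch of that comparison, not the actual check_smtp implementation. The function name `classify_expiry` is hypothetical, and the strict "less than" comparison against the thresholds is an assumption. The date strings in the usage lines are the ones quoted in the log.

```python
import ssl
from datetime import datetime, timezone


def classify_expiry(not_after: str, now: datetime,
                    warn_days: int = 20, crit_days: int = 15) -> str:
    """Classify a certificate by the time left before its notAfter date.

    not_after uses the OpenSSL text form, e.g. "Oct 12 00:50:00 2024 GMT".
    The thresholds mirror "-D 20,15" (warning, critical) from the log; the
    strict "<" comparison is an assumption, not taken from check_smtp.
    """
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" form.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    days_left = (expires - now).total_seconds() / 86400
    if days_left < crit_days:
        return "CRITICAL"
    if days_left < warn_days:
        return "WARNING"
    return "OK"


# Roughly the moment the check was run by hand in the log.
checked_at = datetime(2024, 8, 12, 12, 33, tzinfo=timezone.utc)

# Certificate still held in memory by exim4 before the restart.
print(classify_expiry("Aug 13 00:55:14 2024 GMT", checked_at))

# Renewed certificate on disk, picked up after the restart.
print(classify_expiry("Oct 12 00:50:00 2024 GMT", checked_at))
```

The sketch only covers the date arithmetic. Fetching the certificate that the SMTP daemon actually serves (as opposed to the file on disk) is what exposed the stale certificate in the log, and that step is left out here.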