[11:11:12] tappof: your patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182148 breaks Puppet on the hosts still on Puppet 5: https://puppetboard.wikimedia.org/nodes?status=failed
[11:11:22] max() isn't supported yet in Puppet 5
[11:12:05] ok, I'm going to revert hnowlan moritzm mszabo
[11:12:30] thx moritzm
[11:14:55] ack
[11:16:02] thanks. hopefully the last Puppet 5 hosts will be gone in 3-4 weeks; if it's needed earlier, we could re-submit the patch and make it conditional on the Puppet version
[11:20:11] No problem moritzm, I can swap the max function for an if for the required usage.
[11:27:30] ok, there's a "puppetversion" fact, so if you want to make something conditional on Puppet 7, that would be one option
[13:26:19] hello oncallers
[13:26:34] we are finally ready to repool codfw for kartotherian
[13:26:39] (so maps.wikimedia.org)
[13:26:51] codfw runs on a new stack, finally on Bookworm
[13:26:58] {◕ ◡ ◕}
[13:29:26] \o/
[13:42:00] oncallers, FYI - around 14:30 UTC, m.oritzm and I will be picking up where we left off yesterday, applying the same etcd maintenance in eqiad as we did in codfw yesterday for T352245.
[13:42:00] this will be the same process as yesterday, though probably a bit faster now that we've done it once :)
[13:42:01] T352245: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245
[13:42:21] claime: o/ I just saw this sitting in my gerrit queue! QQ: can I just merge, or do we need to manually delete the canary release from staging?
[13:42:21] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1119206
[13:43:56] I think merge and then helmfile destroy with the proper selectors
[13:44:09] right... okay
[13:45:55] Hmm, maybe destroy first actually
[13:46:14] Not sure if helmfile can destroy a release it doesn't know about ottomata
[13:47:57] k found https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#%60helmfile_destroy%60_without_root_permissions
[13:50:33] yeah, if it doesn't work ping me and I'll do it
[13:50:35] (but it should)
[13:51:52] anyone seeing puppet errors with a gpg key not available from upstream bookworm repos? `GPG error: http://mirrors.wikimedia.org/osbpo bookworm-dalmatian-backports-nochange InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 56056AB2FEE4EECB`
[13:52:49] hmm, I can find that key on keys.openpgp.org
[13:52:57] https://www.irccloud.com/pastebin/Wcpa83V5/
[13:53:34] the keys are installed via Puppet; modules/aptrepo/files/updates-keys
[13:53:50] 👀
[13:54:31] maybe the key expired, or it changed, or maybe we didn't install it via Puppet in the first place?
[13:54:56] looking
[13:56:02] hmm, that's not what is making puppet fail, though, but it would be good to clean up, yep
[13:58:27] claime: help me construct the correct command and I'll update the wikitech page
[13:58:33] i'm trying
[13:58:34] helmfile -i destroy -l release=canary -e staging
[13:58:38] err: no releases found that matches specified selector(release=canary) and environment(staging), in any helmfile
[14:01:11] helmfile -e staging --selector name=canary << that's the right selector
[14:01:14] not release, name
[14:01:16] those repos are not installed anymore, but as the VMs are old they got them installed (and were never absented), so they still have them even though they don't work; cumin to the rescue
[14:01:19] oh, name, okay
[14:03:40] thanks claime, it worked!
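Putting the pieces of that exchange together, the working destroy invocation would look roughly like the sketch below; the deployment-charts path and the optional -i (interactive) flag are assumptions not spelled out in the log, only the environment and the name= selector are confirmed above.

```
# Sketch, assuming the usual deployment-charts layout on the deploy host.
cd /srv/deployment-charts/helmfile.d/services/<service>

# Select the release by name (not "release="), in the staging environment.
helmfile -e staging --selector name=canary destroy
```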
[14:03:40] updated docs: https://wikitech.wikimedia.org/w/index.php?title=Kubernetes%2FDeployments&diff=2336367&oldid=2309913
[14:37:24] herron: claime: arnaudb: FYI, moritzm and I will be starting the etcd maintenance in eqiad shortly
[14:37:41] swfrench-wmf: ack, good luck!
[14:38:04] thank you :)
[14:39:24] ack, thanks for the notification
[14:45:14] swfrench-wmf: dogspeed
[15:26:09] update: the etcd maintenance in eqiad is largely complete. all that remains are the rolling restarts of all eqiad-associated confds, ETA 10-20 minutes for those to finish.
[15:26:20] very nice work :)
[15:26:33] many thanks to m.oritzm and v.gutierrez :)
[15:26:35] thanks!
[20:03:20] claime: RE https://phabricator.wikimedia.org/T313900#11119488 - it looks like mw-script has a memory limit that is lower than what MediaWiki sets as its limit for a single web request (wmgMemoryLimit 1400MiB vs helmfile.d/services/mw-script/values.yaml 1200Mi)
[20:06:24] A maintenance script is generally likely to consume more memory than a web request. In core we default to 3x the web memory for a CLI process (50M vs 150M). With the web limit having risen so far for Parsoid, we don't need the same ratio in prod, of course. Historically, mwscript ran without a limit, I think (as per the php-cli default)? I'm guessing we haven't seen (other) OOMs, so maybe it's fine, but the mismatch seemed sus.
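Coming back to the Puppet 5 breakage from the start of the log: a minimal sketch of the two workarounds discussed there, either dropping max() in favour of a plain comparison, or gating the newer syntax on the puppetversion fact. The variable names are made up for illustration and this is not the content of the reverted patch.

```puppet
# Sketch only -- $a and $b are hypothetical numeric values.

# Option 1: avoid max() entirely with a selector, which also works on Puppet 5.
$larger = ($a >= $b) ? {
  true    => $a,
  default => $b,
}

# Option 2: branch on the agent's Puppet version via the puppetversion fact,
# using max() only on agents already running Puppet 7.
if versioncmp($facts['puppetversion'], '7.0.0') >= 0 {
  $larger_alt = max($a, $b)
} else {
  $larger_alt = ($a >= $b) ? { true => $a, default => $b }
}
```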