[08:59:23] SRE-tools, Data-Persistence-SRE, DBA, Spicerack: spicerack mysql_legacy: support fetch metrics for instance - https://phabricator.wikimedia.org/T376596 (ABran-WMF) NEW
[09:03:29] SRE-tools, Data-Persistence-SRE, DBA, Infrastructure-Foundations, and 2 others: spicerack mysql_legacy: support fetch metrics for instance - https://phabricator.wikimedia.org/T376596#10205940 (ABran-WMF) Open→In progress p:Triage→Medium a:ABran-WMF
[09:04:40] SRE-tools, Data-Persistence-SRE, DBA, Infrastructure-Foundations, and 2 others: spicerack mysql_legacy: support fetch metrics for instance - https://phabricator.wikimedia.org/T376596#10205946 (Volans) Spicerack has support for prometheus, why not getting the metrics directly from there?
[09:21:12] SRE-tools, Data-Persistence-SRE, DBA, Infrastructure-Foundations, and 2 others: spicerack mysql_legacy: support fetch metrics for instance - https://phabricator.wikimedia.org/T376596#10205989 (ABran-WMF) I was unaware of that feature, its way better indeed :)
[09:21:14] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Support listing pooled / active authdns hosts (rather than all) - https://phabricator.wikimedia.org/T375014#10205990 (Volans) @ssingh what do you think of the above draft patch proposal? If that works for you I'll complete it and...
[09:32:08] CAS-SSO, Infrastructure-Foundations: CAS update service script to sync memcache - https://phabricator.wikimedia.org/T273484#10206022 (SLyngshede-WMF) Invalid→Declined
[09:32:22] CAS-SSO, Infrastructure-Foundations: CAS update service script to sync memcache - https://phabricator.wikimedia.org/T273484#10206019 (SLyngshede-WMF) Open→Invalid We're no longer using memcached and will be switching to using the Redis cluster.
[11:05:43] SRE-tools, Infrastructure-Foundations: redfish: minimum version support - https://phabricator.wikimedia.org/T328593#10206342 (ayounsi) I re-ran John's script: ===== Hosts that require Manual upgrade (53): ====== 2.30.30.30 (1) puppetmaster1001 ====== 2.50.50.50 (38) an-launcher1002, an-presto1001, an-p...
[11:21:51] moritzm: what's the timeline to decom puppetmaster1001 ? To know if my last comment on https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1077497/comment/5b5e53c0_5f260675/ is realistic or not :)
[11:22:12] otherwise the other option is to upgrade iDRAC on it
[11:24:10] unrelated, someone knows if the CI error here is a CI bug or related to my recent PS (only added tests) https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1077661 ?
[11:25:07] I'm sending a patch
[11:25:52] XioNoX: https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1078383 should fix it
[11:26:58] XioNoX: depends on external factors we can't control, maybe end of the quarter, maybe end of next quarter
[11:27:28] buster updates are still missing by various teams and maps is unowned and on buster as well
[11:27:59] hopefully it'll be moved to k8s/bookworm in the next months, then we could handle the rest of the stack update ourselves
[11:28:53] OTOH, it's far simpler these days to take it out of production, so if you want DC ops to update it, we can make that work with not too much effort
[11:43:28] yeah and upgrading iDRAC doesn't affect the host itself iirc
[11:43:36] so I'll open a task for that
[11:43:43] volans: thx!
[13:40:35] SRE-tools, conftool, DBA, Infrastructure-Foundations, Spicerack: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#10206767 (Volans) Open→Resolved This has been released and tested. Resolving.
[14:45:39] Hey! I've replicated the alerts related to RIPE Atlas IPv4 anchors from Icinga to Prometheus/Alertmanager, and they seem to be working fine. I have a second patchset ready to clean up the Puppet repo and remove the old checks from Icinga.
[14:45:41] Since I believe these checks might be of interest to you, I was wondering if you could take a look and review the patchset: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1069117.
[14:45:43] Thank you
[15:07:00] sure, will have a look today or tomorrow!
[15:07:25] thanks XioNoX
[15:16:03] fyi, jobo and all, all the doc on UEFI boot is on that page https://wikitech.wikimedia.org/wiki/UEFI_Boot
[15:18:41] jhathaway (and sharing it here for wider audience): https://github.com/ipxe/ipxe/issues/1316#issuecomment-2396671777 "while UEFI does technically provide an HTTP client abstraction, it is such an abomination of design failures that we choose not to go anywhere near it"
[15:20:40] there are some pretty great commit messages along those lines in the iPXE source code as well
[15:20:40] rotfl
[15:28:10] volans: fyi, different jenkins -1 that seems unrelated to my latest PS https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1077661/7..8#message-0477d4314147c854a23f26df0089a7fa24d2bcad
[15:29:01] nevermind, just saw your message on -sre
[15:29:04] XioNoX: yes, see -sre, same of cookbooks :)
[15:29:15] pylint 3+, I'll send a patch
[15:52:21] netops, Infrastructure-Foundations, SRE: Upgrade Management routers to 23.4R2-S2 - https://phabricator.wikimedia.org/T369504#10207239 (Papaul)
[15:54:27] Thank you!
[16:19:29] puppetserver1002 and 1003 now have 128G RAM (using DIMMs from decommissioned servers, as a bridge until they are refreshed with new hardware directly with 128G)
[16:19:50] awesome
[16:19:51] the servers in codfw will also get decommed DIMMs shipped over from eqiad
[16:21:25] for 1001 and 2001 we'll probably just need to announce some 30 minutes windows where people can't puppet-merge and decom/install servers
[16:24:51] really great :)
[16:27:05] jhathaway, XioNoX - is it ok if I reset the sretest2001's BMC to make some BIOS related tests?
[16:27:32] yup go for it, I am done testing with it for the moment
[16:27:51] super
[16:32:32] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#10207364 (elukey) Some notes: ml-serve* Supermicro nodes are AMD CPU based, so some BIOS settings don't apply to...
[20:15:54] Mail, Infrastructure-Foundations, SRE: Lisa@wikipedia.org is receiving a large number of donor responses - https://phabricator.wikimedia.org/T375643#10208242 (nisrael) Hi all, I met with Lisa today and we retrieved an example of one of the responses she's received. I'm attaching an image of it and I...
[20:20:43] Mail, Infrastructure-Foundations, SRE: Lisa@wikipedia.org is receiving a large number of donor responses - https://phabricator.wikimedia.org/T375643#10208262 (jhathaway) >>! In T375643#10208242, @nisrael wrote: > I met with Lisa today and we retrieved an example of one of the responses she's received...