[00:29:26] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435#10381923 (Jclark-ctr) Open→Resolved
[08:49:45] I've just disabled Puppet on idp2004, because I honestly have no idea if rolling out the jmx agent will restart Tomcat
[08:53:30] For future reference: Yes, messing with /etc/default/tomcat10 will restart Tomcat
[09:05:06] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547#10382536 (cmooney) >>! In T344547#9301201, @cmooney wrote: > One other observation is that the MED setting does not optimize the outbound path where we are us...
[10:48:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:51:54] netops, Infrastructure-Foundations, SRE: Export routes generated from ARP/ND in EVPN - https://phabricator.wikimedia.org/T329369#10382861 (cmooney) Huh so I've been looking at some of these old tasks while working on the Nokia testing. It's clear in the above the before / after are both the AFTE...
[10:53:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:50:25] FIRING: SystemdUnitFailed: user-runtime-dir@499.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:31:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on build2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:16:16] CAS-SSO, Infrastructure-Foundations, SRE: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180#10383403 (SLyngshede-WMF) Current working configuration: ` # WebAuthN cas.authn.mfa.web-authn.core.application-id=https://idp-test.wikimedia.org cas.authn.mfa.web-authn.core.r...
[15:19:32] moritzm & volans, per the https://phabricator.wikimedia.org/T381538 task, do we have any docs on best practices for doing package upgrades? I can't seem to find anything on wikitech. What is your typical workflow moritzm?
[15:22:01] see https://wikitech.wikimedia.org/wiki/Software_deployment
[15:22:37] once you've uploaded it to apt.wikimedia.org you can first deploy to "-s sretest"
[15:23:00] and then simply roll out to "-s bullseye" if all is working fine
[15:23:17] cloud vps will pick up the new version via unattended-upgrades overnight
[15:24:28] perfect thanks, I'm not sure why that didn't come up in my searches
[15:28:04] admittedly the title might be a little too generic :-)
[15:28:52] true :)
[16:15:25] FIRING: SystemdUnitFailed: user@499.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:34:26] moritzm: given this config `bullseye: 4.3.0-2~wmf11+1` any idea why debdeploy would report that sretest1001 is already upgraded?
[16:35:19] perhaps some artifact of me testing on the same host yesterday?
[16:35:50] checking
[16:36:09] where is your debdeploy spec file, on cumin1001?
[16:37:07] yes, cumin1002: /home/jhathaway/2024-12-05-facter.yaml
[16:37:44] s/yes/no/ :(
[16:38:04] there is no cumin1001 :D so yes was the right answer :D
[16:38:06] checking
[16:38:09] yeah :-)
[16:38:11] phew
[16:38:42] ah, so this is a bit of a special corner case you're hitting
[16:38:59] debdeploy attempts to upgrade all existing binary packages built from a source package
[16:39:57] in this case the old package has facter and libfacter3.14.12 installed, but the new one only ships facter and thus debdeploy can't upgrade libfacter3.14.12
[16:40:04] nod, makes sense
[16:40:15] this is intentional to prevent broken updates
[16:40:30] if e.g. in Debian a package failed to build on an arch
[16:41:19] got it
[16:41:23] so in this case I'd actually upgrade via cumin and just run "apt-get install facter"
[16:42:19] standard_packages.pp also has some general cleanup for old library packages (since these will also be found in cloud vps), we can add libfacter3.14.12 there when the update is through
[16:42:41] ok, wasn't sure if some of those corner case prompts on creating the spec would help, but looks like no
[16:43:41] I have a long-standing TODO to also handle such transitions via a spec file, but it never made it to the top :-)
[16:44:30] the curse of everyone's TODO list :)
[16:46:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on build2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[20:15:25] FIRING: SystemdUnitFailed: user@499.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
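The debdeploy corner case explained in the log reduces to a set comparison. On one side are the binary packages installed from a source package. On the other are the binaries the new source version still builds. The Python below is a simplified, hypothetical model of that check. `UpgradePlan`, `plan_upgrade` and `safe_to_autodeploy` are illustrative names, not debdeploy's actual code. The model also does not try to reproduce how debdeploy reports the case; in the log it showed up as "already upgraded".

```python
"""Simplified, hypothetical model of the debdeploy binary-package corner case.

Names and the exact rule are illustrative assumptions, not debdeploy's code.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class UpgradePlan:
    # Installed binaries that the new source version still builds.
    upgradable: frozenset[str]
    # Installed binaries that the new source version no longer builds.
    orphaned: frozenset[str]


def plan_upgrade(installed: set[str], new_binaries: set[str]) -> UpgradePlan:
    """Split the installed binaries of one source package into two groups."""
    return UpgradePlan(
        upgradable=frozenset(installed & new_binaries),
        orphaned=frozenset(installed - new_binaries),
    )


def safe_to_autodeploy(plan: UpgradePlan) -> bool:
    """Conservative rule (assumed): only auto-upgrade when every installed
    binary has a counterpart in the new version, so that a binary which
    e.g. failed to build on an arch cannot cause a partial upgrade."""
    return not plan.orphaned


if __name__ == "__main__":
    # The facter case from the log: the old version left two binaries
    # installed, while the new version only builds "facter".
    plan = plan_upgrade({"facter", "libfacter3.14.12"}, {"facter"})
    print(sorted(plan.upgradable))   # ['facter']
    print(sorted(plan.orphaned))     # ['libfacter3.14.12']
    print(safe_to_autodeploy(plan))  # False
```

In this model, the manual route from the log means installing only `plan.upgradable`, here via `apt-get install facter` run through cumin. `plan.orphaned` is left to the old-library cleanup in standard_packages.pp.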