[02:20:05] FIRING: [4x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[06:20:05] FIRING: [4x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[07:04:10] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: InboundInterfaceErrors reports for fasw2-c1a-eqiad:9804 frmon1002 ge-0/0/11 - https://phabricator.wikimedia.org/T398442#10973970 (ayounsi) →Duplicate dup:T398315
[07:04:49] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: InboundInterfaceErrors reports for fasw2-c1a-eqiad:9804 frmon1002 ge-0/0/11 - https://phabricator.wikimedia.org/T398442#10973975 (ayounsi) Closing that task as duplicate of the automatically opened one. If I do th...
[07:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[09:01:22] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433#10974243 (cmooney) My vote would be to try a reboot first. We've 49 EVPN switches running 22.2R3.15, and we only have this issue on one of th...
[09:06:20] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: VC link from asw2-c4-eqiad to asw2-c7-eqiad flapping - https://phabricator.wikimedia.org/T398612#10974246 (cmooney) p:High→Medium This has been stable since the optics were replaced yesterday. I will review again next week a...
[09:20:35] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433#10974325 (ayounsi) Sounds good! @jijiki @Ladsgroup @Marostegui Would it be possible to sync up to depool those 3 hosts for a switch reboot?
[09:46:57] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433#10974426 (Ladsgroup) db1246 is a normal s1 replica. I can depool the db at any time (you can depool it yourself too if you want to). Let me kn...
[09:48:52] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433#10974433 (ayounsi) Sweet, what about 12:00UTC on Monday 7th ?
[09:49:52] netops, Infrastructure-Foundations, SRE: DNS resolution not working on Juniper virtual-chassis switches eqiad - https://phabricator.wikimedia.org/T398690 (cmooney) NEW p:Triage→Medium
[09:53:32] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433#10974481 (Ladsgroup) sounds good. I try to be around but if I couldn't for any reason, depool it yourself (https://wikitech.wikimedia.org/wiki...
[10:11:28] hello folks
[10:11:50] I am working on fixing the http_boot_once spicerack api for idrac 10, it seems not working
[10:11:53] I came up with https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1166371
[10:12:38] I wanted to test it manually via redfish, and I thought to use sre.hosts.dhcp to create the config
[10:12:49] but afaics it is very distant now from what reimage offers
[10:12:57] (no uefi, no mac, etc..)
[10:13:14] should we try to refactor the code to be more common between the two?
[10:13:30] ideally, we could call sre.hosts.dhcp from reimage
[10:13:35] does it make sense?
[10:20:05] FIRING: [4x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:34:32] (for the moment I am going to modify reimage directly and test with test-cookbook)
[10:34:50] FIRING: [6x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:44:50] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[11:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[12:33:18] it worked, but https://phabricator.wikimedia.org/T393044#10975047
[12:33:21] lol
[13:00:59] of course I don't find those TLS settings
[13:03:53] ok found it in the BIOS, good luck doing it in redfish
[13:03:57] * elukey cries in a corner
[13:06:20] all right they are all called "HttpDev1TlsMode" etc..
[13:06:51] so they need to be set for all NICs
[13:11:35] ok now reimage works :)
[13:12:06] correction - it is apparently booting in d-i
[13:12:58] to summarize - we do the following for UEFI boot (in reimage)
[13:12:59] dhcp_filename = f'http://{apt_ip}/efiboot/snponly.efi'
[13:13:24] there is an explanation why we need to add the ip as opposed to the fqdn, but we use plain http
[13:13:52] the new idrac 10 wants to use https://, unless we explicitly disable TLS via BIOS settings like HttpDev1TlsMode: None
[13:16:51] elukey: yeah that would make sense, also adapt sre.hosts.dhcp to work with VMs
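
A rough sketch of the two pieces summarized above, for anyone wanting to try this by hand: the plain-http UEFI boot filename and a set of BIOS attributes that turns TLS off for each HTTP boot device. Only `HttpDev1TlsMode`, the value `None`, and the `snponly.efi` URL come from the log. The names `HttpDev2TlsMode` onward, the default of four device slots, the helper function names, and the example IP are assumptions for illustration, and this is not the spicerack or reimage code. The `Attributes` wrapper follows the generic Redfish BIOS resource shape; actually sending it to the iDRAC is left out.

```python
import json


def uefi_http_boot_filename(apt_ip: str) -> str:
    """Return the DHCP filename quoted in the log: plain http, IP instead of FQDN."""
    return f"http://{apt_ip}/efiboot/snponly.efi"


def tls_mode_attributes(num_devices: int = 4, mode: str = "None") -> dict:
    """Build HttpDev<N>TlsMode attributes for every HTTP boot device slot.

    Assumption: slots are numbered from 1 and follow the HttpDev1TlsMode
    naming seen in the log; the real number of slots may differ per host.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return {f"HttpDev{n}TlsMode": mode for n in range(1, num_devices + 1)}


def bios_settings_payload(attributes: dict) -> dict:
    """Wrap attributes in the 'Attributes' object used by Redfish BIOS resources."""
    return {"Attributes": dict(attributes)}


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-range placeholder, not a real apt host.
    print(uefi_http_boot_filename("192.0.2.10"))
    print(json.dumps(bios_settings_payload(tls_mode_attributes()), indent=2))
```

Building the payload for all slots at once matches the "they need to be set for all NICs" remark; the exact slot count should be checked against the BIOS attribute list of the host before applying anything.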