[01:30:04] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[05:30:04] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[06:18:20] netops, fundraising-tech-ops, Infrastructure-Foundations: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10961884 (ayounsi) @Jgreen please sync up with us on IRC for the Netops part.
[07:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[08:58:36] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10962185 (Arnoldokoth)
[09:30:04] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:54:49] FIRING: [3x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[11:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[12:04:49] FIRING: [4x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:09:49] FIRING: [5x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:14:49] FIRING: [6x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:19:49] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:30:25] FIRING: SystemdUnitFailed: user@499.service on aux-k8s-worker1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:34:47] topranks, XioNoX: in https://phabricator.wikimedia.org/T372909 we created a separate netflow1003 VM running on the routed ganeti cluster, but currently netflow1002 is still the active one for eqiad. given that routed ganeti is stable, I would fail this over to 1003 and then decom netflow1002?
[12:40:04] moritzm: I can have a look
[12:40:25] FIRING: [2x] SystemdUnitFailed: user@499.service on aux-k8s-worker1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:45:25] FIRING: [3x] SystemdUnitFailed: user@499.service on aux-k8s-worker1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:50:25] FIRING: [3x] SystemdUnitFailed: user@499.service on aux-k8s-worker1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:55:25] FIRING: [3x] SystemdUnitFailed: user@499.service on aux-k8s-worker1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:00:25] RESOLVED: [3x] SystemdUnitFailed: user@499.service on aux-k8s-worker1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:08:38] netops, fundraising-tech-ops, Infrastructure-Foundations: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10963400 (ayounsi)
[13:13:33] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10963414 (Jgreen)
[13:16:08] moritzm: ah no, those two are in eqiad, it's to spread the gNMI load on two instances
[13:16:20] so we need the two
[13:19:08] but I don't see netflow1003 in homer? only 1002
[13:20:07] moritzm: yeah gNMI is pull based, it's netflow100X that connects to the devices
[13:21:24] ok
[13:22:56] if it helps make the setup simpler we could also simply double the number of vCPUs and RAM, but having two also works for me. I had only assumed that was a leftover of the routed ganeti pilot VMs
[13:23:39] moritzm: doubling it also gives some redundancy, and spreads the load on the prometheus scrapers, so it's better all around
[13:25:02] ok!
[13:35:49] XioNoX: for my info what’s the setup then? we have the two VMs but we only send flows to one?
[13:36:01] topranks: yep
[13:36:43] cool thanks
[13:52:25] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10963580 (Jgreen) Open→Resolved a:Jgreen
[13:53:03] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10963585 (Jgreen) p:Triage→Medium
[13:56:31] netops, fundraising-tech-ops, Infrastructure-Foundations, Patch-For-Review: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10963607 (Jgreen)
[13:57:34] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10963611 (Jgreen)
[15:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[15:42:02] netbox, Infrastructure-Foundations: Improve Netbox "locations" use - https://phabricator.wikimedia.org/T333948#10964356 (cmooney) Hadn't re-read this one in a while. +1 on using the 'locations' to represent things that are in the same network. i.e. remove cloud racks or fundraising racks from the 'rows...
[15:44:00] we have another issue with idrac10 I am afraid :( https://phabricator.wikimedia.org/T393044#10964305
[15:55:22] this one has a boss card though..
[16:01:27] ugh, does dell publish a changelog that we should be reviewing?
[16:19:49] FIRING: [7x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[16:22:59] jhathaway: I think that they just publish the new Redfish reference manual online, but it is not super easy to navigate
[16:23:22] and it involves some guesswork to figure out the new undocumented options :D
[17:32:51] that is unfortunate, you would hope they would specify breaking changes
[18:17:18] Mail, Infrastructure-Foundations, Trust-and-Safety: Emails from wikimediats.zendesk.com fails DMARC policy - https://phabricator.wikimedia.org/T378285#10964962 (jhathaway) a:jhathaway
[19:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:20:04] FIRING: [7x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[23:33:32] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts