[00:24:38] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[00:24:38] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[04:24:38] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:24:38] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[08:24:38] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[08:24:38] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[13:53:30] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: Link down between cr3-ulsfo and cr4-ulsfo - https://phabricator.wikimedia.org/T390731#10783352 (cmooney) p:High→Medium
[15:36:33] e-lukey and X-ioNoX, I added y'all to https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1140504 (including dns wipe-cache cookbook in the rename cookbook). No hurry but feel free to ping me if I can fix it up
[16:24:38] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[16:24:38] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[19:13:25] FIRING: [6x] SystemdUnitFailed: netbox_ganeti_codfw_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:18:25] FIRING: [11x] SystemdUnitFailed: netbox_ganeti_codfw_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:18:39] hi folks. was someone working on netbox?
[19:18:42] it seems to be down
[19:19:32] seems like uwsgi-netbox was OOM'ed
[19:19:33] [Thu May 1 19:01:44 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/uwsgi-netbox.service,task=uwsgi,pid=3013897,uid=33
[19:21:08] that brought it back up.
[19:22:04] I started the service again to resolve it and I am not sure yet what happened but it happened around the 19:03 mark
[19:22:07] https://grafana.wikimedia.org/goto/RmLRtvbNg?orgId=1
[19:23:19] nothing in the SAL as well to indicate what went wrong, nothing related that is.
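(Editor's note: the kernel `oom-kill:` line quoted at 19:19:33 is a comma-separated list of `key=value` fields, which is how the killed task and its systemd unit can be read off it. Below is a minimal sketch of pulling those fields apart. The function name `parse_oom_kill` is hypothetical, and splitting on commas is a simplifying assumption that fits this particular line; a field such as `mems_allowed` could itself contain commas on other hosts.)

```python
# Sample line copied verbatim from the 19:19:33 message in the log.
OOM_LINE = (
    "[Thu May 1 19:01:44 2025] oom-kill:constraint=CONSTRAINT_NONE,"
    "nodemask=(null),cpuset=/,mems_allowed=0,global_oom,"
    "task_memcg=/system.slice/uwsgi-netbox.service,task=uwsgi,pid=3013897,uid=33"
)


def parse_oom_kill(line: str) -> dict[str, str]:
    """Return the key=value fields of a kernel oom-kill summary line.

    Hypothetical helper: assumes no field value contains a comma.
    """
    _, marker, payload = line.partition("oom-kill:")
    if not marker:
        raise ValueError("not an oom-kill line")
    fields: dict[str, str] = {}
    for item in payload.strip().split(","):
        key, eq, value = item.partition("=")
        # Bare flags such as "global_oom" have no value; store them as "".
        fields[key] = value if eq else ""
    return fields


fields = parse_oom_kill(OOM_LINE)
# The memory cgroup names the systemd unit whose process was killed.
print(fields["task_memcg"], fields["task"], fields["pid"])
```

(Here `task_memcg` is what points at `uwsgi-netbox.service`, and the bare `global_oom` flag shows up as a key with an empty value.)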
[19:23:25] FIRING: [15x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:28:25] FIRING: [15x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:33:25] FIRING: [15x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:57:34] thanks sukhe!
[20:24:38] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:24:38] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[21:50:43] Mail, Infrastructure-Foundations, Trust-and-Safety: Enable inbound Zendesk SMTP Connector support - https://phabricator.wikimedia.org/T393131 (jhathaway) NEW
[21:50:50] Mail, Infrastructure-Foundations, Trust-and-Safety: Enable inbound Zendesk SMTP Connector support - https://phabricator.wikimedia.org/T393131#10784659 (jhathaway) p:Triage→Medium
[21:50:58] Mail, Infrastructure-Foundations, Trust-and-Safety: Enable inbound Zendesk SMTP Connector support - https://phabricator.wikimedia.org/T393131#10784660 (jhathaway) a:jhathaway
[21:54:08] Mail, Infrastructure-Foundations, Trust-and-Safety: Enable inbound Zendesk SMTP Connector support - https://phabricator.wikimedia.org/T393131#10784668 (jhathaway) 1. Install `libsasl2-modules`, otherwise postfix will not have support for any SASL mechanisms, including plain text 2. SASL credentials...
[21:55:24] Mail, Infrastructure-Foundations, Trust-and-Safety: Enable inbound Zendesk SMTP Connector support - https://phabricator.wikimedia.org/T393131#10784671 (jhathaway) This setup worked in testing, but in production, we receive the following error from Zendesk: ` 535 5.7.8 Valid client TLS certificate is...
[23:33:25] FIRING: SystemdUnitFailed: wmf_auto_restart_uwsgi-netbox.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed