[06:06:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:49:19] CAS-SSO, Bitu, Infrastructure-Foundations, Phabricator, Striker: Inconsistent mapping of Developer accounts and SUL accounts across Phabricator, Bitu, and Striker - https://phabricator.wikimedia.org/T388498#10627203 (Arendpieter) >>! In T388498#10626310, @bd808 wrote: > That would be @Bugrepo...
[08:31:16] Puppet, SRE: puppet error at the end of the run on prometheus2008: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null - https://phabricator.wikimedia.org/T388629 (fgiunchedi) NEW
[09:06:34] o/
[09:07:13] I filed https://gerrit.wikimedia.org/r/c/mediawiki/services/kartotherian/+/1126598 for kartotherian, the idea is to try jemalloc via LD_PRELOAD to figure out if what appears to be a memory leak may be memory fragmentation or similar
[09:07:27] first time that I use it, if anybody has more experience lemme know
[09:07:46] but a lot of nodejs users were happy about it
[09:08:21] the other road is https://nodejs.org/en/learn/diagnostics/memory/using-heap-snapshot#how-to-find-a-memory-leak-with-heap-snapshots but in production is a little more difficult (we could try to repro locally too)
[10:06:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:46:52] netops, Infrastructure-Foundations, Observability-Alerting: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641 (ayounsi) NEW p:Triage→Low
[10:53:31] netops, Infrastructure-Foundations: gnmi_interfaces_interface_state_oper_status missing from most devices - https://phabricator.wikimedia.org/T388642 (ayounsi) NEW
[10:54:01] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10627856 (ayounsi)
[10:54:44] netops, Infrastructure-Foundations: gnmi_interfaces_interface_state_oper_status missing from most devices - https://phabricator.wikimedia.org/T388642#10627863 (ayounsi)
[10:54:47] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10627862 (ayounsi)
[10:56:39] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Migrate port utilisation alert from LibreNMS to alertmanager - https://phabricator.wikimedia.org/T384052#10627894 (cmooney)
[10:56:42] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10627895 (cmooney)
[11:41:03] netops, Infrastructure-Foundations: gnmi_interfaces_interface_state_oper_status missing from most devices - https://phabricator.wikimedia.org/T388642#10628022 (ayounsi) Open→Resolved a:ayounsi Chatted about it with Cathal on IRC, the gNMIc daemon just needed a restart.
[12:06:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:09:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:14:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:36:42] !incidents
[13:36:46] grrr