[04:05:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:48:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[07:58:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[08:05:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:44:58] SRE-tools, Infrastructure-Foundations, SRE, IPv6: Enable ipv6 on ganeti2019-ganeti2024 - https://phabricator.wikimedia.org/T379890#10604157 (Volans) AFAICS we are still missing the AAAA record on all of the hosts listed in the task description.
[08:50:21] aux-k8s-etcd1003 will temporarily be switched to DRBD, latencies will go up a bit
[09:16:44] and back to normal
[09:30:59] XioNoX, topranks: the puppet alert - operations above is related to some (broken?) Netbox data for network devices, does that ring a bell? https://puppetboard.wikimedia.org/report/prometheus7001.magru.wmnet/7a721c733eb6ca52314ec8f8e60e548b8b5b8d91
[09:31:13] (the puppet alert from -operations I meant)
[09:31:58] moritzm: my bad, I'm working on it
[09:33:24] I blame jbond :)
[09:33:44] thanks for the ping, it's being sorted in -observability
[09:33:48] always :-)
[09:34:00] ack, I'll post a note to -operations that this is WIP
[10:05:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:11:27] netops, Infrastructure-Foundations, Data-Engineering (Q3 2025 January 1st - March 31th): Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10604443 (ayounsi) So what about: * turnilo full dimensions - 1 months * turnilo sanisitzed/reduced - 12 mo...
[10:13:37] netops, Infrastructure-Foundations, Data-Engineering (Q3 2025 January 1st - March 31th): Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10604451 (JAllemandou) >>! In T387839#10604443, @ayounsi wrote: > So what about: > * turnilo full dimension...
[10:20:44] netops, Infrastructure-Foundations: Different BFD settings on direct connected links - https://phabricator.wikimedia.org/T387773#10604469 (ayounsi) They're mandatory on long distance link as we've had issue with interface status being up but the provider not forwarding traffic through said link. For loca...
[12:39:52] moritzm: there are a bunch of new ganeti hosts racked up in codfw
[12:40:04] unfortunately they were added on the wrong vlans :(
[12:40:46] I'll fix it up but fyi right now they won't work properly so hold off on imaging them or anything
[12:56:21] yeah, thanks. I saw Papaul's update from yesterday. I'll wait until these are sorted
[12:56:36] on the upside, we can test that pending cookbook change with these new hosts
[14:05:54] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review, SRE Observability (FY2024/2025-Q3): Prometheus: attach host's BGP/interface remote side metrics - https://phabricator.wikimedia.org/T387287#10605317 (ayounsi) It's live and working fine : {F58611687} https://grafana...
[15:06:14] netops, Infrastructure-Foundations: Different BFD settings on direct connected links - https://phabricator.wikimedia.org/T387773#10605720 (Papaul) @cmoone @ayounsi thank you all for the input. since we have only cr1/2-codfw with the bfd configuration and the others without it for the main time can i go a...
[15:11:51] netops, Infrastructure-Foundations: Different BFD settings on direct connected links - https://phabricator.wikimedia.org/T387773#10605755 (cmooney) >>! In T387773#10604469, @ayounsi wrote: > They're mandatory on long distance link as we've had issue with interface status being up but the provider not for...
[15:37:33] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10605886 (cmooney) FYI I've updated the prefix-list on our switches and routers in eqiad/codfw from the old /18 to the wider...
[16:34:21] XioNoX: let me know if i can help
[16:35:09] jbond: don't worry :)
[16:35:38] hi :)
[16:36:29] ack, well ping if it changes
[16:38:03] FYI was just a missed sync of the exported hiera data from netbox with the puppet's definition of types for the same data ;)
[16:41:35] ack
[16:47:09] Puppet, Web-Team: Certain mobile devices including XiaoMi are not being redirected to our mobile site - https://phabricator.wikimedia.org/T388032 (Jdlrobson-WMF) NEW
[16:51:10] Puppet, SRE, Web-Team: Certain mobile devices including XiaoMi are not being redirected to our mobile site - https://phabricator.wikimedia.org/T388032#10606475 (Jdlrobson-WMF)
[17:12:38] Puppet, SRE, Web-Team: Certain mobile devices including XiaoMi are not being redirected to our mobile site - https://phabricator.wikimedia.org/T388032#10606551 (bwang) p:Triage→High
[17:24:45] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review, SRE Observability (FY2024/2025-Q3): Prometheus: attach host's BGP/interface remote side metrics - https://phabricator.wikimedia.org/T387287#10606626 (ayounsi) Open→Resolved
[17:41:52] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review, SRE Observability (FY2024/2025-Q3): Prometheus: attach host's BGP/interface remote side metrics - https://phabricator.wikimedia.org/T387287#10606718 (ayounsi) I also added that metric to this dashboard as an exa...
[17:44:14] netops, Hiddenparma, Infrastructure-Foundations: HIDDENPARMA feature: superset link → requestctl rule - https://phabricator.wikimedia.org/T388039 (kamila) NEW
[17:44:33] netops, Hiddenparma, Infrastructure-Foundations: HIDDENPARMA feature: superset link → requestctl rule - https://phabricator.wikimedia.org/T388039#10606746 (kamila) p:Triage→Medium
[18:03:29] Puppet, SRE, Web-Team: Certain mobile devices including XiaoMi are not being redirected to our mobile site - https://phabricator.wikimedia.org/T388032#10606837 (Jdlrobson-WMF)
[18:11:11] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10606894 (JMeybohm) p:Medium→High
[18:18:34] Puppet, SRE, Web-Team: Certain mobile devices including XiaoMi are not being redirected to our mobile site - https://phabricator.wikimedia.org/T388032#10606946 (Krinkle) I believe it would be a mistake to hardcode `MiuiBrowser` as a mobile browser, as this would break the browser UI and the end-users...
[19:32:23] Mail, Infrastructure-Foundations, Trust-and-Safety: Emails from wikimediats.zendesk.com fails DMARC policy - https://phabricator.wikimedia.org/T378285#10607255 (JAbrams) Hi @revi I met @jhathaway yesterday, and we tested the SMTP connector for outbound email only within the testing environment of...
[22:05:17] Mail, Infrastructure-Foundations, Trust-and-Safety: Emails from wikimediats.zendesk.com fails DMARC policy - https://phabricator.wikimedia.org/T378285#10607674 (revi) You should have received an email, from domain `revi.wiki`, Subject: `Testing testing one two three one two three`.
[23:41:23] Puppet, SRE, Web-Team: Certain mobile devices including XiaoMi are not being redirected to our mobile site - https://phabricator.wikimedia.org/T388032#10608114 (Jdlrobson-WMF) > This hits both the android and mobile tokens in our regex, and is correctly routed to the mobile site. Yes that's correct,...