[07:32:23] netops, Infrastructure-Foundations: Set MTU on mr1 interfaces - https://phabricator.wikimedia.org/T359320#10224664 (ayounsi) a:ayounsi→Papaul
[08:13:14] netbox, Infrastructure-Foundations, Patch-For-Review: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible - https://phabricator.wikimedia.org/T354169#10224753 (ayounsi) Above path tested on Netbox next and ready for review. {F57612638} {F57612637}
[08:15:18] Mail, Infrastructure-Foundations, User-notice-archive: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984#10224757 (Aklapper) @Primefac: Do you think it's better not to reply at all, instead of adding a quick comment explaining how to reali...
[08:49:22] topranks: hey, how are things? Just wondering if there was a schedule in place for moving eqiad A-D to leaf switches? No problem if not, just need the info :)
[08:51:46] claime: hey, em yeah there is a rough one that rows C/D will be done this FY
[08:52:19] currently the order of the equipment is still being worked out (with some open questions on exactly what software features we need and/or what vendor to go with)
[08:52:22] task is T368959 on that
[08:52:35] fantastic thanks <3
[08:53:17] context is whether we could group moving vlans with other reimage campaigns that will happen soon-ish
[08:55:04] sounds like that may not be possible, but it's ok
[08:55:40] claime: maybe at least the renames ?
[08:56:11] XioNoX: the renames will happen with the next reimage campaign yeah
[08:56:18] cool!
[08:57:02] (it's confusing for us too :D)
[08:59:11] claime: how often do you do those re-image campaigns ?
[08:59:38] XioNoX: we do them for k8s upgrades
[08:59:51] and this time will also be to migrate to containerd
[09:00:02] and bullseye
[09:00:15] bookworm
[09:00:17] sorry
[09:00:25] I got worried for a second :)
[09:01:19] nah just a brainfart :P
[09:17:22] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10224935 (elukey)
[09:33:06] netbox, Infrastructure-Foundations: Netbox: enrich prefixes - https://phabricator.wikimedia.org/T377114 (ayounsi) NEW
[09:35:17] netbox, Infrastructure-Foundations: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible - https://phabricator.wikimedia.org/T354169#10225051 (ayounsi) Open→Resolved Closing that task as the original goal has been reached. I opened {T377114} to tra...
[10:13:58] Mail, Infrastructure-Foundations, User-notice-archive: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984#10225229 (Primefac) No, and I apologise for my tone.
[11:52:24] FIRING: SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:57:24] RESOLVED: SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:08:54] FIRING: [2x] SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:13:54] RESOLVED: [2x] SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:43:49] o/
[12:43:58] has anybody time for a quick puppet change? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079999
[12:47:53] SRE-tools, Data-Persistence-SRE, DBA, Spicerack: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129 (ABran-WMF) NEW
[12:48:00] elukey: +1
[12:48:49] thanksss
[12:48:58] SRE-tools, Data-Persistence-SRE, DBA, Infrastructure-Foundations, Spicerack: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129#10225683 (ABran-WMF) p:Triage→Medium
[12:49:00] SRE-tools, Data-Persistence-SRE, DBA, Infrastructure-Foundations, Spicerack: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129#10225686 (ABran-WMF)
[12:53:09] netops, Ceph, Infrastructure-Foundations: cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10225721 (cmooney) I see the healthcheck fails quite regularly `lines=10 cmooney@cephosd1002:~$ grep DOWN /var/log/anycast-healthchecker/anycast-healthchecker.log 2024-10-13 20:...
[12:56:28] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10225742 (aborrero) >>! In T375847#10217807, @cmooney wrote: >>>! In T375847#10195673, @aborrero wrote: >> `lang=shell-session >> roo...
[13:06:57] I'm wondering what people think of https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1080012 is it too much nagging ? Is it what's needed for people to not forget to migrate their servers ?
[13:09:57] XioNoX: I think it is great, I'd just expand the nagging with a link about what/how to do it. Otherwise the end result could be folks stuck in reimage until somebody from our team or sre in general replies with what to do
[13:35:54] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack, Patch-For-Review: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701#10225935 (ABran-WMF) In progress→Resolved
[13:49:12] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10225985 (cmooney) We definitely want to use DHCPv6 (stateful) for address assignment. So OpenStack is in control of what IPs are us...
[13:58:40] ;28
[14:01:38] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10226051 (cmooney) So.... maybe this is normal for DHCPv6? Re-reading the reddit post and looking at the setup on the VM it seems li...
[14:40:16] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10226222 (elukey) p:Triage→Medium
[14:44:45] netops, Infrastructure-Foundations: Transient DOWN alert on cr2-magru - https://phabricator.wikimedia.org/T374401#10226253 (ayounsi) Open→Resolved a:ayounsi Closing this. Please re-open if it happens again.
[14:45:30] SRE-tools, Infrastructure-Foundations: debmonitor could provide users with cumin and/or debdeploy pre-made config/command - https://phabricator.wikimedia.org/T375475#10226275 (joanna_borun) p:Triage→Low
[14:45:36] SRE-tools, Infrastructure-Foundations: debmonitor could provide users with cumin and/or debdeploy pre-made config/command - https://phabricator.wikimedia.org/T375475#10226274 (joanna_borun) @fgiunchedi could you please expand on the use-case and problem so we can figure out best way to address it?
[14:51:59] netops, Infrastructure-Foundations, SRE-OnFire, Sustainability (Incident Followup): Juniper: regularly run `request system configuration rescue save` - https://phabricator.wikimedia.org/T376005#10226302 (joanna_borun) p:Triage→Low a:ayounsi
[14:56:59] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10226327 (cmooney) >>! In T375845#10182322, @JMeybohm wrote: >> It might be possible though to migrate to a new IPPool that...
[14:58:41] netbox, Infrastructure-Foundations: Netbox: enrich prefixes - https://phabricator.wikimedia.org/T377114#10226339 (joanna_borun) p:Triage→Low
[15:01:11] netops, Ceph, Infrastructure-Foundations: cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10226356 (joanna_borun) p:Triage→Medium
[15:04:02] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10226362 (JMeybohm) >>! In T375845#10226327, @cmooney wrote: > If we do this we probably need to allocate a new single pool,...
[15:05:52] the k8s aux cluster is now fully row redundant
[15:06:06] nice!