[00:03:37] netops, Infrastructure-Foundations, SRE: Upgrade Management routers to 23.4R2-S2 - https://phabricator.wikimedia.org/T369504#10236283 (Papaul)
[00:06:03] netops, Infrastructure-Foundations, SRE: Upgrade Management routers to 23.4R2-S2 - https://phabricator.wikimedia.org/T369504#10236284 (Papaul) Open→Resolved This is complete
[03:47:04] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10236448 (Papaul) @VRiley-WMF thank you for following up on this. It looks like the router is back running on re0 and disks are all there. We can close. @ayounsi any...
[04:20:51] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#10236460 (Papaul) I checked the router again today after the Junos upgrade and reboot, no core-dump file so far. ` show system core-dumps no-forwarding /var/crash/*core*: No such file or directory...
[04:25:43] netops, Infrastructure-Foundations: cr2-codfw - Host 0 ECC single bit parity error - https://phabricator.wikimedia.org/T371868#10236461 (Papaul) since August until now no errors so far ` cr2-codfw> show system alarms No alarms currently active
[06:37:10] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#10236570 (ayounsi) We will need to monitor it a bit more, as they seem to happen about once a month.
[06:38:37] netops, Infrastructure-Foundations: cr2-codfw - Host 0 ECC single bit parity error - https://phabricator.wikimedia.org/T371868#10236571 (ayounsi) Open→Resolved a:ayounsi Perfect, thanks!
[06:40:12] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10236574 (ayounsi) ` re1.cr1-eqiad> show system alarms 1 alarms currently active Alarm time Class Description 2024-07-18 16:11:37 UTC Minor Backup...
[06:42:50] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10236576 (ayounsi)
[06:42:50] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10236577 (ayounsi)
[07:47:05] netops, Infrastructure-Foundations, SRE: Re-IP codfw private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869#10236702 (ayounsi)
[07:49:41] Easy (I think) netbox report review for anyone who feels like it https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1081061 :) It's live in https://netbox-next.wikimedia.org/extras/scripts/results/44862/ for example
[07:54:03] XioNoX: have you tried to print them all and see if it's really too unreadable?
[07:54:46] or print the first 10~15, it will cover most groups
[07:55:44] With the cumin command not sure it's needed. I wanted to keep it as a high level overview.
[07:56:50] long netbox reports are usually unreadable
[07:57:25] ack, but if it says 12 hosts having printed one how does it help? :D
[08:01:39] about the same as having printed 12 when it says 100 :)
[08:04:16] sure, but we have only 13 groups above 12 hosts :D
[08:04:28] over ~63 groups
[08:04:44] up to you, was just a question
[08:04:54] no strong opinion
[08:04:58] I worry that if I print 12 of each, then it becomes unreadable and we're back to square 1. The idea here is just to have an overall idea of what's left to do, and how it's progressing. The ideal would be a report like the os-report, sorted by teams, etc. I need to discuss that.
[08:07:11] ack
[08:08:13] Let's see how that goes, and if people feel strongly I can always adapt it
[08:08:31] actually I'm going to filter out servers that are more than 5 years old too
[08:18:43] volans: updated :)
[09:28:24] SRE-tools, Data-Persistence-SRE, DBA, Infrastructure-Foundations, and 2 others: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129#10236907 (ABran-WMF)
[09:28:33] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701#10236909 (ABran-WMF)
[09:34:39] netops, Infrastructure-Foundations, SRE: Re-IP codfw private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869#10236937 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host phab2002.codfw.wmnet with OS bullseye executed...
[09:37:59] elukey: let me know when/if you need review on https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1080456
[09:54:54] netops, Infrastructure-Foundations, SRE: Re-IP codfw private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869#10236988 (cmooney)
[10:09:24] XioNoX: o/ yep if you have time and you want to do it please go ahead! I tested it and it seems to be working fine for both Dells and Supermicros, but it is a big change so I may have missed some bugs.
Hopefully it is more readable/maintainable now
[14:38:53] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238146 (cmooney)
[14:44:53] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238176 (cmooney)
[14:51:21] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238216 (cmooney)
[15:25:14] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238331 (cmooney)
[15:26:42] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238348 (cmooney)
[15:32:43] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238391 (cmooney)
[15:33:51] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238395 (cmooney)
[16:17:12] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10238641 (RobH)