[00:01:00] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 22.4R3 - https://phabricator.wikimedia.org/T364092#10190074 (Papaul)
[00:02:50] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10190077 (Papaul) Junos upgrade complete for the system. Icinga checks back green. All good on the router, site can be pooled back. Thanks
[06:55:52] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10190475 (ayounsi) Open→Resolved Thanks, all is good now!
[07:15:44] netops, Infrastructure-Foundations: Juniper: regularly run `request system configuration rescue save` - https://phabricator.wikimedia.org/T376005#10190489 (ayounsi) ` cr3-ulsfo> request vmhost snapshot ? Possible completions: <[Enter]> Execute this command config Sychronise C...
[07:36:16] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10190518 (ayounsi)
[09:17:36] does https://bugs.debian.org/grub2 work for you? It times out for me
[09:23:09] works for me
[09:23:10] XioNoX: as in does the site load?
[09:23:14] it does for me yeah
[09:23:22] but I had the BTS fail for me ~ 10 minutes ago as well
[09:23:26] maybe there was some maintenance
[09:23:27] weird
[09:23:44] or some ML bullshit scraper
[09:23:45] of course now that loads
[09:23:50] thx!
[09:23:52] can I ask a dumb puppet question?
[09:24:03] I know "there is no such thing as a stupid question"
[09:24:10] but this is definitely me being dumb :P
[09:24:20] there are definitively dumb questions :)
[09:24:27] :D
[09:24:47] If I put some data in hiera, and I just want to write that out to a YAML file on the target hosts, what is the best way?
[09:25:10] I see there is a to_yaml function but this mostly seems to be passed data which is compiled in the puppet code from various elements
[09:25:35] to_yaml just transforms a dict to yaml
[09:25:35] which I can probably use, but I literally just want to create a yaml file with the exact data under a specific hiera key
[09:25:46] so I think it's the best way
[09:26:35] ok thanks I'll see how I get on
[09:33:23] ok, what about https://www.gnu.org/software/grub/grub-development.html ?
[09:33:34] does that load for anyone?
[09:37:49] times out for me
[09:38:19] ok, thx
[10:29:06] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121 (elukey) NEW
[10:47:55] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10191021 (elukey)
[13:21:25] FIRING: [2x] SystemdUnitFailed: docker.service on aux-k8s-ctrl1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:26:25] RESOLVED: [2x] SystemdUnitFailed: docker.service on aux-k8s-ctrl1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:27:29] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10191716 (Papaul) Resolved→Open a:ayounsi→Papaul I have to update netbox with the inventory and new serial number
[13:46:42] hey folks
[13:47:05] is there a reason why homer is super slow with cr{1,2}-eqiad routers?
[13:47:21] you can blame me if there is an open task for automation that needs to be solved
[13:47:29] I just want to know why :D
[13:47:52] elukey: because it needs to iterate over all the site's routers to know if they need to have BGP configured on those routers
[13:48:00] elukey: T271864
[13:48:00] T271864: Homer: optimize API calls to Netbox - https://phabricator.wikimedia.org/T271864
[13:48:03] all the servers I mean
[13:48:16] that's totally outdated, I bet we do many more nowadays
[13:48:26] yeah it's different
[13:48:48] okok thanks for the info
[13:48:58] sounds like something that I can add to my todo list
[13:49:02] XioNoX: can't we filter the devices in netbox based on bgp status and optimize that part?
[13:49:56] volans: we already do :)
[13:50:11] that's the line that's taking time: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/homer/deploy/+/refs/heads/master/plugins/wmf-netbox.py#117
[13:50:34] how long is bgp_devices?
[13:51:02] maybe with a graphql query it might be quicker
[13:51:13] yeah graphql is probably better
[13:51:22] ah right because of the wikikube workers, they are a ton now
[13:51:31] yeah
[13:52:01] the day all servers peer with their ToR, we won't need that bit
[13:52:31] the issue is that the linked line actually makes multiple HTTP calls
[13:52:46] so indeed graphql would make it much faster
[13:53:08] https://phabricator.wikimedia.org/T341968 - https://phabricator.wikimedia.org/T310577
[13:53:23] ah, and of course...
https://phabricator.wikimedia.org/T310577#10105348
[13:54:59] thinking out loud, maybe we could improve it by checking the subnet the primary IP belongs to, instead of checking what the server is connected to
[14:00:17] yeah if it's quicker to do that it's worth a shot
[14:00:43] get the vlan name and check it's legacy_vlan_name() as we're already doing
[14:03:42] another approach might be to do it like capirca - and generate the list of server bgp sessions periodically from netbox itself for all the routers, which homer could just read as a file
[14:03:48] though I'm not sure I like that tbh
[14:14:42] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10191898 (Jhancock.wm)
[14:50:48] netops, Infrastructure-Foundations, Patch-For-Review: Arelion IPv6 transit renumbering - https://phabricator.wikimedia.org/T365697#10192040 (Papaul)
[14:51:02] netops, Infrastructure-Foundations, Patch-For-Review: Arelion IPv6 transit renumbering - https://phabricator.wikimedia.org/T365697#10192044 (Papaul)
[15:05:18] very nice trap set for me - the BGP flag for the "old" aux ctrl/workers was "false", so Homer wanted to remove all the old settings and only add the new hosts :D :D
[15:07:43] lol
[15:22:39] ha yes that almost got me before too
[15:23:47] when cr2-eqsin rebooted with an old config at the w/end I couldn't work out why, even when I ran homer, it didn't remove the old LVS IPs and add the new ones
[15:23:58] turns out all the LVS hosts had the bgp flag disabled
[15:24:11] but both core routers had the group built manually on them
[15:24:15] so homer wasn't touching them
[15:25:27] netops, Infrastructure-Foundations, Patch-For-Review: Arelion IPv6 transit renumbering - https://phabricator.wikimedia.org/T365697#10192253 (Papaul)
[15:58:17] netops, Infrastructure-Foundations: Arelion IPv6 transit renumbering - 
https://phabricator.wikimedia.org/T365697#10192596 (Papaul)
[18:13:27] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10193306 (Papaul) Open→Resolved Add both power supplies in Netbox under inventory
[18:22:56] netops, Infrastructure-Foundations: Arelion IPv6 transit renumbering - https://phabricator.wikimedia.org/T365697#10193328 (Papaul)
[18:23:50] netops, Infrastructure-Foundations: Arelion IPv6 transit renumbering - https://phabricator.wikimedia.org/T365697#10193330 (Papaul) Open→Resolved This is complete. @ayounsi thanks for the patch
[20:34:09] slyngs: I think you wrote bitu, where is the source for https://idm.wikimedia.org/signup/ - I'm struggling to find the template in bitu's repo
[23:06:48] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10194218 (Papaul)
[23:14:14] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10194252 (Papaul)
[23:20:48] CAS-SSO, Infrastructure-Foundations: Enable self-service IDP two-factor authentication management - https://phabricator.wikimedia.org/T359552#10194295 (bd808)
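
Note on the [13:54:59] idea (check the subnet the primary IP belongs to, rather than what the server is connected to): a minimal Python sketch of that lookup, using only the standard ipaddress module. Everything named here is an assumption for illustration. The prefix table, the VLAN names, LEGACY_VLANS and both helper functions are hypothetical; the real data would come from Netbox, and the real check is the legacy_vlan_name() logic mentioned at [14:00:43], which is not shown in this log.

```python
import ipaddress

# Hypothetical prefix -> VLAN name table (documentation ranges, not real prefixes).
VLAN_BY_PREFIX = {
    "192.0.2.0/24": "legacy-row-a",
    "198.51.100.0/24": "per-rack-e1",
    "2001:db8:a::/64": "legacy-row-a",
}

# Hypothetical stand-in for the legacy VLAN check discussed in the log.
LEGACY_VLANS = {"legacy-row-a"}


def vlan_for_ip(primary_ip, vlan_by_prefix=VLAN_BY_PREFIX):
    """Return the VLAN name of the longest prefix containing primary_ip, or None."""
    # Accepts "addr" or "addr/len"; only the host address is used for matching.
    addr = ipaddress.ip_interface(primary_ip).ip
    best_net, best_name = None, None
    for prefix, name in vlan_by_prefix.items():
        net = ipaddress.ip_network(prefix)
        if net.version != addr.version or addr not in net:
            continue
        # Prefer the most specific (longest) matching prefix.
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_name = net, name
    return best_name


def in_legacy_vlan(primary_ip):
    """True if primary_ip falls inside a (hypothetical) legacy VLAN prefix."""
    return vlan_for_ip(primary_ip) in LEGACY_VLANS
```

Once the prefix list has been fetched, each server is classified locally from its primary IP, so in this sketch no extra per-device HTTP call is needed. Whether that holds for the actual plugin depends on how the prefixes and primary IPs are retrieved.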