[01:54:26] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[05:54:26] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[09:49:50] Morning. Would someone be able to help us with allocating some IP ranges for the new dse-k8s-codfw cluster, please? T400037
[09:49:51] T400037: Determine dse-k8s-codfw Kubernetes IP ranges - https://phabricator.wikimedia.org/T400037
[09:54:26] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[09:54:28] Ideally, for IPv4, we would like a `/20` (services) and a `/21` (pods) in here https://netbox.wikimedia.org/ipam/prefixes/379/prefixes/ to match the values for dse-k8s-eqiad.
[09:55:50] But I wouldn’t like to make any changes in netbox without suitable consultation.
[10:09:41] Hey Ben! I think it should be a matter of allocating a /18, right?
[10:09:48] to then be split again
[10:10:20] elukey: Yes, I think so. That’s what ml-serve and aux are doing. I just wanted to check before doing anything.
[10:10:58] yep yep, wikikube has a bigger one but it makes sense
[10:12:13] I think that in general we should try to avoid wasting addresses as much as possible, but you can definitely make a reservation and then ask Cathal or Arzhel for a review. Worst case we’ll delete it and re-create another one :)
[10:12:45] I can review it as well if you want, but compared to them I have less than zero authority :D
[10:12:57] I guess it comes down to what counts as “waste”
[10:13:32] the biggest risk in my view is making allocations that are too small and having to renumber down the road
[10:14:08] Yeah, originally the large (/20) range for service addresses was because we thought that knative-serving might be deployed to these clusters: https://phabricator.wikimedia.org/T310169#7992185
[10:14:10] 10/8 is large; I glanced at this and it seems to make sense, I’ll take a detailed look shortly
[10:14:28] That hasn’t happened yet, but it could still theoretically happen.
[10:14:42] topranks: Many thanks.
[10:19:09] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:20:42] I think nowadays we do /20 and /21 to guarantee future growth without renumbering
[10:21:07] I wouldn’t go with anything less
[10:21:34] Ack, thanks.
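For reference, the /18-plus-split being discussed above works out as follows. This is a quick sketch using Python's `ipaddress` module; the `10.192.64.0/18` supernet is a placeholder, not the actual Netbox allocation, which would be reserved under the prefix container linked in the chat.

```python
import ipaddress

# Hypothetical /18 reservation -- the real supernet would be allocated in Netbox.
supernet = ipaddress.ip_network("10.192.64.0/18")

# Split the /18: the first /20 for service IPs, the next free /21 for pod IPs.
twenties = list(supernet.subnets(new_prefix=20))   # four /20s fit in a /18
services = twenties[0]                             # 10.192.64.0/20
pods = next(twenties[1].subnets(new_prefix=21))    # 10.192.80.0/21

print(services, services.num_addresses)   # 10.192.64.0/20 4096
print(pods, pods.num_addresses)           # 10.192.80.0/21 2048
```

A /18 leaves room for a further /21 (and smaller blocks) on top of the /20 + /21 pair, which is the "future growth without renumbering" point made above.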
[10:26:19] in other news, kartotherian and tegola (maps) are running on the maps-test2* cluster in codfw, all Bookworm-based
[10:26:29] still not getting live traffic, but fingers crossed
[10:39:13] FIRING: [7x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:44:09] FIRING: [7x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:48:02] elukey: am I correct in thinking the default pod IP range per host is a /26? So a /21 pod IP allocation allows for up to 32 hosts in the cluster?
[10:49:09] FIRING: [7x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:59:09] FIRING: [5x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[11:04:09] RESOLVED: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[11:08:10] CAS-SSO, Infrastructure-Foundations, Phabricator: Phabricator should use IDP for developer account logins - https://phabricator.wikimedia.org/T377061#11019974 (Aklapper) Tyler mentioned CAS for Spiderpig implementation; maybe parts are reusable? https://gitlab.wikimedia.org/repos/releng/scap/-/blob/m...
[11:28:01] topranks: o/ We moved to /20 and /21 in all clusters except Wikikube to allow for growth; IIRC with a /21 pod IP allocation we are able to spin up ~2000 pods. What do you mean by 32 hosts in the cluster?
[11:28:30] I mean each host announces a fixed /26 subnet to the network, right?
[11:29:08] like even if it only has one pod on it, the host will still use a /26?
[11:29:56] good question, I don’t recall
[11:30:08] you mean calico doing BGP with the ToR or the routers?
[11:30:26] yeah I think that is how it works, so you do get 2048 IPs for pods, but probably the more meaningful scaling factor is that it gives you 32 x /26 networks, so that’s the maximum number of hosts
[11:30:46] elukey: yep, what calico announces to the routers
[11:33:15] okok, this bit wasn’t clear to me
[13:48:44] netops, Infrastructure-Foundations, SRE: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931#11020341 (cmooney) Open→Resolved a: cmooney
[15:45:52] elukey: which ml nodes can I use to look at the nvme uefi issue?
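To make the scaling math above concrete: assuming Calico's default /26 IPAM block size and roughly one block per node (a node can claim extra blocks once its first one fills up, so the block count is an upper bound on node count rather than a hard limit), a /21 pod range yields 32 blocks and 2048 pod addresses. A minimal sketch, again with a placeholder prefix:

```python
import ipaddress

# Assumes Calico's default /26 IPAM block size and roughly one block per node.
pod_range = ipaddress.ip_network("10.192.80.0/21")   # placeholder pod prefix
block_prefix = 26

blocks = 2 ** (block_prefix - pod_range.prefixlen)   # 2**(26 - 21) = 32 /26 blocks
pods_per_block = 2 ** (32 - block_prefix)            # 64 addresses per /26
print(blocks, blocks * pods_per_block)               # 32 2048
```

This matches the figures in the chat: ~2000 pod IPs total, with 32 × /26 blocks being the more meaningful ceiling on cluster size under the one-block-per-host assumption.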
[15:51:12] jhathaway: ml-serve1012 is currently in d-i
[15:51:26] all details here https://phabricator.wikimedia.org/T393948
[15:58:58] thanks
[16:05:05] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[17:02:12] Wiki Education has been getting a number of reports from new users who don’t receive the newly required confirmation emails from auth.wikimedia.org. This seems to be primarily an issue with university email systems (so using a personal email has been a successful workaround so far), but we’ve had reports from at least three different universities.
[17:03:10] as the fall semester starts, we’re likely to be fielding a lot more emails about new users who can’t confirm their emails, if this is a widespread problem.
[17:03:57] any advice welcome.
[17:10:05] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[17:39:53] Mail, Infrastructure-Foundations, SRE, SRE-Access-Requests: Access Request to DMarcDigests - https://phabricator.wikimedia.org/T399976#11021555 (nisrael) @Aklapper my apologies! I will make a note to myself to do this for future tasks!
[19:12:33] CAS-SSO, cloud-services-team, Striker: Use IDP for authentication in Striker - https://phabricator.wikimedia.org/T359554#11021861 (Arendpieter)
[21:58:09] I can’t seem to find `systemd-standalone-tmpfiles` in our Bullseye repo anymore, does anyone know what happened? I’ve been building a docker image that needs the package, ref https://gitlab.wikimedia.org/repos/data-engineering/opensearch/-/blob/main/blubber.yaml?ref_type=heads#L25
[23:33:20] I think it’s related to the deprecation of bullseye-backports, which got removed recently. You can use this base image instead for your opensearch container: https://docker-registry.wikimedia.org/openjdk-17-jre/tags/
[23:38:41] ACK, I was wondering if it had something to do with backports, but the message I found was from a year ago, so I didn’t think it was that
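One way to confirm where `systemd-standalone-tmpfiles` is (or was) published is to scan the Packages indices of the candidate suites. The sketch below checks the upstream Debian mirror using the standard `dists/<suite>/main/binary-amd64/Packages.gz` layout; the mirror URL and suite list are illustrative assumptions, not a statement about the Wikimedia apt repo's layout or about where the package actually lives today.

```python
import gzip
import urllib.error
import urllib.request

MIRROR = "http://deb.debian.org/debian"   # illustrative mirror, not apt.wikimedia.org
PACKAGE = "systemd-standalone-tmpfiles"

for suite in ("bullseye", "bullseye-backports", "bookworm"):
    url = f"{MIRROR}/dists/{suite}/main/binary-amd64/Packages.gz"
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            index = gzip.decompress(resp.read()).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Suites that have been removed or archived (as bullseye-backports was,
        # per the chat above) will typically return 404 here.
        print(f"{suite}: index not available (HTTP {err.code})")
        continue
    found = f"Package: {PACKAGE}\n" in index
    print(f"{suite}: {'present' if found else 'absent'}")
```

If the only suite that ever carried the package has been retired, switching to a base image that already ships the needed tooling (the openjdk-17-jre image suggested above) is the simpler fix.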