[08:20:10] greetings
[10:29:44] hello! I'm a bit under the weather but I'm around :)
[10:30:55] o/
[10:33:57] I think I've just fallen into a puppet typing rabbit hole
[10:44:57] taavi: need help? :D
[10:45:25] puppet+typing rabbit holes might be very dangerous to mental health :-P
[10:46:40] for real
[10:49:00] sounds like a perfectly normal friday activity to me :)
[10:49:04] so basically I noticed that cloudgws have some hiera keys that in codfw had plain IP addresses but in eqiad had a cidr netmask included
[10:49:31] so I started looking into how that didn't break anything (before I added strict typings to the cloudgw profile)
[10:50:03] and now I'm trying to add proper typings to some of the interface::* defines that were written in 2013 and have only had very minor changes since then
[10:52:47] so far so good
[10:53:44] that, in turn, led me to a bunch of inconsistencies on whether VLAN numbers are treated as numbers or strings
[10:53:56] lol
[11:01:48] I wasted quite some time yesterday because somewhere in the path of creating a secret from toolforge-deploy via get_secret.sh a spurious space at the end of the secret was added in k8s. Worked around it with a `| trim` but I'm sure it's another rabbit hole that would make sense to look at if I have time.
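The trailing-space problem from the 11:01:48 message can be illustrated with a minimal Python sketch. It assumes only that the secret ends up base64-encoded, as values in a Kubernetes Secret's `data` field are. The secret value and the `encode_secret` helper are hypothetical, and `str.strip()` stands in for the `| trim` workaround. It is not a reconstruction of what get_secret.sh actually does.

```python
import base64


def encode_secret(value: str) -> str:
    """Base64-encode a value, as Kubernetes does for Secret `data` fields."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical secret value, not taken from any real deployment.
clean = "s3cr3t-token"
padded = clean + " "  # same value with a spurious trailing space

encoded_clean = encode_secret(clean)
encoded_padded = encode_secret(padded)

# The stray space survives encoding, so the stored values differ and any
# consumer comparing against the intended secret would fail to match.
values_differ = encoded_clean != encoded_padded

# Stripping whitespace before encoding restores the intended value.
encoded_fixed = encode_secret(padded.strip())
fixed_matches = encoded_fixed == encoded_clean

print(values_differ, fixed_matches)
```

Stripping at the last step hides the symptom but not the cause. Finding where the space is introduced would still mean checking each step between toolforge-deploy and the cluster.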
[11:40:15] please don't run any tofu pipelines, I'm manually editing the tfstate to merge https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/102
[11:40:40] k
[11:46:30] change done, tofu plan is a no-op, I'm self-merging
[11:46:39] k
[11:58:12] I created T410720
[11:58:12] T410720: [tofu-provisioning] Allow running manual "tofu state" commands - https://phabricator.wikimedia.org/T410720
[11:58:38] right now the only way is downloading the tfstate file and editing it manually which is very ugly and error-prone
[11:59:09] if someone feels like reviewing puppet patches, I have a stack of fun starting from https://gerrit.wikimedia.org/r/c/operations/puppet/+/1208297
[11:59:36] * dhinus checks out how "fun" they are :)
[12:02:15] this PCC is definitely fun, I think it's not able to display the diff between strings and ints https://puppet-compiler.wmflabs.org/output/1208298/7671/cloudnet1006.eqiad.wmnet/index.html
[12:03:57] indeed, if you look at the prod catalog vs change catalog, for example network_flat_interface_vlan_external
[12:04:04] changes from "1107" to 1107
[12:05:05] discovering a bug in PCC thanks to your code >>> discovering a bug in your code thanks to PCC
[14:57:31] moritzm (or possibly taavi since he was thinking about pcc today): how does profile::idp ever work? I see it taking several args (e.g. $web_authn_signing_key) as a single string and then passing them to apereo_cas which takes string[1]. Is that something that the PCC dislikes but which works in practice?
[14:58:32] andrewbogott: which problem are you seeing there?
[14:58:46] https://puppet-compiler.wmflabs.org/output/1208350/7868/cloudidp2001-dev.wikimedia.org/change.cloudidp2001-dev.wikimedia.org.err
[14:59:04] I can't see that I'm doing anything different from existing uses of that profile
[14:59:11] (which presumably work)
[14:59:25] * andrewbogott awaits being shown the typo
[14:59:48] you probably need values in labs/private setting non-empty values for web_authn_signing_key and web_authn_encryption_key
[14:59:56] dummy values in labs/private*
[15:00:32] like this? https://gerrit.wikimedia.org/r/c/labs/private/+/1208359/2/hieradata/role/codfw/idp_clouddev.yaml
[15:00:45] seems reasonable
[15:01:27] that's already merged though
[15:01:48] but did you run puppet-merge as well?
[15:01:58] it's needed these days to deploy labs-private changes
[15:02:04] yes, I did
[15:02:17] also the error is about it getting a string, not about something being undefined
[15:02:20] iirc pcc is sometimes a bit slow updating that repo, unless it's explicitly specified as a dependency
[15:02:40] So, is the subtext here that a string and a string[1] are the same type?
[15:03:04] String is any string, String[1] is a non-empty string
[15:03:44] ok. That fits with something being undefined then...
[15:03:47] * andrewbogott re-runs the pcc
[15:04:44] I would get rid of the empty string values in puppet.git:hieradata/role/codfw/idp_clouddev.yaml, they're going to do nothing except confuse us later with this exact same thing
[15:12:18] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1208370/2
[15:18:11] ooh, now I'm getting a different pcc warning. I guess I just had to wait a while :/
[15:24:06] your `profile::tlsproxy::envoy::ssl_provider` is off
[15:24:52] assuming this'll be behind the CDN, you'll want to set ::ssl_provider, ::global_cert_name and ::cfssl_options similar to how they are on the cloudwebs
[15:27:31] why the same as the cloudwebs rather than the same as other idp nodes?
[15:31:10] because the same availability-during-outages concerns don't apply to this deployment so there's no need to use precious public ips for them
[16:28:47] * andrewbogott circles back
[16:29:23] I don't 100% understand why envoy is in the mix on cloudweb2002-dev. Is it to support TLS between the cdn and the host?
[16:30:29] yes
[16:35:34] then I'm missing something important. If traffic from the CDN is on TLS, then how is that different (from the server's perspective) from traffic coming from the outside internet?
[16:49:42] since it's internal traffic we can use an internal ca (which requires less setup) for the certificate instead of getting publicly trusted certificates for it
[17:16:13] ah, ok -- so not a reason my current setup wouldn't work. makes sense, thanks
[17:17:14] no, your current setup will not work because the additional config needed for getting the LE certs isn't there
[17:18:54] yep, i'm caught up I think
[17:30:13] I stumbled on https://github.com/digitalocean/clusterlint today because DO sent an email to RelEng about a linter warning for the next forced k8s cluster upgrade. I haven't poked around to guess if their existing linter rules would be useful for our clusters, but it is at least an interesting idea to look into.
[18:11:13] I did some more research today on tools-db disk usage, the file size of ibdata1 is still increasing, but very slowly for now https://phabricator.wikimedia.org/T409716#11396763
[18:12:38] I claimed T409404 and will work on it next week
[18:12:38] T409404: Add filesystem space alerts for tools(db) - https://phabricator.wikimedia.org/T409404
[18:13:10] fixing the root cause is T409857
[18:13:10] T409857: [toolsdb] Automatically terminate long transactions - https://phabricator.wikimedia.org/T409857
[18:34:38] * dhinus off, enjoy your weekend!
[18:35:10] ack, thx for the update
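
Two typing points came up in the log: `String` vs `String[1]` at 15:03:04, and the `"1107"` to `1107` catalog change at 12:04:04. The sketch below is a rough Python model of both, not Puppet code. The helpers `is_string` and `is_integer` are hypothetical stand-ins for Puppet's `String[min]` and `Integer` type checks and only approximate how Puppet evaluates them.

```python
def is_string(value, min_length=0):
    """Rough model of Puppet's String[min]: a str with at least min_length characters."""
    return isinstance(value, str) and len(value) >= min_length


def is_integer(value):
    """Rough model of Puppet's Integer; bool is excluded because it subclasses int in Python."""
    return isinstance(value, int) and not isinstance(value, bool)


checks = {
    "'' matches String": is_string(""),              # any string, including empty
    "'' matches String[1]": is_string("", 1),        # empty string is rejected
    "'key' matches String[1]": is_string("key", 1),  # non-empty string is accepted
    "'1107' matches Integer": is_integer("1107"),    # quoted VLAN ID is a string
    "1107 matches Integer": is_integer(1107),        # unquoted VLAN ID is an integer
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

In this model an empty-string hiera value passes a `String` parameter but fails a `String[1]` one, which matches the advice in the log to drop the empty values rather than keep them as placeholders.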