[09:19:10] topranks: good morning, is this alert "known"?
[09:19:12] https://usercontent.irccloud-cdn.com/file/FPdgSjng/image.png
[09:20:26] arturo: hmm no I missed that
[09:20:34] * topranks looking
[09:20:37] ok, thanks
[09:21:34] blancadesal: you are next on clinic duty rotation, starting today, is that OK for you?
[09:27:32] arturo: thanks for the reminder!
[09:27:38] np
[09:27:43] topranks: there was a recovery for the alerts
[09:29:39] arturo: yeah, it's the same link we had problems with yesterday
[09:29:52] it bounced up/down around midnight UTC
[09:29:58] https://www.irccloud.com/pastebin/AouSVsou/
[09:30:45] One of the BFD sessions didn't re-establish, although BGP did come back up (this is a known occasional thing - it's typically not a problem clearing the session fixes it)
[09:30:51] I cleared the session there to restore it
[09:30:56] but either way the link isn't being used
[09:31:26] Valerie is going to swap out those optics today hopefully it'll remain stable after
[09:31:51] I'll probably leave the link drained over the weekend after the swap, we can "repool" next week providing it stays stable
[09:36:26] ok, sounds good
[09:36:31] thanks for working on that
[09:47:26] * arturo errand, back in 20m
[13:21:51] arturo: it took me a while, but I managed to get recordset import working in tofu!
[13:21:53] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/135
[13:22:22] 🎉!
[13:22:26] your upstream patch was included in the latest provider release
[13:22:30] and it seems to be working
[13:22:51] oh, wait import
[13:23:03] that is really nice indeed
[13:23:21] we no longer have to delete and recreate
[13:24:01] nice
[13:27:26] +1'd
[13:27:36] I pushed a small fix for the "description" field
[13:27:42] re-running plan
[13:28:02] hopefully `apply` after merge wont report unexpected API problems
[13:28:33] fingers crossed
[13:28:47] let me try to add the "NS" and "SOA" records as well
[13:29:01] the NS and SOA records will most likely fail
[13:29:32] well, not if you import them
[13:29:41] they will fail to be updated
[13:29:50] see https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/OpenTofu#DNS_records_of_type_NS_in_root_zone_cannot_be_updated
[13:30:04] yes I remember, but importing should work (hopefully)
[13:30:19] I see the description loses the `tofu-infra` mention
[13:30:30] hmm I was expecting it was appended
[13:30:55] possibly this is missing
[13:30:56] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/blob/main/modules/secgroups/main.tf?ref_type=heads#L21
[13:31:16] yes we need something similar
[13:31:18] I'll fix it
[13:31:49] mmm we actually have this
[13:31:50] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/blob/main/modules/records/main.tf?ref_type=heads#L28
[13:31:56] maybe the logic is not working as expected
[13:33:48] in the latest plan seems to be correct
[13:34:56] ok, I think the previous one was just missing the custom description
[13:35:03] in the tofu file
[13:35:13] I added it in the latest commit
[13:35:22] ok
[13:35:24] I'm pushing the NS and SOA, let's see if they work
[13:35:42] 🚢 🇮🇹
[13:37:53] it's working but I need to add the ttl
[13:39:42] our default ttl doesn't match the one in the API?
[13:40:27] SOA and NS have a different one
[13:40:41] so it must be specified, the default ones are matching
[13:40:45] ok
[13:40:49] *the ones with the default value
[13:41:06] are you running apply for the MR?
[13:41:08] look at the latest diff and plan, it should be good to go
[13:41:21] I've just run another plan, and I think it's ready for apply
[13:41:28] I mean, running apply without merging first?
[13:41:56] no
[13:42:00] I would merge first
[13:42:06] wdyt?
[13:42:22] oh, so the plan has the resources because they are imported
[13:42:27] yes
[13:42:37] was confused, sorry
[13:42:52] "Plan: 5 to import, 0 to add"
[13:43:08] apply should only change the description adding "managed by tofu-infra
[13:43:31] I don't think the NS record can be updated even for the description
[13:43:35] but we shall see :-)
[13:43:42] ship it!
[13:44:12] ha, good point.
[13:45:05] should I try apply before merging?
[13:45:35] no, that would mess the plan for other MRs :-(
[13:45:41] ok! I was reading yesterday a good post about pros and cons of apply-and-merge vs merge-and-apply :)
[13:46:04] I would either merge as it is, and be ready to follow up, or drop the NS/SOA from that particular MR
[13:46:12] and the post was agreeing it's generally best to merge-and-apply, but some people prefer the other way
[13:46:27] I will merge and apply, if it fails I will remove SOA/NS from a follow-up MR
[13:46:53] ok
[13:49:14] as expected :( "Updating SOA recordsets is not allowed"
[13:49:27] and "Updating a root zone NS record is not allowed"
[13:50:30] ok
[13:50:47] you have two options
[13:50:59] delete the records using the CLI and let tofu recreate them
[13:51:05] or drop the records from tofu-infra
[13:51:24] I am not sure what is best
[13:52:13] yeah good question
[13:53:00] for some zones I have recreated as you can see, but what is the point in keeping something in tofu that can't be updated
[13:53:15] I kinda like tracking them in tofu, because you can see them in gitlab, grep for values, etc.
[13:53:16] the point might be to make it explicit what is the content/value of the records
[13:53:20] but not a huge advantage
[13:53:21] yeah
[13:53:58] you may think: why would we update the NS records
[13:54:11] well, we have done that at least twice since my time here :-)
[13:54:32] :)
[13:55:43] I think I'm in favour of drop+recreate. if we need to update the NS records, that becomes a tofu MR with another drop+recreate
[13:55:57] ok, go for it then!
[13:56:04] they also have a very low TTL so that should help
[13:57:54] "Deleting a SOA recordset is not allowed"
[13:58:04] :-S
[13:59:24] yeah I don't think SOAs are in tofu-infra at all
[14:00:01] we may want to add a warning to the docs
[14:00:03] so maybe only NS
[14:01:16] * arturo nursery run and food time
[14:17:40] I see that both NS and SOA records are created automatically when you create a zone
[14:18:40] but I can't find any official docs on how you should modify them if you ever need to
[14:29:03] I think I changed my mind and I will just remove both from tofu-infra for now
[14:36:20] done in https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/136
[14:36:39] I had to manually "tofu state rm" to remove the 2 records from the tfstate
[14:41:56] dhinus: top-level dns records have to be created using wmcs-makedomain which does some juggling with ownership. So they probably can't be modified much, just deleted/replaced.
[14:46:13] andrewbogott: thanks, where is the source of truth for the NS values?
[14:47:06] designate
[14:47:12] unless that's not what you mean
[14:47:46] it looks like if you create a new zone with "openstack zone create", the NS record is created automatically
[14:47:59] but it's not clear to me where the values for that record are coming from
[14:49:26] ah, I see.
[14:49:47] I don't know off the top of my head but I would assume inherited from the parent zone?
[14:49:52] Or possibly in designate.conf
[14:50:12] yeah makes sense. I was just curious, not really an issue
[14:50:27] ok :)
[14:50:42] I can research if you need to actually accomplish something :)
[14:51:08] not really, I was just trying to understand the lifecycle of those records, as arturo mentioned they were modified in the past
[14:51:45] I think it's fine to keep them out of tofu-infra for now, and just pretend they don't exist :)
[14:54:27] separate q: do you think I can use "openstack zone transfer request" to migrate zones from noauth-project to cloudinfra? or should I delete the old ones and recreate them in the new project? (the second route is more risky because there could be some temporary resolution failures)
[15:27:03] yeah zone transfer
[15:28:22] which reminds me that there is no 'clean' workflow here with tofu-infra for creating sub-zones
[15:28:31] as we need the transfer dancing
[15:28:43] I think there is a ticket opened somewhere
[15:30:00] https://phabricator.wikimedia.org/T376110#10261854
[15:31:02] we will need to figure that out when we get to that ticket
[15:33:21] annoying :/
[15:34:17] I will try a "transfer request" for db.svc.eqiad.wmflabs. (T380491)
[15:34:17] T380491: Migrate "db.svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380491
[15:34:34] there is one single record we care about in that zone, but it's an important one (toolsdb)
[15:36:03] actually let's play it safe, I will do it on Monday morning just before the toolsdb upgrade
[15:55:00] ok!
[15:55:43] with a bit of luck, we will be able to 'hide' all the zone transfer dancing in a single module, and just have something like `transfer_from: whatever` in the YAML
[15:59:30] andrewbogott: is cloudcontrol1005 a new server?
[16:00:19] arturo: no, I'm building a new image there and must've done it on the wrong partition.
[16:00:22] I'll clean up shortly.
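Editor's sketch of the `transfer_from` idea floated at 15:55:43, in case it helps whoever picks up the ticket. This is only an illustration in Python: the config shape (`name`, `project`, optional `transfer_from`) and the step labels are hypothetical, and nothing here is an actual tofu-infra module or OpenStack call. The zone and project names are the ones mentioned in the log.

```python
def plan_zone_steps(zone):
    """Return ordered, human-readable steps for one zone config entry.

    `zone` is a hypothetical dict: {"name": ..., "project": ..., "transfer_from": ...}.
    The returned strings are labels for discussion, not real commands.
    """
    name = zone["name"]
    target = zone["project"]
    source = zone.get("transfer_from")

    # No transfer_from: the zone is simply created in the target project.
    if source is None:
        return [f"create zone {name} in project {target}"]

    # Transferring a zone to the project that already owns it makes no sense.
    if source == target:
        raise ValueError(f"{name}: transfer_from equals the target project")

    # The "transfer dancing": request in the source project, accept in the target.
    return [
        f"create transfer request for {name} in project {source}",
        f"accept transfer request for {name} in project {target}",
    ]


steps = plan_zone_steps(
    {
        "name": "db.svc.eqiad.wmflabs.",
        "project": "cloudinfra",
        "transfer_from": "noauth-project",
    }
)
for step in steps:
    print(step)
```

Keeping the decision in one function mirrors the hope of hiding the transfer inside a single module, so the YAML only needs one extra key.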
[16:01:49] there are also a bunch of IO errors from the nbd partition from the VM
[16:01:53] I'll ACK the alert
[16:02:04] thanks
[16:10:53] I'm moving to 1006 which seems to have more space
[16:12:29] topranks: I would like to merge this next monday if that's OK for you: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/132
[16:48:50] * arturo offline
[17:19:04] arturo: sorry for the delay it looks good to me I was looking earlier but didn't quite get through all of it
[17:19:12] I'll respond this evening before I sign off
[17:20:08] also heads up folks Valerie is on site and is going to replace the optics either side of that link between E4 and D5
[17:20:19] I'm going to downtime the switches to silence any alerts from them for the next hour
[18:43:19] dhinus, silly question: On cloudcumin1002, how do I get cumin to match cloud-vps hostnames? By default it matches prod hosts it seems, and if I specify O{} it doesn't seem to want to do pattern matching...
[18:43:27] I just want all VMs with name like "overload*"
[18:45:44] * andrewbogott figures out that the answer is maybe O{name:"overload*"}
[18:45:53] npe
[18:45:54] nope
[18:47:05] wait, yes
[18:47:07] minus typos
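Editor's note on the intent behind 'all VMs with name like "overload*"': read as a shell-style glob, that selects names starting with `overload`. The Python sketch below shows only that intent, using the standard `fnmatch` module and made-up VM names. The log does not say how cumin's `O{name:...}` actually interprets the pattern (glob, regex, or substring), so this should not be read as a description of cumin's behaviour.

```python
from fnmatch import fnmatchcase


def match_vms(names, pattern):
    """Return the names matching a shell-style glob, preserving input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical VM names, for illustration only.
vms = ["overload-1", "overload-test", "tools-overload", "bastion-1"]

selected = match_vms(vms, "overload*")
print(selected)
```

`fnmatchcase` is used rather than `fnmatch` so that matching is case-sensitive on every platform.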