[08:34:15] volans: can I ask you a few questions about dns/netbox?
[08:34:47] I have some records for addresses in the wikimediacloud.org domain that I'm no longer sure how they are supposed to be managed
[08:34:53] arturo: sure, ask away
[08:35:08] you picked the right domain :)
[08:35:25] heh
[08:35:35] for example this one shows up in netbox: https://netbox.wikimedia.org/ipam/ip-addresses/7044/
[08:36:01] us neither, hence we left them as manual yesterday and I wanted to follow up with your team on what we want to do there
[08:36:35] ok!
[08:37:05] what are the options?
[08:37:23] depending on what your plans for that domain are and how it's supposed to be managed
[08:37:38] what's the context of that domain? can you give me some more info?
[08:39:04] the domain represents the WMCS infra, or let's say, the public bits of the CloudVPS infra
[08:39:26] for example, openstack.eqiad1.wikimediacloud.org is the service address for openstack APIs
[08:39:38] ok
[08:39:43] ns0.openstack.eqiad1.wikimediacloud.org is the address of the auth DNS server in openstack
[08:40:05] nat.openstack.eqiad1.wikimediacloud.org is the address of the egress NAT in cloudvps
[08:40:08] that kind of stuff
[08:40:20] (refer to templates/wikimediacloud.org for more)
[08:40:23] so, as they are part of the shared public space, the allocation of new IPs should be done in Netbox
[08:40:51] it depends, some addresses are in WMCS-specific CIDRs
[08:41:25] for example:
[08:41:28] XioNoX might be interested too ^^
[08:41:29] ± host nat.openstack.eqiad1.wikimediacloud.org
[08:41:29] nat.openstack.eqiad1.wikimediacloud.org has address 185.15.56.1
[08:41:36] that one is our own range
[08:41:52] got it
[08:43:03] I think we have some docs on how we are organizing our several domains, let me search for it
[08:43:44] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS#Domain_names
[08:44:34] volans: where are the current records? is it possible to see the full list to have an idea?
[08:44:50] templates/wikimediacloud.org
[08:46:03] XioNoX: https://netbox.wikimedia.org/search/?q=wikimediacloud.org
[08:46:30] the eqiad ones that we had we moved to "manual" to avoid messing with yesterday's migration
[08:48:24] ah yeah it's not many
[08:49:00] no strong opinion :)
[08:49:14] they're not going to change often so it would be fine keeping them manual
[08:49:31] on the other hand, the more we have in NB, the better
[08:49:51] XioNoX: we have 2 types I think, those that are in our shared spaces and those in their dedicated subnets
[08:50:07] for the former I think we need them to be in Netbox from the IPAM point of view
[08:50:18] the DNS can be either manual or automatic
[08:50:40] all addresses are in Netbox anyway
[08:50:59] arturo: I don't see them
[08:51:03] https://netbox.wikimedia.org/ipam/prefixes/2/ip-addresses/
[08:51:05] it's empty
[08:53:29] well, I meant the CIDRs, they are in netbox
[08:53:52] if you were thinking of specific address objects associated with NICs/servers, then I don't think we have that
[08:54:23] anything that was in puppetdb as of Sep. 14th was imported automatically
[08:55:00] some virtual addresses aren't in netbox, like the NAT address and some of them used by Neutron
[08:59:24] volans: also those https://netbox.wikimedia.org/ipam/prefixes/4/ip-addresses/
[09:26:22] moritzm: re: T266023, what component should the vendor-built deps be uploaded to in the short-term?
[09:26:22] T266023: orchestrator: Get packages into WMF apt - https://phabricator.wikimedia.org/T266023
[09:35:17] volans: any conclusion?
[09:35:43] I think the easiest way for now is to keep manually updating stuff in ops/dns.git
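The manually managed records under discussion live in templates/wikimediacloud.org in ops/dns.git. A fragment of that file might look roughly like the following — the names and addresses are the ones quoted in this conversation, but the TTLs and exact layout are illustrative, not the real file:

```
; templates/wikimediacloud.org -- illustrative fragment only
openstack.eqiad1        1H  IN CNAME  cloudcontrol1003.wikimedia.org.
ns1.openstack.eqiad1    1H  IN A      208.80.154.11
nat.openstack.eqiad1    1H  IN A      185.15.56.1
```

Note how a single subdomain mixes addresses from the shared production space (208.80.154.0/26) and the WMCS-owned range (185.15.56.0/25), which is the crux of the discussion that follows.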
[09:36:10] for the dns yes, but any IP in the shared spaces must be allocated in netbox
[09:36:17] for IPAM purposes
[09:36:46] to mark manual DNS management for now we've used the description, see for example:
[09:36:49] https://netbox.wikimedia.org/ipam/ip-addresses/3522/
[09:37:02] empty DNS Name and description filled in, but it's meant to track exceptions
[09:37:07] not as the normal way
[09:39:17] kormat: all packages synced from an external repo need to end up in a thirdparty/foo component, so maybe thirdparty/orchestrator?
[09:39:58] volans: ACK
[09:40:20] the relevant files are in modules/aptrepo/files: distributions-wikimedia to add the component, and updates for the sync definition
[09:41:08] arturo: feel free to add me to the dns patch
[09:41:40] ack
[09:42:11] moritzm: i assumed `updates` required a remote apt repo to pull from
[09:44:37] ah, to clarify: i don't think there is an external _repo_. there's just .deb files in the github release
[09:45:28] ah, gotcha. I thought there was an actual repo. in that case you only need the patch for distributions-wikimedia
[09:46:49] i'm guessing that `thirdparty` implies syncing from a remote repo; i was thinking of adding a `component/orchestrator`. but our docs aren't very clear on the conventions
[09:48:02] https://wikitech.wikimedia.org/wiki/APT_repository#Repository_Structure says
[09:48:09] > All other components using the thirdparty/ prefix are synchronised from external repositories
[09:48:28] it's a bit of a corner case: thirdparty/* is for anything imported without any change/rebuild from an external party
[09:48:52] the idea is to be able (at some point) to cross-check what's in thirdparty/* with the respective upstream source
[09:49:11] which won't work here. but it still falls under the "unmodified" umbrella
[09:49:28] ah hah. alright, `thirdparty/orchestrator` it is.
[09:50:33] ack, great
[09:54:39] question. If I want to add something to alertmanager, how do I do it?
[09:55:18] https://wikitech.wikimedia.org/wiki/Alertmanager isn't helping :P
[09:57:50] volans: https://gerrit.wikimedia.org/r/c/operations/dns/+/635965
[09:57:58] akosiaris: why would documentation _help_?
[09:58:05] akosiaris: ask in -observability ;)
[09:58:29] E_TOO_MANY_CHANNELS
[10:16:40] arturo: but why are some wikimediacloud.org records in the shared prefixes? Wouldn't it be more logical and correct to have them as wikimedia.org and keep the cloud domain for cloud-dedicated subnets?
[10:17:07] ns recursors and gw specifically
[10:18:14] well, the ns recursors are specifically openstack virtual resources, the server is managed by openstack designate. It's a software-defined thing, and from that point of view it makes sense to have it live in the wikimediacloud.org domain
[10:18:20] same for cloudinstances2b-gw
[10:20:01] volans: ^^^
[10:20:51] but they are in the publicN-ROW-DC subnets, and those are all wikimedia.org domains
[10:21:47] then perhaps the address should change
[10:21:48] not sure it's sane to have those mixups, I'm worried it will hit us exactly in those kinds of things like automation and separation between WMCS and prod
[10:22:09] welcome to the interesting topic of prod <-> cloud separation
[10:22:14] you want a coffee?
[10:22:19] or popcorn? :-)
[10:22:25] I know I know, it's a long story
[10:22:32] chocolate thanks :)
[10:22:54] * arturo searches for cloud-native hot chocolate :-P
[10:23:54] you are not wrong volans. There are many blurred lines everywhere, that's the truth, and we simply cannot fix all of them at the same time
[10:25:14] sure, not at the same time, one at a time, possibly heading in the right direction, improving the situation instead of complicating it even more ;)
[10:26:02] in particular, all this domain thing was decided back in the Ireland offsite; it was decided back then to introduce the wikimediacloud.org domain, what kind of addresses it would hold, and how the domain would be hosted
[10:26:38] that meeting was recorded in this wikitech page https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/DNS_domain_usage#Resolution
[10:27:16] I don't see any mention of IP space addresses though
[10:28:32] it is derived from the info. We serve openstack services and APIs from the production address space
[10:28:52] example
[10:28:55] host openstack.eqiad1.wikimediacloud.org
[10:28:55] openstack.eqiad1.wikimediacloud.org is an alias for cloudcontrol1003.wikimedia.org.
[10:28:55] cloudcontrol1003.wikimedia.org has address 208.80.154.23
[10:28:55] cloudcontrol1003.wikimedia.org has IPv6 address 2620:0:861:1:208:80:154:23
[10:29:19] host ns1.openstack.eqiad1.wikimediacloud.org
[10:29:19] ns1.openstack.eqiad1.wikimediacloud.org has address 208.80.154.11
[10:29:19] ns1.openstack.eqiad1.wikimediacloud.org has IPv6 address 2620:0:861:1:208:80:154:11
[10:30:27] I suspect you see all this as a big snowflake to the netbox automation plans, right?
[10:30:35] but earlier you mentioned that nat.openstack.eqiad1.wikimediacloud.org is 185.15.56.1
[10:30:53] so what's the separation there? they are in the same subdomain AFAICT
[10:31:48] your concern is that we have addresses in 2 different realms being resolved by the same domain?
[10:33:10] my concern is that if we consider wikimediacloud.org the "infra" part of it and manage it via netbox like all the rest of the infra, and it has addresses in different subnets, then we need to manage the reverse records of those subnets too; but if part of the records are in a subnet we decided not to manage because it's cloud-specific, that creates an issue
[10:33:24] because we can manage only one side of the records, and not both direct/reverse
[10:33:33] I see
[10:35:45] I think there is only one problematic range/CIDR here
[10:36:20] the one used for floating IPs and the general egress NAT is mixed in the current network model
[10:36:42] but in the new network model we are currently developing, that mixture disappears
[10:37:29] so the domain for floating IPs will be 100% virtually managed in designate, and the egress NAT will be 100% physical infra and managed by netbox
[10:37:55] take 185.15.56.0/25 for example
[10:38:33] that is the CIDR we use in eqiad1 (old network model) for both floating IPs and the egress network
[10:38:55] and that generates this weird thing:
[10:38:57] https://www.irccloud.com/pastebin/1StXL9Ky/
[10:40:07] the PoC for the new model in the codfw1dev deployment (the patch that you just reviewed) separates the CIDR for floating IPs and the egress network
[10:41:01] (or at least, that's what I think, I never really evaluated this new model from the DNS hierarchy point of view before)
[10:41:58] yes, I think what I'm saying is true
[10:42:08] lol :)
[10:42:10] the edge network now uses a separate CIDR from floating IPs
[10:42:35] this is complex :-)
[10:42:45] I can imagine
[10:43:11] I will make a note here to make sense of the DNS setup in the new network model volans
[10:43:20] thanks
[10:43:43] and will try to make a diagram or two for us to have a better shared understanding of this topic
[10:43:57] but basically, after this brief review, I think this is moving in the right direction
[10:44:18] in the right direction to address your DNS management concerns
[10:44:28] * arturo brb
[10:46:49] thanks for discussing this -- I think it'd help to have this captured somewhere like a task
[10:47:32] I don't know what "the edge network now uses a separate CIDR from floating IPs" means exactly, for example; "cidr" can be ambiguous
[10:47:46] indeed, worth capturing this
[11:12:12] basically, in the current model (the one in use by eqiad1) the egress NAT address (routing_source_ip) was an address taken (or reserved) in the floating IP CIDR (185.15.56.0/25 https://netbox.wikimedia.org/ipam/prefixes/2/). So the PTR records for that CIDR were mixed: one managed by hand by us the WMCS team (nat.openstack.eqiad1.wikimediacloud.org) and the rest auto-generated by designate for each floating IP
[11:12:24] that's the mixed model we are moving away from now in the codfw1dev PoC with cloudgw
[11:12:58] in the PoC new model, the routing_source_ip address belongs to a different CIDR, not shared with the floating IP subnet
[11:13:08] in this context, subnet == CIDR
[11:25:29] no, what I'm saying might not be true
[11:25:41] -_-U
[11:34:00] it's ok :) let's think this through in a task or something
[11:34:43] I'd also prefer to focus on the ~now rather than PoC/experiments and/or future states (although it does help to think of the future as well!)
[11:50:04] ack
[11:56:46] * arturo creates https://phabricator.wikimedia.org/T266331
[12:14:26] thx
[12:51:47] is there a convention for how to inject passwords into config files in puppet? pretty much every call to `secret()` in puppet is used where the value is the entire content of the file
[12:53:15] https://phabricator.wikimedia.org/P13056 is what i'm trying to generate, for context
[12:54:25] kormat: see also passwords::
[12:54:40] we have 3 ways to inject secrets
[12:54:44] at least one is deprecated
[12:55:01] and the deprecation rotates, right? :)
[12:55:10] based on the day of the week, yes
[12:55:46] jokes aside, IIRC the passwords:: stuff is deprecated; hiera and secret() are ok
[12:55:52] secret() for files, hiera for single values
[12:56:32] I'd simply use a value stored in private Hiera
[12:56:36] I guess we should write it down in the README inside the repo at some point :-P
[12:56:50] ahh. i didn't notice there was a private-repo hiera
[12:56:57] so /srv/private/hiera/role/orchestrator.yaml on pm1001
[12:57:08] is it allowed to look that up from modules, by any chance?
[12:57:19] or do i need to look it up in a profile and then pass the password(s) as parameters to the module?
[12:57:24] the latter
[12:57:37] ok. passing passwords around makes me uneasy, but 🤷‍♀️
[12:58:04] Even if it is 1234?
[12:58:06] they're not being passed via RPC :-P
[12:58:12] marostegui: ****?
[12:58:22] I thought we agreed on using 1234
[12:58:23] volans: they're logged extensively
[12:58:29] in your puppet code you can simply set some stub value like changeme or so
[12:59:10] kormat: one thing we sometimes do is add the hiera key to the public repo too, commented out, with a note that it's in the private repo
[12:59:35] as well as adding it to the public 'private' repo, i suppose.
[12:59:39] ofc
[13:21:20] mm. how long after submitting a change to labs/private.git should i expect it to be visible in pcc?
[13:22:46] kormat: I remember there could be a puppet merge specific to labs?
[13:22:54] but maybe I am mixing things up
[13:23:01] i ran puppet-merge, it picked up the change
[13:23:09] then i ran pcc, which failed
[13:23:22] yeah, I was right
[13:23:25] see puppet-merge --help
[13:23:30] --labsprivate
[13:23:44] although not sure if included by default?
[13:23:55] I am guessing no based on your feedback
[13:23:57] it must be, because as i said, puppet-merge picked up my change
[13:24:07] ah, then it is not that
[13:24:15] but doesn't hurt trying?
[13:24:39] hah. it's very broken.
[13:24:47] I am not sure, but I remember finding myself in your situation
[13:24:59] https://phabricator.wikimedia.org/P13059
[13:25:15] heh
[13:25:40] jbond42: do you have any idea?
[13:25:44] is it a new host, maybe puppet facts are outdated on the test hosts?
[13:25:57] when it is a new host I have to manually update them
[13:25:58] the host is newish, but pcc worked earlier
[13:26:10] https://puppet-compiler.wmflabs.org/compiler1001/26097/dborch1001.eqiad.wmnet/change.dborch1001.eqiad.wmnet.err
[13:26:11] then I think I ran out of ideas
[13:26:16] yeah, let me see
[13:26:43] ohh. i think i put my .yaml in the wrong dir
[13:26:46] where was it defined?
[13:26:52] yeah, that was the next step
[13:26:59] i think it should have been hieradata/common/orchestrator.yaml
[13:27:08] er, hieradata/role/common/orch..
[13:27:32] i listened to moritzm, that was my downfall. ;)
[13:29:37] kormat: the error in the paste looks like the same error that triggered me to create https://gerrit.wikimedia.org/r/c/operations/puppet/+/630897
[13:29:56] however a standard puppet-merge should have been enough, and as you saw the change it sounds like it was
[13:31:04] and if the role is orchestrator then yes, it should be role/common; common/orchestrator would be for the class orchestrator, which we shouldn't really use
[13:31:28] anyway, sounds like you may have resolved this now?
[13:31:28] jbond42: mind double-checking https://gerrit.wikimedia.org/r/c/labs/private/+/636019 ?
[13:31:34] yes will do
[13:32:54] kormat: yes, that looks good to me
[13:33:02] great, thanks :)
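The direct/reverse management coupling that drove the earlier DNS discussion can be illustrated with a small offline sketch: given a list of forward (A) records managed in one place, flag any address whose PTR record is not managed in the same place. All names and addresses are the ones quoted in the conversation; the "managed PTR" list is deliberately incomplete to show a mismatch, and none of this queries live DNS:

```shell
#!/bin/sh
# Toy consistency check: every forward record we manage by hand should
# have its PTR managed by hand too; otherwise we only control one side.

forward='nat.openstack.eqiad1.wikimediacloud.org 185.15.56.1
ns1.openstack.eqiad1.wikimediacloud.org 208.80.154.11'

# PTRs managed by hand. The floating-IP PTRs are generated by designate
# instead, which is exactly the mixed situation described at 11:12.
reverse='185.15.56.1 nat.openstack.eqiad1.wikimediacloud.org'

echo "$forward" | while read -r name ip; do
    ptr=$(echo "$reverse" | awk -v ip="$ip" '$1 == ip {print $2}')
    if [ "$ptr" = "$name" ]; then
        echo "OK    $name <-> $ip"
    else
        echo "MIXED $name -> $ip (PTR not managed here)"
    fi
done
```

A real check would read the zone templates in ops/dns.git and the Netbox IPAM export instead of hardcoded strings, but the shape of the comparison is the same.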