[12:12:48] topranks: do you have time to look at the cloud-private ipv6 rollout today?
[12:13:03] taavi: yep sure
[12:13:38] starting in codfw I assume?
[12:13:50] yeah definitely
[12:15:51] so I think the order is roughly:
[12:15:53] 1) Assign IPv6 IPs for all the hosts in Netbox
[12:15:54] 2) Add the IPs to the relevant host interfaces with cumin (and static routes)
[12:15:54] 3) Add the DNS names for the hosts in Netbox
[12:16:00] does that make sense?
[12:16:43] yep
[12:17:00] https://netbox.wikimedia.org/ipam/prefixes/657/ip-addresses/ https://netbox.wikimedia.org/ipam/prefixes/1108/ip-addresses/
[12:17:23] having the last octet match is probably a good convention here
[12:19:16] do you have some script ready for assigning those addresses?
[12:20:14] leave it with me for a few mins should be easy enough to make one
[12:20:39] cool, I'll look at figuring out the cumin commands in the meantime
[12:20:52] yeah if you can figure out the globbing that would be good
[12:31:34] something like this should work for codfw1dev https://phabricator.wikimedia.org/P77152
[12:53:28] taavi: thanks, I commented back on the paste overall looks good, just a few questions on the exact routes we need to include
[12:59:54] topranks: commented, those routes are what the current puppet code would provision anyway
[13:01:22] taavi: ok thanks
[13:01:46] yeah so I think the only issue in Netbox is this network isn't used, octavia-mgmt is instead using the one from the paste
[13:01:46] https://netbox.wikimedia.org/ipam/prefixes/1207/
[13:02:02] ^^ I think we can delete this? not urgent we can check with andrew if there is a doubt
[13:03:42] i deleted it, https://netbox.wikimedia.org/ipam/prefixes/1208/ is the one actually in use
[13:06:23] ok nice
[13:06:53] taavi: I think we can probably add that to hosts, I had to help out with something else so I'm a bit delayed adding them to netbox
[13:07:11] but possibly we can add to hosts, then add to netbox with the dns_name all at once?
[13:07:54] i guess so
[13:08:10] ok, minor problem: when trying to run the command via cumin, the shell thing runs on the cumin host and not on the target
[13:08:11] sigh
[13:11:19] oh right - like the bash expansion and stuff?
[13:11:28] yeah, the bash expansion runs too early
[13:12:52] we could do via ssh or something perhaps
[13:13:07] or make a script and scp it to them all then execute that with cumin
[13:13:55] the ugly but working hack is to base64 encode the command, then turn the cumin command into 'echo blalbalala== | base64 -d | bash'
[13:14:34] that's smart... if ugly :)
[13:14:40] works for me
[13:15:13] nice little trick I never thought of that any of the times I hit similar things before :)
[13:17:42] ok, now trying to run that on cloudcontrol2004-dev fails with 'Error: either "local" is duplicate, or "vlan2151" is a garbage.'
[13:18:15] oops, missing 'dev' I think
[13:19:45] certainly seen that class of error due to missing 'dev'
[13:19:48] 'is a garbage'?
[13:20:23] seems like that worked on cloudcontrol2004-dev
[13:20:27] iproute2 is annoying in that sometimes it requires 'dev' explicitly, sometimes it's optional
[13:20:57] ready for me to run that on all cloud-private nodes in codfw1dev?
[13:30:50] ^ topranks: any reason not to continue?
[13:31:15] taavi: sry, yep I think we are good to go
[13:31:33] cool, doing
[13:32:07] done!
[13:32:27] looks ok
[13:32:30] https://www.irccloud.com/pastebin/0EFep0Cf/
[13:33:18] yeah, the nodes can now ping each other
[13:33:31] next up is adding the DNS records I think
[13:33:35] ok...
[13:33:45] let me get back to the script
[14:10:34] taavi: ok the names are in netbox now
[14:10:35] https://netbox.wikimedia.org/ipam/prefixes/1108/ip-addresses/
[14:11:37] thanks! next up is running the dns cookbook I think?
[14:11:59] also why are a few of those not attached to the interfaces? same thing with some v4 addresses for some reason
[14:12:10] yep unless there is any reason to hold off?
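(Note on the base64 workaround from 13:13:55: a minimal sketch of how such a wrapped command could be built. The helper name `wrap_for_remote_bash` and the sample script are hypothetical and are not the commands from the paste; the only part taken from the log is the `echo ... | base64 -d | bash` shape. Because the base64 alphabet has no characters the shell expands, the payload passes through the local shell untouched and is only interpreted by bash on the target.)

```python
import base64


def wrap_for_remote_bash(script: str) -> str:
    """Return a one-liner that decodes and runs `script` on the target host.

    The script is base64-encoded so that `$(...)`, `$VAR`, globs and quotes
    are not expanded by the shell on the host issuing the command.
    """
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return f"echo {encoded} | base64 -d | bash"


# Hypothetical script: the command substitution must run on the target.
script = 'echo "running on $(hostname -f)"'
wrapped = wrap_for_remote_bash(script)
print(wrapped)
```

The printed string is what would be passed to cumin as the command; decoding the payload on the target yields the original script byte for byte.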
[14:12:17] yeah I just copied the v4 assignment
[14:12:41] want to run the cookbook or should I?
[14:12:53] we can run the puppetdb address import script to fix that up for cloudnet2006 and cloudlb2004
[14:13:08] taavi: if you want to run it I'll fix that assignment thing in netbox
[14:13:15] will do
[14:14:40] taavi: wait!
[14:15:12] sure, why?
[14:15:40] taavi: no nevermind sorry being paranoid
[14:15:58] I ran the script to do that import and confused myself - thought we missed an address
[14:16:39] https://phabricator.wikimedia.org/P77180
[14:16:42] the diff looks good to me
[14:16:52] let me see
[14:19:58] taavi: ok looks good to me
[14:22:45] done, and then I manually wiped the DNS cache with 'sudo cookbook sre.dns.wipe-cache -s codfw "private.codfw.wikimedia.cloud$"'
[14:23:15] puppet seems to be correctly persisting the config and changing firewall rules etc now, which is great
[14:23:36] great!
[14:26:05] let's do the same for eqiad1 like early next week or so if we don't find anything broken until then?
[14:26:11] (and thanks for your help as always!!)
[14:28:22] sounds like a plan yep!
[14:36:55] topranks: is "BGP session for cloudlb200[34]-dev is down" related to what you're doing?
[14:37:23] it should not be but co-incidence is large
[14:37:48] I'll have a look
[14:37:51] the bgp session is up on cloudlb2003
[14:37:59] andrewbogott: where are you seeing that?
[14:38:12] alert manager
[14:39:04] ah whoops
[14:39:24] I enabled the bgp sessions for the new IPv6 addresses but forgot to move `profile::bird::do_ipv6: true` from individual host hiera to role hiera
[14:39:25] * taavi fixes
[14:41:35] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1154040
[14:41:45] pcc still running
[15:00:59] hey
[15:01:10] sorry a bird flew into the house here i've been having fun...
[15:01:26] so the v6 sessions aren't configured on the cloudsw we'll need to do that I take it?
[15:03:39] taavi: ah I see you added them
[15:03:41] cool looks good
[15:04:40] cmooney@cloudvirt2006-dev:~$ ping 2a02:ec80:a100:4000::1
[15:04:40] PING 2a02:ec80:a100:4000::1(2a02:ec80:a100:4000::1) 56 data bytes
[15:04:40] 64 bytes from 2a02:ec80:a100:4000::1: icmp_seq=1 ttl=61 time=0.386 ms
[15:04:41] 64 bytes from 2a02:ec80:a100:4000::1: icmp_seq=2 ttl=61 time=0.320 ms
[15:05:16] wait....
[15:06:28] the cloudvirt is lacking a route for this
[15:06:35] so it's taking the default the way we had feared
[15:06:47] huh. is that not handled by the supernet route?
[15:06:54] cmooney@cloudvirt2006-dev:~$ ip route get fibmatch 2a02:ec80:a100:4000::1
[15:06:54] default via fe80::b2eb:7f08:4637:f920 dev eno12399np0 proto ra metric 1024 expires 588sec hoplimit 64 pref medium
[15:08:47] taavi: no the aggregate that we have is 2a02:ec80:a100:200::/56, which is the cloud-private "subnet" ranges
[15:09:02] ah, we need to fix that then
[15:09:09] but you are correct in that it appears we didn't follow the v4 setup here somehow
[15:09:15] 2a02:ec80:a100:4000::/64 is outside the aggregate
[15:09:19] yep. indeed
[15:09:29] I can have a closer look after this meeting
[15:11:01] taavi: yeah we either add an extra static route
[15:11:24] or probably what's better is we change the VIP range so it's within the aggregate
[15:14:17] no sorry - I think we are matching the v4 setup
[15:14:18] https://phabricator.wikimedia.org/T379282#10717661
[15:15:03] The VIPs can't be from the cloud-private aggregate if they are meant to be accessed from the internet
[15:15:06] hence a different range
[15:15:21] but we should add a route on the hosts for this
[15:16:27] the v6 equivalent of this:
[15:16:31] cmooney@cloudvirt2006-dev:~$ ip route get fibmatch 185.15.57.24
[15:16:31] 185.15.57.0/26 via 172.20.5.1 dev vlan2151
[15:34:55] topranks: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1154050
[15:39:23] taavi: yep looks good +1
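(Note on the routing discussion from 15:06 onward: the containment question can be checked offline with Python's standard `ipaddress` module. This is only a sketch of the arithmetic, using the prefixes and addresses quoted in the log; the variable names are illustrative and nothing here queries a host's routing table.)

```python
import ipaddress

# Prefixes quoted in the log.
aggregate = ipaddress.ip_network("2a02:ec80:a100:200::/56")   # cloud-private aggregate
vip_range = ipaddress.ip_network("2a02:ec80:a100:4000::/64")  # VIP range
vip = ipaddress.ip_address("2a02:ec80:a100:4000::1")

# The VIP range is not covered by the aggregate, so a host with only the
# aggregate route falls back to its default route for this destination.
print(vip_range.subnet_of(aggregate))  # False
print(vip in aggregate)                # False

# The v4 case from the log: 185.15.57.24 matches the 185.15.57.0/26 route.
v4_route = ipaddress.ip_network("185.15.57.0/26")
print(ipaddress.ip_address("185.15.57.24") in v4_route)  # True
```

This matches what `ip route get fibmatch` showed on cloudvirt2006-dev: the v6 VIP hit the default route, while the v4 VIP hit a dedicated static route via vlan2151.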