[08:49:40] good morning
[08:49:45] +1 here anybody? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1078954
[09:05:37] arturo: +1d
[09:06:11] dhinus: thanks!
[09:23:56] arturo: did you make any progress with the VRRP stuff?
[09:24:04] topranks: yes
[09:24:24] oh great
[09:24:36] I was gonna dip in and see if I could work anything out
[09:24:53] I'll send you a patch shortly for review
[09:26:27] topranks: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079449
[09:31:11] ok cool
[09:31:16] all that seems sensible
[09:31:23] the rp_filter thing I'm surprised about though
[09:32:46] quick search seems to confirm only available in ipv4. that sucks :)
[09:36:34] commented back with one concern
[09:37:17] ok, thanks
[09:37:18] maybe I'm being silly, just something about hard-coding an auto-generated link-local IP rings alarm bells
[09:37:22] I'll explore the multicast thing
[09:37:31] yeah, felt weird for me too
[09:37:33] I think if you remove the unicast_peer that will be the default mode
[09:38:12] we can leave it on for the v4 one I think
[09:38:38] with multicast, I see these packets
[09:38:38] 09:38:18.566233 vlan2120 Out IP6 fe80::2eea:7fff:fe7b:e104 > ff02::12: VRRPv3, Advertisement, vrid 52, prio 47, intvl 100cs, length 40
[09:38:44] which seems right to me
[09:42:52] topranks: updated
[09:43:47] that looks correct yep
[09:43:50] let's give it a shot :)
[09:47:50] thanks
[10:02:01] * arturo brb
[10:02:25] topranks: seems there is no routing to 2a02:ec80:a100:fe03::2 ?
[10:02:28] * arturo brb for real
[10:29:43] arturo: yeah the cloudgw has no routes set up
[10:30:05] these are the missing ones
[10:30:15] ip -6 route add vrf vrf-cloudgw default via 2a02:ec80:a100:fe03::1
[10:30:15] ip route add vrf vrf-cloudgw 2a02:ec80:a100::/55 via 2a02:ec80:a100:fe04::2:1
[10:31:48] we could potentially enable IPv6 RA generation on the cloudsw for vlan2120
[10:32:02] that would create the first route (default via the cloudgw)
[10:32:18] but tbh I'm not much of a fan of that, think it's easier to set them up manually
[10:40:19] ok!
[10:44:48] arturo: I added them but something still not working
[10:44:51] can't work it out
[10:45:07] they do need to be in /etc/network/interfaces - similar to how we've done it for the v4 routes
[10:45:08] mind the vrf
[10:45:21] yeah no it's ok on that score
[10:45:33] I'll cook a patch for persisting them shortly
[10:45:50] nftables forward chain seems ok. sysctls to enable forwarding seem the same for v4 and v6
[10:47:00] I ran pwru (https://github.com/cilium/pwru) but I still can't work out where it's being dropped
[10:47:05] https://phabricator.wikimedia.org/P69681
[10:49:20] it gets to the ip_route_input() function but no further
[10:49:26] so seems to be allowed
[10:49:53] route should be ok
[10:49:56] cmooney@cloudgw2002-dev:~$ ip -6 route get fibmatch vrf vrf-cloudgw 2a02:ec80:a100:1::1
[10:49:56] 2a02:ec80:a100::/55 via 2a02:ec80:a100:fe04::2:1 dev vlan2107 table cloudgw metric 1024 pref medium
[10:50:18] and indeed that IP is reachable in a trace from the cloudgw
[10:50:21] cmooney@cloudgw2002-dev:~$ sudo ip vrf exec vrf-cloudgw traceroute -I -n -w 1 2a02:ec80:a100:1::1
[10:50:21] traceroute to 2a02:ec80:a100:1::1 (2a02:ec80:a100:1::1), 30 hops max, 80 byte packets
[10:50:21] 1 2a02:ec80:a100:1::1 1.495 ms 1.430 ms 1.419 ms
[10:51:38] next-hop on the route is ok
[10:51:42] https://www.irccloud.com/pastebin/iZZhElny/
[10:55:44] maybe we need `net.ipv6.conf.all.forwarding=1`
[10:55:50] because of the vrf thingy
[10:55:57] let me set that on cloudgw2002-dev
[10:56:13] could be maybe
[10:56:20] it's not set for v4 but potentially
[10:57:06] did you just break cloudgw2002-dev doing that :D
[10:57:15] I believe I did
[10:57:16] :-)
[10:57:31] I don't understand why though
[10:57:51] let me unset it via console
[10:58:45] I'm on there now over v4
[10:59:01] ok, any hints why that sysctl may break the network?
[11:00:55] my ssh session just froze briefly again
[11:01:40] no not really sure why that would break things tbh
[11:06:28] I'm reading this https://stbuehler.de/blog/article/2020/02/29/using_vrf__virtual_routing_and_forwarding__on_linux.html wondering if we need to somehow tune `ip rules` for the IPv6 routing in the VRF to work as expected
[11:10:14] arturo: not sure, the pwru clearly shows the packet traverses the l3mdev (vrf-cloudgw), which seems to suggest the rules are catching it
[11:10:29] we are missing the blackhole / unreachable routes though - they should be in vrf
[11:10:50] (without that if a route is not found in the vrf table it continues to next rule and packet leaks into main table / vrf)
[11:10:57] but I don't think that's our issue here
[11:13:03] probably want these in /etc/network/interfaces.d/cloud:
[11:13:15] post-up ip -4 route add vrf vrf-cloudgw unreachable default metric 4278198272
[11:13:24] post-up ip -6 route add vrf vrf-cloudgw unreachable default metric 4278198272
[11:41:01] arturo: I'm still lost tbh
[11:41:15] one thing I did as an experiment is add a rule to the nftables with my own IP in it
[11:41:39] sudo nft insert rule inet cloudgw forward ip6 saddr counter accept
[11:41:53] same here
[11:41:57] but I see no hits even though I can see the packets coming in vlan2120 in a tcpdump
[11:42:23] what is the destination address?
[11:42:44] 2a02:ec80:a100:1::1
[11:43:42] what about `2a00:ec80:a100:fe04::2:1` ? that's the other end of the neutron virtual router
[11:44:11] I cannot ping that one either from my laptop
[11:46:09] there's a replag alert for clouddb1019, I checked and replication is working, just slow. hopefully it will catch up soon, otherwise I will depool it
[11:46:29] my ssh session on cloudgw2002-dev just froze again ...
[11:46:33] arturo: yeah they are kind of the same
[11:46:48] I toggled net.ipv6.conf.eno1.forwarding on and off maybe that interrupted you?
[11:46:53] didn't notice any disruption myself
[11:46:56] yeah
[11:47:31] Yeah both 2a00:ec80:a100:fe04::2:1 and 2a02:ec80:a100:1::1 should work
[11:47:57] both ping from the cloudgw
[11:47:59] arturo: there are a bunch of alerts about the new prometheus-node-kernel-panic.service being down
[11:48:16] but packets from outside are not forwarded...
[11:48:25] dhinus: oh! :-(
[11:49:18] prometheus-node-kernel-panic[10180]: Failed to parse boot descriptor 'all'
[11:49:23] I guess they are too old
[11:54:35] dhinus: this should fix it https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079480
[12:04:50] topranks: I made some changes to /etc/n/i in cloudgw2002-dev/2003-dev (just getting them in sync with puppet)
[12:04:58] and I would like to reboot the nodes. Ok for you?
[12:05:06] yeah go ahead
[12:05:36] I think the default route was not being inserted into the routing table bc it was just after the ipv6 rp_filter sysctl call, which fails
[12:05:56] so a reboot should confirm that
[12:07:10] I'm curious, how do you force a bastion -> server ssh session over IPv4 anyway?
[12:07:59] the default was there from when I added it manually, but yeah we need to get it working on a clean boot
[12:08:55] arturo: you can ssh to the ipv4 address itself (not hostname), and configure your ssh_config to forward that via bastion
[12:08:56] messy
[12:09:09] :-S ok
[12:09:39] cloudgw2002-dev back online
[12:10:41] routes are now added as expected on boot
[12:12:16] cloudgw2003-dev back online
[12:12:42] topranks: I just discovered this
[12:13:07] nah nevermind
[12:13:46] damn!
[12:13:57] let me do a bit more nftables tracing
[12:14:40] ugh I think I broke it
[12:14:45] mmmm just lost the session again?
[12:14:48] or... paused
[12:15:00] 12 loadavg??
[12:15:01] I issued "sudo sysctl -w net.ipv6.conf.vrf-cloudgw.forwarding=1"
[12:15:16] ok
[12:15:39] I think that sysctl just creates a huge load spike!
[12:16:01] doesn't make sense, but yeah seems to
[12:16:24] re: nftables tracing I'm pretty sure it's not going to the forward chain
[12:16:31] or being dropped by nftables (drop counters are zero)
[12:16:52] aborrero@cloudgw2002-dev:~ 58s 1 $ sudo nft add rule inet base prerouting counter ip6 daddr 2a02:ec80:a100:fe04::2:1 counter nftrace set 1 counter
[12:16:58] ok
[12:17:21] Looking at the pwru trace seems to suggest the kernel just doesn't route it when it tries...
[12:17:25] https://usercontent.irccloud-cdn.com/file/FNSZEiHh/image.png
[12:17:47] you are right
[12:17:52] the packet never gets routed
[12:17:54] https://usercontent.irccloud-cdn.com/file/9E666xFT/image.png
[12:17:58] nftables is not dropping it
[12:18:06] goes from "ip6_forward" to "kfree_skb" but why
[12:18:18] ooh I didn't know about "nft monitor trace"
[12:18:20] nice :)
[12:21:17] arturo/dhinus: do you see any objections to this request? T376637
[12:21:17] T376637: Request floating IP for wikiwho project - https://phabricator.wikimedia.org/T376637
[12:21:28] topranks:
[12:21:42] this seems to be as expected. The IP is my laptop https://www.irccloud.com/pastebin/YBEZZy0P/
[12:22:35] blancadesal: I think they can redirect to the proxy IP address, no? that name is on the nova proxy?
[12:22:58] arturo: yeah the routing tables are definitely correct
[12:23:02] cmooney@cloudgw2002-dev:~$ ip route get fibmatch vrf vrf-cloudgw 2a0c:5a81:d613:3900:6866:82b2:f8d:1e11
[12:23:02] default via 2a02:ec80:a100:fe03::1 dev vlan2120 table cloudgw metric 1024 pref medium
[12:24:31] so I get to your earlier theory: IPv6 forwarding is somehow broken with the vrf
[12:24:53] yeah it's infuriating
[12:28:09] blancadesal: replied on task
[12:29:15] arturo: thanks!
[12:30:40] arturo: are cloud vps quota increases still managed "as before", or are we using tofu now?
[12:30:55] blancadesal: quotas not in tofu yet
[12:31:08] are you interested in migrating them? :-^
[12:32:12] not right now :p just looking to do T376847
[12:32:13] T376847: Quota increase for Integration project (Jenkins CI runners) - https://phabricator.wikimedia.org/T376847
[12:33:41] topranks: I'll try setting all.forwarding yet again
[12:33:52] f it yeah
[12:33:57] the thought had occurred to me too
[12:34:16] aborrero@cloudgw2002-dev:~ $ sudo sysctl net.ipv6.conf.all.forwarding=1
[12:35:01] I lost my ssh session
[12:35:29] I wonder if my ssh session getting killed has anything to do
[12:35:31] huh
[12:35:37] the default route dropped out
[12:35:37] like, if I should be using screen or tmux
[12:35:56] ah!
[12:36:22] this smells like some kernel "feature"
[12:37:53] topranks: works now!
[12:38:00] I added the route manually again
[12:38:30] so this is some kind of race between the two things
[12:38:34] It was getting RAs, but not adding a route
[12:38:36] https://phabricator.wikimedia.org/P69689
[12:38:37] configuring the routes and setting the forwarding
[12:39:11] wait!!!
[12:39:22] I thought you meant ssh to cloudgw worked... the f---- ping is working from here!
[12:39:22] * arturo waits
[12:39:37] yeah! that's what I meant, the ping :-P
[12:41:50] cathal@officepc:~/Desktop/fuinneamh$ ping 2a02:ec80:a100:1::29c
[12:41:50] PING 2a02:ec80:a100:1::29c(2a02:ec80:a100:1::29c) 56 data bytes
[12:41:50] 64 bytes from 2a02:ec80:a100:1::29c: icmp_seq=1 ttl=47 time=149 ms
[12:41:50] 64 bytes from 2a02:ec80:a100:1::29c: icmp_seq=2 ttl=47 time=146 ms
[12:42:34] is that a VM? :_)
[12:42:40] beautiful, isn't it?
[12:42:44] think so
[12:43:05] topranks: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079508
[12:43:20] is this the issue? i've no idea tbh
[12:43:20] https://github.com/torvalds/linux/blob/master/net/ipv6/route.c#L2226
[12:44:43] arturo: +1 if that works
[12:45:03] we could also do it earlier as it's a global setting, with sysctl::parameters maybe
[12:45:21] yeah, that's a good idea as well
[12:45:53] that's better actually, let me refresh the patch
[12:50:16] we could quickly verify without merging the patch
[12:50:19] let me try that
[12:51:29] ok
[12:54:21] rebooting
[12:57:08] we need to think about firewalling
[12:57:31] I think the forward chain is only allowing routing in the vrf
[12:57:42] is that what you mean?
[12:58:04] no the question as to whether the VMs should be exposed to the world
[12:58:14] or perhaps neutron takes care of all that?
[12:58:23] oh, yeah, right, definitely
[12:58:40] at the moment the IPv4 setting is that VMs are "protected" by NAT
[12:58:51] we could emulate that for IPv6 via firewall
[12:59:01] i.e., only allow new egress network connections
[12:59:50] yeah
[12:59:59] you'll need to change the rules in the forward chain a little
[13:00:16] need two rules replacing the one you have
[13:00:47] iifname vlan2107 oifname vlan2120 accept
[13:01:07] iifname vlan2120 oifname vlan2107 ct state related,established accept
[13:01:15] something along those lines
[13:01:32] should probably allow icmpv6 also though
[13:01:38] so before the one with "ct state" have a
[13:01:56] iifname vlan2120 oifname vlan2107 ip6 nexthdr ipv6-icmp accept
[13:02:22] shall we allow icmp6 in every direction?
[13:02:39] we should I think yeah
[13:03:25] looking at my home router I got it as first line in forward for everything
[13:03:28] https://www.irccloud.com/pastebin/aL0uSAhp/
[13:03:40] ok
[13:05:51] are the servers not coming back online?
[13:07:42] hmm
[13:07:44] on via console
[13:07:47] root@cloudgw2002-dev:~# uptime
[13:07:47] 13:07:33 up 11 min, 1 user, load average: 0.03, 0.08, 0.09
[13:08:11] ok, so is just the network that failed to be online :-P
[13:09:05] yeah weird
[13:09:21] the sysctl seemed to set
[13:09:34] I think maybe it's something else in /e/n/i perhaps
[13:09:46] yeah, I'll look in a second
[13:18:03] topranks: potential fw change: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079515 (I don't think we should merge on friday?)
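
The forward-chain idea discussed above (new connections only from the VM side, return traffic and ICMPv6 allowed back) can be sketched as a small shell helper that only prints an nftables ruleset. This is a rough illustration, not the content of the Gerrit change: the table name `cloudgw_sketch` is hypothetical, the interface names are the ones quoted in the chat, and `meta l4proto ipv6-icmp` is swapped in for the `ip6 nexthdr ipv6-icmp` match typed in the channel.

```shell
#!/bin/sh
# Sketch only: prints a ruleset to stdout, it does not load anything.
# Assumptions: table name "cloudgw_sketch" is made up for illustration;
# vlan2107 (VM side) and vlan2120 (outside) are taken from the chat.
print_forward_sketch() {
    cat <<'EOF'
table inet cloudgw_sketch {
    chain forward {
        type filter hook forward priority 0; policy drop;

        # ICMPv6 in every direction, first line of the chain
        meta l4proto ipv6-icmp accept

        # VM side towards the outside: new connections are fine
        iifname "vlan2107" oifname "vlan2120" accept

        # outside towards the VM side: return traffic only
        iifname "vlan2120" oifname "vlan2107" ct state established,related accept
    }
}
EOF
}

print_forward_sketch
```

If the output were saved to a file, a syntax check without loading it would be `nft -c -f <file>`; whether this matches the real cloudgw table layout is not something the log confirms.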
[13:19:52] topranks: the network setting problem with ifupdown is this
[13:19:56] https://www.irccloud.com/pastebin/YL3jX4qs/
[13:20:17] I have seen this before, and it is usually a wrong setting regarding ra params and such
[13:20:48] https://ral-arturo.org/2021/04/01/ip-token.html
[13:21:01] actually, it conflicts directly with the forwarding bit
[13:28:35] ah ok
[13:28:52] it can be deleted I think
[13:29:04] the "up ip addr add 2620:0:860:118:10:192:20:18/64 dev eno1" covers setting the IP
[13:30:52] who added that?
[13:31:46] it's part of our normal config from the d-i stage
[13:31:52] ok
[13:32:03] but afaik the "ip token" command is now redundant because we have that
[13:32:09] ok
[13:33:29] what I don't know is if we might end up with two IPs on the int without the token one
[13:34:22] because autoconf?
[13:35:44] yeah, possibly the token set means autoconf sets the same ip as in the "ip addr add" command, and without it might add a second
[13:36:24] we may disable autoconf entirely with `net.ipv6.conf.eth1.autoconf=0` if we wanted?
[13:36:30] would that be acceptable?
[13:37:11] the interfaces still should assign themselves a link-local, and we need to process the router-advertisements from the switch
[13:37:22] actually... one sec
[13:38:09] we can set this sysctl as well
[13:38:10] net.ipv6.conf.eno1.forwarding = 0
[13:38:18] then the token bit stays as is and we should be ok I think
[13:38:49] https://usercontent.irccloud-cdn.com/file/a4JkvoF3/image.png
[13:38:53] are you sure eno1 forwarding is not what was causing the problem earlier?
[13:39:48] yeah - the ping is still working :)
[13:40:26] I don't understand then --- using all.forwarding = 1 solved the routing problem. I thought that was because it included eno1
[13:40:28] plus I'd toggled it on and off for eno1 earlier and it had no effect
[13:40:31] it's the _all_
[13:40:37] ok
[13:41:23] yeah don't know, that one line in route.c looked iffy and maybe why it's needed on _all_ but I don't know tbh
[13:44:22] ok new version of the patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079508
[13:44:39] I need to run to pick up child from the nursery, be back later
[13:45:05] ok
[13:45:15] the one worry I'd have here is a race
[13:45:40] but you're correct to use post_up_command and then "interface_primary" as opposed to hard-coding the int IP
[13:47:07] token is set as pre-up so it'll try to do that first afaik
[13:47:18] s/int IP/int name/
[14:47:17] actually
[14:47:28] we may want to set as a pre-up?
[14:48:29] the token set is a pre-up command
[14:48:34] so post-up won't work
[14:48:39] this needs to be on sysctl
[14:51:51] ok, I think this version might work: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079508
[15:09:55] topranks: confirmed via manual hacks on the server, it seems to work! ^^^
[15:11:04] ah cool.... didn't know we could use $interface_primary there but it makes sense, it's a puppet var
[15:11:06] +1
[15:11:28] thanks, merging
[15:15:09] cool... you did a reboot just now?
[15:15:14] yes!
[15:15:15] sorry -_-
[15:15:21] to verify puppet did the right thing
[15:15:28] and they will boot OK with the new config
[15:15:38] yep np at all
[15:15:53] shall we merge the fw change as well?
[15:16:11] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1079515
[15:16:28] this one is a bit scarier, as it also changes stuff for eqiad1
[15:16:52] we probably should I think... but let me have a closer look
[15:17:28] mmm I just noticed I hardcoded a few things, let me refresh it
[15:18:25] ok
[15:18:37] there you go
[15:18:48] it's a nice property of the way vrf works - and has a virtual interface - that the interface can be used in "oif" "iif" netfilter rules
[15:18:54] let me look again
[15:19:40] lgtm, for eqiad I guess maybe we should disable puppet on the active host, then merge, run puppet on the backup cloudgw, check things look ok etc??
[15:19:48] yeah
[15:20:01] not ideal for friday anyway, but I'll do that
[15:23:07] merging
[15:33:46] rolled out just fine
[15:35:07] \o/
[15:35:10] nice
[15:35:20] I think that would be all network-wise for this week
[15:35:28] thanks for all the hard work topranks :-)
[15:35:42] are you sure???
[15:36:05] I was thinking we could migrate the VXLAN to GENEVE encapsulation with BGP control plane before we sign off :P
[15:36:18] xd
[15:36:19] or... maybe let's leave something for next week :)
[15:36:31] enjoy the weekend!
[15:36:43] thanks, likewise
[15:37:11] btw I added the IPs used by cloudgw to netbox manually, the import from puppetdb script failed (didn't like multiple IPs on an int, we can fix again for this edge-case with vrrp)
[15:37:23] ok
[15:38:25] traceroute now looks pretty:
[15:38:26] https://phabricator.wikimedia.org/P69696
[15:38:38] we gotta get designate working for the VM IPs though
[15:38:49] I did something today, that I did not have to do since a long time ago, drawing the network diagram on a piece of paper
[15:39:12] https://usercontent.irccloud-cdn.com/file/zUGovMuW/irccloudcapture5753843478898742390.jpg
[15:39:18] can't beat pen & paper for this stuff
[15:39:40] wow nice <3
[15:40:51] that traceroute is 🌈
[16:29:04] FYI I did update a few wikitech pages, reviews/feedback welcome, see this comment for the involved pages: https://phabricator.wikimedia.org/T375113#10222009
[16:29:43] ah and the clouddb1019 replication lag I mentioned earlier is back to zero :)
[16:30:07] have a good weekend!
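
As a recap of the routing side of this session, the pieces that ended up mattering can be sketched as a dry-run shell helper that prints the commands rather than running them. All addresses, the VRF name, the interface name and the metric are copied from the messages in this log; the ordering, and the idea of applying them by hand instead of through the merged Puppet change (whose content the log does not quote), are assumptions for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints commands, executes nothing.
# Values are taken from the chat; the ordering is an assumption.
VRF="vrf-cloudgw"

print_vrf_v6_setup() {
    # Per the chat, only the global "all" knob made forwarding in the VRF work.
    echo "sysctl -w net.ipv6.conf.all.forwarding=1"
    # Keep forwarding off on the primary interface so the token / RA setup stays as is.
    echo "sysctl -w net.ipv6.conf.eno1.forwarding=0"
    # Routes inside the VRF: default towards the switch, VM range towards neutron.
    echo "ip -6 route add vrf ${VRF} default via 2a02:ec80:a100:fe03::1"
    echo "ip -6 route add vrf ${VRF} 2a02:ec80:a100::/55 via 2a02:ec80:a100:fe04::2:1"
    # Catch-all so unmatched traffic does not leak out of the VRF table.
    echo "ip -6 route add vrf ${VRF} unreachable default metric 4278198272"
}

print_vrf_v6_setup
```

Printing instead of executing keeps the sketch safe to run anywhere; on a real host the equivalent lines would presumably live in sysctl configuration and /etc/network/interfaces, as discussed in the channel.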