Understand more about KubeSpan for Talos Linux.
For example, a node on the same network may see a peer as 192.168.2.10, but a node across the internet may see it as 2001:db8:1ef1::10.
We need to be able to handle any number of addresses and ports, and we also need to have a mechanism to try them.
WireGuard only allows us to select one at a time.
KubeSpan implements a controller which continuously discovers and rotates these IP:port pairs until a connection is established.
It then starts trying again if that connection ever fails.
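The rotation described above can be sketched roughly as follows. This is a minimal illustration, not the KubeSpan controller itself: the function name `rotate_until_connected`, the `try_endpoint` callback, and the port `51820` are assumptions made for the example.

```python
import itertools
from typing import Callable, Iterable, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, UDP port)


def rotate_until_connected(
    endpoints: Iterable[Endpoint],
    try_endpoint: Callable[[Endpoint], bool],
    max_attempts: int,
) -> Optional[Endpoint]:
    """Cycle through candidate endpoints, one at a time, until one works.

    Only one endpoint can be selected per peer at a time, so each attempt
    swaps in the next candidate. Returns the first endpoint for which
    try_endpoint() reports success, or None once max_attempts is used up.
    """
    candidates = list(endpoints)
    if not candidates:
        return None
    for endpoint in itertools.islice(itertools.cycle(candidates), max_attempts):
        if try_endpoint(endpoint):
            return endpoint
    return None


# Hypothetical usage: the peer is only reachable over its IPv6 address.
candidates = [("192.168.2.10", 51820), ("2001:db8:1ef1::10", 51820)]
reachable = {("2001:db8:1ef1::10", 51820)}
tried: List[Endpoint] = []


def probe(endpoint: Endpoint) -> bool:
    """Stand-in for a real handshake check (assumed for this sketch)."""
    tried.append(endpoint)
    return endpoint in reachable


chosen = rotate_until_connected(candidates, probe, max_attempts=10)
```

If the chosen connection later fails, the same loop would simply be run again over the current set of candidates.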
Luckily, the kernel supplies a convenient mechanism by which to define this arbitrarily large set of IP addresses: IP sets.
Talos collects all of the IPs and subnets which are considered "in-cluster" and maintains these in the kernel as an IP set.
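As a rough userspace analogy for what an IP set does, the sketch below keeps a collection of prefixes and answers membership queries. The class name `IPSet` and the example prefixes are assumptions for illustration; the kernel's implementation uses hashed lookups rather than the linear scan shown here.

```python
import ipaddress
from typing import Iterable


class IPSet:
    """A small stand-in for a kernel IP set: a collection of IP prefixes."""

    def __init__(self, prefixes: Iterable[str]) -> None:
        # Single addresses are stored as /32 (IPv4) or /128 (IPv6) networks.
        self._networks = [ipaddress.ip_network(p) for p in prefixes]

    def __contains__(self, address: str) -> bool:
        addr = ipaddress.ip_address(address)
        return any(
            addr.version == net.version and addr in net
            for net in self._networks
        )


# Hypothetical "in-cluster" prefixes: pod and service subnets plus node IPs.
in_cluster = IPSet(
    [
        "10.244.0.0/16",
        "10.96.0.0/12",
        "192.168.2.10/32",
        "2001:db8:1ef1::10/128",
    ]
)

pod_is_in_cluster = "10.244.3.7" in in_cluster
public_dns_is_in_cluster = "8.8.8.8" in in_cluster
```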
Now that we have the IP set defined, we need to tell the kernel how to use it.
The traditional way of doing this would be to use iptables.
However, there is a big problem with IPTables.
It is a common namespace in which any number of other pieces of software may dump things.
We have no surety that what we add will not be wiped out by something else (from Kubernetes itself, to the CNI, to some workload application), be rendered unusable by higher-priority rules, or just generally cause trouble and conflicts.
Instead, we use a three-pronged system which is both more foundational and less centralised.
NFTables offers a separately namespaced, decentralised way of marking packets for later processing based on IP sets.
Instead of a common set of well-known tables, NFTables uses hooks into the kernel's netfilter system.
These hooks are less vulnerable to being usurped, bypassed, or interfered with than IPTables, but they are rendered down by the kernel to the same underlying XTables system.
Our NFTables system is where we store the IP sets.
Any packet which enters the system, either by forward from inside Kubernetes or by generation from the host itself, is compared against a hash table of this IP set.
If it is matched, it is marked for later processing by our next stage.
This is a high-performance system which exists fully in the kernel and which ultimately becomes an eBPF program, so it scales well to hundreds of nodes.
The next stage is the kernel router's route rules.
These are defined as a common ordered list of operations for the whole operating system, but they are intended to be tightly constrained and are rarely used by applications in any case.
The rules we add are very simple: if a packet is marked by our NFTables system, send it to an alternate routing table.
This leads us to our third and final stage of packet routing.
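We have a custom routing table with two rules, one for IPv4 and one for IPv6, each of which sends all traffic to the WireGuard interface.
The ip rule matches our fwmark and sends the matched packets to this separate routing table, whose one rule per address family is: default to the WireGuard interface.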
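The rule-then-table lookup can be modelled in a few lines. This is only a toy model of the kernel's behaviour: the priorities, the table names `wireguard` and `main`, and the interface names `wg-example` and `eth0` are assumptions for illustration, while the mark `0x40` and mask `0x60` are the values this document uses for the routing rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RouteRule:
    """One entry in the ordered rule list: match a fwmark, pick a table."""

    priority: int
    mark: int
    mask: int
    table: str

    def matches(self, fwmark: int) -> bool:
        # Only the bits selected by the mask take part in the comparison.
        return (fwmark & self.mask) == self.mark


def lookup_interface(
    fwmark: int, rules: List[RouteRule], tables: Dict[str, str]
) -> str:
    """Return the default interface of the first table whose rule matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(fwmark):
            return tables[rule.table]
    raise LookupError("no routing rule matched")


# Hypothetical setup: marked packets go to the WireGuard table,
# everything else falls through to the main table.
rules = [
    RouteRule(priority=100, mark=0x40, mask=0x60, table="wireguard"),
    RouteRule(priority=32766, mark=0x0, mask=0x0, table="main"),
]
tables = {"wireguard": "wg-example", "main": "eth0"}

marked_goes_to = lookup_interface(0x40, rules, tables)
unmarked_goes_to = lookup_interface(0x0, rules, tables)
```

A rule with mask `0x0` matches every packet, which is how the fall-through to the main table is expressed in this sketch.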
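So we have three components: the NFTables rules which mark packets, the routing rule which matches our fwmark, and the routing table which defaults to the WireGuard interface.
The NFTables logic is compiled into a single table: talos_kubespan.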
Next, two chains are set up: one for the prerouting hook (kubespan_prerouting) and the other for the outgoing hook (kubespan_outgoing).
We define two sets of target IP prefixes: one for IPv6 (kubespan_targets_ipv6) and the other for IPv4 (kubespan_targets_ipv4).
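Last, we add rules to each chain which basically specify that a packet whose destination is in either set is marked for routing through the WireGuard interface.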
KubeSpan uses only the bits of the firewall mark covered by the mask 0x00000060.
Note: if other software on the node is using the bits 0x60 of the firewall mark, this might cause conflicts and break KubeSpan.
At the time of writing, it was confirmed that Calico CNI uses bits 0xffff0000 and Cilium CNI uses bits 0xf00, so KubeSpan is compatible with both.
Flannel CNI uses the mask 0x4000, so it is also compatible.
In the routing rules table, we match on the mark 0x40 with the mask 0x60.
In NFTables, we match with the same mask 0x60, and we set the mark by modifying only the bits covered by the 0x60 mask.
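The mask arithmetic can be checked with a short sketch. The constants `0x60`, `0x40`, `0xffff0000`, `0xf00`, and `0x4000` come from the text above; the function names are assumptions for illustration, and this is plain bit arithmetic rather than the actual NFTables rules.

```python
KUBESPAN_MASK = 0x00000060   # bits of the firewall mark used by KubeSpan
ROUTE_MARK = 0x40            # value the routing rule matches under the mask
MARK_WIDTH = 0xFFFFFFFF      # firewall marks are 32 bits wide

# Masks reported for other software in the text above.
OTHER_MASKS = {"Calico": 0xFFFF0000, "Cilium": 0xF00, "Flannel": 0x4000}


def overlaps(mask_a: int, mask_b: int) -> bool:
    """Two users of the firewall mark conflict if their masks share a bit."""
    return (mask_a & mask_b) != 0


def matches_route_rule(fwmark: int) -> bool:
    """Equivalent of matching mark 0x40 with mask 0x60."""
    return (fwmark & KUBESPAN_MASK) == ROUTE_MARK


def set_kubespan_bits(fwmark: int, value: int) -> int:
    """Set the KubeSpan bits to value, leaving all other bits untouched."""
    if value & ~KUBESPAN_MASK:
        raise ValueError("value has bits outside the KubeSpan mask")
    cleared = fwmark & ~KUBESPAN_MASK & MARK_WIDTH
    return cleared | value


conflicts = {name: overlaps(KUBESPAN_MASK, m) for name, m in OTHER_MASKS.items()}

# A packet already carrying another component's bits keeps them after marking.
original_mark = 0xF00
new_mark = set_kubespan_bits(original_mark, ROUTE_MARK)
```

Clearing the KubeSpan bits corresponds to AND-ing with `0xffffff9f`, the 32-bit complement of `0x60`, which is why bits owned by other software survive the update.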