Requirements
The controlplane nodes must share a layer 2 network, and the virtual IP must be assigned from that shared network subnet. In practical terms, this means that they are all connected via a switch, with no router in between them. Note that the virtual IP election depends on etcd being up, as Talos uses etcd for elections and leadership (control) of the IP address.
The virtual IP is not restricted by ports - you can access any port that the control plane nodes are listening on, on that IP address.
Thus it is possible to access the Talos API over the VIP, but it is not recommended, as you cannot access the VIP when etcd is down - and then you could not access the Talos API to recover etcd.
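The most common use of the shared IP is as the Kubernetes API endpoint. As an illustration (using the example shared IP 192.168.0.15 introduced below), the cluster endpoint in the machine configuration would point at the VIP on the default API server port:

```yaml
cluster:
  controlPlane:
    # Kubernetes API reached via the shared virtual IP (example address)
    endpoint: https://192.168.0.15:6443
```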
Video Walkthrough
To see a live demo of this writeup, see the video below:

Choose your Shared IP
The Virtual IP should be a reserved, unused IP address in the same subnet as your controlplane nodes. It should not be assigned or assignable by your DHCP server. For our example, we will assume that the controlplane nodes have the following IP addresses:

192.168.0.10
192.168.0.11
192.168.0.12

and that the shared virtual IP will be 192.168.0.15.
Configure your Talos Machines
The shared IP setting is only valid for controlplane nodes. For the example above, each of the controlplane nodes should have a Machine Config snippet along the lines of the sketch below; for your own environment, the interface and the DHCP setting may differ, or you may use static addressing (addresses) instead of DHCP.
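A minimal sketch of such a snippet, assuming the interface is named eth0 and uses DHCP, with the example shared IP from above:

```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        vip:
          # the shared virtual IP chosen above
          ip: 192.168.0.15
```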
When using predictable interface names, the interface name might not be eth0.
If the machine has a single network interface, it can be selected using a dummy device selector:
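For instance, a sketch of a selector matching any physical interface (only appropriate when there is exactly one):

```yaml
machine:
  network:
    interfaces:
      - deviceSelector:
          physical: true # matches hardware network devices; with a single NIC, that one is selected
        dhcp: true
        vip:
          ip: 192.168.0.15
```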
Caveats
Since VIP functionality relies on etcd for elections, the shared IP will not come alive until after you have bootstrapped Kubernetes.
Don’t use the VIP as the endpoint in the talosconfig, as the VIP is bound to etcd and kube-apiserver health, and you will not be able to recover from a failure of either of those components using the Talos API.
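Instead, point the talosconfig endpoints at the individual controlplane node addresses. A sketch, assuming the example addresses above and a hypothetical context name (certificate fields omitted):

```yaml
context: mycluster
contexts:
  mycluster:
    endpoints:
      - 192.168.0.10
      - 192.168.0.11
      - 192.168.0.12
    # ca, crt, and key fields omitted for brevity
```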
VIP Failover Behavior
When the control plane node holding the VIP shuts down gracefully, the address is reassigned almost instantly, ensuring uninterrupted access. However, if the node fails unexpectedly, for example due to a power loss or crash, the failover process takes longer, typically up to a minute.

This slower response is by design, as Talos coordinates VIP ownership through an etcd election process that must balance speed with safety. The delay ensures that a temporary network hiccup or brief pause in communication does not lead to multiple nodes believing they own the VIP at the same time, a dangerous split-brain scenario. By waiting out the election timeout before reassigning the VIP, Talos guarantees that only one node will advertise the shared IP, even if it means failover is slower in sudden failure scenarios.

Impact on Workloads
A VIP failover impacts only external access to the cluster, such as when you run kubectl against the API server.
Inside the cluster, workloads continue to function normally.
With the default configuration using KubePrism and discovery, pods can still reach the Kubernetes API through service discovery, so they remain unaffected by the VIP status.
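KubePrism is enabled by default on recent Talos versions; a sketch of the corresponding feature block in the machine config (7445 is the default port):

```yaml
machine:
  features:
    # in-cluster load balancer for the Kubernetes API, independent of the VIP
    kubePrism:
      enabled: true
      port: 7445
```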
For external clients, however, a failover can briefly interrupt connectivity.
Long-lived connections, such as HTTP/2 sessions, may be broken when the VIP moves to a new node, and clients should be ready to reconnect once the failover is complete.