talosctl CLI utility.
The upgrade API call passes a node the installer image to use to perform the upgrade.
Each Talos version has a corresponding installer image, listed on the release page for the version.
Upgrades use an A-B image scheme in order to facilitate rollbacks.
This scheme retains the previous Talos kernel and OS image following each upgrade.
If an upgrade fails to boot, Talos will roll back to the previous version.
Likewise, Talos may be manually rolled back via API (or talosctl rollback), which will update the boot reference and reboot.
Note: An upgrade of the Talos Linux OS will not (since v1.0) apply an upgrade to the Kubernetes version by default.
Kubernetes upgrades should be managed separately, as described in Upgrading Kubernetes.
Supported upgrade paths
Because Talos Linux is image based, an upgrade is almost the same as installing Talos, with the difference that the system has already been initialized with a configuration. The supported configuration may change between versions. The upgrade process should handle such changes transparently, but this migration is only tested between adjacent minor releases. Thus the recommended upgrade path is to always upgrade to the latest patch release of all intermediate minor releases. For example, if upgrading from Talos 1.0 to Talos 1.2.4, the recommended upgrade path would be:
- upgrade from 1.0 to latest patch of 1.0 - to v1.0.6
- upgrade from v1.0.6 to latest patch of 1.1 - to v1.1.2
- upgrade from v1.1.2 to v1.2.4
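The path above can be derived mechanically once the latest patch release of each minor version is known. The following is a minimal Python sketch of that idea. The LATEST_PATCH table and the parse and upgrade_path helpers are hypothetical names introduced for illustration, and the table only holds the versions from the example above.

```python
# Hypothetical table: latest patch release for each (major, minor) version.
# Only the values from the example above are filled in.
LATEST_PATCH = {(1, 0): 6, (1, 1): 2}


def parse(version: str) -> tuple[int, ...]:
    """Turn 'v1.2.4' or '1.0' into a tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(current: str, target: str) -> list[str]:
    """Return the versions to upgrade through, ending at the target."""
    cur, tgt = parse(current), parse(target)
    if len(tgt) != 3:
        raise ValueError("target must be a full release such as v1.2.4")
    if cur[0] != tgt[0] or cur[:2] > tgt[:2]:
        raise ValueError("sketch only handles upgrades within one major version")
    steps = []
    for minor in range(cur[1], tgt[1]):
        patch = LATEST_PATCH[(cur[0], minor)]
        # Skip the step if the node already runs that exact release.
        if (cur[0], minor, patch) != cur:
            steps.append(f"v{cur[0]}.{minor}.{patch}")
    steps.append(f"v{tgt[0]}.{tgt[1]}.{tgt[2]}")
    return steps


print(upgrade_path("1.0", "1.2.4"))
```

Each returned version would then be used in turn as the installer image tag for one upgrade of the node.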
Before upgrade to
NVIDIA GPU users
Talos v1.13 and later support configuring NVIDIA GPUs via the gpu-operator. It’s recommended to uninstall the nvidia-device-plugin Helm chart and follow the instructions in the NVIDIA GPU (OSS drivers) or NVIDIA GPU (Proprietary drivers) guide to switch to the new gpu-operator based configuration.
After upgrade to
There are no specific actions to be taken after an upgrade.
talosctl upgrade
To upgrade a Talos node, specify the node’s IP address and the
installer container image for the version of Talos to upgrade to.
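For instance, if your Talos node has the IP address 10.20.30.40 and you want
to install the current version, you would enter a command such
as talosctl upgrade --nodes 10.20.30.40 --image ghcr.io/siderolabs/installer:<version>, replacing <version> with the release tag to install.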
Note that because Talos Linux reboots via the kexec syscall, the extra reboot adds very little time.
Upgrade API changes in Talos v1.13
Talos v1.13 introduces a new streaming upgrade API via LifecycleService.Upgrade.
The talosctl upgrade command now uses this new API by default, providing real-time progress reporting and support for parallel upgrades across multiple nodes.
New flags:
- --progress <mode> - Controls the output mode for upgrade progress. Values: auto (default), plain. auto uses a dynamic progress reporter if the output is a terminal, and falls back to plain text otherwise. plain always uses plain text output.
- --namespace <name> - Containerd namespace to use for the upgrade image. Values: system (default), cri, inmem.
The --reboot-mode flag now supports three values: default, powercycle, and force.
Deprecation notice: The legacy upgrade flags (--force, --insecure, --preserve, --stage) are deprecated and will be removed in Talos 1.18. These flags are only used when falling back to the legacy MachineService.Upgrade API for older Talos versions.
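When the upgrade command is assembled by automation, the flag values listed above can be validated before anything is sent to a node. The following is a minimal Python sketch under stated assumptions: upgrade_command is a hypothetical helper, the --nodes and --image flag names and the ghcr.io/siderolabs/installer image path are assumptions of this sketch, and the image tag is a placeholder rather than a real release.

```python
import shlex

# Allowed values as listed in the flag descriptions above.
PROGRESS_MODES = {"auto", "plain"}
NAMESPACES = {"system", "cri", "inmem"}
REBOOT_MODES = {"default", "powercycle", "force"}


def upgrade_command(node, image, progress="auto", namespace="system",
                    reboot_mode="default"):
    """Build the argv for `talosctl upgrade`, rejecting unknown flag values."""
    for name, value, allowed in (
        ("--progress", progress, PROGRESS_MODES),
        ("--namespace", namespace, NAMESPACES),
        ("--reboot-mode", reboot_mode, REBOOT_MODES),
    ):
        if value not in allowed:
            raise ValueError(
                f"{name} must be one of {sorted(allowed)}, got {value!r}"
            )
    return [
        "talosctl", "upgrade",
        "--nodes", node,
        "--image", image,
        "--progress", progress,
        "--namespace", namespace,
        "--reboot-mode", reboot_mode,
    ]


# The image tag below is a placeholder, not a real release.
argv = upgrade_command(
    "10.20.30.40",
    "ghcr.io/siderolabs/installer:<version>",
    progress="plain",
)
print(shlex.join(argv))
```

Forcing plain progress output, as in the usage line, is one way to keep logs readable when the command runs outside a terminal.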
Machine configuration changes
- VolumeConfig and UserVolumeConfig now support negative byte sizes and percentages to specify space left on the disk after creating a partition.
- Mount specification in volume configuration has two new fields: secure (defaults to true) to enable nosuid, nodev options; disableAccessTime (defaults to false) to enable noatime option (performance).
- New ExternalVolumeConfig document to configure external volumes, at the moment virtiofs volumes are supported (e.g. Proxmox).
- New BlackholeRouteConfig document to configure blackhole routes.
- New KubeSpanConfig document which replaces .machine.network.kubespan configuration, and adds support for filtering advertised addresses by each node.
- LinkAliasConfig now supports creating aliases for multiple matching links if the name: has %d specifier, e.g. net%d.
- New RoutingRuleConfig document to configure Linux routing rules.
- New TCPProbeConfig document to configure TCP-based connectivity checks (probes).
- New VRFConfig document to configure VRF interfaces.
- New EnvironmentConfig document to configure environment variables set on the host.
- New ImageVerificationConfig document to configure container image verification rules.
Upgrade sequence
When a Talos node receives the upgrade command, it cordons itself in Kubernetes, to avoid receiving any new workload. It then starts to drain its existing workload.
NOTE: If any of your workloads are sensitive to being shut down ungracefully, be sure to use the lifecycle.preStop Pod spec.
Once all of the workload Pods are drained, Talos will start shutting down its
internal processes.
Once all the processes are stopped and the services are shut down, the filesystems will be unmounted.
This allows Talos to produce a very clean upgrade, as close as possible to a pristine system.
We verify the disk and then perform the actual image upgrade.
We set the bootloader to boot once with the new kernel and OS image, then we reboot.
After the node comes back up and Talos verifies itself, it will make
the bootloader change permanent, rejoin the cluster, and finally uncordon itself to receive new workloads.
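The boot-once step is what makes automatic rollback possible: the new image only becomes the default after it has booted and verified itself. The following Python sketch is a toy model of that idea, not Talos's actual bootloader logic. The Bootloader class and its method names are hypothetical, and the version strings are reused from the upgrade path example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bootloader:
    """Toy model of an A-B boot scheme; not Talos's actual implementation."""
    default: str                      # image booted when nothing else is set
    previous: Optional[str] = None    # image retained for rollback
    boot_once: Optional[str] = None   # image to try on the next boot only

    def stage_upgrade(self, new_image: str) -> None:
        # Ask the bootloader to try the new image exactly once.
        self.boot_once = new_image

    def next_boot(self) -> str:
        # A boot-once entry is consumed whether or not the boot succeeds.
        image, self.boot_once = self.boot_once or self.default, None
        return image

    def commit(self, booted: str) -> None:
        # Called after the new system has verified itself: make the change
        # permanent and retain the old image for rollback.
        if booted != self.default:
            self.previous, self.default = self.default, booted

    def rollback(self) -> None:
        # Manual rollback: point the default back at the retained image.
        if self.previous is None:
            raise RuntimeError("no previous image to roll back to")
        self.default, self.previous = self.previous, self.default


bl = Bootloader(default="v1.1.2")
bl.stage_upgrade("v1.2.4")
booted = bl.next_boot()   # the new image is tried once
bl.commit(booted)         # success: the new image becomes the default
print(bl.default, bl.previous)
```

If commit is never called, for example because the new image fails to start, the next call to next_boot returns the old default again, which mirrors the automatic rollback described above.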
FAQs
Q. What happens if an upgrade fails?
A. Talos Linux attempts to safely handle upgrade failures.
The most common failure is an invalid installer image reference.
In this case, Talos will fail to download the upgraded image and will abort the upgrade.
Sometimes, Talos is unable to successfully kill off all of the disk access points, in which case it cannot safely unmount all filesystems to effect the upgrade.
In this case, it will abort the upgrade and reboot.
It is possible (especially with test builds) that the upgraded Talos system will fail to start.
In this case, the node will be rebooted, and the bootloader will automatically use the previous Talos kernel and image, thus effectively rolling back the upgrade.
Lastly, it is possible that Talos itself will upgrade successfully, start up, and rejoin the cluster but your workload will fail to run on it, for whatever reason.
This is when you would use the talosctl rollback command to revert to the previous Talos version.
Q. Can upgrades be scheduled?
A. Because the upgrade sequence is API-driven, you can easily tie it in to your own business logic to schedule and coordinate your upgrades.
Q. Can the upgrade process be observed?
A. Yes, using the talosctl dmesg -f command.
You can also use talosctl upgrade --wait, optionally adding --debug (talosctl upgrade --wait --debug) to observe kernel logs.
Q. Are worker node upgrades handled differently from control plane node upgrades?
A. Short answer: no.
Long answer: Both node types follow the same set procedure, and from the user’s standpoint the processes are identical.
However, since control plane nodes run additional services, such as etcd, there are some extra steps and checks performed on them.
For instance, Talos will refuse to upgrade a control plane node if that upgrade would cause a loss of quorum for etcd.
If multiple control plane nodes are asked to upgrade at the same time, Talos will protect the Kubernetes cluster by ensuring only one control plane node actively upgrades at any time, via checking etcd quorum.
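The quorum rule behind this protection is simple majority arithmetic. The following Python sketch illustrates only that arithmetic. The quorum and safe_to_upgrade functions are hypothetical names, and the check Talos actually performs may differ, for example in how it treats single-node control planes.

```python
def quorum(members: int) -> int:
    """Votes needed for a majority in a cluster of `members` etcd members."""
    return members // 2 + 1


def safe_to_upgrade(members: int, healthy: int) -> bool:
    """True if taking one healthy member offline still leaves a majority."""
    return healthy - 1 >= quorum(members)


# (total members, currently healthy members)
for members, healthy in [(3, 3), (3, 2), (5, 4)]:
    print(members, healthy, safe_to_upgrade(members, healthy))
```

With three members and one already unavailable, taking a second member offline would leave a single member, which is below the majority of two.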
Q. Can I break my cluster by upgrading everything at once?
A. Possibly - it’s not recommended.
Nothing prevents the user from sending near-simultaneous upgrades to each node of the cluster - and while Talos Linux and Kubernetes can generally deal with this situation, other components of the cluster may not be able to recover from more than one node rebooting at a time (e.g. any software that maintains a quorum or state across nodes, such as Rook/Ceph).
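One way to avoid simultaneous reboots is to drive the upgrades from a small script that handles one node at a time. The following Python sketch assumes that talosctl upgrade with --wait returns a non-zero exit code on failure. The rolling_upgrade helper, the injectable run parameter, and the second node address are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def rolling_upgrade(
    nodes: Iterable[str],
    image: str,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Upgrade nodes one at a time, stopping at the first failure.

    Returns the list of nodes that were upgraded successfully.
    """
    done = []
    for node in nodes:
        # --wait blocks until the upgrade of this node has finished.
        argv = ["talosctl", "upgrade", "--nodes", node,
                "--image", image, "--wait"]
        if run(argv) != 0:
            break  # leave the remaining nodes untouched
        done.append(node)
    return done


# Dry run: print each command instead of executing it.
# The image tag is a placeholder, not a real release.
rolling_upgrade(
    ["10.20.30.40", "10.20.30.41"],
    "ghcr.io/siderolabs/installer:<version>",
    run=lambda argv: print(" ".join(argv)) or 0,
)
```

Passing the command runner as a parameter keeps the sketch testable without a cluster. A production script would likely also check workload health between nodes before continuing.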
Q. Which version of talosctl should I use to update a cluster?
A. We recommend using the version that matches the current running version of the cluster.