Etcd holds all Kubernetes cluster state. If etcd becomes corrupted or unavailable, restoring from a backup is the primary recovery path. This guide walks through restoring an etcd snapshot to a cluster managed by cluster templates in Omni. Before following this guide, ensure you have created etcd backups for your cluster. If you have not, see Create Etcd Backups.
Prerequisites
- omnictl must be installed and configured.
- talosctl must be installed and talosconfig must be configured for the cluster you want to restore. You can download talosconfig from the Omni UI or by running omnictl talosconfig -c <cluster-name>.
- The cluster must still exist in Omni and must not have been deleted. Omni cannot restore a cluster that no longer exists as a resource.
- The cluster must be managed using cluster templates. If your cluster was created via the UI, see Export a Cluster Template to convert it first.
- At least one etcd backup must be available for the cluster.
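The tooling prerequisites above can be checked from a shell before starting. The following is a minimal sketch and not part of Omni: require_tools is a hypothetical helper name, and the only Omni command involved is the omnictl talosconfig call already mentioned in the list.

```shell
# Hypothetical helper: verify that each named CLI is available on PATH.
# Prints every missing tool to stderr and returns non-zero if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (replace <cluster-name> first):
#   require_tools omnictl talosctl && omnictl talosconfig -c <cluster-name>
```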
Step 1: Find the cluster UUID
Start by exporting your cluster name as an environment variable, replacing <cluster-name> with the name of your cluster. Then look up the cluster's UUID with omnictl and note the value in the UUID column; you will need it in a later step.
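A sketch of this step follows. The variable name CLUSTER_NAME, the clusteruuid resource type, and the assumption that UUID is the last column of a table with a single header line are not confirmed by this guide; check them against your Omni version before relying on the helper functions, whose names are hypothetical.

```shell
# Replace <cluster-name> with the name of your cluster.
export CLUSTER_NAME="<cluster-name>"

# Print the cluster UUID resource as a table (assumed resource type).
show_cluster_uuid() {
  omnictl get clusteruuid "$CLUSTER_NAME"
}

# Extract the last column of the first data row, assuming one header line
# and UUID as the final column.
cluster_uuid() {
  show_cluster_uuid | awk 'NR == 2 { print $NF }'
}

# Example usage:
#   show_cluster_uuid
#   CLUSTER_UUID="$(cluster_uuid)"
```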
Step 2: Find the snapshot to restore
List the available snapshots for the cluster. Note the value in the SNAPSHOT column for the snapshot you want to restore; you will need it in a later step. Use the CREATED AT timestamp to identify the most appropriate snapshot.
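The listing might look like the sketch below. It assumes that etcd backups are exposed as the etcdbackup resource type and are labeled with omni.sidero.dev/cluster; both are assumptions to verify against your Omni version, and list_etcd_backups is a hypothetical helper name.

```shell
# List etcd backups for one cluster, selected by an assumed cluster label.
# $1 = cluster name
list_etcd_backups() {
  omnictl get etcdbackup -l "omni.sidero.dev/cluster=$1"
}

# Example usage:
#   list_etcd_backups "$CLUSTER_NAME"
```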
Step 3: Delete the existing control plane
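As a sketch of the deletion in this step: the command below assumes the control plane machine set is named <cluster-name>-control-planes, which is an assumed naming convention rather than something this guide states. Check the actual machine set name in your cluster first, because the deletion is destructive. delete_control_plane is a hypothetical helper name.

```shell
# Delete the control plane machine set of the given cluster.
# Assumes the machine set ID follows the "<cluster-name>-control-planes" pattern.
# $1 = cluster name
delete_control_plane() {
  omnictl delete machineset "$1-control-planes"
}

# Example usage:
#   delete_control_plane "$CLUSTER_NAME"
```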
To restore etcd, the existing control plane must be deleted first. This puts the cluster into a non-bootstrapped state so that a new control plane can be created from the restored etcd snapshot. Delete the existing control plane with omnictl before moving on to the next step.
Step 4: Create the restore template
Retrieve your existing cluster template file. If you do not have it locally, you can export it with omnictl. Then add a bootstrapSpec block to the ControlPlane section, substituting the cluster UUID from Step 1 and the snapshot name from Step 2.
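A sketch of both parts of this step follows. The export flags (-c, -o), the helper names, and the clusterUUID and snapshot field names inside bootstrapSpec are assumptions about typical Omni cluster templates; compare them with your exported template before applying anything.

```shell
# Export the current cluster template to a local file (assumed flags).
# $1 = cluster name, $2 = output file path
export_template() {
  omnictl cluster template export -c "$1" -o "$2"
}

# Print a bootstrapSpec fragment to paste into the ControlPlane document.
# $1 = cluster UUID from Step 1, $2 = snapshot name from Step 2
print_bootstrap_spec() {
  cat <<EOF
bootstrapSpec:
  clusterUUID: $1
  snapshot: $2
EOF
}

# Example usage (restore-template.yaml is a hypothetical file name):
#   export_template "$CLUSTER_NAME" restore-template.yaml
#   print_bootstrap_spec "<cluster-uuid>" "<snapshot-name>"
```

The fragment belongs at the top level of the ControlPlane document in the template, alongside its existing fields.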
Step 5: Sync the restore template
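A minimal sketch of the sync in this step, assuming omnictl accepts the template path through -f. The helper names are hypothetical, and the status call is an optional extra for watching progress.

```shell
# Apply the edited template to trigger the restore.
# $1 = path to the template file
sync_template() {
  omnictl cluster template sync -f "$1"
}

# Optionally report the status of the resources described by the template.
template_status() {
  omnictl cluster template status -f "$1"
}

# Example usage (restore-template.yaml is a hypothetical file name):
#   sync_template restore-template.yaml
#   template_status restore-template.yaml
```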
Apply the template with omnictl to trigger the restore.
Step 6: Restart kubelet on worker nodes
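The two parts of this step, listing the worker nodes and restarting kubelet on each, could be sketched as follows. The clustermachine resource type and the omni.sidero.dev/role-worker and omni.sidero.dev/cluster labels are assumptions to verify against your Omni version; the restart uses the talosctl service subcommand. Both helper names are hypothetical.

```shell
# List worker machines of a cluster (assumed resource type and labels).
# $1 = cluster name
list_workers() {
  omnictl get clustermachine -l "omni.sidero.dev/role-worker,omni.sidero.dev/cluster=$1"
}

# Restart kubelet on every node ID passed as an argument.
# Stops at the first failure so it can be investigated.
restart_kubelets() {
  for node in "$@"; do
    talosctl -n "$node" service kubelet restart || return 1
  done
}

# Example usage:
#   list_workers "$CLUSTER_NAME"
#   restart_kubelets <worker-id-1> <worker-id-2>
```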
After the restore completes, kubelet must be restarted on all worker nodes to ensure healthy cluster operation. First, retrieve the IDs of the worker nodes, then restart the kubelet service on each of them with talosctl.
Step 7: Verify the restore
Confirm that all nodes are ready and the cluster is healthy. All nodes should report a status of Ready. If any nodes remain NotReady after a few minutes, check the kubelet status on the affected node.
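A sketch of the verification, assuming kubectl is pointed at the restored cluster. not_ready_nodes and kubelet_status are hypothetical helpers: the first filters the standard kubectl get nodes output, and the second uses the talosctl service subcommand.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Prints nothing when every node is Ready.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}

# Show kubelet service status on one node.
# $1 = node ID
kubelet_status() {
  talosctl -n "$1" service kubelet status
}

# Example usage:
#   not_ready_nodes
#   kubelet_status <node-id>
```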