# Restore Etcd of a Cluster

> Recover a template-managed Omni cluster from an etcd backup.

Etcd holds all Kubernetes cluster state. If etcd becomes corrupted or unavailable, restoring from a backup is the primary recovery path. This guide walks through restoring an etcd snapshot to a cluster managed by cluster templates in Omni.

Before following this guide, ensure you have created etcd backups for your cluster. If you have not, see [Create Etcd Backups](./etcd-backups).

## Prerequisites

* [`omnictl`](../getting-started/install-and-configure-omnictl) must be installed and configured.
* [`talosctl`](../getting-started/how-to-install-talosctl) must be installed and `talosconfig` must be configured for the cluster you want to restore. You can download `talosconfig` from the Omni UI or by running `omnictl talosconfig -c <cluster-name>`.
* The cluster must still exist in Omni. Omni cannot restore a cluster that no longer exists as a resource.
* The cluster must be managed using cluster templates. If your cluster was created via the UI, see [Export a Cluster Template](../cluster-management/export-a-cluster-template-from-a-cluster-created-in-the-ui) to convert it first.
* At least one etcd backup must be available for the cluster.

## Step 1: Find the cluster UUID

Start by exporting your cluster name as an environment variable. Replace `<cluster-name>` with the name of your cluster:

```bash theme={null}
export CLUSTER_NAME=<cluster-name>
```

Then retrieve the cluster UUID:

```bash theme={null}
omnictl get clusteruuid $CLUSTER_NAME
```

The output will look similar to this:

```
NAMESPACE   TYPE          ID              VERSION   UUID
default     ClusterUUID   my-cluster      1         bb874758-ee54-4d3b-bac3-4c8349737298
```

Note the value in the `UUID` column. You will need it in Step 4.
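
If you would rather capture the UUID in a shell variable than copy it by hand, the table output can be parsed. This is a minimal sketch rather than an official Omni workflow: the helper name `cluster_uuid_from_table` and the variable `CLUSTER_UUID` are hypothetical, and the parsing assumes the five-column table layout shown above, with the UUID in the last column.

```bash
# Hypothetical helper: reads `omnictl get clusteruuid` table output on stdin
# and prints the UUID column (fifth field) of the first data row.
cluster_uuid_from_table() {
  awk 'NR == 2 { print $5 }'
}

# Example usage (requires a configured omnictl):
#   CLUSTER_UUID=$(omnictl get clusteruuid "$CLUSTER_NAME" | cluster_uuid_from_table)
#   echo "$CLUSTER_UUID"
```

If the table layout differs in your `omnictl` version, adjust the field number accordingly.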

## Step 2: Find the snapshot to restore

List the available snapshots for the cluster:

```bash theme={null}
omnictl get etcdbackup -l omni.sidero.dev/cluster=$CLUSTER_NAME
```

The output will look similar to this:

```
NAMESPACE   TYPE         ID                      VERSION     CREATED AT                         SNAPSHOT
external    EtcdBackup   my-cluster-1701184522   undefined   {"nanos":0,"seconds":1701184522}   FFFFFFFF9A99FBF6.snapshot
external    EtcdBackup   my-cluster-1701184515   undefined   {"nanos":0,"seconds":1701184515}   FFFFFFFF9A99FBFD.snapshot
external    EtcdBackup   my-cluster-1701184500   undefined   {"nanos":0,"seconds":1701184500}   FFFFFFFF9A99FC0C.snapshot
```

Note the value in the `SNAPSHOT` column for the snapshot you want to restore. You will need it in Step 4. The `seconds` field in the `CREATED AT` column is a Unix timestamp, so a larger value means a more recent snapshot. Use it to identify the most appropriate snapshot.
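
When the most recent backup is the one you want, the selection can be scripted. The sketch below assumes the six-column table layout shown above, where the `CREATED AT` value contains no spaces. The helper name `latest_snapshot` is hypothetical.

```bash
# Hypothetical helper: reads `omnictl get etcdbackup` table output on stdin
# and prints the SNAPSHOT value of the row with the largest "seconds" value.
latest_snapshot() {
  awk 'NR > 1 {
    secs = $5
    sub(/.*"seconds":/, "", secs)   # drop everything before the seconds value
    sub(/[^0-9].*/, "", secs)       # drop everything after the digits
    if (secs + 0 > max) { max = secs + 0; snap = $6 }
  }
  END { if (snap != "") print snap }'
}

# Example usage (requires a configured omnictl):
#   omnictl get etcdbackup -l omni.sidero.dev/cluster="$CLUSTER_NAME" | latest_snapshot
```

The most recent snapshot is not always the right one. If the cluster was already misbehaving when it was taken, choose an earlier snapshot by hand.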

## Step 3: Delete the existing control plane

To restore etcd, the existing control plane must be deleted first. This puts the cluster into a non-bootstrapped state so that a new control plane can be created with the restored etcd snapshot.

Run the following command to delete the control plane machine set:

```bash theme={null}
omnictl delete machineset "${CLUSTER_NAME}-control-planes"
```
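
Because this command is destructive, it can help to confirm that `CLUSTER_NAME` is set before building the machine set ID from it. The following is a small sketch of such a guard. The helper name `control_plane_machineset` is hypothetical, and the `-control-planes` suffix is taken from the command above.

```bash
# Hypothetical helper: prints the control plane machine set ID for a cluster,
# or fails if the cluster name is empty.
control_plane_machineset() {
  if [ -z "$1" ]; then
    echo "cluster name is empty; refusing to build a machine set ID" >&2
    return 1
  fi
  printf '%s-control-planes\n' "$1"
}

# Example usage (destructive; requires a configured omnictl):
#   machineset=$(control_plane_machineset "$CLUSTER_NAME") \
#     && omnictl delete machineset "$machineset"
```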

## Step 4: Create the restore template

Retrieve your existing cluster template file. If you do not have it locally, you can export it by running:

```bash theme={null}
omnictl cluster template export -c $CLUSTER_NAME > restore-template.yaml
```

Open the file and add a `bootstrapSpec` block to the `ControlPlane` section, substituting the cluster UUID from Step 1 and the snapshot name from Step 2:

```yaml theme={null}
kind: Cluster
name: <cluster-name>
kubernetes:
  version: v1.28.2
talos:
  version: v1.5.5
---
kind: ControlPlane
machines:
  - <controlplane-machine-uuid-1>
  - <controlplane-machine-uuid-2>
  - <controlplane-machine-uuid-3>
bootstrapSpec:
  clusterUUID: <cluster-uuid>       # UUID from Step 1
  snapshot: <snapshot-name>         # snapshot name from Step 2
---
kind: Workers
machines:
  - <worker-machine-uuid-1>
  - <worker-machine-uuid-2>
```
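
A common mistake at this step is leaving a placeholder in the file. As a hedged sketch, the helper below prints a `bootstrapSpec` block only when the two values look plausible. The name `render_bootstrap_spec` is hypothetical, and the checks assume the UUID format from Step 1 and the `.snapshot` suffix from Step 2.

```bash
# Hypothetical helper: prints a bootstrapSpec block for the given cluster UUID
# and snapshot name, rejecting values that still look like placeholders.
render_bootstrap_spec() {
  uuid=$1
  snapshot=$2
  # Expect the 8-4-4-4-12 hexadecimal UUID layout seen in Step 1.
  if ! printf '%s\n' "$uuid" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
    echo "invalid cluster UUID: $uuid" >&2
    return 1
  fi
  # Expect the .snapshot suffix seen in Step 2.
  case $snapshot in
    *.snapshot) ;;
    *) echo "invalid snapshot name: $snapshot" >&2; return 1 ;;
  esac
  printf 'bootstrapSpec:\n  clusterUUID: %s\n  snapshot: %s\n' "$uuid" "$snapshot"
}

# Example usage:
#   render_bootstrap_spec bb874758-ee54-4d3b-bac3-4c8349737298 FFFFFFFF9A99FBF6.snapshot
```

Paste the printed block into the `ControlPlane` document of `restore-template.yaml`, as in the example template.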

## Step 5: Sync the restore template

Apply the template to trigger the restore:

```bash theme={null}
omnictl cluster template sync -f restore-template.yaml
```

Monitor the status until the cluster is fully restored:

```bash theme={null}
omnictl cluster template status -f restore-template.yaml
```

## Step 6: Restart kubelet on worker nodes

After the restore completes, kubelet must be restarted on all worker nodes to ensure healthy cluster operation.

First, retrieve the IDs of the worker nodes:

```bash theme={null}
omnictl get clustermachine -l omni.sidero.dev/role-worker,omni.sidero.dev/cluster=$CLUSTER_NAME
```

The output will look similar to this:

```
NAMESPACE   TYPE             ID                                     VERSION
default     ClusterMachine   26b87860-38b4-400f-af72-bc8d26ab6cd6   3
default     ClusterMachine   2f6af2ad-bebb-42a5-b6b0-2b9397acafbc   3
default     ClusterMachine   5f93376a-95f6-496c-b4b7-630a0607ac7f   3
default     ClusterMachine   c863ccdf-cdb7-4519-878e-5484a1be119a   3
```

Restart kubelet on each worker node, replacing the placeholders below with the IDs from your own output:

```bash theme={null}
talosctl -n <worker-node-id-1> service kubelet restart
talosctl -n <worker-node-id-2> service kubelet restart
talosctl -n <worker-node-id-3> service kubelet restart
talosctl -n <worker-node-id-4> service kubelet restart
```

## Step 7: Verify the restore

Confirm that all nodes are ready and the cluster is healthy:

```bash theme={null}
kubectl get nodes
```

All nodes should show a status of `Ready`. If any nodes remain `NotReady` after a few minutes, check the kubelet status on the affected node:

```bash theme={null}
talosctl -n <node-id> service kubelet
```
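
To list only the nodes that still need attention, the `kubectl get nodes` output can be filtered. A minimal sketch, assuming the default column order of `NAME` followed by `STATUS`. The helper name `not_ready_nodes` is hypothetical.

```bash
# Hypothetical helper: reads `kubectl get nodes` table output on stdin and
# prints the NAME of every node whose STATUS does not start with "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Example usage (requires a configured kubectl):
#   kubectl get nodes | not_ready_nodes
```

No output means every node reports `Ready`, including cordoned nodes that show `Ready,SchedulingDisabled`.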

You can also verify that etcd membership is healthy by running:

```bash theme={null}
talosctl -n <controlplane-node-id> etcd members
```
