Operational instructions for the etcd database.
The etcd database backs the Kubernetes control plane state, so etcd health is critical for Kubernetes availability.
The default etcd database space quota is 2 GiB.
If the database size exceeds the quota, etcd raises a NOSPACE alarm and stops accepting writes (only reads and deletes are served) until the issue is resolved.
This condition can be checked with the talosctl etcd alarm list command.
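The check can be scripted. The helper below is a hypothetical sketch, not part of talosctl. It assumes talosctl is on PATH with a working talosconfig and that the target node is selected with the -n flag. It only looks for the string NOSPACE in the command output and does not rely on a particular column layout. The runner is injectable, so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner type: takes talosctl arguments, returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_talosctl(args: Sequence[str]) -> str:
    """Run talosctl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["talosctl", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def has_nospace_alarm(node: str, run: Runner = run_talosctl) -> bool:
    """Return True if `talosctl etcd alarm list` mentions NOSPACE for the node.

    Assumption: the alarm name appears verbatim in the output; the exact
    table format is not parsed.
    """
    output = run(["-n", node, "etcd", "alarm", "list"])
    return "NOSPACE" in output
```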
To increase the space quota, edit the etcd section in the machine configuration.
Once the updated configuration is in effect, use talosctl etcd alarm disarm to clear the NOSPACE alarm.
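A minimal sketch of preparing such a change follows. It assumes the quota is raised through etcd's --quota-backend-bytes flag, passed via cluster.etcd.extraArgs in the machine configuration. Check the Talos configuration reference for your version before relying on that field path. The function name is hypothetical. It emits JSON, which is also valid YAML, so the output could be saved and applied as a machine configuration patch.

```python
import json

GIB = 1024 ** 3  # bytes in one GiB


def quota_patch(quota_gib: int) -> str:
    """Return a JSON machine config patch setting the etcd space quota.

    Assumption: the quota is passed to etcd as --quota-backend-bytes via
    cluster.etcd.extraArgs, whose values are strings.
    """
    if quota_gib <= 0:
        raise ValueError("quota must be a positive number of GiB")
    quota_bytes = quota_gib * GIB
    patch = {
        "cluster": {
            "etcd": {"extraArgs": {"quota-backend-bytes": str(quota_bytes)}}
        }
    }
    return json.dumps(patch, indent=2)
```

For example, quota_patch(4) sets quota-backend-bytes to "4294967296" (4 GiB). The 4 GiB value is only an illustration. Pick a size that matches the actual database usage.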
The etcd database can become fragmented over time if there are many writes and deletes.
The Kubernetes API server performs automatic compaction of the etcd database, which marks deleted space as free and ready to be reused.
However, the space is not actually freed until the database is defragmented.
If the database is heavily fragmented (the ratio of in-use size to DB size is less than 0.5), defragmentation might improve performance.
If the database exceeds the space quota (see above) but the actual in-use size is small, defragmentation is required to bring the on-disk database size below the limit.
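The two rules above can be written as a small decision function. This is a hypothetical sketch. Sizes are in bytes, and the 2 GiB default quota and the 0.5 ratio come from the text. Treating "small" in-use size as "below the quota" is an assumption. The sketch does not read talosctl etcd status output, because it does not assume that output's exact format.

```python
from typing import Optional

GIB = 1024 ** 3
DEFAULT_QUOTA_BYTES = 2 * GIB  # default etcd space quota (2 GiB)
FRAGMENTATION_RATIO = 0.5      # in-use / DB size threshold from the text


def defrag_reason(
    db_size: int, in_use: int, quota: int = DEFAULT_QUOTA_BYTES
) -> Optional[str]:
    """Return why defragmentation is advisable, or None if it is not.

    "over-quota": the on-disk size exceeds the quota but the in-use data
    fits under it, so defragmentation can bring the database below the limit.
    "fragmented": less than half of the on-disk size is in use.
    """
    if db_size <= 0 or in_use < 0:
        raise ValueError("db_size must be positive and in_use non-negative")
    if db_size > quota and in_use < quota:
        return "over-quota"
    if in_use / db_size < FRAGMENTATION_RATIO:
        return "fragmented"
    return None
```

If both the on-disk and in-use sizes are over the quota, the function returns None. In that case defragmentation alone cannot help, and the quota has to be raised instead.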
The current database size can be checked with the talosctl etcd status command.
If any node is over the database space quota, the alarm is reported in the ERRORS column.
To defragment the database, run the talosctl etcd defrag command.
Note: defragmentation is a resource-intensive operation, so it is recommended to run it on one node at a time. Defragmenting a live member blocks it from reading and writing data while it rebuilds its state.
Once defragmentation is complete, the on-disk database size closely matches the in-use size.
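The one-node-at-a-time recommendation can be followed with a plain sequential loop. The helper is a hypothetical sketch. It assumes talosctl is on PATH and uses -n to target each control plane node. Node addresses are placeholders supplied by the caller.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical runner type: takes talosctl arguments, returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_talosctl(args: Sequence[str]) -> str:
    """Run talosctl and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["talosctl", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def defrag_one_at_a_time(nodes: Sequence[str], run: Runner = run_talosctl) -> List[str]:
    """Defragment etcd on each node in turn, never in parallel.

    Each call blocks until talosctl returns, so the next node starts only
    after the previous one has finished. If a call raises, the loop stops
    and the remaining nodes are left untouched.
    """
    done: List[str] = []
    for node in nodes:
        run(["-n", node, "etcd", "defrag"])
        done.append(node)
    return done
```

Stopping on the first failure is deliberate. It avoids putting load on the next member while an earlier one may still be unhealthy. Running talosctl etcd status between nodes is a reasonable way to confirm each result before continuing.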
Regular backups of the etcd database should be taken to ensure that the cluster can be restored in case of a failure.
This procedure is described in the disaster recovery guide.
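As an illustration only, a backup step could call talosctl etcd snapshot with a timestamped file name. The directory layout, file naming scheme and helper names below are hypothetical. Scheduling and off-node storage are left out. The restore side is covered by the disaster recovery guide.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional, Sequence

# Hypothetical runner type: takes talosctl arguments, returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_talosctl(args: Sequence[str]) -> str:
    """Run talosctl and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["talosctl", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def snapshot_path(directory: Path, now: Optional[datetime] = None) -> Path:
    """Build a UTC-timestamped snapshot file name (hypothetical scheme)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return directory / f"etcd-{now.strftime('%Y%m%dT%H%M%SZ')}.snapshot"


def take_snapshot(
    node: str,
    directory: Path,
    run: Runner = run_talosctl,
    now: Optional[datetime] = None,
) -> Path:
    """Stream an etcd snapshot from `node` into `directory` and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = snapshot_path(directory, now)
    run(["-n", node, "etcd", "snapshot", str(path)])
    return path
```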