Hosted Omni
The hosted Omni service is the simplest and most reliable way to use Omni. The platform is maintained by Sidero Labs and provides a fully managed environment with built-in recovery and upgrades, so you do not need to manage any infrastructure. For users without strict self-hosting or air-gap requirements, hosted Omni offers the best overall experience.
Self-Hosted
Self-hosting Omni provides more control over data locality and infrastructure, but it introduces additional operational responsibilities. Omni is not part of the Kubernetes control plane, so temporary Omni unavailability does not affect how your clusters run: they continue operating normally, and Talos machines reconnect when Omni becomes available again. This behavior is important when deciding whether high availability is necessary. Kubernetes clusters deployed on Talos use technologies such as KubePrism and a discovery service to ensure the cluster and its workloads can run without highly available, external management such as Omni.

Omni is the authentication mechanism for external access to Talos and Kubernetes. All external user (e.g., kubectl) and service (e.g., Infrastructure Providers) communication goes through Omni, so if Omni is unavailable for an extended period, external communication will not work until Omni is recovered. Omni also offers an emergency “break glass” configuration to access Talos machines and Kubernetes clusters when Omni is not available.
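To make the dependency concrete, the sketch below probes an Omni endpoint from outside the cluster. The URL is a hypothetical placeholder, not a documented health API; if the probe fails, your workloads keep running, but external access through Omni is unavailable until it recovers.

```python
# Minimal external availability probe for a self-hosted Omni endpoint.
# OMNI_URL is a hypothetical example; substitute your own Omni address.
import urllib.error
import urllib.request

OMNI_URL = "https://omni.example.com"  # assumption: your Omni endpoint


def omni_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the Omni endpoint answers an HTTPS request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, TimeoutError):
        return False


if __name__ == "__main__":
    if omni_reachable(OMNI_URL):
        print("Omni reachable: external kubectl/service access available")
    else:
        # Clusters keep running; only external access through Omni is blocked.
        print("Omni unreachable: workloads unaffected, external access blocked")
```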
These are the recommended deployment models for self-hosting Omni:
- A single VM deployment
- Kubernetes deployment
- External etcd
- Highly available
Single VM deployment
Running Omni as a Docker Compose application on a single virtual machine is the preferred on-premises setup. It is simple to operate, dependable, and fully supported. In this setup, Omni keeps its state in an embedded etcd database. Because everything is stored locally on the VM, backing up and recovering the system is straightforward: VM snapshots are usually enough. Non-critical information such as metrics and machine logs is stored in a local SQLite database on disk. One of the strengths of this model is that downtime has no effect on Kubernetes clusters: your clusters continue running even if Omni goes offline, and Talos simply reconnects and resumes management once the VM is back. For most self-hosted environments with the ability to live-migrate VMs, this model should provide about 99.9% uptime. Backups and restores can happen at the VM layer, as with traditional VMs.
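Where VM snapshots are not an option, a file-level copy of the local state can serve as a fallback. The sketch below archives an assumed data directory; both paths are hypothetical and should point at whatever your Docker Compose setup mounts for the embedded etcd and SQLite state, ideally copied while Omni is stopped so the archive is consistent.

```python
# File-level backup sketch for a single-VM Omni deployment.
# DATA_DIR and BACKUP_DIR are hypothetical paths: point them at the
# directory your Compose setup mounts for Omni's embedded etcd and
# SQLite state, and at your backup destination. Run while Omni is
# stopped to get a consistent copy.
import tarfile
import time
from pathlib import Path

DATA_DIR = Path("/var/lib/omni")        # assumption: Omni state mount
BACKUP_DIR = Path("/var/backups/omni")  # assumption: backup destination


def backup_state(data_dir: Path, backup_dir: Path) -> Path:
    """Archive the Omni state directory into a timestamped tarball."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"omni-state-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        tar.add(data_dir, arcname=data_dir.name)
    return target


if __name__ == "__main__":
    print(f"Wrote {backup_state(DATA_DIR, BACKUP_DIR)}")
```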
Kubernetes deployment
Some environments require faster recovery, such as those with strict SLAs or compliance requirements. In these cases, Omni can be deployed inside a Kubernetes cluster, where health checks and automatic rescheduling allow the application to recover quickly from failures. Omni should never be hosted in a Kubernetes cluster that it manages itself: doing so creates a circular dependency that would be difficult, or impossible, to recover from. Instead, Omni can be deployed on a Kubernetes cluster running on Talos that is not managed by Omni. This pattern is supported, and examples are available in the contrib repository. Running Omni on Kubernetes can standardize operations, simplify troubleshooting, and enable faster recovery. However, it is not required and typically does not provide significantly higher availability than a single-VM deployment. The Sidero Omni SaaS runs multiple Omni instances on Kubernetes backed by external etcd storage; this architecture allows instances to be created quickly, migrated for maintenance, and recovered efficiently. If you do not need to run multiple Omni instances, a single-VM deployment is often the more appropriate choice.
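A quick way to confirm that rescheduling has done its job is to check that the Omni Deployment reports all replicas available. The sketch below uses the official Kubernetes Python client; the Deployment name and namespace are assumptions for illustration and should match your own manifests (e.g., those from the contrib repository).

```python
# Sketch: verify that an Omni Deployment has its replicas available,
# using the official Kubernetes Python client (pip install kubernetes).
# The name and namespace below are assumptions; adjust to your manifests.
from kubernetes import client, config


def omni_deployment_ready(name: str = "omni", namespace: str = "omni") -> bool:
    """Return True if the Deployment reports all replicas available."""
    config.load_kube_config()  # uses your current kubeconfig context
    apps = client.AppsV1Api()
    status = apps.read_namespaced_deployment_status(name, namespace).status
    desired = status.replicas or 0
    available = status.available_replicas or 0
    return desired > 0 and available == desired


if __name__ == "__main__":
    print("Omni ready" if omni_deployment_ready() else "Omni not ready")
```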
External etcd
Omni can connect to an external etcd database, which is the recommended configuration for production environments. The etcd datastore can be deployed and managed using your organization’s existing database practices, providing Omni with a highly available and scalable backend for larger deployments. A single external etcd cluster can also be shared across multiple Omni instances. Each Omni instance is assigned a unique account UUID, which is used as the key for isolating and storing its data.
Omni has been tested with up to 5,000 connected nodes per instance. For multi-Omni deployments, careful planning around etcd scaling and data compaction is required to ensure reliable performance.
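When capacity-planning a shared etcd cluster, it can help to gauge each instance’s footprint. The sketch below counts keys under a per-account prefix using the `etcd3` Python client; the `/omni/<account-uuid>/` prefix layout is an assumption for illustration, not Omni’s documented schema.

```python
# Sketch: gauge one Omni instance's footprint in a shared external etcd
# cluster by counting keys under a per-account prefix.
# The prefix layout is an assumption, not Omni's documented schema.
# Requires: pip install etcd3
import etcd3

ACCOUNT_UUID = "00000000-0000-0000-0000-000000000000"  # hypothetical account


def count_keys(host: str, port: int, prefix: str) -> int:
    """Count keys under a prefix to estimate an instance's data footprint."""
    client = etcd3.client(host=host, port=port)
    return sum(1 for _ in client.get_prefix(prefix, keys_only=True))


if __name__ == "__main__":
    n = count_keys("127.0.0.1", 2379, f"/omni/{ACCOUNT_UUID}/")
    print(f"{n} keys stored for account {ACCOUNT_UUID}")
```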
Highly available
Running Omni in a highly available (HA) mode requires careful planning. Whether Omni is deployed on virtual machines or in Kubernetes does not change the overall architecture, but an external, highly available etcd cluster is required to ensure all core components remain available. To eliminate single points of failure, supporting services must also be deployed in HA configurations; this includes the container registry, the Image Factory, and authentication systems. Note that you can run only one Omni instance at a time. An HA deployment typically includes:
- Omni (1)
- etcd (3)
- Image Factory (2)
- Container registry (2)
- Vault API (2)
- Vault storage (2)
- Authentication
Most users do not require this level of availability due to the design of Omni and Kubernetes communication.
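As a rough comparison of the uptime figures mentioned in this section (about 99.9% for a single VM versus roughly 99.99% for a full HA stack), the snippet below converts availability percentages into permitted downtime per year:

```python
# Convert availability targets into permitted downtime per year,
# using the uptime figures mentioned in this section.
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours(availability_pct: float) -> float:
    """Hours of allowed downtime per year for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99):
    print(f"{pct}% uptime allows ~{downtime_hours(pct):.1f} h/year of downtime")
# 99.9%  -> ~8.8 h/year; 99.99% -> ~0.9 h/year (about 53 minutes)
```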
Compare deployment models
The following table provides a comparison of the recommended deployment models:

| Deployment model | Best for | Complexity | Notes |
| --- | --- | --- | --- |
| Hosted Omni | Most users; no infra to manage | None | Fully managed and resilient |
| Single VM | Most self-hosted environments | Low | Embedded etcd; backup and restore via VM snapshots |
| Kubernetes (Non-Omni-managed) | Environments requiring faster recovery or standardized platform operations | Medium | Must not run on a cluster that Omni itself manages |
| Single Omni with external etcd | Larger installations needing more durable storage | Medium–high | etcd cluster can be shared across multiple Omni instances |
| Omni HA | Strict uptime requirements (≈99.99%) and mature ops teams | Very high | Requires HA etcd and HA supporting services |