Example Provider
The reference implementation is our KubeVirt provider. The easiest way to start is to copy it and change the code that interacts with the platform.
Overview
This guide explains how Omni dynamically provisions machines through infrastructure providers and how to implement one. Consider the case where a MachineClass and a MachineSet have been created:
- Omni creates a MachineRequestSet with the same name as the MachineSet, aligning the desired number of machines.
- Another Omni controller generates individual MachineRequest resources to match the required count. These MachineRequest objects are created in the infra-provider namespace and labeled with omni.sidero.dev/infra-provider: <provider-id>.
- The provider controller detects new MachineRequest objects matching its ID and executes the defined ProvisionSteps until completion.
- During execution, the provider updates the MachineRequestStatus with the current step name.
- Omni waits until the provisioned VM joins.
- Since the machine request ID differs from the actual machine UUID (which may be provider-controlled), there are two options:
  - If the provider can set or retrieve the machine UUID, it should update it in MachineRequestStatus using the provision.Context.SetMachineUUID method. Omni then maps this status to the corresponding Link resource.
  - Alternatively, the machine request ID can be encoded into the SideroLink join token, allowing immediate mapping.
- Once the link is mapped, Omni creates the related resources (Machine, MachineStatus, etc.), making the machine usable.
- The controller responsible for automatic MachineSetNode creation assigns the machine to a cluster. From this point, the workflow is identical to that of manually added machines.
Provider Implementation Details
A provider is a standalone service that must have access to the Omni API. It should be written in Go. It stores its state in Omni under the namespace infra-provider:<provider-id>,
so it does not require its own persistent storage.
You can use the shared library for provider development:
github.com/siderolabs/omni/tree/main/client/pkg/infra
When using this library, implement the provision.Provision interface, which defines two methods:
- ProvisionSteps(): returns the list of provisioning steps (provision.Step[T]) executed when a new machine is requested.
- Deprovision(): invoked when a machine should be removed.
ProvisionSteps
Provisioning steps are defined using provision.NewStep(),
where the first argument is the step name, and the second is a callback function.
Each successful step runs once before moving to the next.
If a step returns an error, it is retried only when the corresponding MachineRequest changes.
Although steps may be blocking, keep in mind that provisioning and deprovisioning share a limited worker pool.
The pool size can be configured via WithConcurrency(N) in the infra.Run call.
For long-running or polling operations, return provision.NewRetryInterval(time.Duration)
to recheck progress periodically instead of blocking.
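The run-once and retry behavior described above can be illustrated with a small self-contained model. This is not the Omni library's implementation: the names step, retryError, runSteps, and demo are hypothetical stand-ins, and the sketch only mimics how a completed step is skipped on the next pass and how a step can ask to be rechecked later instead of holding a worker.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryError is a hypothetical stand-in for the value a step returns to ask
// for a later recheck (the role provision.NewRetryInterval plays in the library).
type retryError struct{ interval time.Duration }

func (e *retryError) Error() string { return "retry in " + e.interval.String() }

// step pairs a name with a callback, mirroring the (name, callback) shape of a provisioning step.
type step struct {
	name string
	run  func() error
}

// runSteps executes steps in order, skipping the ones recorded in done.
// It returns the requested requeue interval (zero when none) and any hard error.
func runSteps(steps []step, done map[string]bool) (time.Duration, error) {
	for _, s := range steps {
		if done[s.name] {
			continue // a successful step runs only once
		}
		err := s.run()
		var retry *retryError
		if errors.As(err, &retry) {
			return retry.interval, nil // release the worker and come back later
		}
		if err != nil {
			return 0, fmt.Errorf("step %q failed: %w", s.name, err)
		}
		done[s.name] = true
	}
	return 0, nil
}

// demo simulates a VM that becomes ready on the third poll and reports how many
// passes were needed and how often each step ran.
func demo() (passes, creates, polls int) {
	steps := []step{
		{"createVM", func() error { creates++; return nil }},
		{"waitReady", func() error {
			polls++
			if polls < 3 {
				return &retryError{interval: 10 * time.Millisecond}
			}
			return nil
		}},
	}
	done := map[string]bool{}
	for {
		passes++
		wait, err := runSteps(steps, done)
		if err != nil || wait == 0 {
			return passes, creates, polls
		}
		time.Sleep(wait)
	}
}

func main() {
	passes, creates, polls := demo()
	fmt.Printf("passes=%d creates=%d polls=%d\n", passes, creates, polls)
}
```

In this model the VM is created once even though the step list is walked three times, which is the property that makes short polling steps cheaper than one long blocking step.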
Each step callback receives:
- context.Context: for cancellation.
- zap.Logger: preconfigured with contextual fields for the current machine request.
- provision.Context[T]: provides access to state and utilities needed during provisioning.
Example: Defining Steps
Suppose you have a provisioner with a client for your platform.
Step 1: Create the schematic
A schematic is generated to facilitate the download of the installation media. During the image upload process to the provider, the schematic ID is referenced to construct the image download URL. When creating the schematic, additional customizations can be applied to the image, such as including system extensions, specifying kernel arguments, or defining other configuration parameters.
Step 2: Upload the Talos image
This is platform-specific. Talos provides images for different platforms; see the Image Factory for options. In this example, we generate the image factory URL using the schematic ID and Talos version, then compute a SHA-256 hash for deduplication when storing images.
Step 3: Create the machine
In the last step we create the VM in the provider using the previously uploaded image.
Deprovision
Deprovisioning removes created VMs and associated volumes.
If the ISO image is shared across multiple machines, it can be retained.
There is currently no automatic garbage collection for unused ISO images.
The Generic Type T in provision.Step
T is a generic type parameter that must implement the COSI resource.Resource interface so that it can be stored in the state.
It typically mirrors internal Omni resources and allows the provider to persist state between steps.
For example, you can store volume names or other generated IDs during provisioning,
then access them later in Deprovision.
T is available through pctx.State in the provision.Step callbacks,
and as the third argument in the Deprovision call.
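The following sketch shows why persisting state matters: a step writes the names it generated into the state, and deprovisioning later reads them back to know what to delete. Everything here is hypothetical and in-memory. machineState stands in for the provider's T resource (a real one would be a COSI resource persisted by Omni), and fakePlatform stands in for a platform client.

```go
package main

import "fmt"

// machineState is a hypothetical stand-in for the provider's T resource.
type machineState struct {
	VMName     string
	VolumeName string
	ImageID    string // shared between machines, so deprovisioning keeps it
}

// fakePlatform is a hypothetical in-memory platform used only for this sketch.
type fakePlatform struct {
	vms, volumes, images map[string]bool
}

func newFakePlatform(imageID string) *fakePlatform {
	return &fakePlatform{
		vms:     map[string]bool{},
		volumes: map[string]bool{},
		images:  map[string]bool{imageID: true},
	}
}

// createVolumeStep records the generated volume name in the state so that
// later steps and deprovisioning can find it.
func createVolumeStep(p *fakePlatform, requestID string, st *machineState) {
	st.VolumeName = requestID + "-disk"
	p.volumes[st.VolumeName] = true
}

// createVMStep creates the VM and records its name in the state.
func createVMStep(p *fakePlatform, requestID string, st *machineState) {
	st.VMName = requestID
	p.vms[st.VMName] = true
}

// deprovision removes only what the state says belongs to this machine.
// Deleting an absent key is a no-op, so calling it twice is harmless.
func deprovision(p *fakePlatform, st *machineState) {
	delete(p.vms, st.VMName)
	delete(p.volumes, st.VolumeName)
	// The shared image p.images[st.ImageID] is intentionally left in place.
}

func main() {
	p := newFakePlatform("talos-image")
	st := &machineState{ImageID: "talos-image"}

	createVolumeStep(p, "request-1", st)
	createVMStep(p, "request-1", st)
	deprovision(p, st)

	fmt.Println(len(p.vms), len(p.volumes), len(p.images)) // prints "0 0 1"
}
```

Keeping the generated names in the state, rather than recomputing them, means deprovisioning still works if the naming scheme changes between provider versions.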
Machine Connection to Omni
There are two main ways a machine can connect back to Omni:
- Using a schematic with embedded kernel args: generate the schematic without additional options.
- Using external join configuration: supply the join config via nocloud data or a metadata service. GenerateSchematicID should be called with the provision.WithoutConnectionParams option to exclude the join config.
KernelArgs and the machine join config are stored in pctx.ConnectionParams, so there is no need to generate them.
GenerateSchematicID Options
- provision.WithoutConnectionParams: excludes connection parameters from kernel args. It is a good idea to use it together with infra.WithEncodeRequestIDsIntoTokens.
- provision.WithExtraExtensions: adds additional extensions.
- provision.WithMetaValues: injects metadata values.
- provision.WithExtraKernelArgs: adds kernel arguments.
- provision.WithOverlay: adds overlay configuration.
provider.Run Options
- infra.WithClientOptions: customizes Omni client configuration.
- infra.WithImageFactoryClient: overrides the image factory client.
- infra.WithConcurrency: sets concurrency (default: 1).
- infra.WithOmniEndpoint: specifies the Omni API endpoint (same as --advertised-api-url).
- infra.WithState: uses a direct COSI state interface (advanced usage).
- infra.WithHealthCheckFunc: registers a custom health check (displayed in the Omni UI).
- infra.WithHealthCheckInterval: customizes health check frequency.
- infra.WithEncodeRequestIDsIntoTokens: encodes machine request IDs into join tokens. Must be paired with provision.WithoutConnectionParams.
V2 Join Tokens
Omni uses V2 tokens for machine authentication. These tokens contain a signed JSON payload, encoded in Base64. Omni verifies the signature to ensure authenticity. V2 tokens allow embedding machine request IDs directly into the join token, enabling immediate mapping between a machine and its MachineRequest.
This is enabled by passing the infra.WithEncodeRequestIDsIntoTokens option to provider.Run.
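To make the idea of a signed, Base64-encoded JSON payload carrying a request ID concrete, here is a minimal sketch. It does not reproduce Omni's actual token format: the field names, the envelope layout, and the use of HMAC-SHA256 as the signature scheme are all assumptions chosen for illustration. Providers do not build these tokens themselves; the sketch only shows why a verified token lets the receiver map a machine to its request immediately.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// payload and envelope are hypothetical; the real field names may differ.
type payload struct {
	RequestID string `json:"request_id"`
}

type envelope struct {
	Payload   []byte `json:"payload"`   // JSON-encoded payload
	Signature []byte `json:"signature"` // HMAC-SHA256 over Payload (an assumption)
}

// sign computes the HMAC-SHA256 of data under key.
func sign(key, data []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(data)
	return mac.Sum(nil)
}

// encodeToken embeds the request ID into a signed, Base64-encoded token.
func encodeToken(key []byte, requestID string) (string, error) {
	body, err := json.Marshal(payload{RequestID: requestID})
	if err != nil {
		return "", err
	}
	raw, err := json.Marshal(envelope{Payload: body, Signature: sign(key, body)})
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decodeToken verifies the signature and returns the embedded request ID.
func decodeToken(key []byte, token string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", err
	}
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	if !hmac.Equal(env.Signature, sign(key, env.Payload)) {
		return "", errors.New("invalid token signature")
	}
	var p payload
	if err := json.Unmarshal(env.Payload, &p); err != nil {
		return "", err
	}
	return p.RequestID, nil
}

func main() {
	key := []byte("demo-key")
	token, err := encodeToken(key, "machine-request-1")
	if err != nil {
		panic(err)
	}
	id, err := decodeToken(key, token)
	fmt.Println(id, err)
}
```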
provision.Context Reference
- GetRequestID() string: returns the MachineRequest ID.
- GetTalosVersion() string: returns the Talos version used for the installation media.
- SetMachineUUID(id string): records the created machine's UUID (optional if encoding IDs in tokens).
- UnmarshalProviderData(dest any) error: parses provider-specific configuration from JSON.
- CreateConfigPatch(ctx, name, data): adds configuration patches for the machine.
- GenerateSchematicID(ctx, logger, opts...): invokes the image factory to create a schematic and returns its ID.
Provider Data
Provider data is a JSON-encoded field in the MachineRequest that contains provider-specific configuration parameters.
When a provider starts, it registers its schema with Omni.
Omni uses this schema to render UI forms and validate MachineRequest objects.
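As an illustration of what provider data might look like once decoded, the sketch below parses a JSON document into a typed struct using only the standard library. The field names (cores, memory_mb, disk_gb, architecture), the default, and the extra validation are hypothetical; a real provider defines its own fields in its schema and would normally obtain the data through UnmarshalProviderData rather than a raw string.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// providerData lists hypothetical provider-specific parameters.
type providerData struct {
	Cores    int    `json:"cores"`
	MemoryMB int    `json:"memory_mb"`
	DiskGB   int    `json:"disk_gb"`
	Arch     string `json:"architecture"`
}

// parseProviderData decodes raw JSON, applies a default, and rejects values the
// platform could not act on. Omni already validates requests against the
// registered schema, so this check is only a defensive second line.
func parseProviderData(raw string) (providerData, error) {
	data := providerData{Arch: "amd64"} // default used when the field is absent

	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // surface typos in field names early

	if err := dec.Decode(&data); err != nil {
		return providerData{}, fmt.Errorf("invalid provider data: %w", err)
	}
	if data.Cores <= 0 || data.MemoryMB <= 0 {
		return providerData{}, errors.New("cores and memory_mb must be positive")
	}
	return data, nil
}

func main() {
	data, err := parseProviderData(`{"cores": 4, "memory_mb": 8192, "disk_gb": 50}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", data)
}
```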
Best Practices
- Avoid generating unique images per machine.
- Use the image factory to build base images and upload them as part of the provisioning flow.
- Prefer provision.WithoutConnectionParams with infra.WithEncodeRequestIDsIntoTokens to reduce image count and accelerate provisioning.
- Inject connection parameters via join configs or kernel args.
- Use provision.NewRetryInterval() for polling instead of blocking operations. This enables concurrency without requiring high WithConcurrency(N) settings.
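The advice to avoid per-machine images can be sketched as follows: derive the image name deterministically from the image factory URL, so that every request with the same schematic ID and Talos version maps to the same stored image and the upload happens once. The helper names imageURL, imageName, and ensureImage are hypothetical, the in-memory map stands in for the platform's image store, and the base URL and file name used in main are example values only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// imageURL builds a download URL of the form <base>/image/<schematic>/<version>/<file>.
func imageURL(base, schematicID, talosVersion, file string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	return u.JoinPath("image", schematicID, talosVersion, file).String(), nil
}

// imageName derives a stable name from the URL using a truncated SHA-256 hash,
// so identical URLs always produce the same name.
func imageName(rawURL string) string {
	sum := sha256.Sum256([]byte(rawURL))
	return "talos-" + hex.EncodeToString(sum[:8])
}

// ensureImage records the image under its derived name unless it is already
// present, and reports whether an upload was needed.
func ensureImage(uploaded map[string]string, rawURL string) (name string, isNew bool) {
	name = imageName(rawURL)
	if _, ok := uploaded[name]; ok {
		return name, false
	}
	uploaded[name] = rawURL // a real provider would upload the image here
	return name, true
}

func main() {
	store := map[string]string{}

	u, err := imageURL("https://factory.talos.dev", "example-schematic-id", "v1.9.0", "nocloud-amd64.qcow2")
	if err != nil {
		panic(err)
	}

	first, uploadedFirst := ensureImage(store, u)
	second, uploadedSecond := ensureImage(store, u)
	fmt.Println(first == second, uploadedFirst, uploadedSecond) // prints "true true false"
}
```

Because the connection parameters are kept out of the schematic, the URL, and therefore the image name, does not vary per machine.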