# Production Clusters

> Recommendations for setting up a Talos Linux cluster in production.

export const VersionWarningBanner = () => {
  const latestVersion = "v1.13";
  const [latestUrl, setLatestUrl] = useState(null);
  const [currentVersion, setCurrentVersion] = useState(null);
  const [isBeta, setIsBeta] = useState(false);
  const parseVersion = v => v.replace("v", "").split(".").map(Number);
  const isGreaterVersion = (a, b) => {
    const [aMajor, aMinor] = parseVersion(a);
    const [bMajor, bMinor] = parseVersion(b);
    if (aMajor > bMajor) return true;
    if (aMajor === bMajor && aMinor > bMinor) return true;
    return false;
  };
  useEffect(() => {
    if (typeof window === "undefined") return;
    const {pathname, hash, search} = window.location;
    const match = pathname.match(/\/talos\/(v\d+\.\d+)\//);
    if (!match) return;
    const detectedVersion = match[1];
    if (detectedVersion === latestVersion) return;
    setCurrentVersion(detectedVersion);
    if (isGreaterVersion(detectedVersion, latestVersion)) {
      setIsBeta(true);
    }
    const newPath = pathname.replace(`/talos/${detectedVersion}/`, `/talos/${latestVersion}/`);
    setLatestUrl(`${newPath}${search}${hash}`);
  }, []);
  if (!latestUrl || !currentVersion) return null;
  return <div className="not-prose sticky top-6 z-50 my-6">
      <div className="border border-yellow-500/30 bg-yellow-500/10 px-4 py-3 rounded-xl">
        <div className="text-sm">
          {isBeta ? <>
              ⚠️ You are viewing a <strong>beta version</strong> of Talos ({currentVersion}).
              This version may be unstable.
              <a href={latestUrl} className="ml-2 underline text-yellow-400 hover:text-yellow-300 font-medium">
                View latest stable version {latestVersion} →
              </a>
            </> : <>
              ⚠️ You are viewing an older version of Talos ({currentVersion}).
              <a href={latestUrl} className="ml-2 underline text-yellow-400 hover:text-yellow-300 font-medium">
                View the latest version {latestVersion} →
              </a>
            </>}
        </div>
      </div>
    </div>;
};

<VersionWarningBanner />

This guide covers what to consider when building a production-quality Talos Linux cluster on bare metal.
Check out the [Reference Architecture documentation](https://www.siderolabs.com/resources/) for architectural diagrams and guidance on creating production-grade clusters in other environments.

This guide assumes that you’ve already created a development cluster and are familiar with the **Getting Started** documentation.
If not, please refer to the [Getting Started](./getting-started) guide for more information.

When moving from a learning environment to a production-ready Talos Linux cluster, you have to consider several critical factors:

* High availability for your control plane nodes.
* Secure configuration management.
* Reliability for continuous service and minimal downtime.
* Authentication for access control.

Follow the steps below to build a production-grade Talos cluster that is highly available, reliable, and secure.

**Note**: Check out [Omni](https://www.siderolabs.com/omni-signup/) for managing large-scale Talos Linux clusters automatically.

## Step 1: Prepare your infrastructure

To create your production cluster infrastructure:

1. Boot your machines using the Talos ISO image.
2. Ensure network access on your nodes.

Here is how to do each step:

### Boot your machines using the Talos ISO image

Follow these steps to boot your machines using the Talos ISO image:

1. Download the latest ISO for your hardware type from the [Talos Image Factory](https://factory.talos.dev/).

   **Note**: Network booting and self-built media that use the published kernel require a number of kernel parameters.
   Please see the [kernel docs](../reference/kernel) for more information.

2. Boot three control plane nodes using the ISO image you just downloaded.

3. Boot additional machines as worker nodes.

### Ensure network access

If your nodes are behind a firewall, in a private network, or otherwise not directly reachable, you need to configure a load balancer that forwards TCP port 50000 to the nodes for Talos API access.

**Note**: Because the Talos Linux API uses gRPC and mutual TLS, it cannot be proxied by an HTTP/S proxy, but only by a TCP load balancer.

With your control plane and worker nodes booted, next record their IP addresses.

## Step 2: Store your IP addresses in a variable

To store variables for your machines’ IP addresses:

1. Copy the IP address displayed on each machine console, including the control plane and any worker nodes you’ve created.

   If you don’t have a display connected, retrieve the IP addresses from your DHCP server.

2. Create a Bash array for your control plane node IP addresses, replacing each `<control-plane-ip>` placeholder with the IP address of a control plane node.
   You can include as many IP addresses as needed:

   ```bash theme={null}
   CONTROL_PLANE_IP=("<control-plane-ip-1>" "<control-plane-ip-2>" "<control-plane-ip-3>")
   ```

   **For example**:

   If your control plane nodes' IP addresses are `192.168.0.2`, `192.168.0.3`, and `192.168.0.4`, your command would be:

   ```bash theme={null}
   CONTROL_PLANE_IP=("192.168.0.2" "192.168.0.3" "192.168.0.4")
   ```

3. If you have worker nodes, store their IP addresses in a Bash array.
   Replace each `<worker-ip>` placeholder with the actual IP address of a worker node.
   You can include as many IP addresses as needed:

   ```bash theme={null}
   WORKER_IP=("<worker-ip-1>" "<worker-ip-2>" "<worker-ip-3>")
   ```
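
Before continuing, it can help to confirm that every stored address answers on the Talos API port (TCP 50000, as described in Step 1). The following is a minimal sketch rather than part of the official workflow: `port_open` is a hypothetical helper name, and the sketch assumes that Bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command are available on your workstation.

```bash
# Hypothetical helper: succeeds if a TCP connection to host:port opens
# within the given number of seconds (default: 3).
port_open() {
  local host="$1" port="$2" wait_s="${3:-3}"
  # /dev/tcp/<host>/<port> is a Bash feature, so the probe runs in a Bash child process.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example usage with the arrays defined above (uncomment to run):
# for ip in "${CONTROL_PLANE_IP[@]}" "${WORKER_IP[@]}"; do
#   if port_open "$ip" 50000; then echo "$ip: reachable"; else echo "$ip: NOT reachable"; fi
# done
```

A node reported as not reachable usually points to a firewall rule or a missing TCP forward, not to a problem with Talos itself.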

## Step 3: Decide your Kubernetes endpoint

You've set up multiple control plane nodes for high availability, but they only provide true high availability if the Kubernetes API server endpoint can reach all of them.

Here are two common ways to configure this:

* **Dedicated load balancer**: Set a dedicated load balancer that routes to your control plane nodes.
* **DNS records**: Create multiple DNS records that point to all your control plane nodes.

With either option, you can pass in one IP address or DNS name during setup that routes to all your control plane nodes.

Here is how you can configure each option:

### Dedicated load balancer

If you're using a cloud provider or have your own load balancer (such as HAProxy, an NGINX reverse proxy, or an F5 load balancer), setting up a dedicated load balancer is a natural choice.

If you [created the cluster with Omni](https://omni.siderolabs.com/tutorials/getting_started), Omni automatically acts as a load balancer for your Kubernetes endpoint.

Configure a frontend to listen on TCP port 6443 and direct traffic to the addresses of your Talos control plane nodes.

Your Kubernetes endpoint will be the IP address or DNS name of the load balancer's frontend, with the port appended, for example, `https://myK8s.mydomain.io:6443`.

**Note**: You cannot use an HTTP load balancer, because the Kubernetes API server handles TLS termination and mutual TLS authentication.
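
As an illustration of the frontend described above, the sketch below prints an HAProxy fragment that listens on TCP port 6443 and forwards to each control plane node in TCP mode. The helper name `render_haproxy_cfg` and the section names `kubernetes_api` and `talos_control_plane` are assumptions chosen for this example. The fragment is meant to be merged into an existing HAProxy configuration that already has its own `global` and `defaults` sections.

```bash
# Hypothetical helper: prints an HAProxy TCP frontend/backend pair for the
# control plane IP addresses passed as arguments.
render_haproxy_cfg() {
  local i=1 ip
  cat <<'EOF'
frontend kubernetes_api
    bind *:6443
    mode tcp
    default_backend talos_control_plane

backend talos_control_plane
    mode tcp
    balance roundrobin
EOF
  # One "server" line per control plane node, with TCP health checks enabled.
  for ip in "$@"; do
    printf '    server cp%d %s:6443 check\n' "$i" "$ip"
    i=$((i + 1))
  done
}

# Example usage (uncomment to run):
# render_haproxy_cfg "${CONTROL_PLANE_IP[@]}" > kubernetes-api.cfg
```

Using `mode tcp` on both sides keeps HAProxy out of the TLS handshake, which matches the requirement that the Kubernetes API server terminates TLS itself.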

### DNS records

Alternatively, you can configure your Kubernetes endpoint using DNS records.
Simply add multiple A or AAAA records, one for each control plane node, to a DNS name.

For example, you can add:

```url theme={null}
kube.cluster1.mydomain.com  IN  A  192.168.0.10
kube.cluster1.mydomain.com  IN  A  192.168.0.11
kube.cluster1.mydomain.com  IN  A  192.168.0.12
```

Then your endpoint would be:

```url theme={null}
https://kube.cluster1.mydomain.com:6443
```

> **Note**: This endpoint serves as a fallback if [KubePrism](../../../kubernetes-guides/advanced-guides/kubeprism) becomes unavailable, for example, after a reboot. It also ensures that Talos nodes can still locate and communicate with the control plane if other discovery services are not available.
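
To check that the DNS name really returns a record for every control plane node, you could compare the resolver's answer against the addresses you expect. This is a minimal sketch: `missing_records` is a hypothetical helper that reads resolved addresses from standard input, one per line, and the usage line assumes the `dig` utility is installed.

```bash
# Hypothetical helper: reads resolved IP addresses from stdin (one per line)
# and prints every expected address (given as arguments) that is absent.
missing_records() {
  local resolved ip
  resolved="$(cat)"
  for ip in "$@"; do
    # -x matches whole lines only, -F treats the address as a literal string.
    printf '%s\n' "$resolved" | grep -qxF -- "$ip" || printf '%s\n' "$ip"
  done
}

# Example usage with the records shown above (uncomment to run):
# dig +short A kube.cluster1.mydomain.com \
#   | missing_records 192.168.0.10 192.168.0.11 192.168.0.12
```

Empty output means every expected address was returned. Any address that is printed still needs an A record.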

## Step 4: Save your endpoint in a variable

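Set a variable to store the endpoint you chose in Step 3.
Replace the `<your_endpoint>` placeholder with the IP address or DNS name only, without the `https://` prefix or the `:6443` port, because the command in Step 6 adds both: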

```bash theme={null}
export YOUR_ENDPOINT=<your_endpoint>
```
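
The `talosctl gen config` command in Step 6 wraps this variable as `https://$YOUR_ENDPOINT:6443`, so a value that already carries a scheme or port would produce a malformed URL. The sketch below shows one way to reduce a full URL to the bare host. `normalize_endpoint` is a hypothetical helper, and it assumes a hostname or IPv4 address (IPv6 literals are not handled).

```bash
# Hypothetical helper: strips the scheme, any path, and a trailing port
# from an endpoint, leaving only the hostname or IPv4 address.
normalize_endpoint() {
  local ep="$1"
  ep="${ep#*://}"   # drop a leading scheme such as https://
  ep="${ep%%/*}"    # drop anything after the first slash
  ep="${ep%:*}"     # drop a trailing :port, if present
  printf '%s\n' "$ep"
}

# Example usage (uncomment to run):
# export YOUR_ENDPOINT="$(normalize_endpoint "https://kube.cluster1.mydomain.com:6443")"
```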

## Step 5: Generate secrets bundle

The secrets bundle is a file that contains all the cryptographic keys, certificates, and tokens needed to secure your Talos Linux cluster.

To generate the secrets bundle, run:

```bash theme={null}
talosctl gen secrets -o secrets.yaml
```
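
Because the bundle holds every key and token for the cluster, you may want to create it with restrictive permissions and avoid overwriting an existing one by accident. The wrapper below is a sketch under those assumptions. `gen_private_secrets` is a hypothetical name, and only the `talosctl gen secrets -o` call comes from this guide.

```bash
# Hypothetical wrapper: generates the secrets bundle so that only the current
# user can read it, and refuses to replace an existing file.
gen_private_secrets() {
  local out="${1:-secrets.yaml}"
  if [ -e "$out" ]; then
    echo "refusing to overwrite existing $out" >&2
    return 1
  fi
  # The subshell keeps the stricter umask from leaking into the calling shell.
  ( umask 077 && talosctl gen secrets -o "$out" )
}

# Example usage (uncomment to run):
# gen_private_secrets secrets.yaml
```

Store the resulting file somewhere safe, such as a secrets manager or encrypted backup, since it is needed to regenerate matching machine configurations later.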

## Step 6: Generate machine configurations

Follow these steps to generate machine configuration:

1. Set a variable for your cluster name by running the following command.
   Replace `<your_cluster_name>` with the name you want to give your cluster:

   ```bash theme={null}
   export CLUSTER_NAME=<your_cluster_name>
   ```

2. Run this command to generate your machine configuration files using your secrets bundle:

   ```bash theme={null}
   talosctl gen config --with-secrets secrets.yaml $CLUSTER_NAME https://$YOUR_ENDPOINT:6443
   ```

   <Note>
     The `talosctl` version (`talosctl version --client`) should match the Talos OS version installed on your nodes.
     If you are using a newer version of `talosctl` to generate configurations for an older Talos OS, use the `--talos-version` flag to ensure compatibility.
     For example, to generate a configuration compatible with Talos v1.13:

     ```bash theme={null}
     talosctl gen config --with-secrets secrets.yaml --talos-version v1.13 $CLUSTER_NAME https://$YOUR_ENDPOINT:6443
     ```

     Even when using `--talos-version`, review the generated configuration files to verify that the Kubernetes version is supported by your target Talos OS.
     Refer to the [support matrix](./support-matrix) to check which Kubernetes versions are compatible with your Talos version.
   </Note>

This command will generate three files:

* **controlplane.yaml**: Configuration for your control plane.
* **worker.yaml**: Configuration for your worker nodes.
* **talosconfig**: The `talosctl` configuration file used to connect to and authenticate with your cluster.
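
Once the files exist, it is worth checking them before they reach a node. The sketch below runs `talosctl validate` in `metal` mode (an assumption that fits the bare-metal focus of this guide) against each file passed to it. `validate_configs` is a hypothetical helper name.

```bash
# Hypothetical helper: validates each machine configuration file and returns
# non-zero if any file is missing or fails validation.
validate_configs() {
  local f rc=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      rc=1
      continue
    fi
    talosctl validate --config "$f" --mode metal || rc=1
  done
  return "$rc"
}

# Example usage (uncomment to run):
# validate_configs controlplane.yaml worker.yaml
```

Running the same check again after any later patching step catches mistakes introduced by a patch.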

## Step 7: Unmount the ISO

Unplug your installation USB drive or unmount the ISO from all your control plane and worker nodes.
This prevents you from accidentally installing to the USB drive and makes it clearer which disk to select for installation.

## Step 8: Understand your nodes

The default machine configurations for control plane and worker nodes are typically sufficient to get your cluster running.
However, you may need to customize certain settings such as network interfaces and disk configurations depending on your specific environment.

Follow these steps to verify that your machine configurations are set up correctly:

1. **Check network interfaces**: Run this command to view all network interfaces on any node, whether control plane or worker.

   Replace `<node-ip-address>` with the IP of the node you want to inspect.

   **Note**: Copy the ID of the network interface whose operational state (OPER) is **up**.

   ```bash theme={null}
   talosctl --nodes <node-ip-address> get links --insecure
   ```

2. **Check available disks**: Run this command to check all available disks on any node.
   Replace `<node-ip-address>` with the IP address of the node you want to inspect:

   ```bash theme={null}
   talosctl get disks --insecure --nodes <node-ip-address>
   ```

3. **Verify configuration files**: Open your `worker.yaml` and `controlplane.yaml` configuration files in your preferred editor.
   Check that the values match the network and disk settings of your worker and control plane nodes.
   If the values don't match, you'll need to update your machine configuration.

   **Note**: Refer to the [Talos CLI reference](../reference/cli) for additional commands to gather more information about your nodes and cluster.

## Step 9: Patch your machine configuration (optional)

You can patch your worker and control plane machine configurations to reflect the correct network interface and disk of your nodes.

Follow these steps to patch your machine configuration:

1. Create patch files for the configurations you want to modify:

   ```bash theme={null}
   touch controlplane-patch-1.yaml # For patching the control plane nodes configuration
   touch worker-patch-1.yaml # For patching the worker nodes configuration
   ```

   **Note**: You don't have to create both patch files. Only create patches for the configurations you actually need to modify.

   You can also create multiple patch files (e.g., `controlplane-patch-2.yaml`, `controlplane-patch-3.yaml`) if you want to make multiple subsequent patches to the same machine configuration.

2. Copy and paste this YAML block of code and add the correct hardware values to each patch file.

   For example, for `controlplane-patch-1.yaml`, use the network interface and disk information you gathered from your control plane nodes:

   ```yaml theme={null}
   # controlplane-patch-1.yaml file
   machine:
     network:
       interfaces:
         - interface: <control-plane-network-interface>  # From control plane node
           dhcp: true
     install:
       disk: /dev/<control-plane-disk-name> # From control plane node
   ```

   For `worker-patch-1.yaml`, use network interface and disk information from your worker nodes:

   ```yaml theme={null}
   # worker-patch-1.yaml file

   machine:
     network:
       interfaces:
         - interface: <worker-network-interface>  # From worker node
           dhcp: true
     install:
       disk: /dev/<worker-disk-name> # From worker node
   ```

3. Apply each patch file to its corresponding machine configuration:

   * **For control plane**:

   ```bash theme={null}
   talosctl machineconfig patch controlplane.yaml --patch @controlplane-patch-1.yaml --output controlplane.yaml
   ```

   * **For worker**:

   ```bash theme={null}
   talosctl machineconfig patch worker.yaml --patch @worker-patch-1.yaml --output worker.yaml
   ```

Additionally, you can learn more about [patches](../configure-your-talos-cluster/system-configuration/patching) from the configuration patches documentation.
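
If you would rather generate the patch files than edit them by hand, the YAML shown above can be produced from the interface and disk names gathered in Step 8. This is a sketch only. `render_patch` is a hypothetical helper, and `eth0` and `sda` in the usage line are placeholder values, not recommendations.

```bash
# Hypothetical helper: prints a patch with the same structure as the examples
# above, using the given network interface and disk name.
render_patch() {
  local iface="$1" disk="$2"
  cat <<EOF
machine:
  network:
    interfaces:
      - interface: ${iface}
        dhcp: true
  install:
    disk: /dev/${disk}
EOF
}

# Example usage with placeholder values (uncomment to run):
# render_patch eth0 sda > controlplane-patch-1.yaml
```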

## Step 10: Configure your multihomed machines

If your machines are multihomed, i.e., they have more than one IPv4 and/or IPv6 address other than loopback, additional configuration is required.
Refer to [Multihoming](../networking/multihoming) for more information.

## Step 11: Apply the machine configuration

To apply your machine configuration:

1. Run this command to apply the `controlplane.yaml` configuration to your control plane nodes:

   ```bash theme={null}
   for ip in "${CONTROL_PLANE_IP[@]}"; do
     echo "=== Applying configuration to node $ip ==="
     talosctl apply-config --insecure \
       --nodes "$ip" \
       --file controlplane.yaml
     echo "Configuration applied to $ip"
     echo ""
   done
   ```

2. Run this command to apply the `worker.yaml` configuration to your worker nodes:

   ```bash theme={null}
   for ip in "${WORKER_IP[@]}"; do
     echo "=== Applying configuration to node $ip ==="
     talosctl apply-config --insecure \
       --nodes "$ip" \
       --file worker.yaml
     echo "Configuration applied to $ip"
     echo ""
   done
   ```
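
The two loops above keep going when a node fails, so with many nodes an error can scroll past unnoticed. One possible variation, sketched below, wraps the same `talosctl apply-config --insecure` call in a function that reports failures and returns a non-zero status if any node was skipped. `apply_to_nodes` is a hypothetical helper name.

```bash
# Hypothetical helper: applies one configuration file to every given node and
# returns non-zero if at least one node could not be configured.
apply_to_nodes() {
  local file="$1" ip failed=0
  shift
  for ip in "$@"; do
    if talosctl apply-config --insecure --nodes "$ip" --file "$file"; then
      echo "Configuration applied to $ip"
    else
      echo "FAILED to apply $file to $ip" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Example usage with the arrays from Step 2 (uncomment to run):
# apply_to_nodes controlplane.yaml "${CONTROL_PLANE_IP[@]}"
# apply_to_nodes worker.yaml "${WORKER_IP[@]}"
```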

## Step 12: Manage your Talos configuration file

The `talosconfig` file is your key to managing the Talos Linux cluster. Without it, you cannot authenticate or communicate with your cluster nodes using `talosctl`.

You have two options for managing your `talosconfig`:

1. Merge your new `talosconfig` into the default configuration file located at `~/.talos/config`:

   ```bash theme={null}
   talosctl config merge ./talosconfig
   ```

2. Copy the configuration file to your `~/.talos` directory and set the `TALOSCONFIG` environment variable:

   ```bash theme={null}
   mkdir -p ~/.talos
   cp ./talosconfig ~/.talos/config
   export TALOSCONFIG=~/.talos/config
   ```

## Step 13: Set endpoints of your control plane nodes

Configure your endpoints to enable `talosctl` to automatically load balance requests and fail over between control plane nodes when individual nodes become unavailable.

Run this command to configure your endpoints.
Replace the placeholders `<control_plane_IP_1> <control_plane_IP_2> <control_plane_IP_3>` with the IP addresses of your control plane nodes:

```bash theme={null}
talosctl config endpoint <control_plane_IP_1> <control_plane_IP_2> <control_plane_IP_3>
```

**For example**:

If your control plane nodes' IP addresses are `192.168.0.2`, `192.168.0.3`, and `192.168.0.4`, your command would be:

```bash theme={null}
talosctl config endpoint 192.168.0.2 192.168.0.3 192.168.0.4
```
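
If the `CONTROL_PLANE_IP` array from Step 2 is still set in your shell, you can pass it to the same command instead of retyping the addresses. The sketch below adds a guard for the case where the array is empty, for example in a new terminal session. `set_talos_endpoints` is a hypothetical helper name.

```bash
# Hypothetical helper: sets the talosctl endpoints from its arguments and
# refuses to run when no addresses were supplied.
set_talos_endpoints() {
  if [ "$#" -eq 0 ]; then
    echo "no control plane IP addresses given" >&2
    return 1
  fi
  talosctl config endpoint "$@"
}

# Example usage with the array from Step 2 (uncomment to run):
# set_talos_endpoints "${CONTROL_PLANE_IP[@]}"
```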

## Step 14: Bootstrap your Kubernetes cluster

Wait for your control plane nodes to finish booting, then bootstrap your etcd cluster by running the command below.

Replace the `<control-plane-IP>` placeholder with the IP address of ONE of your three control plane nodes:

```bash theme={null}
talosctl bootstrap --nodes <control-plane-IP>
```

**Note**: Run this command ONCE on a SINGLE control plane node.
If you have multiple control plane nodes, you can choose any of them.
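
Because the node may still be installing or rebooting when you first try, the bootstrap call can fail simply because the API is not ready yet. One way to handle this, sketched below, is a small retry loop that stops at the first success, so the bootstrap still completes only once. `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary example values.

```bash
# Hypothetical helper: runs a command until it succeeds, up to a maximum
# number of attempts, sleeping between attempts.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example usage: up to 20 attempts, 15 seconds apart, against the first
# control plane node from Step 2 (uncomment to run):
# retry 20 15 talosctl bootstrap --nodes "${CONTROL_PLANE_IP[0]}"
```

If the loop gives up, inspect the node before trying again instead of raising the attempt count blindly.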

## Step 15: Get Kubernetes access

Download your `kubeconfig` file to start using `kubectl` with your cluster.
These commands must be run against a single control plane node.

You have two options for managing your `kubeconfig`.
Replace `<control-plane-IP>` with the IP address of any one of your control plane nodes:

* Merge into your default `kubeconfig`:

```bash theme={null}
talosctl kubeconfig --nodes <control-plane-IP>
```

* Create a separate `kubeconfig` file:

```bash theme={null}
talosctl kubeconfig alternative-kubeconfig --nodes <control-plane-IP>
export KUBECONFIG=./alternative-kubeconfig
```

## Step 16: Verify your nodes are running

Run this command to verify that your nodes are running:

```bash theme={null}
kubectl get nodes
```
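
Nodes can take a few minutes to register and become `Ready`. For a scripted check, the sketch below waits for the `Ready` condition on all nodes and then compares the node count with the number you expect. `wait_for_nodes` is a hypothetical helper, and the default timeout of `300s` is an arbitrary example value.

```bash
# Hypothetical helper: waits until all registered nodes are Ready, then checks
# that the number of nodes matches the expected count.
wait_for_nodes() {
  local expected="$1" timeout="${2:-300s}" actual
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" || return 1
  actual="$(kubectl get nodes --no-headers | wc -l)"
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected nodes, found $actual" >&2
    return 1
  fi
}

# Example usage with the arrays from Step 2 (uncomment to run):
# wait_for_nodes "$(( ${#CONTROL_PLANE_IP[@]} + ${#WORKER_IP[@]} ))"
```

A count mismatch usually means a node has not joined yet, so checking it catches cases where every registered node is healthy but one machine never applied its configuration.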

## Next steps

Congratulations!
You now have a working production-grade Talos Linux Kubernetes cluster.

### What's next?

* [Set up persistent storage](../../../kubernetes-guides/csi/storage)
* [Deploy a Metrics Server](../../../kubernetes-guides/monitoring-and-observability/deploy-metrics-server)
* [Explore the talosctl CLI reference](../reference/cli)
