# Upgrading Kubernetes

> Guide on how to upgrade the Kubernetes cluster from Talos Linux.

export const k8s_release = '1.36.0';

export const k8s_prev_release = '1.35.0';

This guide covers upgrading Kubernetes on Talos Linux clusters.

For a list of Kubernetes versions compatible with each Talos release, see the [Support Matrix](../../talos/v1.10/getting-started/support-matrix).

For upgrading the Talos Linux operating system, see [Upgrading Talos](../../talos/v1.10/configure-your-talos-cluster/lifecycle-management/upgrading-talos).

## Video walkthrough

To see a demo of this process, watch this video:

<iframe width="560" height="315" src="https://www.youtube.com/embed/uOKveKbD8MQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Automated Kubernetes upgrade

The recommended method to upgrade Kubernetes is to use the `talosctl upgrade-k8s` command.
This will automatically update the components needed to upgrade Kubernetes safely.
The upgrade itself is non-disruptive on the Talos side, but kubelet upgrades may cause workloads to restart.

To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:

`talosctl --nodes <controlplane node> upgrade-k8s --to ${k8s_release}`

Note that the `--nodes` parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.

To check what will be upgraded, run `talosctl upgrade-k8s` with the `--dry-run` flag:

<CodeBlock lang="sh">
  {`
    $ talosctl --nodes <controlplane node> upgrade-k8s --to ${k8s_release} --dry-run
    WARNING: found resources which are going to be deprecated/migrated in the version ${k8s_release}
    RESOURCE                                                               COUNT
    validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io   4
    mutatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io     3
    customresourcedefinitions.v1beta1.apiextensions.k8s.io                 25
    apiservices.v1beta1.apiregistration.k8s.io                             54
    leases.v1beta1.coordination.k8s.io                                     4
    automatically detected the lowest Kubernetes version ${k8s_prev_release}
    checking for resource APIs to be deprecated in version ${k8s_release}
    discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
    discovered worker nodes ["172.20.0.5" "172.20.0.6"]
    updating "kube-apiserver" to version "${k8s_release}"
    > "172.20.0.2": starting update
    > update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
    > skipped in dry-run
    > "172.20.0.3": starting update
    > update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
    > skipped in dry-run
    > "172.20.0.4": starting update
    > update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
    > skipped in dry-run
    updating "kube-controller-manager" to version "${k8s_release}"
    > "172.20.0.2": starting update
    > update kube-controller-manager: ${k8s_prev_release} -> ${k8s_release}
    > skipped in dry-run
    > "172.20.0.3": starting update

    <snip>

    updating manifests
    > apply manifest Secret bootstrap-token-3lb63t
    > apply skipped in dry run
    > apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
    > apply skipped in dry run
    <snip>
    `}
</CodeBlock>
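
The dry run can also act as a gate in front of the real upgrade. The following is a minimal sketch, not part of `talosctl`: the function name `upgrade_k8s_gated` and its arguments are hypothetical, and it assumes that a failed dry run exits with a non-zero status.

```bash
# Hypothetical helper: run the dry run first and start the real upgrade
# only if the dry run exits successfully (assumed to mean exit status 0).
# Arguments: <control plane node> <target Kubernetes version>
upgrade_k8s_gated() {
  local node="$1" version="$2"

  if ! talosctl --nodes "$node" upgrade-k8s --to "$version" --dry-run; then
    echo "dry run failed, not upgrading" >&2
    return 1
  fi

  talosctl --nodes "$node" upgrade-k8s --to "$version"
}

# Example (not executed here):
#   upgrade_k8s_gated 172.20.0.2 1.36.0
```

Reviewing the dry-run output by hand is still advisable, since deprecation warnings do not necessarily cause a failure.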

To upgrade Kubernetes from v`${k8s_prev_release}` to v`${k8s_release}`, run:

<CodeBlock lang="sh">
  {`
    $ talosctl --nodes <controlplane node> upgrade-k8s --to ${k8s_release}
    automatically detected the lowest Kubernetes version ${k8s_prev_release}
    checking for resource APIs to be deprecated in version ${k8s_release}
    discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
    discovered worker nodes ["172.20.0.5" "172.20.0.6"]
    updating "kube-apiserver" to version "${k8s_release}"
    > "172.20.0.2": starting update
    > update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
    > "172.20.0.2": machine configuration patched
    > "172.20.0.2": waiting for API server state pod update
    < "172.20.0.2": successfully updated
    > "172.20.0.3": starting update
    > update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
    <snip>
    `}
</CodeBlock>

This command runs in several phases:

1. Images for new Kubernetes components are pre-pulled to the nodes to minimize downtime and test for image availability.
2. Every control plane node machine configuration is patched with the new image version for each control plane component.
   On the configuration update, Talos renders new static pod definitions, which are picked up by the kubelet.
   The command waits for the change to propagate to the API server state.
3. The command updates the `kube-proxy` daemonset with the new image version.
4. On every node in the cluster, the `kubelet` version is updated.
   The command then waits for the `kubelet` service to be restarted and become healthy.
   The update is verified by checking the `Node` resource state.
5. Kubernetes bootstrap manifests are re-applied to the cluster.
   Updated bootstrap manifests might come with a new Talos version (e.g. a CoreDNS version update), or might be the result of a machine configuration change.
6. Unneeded resources are pruned from the cluster (Talos v1.13+).

If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
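
Because the command can be restarted safely, a simple retry wrapper is one way to handle transient failures. This is a sketch under assumptions: the `retry` function, its attempt count, and its delay are hypothetical choices, not something `talosctl` provides.

```bash
# Hypothetical helper: re-run a command until it succeeds or the attempt
# budget is exhausted.
# Arguments: <max attempts> <delay in seconds> <command...>
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2

  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not executed here):
#   retry 3 30 talosctl --nodes 172.20.0.2 upgrade-k8s --to 1.36.0
```

A small attempt budget keeps a persistent failure (for example, an unavailable image) from looping indefinitely.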

> Note: When using custom/overridden Kubernetes component images, use flags `--*-image` to override the default image names.
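
As an illustration of the image override flags, the sketch below builds them for a mirror registry. The registry `registry.example.com/k8s` and the helper `image_override_flags` are hypothetical, and the individual flag names are assumed to match the `--*-image` pattern; confirm them with `talosctl upgrade-k8s --help` for the version in use.

```bash
# Hypothetical helper: print image override flags, one per line, for a
# given mirror prefix. Flag names are an assumption based on the
# `--*-image` pattern; verify them with `talosctl upgrade-k8s --help`.
# Arguments: <mirror prefix, e.g. registry.example.com/k8s>
image_override_flags() {
  local mirror="$1"
  printf '%s\n' \
    "--apiserver-image=${mirror}/kube-apiserver" \
    "--controller-manager-image=${mirror}/kube-controller-manager" \
    "--scheduler-image=${mirror}/kube-scheduler" \
    "--proxy-image=${mirror}/kube-proxy" \
    "--kubelet-image=${mirror}/kubelet"
}

# Example (not executed here, requires bash for mapfile):
#   mapfile -t flags < <(image_override_flags registry.example.com/k8s)
#   talosctl --nodes 172.20.0.2 upgrade-k8s --to 1.36.0 "${flags[@]}"
```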

### Kubernetes manifest synchronization

A Talos version change might also cause changes in the bootstrap manifests.
The updated bootstrap manifests, inline manifests and extra manifests are applied to the cluster as part of the Kubernetes upgrade process.

If any services were deleted or disabled, the resources associated with them will be pruned.
For example, if [kube-proxy](https://docs.siderolabs.com/talos/latest/reference/configuration/v1alpha1/config#proxy) was disabled in the machine configuration, the resources associated with it are deleted at this stage.
Pruning can be disabled by passing the `--manifests-no-prune` flag.

> Note: Pruning is supported from Talos and talosctl v1.13 onwards.

From Talos v1.13 onwards, all Kubernetes manifests are applied via [Kubernetes Server-Side Apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/).
Talos forces ownership of all fields it applies, even if they have a different field manager.

If you wish to manage a resource previously applied by Talos, you need to take the following steps:

1. Remove the inline manifest or extra manifest entry from the machine configuration.
   After this step you can run `talosctl upgrade-k8s --dry-run --to <in-cluster-k8s-version>`.
   Resources affected will be marked for pruning in the output.
2. Remove the resource entries from the `data` block of the Talos tracking inventory ConfigMap:

```bash theme={null}
kubectl edit cm --namespace kube-system talos-bootstrap-manifests-inventory
```

3. Remove the `config.k8s.io/owning-inventory` annotation from the resources.
4. Re-apply the resource with your own field manager or remove the `metadata.managedFields` manually.
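
Steps 3 and 4 can be sketched with standard `kubectl` commands. The helper name `take_over_resource`, the manifest file, and the field manager name are hypothetical; the sketch assumes you already have the resource manifest in a local file.

```bash
# Hypothetical helper covering steps 3 and 4 for a single resource.
# Arguments: <namespace> <kind/name> <manifest file> <field manager name>
take_over_resource() {
  local ns="$1" resource="$2" manifest="$3" manager="$4"

  # Step 3: a trailing "-" tells kubectl to remove the annotation.
  kubectl annotate --namespace "$ns" "$resource" \
    config.k8s.io/owning-inventory- || return 1

  # Step 4: re-apply with your own field manager, taking over fields
  # that are currently owned by another manager.
  kubectl apply --server-side --field-manager="$manager" --force-conflicts \
    --filename "$manifest"
}

# Example (not executed here):
#   take_over_resource kube-system configmap/my-config my-config.yaml my-manager
```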

### Synchronize declared and deployed configurations

When Kubernetes is upgraded with `talosctl upgrade-k8s`, the live machine configuration on your nodes is updated with new component image versions (API server, controller manager, scheduler, kube-proxy, etc.).

If you are storing full machine configuration files (`controlplane.yaml`, `worker.yaml`) in Git, these versions will drift out of sync.
Re-applying those stale files later could unintentionally downgrade components.

That is why we do not recommend storing full machine configurations.

Version numbers for Talos, etcd, Kubernetes components, and add-ons change frequently. Maintaining these across upgrades requires editing many fields by hand.

See the [Reproducible Machine Configuration](../../talos/v1.10/configure-your-talos-cluster/system-configuration/reproducible-machine-configuration) guide for full instructions on handling machine configurations after version bumps.

## Manual Kubernetes upgrade

Kubernetes can be upgraded manually by following the steps outlined below.
They are equivalent to the steps performed by the `talosctl upgrade-k8s` command.

### Kubeconfig

In order to edit the control plane, you need a working `kubectl` config.
If you don't already have one, you can get one by running:

```bash theme={null}
talosctl --nodes <controlplane node> kubeconfig
```

### API server

Patch the machine configuration using the `talosctl patch` command:

```bash theme={null}
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
```

The JSON patch might need to be adjusted if current machine configuration is missing `.cluster.apiServer.image` key.
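
One possible adjustment, assuming the patch follows RFC 6902 semantics: the `add` operation creates a missing key and overwrites an existing one, whereas `replace` requires the key to exist. The helper `image_patch` below is hypothetical and only prints the patch document.

```bash
# Hypothetical helper: print an RFC 6902 patch that sets an image field.
# "add" creates the key if it is missing and overwrites it otherwise,
# provided the parent object (for example /cluster/apiServer) exists.
# The arguments are inserted verbatim, without JSON escaping.
# Arguments: <JSON pointer path> <image reference>
image_patch() {
  printf '[{"op": "add", "path": "%s", "value": "%s"}]\n' "$1" "$2"
}

# Example (not executed here):
#   talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot \
#     -p "$(image_patch /cluster/apiServer/image registry.k8s.io/kube-apiserver:v1.36.0)"
```

The same helper applies to the other components by changing the path and the image reference.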

Alternatively, the machine configuration can be edited manually with `talosctl -n <IP> edit mc --mode=no-reboot`.

Capture the new version of `kube-apiserver` config with:

```bash theme={null}
$ talosctl -n <CONTROL_PLANE_IP_1> get apiserverconfig -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: APIServerConfigs.kubernetes.talos.dev
    id: kube-apiserver
    version: 5
    owner: k8s.ControlPlaneAPIServerController
    phase: running
spec:
    image: registry.k8s.io/kube-apiserver:v${k8s_release}
    cloudProvider: ""
    controlPlaneEndpoint: https://172.20.0.1:6443
    etcdServers:
        - https://localhost:2379
    localPort: 6443
    serviceCIDR:
        - 10.96.0.0/12
    extraArgs: {}
    extraVolumes: []
    environmentVariables: {}
    podSecurityPolicyEnabled: false
    advertisedAddress: $(POD_IP)
    resources:
        requests:
            cpu: ""
            memory: ""
        limits: {}
```

In this example, the new version is `5`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):

```bash theme={null}
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
```
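
The wait can be scripted as a polling loop around the same `kubectl` query. This is a sketch: the function name `wait_for_config_version`, the default of 30 attempts, and the 5-second delay are hypothetical choices.

```bash
# Hypothetical helper: poll until the static pod on a node reports the
# expected config version, or give up after a number of attempts.
# Arguments: <k8s-app label> <node name> <expected version> [attempts] [delay]
wait_for_config_version() {
  local app="$1" node="$2" want="$3" attempts="${4:-30}" delay="${5:-5}"
  local got i=0

  while [ "$i" -lt "$attempts" ]; do
    # Treat a failed query (for example, API server restarting) as "not yet".
    got="$(kubectl get pod -n kube-system -l "k8s-app=${app}" \
      --field-selector "spec.nodeName=${node}" \
      -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}' \
      2>/dev/null)" || got=""
    [ "$got" = "$want" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done

  echo "timed out waiting for ${app} on ${node} to reach version ${want}" >&2
  return 1
}

# Example (not executed here):
#   wait_for_config_version kube-apiserver talos-default-controlplane-1 5
```

The same function works for `kube-controller-manager` and `kube-scheduler` by changing the first argument.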

Check that the pod is running:

```bash theme={null}
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1
NAME                                          READY   STATUS    RESTARTS   AGE
kube-apiserver-talos-default-controlplane-1   1/1     Running   0          16m
```

Repeat this process for every control plane node, verifying that the state propagated successfully between each node update.

### Controller manager

Patch the machine configuration using the `talosctl patch` command:

```bash theme={null}
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "registry.k8s.io/kube-controller-manager:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
```

The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.controllerManager.image` key.

Capture the new version of `kube-controller-manager` config with:

```bash theme={null}
$ talosctl -n <CONTROL_PLANE_IP_1> get controllermanagerconfig -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: ControllerManagerConfigs.kubernetes.talos.dev
    id: kube-controller-manager
    version: 3
    owner: k8s.ControlPlaneControllerManagerController
    phase: running
spec:
    enabled: true
    image: registry.k8s.io/kube-controller-manager:v${k8s_release}
    cloudProvider: ""
    podCIDRs:
        - 10.244.0.0/16
    serviceCIDRs:
        - 10.96.0.0/12
    extraArgs: {}
    extraVolumes: []
    environmentVariables: {}
    resources:
        requests:
            cpu: ""
            memory: ""
        limits: {}
```

In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):

```bash theme={null}
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
```

Check that the pod is running:

```bash theme={null}
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1
NAME                                                   READY   STATUS    RESTARTS   AGE
kube-controller-manager-talos-default-controlplane-1   1/1     Running   0          35m
```

Repeat this process for every control plane node, verifying that the state propagated successfully between each node update.

### Scheduler

Patch the machine configuration using the `talosctl patch` command:

```bash theme={null}
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "registry.k8s.io/kube-scheduler:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
```

The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.scheduler.image` key.

Capture the new version of `kube-scheduler` config with:

```bash theme={null}
$ talosctl -n <CONTROL_PLANE_IP_1> get schedulerconfig -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: SchedulerConfigs.kubernetes.talos.dev
    id: kube-scheduler
    version: 3
    owner: k8s.ControlPlaneSchedulerController
    phase: running
    created: 2024-11-06T12:37:22Z
    updated: 2024-11-06T12:37:20Z
spec:
    enabled: true
    image: registry.k8s.io/kube-scheduler:v${k8s_release}
    extraArgs: {}
    extraVolumes: []
    environmentVariables: {}
    resources:
        requests:
            cpu: ""
            memory: ""
        limits: {}
    config: {}
```

In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):

```bash theme={null}
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
```

Check that the pod is running:

```bash theme={null}
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1
NAME                                          READY   STATUS    RESTARTS   AGE
kube-scheduler-talos-default-controlplane-1   1/1     Running   0          39m
```

Repeat this process for every control plane node, verifying that the state propagated successfully between each node update.

### Proxy

In the proxy's `DaemonSet`, change:

```yaml theme={null}
kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-proxy
          image: registry.k8s.io/kube-proxy:v${k8s_prev_release}
      tolerations:
        - ...
```

to:

```yaml theme={null}
kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-proxy
          image: registry.k8s.io/kube-proxy:v${k8s_release}
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
```

To edit the `DaemonSet`, run:

```bash theme={null}
kubectl edit daemonsets -n kube-system kube-proxy
```
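
If only the image needs to change, a non-interactive alternative is `kubectl set image`. The sketch below uses a hypothetical helper name, `update_kube_proxy_image`, and a 5-minute rollout timeout chosen for illustration. It does not add the control plane toleration shown above; that still has to be edited separately if it is missing.

```bash
# Hypothetical helper: update only the image of the kube-proxy container
# and wait for the rollout to finish. Tolerations are left untouched.
# Arguments: <target Kubernetes version, without the "v" prefix>
update_kube_proxy_image() {
  local version="$1"

  kubectl set image -n kube-system daemonset/kube-proxy \
    "kube-proxy=registry.k8s.io/kube-proxy:v${version}" || return 1
  kubectl rollout status -n kube-system daemonset/kube-proxy --timeout=5m
}

# Example (not executed here):
#   update_kube_proxy_image 1.36.0
```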

### Bootstrap manifests

Bootstrap manifests can be retrieved in a format which works for `kubectl` with the following command:

```bash theme={null}
talosctl -n <controlplane IP> get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
```

Diff the manifests with the cluster:

```bash theme={null}
kubectl diff -f manifests.yaml
```

Apply the manifests:

```bash theme={null}
kubectl apply -f manifests.yaml
```

> Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.

### kubelet

For every node, patch the machine configuration with the new kubelet version, then wait for the kubelet to restart with the new version:

```bash theme={null}
$ talosctl -n <IP> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
```

Once `kubelet` restarts with the new configuration, confirm the upgrade with `kubectl get nodes <name>`:

```bash theme={null}
$ kubectl get nodes talos-default-controlplane-1
NAME                           STATUS   ROLES                  AGE    VERSION
talos-default-controlplane-1   Ready    control-plane          123m   v${k8s_release}
```
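
To check all nodes at once, the kubelet version reported in each `Node` resource can be compared against the target version. This is a sketch: the helper `outdated_nodes` is hypothetical and reads `<node name> <kubelet version>` pairs from standard input.

```bash
# Hypothetical helper: read "<node name> <kubelet version>" pairs on stdin
# and print the nodes that do not run the expected version.
# The exit status is non-zero if any node is outdated.
# Arguments: <expected version, including the "v" prefix>
outdated_nodes() {
  awk -v want="$1" '$2 != want { print $1; bad = 1 } END { exit bad + 0 }'
}

# Example (not executed here):
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#     | outdated_nodes v1.36.0
```

An empty output with a zero exit status indicates that every listed node reports the expected kubelet version.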
