In this guide, we’ll follow the procedure to support NVIDIA GPUs using proprietary drivers on Talos.
Enabling NVIDIA GPU support on Talos is bound by the NVIDIA EULA. The Talos-published NVIDIA drivers are bound to a specific Talos release, so the extension versions also need to be updated when upgrading Talos.
We will be using the following NVIDIA system extensions:
nonfree-kmod-nvidia
nvidia-container-toolkit
To build an NVIDIA driver version not published by SideroLabs, follow the instructions here.
Create the boot assets, which include the system extensions mentioned above (or create a custom installer and perform a machine upgrade if Talos is already installed).
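Here is a sketch of an Image Factory schematic that requests both extensions. It assumes the extensions are published under the siderolabs/ prefix with exactly these names for your Talos release. Extension names can differ between Talos releases, so check the list of official extensions for your version before using it.

```yaml
# Image Factory schematic (sketch): bake the NVIDIA extensions into the boot assets.
# Assumption: extension names match those published for your Talos release.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/nonfree-kmod-nvidia
      - siderolabs/nvidia-container-toolkit
```

Submitting this schematic to Image Factory returns a schematic ID. You can then use that ID to fetch the matching boot media or installer image.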
Make sure the driver version matches for both the nonfree-kmod-nvidia and nvidia-container-toolkit extensions. The nonfree-kmod-nvidia extension is versioned as <nvidia-driver-version>-<talos-release-version>, and the nvidia-container-toolkit extension is versioned as <nvidia-driver-version>-<nvidia-container-toolkit-version>.
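The NVIDIA Linux GPU driver contains several kernel modules: nvidia.ko, nvidia-modeset.ko, nvidia-uvm.ko, nvidia-drm.ko, and nvidia-peermem.ko.
Two “flavors” of these kernel modules are provided, proprietary and open (source-published, dual-licensed MIT/GPLv2), and both are available for use within Talos.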
Patch the Talos machine configuration of the nodes that have NVIDIA GPUs using a patch file named gpu-worker-patch.yaml.
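Here is a minimal sketch of gpu-worker-patch.yaml. It assumes the goal is to have Talos load the NVIDIA kernel modules at boot through machine.kernel.modules. nvidia-peermem is omitted because it is only needed for GPUDirect RDMA. Adjust the list to your hardware.

```yaml
# gpu-worker-patch.yaml (sketch): load the NVIDIA kernel modules on boot.
machine:
  kernel:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
```

Apply the patch to each node with an NVIDIA GPU, for example with talosctl patch mc --nodes <node-ip> --patch @gpu-worker-patch.yaml.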
Next, create a RuntimeClass by applying a manifest that defines a runtime class using the extension.
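A minimal RuntimeClass manifest might look like the one below. It assumes the nvidia-container-toolkit extension registers a containerd runtime handler named nvidia. The class name nvidia matches the name used in the rest of this guide.

```yaml
# RuntimeClass (sketch): pods that set runtimeClassName: nvidia
# are run by the containerd "nvidia" runtime handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # assumption: handler name registered by the extension
```

After applying it with kubectl apply -f, kubectl get runtimeclass nvidia should list the new class.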
Optionally, you can make nvidia the default runtime class. Note that this sets the default runtime class to nvidia for all pods scheduled on the node.
To do this, create a patch file named nvidia-default-runtimeclass.yaml that updates the machine configuration accordingly.
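Here is a sketch of nvidia-default-runtimeclass.yaml as a strategic merge patch that writes a containerd configuration fragment. Talos merges *.part files placed in /etc/cri/conf.d/ into its CRI configuration. The sketch assumes a Talos release that ships containerd 2.x, whose CRI runtime plugin section is io.containerd.cri.v1.runtime. On releases that ship containerd 1.x, the equivalent section is io.containerd.grpc.v1.cri. The file name 20-customization.part is an illustrative choice.

```yaml
# nvidia-default-runtimeclass.yaml (sketch): make "nvidia" containerd's default runtime.
# Assumption: containerd 2.x CRI plugin layout.
machine:
  files:
    - op: create
      path: /etc/cri/conf.d/20-customization.part
      content: |
        [plugins]
          [plugins."io.containerd.cri.v1.runtime"]
            [plugins."io.containerd.cri.v1.runtime".containerd]
              default_runtime_name = "nvidia"
```

As with the module patch, apply it to the GPU nodes with talosctl patch mc --patch @nvidia-default-runtimeclass.yaml.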
To test the runtime class, run a pod whose spec explicitly sets spec.runtimeClassName to nvidia.
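Here is a minimal test pod sketch. The pod name nvidia-test and the CUDA image tag are assumptions; any CUDA image supported by the installed driver version should do. The container runs nvidia-smi once, and the NVIDIA container runtime makes that binary available inside the container.

```yaml
# Test pod (sketch): runs nvidia-smi once under the nvidia runtime class.
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-test            # hypothetical name
spec:
  runtimeClassName: nvidia     # explicitly select the NVIDIA runtime
  restartPolicy: Never
  containers:
    - name: nvidia-test
      image: nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04   # assumption: pick a tag matching your driver
      command: ["nvidia-smi"]
```

Apply it with kubectl apply -f and check the result with kubectl logs nvidia-test. If the runtime class works, the output lists the node's GPUs. Remove the pod afterwards with kubectl delete pod nvidia-test.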