In this guide we’ll follow the procedure to support NVIDIA GPU using OSS drivers on Talos.
Enabling NVIDIA GPU support on Talos is bound by NVIDIA EULA. The Talos published NVIDIA OSS drivers are bound to a specific Talos release. The extensions versions also needs to be updated when upgrading Talos.We will be using the following NVIDIA OSS system extensions:
nvidia-open-gpu-kernel-modules
nvidia-container-toolkit
Make sure the driver version matches for both thenvidia-open-gpu-kernel-modules
andnvidia-container-toolkit
extensions. Thenvidia-open-gpu-kernel-modules
extension is versioned as<nvidia-driver-version>-<talos-release-version>
and thenvidia-container-toolkit
extension is versioned as<nvidia-driver-version>-<nvidia-container-toolkit-version>
.
nvidia.ko
, nvidia-modeset.ko
, nvidia-uvm.ko
, nvidia-drm.ko
, and nvidia-peermem.ko
.
Two “flavors” of these kernel modules are provided, and both are available for use within Talos:
gpu-worker-patch.yaml
:
RuntimeClass
Apply the following manifest to create a runtime class that uses the extension:
nvidia
Do note that this will set the default runtime class to nvidia
for all pods scheduled on the node.
Create a patch yaml nvidia-default-runtimeclass.yaml
to update the machine config similar to below:
Note theRun the following command to test the runtime class:spec.runtimeClassName
being explicitly set tonvidia
in the pod spec.