The NVIDIA GPU Operator manages NVIDIA GPU resources in a OpenShift Container Platform cluster and automates tasks related to bootstrapping GPU nodes. Because the GPU is a special resource in the cluster, you must install some components before you can deploy application workloads to the GPU. These components include the NVIDIA drivers that enable the compute unified device architecture (CUDA), Kubernetes device plugin, container runtime, and other features such as automatic node labeling, monitoring, and more.
|
The NVIDIA GPU Operator is supported only by NVIDIA. For more information about obtaining support from NVIDIA, see Obtaining Support from NVIDIA.
|
There are two ways to enable GPUs with OpenShift Container Platform OpenShift Virtualization: the OpenShift Container Platform-native way described here and by using the NVIDIA GPU Operator.
The NVIDIA GPU Operator is a Kubernetes Operator that uses OpenShift Container Platform OpenShift Virtualization to provision GPUs for virtualized workloads running on OpenShift Container Platform. With the Operator, you can easily provision and manage GPU-enabled virtual machines to run complex artificial intelligence/machine learning (AI/ML) workloads on the same platform as their other workloads. The Operator also provides an easy way to scale the GPU capacity of their infrastructure, enabling rapid growth of GPU-based workloads.