Dynamic Resource Allocation (DRA) with NVIDIA virtualized GPU (vGPU) on AKS
Recently, dynamic resource allocation (DRA) has emerged as the standard mechanism to consume GPU resources in Kubernetes. With DRA, accelerators like GPUs are no longer exposed as static extended resources (for example, nvidia.com/gpu) but are dynamically allocated through DeviceClasses and ResourceClaims. This unlocks richer scheduling semantics and better integration with virtualization technologies like NVIDIA vGPU.
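For illustration, here is a minimal, purely illustrative side-by-side of the two request styles in a Pod spec. The claim name and template name are placeholders, and a complete example appears later in this post:
# Legacy device plugin model: the container asks for a static extended resource.
containers:
- name: app
  image: ubuntu:24.04
  resources:
    limits:
      nvidia.com/gpu: 1

# DRA model: the Pod declares a ResourceClaim and the container consumes it by name.
containers:
- name: app
  image: ubuntu:24.04
  resources:
    claims:
    - name: gpu
resourceClaims:
- name: gpu
  resourceClaimTemplateName: single-gpu  # resolved against a DeviceClass such as gpu.nvidia.com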
Virtual accelerators such as NVIDIA vGPU are commonly used for smaller workloads because they allow a single physical GPU to be securely partitioned across multiple tenants or apps. This is especially valuable for enterprise AI/ML development environments, fine-tuning, and audio/visual processing. vGPU enables predictable performance profiles while still exposing CUDA capabilities to containerized workloads.
On Azure, the NVadsA10_v5 virtual machine (VM) series is backed by the physical NVIDIA A10 GPU in the host and offers this resource model. Instead of assigning the entire GPU to a single VM, the vGPU technology is used to partition the GPU into multiple fixed-size slices at the hypervisor layer.
In this post, we’ll walk through enabling the NVIDIA DRA driver on a node pool backed by an NVadsA10_v5 series vGPU on Azure Kubernetes Service (AKS).

Prepare your AKS cluster
Verify DRA is enabled
With your AKS cluster running Kubernetes version 1.34 or later, you can confirm that DRA is enabled by checking for the deviceclasses and resourceslices API resources.
Check DeviceClasses with kubectl get deviceclasses, and ResourceSlices with kubectl get resourceslices.
At this point, the results for both commands should look similar to:
No resources found
If DRA isn't enabled on your cluster (for example, if it is running an earlier Kubernetes version than 1.34), you may instead see an error like:
error: the server doesn't have a resource type "deviceclasses"/"resourceslices"
Add a vGPU node pool and label your nodes
Add a GPU node pool, specifying a VM size that supports virtualized accelerator workloads, such as the NVadsA10_v5 series. These VM sizes are specifically designed for virtualized GPU scenarios, where each node receives a slice of a physical NVIDIA A10 rather than the entire card.
az aks nodepool add \
--resource-group <resource-group> \
--cluster-name <aks-cluster-name> \
--name gpunodepool1 \
--node-count 2 \
--node-vm-size Standard_NV6ads_A10_v5
The NVIDIA DRA kubelet plugin runs as a DaemonSet and requires specific node labels, such as nvidia.com/gpu.present=true.
Today, AKS GPU nodes already carry the accelerator=nvidia label, so we'll use it as a selector to apply the required label:
kubectl get nodes -l accelerator=nvidia -o name | \
xargs -I{} kubectl label --overwrite {} nvidia.com/gpu.present=true
You can expect a similar output to the following:
node/aks-gpunodepool1-12345678-vmss000000 labeled
node/aks-gpunodepool1-12345678-vmss000001 labeled
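Before installing the driver, you can confirm the label landed on both GPU nodes by filtering on it:
kubectl get nodes -l nvidia.com/gpu.present=true
Both gpunodepool1 nodes should appear in the output.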
Install the NVIDIA DRA driver
The recommended way to install the driver is via Helm. Ensure you have a recent Helm 3 release installed.
Add the Helm chart that contains the DRA driver.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
Now, install the NVIDIA DRA driver version 25.8.1:
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
--version="25.8.1" \
--create-namespace \
--namespace nvidia-dra-driver-gpu \
--set "resources.gpus.enabled=true" \
--set "gpuResourcesEnabledOverride=true" \
--set "controller.nodeSelector=null" \
--set "controller.tolerations[0].key=CriticalAddonsOnly" \
--set "controller.tolerations[0].operator=Exists" \
--set "controller.affinity=null" \
--set "featureGates.IMEXDaemonsWithDNSNames=false"
Confirm that the following pods are running:
kubectl get pods -n nvidia-dra-driver-gpu
You can expect a similar output to the following:
NAME READY STATUS RESTARTS
nvidia-dra-controller-xxxxx 1/1 Running 0
nvidia-dra-kubelet-plugin-aks-gpunodepool1-xxxxx 1/1 Running 0
At this stage, the NVIDIA DRA driver scans the node, detects the single vGPU device exposed by the Azure VM, and publishes it to the Kubernetes control plane as a DRA-managed device. Even though the underlying hardware is a shared A10, the driver registers one allocatable device on each node because that is what the VM presents.
Why do these Helm settings matter?
Let's walk through some of the DRA Helm chart settings set earlier for vGPU:
- resources.gpus.enabled=true: Standard DRA workloads request devices via the gpu.nvidia.com device class, so this setting is needed for GPU-accelerated workloads to schedule on these A10 nodes.
- gpuResourcesEnabledOverride=true: The Helm chart includes a validation guard to prevent collisions between the NVIDIA DRA driver (gpu.nvidia.com) and the legacy NVIDIA device plugin (the nvidia.com/gpu extended resource). Since we are running DRA exclusively, with no legacy device plugin, we set this override to bypass the validation so the chart installs successfully.
- featureGates.IMEXDaemonsWithDNSNames=false: This feature gate is enabled by default and requires NVIDIA GRID GPU driver version >= 570.158.01. Azure VM sizes like the A10 series currently require the distinct GRID driver 550 branch, so we explicitly set this to false to disable IMEX for this GPU size.
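To double-check which values the release was installed with, you can ask Helm to print them back (the release name and namespace match the install command above):
helm get values nvidia-dra-driver-gpu -n nvidia-dra-driver-gpu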
Verify DeviceClass and ResourceSlices
After deployment, confirm that the gpu.nvidia.com DeviceClass exists:
kubectl get deviceclasses
Expected output:
NAME DRIVER
gpu.nvidia.com nvidia.com/dra
Check ResourceSlices:
kubectl get resourceslices
Expected output:
NAME NODE
gpu-aks-gpunodepool1-xxxxx-0 aks-gpunodepool1-xxxxx-vmss000000
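If you want more detail than the list view, you can dump the slices in YAML. The exact attributes depend on the driver version, but you should see one device entry per node along with properties such as the product name and memory:
kubectl get resourceslices -o yaml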
Now, we’ve confirmed that the DRA driver discovered and published our vGPU-backed resources, and the nodes are ready to accept workloads! You can follow these steps to run a sample workload using the DRA specifications.
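As a rough sketch of such a workload (assuming the resource.k8s.io/v1 API that ships with Kubernetes 1.34 and the gpu.nvidia.com DeviceClass registered above; the names single-gpu and gpu-test and the container image are illustrative, and older API versions use a slightly different request schema):
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com  # request one device from the DRA-managed class
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: ctr
    image: ubuntu:24.04
    # nvidia-smi is injected into the container by the NVIDIA driver via CDI
    command: ["bash", "-c", "nvidia-smi -L && sleep 3600"]
    resources:
      claims:
      - name: gpu  # consume the claim declared below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
After applying the manifest, kubectl logs gpu-test should list the single A10 vGPU slice allocated to the pod, and kubectl get resourceclaims should show the generated claim bound to one of the GPU nodes.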
vGPU options in Azure
Beyond Standard_NV6ads_A10_v5 (one-sixth of an A10 GPU), this VM series offers larger fractional profiles:
- Standard_NV12ads_A10_v5: one-third of a physical A10, with 8 GB of accelerator memory
- Standard_NV18ads_A10_v5: one-half of a physical A10, with 12 GB of accelerator memory
These sizes (and more) in the NVadsA10_v5 VM series each map to a fixed NVIDIA vGPU profile, and the fraction determines how much GPU memory and compute capacity (streaming multiprocessors) the VM receives. The limits are enforced at the hypervisor layer, so AKS ultimately sees a single GPU device with predictable, guaranteed capacity.
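As a quick sanity check (assuming the illustrative gpu-test pod from the sketch above is still running), you can query the driver from inside the workload; the reported memory should line up with the fraction for your chosen VM size:
kubectl exec gpu-test -- nvidia-smi --query-gpu=name,memory.total --format=csv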
Looking ahead
As GPUs become first-class resources in Kubernetes, combining virtualized GPU with DRA provides a practical way to run shared, production-grade workloads on AKS. vGPU-backed Azure VM series offer fractional GPUs for scenarios such as media rendering, transcoding, and fine-tuning small to medium language models, while DRA ensures those resources are allocated explicitly and scheduled with awareness of real cluster state.
For large AKS deployments, especially in regulated or cost-sensitive industries, getting GPU placement and utilization right directly affects job throughput and infrastructure efficiency. Using DRA with vGPU will enable organizations to move beyond coarse node-level allocation toward controlled, workload-driven GPU consumption at scale.

