Running more with less: Multi-instance GPU (MIG) with Dynamic Resource Allocation (DRA) on AKS
GPUs power a wide range of production Kubernetes workloads across industries. For example, media platforms rely on them for video encoding/transcoding, financial services firms run quantitative risk simulations, and research groups process and visualize large datasets. In each of these scenarios, GPUs significantly improve job throughput, yet individual workloads often consume only a portion of the available device.
By default, Kubernetes schedules GPUs as entire units; when a workload requires only a fraction of a GPU, the remaining capacity can remain unused. Over time, this leads to lower hardware utilization and higher infrastructure costs within a cluster.
Multi-instance GPU (MIG) combined with dynamic resource allocation (DRA) helps address this challenge. MIG partitions a physical GPU into isolated instances with dedicated compute and memory resources, while DRA enables those instances to be provisioned and bound dynamically through Kubernetes resource claims. Rather than treating a GPU as an indivisible resource, the cluster can allocate right-sized GPU partitions to multiple workloads at the same time!


