
Dynamic Resource Allocation (DRA) with NVIDIA virtualized GPU (vGPU) on AKS

· 6 min read
Sachi Desai
Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service
Suraj Deshmukh
Software Engineer at Microsoft

Recently, dynamic resource allocation (DRA) has emerged as the standard mechanism to consume GPU resources in Kubernetes. With DRA, accelerators like GPUs are no longer exposed as static extended resources (for example, nvidia.com/gpu) but are dynamically allocated through DeviceClasses and ResourceClaims. This unlocks richer scheduling semantics and better integration with virtualization technologies like NVIDIA vGPU.
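To make the new model concrete, here is a minimal sketch of a DRA-style request (names like `single-gpu`, `a10pool-app`, and the `ubuntu` image are illustrative; the `gpu.nvidia.com` device class is what the NVIDIA DRA driver is expected to register, so verify it against your installed driver version and Kubernetes API version):

```yaml
# A ResourceClaim asks for a device from a DeviceClass rather than
# requesting a static extended resource such as nvidia.com/gpu.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
---
# The pod references the claim by name instead of setting
# resources.limits["nvidia.com/gpu"].
apiVersion: v1
kind: Pod
metadata:
  name: a10pool-app
spec:
  containers:
  - name: app
    image: ubuntu   # illustrative placeholder image
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
```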

Virtual accelerators such as NVIDIA vGPU are commonly used for smaller workloads because they allow a single physical GPU to be securely partitioned across multiple tenants or apps. This is especially valuable for enterprise AI/ML development environments, fine-tuning, and audio/visual processing. vGPU enables predictable performance profiles while still exposing CUDA capabilities to containerized workloads.

On Azure, the NVadsA10_v5 virtual machine (VM) series is backed by physical NVIDIA A10 GPUs on the host and offers this resource model. Instead of assigning an entire GPU to a single VM, NVIDIA vGPU technology partitions the GPU into multiple fixed-size slices at the hypervisor layer.

In this post, we’ll walk through enabling the NVIDIA DRA driver on a node pool backed by an NVadsA10_v5 series vGPU on Azure Kubernetes Service (AKS).
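As a starting point, an A10-backed node pool can be added to an existing cluster with the Azure CLI (the resource group, cluster, and pool names below are placeholders, and `Standard_NV36ads_A10_v5` is just one of the sizes in the series):

```shell
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name a10pool \
  --node-vm-size Standard_NV36ads_A10_v5 \
  --node-count 1
```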

DRA with fractional A10 vGPU node on AKS

Simplifying InfiniBand on AKS

· 5 min read
Sachi Desai
Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service
Suraj Deshmukh
Software Engineer at Microsoft
Ernest Wong
Software Engineer at Microsoft

High performance computing (HPC) workloads, like large-scale distributed AI training and inference, often require fast, reliable data transfer and synchronization across the underlying compute. Model training, for example, must keep parameters and gradients synchronized across GPUs, and that state is exchanged constantly. For models with billions of parameters, the memory of a single GPU node may not be enough, so "pooling" memory across multiple nodes is required, and the sheer volume of data involved demands high memory bandwidth. A common way to achieve this at scale is with a high-speed, low-latency network interconnect technology called InfiniBand (IB).
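The gradient exchange described above is, at its core, an all-reduce: every worker ends up with the elementwise sum of all workers' gradients. A minimal, library-free sketch of that collective (the function name and data are illustrative; real stacks delegate this to NCCL over InfiniBand) looks like this:

```python
def all_reduce_sum(worker_grads):
    """Naive all-reduce: every worker receives the elementwise sum of all
    workers' gradient vectors. Interconnects like InfiniBand exist to make
    this constant exchange fast at multi-node scale."""
    total = [sum(vals) for vals in zip(*worker_grads)]
    return [list(total) for _ in worker_grads]

# Three workers, each holding a local gradient for the same parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(all_reduce_sum(grads))  # every worker now holds [9.0, 12.0]
```

The naive version moves every vector to every worker; ring all-reduce implementations cut the per-link traffic, but the bandwidth demand still grows with model size, which is why the interconnect matters.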