<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>AKS Engineering Blog</title>
        <link>https://blog.aks.azure.com</link>
        <description>AKS Engineering Blog</description>
        <lastBuildDate>Fri, 06 Mar 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Dynamic Resource Allocation (DRA) with NVIDIA virtualized GPU (vGPU) on AKS]]></title>
            <link>https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks</link>
            <guid>https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks</guid>
            <pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Configure dynamic resource allocation (DRA) for NVIDIA vGPU workloads and learn the prerequisites with setup steps on Azure Kubernetes Service (AKS).]]></description>
            <content:encoded><![CDATA[<p>Recently, dynamic resource allocation (DRA) has emerged as the standard mechanism to consume GPU resources in Kubernetes. With DRA, accelerators like GPUs are no longer exposed as static extended resources (for example, <code>nvidia.com/gpu</code>) but are dynamically allocated through <code>DeviceClasses</code> and <code>ResourceClaims</code>. This unlocks richer scheduling semantics and better integration with virtualization technologies like NVIDIA vGPU.</p>
<p>Virtual accelerators such as NVIDIA vGPU are commonly used for smaller workloads because they allow a single physical GPU to be securely partitioned across multiple tenants or apps. This is especially valuable for enterprise AI/ML development environments, fine-tuning, and audio/visual processing. vGPU enables predictable performance profiles while still exposing CUDA capabilities to containerized workloads.</p>
<p>On Azure, the <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nvadsa10v5-series" target="_blank" rel="noopener noreferrer">NVadsA10_v5</a> virtual machine (VM) series is backed by the physical NVIDIA A10 GPU in the host and offers this resource model. Instead of assigning the entire GPU to a single VM, the vGPU technology is used to partition the GPU into multiple fixed-size slices at the hypervisor layer.</p>
<p>In this post, we’ll walk through enabling the NVIDIA DRA driver on a node pool backed by an <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nvadsa10v5-series" target="_blank" rel="noopener noreferrer">NVadsA10_v5 series</a> vGPU on Azure Kubernetes Service (AKS).</p>
<p><img decoding="async" loading="lazy" alt="DRA with fractional A10 vGPU node on AKS" src="https://blog.aks.azure.com/assets/images/DRA_A10_vGPU_AKS_diagram-205255c16d9ef73b720c03cec8e0de7f.png" width="798" height="451" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="prepare-your-aks-cluster">Prepare your AKS cluster<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#prepare-your-aks-cluster" class="hash-link" aria-label="Direct link to Prepare your AKS cluster" title="Direct link to Prepare your AKS cluster" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verify-dra-is-enabled">Verify DRA is enabled<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#verify-dra-is-enabled" class="hash-link" aria-label="Direct link to Verify DRA is enabled" title="Direct link to Verify DRA is enabled" translate="no">​</a></h3>
<p>Starting with your AKS cluster running <em>Kubernetes version <code>1.34</code> or above</em>, you can confirm whether DRA is enabled on your cluster by looking for <code>deviceclasses</code> and <code>resourceslices</code>.</p>
<p>Check <code>deviceclasses</code> via <code>kubectl get deviceclasses</code> or check <code>resourceslices</code> via <code>kubectl get resourceslices</code>.</p>
<p>At this point, the results for both commands should look similar to:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">No resources found</span><br></span></code></pre></div></div>
<p>If DRA isn't enabled on your cluster (for example, if it is running an earlier Kubernetes version than <code>1.34</code>), you may instead see an error like:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">error: the server doesn't have a resource type "deviceclasses"/"resourceslices"</span><br></span></code></pre></div></div>
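<p>As an additional check, the DRA resource types can be listed directly. This is a minimal sketch that assumes the DRA APIs are served under the <code>resource.k8s.io</code> API group, as in upstream Kubernetes; on a DRA-enabled cluster the list should include <code>deviceclasses</code> and <code>resourceslices</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># List the resource types the API server exposes in the DRA API group</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl api-resources --api-group=resource.k8s.io</span><br></span></code></pre></div></div>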
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="add-a-vgpu-node-pool-and-label-your-nodes">Add a vGPU node pool and label your nodes<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#add-a-vgpu-node-pool-and-label-your-nodes" class="hash-link" aria-label="Direct link to Add a vGPU node pool and label your nodes" title="Direct link to Add a vGPU node pool and label your nodes" translate="no">​</a></h3>
<p>Add a GPU node pool and specify an Azure virtual machine (VM) size which supports virtualized accelerator workloads, such as <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nvadsa10v5-series" target="_blank" rel="noopener noreferrer">NVadsA10_v5 series</a>. These VM sizes are specifically designed for virtualized GPU scenarios, where each node receives a slice of a physical NVIDIA A10 rather than the entire card.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool add \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --cluster-name &lt;aks-cluster-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name gpunodepool1 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-count 2 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-vm-size Standard_NV6ads_A10_v5</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>The NVIDIA DRA kubelet plugin runs as a DaemonSet and requires specific node labels, such as <code>nvidia.com/gpu.present=true</code>.</p></div></div>
<p>AKS GPU nodes already carry the <code>accelerator=nvidia</code> label today, so we'll use it as a selector to apply the required label:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get nodes -l accelerator=nvidia -o name | \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  xargs -I{} kubectl label --overwrite {} nvidia.com/gpu.present=true</span><br></span></code></pre></div></div>
<p>Expect output similar to the following:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">node/aks-gpunodepool1-12345678-vmss000000 labeled</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">node/aks-gpunodepool1-12345678-vmss000001 labeled</span><br></span></code></pre></div></div>
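<p>To double-check that the label landed on every GPU node, the label value can be shown as an extra column. This sketch relies only on the two labels already mentioned above:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Select the GPU nodes and print nvidia.com/gpu.present as its own column</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get nodes -l accelerator=nvidia -L nvidia.com/gpu.present</span><br></span></code></pre></div></div>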
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="install-the-nvidia-dra-driver">Install the NVIDIA DRA driver<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#install-the-nvidia-dra-driver" class="hash-link" aria-label="Direct link to Install the NVIDIA DRA driver" title="Direct link to Install the NVIDIA DRA driver" translate="no">​</a></h2>
<p>The recommended way to install the driver is via Helm. Ensure you have Helm updated to the <a href="https://helm.sh/docs/topics/version_skew/#supported-version-skew" target="_blank" rel="noopener noreferrer">correct version</a>.</p>
<p>Add the Helm repository that hosts the DRA driver chart:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &amp;&amp; helm repo update</span><br></span></code></pre></div></div>
<p>Now, install the NVIDIA DRA driver version <code>25.8.1</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --version="25.8.1" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --create-namespace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --namespace nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "resources.gpus.enabled=true" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "gpuResourcesEnabledOverride=true" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "controller.nodeSelector=null" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "controller.tolerations[0].key=CriticalAddonsOnly" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "controller.tolerations[0].operator=Exists" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "controller.affinity=null" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set "featureGates.IMEXDaemonsWithDNSNames=false"</span><br></span></code></pre></div></div>
<p>Confirm that the following pods are running:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get pods -n nvidia-dra-driver-gpu</span><br></span></code></pre></div></div>
<p>Expect output similar to the following:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME                                                  READY   STATUS    RESTARTS</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-dra-controller-xxxxx                          1/1     Running   0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-dra-kubelet-plugin-aks-gpunodepool1-xxxxx     1/1     Running   0</span><br></span></code></pre></div></div>
<p>At this stage, the NVIDIA DRA driver scans the node, detects the single vGPU device exposed by the Azure VM, and publishes it to the Kubernetes control plane as a DRA-managed device. Even though the underlying hardware is a shared A10, the driver registers one allocatable device on each node because that is what the VM presents.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="why-do-these-helm-settings-matter">Why do these Helm settings matter?<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#why-do-these-helm-settings-matter" class="hash-link" aria-label="Direct link to Why do these Helm settings matter?" title="Direct link to Why do these Helm settings matter?" translate="no">​</a></h3>
<p>Let's walk through some of the DRA Helm chart settings set earlier for vGPU:</p>
<ol>
<li>
<p><code>resources.gpus.enabled=true</code></p>
<p>Standard DRA workloads request devices via the <code>gpu.nvidia.com</code> device class, so <code>resources.gpus.enabled=true</code> is needed for GPU-accelerated workloads to schedule on these A10 nodes.</p>
</li>
<li>
<p><code>gpuResourcesEnabledOverride=true</code></p>
<p>The Helm chart includes a validation guard to prevent collisions between the NVIDIA DRA driver (<code>gpu.nvidia.com</code>) and the legacy NVIDIA device plugin (<code>nvidia.com/gpu</code> extended resource). Because we are running DRA exclusively, with no legacy device plugin, we bypass that validation by setting <code>gpuResourcesEnabledOverride=true</code> so the chart installs successfully.</p>
</li>
<li>
<p><code>featureGates.IMEXDaemonsWithDNSNames=false</code></p>
<p>This feature gate is enabled by default and requires NVIDIA GRID GPU driver version &gt;= <code>570.158.01</code>. Azure VM sizes such as the A10 series currently require the separate GRID driver <code>550</code> branch, so we explicitly set <code>featureGates.IMEXDaemonsWithDNSNames=false</code> to turn this feature gate off for this GPU size.</p>
</li>
</ol>
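<p>If you prefer to keep these overrides in version control, the same settings can be expressed as a values file. This is a sketch intended to mirror the <code>--set</code> flags used earlier; the file name <code>dra-vgpu-values.yaml</code> is an arbitrary choice, and <code>helm upgrade --install</code> is used so the command can be rerun safely:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Write the overrides to a values file (hypothetical file name)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cat &gt; dra-vgpu-values.yaml &lt;&lt;'EOF'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  gpus:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    enabled: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpuResourcesEnabledOverride: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">controller:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  nodeSelector: null</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  affinity: null</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  tolerations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - key: CriticalAddonsOnly</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      operator: Exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">featureGates:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  IMEXDaemonsWithDNSNames: false</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Install (or upgrade) the same chart and version with the values file</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --version="25.8.1" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --create-namespace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --namespace nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -f dra-vgpu-values.yaml</span><br></span></code></pre></div></div>
<p>Setting a key to <code>null</code> in a values file removes the chart's default for that key, which is the same effect the <code>--set "...=null"</code> flags rely on.</p>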
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verify-deviceclass-and-resourceslices">Verify DeviceClass and ResourceSlices<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#verify-deviceclass-and-resourceslices" class="hash-link" aria-label="Direct link to Verify DeviceClass and ResourceSlices" title="Direct link to Verify DeviceClass and ResourceSlices" translate="no">​</a></h3>
<p>After deployment, confirm that the <code>gpu.nvidia.com</code> DeviceClass exists:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get deviceclasses</span><br></span></code></pre></div></div>
<p>Expected output:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME             DRIVER</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu.nvidia.com   nvidia.com/dra</span><br></span></code></pre></div></div>
<p>Check ResourceSlices:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get resourceslices</span><br></span></code></pre></div></div>
<p>Expected output:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME                                  NODE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu-aks-gpunodepool1-xxxxx-0          aks-gpunodepool1-xxxxx-vmss000000</span><br></span></code></pre></div></div>
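<p>To look past the summary columns, the full ResourceSlice objects can be dumped. This sketch uses only standard <code>kubectl</code> output options; the device attributes that appear depend on the driver version, so none are assumed here:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Dump every ResourceSlice, including the devices each one advertises</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get resourceslices -o yaml</span><br></span></code></pre></div></div>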
<p>Now, we’ve confirmed that the DRA driver discovered and published our vGPU-backed resources, and the nodes are ready to accept workloads! You can follow <a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#run-a-gpu-workload-using-dra-drivers" target="_blank" rel="noopener noreferrer">these steps</a> to run a sample workload using the DRA specifications.</p>
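<p>For orientation, a claim against the <code>gpu.nvidia.com</code> DeviceClass might look like the following minimal sketch. It assumes the <code>resource.k8s.io/v1</code> API available in Kubernetes <code>1.34</code>; the names <code>single-vgpu</code> and <code>vgpu-smoke-test</code> and the <code>ubuntu:22.04</code> image are illustrative choices rather than part of the setup above, and the sketch assumes the driver makes <code>nvidia-smi</code> available inside the container:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Create a claim template for one device and a pod that consumes it</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;'EOF'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ResourceClaimTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: single-vgpu            # hypothetical name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    devices:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      requests:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          exactly:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            deviceClassName: gpu.nvidia.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">---</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: vgpu-smoke-test        # hypothetical name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  restartPolicy: Never</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - name: cuda</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      image: ubuntu:22.04      # illustrative image</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      command: ["nvidia-smi", "-L"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        claims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  resourceClaims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      resourceClaimTemplateName: single-vgpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># After the pod completes, read its log to see which device it was given</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl logs vgpu-smoke-test</span><br></span></code></pre></div></div>
<p>Because each node exposes a single vGPU device, a second pod using the same template would need to land on a different node or wait until the first claim is released.</p>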
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="vgpu-options-in-azure">vGPU options in Azure<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#vgpu-options-in-azure" class="hash-link" aria-label="Direct link to vGPU options in Azure" title="Direct link to vGPU options in Azure" translate="no">​</a></h3>
<p>Beyond <code>Standard_NV6ads_A10_v5</code> (one-sixth of an A10 GPU), this VM series offers larger fractional profiles:</p>
<ul>
<li><code>Standard_NV12ads_A10_v5</code>: one-third of a physical A10, with 8 GB of accelerator memory</li>
<li><code>Standard_NV18ads_A10_v5</code>: one-half of a physical A10, with 12 GB of accelerator memory</li>
</ul>
<p>Each of these sizes (and others) in the <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nvadsa10v5-series" target="_blank" rel="noopener noreferrer">NVadsA10_v5 VM series</a> maps to a fixed NVIDIA vGPU profile, and the fraction determines how much GPU memory and compute capacity (streaming multiprocessors) the VM receives. The limits are enforced at the hypervisor layer, so AKS ultimately sees a single GPU device with predictable, guaranteed capacity.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="looking-ahead">Looking ahead<a href="https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks#looking-ahead" class="hash-link" aria-label="Direct link to Looking ahead" title="Direct link to Looking ahead" translate="no">​</a></h2>
<p>As GPUs become first-class resources in Kubernetes, combining virtualized GPU with DRA provides a practical way to run shared, production-grade workloads on AKS. Azure VM series with vGPU support offer partial GPUs for scenarios such as media rendering, transcoding, and fine-tuning small to medium language models, while DRA ensures those resources are allocated explicitly and scheduled with awareness of real cluster state.</p>
<p>For large AKS deployments, especially in regulated or cost-sensitive industries, getting GPU placement and utilization right directly affects job throughput and infrastructure efficiency. Using DRA with vGPU will enable organizations to move beyond coarse node-level allocation toward controlled, workload-driven GPU consumption at scale.</p>]]></content:encoded>
            <category>GPU</category>
            <category>Performance</category>
            <category>Operations</category>
        </item>
        <item>
            <title><![CDATA[Running more with less: Multi-instance GPU (MIG) with Dynamic Resource Allocation (DRA) on AKS]]></title>
            <link>https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks</link>
            <guid>https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks</guid>
            <pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to use dynamic resource allocation (DRA) to allocate right-sized multi-instance GPU (MIG) node partitions for your GPU-accelerated workloads on Azure Kubernetes Service (AKS).]]></description>
            <content:encoded><![CDATA[<p>GPUs power a wide range of production Kubernetes workloads across industries. For example, media platforms rely on them for video encoding/transcoding, financial services firms run quantitative risk simulations, and research groups process and visualize large datasets. In each of these scenarios, GPUs significantly improve job throughput, yet individual workloads often consume only a portion of the available device.</p>
<p>By default, Kubernetes schedules GPUs as entire units; when a workload requires only a fraction of a GPU, the remaining capacity can remain unused. Over time, this leads to lower hardware utilization and higher infrastructure costs within a cluster.</p>
<p>Multi-instance GPU (MIG) combined with dynamic resource allocation (DRA) helps address this challenge. MIG partitions a physical GPU into isolated instances with dedicated compute and memory resources, while DRA enables those instances to be provisioned and bound dynamically through Kubernetes resource claims. Rather than treating a GPU as an indivisible resource, the cluster can allocate right-sized GPU partitions to multiple workloads at the same time!</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>To learn more about dynamic resource allocation on AKS, visit our <a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes" target="_blank" rel="noopener noreferrer">previous blog</a> on getting started with DRA and NVIDIA GPU Operator!</p></div></div>
<p>In this post, we walk through how to configure MIG with the <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html" target="_blank" rel="noopener noreferrer">NVIDIA GPU Operator</a> on AKS, enable the <a href="https://github.com/NVIDIA/k8s-dra-driver-gpu" target="_blank" rel="noopener noreferrer">NVIDIA DRA driver</a>, define the necessary Kubernetes resource abstractions, and deploy a workload that consumes a MIG-backed GPU instance. NVIDIA GPUs that support partitioning, such as the A100, H100, and H200, are listed in the <a href="https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html" target="_blank" rel="noopener noreferrer">MIG User Guide</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="prepare-your-aks-cluster">Prepare your AKS cluster<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#prepare-your-aks-cluster" class="hash-link" aria-label="Direct link to Prepare your AKS cluster" title="Direct link to Prepare your AKS cluster" translate="no">​</a></h2>
<p>Starting with your AKS cluster running <em>Kubernetes version <code>1.34</code> or above</em>, you can confirm whether DRA is enabled on your cluster by looking for <code>deviceclasses</code> and <code>resourceslices</code>.</p>
<p>Check <code>deviceclasses</code> via <code>kubectl get deviceclasses</code> or check <code>resourceslices</code> via <code>kubectl get resourceslices</code>.</p>
<p>At this point, the results for both commands should look similar to:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">No resources found</span><br></span></code></pre></div></div>
<p>If DRA isn't enabled on your cluster (for example, if it is running an earlier Kubernetes version than <code>1.34</code>), you may instead see an error like:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">error: the server doesn't have a resource type "deviceclasses"/"resourceslices"</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="set-up-nvidia-gpu-operator">Set up NVIDIA GPU Operator<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#set-up-nvidia-gpu-operator" class="hash-link" aria-label="Direct link to Set up NVIDIA GPU Operator" title="Direct link to Set up NVIDIA GPU Operator" translate="no">​</a></h3>
<p>We’ll leverage the <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html" target="_blank" rel="noopener noreferrer">NVIDIA GPU Operator</a> to manage the GPU driver lifecycle. When creating the GPU-enabled node pool, specify <code>--gpu-driver none</code> to prevent preinstalled drivers from conflicting with the operator-managed stack and ensure consistent configuration across nodes. In this example, we provision an AKS node pool with an <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/ndma100v4-series" target="_blank" rel="noopener noreferrer">Azure NDm_A100_v4 VM size</a> supporting MIG:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool add \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --resource-group myResourceGroup \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --cluster-name myAKSCluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --name gpunp \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --node-count 1 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --gpu-driver none \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --node-vm-size Standard_ND96amsr_A100_v4</span><br></span></code></pre></div></div>
<p>Next, install the NVIDIA GPU Operator with MIG enabled and the legacy Kubernetes device plugin disabled. We consolidate these configuration settings in a YAML file named <code>operator-install.yaml</code> as follows:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">mig</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">strategy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> single</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">devicePlugin</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">enabled</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">false</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">driver</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">enabled</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">toolkit</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">env</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Limits containers running in unprivileged mode from requesting access to arbitrary GPU devices </span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"false"</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>In this setup, the traditional Kubernetes device plugin in the NVIDIA GPU Operator is purposely disabled so that GPU resources are not managed through the static model. Instead, the NVIDIA DRA driver serves as the authority for device discovery, enabling dynamic, claim-based management of MIG-backed GPU resources.</p></div></div>
<p>The <code>single</code> strategy divides each GPU into uniform partitions; alternatively, you can configure the <a href="https://docs.nvidia.com/datacenter/cloud-native/kubernetes/latest/index.html#the-mixed-strategy" target="_blank" rel="noopener noreferrer">mixed strategy</a> to expose partitions of different sizes as distinct resource types. After preparing and saving the configuration file, install the operator with Helm:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm install --wait \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --generate-name -n gpu-operator \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --create-namespace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  nvidia/gpu-operator \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --version=v25.10.0 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -f operator-install.yaml</span><br></span></code></pre></div></div>
<p>Now, the operator has installed the GPU driver, configured single-strategy MIG, and prepared the node pool for GPU partitioning.</p>
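<p>If you opt for the mixed strategy instead, a minimal sketch of the change, assuming the rest of <code>operator-install.yaml</code> stays as shown above, would be limited to the <code>mig</code> section:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">mig:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  strategy: mixed   # assumption: all other values in operator-install.yaml are unchanged</span><br></span></code></pre></div></div>
<p>The remaining steps in this walkthrough assume the single strategy.</p>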
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="install-the-nvidia-dra-driver">Install the NVIDIA DRA driver<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#install-the-nvidia-dra-driver" class="hash-link" aria-label="Direct link to Install the NVIDIA DRA driver" title="Direct link to Install the NVIDIA DRA driver" translate="no">​</a></h2>
<p>DRA introduces a more flexible device management model in Kubernetes; instead of statically advertising a fixed number of GPUs, DRA allows workloads to create and bind resource claims dynamically.</p>
<p>The following configuration file, named <code>dra-install.yaml</code>, enables GPU resources in the NVIDIA DRA driver, schedules the driver's controller on system nodes, and points <code>nvidiaDriverRoot</code> to the driver root managed by the operator:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">gpuResourcesEnabledOverride</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">resources-computeDomains</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">enabled</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">false</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># We'll be using GPUs, not compute domains.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">controller</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">affinity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">nodeAffinity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">requiredDuringSchedulingIgnoredDuringExecution</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">nodeSelectorTerms</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">matchExpressions</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kubernetes.azure.com/mode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">operator</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> In</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">values</span><span class="token 
punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> system   </span><span class="token comment" style="color:#999988;font-style:italic"># Makes sure the system nodes are utilized </span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">nvidiaDriverRoot</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"/run/nvidia/driver"</span><br></span></code></pre></div></div>
<p>Using the above settings, install the NVIDIA DRA driver in a dedicated namespace:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--version="25.8.1" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--create-namespace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--namespace nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-f dra-install.yaml</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verify-mig-configuration">Verify MIG configuration<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#verify-mig-configuration" class="hash-link" aria-label="Direct link to Verify MIG configuration" title="Direct link to Verify MIG configuration" translate="no">​</a></h3>
<p>Before scheduling workloads, confirm that the AKS node recognizes the GPU and that the MIG settings were applied. The node labels should report a GPU count and include <code>nvidia.com/mig.capable=true</code> and <code>nvidia.com/mig.strategy=single</code>, along with <code>nvidia.com/mig.config.state=success</code>.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl describe node aks-gpunp-12340814-vmss000000 | grep "gpu"</span><br></span></code></pre></div></div>
<p>Example output:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Name:               aks-gpunp-12340814-vmss000000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    agentpool=gpunp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    kubernetes.azure.com/agentpool=gpunp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    kubernetes.io/hostname=aks-gpunp-12340814-vmss000000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/gpu-driver-upgrade-state=upgrade-done</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/gpu.compute.major=9</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/gpu.compute.minor=0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/gpu.count=1 # GPUs are recognized</span><br></span></code></pre></div></div>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl describe node aks-gpunp-12340814-vmss000000 | grep "mig"</span><br></span></code></pre></div></div>
<p>Example output:</p>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Name:               aks-gpunp-12340814-vmss000000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/gpu.deploy.mig-manager=true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/mig.capable=true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/mig.config=all-disabled</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/mig.config.state=success</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    nvidia.com/mig.strategy=single</span><br></span></code></pre></div></div>
<p>Your AKS cluster should now be ready to expose MIG-enabled GPU partitions as dynamically allocatable devices!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="define-a-deviceclass">Define a DeviceClass<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#define-a-deviceclass" class="hash-link" aria-label="Direct link to Define a DeviceClass" title="Direct link to Define a DeviceClass" translate="no">​</a></h2>
<p>DRA introduces a <code>DeviceClass</code> abstraction that allows Kubernetes to select devices based on accelerator type and characteristics. In this case, we define a class that selects NVIDIA GPUs:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> DeviceClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> nvidia</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mig</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">selectors</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">cel</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">expression</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"device.driver == 'gpu.nvidia.com'"</span><br></span></code></pre></div></div>
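<p>This definition tells Kubernetes that any request referencing <code>nvidia-mig</code> should resolve to devices published by the <code>gpu.nvidia.com</code> DRA driver. After applying the manifest, confirm that the new class is listed alongside the classes installed by the DRA driver:</p>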
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get deviceclass</span><br></span></code></pre></div></div>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME                                        AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">compute-domain-daemon.nvidia.com            ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">compute-domain-default-channel.nvidia.com   ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu.nvidia.com                              ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mig.nvidia.com                              ... </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-mig                                  1m31s</span><br></span></code></pre></div></div>
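<p>The list above also shows that the DRA driver registers its own classes, including <code>mig.nvidia.com</code>. If you want a custom class that matches only MIG partitions rather than every device from the driver, one option is to add a second selector. The following is a sketch only: the class name <code>nvidia-mig-only</code> is hypothetical, and the <code>type</code> attribute is assumed to be published by the NVIDIA DRA driver, so check the attributes in your cluster's <code>ResourceSlice</code> objects before relying on it.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: DeviceClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: nvidia-mig-only   # hypothetical name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selectors:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - cel:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      expression: "device.driver == 'gpu.nvidia.com'"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Assumed attribute name; a device must satisfy every selector in the list.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - cel:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      expression: "device.attributes['gpu.nvidia.com'].type == 'mig'"</span><br></span></code></pre></div></div>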
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="create-a-mig-resourceclaimtemplate">Create a MIG ResourceClaimTemplate<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#create-a-mig-resourceclaimtemplate" class="hash-link" aria-label="Direct link to Create a MIG ResourceClaimTemplate" title="Direct link to Create a MIG ResourceClaimTemplate" translate="no">​</a></h2>
<p>Instead of requesting <code>nvidia.com/gpu: 1</code> in a pod spec, workloads will now reference a <code>ResourceClaimTemplate</code>, which describes the device requirement declaratively. We'll apply a MIG <code>ResourceClaimTemplate</code> to the AKS cluster as follows:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ResourceClaimTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mig</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">1g</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">devices</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">exactly</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">deviceClassName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> nvidia</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mig</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">count</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><br></span></code></pre></div></div>
<p>This abstraction decouples workloads from physical device details. A job does not need to know which GPU or partition it receives; it just declares its need for a device from the <code>nvidia-mig</code> class.</p>
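<p>When a workload does need a particular partition size, the request can carry its own selector. A hedged sketch, assuming the NVIDIA DRA driver publishes a <code>profile</code> attribute and using <code>1g.10gb</code> as a placeholder profile name (the template name <code>mig-gpu-1g-pinned</code> is also hypothetical):</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ResourceClaimTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: mig-gpu-1g-pinned   # hypothetical name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    devices:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      requests:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        exactly:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          deviceClassName: nvidia-mig</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          count: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          selectors:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          # Assumed attribute and placeholder value; verify against your ResourceSlices.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - cel:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              expression: "device.attributes['gpu.nvidia.com'].profile == '1g.10gb'"</span><br></span></code></pre></div></div>
<p>A pod would reference this template through <code>resourceClaimTemplateName</code> in the same way as <code>mig-gpu-1g</code>.</p>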
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-a-sample-mig-workload">Deploy a sample MIG workload<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#deploy-a-sample-mig-workload" class="hash-link" aria-label="Direct link to Deploy a sample MIG workload" title="Direct link to Deploy a sample MIG workload" translate="no">​</a></h3>
<p>To validate the setup, we can deploy a GPU-accelerated workload requesting a MIG partition. Our example below uses a TensorFlow sample and generally mirrors how a data processing or video transcoding job can consume a resource partition in production environments:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> batch/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Job</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> samples</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tf</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mnist</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">demo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" 
style="color:#00a4db">app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> samples</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tf</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mnist</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">demo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">template</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> samples</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tf</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mnist</span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain">demo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">restartPolicy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> OnFailure</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">tolerations</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"sku"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">operator</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Equal"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"gpu"</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">effect</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"NoSchedule"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">containers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> samples</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tf</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mnist</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">demo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mcr.microsoft.com/azuredocs/samples</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tf</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mnist</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">demo</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">gpu</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">imagePullPolicy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> IfNotPresent</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"--max_steps"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"500"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">claims</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resourceClaims</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">resourceClaimTemplateName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mig</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">1g</span><br></span></code></pre></div></div>
<p>After deploying this job, we can check its status and the usage of the <code>mig-gpu-1g</code> resource claim template we previously created:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get job</span><br></span></code></pre></div></div>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME                    STATUS    COMPLETIONS   DURATION   AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">samples-tf-mnist-demo   Running   0/1           2m59s      2m59s</span><br></span></code></pre></div></div>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get resourceclaimtemplate</span><br></span></code></pre></div></div>
<div class="language-output codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-output codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME         AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mig-gpu-1g   11s</span><br></span></code></pre></div></div>
<p>The key difference from traditional GPU scheduling is the use of <code>resources.claims</code> and <code>resourceClaimTemplateName</code>: Kubernetes coordinates with the DRA driver to provision and bind a MIG partition dynamically. Now when multiple jobs are submitted, each can receive its own isolated instance, allowing parallel execution on the same physical GPU.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-more-elastic-gpu-future-on-aks">A more elastic GPU future on AKS<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#a-more-elastic-gpu-future-on-aks" class="hash-link" aria-label="Direct link to A more elastic GPU future on AKS" title="Direct link to A more elastic GPU future on AKS" translate="no">​</a></h2>
<p>GPUs in Kubernetes have traditionally been scheduled as indivisible units. By enabling MIG and DRA on AKS, you move toward a model where accelerators are elastic, shareable, and first-class resources in the control plane. For organizations running parallel workloads that only partially utilize GPU capacity, this shift unlocks immediate cost efficiency and operational benefits.</p>
<p>If you are already operating GPU-enabled node pools on AKS and notice underutilization, implementing MIG with Dynamic Resource Allocation is one of the most impactful architectural improvements you can make. It allows you to run more workloads on the same hardware while maintaining predictability and cloud-native operational simplicity.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="additional-resources">Additional resources<a href="https://blog.aks.azure.com/2026/03/03/multi-instance-gpu-with-dra-on-aks#additional-resources" class="hash-link" aria-label="Direct link to Additional resources" title="Direct link to Additional resources" translate="no">​</a></h2>
<p>To learn more about NVIDIA MIG and DRA, check out the following resources:</p>
<ul>
<li><a href="https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html" target="_blank" rel="noopener noreferrer">MIG User Guide</a></li>
<li><a href="https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-mig-profiles.html" target="_blank" rel="noopener noreferrer">Additional MIG profiles per NVIDIA GPU series</a></li>
<li><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/" target="_blank" rel="noopener noreferrer">Types of DRA users on Kubernetes</a></li>
</ul>]]></content:encoded>
            <category>GPU</category>
            <category>Performance</category>
            <category>Operations</category>
        </item>
        <item>
            <title><![CDATA[Scaling Anyscale Ray Workloads on AKS]]></title>
            <link>https://blog.aks.azure.com/2026/02/13/scaling-ray-aks</link>
            <guid>https://blog.aks.azure.com/2026/02/13/scaling-ray-aks</guid>
            <pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to run production-grade Ray workloads on Azure Kubernetes Service with multi-cluster multi-region, unified storage, and automated credential management for AI]]></description>
            <content:encoded><![CDATA[<p>This post focuses on running Anyscale's managed Ray service on AKS, using the Anyscale Runtime (formerly RayTurbo) for an optimized Ray experience. For open-source Ray on AKS, see our <a href="https://blog.aks.azure.com/2025/01/13/ray-on-aks" target="_blank" rel="noopener noreferrer">Ray on AKS overview</a>.</p>
<p>Ray is an open-source distributed compute framework for scaling Python and AI workloads from a laptop to clusters with thousands of nodes. Anyscale provides a managed ML/AI platform and an optimized Ray runtime with better scalability, observability, and operability than running open-source <a href="https://github.com/ray-project/kuberay" target="_blank" rel="noopener noreferrer">KubeRay</a>—including intelligent autoscaling, enhanced monitoring, and fault-tolerant training.</p>
<p>As part of <a href="https://www.anyscale.com/press/anyscale-collaborates-with-microsoft-to-deliver-ai-native-computing-on-azure" target="_blank" rel="noopener noreferrer">Microsoft and Anyscale's strategic collaboration</a> to deliver <a href="https://devblogs.microsoft.com/all-things-azure/powering-distributed-aiml-at-scale-with-azure-and-anyscale/" target="_blank" rel="noopener noreferrer">distributed AI/ML Azure-native computing at scale</a>, we've been working closely with Anyscale to enhance the production-readiness of Ray workloads on Azure Kubernetes Service (AKS) in three critical areas:</p>
<ul>
<li><strong>Elastic scalability</strong> through multi-cluster multi-region capacity aggregation</li>
<li><strong>Data persistence</strong> with unified storage across the ML/AI development and operations lifecycle</li>
<li><strong>Operational simplicity</strong> through automated credential management with a service principal</li>
</ul>
<p>Whether you're <a href="https://github.com/Azure-Samples/aks-anyscale/tree/main/examples/finetuning" target="_blank" rel="noopener noreferrer">fine-tuning models with DeepSpeed or LLaMA-Factory</a> or <a href="https://github.com/Azure-Samples/aks-anyscale/tree/main/examples/inferencing" target="_blank" rel="noopener noreferrer">deploying inference endpoints for LLMs ranging from small to large-scale reasoning models</a>, Anyscale on AKS delivers a production-grade ML/AI platform that scales with your needs.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="multi-cluster-multi-region">Multi-cluster Multi-region<a href="https://blog.aks.azure.com/2026/02/13/scaling-ray-aks#multi-cluster-multi-region" class="hash-link" aria-label="Direct link to Multi-cluster Multi-region" title="Direct link to Multi-cluster Multi-region" translate="no">​</a></h2>
<p>GPU scarcity remains one of the most significant challenges in large-scale ML operations. High-demand accelerators like NVIDIA GPUs often face capacity constraints in specific Azure regions, leading to delays in provisioning clusters or launching training jobs.</p>
<p>By deploying Ray clusters across multiple AKS clusters in different Azure regions, you can:</p>
<ul>
<li><strong>Increase GPU availability</strong>: Distribute workloads across clusters and regions with available capacity, reducing wait times for cluster provisioning</li>
<li><strong>Scale beyond single-cluster limits</strong>: Azure imposes quota limits on GPU instances per region, but multi-region deployments let you aggregate capacity</li>
<li><strong>Improve fault tolerance</strong>: If one region experiences an outage or capacity shortage, workloads can be automatically rerouted to healthy clusters</li>
</ul>
<p>With infrastructure deployed across multiple regions—and optionally on-premises or other cloud environments—you can manage and monitor Anyscale workloads on registered clusters from the Anyscale console. This multi-cloud and hybrid cloud approach lets you access GPU capacity wherever it exists, whether in Azure regions, on-premises data centers, or other cloud providers. Extend your compute pool beyond Azure by connecting on-premises GPU clusters through <a href="https://learn.microsoft.com/en-us/azure/aks/aksarc/" target="_blank" rel="noopener noreferrer">AKS enabled by Azure Arc</a>, aggregating existing infrastructure investments with cloud-based resources:</p>
<p><img decoding="async" loading="lazy" alt="Anyscale Resources" src="https://blog.aks.azure.com/assets/images/anyscale-resources-1ebf2d8fd4764905f431863b0dcefb1d.png" width="3302" height="1952" class="img_ev3q"></p>
<p>Anyscale Workspaces provides a managed environment for running interactive Ray workloads, with manual or automatic scheduling across available clusters based on resource requirements:</p>
<p><img decoding="async" loading="lazy" alt="Anyscale Workspaces" src="https://blog.aks.azure.com/assets/images/anyscale-workspaces-534d47395de1c18771610771c92e98d0.png" width="3300" height="1972" class="img_ev3q"></p>
<p>To add a cluster or another region to your existing Anyscale cloud, define a cloud resource in a file such as the following <a href="https://github.com/Azure-Samples/aks-anyscale/blob/main/config/cloud_resource.yaml" target="_blank" rel="noopener noreferrer">cloud_resource.yaml</a>:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> k8s</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">azure</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">$REGION</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">provider</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> AZURE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">compute_stack</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> K8S</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">region</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $REGION</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">object_storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">bucket_name</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> abfss</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//$</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">STORAGE_CONTAINER</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">@$</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">STORAGE_ACCOUNT</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">.dfs.core.windows.net</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">file_storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">persistent_volume_claim</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> blob</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">fuse2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">azure_config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">tenant_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">AZURE_TENANT_ID</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kubernetes_config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">anyscale_operator_iam_identity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">IDENTITY_PRINCIPAL_ID</span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
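<p>The placeholders in this file (<code>$REGION</code>, <code>${STORAGE_ACCOUNT}</code>, and so on) need real values before the file is passed to the CLI. This post doesn't prescribe how they are substituted. As a minimal sketch, assuming they are supplied as shell environment variables, a hypothetical preflight helper like the following can catch unset values early:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Hypothetical helper (not part of the Anyscale CLI): report unset or empty variables.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">check_required_vars() {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  missing=""</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  for name in "$@"; do</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    # Indirect lookup of the variable named by $name (POSIX-compatible).</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    eval "value=\${$name:-}"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    if [ -z "$value" ]; then</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      missing="$missing $name"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    fi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  done</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  if [ -n "$missing" ]; then</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    echo "missing:$missing"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    return 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  fi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  echo "ok"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>For example, running <code>check_required_vars REGION STORAGE_CONTAINER STORAGE_ACCOUNT AZURE_TENANT_ID IDENTITY_PRINCIPAL_ID ANYSCALE_CLOUD_NAME</code> prints <code>ok</code> when every value is set. Otherwise it names the missing variables and returns a non-zero status.</p>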
<p>Then create the cloud resource using the Anyscale CLI:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">anyscale cloud resource create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --cloud "$ANYSCALE_CLOUD_NAME" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -f cloud_resource.yaml</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="unified-storage">Unified Storage<a href="https://blog.aks.azure.com/2026/02/13/scaling-ray-aks#unified-storage" class="hash-link" aria-label="Direct link to Unified Storage" title="Direct link to Unified Storage" translate="no">​</a></h2>
<p>Another major challenge is sharing training data, model checkpoints, and artifacts across the ML/AI workflow—from pre-training to fine-tuning to inference. <a href="https://github.com/Azure/azure-storage-fuse" target="_blank" rel="noopener noreferrer">Azure BlobFuse2</a> mounts Azure Blob Storage into Ray worker pods as a shared POSIX filesystem, providing unified, cloud-scale storage that persists beyond the lifecycle of any single workload.</p>
<p>From Ray's perspective, BlobFuse2 is just a mounted filesystem. Ray tasks and actors read datasets and write checkpoints via normal file I/O, while BlobFuse2 ensures data is persisted to Azure Blob Storage and shared across pods. This keeps Ray code portable while benefiting from Azure-native storage. By decoupling data from compute, you can scale Ray clusters up and down across node pools without data loss, while local caching helps prevent GPU stalls during large training job runs.</p>
<p><img decoding="async" loading="lazy" alt="Cluster Storage Architecture" src="https://blog.aks.azure.com/assets/images/cluster-storage-3182291f76a1f1121a1d7f5ab3a97f3a.svg" width="1051" height="750" class="img_ev3q"></p>
<p>To use BlobFuse2 with Ray on AKS:</p>
<ol>
<li>
<p>Create Azure Blob Storage containers for datasets, checkpoints, and models.</p>
</li>
<li>
<p>Enable <a href="https://github.com/kubernetes-sigs/blob-csi-driver" target="_blank" rel="noopener noreferrer">blob-csi-driver</a> when creating your AKS cluster:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-blob-driver</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ...</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create a <a href="https://github.com/Azure-Samples/aks-anyscale/blob/main/config/storageclass.yaml" target="_blank" rel="noopener noreferrer">StorageClass</a> that uses workload identity authentication and optimized caching parameters for large files:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> storage.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> StorageClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> blob</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">fuse2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">provisioner</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> blob.csi.azure.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">parameters</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">protocol</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> fuse2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">storageAccount</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">STORAGE_ACCOUNT</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">resourceGroup</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">RESOURCE_GROUP</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">clientID</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">IDENTITY_CLIENT_ID</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">mountWithWorkloadIdentityToken</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"true"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">mountOptions</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">o allow_other</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">file</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">cache</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">timeout</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">in</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">seconds=120</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">use</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">attr</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">cache=true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">cancel</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">list</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">on</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mount</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">seconds=10</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">o attr_timeout=120</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">o entry_timeout=120</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">o negative_timeout=120</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span 
class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">log</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">level=LOG_WARNING</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">cache</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">size</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mb=1000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">allowVolumeExpansion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">reclaimPolicy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Retain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">volumeBindingMode</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Immediate</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create a <a href="https://github.com/Azure-Samples/aks-anyscale/blob/main/config/pvc.yaml" target="_blank" rel="noopener noreferrer">PersistentVolumeClaim</a> in the <code>anyscale-operator</code> namespace with <code>ReadWriteMany</code> access mode. This allows multiple Ray workers across different nodes to access the same data:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> PersistentVolumeClaim</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> blob</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">fuse2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> anyscale</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">operator</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">accessModes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ReadWriteMany</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">storageClassName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> blob</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">fuse2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100Gi</span><br></span></code></pre></div></div>
</li>
<li>
<p>Configure Ray workloads to read from and write to the mounted path <code>/mnt/cluster_storage</code>.</p>
</li>
</ol>
<p>With this setup, Ray workers read and write data using standard POSIX file operations while benefiting from the durability and scalability of Azure Blob Storage. ML/AI engineers can seamlessly transition from pre-training to fine-tuning to inference without manual data migration.</p>
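<p>As a minimal sketch of what "standard POSIX file operations" means in practice, the following shell functions write a file under the mounted path and read it back, for example from a shell inside a worker pod. The function names, the <code>STORAGE_ROOT</code> override, and the per-run directory layout are assumptions made for illustration; only the <code>/mnt/cluster_storage</code> default comes from the steps above.</p>

```shell
# Assumption: the PVC is mounted at /mnt/cluster_storage inside the pod.
# STORAGE_ROOT is a hypothetical override so the functions can be tried locally.

# Write one artifact into a per-run directory (hypothetical layout: <root>/<run_id>/<name>).
save_artifact() {
  local run_id="$1" name="$2" content="$3"
  local root="${STORAGE_ROOT:-/mnt/cluster_storage}"
  mkdir -p "${root}/${run_id}"
  printf '%s\n' "${content}" > "${root}/${run_id}/${name}"
}

# Read the artifact back; any pod that mounts the same PVC can call this.
load_artifact() {
  local run_id="$1" name="$2"
  local root="${STORAGE_ROOT:-/mnt/cluster_storage}"
  cat "${root}/${run_id}/${name}"
}
```

<p>For example, <code>save_artifact run-001 metrics.txt "loss=0.42"</code> on one worker followed by <code>load_artifact run-001 metrics.txt</code> on another reads the same file, because both pods mount the same <code>ReadWriteMany</code> volume.</p>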
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="service-principal-authentication">Service Principal Authentication<a href="https://blog.aks.azure.com/2026/02/13/scaling-ray-aks#service-principal-authentication" class="hash-link" aria-label="Direct link to Service Principal Authentication" title="Direct link to Service Principal Authentication" translate="no">​</a></h2>
<p>Maintaining secure and reliable authentication between Ray clusters and Azure resources can be challenging. Previous integrations relied on CLI tokens or API keys that expire every 30 days, which forced manual rotation and risked service disruptions whenever a credential lapsed.</p>
<p>By combining Microsoft Entra service principals with managed identities (workload identity), you eliminate this operational burden without storing long-lived secrets in the cluster. Pods use the managed identity to obtain short-lived access tokens for the service principal from Microsoft Entra ID, and Entra automatically refreshes these tokens as needed:</p>
<ul>
<li>No long-lived credentials (client secrets or certificates) are stored in Kubernetes clusters</li>
<li>Automatic short-lived token issuance and refresh without manual intervention</li>
<li>Fine-grained RBAC for Azure resource access</li>
<li>Full audit trails through Azure Activity Logs</li>
</ul>
<p>The following diagram illustrates how the service principal enables the Anyscale Kubernetes Operator to authenticate without storing credentials:</p>
<p><img decoding="async" loading="lazy" alt="Authentication Flow" src="https://blog.aks.azure.com/assets/images/auth-flow-f5466e9b302a5c6e8d018651f6bb8849.svg" width="1718" height="260" class="img_ev3q"></p>
<p>In this authentication flow:</p>
<ol>
<li>The <code>Anyscale Operator</code> pod authenticates using a user-assigned managed identity.</li>
<li>The managed identity requests an access token with scope <code>api://086bc.../.default</code>.</li>
<li>Microsoft Entra ID issues the token for the <code>Anyscale Kubernetes Operator Auth</code> service principal.</li>
<li>The service principal's <code>appId</code> becomes the <code>AZURE_CLIENT_ID</code> environment variable.</li>
<li>The managed identity's object (principal) ID appears as the <code>oid</code> claim in the resulting access token.</li>
</ol>
<p>This approach removes the need for manual credential rotation by letting Azure manage the token lifecycle automatically, which reduces operational overhead and minimizes the risk of authentication failures caused by expired secrets. In a multi-cluster environment, this becomes even more critical: automated credential management across clusters simplifies operations and keeps authentication consistent and secure at scale.</p>
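<p>To see which claims an access token from the flow above actually carries, such as <code>oid</code> and <code>aud</code>, you can decode its payload locally. The following is a sketch only: it assumes a <code>base64</code> command that accepts <code>-d</code> (GNU coreutils or a recent macOS), the function name is hypothetical, and it does not verify the token's signature, so treat it as a debugging aid rather than a validation step.</p>

```shell
# Decode the payload (second dot-separated segment) of a JWT passed as $1.
# Assumption: base64 supports -d. The signature is NOT verified.
decode_jwt_payload() {
  local payload
  # Take the middle segment and map base64url characters back to base64.
  payload="$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')"
  # base64url omits padding; restore it so the length is a multiple of 4.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "${payload}" | base64 -d
}
```

<p>Calling <code>decode_jwt_payload "$ACCESS_TOKEN"</code>, where <code>ACCESS_TOKEN</code> is a placeholder for a token you have already obtained, prints the claims as JSON so you can inspect the <code>oid</code> value.</p>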
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://blog.aks.azure.com/2026/02/13/scaling-ray-aks#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Running Ray at scale on Azure Kubernetes Service requires careful consideration of compute, storage, and security strategies. Whether you're running hundreds of small experiments or large, multi-day training jobs, Anyscale on AKS gives you the elastic scale, unified storage, and operational simplicity to take ML/AI workloads from development to production.</p>
<p>To explore private preview access for Anyscale on AKS, contact your Microsoft account team or open a request on the <a href="https://github.com/Azure/AKS/issues/new/choose" target="_blank" rel="noopener noreferrer">AKS GitHub repository</a> with details about your Ray workloads and target regions.</p>]]></content:encoded>
            <category>AI</category>
            <category>Ray</category>
            <category>Anyscale</category>
        </item>
        <item>
            <title><![CDATA[Deploying KubeVirt on AKS]]></title>
            <link>https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks</link>
            <guid>https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks</guid>
            <pubDate>Fri, 06 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to use KubeVirt to host virtual machines on Azure Kubernetes Service (AKS)]]></description>
            <content:encoded><![CDATA[<p>Many organizations still depend on virtual machines (VMs) to run applications to meet technical, regulatory, or operational requirements. While Kubernetes adoption continues to grow, not every workload can or should be redesigned for containers.</p>
<p><a href="https://github.com/kubevirt/kubevirt" target="_blank" rel="noopener noreferrer">KubeVirt</a> is a <a href="https://www.cncf.io/projects/kubevirt/" target="_blank" rel="noopener noreferrer">Cloud Native Computing Foundation (CNCF) incubating</a> open-source project that allows users to run, deploy, and manage VMs in their Kubernetes clusters.</p>
<p>In this post, you will learn how KubeVirt lets you run, deploy, and manage VMs in AKS.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>If you're using KubeVirt on AKS or are interested in trying it, <a href="https://github.com/Azure/AKS/issues/5445" target="_blank" rel="noopener noreferrer">we'd love to hear from you</a>! Your feedback will help the AKS team plan how to best support this feature on our platform.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-kubevirt-matters">Why KubeVirt matters<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#why-kubevirt-matters" class="hash-link" aria-label="Direct link to Why KubeVirt matters" title="Direct link to Why KubeVirt matters" translate="no">​</a></h2>
<p>KubeVirt can help organizations that are in various stages of their Kubernetes journey manage their infrastructure more effectively. It allows customers to manage legacy VM workloads alongside containerized applications using the same Kubernetes API.</p>
<p>VMs deployed on KubeVirt behave much like VMs deployed by traditional means, but they run and are managed alongside containerized applications through standard Kubernetes tools. Familiar Kubernetes capabilities, such as scheduling, also apply to these VMs.</p>
<p>Management of these otherwise disparate deployments can be simplified and unified. This unified management can help teams avoid the sprawl that would otherwise come with managing multiple platforms.</p>
<p>The ability to mix VM and container workloads in a "hybrid" setting also lets organizations with complex, legacy VM-based applications transition to containers incrementally while keeping those applications operational throughout.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploying-kubevirt-on-aks">Deploying KubeVirt on AKS<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#deploying-kubevirt-on-aks" class="hash-link" aria-label="Direct link to Deploying KubeVirt on AKS" title="Direct link to Deploying KubeVirt on AKS" translate="no">​</a></h2>
<p>You can deploy KubeVirt on any AKS cluster that has nodes running VM SKUs that support nested virtualization.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prerequisites">Prerequisites<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#prerequisites" class="hash-link" aria-label="Direct link to Prerequisites" title="Direct link to Prerequisites" translate="no">​</a></h3>
<ul>
<li>KubeVirt on AKS requires a VM SKU that supports nested virtualization. To confirm support, open the VM size's Microsoft Learn page and check the "Feature support" section; the <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/general-purpose/dv5-series?tabs=sizebasic#feature-support" target="_blank" rel="noopener noreferrer">Standard_D4s_v5</a> page is one example.</li>
<li>Install the <code>virtctl</code> binary utility to better access and control your VirtualMachineInstances. You can follow instructions <a href="https://kubevirt.io/user-guide/user_workloads/virtctl_client_tool/" target="_blank" rel="noopener noreferrer">on the KubeVirt page</a> to install <code>virtctl</code>.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="Azure VM SKU documentation page showing nested virtualization feature marked as supported in the feature support table" src="https://blog.aks.azure.com/assets/images/nested-virt-example-b5c5da66613b83fe63c7eef2bc512414.png" width="1127" height="1105" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="creating-an-aks-cluster">Creating an AKS cluster<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#creating-an-aks-cluster" class="hash-link" aria-label="Direct link to Creating an AKS cluster" title="Direct link to Creating an AKS cluster" translate="no">​</a></h3>
<ol>
<li>
<p>Create your AKS cluster.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create --resource-group &lt;resource-group&gt; --name &lt;cluster-name&gt; --node-vm-size Standard_D4s_v5</span><br></span></code></pre></div></div>
</li>
<li>
<p>After your cluster is up and running, get the access credentials for the cluster.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks get-credentials --resource-group &lt;resource-group&gt; --name &lt;cluster-name&gt;</span><br></span></code></pre></div></div>
</li>
</ol>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="installing-kubevirt">Installing KubeVirt<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#installing-kubevirt" class="hash-link" aria-label="Direct link to Installing KubeVirt" title="Direct link to Installing KubeVirt" translate="no">​</a></h3>
<ol>
<li>
<p>Install the KubeVirt operator.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Get the latest release</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Deploy the KubeVirt operator</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -L https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml | \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sed '8249,8254c\            nodeSelectorTerms:\n            - matchExpressions:\n              - key: node-role.kubernetes.io/worker\n                operator: DoesNotExist' | \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f -</span><br></span></code></pre></div></div>
</li>
<li>
<p>Install the KubeVirt custom resource.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -L https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| yq '.spec.infra.nodePlacement={}' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| kubectl apply -f -</span><br></span></code></pre></div></div>
<p>Notice the empty <code>nodePlacement: {}</code> in the custom resource and the node selector change applied to the operator manifest. By default, KubeVirt sets the node affinity of the operator and custom resource components to control plane nodes. Because AKS control plane nodes are fully managed by Azure and inaccessible to KubeVirt, both changes let these components run on worker nodes, which avoids potential scheduling failures.</p>
</li>
</ol>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verify-kubevirt-installation">Verify KubeVirt installation<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#verify-kubevirt-installation" class="hash-link" aria-label="Direct link to Verify KubeVirt installation" title="Direct link to Verify KubeVirt installation" translate="no">​</a></h3>
<p>Once installation finishes, confirm that all KubeVirt components are running in your cluster:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get pods -n kubevirt -o wide</span><br></span></code></pre></div></div>
<p>You should see something like this:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME                               READY   STATUS    RESTARTS   AGE     IP             NODE                                NOMINATED NODE   READINESS GATES</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">virt-api-7f7d56bbc5-s9nr4          1/1     Running   0          4m10s   10.244.0.174   aks-nodepool1-26901818-vmss000000   &lt;none&gt;           &lt;none&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">virt-controller-7c5744f574-56dd5   1/1     Running   0          3m39s   10.244.0.204   aks-nodepool1-26901818-vmss000000   &lt;none&gt;           &lt;none&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">virt-controller-7c5744f574-ftz6z   1/1     Running   0          3m39s   10.244.0.120   aks-nodepool1-26901818-vmss000000   &lt;none&gt;           &lt;none&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">virt-handler-dlkxf                 1/1     Running   0          3m39s   10.244.0.52    aks-nodepool1-26901818-vmss000000   &lt;none&gt;           &lt;none&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">virt-operator-7c8bdfb574-54cs6     1/1     Running   0          9m38s   10.244.0.87    aks-nodepool1-26901818-vmss000000   &lt;none&gt;           &lt;none&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">virt-operator-7c8bdfb574-wzdxt     1/1     Running   0          9m38s   10.244.0.153   aks-nodepool1-26901818-vmss000000   &lt;none&gt;           &lt;none&gt;</span><br></span></code></pre></div></div>
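<p>If you want to script this check instead of reading the table by eye, one possible approach is to parse the same output. The sketch below assumes the default <code>kubectl get pods</code> column order (<code>NAME</code>, <code>READY</code>, <code>STATUS</code>) and uses a hypothetical helper name; it succeeds only when every listed pod is <code>Running</code> with all containers ready.</p>

```shell
# Reads `kubectl get pods` table output on stdin.
# Exit status 0 only if at least one pod is listed and every pod is
# Running with READY showing n/n. Assumes the default column order.
all_pods_ready() {
  awk '
    NR == 1 { next }                    # skip the header row
    {
      seen = 1
      split($2, ready, "/")             # READY column, for example 1/1
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit (bad || !seen) ? 1 : 0 }
  '
}
```

<p>For example, <code>kubectl get pods -n kubevirt | all_pods_ready</code> returns a zero exit status for output like the listing above, and a non-zero status while any component is still starting.</p>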
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="creating-virtualmachineinstance-resources-in-kubevirt">Creating VirtualMachineInstance resources in KubeVirt<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#creating-virtualmachineinstance-resources-in-kubevirt" class="hash-link" aria-label="Direct link to Creating VirtualMachineInstance resources in KubeVirt" title="Direct link to Creating VirtualMachineInstance resources in KubeVirt" translate="no">​</a></h3>
<p>With KubeVirt installed on your cluster, you can now create your VirtualMachineInstance (VMI) resources.</p>
<ol>
<li>
<p>Create your VMI. Save the following YAML, which creates a VMI based on Fedora, as <code>vmi-fedora.yaml</code>. The username for this deployment defaults to <code>fedora</code>; set a password of your choosing by replacing the placeholder in <code>password: &lt;my_password&gt;</code>.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kubevirt.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> VirtualMachineInstance</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">special</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vmi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">fedora</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vmi</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">fedora</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">domain</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">devices</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">disks</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">disk</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">bus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> virtio</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> containerdisk</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">disk</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">bus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> virtio</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> cloudinitdisk</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">interfaces</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">masquerade</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">      </span><span class="token key atrule" style="color:#00a4db">rng</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">guest</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 1024M</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">networks</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token 
key atrule" style="color:#00a4db">pod</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">terminationGracePeriodSeconds</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">containerDisk</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> quay.io/kubevirt/fedora</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">with</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">test</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tooling</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">container</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">disk</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">devel</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> containerdisk</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">cloudInitNoCloud</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">userData</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">|</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic">#cloud-config</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">password</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> &lt;my_password</span><span class="token punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">chpasswd</span><span class="token 
punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">expire</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">False</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> cloudinitdisk</span><br></span></code></pre></div></div>
</li>
<li>
<p>Deploy the VMI in your cluster.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f vmi-fedora.yaml</span><br></span></code></pre></div></div>
<p>If successful, you should see an output similar to <code>virtualmachineinstance.kubevirt.io/vmi-fedora created</code>.</p>
</li>
</ol>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="check-out-the-created-vmi">Check out the created VMI<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#check-out-the-created-vmi" class="hash-link" aria-label="Direct link to Check out the created VMI" title="Direct link to Check out the created VMI" translate="no">​</a></h3>
<ol>
<li>
<p>Verify that the VMI is created and running with <code>kubectl get vmi</code>. You should see a result similar to:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME         AGE   PHASE     IP             NODENAME                            READY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">vmi-fedora   85s   Running   10.244.0.213   aks-nodepool1-26901818-vmss000000   True</span><br></span></code></pre></div></div>
</li>
<li>
<p>Connect to the newly created VMI and inspect it.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">virtctl console vmi-fedora</span><br></span></code></pre></div></div>
<p>When prompted for credentials, use the default username <code>fedora</code> and the password you configured in <code>vmi-fedora.yaml</code>.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">vmi-fedora login: fedora</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Password: </span><br></span></code></pre></div></div>
<p>Once logged in, run <code>cat /etc/os-release</code> to display the OS details.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">[fedora@vmi-fedora ~]$ cat /etc/os-release</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NAME=Fedora</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">VERSION="32 (Cloud Edition)"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ID=fedora</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">VERSION_ID=32</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">VERSION_CODENAME=""</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">PLATFORM_ID="platform:f32"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">PRETTY_NAME="Fedora 32 (Cloud Edition)"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ANSI_COLOR="0;34"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">LOGO=fedora-logo-icon</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">CPE_NAME="cpe:/o:fedoraproject:fedora:32"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">HOME_URL="https://fedoraproject.org/"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f32/system-administrators-guide/"</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">BUG_REPORT_URL="https://bugzilla.redhat.com/"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">REDHAT_BUGZILLA_PRODUCT="Fedora"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">REDHAT_BUGZILLA_PRODUCT_VERSION=32</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">REDHAT_SUPPORT_PRODUCT="Fedora"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">REDHAT_SUPPORT_PRODUCT_VERSION=32</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">VARIANT="Cloud Edition"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">VARIANT_ID=cloud</span><br></span></code></pre></div></div>
</li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="converting-your-vms">Converting your VMs<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#converting-your-vms" class="hash-link" aria-label="Direct link to Converting your VMs" title="Direct link to Converting your VMs" translate="no">​</a></h2>
<p>At this point, you should have KubeVirt up and running in your AKS cluster and a VMI deployed. KubeVirt can help with a <a href="https://kubevirt.io/" target="_blank" rel="noopener noreferrer">plethora of scenarios</a> that operational teams may run into. Migrating legacy VMs to KubeVirt can be an involved process, however. <a href="https://www.spectrocloud.com/blog/how-to-migrate-your-vms-to-kubevirt-with-forklift" target="_blank" rel="noopener noreferrer">Doing it manually</a> involves steps such as converting the VM disk, persisting it, and creating a VM template.</p>
<p>Tools like <a href="https://github.com/kubev2v/forklift" target="_blank" rel="noopener noreferrer">Forklift</a> can automate some of the complexity involved with the migration. Forklift allows VMs to be migrated at scale to KubeVirt. The migration can be done by installing Forklift custom resources and setting up their respective configs in the target cluster. Some great walkthroughs of VM migration can be found in these videos <a href="https://www.youtube.com/watch?v=S7hVcv2Fu6I" target="_blank" rel="noopener noreferrer">detailing how Forklift helps deliver a better UX when importing VMs to KubeVirt</a> and <a href="https://www.youtube.com/watch?v=-w4Afj5-0_g" target="_blank" rel="noopener noreferrer">breaking down everything from the architecture to a demo of Forklift 2.0</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="running-in-production">Running in production<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#running-in-production" class="hash-link" aria-label="Direct link to Running in production" title="Direct link to Running in production" translate="no">​</a></h2>
<p>When running production-grade workloads, the stability of both the KubeVirt components and the individual VMs is an important consideration. As we hinted at earlier, KubeVirt typically sets the node affinity of its operator and custom resource components to control-plane nodes. In our deployment, the KubeVirt components run on worker nodes.</p>
<p>To maintain a control-plane/worker split, consider deploying the KubeVirt components in an agentpool designated as the "control-plane" pool, and running the VMs you spin up in designated "worker" agentpools.</p>
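<p>As a minimal sketch of that split, the snippet below writes a KubeVirt custom resource fragment that uses the <code>nodePlacement</code> settings under <code>spec.infra</code> and <code>spec.workloads</code>. The label key <code>example.com/kubevirt-role</code> and its values are hypothetical placeholders, and the resource name and namespace (<code>kubevirt</code>) are assumed to match a default KubeVirt installation; adjust them to your cluster.</p>

```shell
# Write a KubeVirt CR fragment that pins components to labeled agentpools.
# The label key and values below are hypothetical; use your own node labels.
cat > kubevirt-placement.yaml <<'EOF'
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt        # assumed: name used by a default installation
  namespace: kubevirt   # assumed: namespace used by a default installation
spec:
  infra:                # KubeVirt control components (virt-api, virt-controller)
    nodePlacement:
      nodeSelector:
        example.com/kubevirt-role: infra
  workloads:            # virt-handler and the VM pods
    nodePlacement:
      nodeSelector:
        example.com/kubevirt-role: workload
EOF
```

<p>Review the generated file and merge the <code>spec.infra</code> and <code>spec.workloads</code> fields into the KubeVirt custom resource you already deployed, rather than applying the fragment blindly over an existing configuration.</p>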
<p>KubeVirt is currently not an officially supported AKS addon/extension, so there is no Microsoft-backed SLA/SLO in place for KubeVirt deployments in AKS. If customers need an officially supported offering, <a href="https://learn.microsoft.com/en-us/azure/openshift/howto-create-openshift-virtualization" target="_blank" rel="noopener noreferrer">Azure Red Hat OpenShift</a> is a generally available platform to manage virtualized and containerized applications together.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="share-your-feedback">Share your feedback<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#share-your-feedback" class="hash-link" aria-label="Direct link to Share your feedback" title="Direct link to Share your feedback" translate="no">​</a></h2>
<p>If you're using KubeVirt on AKS or are interested in trying it, we'd love to hear from you! Your feedback will help the AKS team plan how to best support these types of workloads on our platform. Share your thoughts in our <a href="https://github.com/Azure/AKS/issues/5445" target="_blank" rel="noopener noreferrer">GitHub Issue</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resources">Resources<a href="https://blog.aks.azure.com/2026/02/06/kubevirt-on-aks#resources" class="hash-link" aria-label="Direct link to Resources" title="Direct link to Resources" translate="no">​</a></h2>
<ul>
<li><a href="https://www.redhat.com/topics/virtualization/what-is-kubevirt" target="_blank" rel="noopener noreferrer">What is KubeVirt?</a></li>
<li><a href="https://kubevirt.io/user-guide/" target="_blank" rel="noopener noreferrer">KubeVirt user guides</a></li>
<li><a href="https://github.com/Azure/AKS/issues/5445" target="_blank" rel="noopener noreferrer">Roadmap item for KubeVirt on AKS</a></li>
</ul>]]></content:encoded>
            <category>KubeVirt</category>
            <category>General</category>
            <category>Operations</category>
        </item>
        <item>
            <title><![CDATA[Autoscale KAITO inference workloads on AKS using KEDA]]></title>
            <link>https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito</link>
            <guid>https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito</guid>
            <pubDate>Tue, 03 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to autoscale KAITO inference workloads on AKS with KEDA to handle varying requests and optimize GPU utilization for AI models at scale.]]></description>
            <content:encoded><![CDATA[<p><a href="https://github.com/Azure/kaito" target="_blank" rel="noopener noreferrer">Kubernetes AI Toolchain Operator</a> (KAITO) is an operator that simplifies and automates AI/ML model inference, tuning, and RAG in a Kubernetes cluster. With the recent <a href="https://github.com/Azure/kaito/releases/tag/v0.8.0" target="_blank" rel="noopener noreferrer">v0.8.0 release</a>, KAITO has introduced intelligent autoscaling for inference workloads as an alpha feature! In this blog, we'll guide you through setting up event-driven autoscaling for vLLM inference workloads.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="introduction">Introduction<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>LLM inference is a core, widely used feature in KAITO. As the number of waiting inference requests increases, the service needs more inference instances to keep requests from queuing. Conversely, when requests decline, reducing the number of instances improves GPU resource utilization. Kubernetes Event Driven Autoscaling (KEDA) is well-suited for inference pod autoscaling. It enables event-driven, fine-grained scaling based on external metrics and triggers. KEDA supports a wide range of event sources (like custom metrics), allowing pods to scale precisely in response to workload demand. This flexibility and extensibility make KEDA ideal for dynamic, cloud-native applications that require responsive and efficient autoscaling.</p>
<p>To enable intelligent autoscaling for KAITO inference workloads using service monitoring metrics, utilize the following components and features:</p>
<ul>
<li>
<p><a href="https://github.com/kedacore/keda" target="_blank" rel="noopener noreferrer">Kubernetes Event Driven Autoscaling (KEDA)</a></p>
</li>
<li>
<p><strong><a href="https://github.com/kaito-project/keda-kaito-scaler" target="_blank" rel="noopener noreferrer">KEDA KAITO Scaler</a></strong>: A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.</p>
</li>
<li>
<p><strong>KAITO <code>InferenceSet</code> CustomResourceDefinition (CRD) and controller</strong>: A new CRD and controller were built on top of the KAITO workspace for intelligent autoscaling, introduced as an alpha feature in KAITO version <code>v0.8.0</code>.</p>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="architecture">Architecture<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#architecture" class="hash-link" aria-label="Direct link to Architecture" title="Direct link to Architecture" translate="no">​</a></h3>
<p>The following diagram shows how KEDA KAITO Scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:</p>
<p><img decoding="async" loading="lazy" alt="Architecture diagram showing KEDA KAITO Scaler integrating KAITO InferenceSet with KEDA to autoscale inference workloads on AKS" src="https://blog.aks.azure.com/assets/images/keda-kaito-scaler-arch-1af338819a90073a9d0487574a964222.png" width="2433" height="1221" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="getting-started">Getting started<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#getting-started" class="hash-link" aria-label="Direct link to Getting started" title="Direct link to Getting started" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="create-an-aks-cluster-with-gpu-auto-provisioning-capabilities-for-kaito">Create an AKS cluster with GPU auto-provisioning capabilities for KAITO<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#create-an-aks-cluster-with-gpu-auto-provisioning-capabilities-for-kaito" class="hash-link" aria-label="Direct link to Create an AKS cluster with GPU auto-provisioning capabilities for KAITO" title="Direct link to Create an AKS cluster with GPU auto-provisioning capabilities for KAITO" translate="no">​</a></h3>
<p>Refer to the instructions on <a href="https://kaito-project.github.io/kaito/docs/azure" target="_blank" rel="noopener noreferrer">how to create an AKS cluster with GPU auto-provisioning capabilities for KAITO</a>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="enable-inferenceset-controller-in-kaito">Enable InferenceSet controller in KAITO<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#enable-inferenceset-controller-in-kaito" class="hash-link" aria-label="Direct link to Enable InferenceSet controller in KAITO" title="Direct link to Enable InferenceSet controller in KAITO" translate="no">​</a></h3>
<p>The InferenceSet CRD and controller were introduced as an <strong>alpha</strong> feature in KAITO version <code>v0.8.0</code>. Built on top of the KAITO workspace, InferenceSet supports the scale subresource API for intelligent autoscaling. To use InferenceSet, the InferenceSet controller must be enabled during the KAITO installation.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export CLUSTER_NAME=kaito</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm repo add kaito https://kaito-project.github.io/kaito/charts/kaito</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm repo update</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm upgrade --install kaito-workspace kaito/workspace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --namespace kaito-workspace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --create-namespace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set clusterName="$CLUSTER_NAME" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --set featureGates.enableInferenceSetController=true \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --wait</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="install-keda">Install KEDA<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#install-keda" class="hash-link" aria-label="Direct link to Install KEDA" title="Direct link to Install KEDA" translate="no">​</a></h3>
<ul>
<li>
<p><strong>Option 1</strong>: Enable the managed KEDA add-on.
For instructions, refer to <a href="https://learn.microsoft.com/azure/aks/keda-deploy-add-on-cli" target="_blank" rel="noopener noreferrer">Install KEDA add-on on AKS</a>.</p>
</li>
<li>
<p><strong>Option 2</strong>: Install KEDA using the Helm chart</p>
</li>
</ul>
<blockquote>
<p>The following example demonstrates how to install KEDA 2.x using the Helm chart. For instructions on installing KEDA through other methods, refer to the <a href="https://github.com/kedacore/keda#deploying-keda" target="_blank" rel="noopener noreferrer">KEDA deployment documentation</a>.</p>
</blockquote>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm repo add kedacore https://kedacore.github.io/charts</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm install keda kedacore/keda --namespace kube-system</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="example-scenarios">Example Scenarios<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#example-scenarios" class="hash-link" aria-label="Direct link to Example Scenarios" title="Direct link to Example Scenarios" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="time-based-keda-scaler">Time-Based KEDA Scaler<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#time-based-keda-scaler" class="hash-link" aria-label="Direct link to Time-Based KEDA Scaler" title="Direct link to Time-Based KEDA Scaler" translate="no">​</a></h3>
<p>The KEDA cron scaler enables scaling of workloads according to time-based schedules, making it especially beneficial for workloads with predictable traffic patterns. It is perfect for situations where peak hours are known ahead of time, allowing you to proactively adjust resources before demand rises. For more details about time-based scalers, refer to <a href="https://keda.sh/docs/2.18/scalers/cron/" target="_blank" rel="noopener noreferrer">Scale applications based on a cron schedule</a>.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="example-business-hours-scaling">Example: Business Hours Scaling<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#example-business-hours-scaling" class="hash-link" aria-label="Direct link to Example: Business Hours Scaling" title="Direct link to Example: Business Hours Scaling" translate="no">​</a></h4>
<ul>
<li>Create a KAITO InferenceSet for running inference workloads</li>
</ul>
<p>The following example creates an InferenceSet for the phi-4-mini model:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF | kubectl apply -f -</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: kaito.sh/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: InferenceSet</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: phi-4-mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  labelSelector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    matchLabels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      apps: phi-4-mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  replicas: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    inference:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      preset:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        accessMode: public</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">        name: phi-4-mini-instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resource:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      instanceType: Standard_NC24ads_A100_v4</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<ul>
<li>Create a KEDA ScaledObject</li>
</ul>
<p>Below is an example of creating a <code>ScaledObject</code> that scales a KAITO InferenceSet based on business hours:</p>
<ul>
<li>
<p><strong>Scale up to 5 replicas</strong> from 6:00 AM to 8:00 PM, Monday to Friday (peak hours)</p>
</li>
<li>
<p><strong>Scale down to 1 replica</strong> otherwise (off-peak hours)</p>
</li>
</ul>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF | kubectl apply -f -</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: keda.sh/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ScaledObject</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: kaito-business-hours-scaler</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Target KAITO InferenceSet to scale</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  scaleTargetRef:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    apiVersion: kaito.sh/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    kind: InferenceSet</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name: phi-4-mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Scaling boundaries</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  minReplicaCount: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  
maxReplicaCount: 5</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Cron-based triggers for time-based scaling</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  triggers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Scale up to 5 replicas at 6:00 AM (start of business hours)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - type: cron</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      timezone: "America/New_York"  # Adjust timezone as needed</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      start: "0 6 * * 1-5"          # 6:00 AM Monday to Friday</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      end: "0 20 * * 1-5"           # 8:00 PM Monday to Friday</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      desiredReplicas: "5"          # Scale to 5 replicas during business hours</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Scale down to 1 replica at 8:00 PM (end of business hours)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - type: cron</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      timezone: "America/New_York"  # Adjust timezone as needed</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      start: "0 20 * * 1-5"         # 8:00 PM Monday to Friday</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">      end: "0 6 * * 1-5"            # 6:00 AM Monday to Friday (next day)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      desiredReplicas: "1"          # Scale to 1 replica during off-hours</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
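<p>To sanity-check the schedule before relying on it, the following sketch models the replica count the example is intended to produce for a given weekday and hour. The helper name <code>expected_replicas</code> is hypothetical, and the function only mirrors the schedule described above; it is not KEDA's own logic and ignores scale-down delays such as cooldown or stabilization windows.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: models the business-hours schedule above, not KEDA itself.
# Arguments: day of week (1=Monday ... 7=Sunday) and hour of day (0-23),
# both expressed in the timezone configured on the ScaledObject.
expected_replicas() {
  local dow="$1" hour="$2"
  if [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hour" -ge 6 ] && [ "$hour" -lt 20 ]; then
    echo 5   # inside the weekday 6:00 AM to 8:00 PM window
  else
    echo 1   # nights and weekends: falls back to minReplicaCount
  fi
}

expected_replicas 3 9    # Wednesday, 9:00 AM
expected_replicas 3 22   # Wednesday, 10:00 PM
expected_replicas 6 9    # Saturday, 9:00 AM
```

<p>Comparing these values with the replica count that <code>kubectl get inferenceset phi-4-mini</code> reports at different times of day is one way to confirm that the triggers behave as intended.</p>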
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="metric-based-keda-scaler">Metric-Based KEDA Scaler<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#metric-based-keda-scaler" class="hash-link" aria-label="Direct link to Metric-Based KEDA Scaler" title="Direct link to Metric-Based KEDA Scaler" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="install-keda-kaito-scaler">Install KEDA KAITO Scaler<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#install-keda-kaito-scaler" class="hash-link" aria-label="Direct link to Install KEDA KAITO Scaler" title="Direct link to Install KEDA KAITO Scaler" translate="no">​</a></h4>
<blockquote>
<p>This component is required only when using the metric-based KEDA scaler. Ensure that KEDA KAITO Scaler is installed in the same namespace as KEDA.</p>
</blockquote>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm repo add keda-kaito-scaler https://kaito-project.github.io/keda-kaito-scaler/charts/kaito-project</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm upgrade --install keda-kaito-scaler -n kube-system keda-kaito-scaler/keda-kaito-scaler</span><br></span></code></pre></div></div>
<p>After a few seconds, the <code>keda-kaito-scaler</code> deployment starts.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># kubectl get deployment keda-kaito-scaler -n kube-system</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NAME                READY   UP-TO-DATE   AVAILABLE   AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">keda-kaito-scaler   1/1     1            1           28h</span><br></span></code></pre></div></div>
<p>The <code>keda-kaito-scaler</code> provides a simplified configuration interface for scaling vLLM inference workloads. It scrapes metrics directly from the inference pods, eliminating the need for a separate monitoring stack.</p>
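<p>To make the scraping step concrete, here is a minimal sketch of reading one gauge from Prometheus-style metrics text, which is the format vLLM exposes. It is not the scaler's actual implementation: the <code>read_gauge</code> helper and the sample text (including the <code>model_name</code> label value) are hypothetical, and the parser assumes samples carry no trailing timestamp.</p>

```python
def read_gauge(metrics_text, metric_name):
    """Sum all samples of metric_name in Prometheus text exposition format.

    Hypothetical helper for illustration; returns None if the metric is absent.
    """
    total = 0.0
    found = False
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like "name{labels} value" or "name value".
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name == metric_name:
            total += float(value)
            found = True
    return total if found else None


# Illustrative sample only; real pod output contains many more series.
SAMPLE = """\
# HELP vllm:num_requests_waiting Number of requests waiting to be processed.
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="phi-4-mini-instruct"} 12.0
vllm:num_requests_running{model_name="phi-4-mini-instruct"} 4.0
"""

print(read_gauge(SAMPLE, "vllm:num_requests_waiting"))  # prints 12.0
```

<p>Summing the samples keeps the helper usable when a pod reports the same metric under several label sets. Whether the real scaler aggregates this way is an assumption.</p>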
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="example-create-a-kaito-inferenceset-with-annotations-for-running-inference-workloads">Example: Create a KAITO InferenceSet with annotations for running inference workloads<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#example-create-a-kaito-inferenceset-with-annotations-for-running-inference-workloads" class="hash-link" aria-label="Direct link to Example: Create a KAITO InferenceSet with annotations for running inference workloads" title="Direct link to Example: Create a KAITO InferenceSet with annotations for running inference workloads" translate="no">​</a></h4>
<ul>
<li>
<p>The following example creates an InferenceSet for the phi-4-mini model, using annotations with the prefix <code>scaledobject.kaito.sh/</code> to supply parameter inputs for the KEDA KAITO scaler.</p>
<ul>
<li><code>scaledobject.kaito.sh/auto-provision</code>
<ul>
<li>required, when set to <code>true</code>, the KEDA KAITO scaler automatically provisions a ScaledObject based on the <code>InferenceSet</code> object</li>
</ul>
</li>
<li><code>scaledobject.kaito.sh/max-replicas</code>
<ul>
<li>required, maximum number of replicas for the target InferenceSet</li>
</ul>
</li>
<li><code>scaledobject.kaito.sh/metricName</code>
<ul>
<li>optional, specifies the name of the metric collected from the vLLM pod that is monitored to trigger scaling. The default is <code>vllm:num_requests_waiting</code>. For all vLLM metrics, see <a href="https://docs.vllm.ai/en/stable/usage/metrics/#general-metrics" target="_blank" rel="noopener noreferrer">vLLM Production Metrics</a>.</li>
</ul>
</li>
<li><code>scaledobject.kaito.sh/threshold</code>
<ul>
<li>required, specifies the threshold for the monitored metric that triggers the scaling operation</li>
</ul>
</li>
</ul>
</li>
</ul>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF | kubectl apply -f -</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: kaito.sh/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: InferenceSet</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  annotations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scaledobject.kaito.sh/auto-provision: "true"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scaledobject.kaito.sh/max-replicas: "5"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scaledobject.kaito.sh/metricName: "vllm:num_requests_waiting"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scaledobject.kaito.sh/threshold: "10"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: phi-4-mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  labelSelector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    matchLabels:</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">      apps: phi-4-mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  replicas: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    inference:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      preset:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        accessMode: public</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        name: phi-4-mini-instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resource:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      instanceType: Standard_NC24ads_A100_v4</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
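<p>The annotation rules listed earlier can be sketched as a small pre-flight check to run before applying a manifest. This is an illustration only: <code>check_annotations</code> is a hypothetical helper, and the numeric checks (whole-number <code>max-replicas</code>, numeric <code>threshold</code>) are assumptions rather than documented scaler behavior.</p>

```python
PREFIX = "scaledobject.kaito.sh/"
REQUIRED = ("auto-provision", "max-replicas", "threshold")
DEFAULT_METRIC = "vllm:num_requests_waiting"


def check_annotations(annotations):
    """Return (settings, problems) for an InferenceSet annotation dict."""
    problems = []
    settings = {}
    for key in REQUIRED:
        value = annotations.get(PREFIX + key)
        if value is None:
            problems.append("missing required annotation " + PREFIX + key)
        settings[key] = value
    # metricName is optional and falls back to the documented default.
    settings["metricName"] = annotations.get(PREFIX + "metricName", DEFAULT_METRIC)
    # Assumption: max-replicas is a whole number written as a string.
    replicas = settings["max-replicas"]
    if replicas is not None and not replicas.isdigit():
        problems.append(PREFIX + "max-replicas should be a whole number")
    # Assumption: threshold parses as a number.
    threshold = settings["threshold"]
    if threshold is not None:
        try:
            float(threshold)
        except ValueError:
            problems.append(PREFIX + "threshold should be numeric")
    return settings, problems


settings, problems = check_annotations({
    "scaledobject.kaito.sh/auto-provision": "true",
    "scaledobject.kaito.sh/max-replicas": "5",
    "scaledobject.kaito.sh/threshold": "10",
})
print(settings["metricName"], problems)  # prints vllm:num_requests_waiting []
```

<p>Annotation values are always strings in Kubernetes, which is why the manifest quotes <code>"5"</code> and <code>"10"</code> and why the check parses them instead of comparing numbers directly.</p>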
<p>In just a few seconds, the KEDA KAITO scaler automatically creates the <code>scaledobject</code> and <code>hpa</code> objects. After a few minutes, once the inference pod runs, the KEDA KAITO scaler begins scraping <a href="https://docs.vllm.ai/en/stable/usage/metrics/#general-metrics" target="_blank" rel="noopener noreferrer">metric values</a> from the inference pod. The system then marks the status of the <code>scaledobject</code> and <code>hpa</code> objects as ready.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># kubectl get scaledobject</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NAME           SCALETARGETKIND                  SCALETARGETNAME   MIN   MAX   READY   ACTIVE    FALLBACK   PAUSED   TRIGGERS   AUTHENTICATIONS           AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">phi-4-mini     kaito.sh/v1alpha1.InferenceSet   phi-4-mini        1     5     True    True     False      False    external   keda-kaito-scaler-creds   10m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># kubectl get hpa</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NAME                    REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">keda-hpa-phi-4-mini     InferenceSet/phi-4-mini     0/10 (avg)   1         5         1          11m</span><br></span></code></pre></div></div>
<p>That's it! Your KAITO workloads now scale automatically based on the average number of waiting inference requests (<code>vllm:num_requests_waiting</code>) across all workloads associated with <code>InferenceSet/phi-4-mini</code> in the cluster.</p>
<p>In the example below, if <code>vllm:num_requests_waiting</code> exceeds the threshold (10) for over 60 seconds, KEDA scales up by adding a new replica to <code>InferenceSet/phi-4-mini</code>. Conversely, if <code>vllm:num_requests_waiting</code> remains below the threshold (10) for more than 300 seconds, KEDA scales down the number of replicas.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">Every 2.0s</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kubectl describe hpa</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                                     keda</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">hpa</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                                default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                                   app.kubernetes.io/managed</span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain">by=keda</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">operator</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                                          app.kubernetes.io/name=keda</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">hpa</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                                          app.kubernetes.io/part</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">of=phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                                          app.kubernetes.io/version=2.18.1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                                          scaledobject.keda.sh/name=phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Annotations</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                      
        </span><span class="token key atrule" style="color:#00a4db">scaledobject.kaito.sh/managed-by</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> keda</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">kaito</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">scaler</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">CreationTimestamp</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                        Tue</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> 09 Dec 2025 03</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">35</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">09 +0000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Reference</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                                InferenceSet/phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Metrics</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                                  ( current / target )</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "s0</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token key atrule" style="color:#00a4db">vllm:num_requests_waiting" (target average value)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">  58 / 10</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Min replicas</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                             </span><span class="token number" style="color:#36acaa">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Max replicas</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">                                             </span><span class="token number" style="color:#36acaa">5</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Behavior</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">Scale Up</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">Stabilization Window</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 60 seconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">Select Policy</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Max</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">Policies</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">Type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">Pods  Value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">1  Period</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 300 seconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">Scale Down</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">Stabilization Window</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 300 seconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">Select Policy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Max</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" 
style="color:#00a4db">Policies</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">Type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">Pods  Value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">1  Period</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 600 seconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">InferenceSet pods</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">  2 current / 2 desired</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Conditions</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  Type            Status  Reason            Message</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">---</span><span class="token plain">  </span><span class="token punctuation" 
style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">---</span><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  AbleToScale     True    ReadyForNewScale  recommended size matches current size</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric s0</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">vllm</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">num_requests_waiting(</span><span class="token important">&amp;Lab</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">elSelector</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">MatchLabels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">map</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">string</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">string</span><span class="token punctuation" style="color:#393A34">{</span><span class="token key atrule" style="color:#00a4db">scaledobject.keda.sh/name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><span 
class="token punctuation" style="color:#393A34">,</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">MatchExpressions</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">LabelSelectorRequirement</span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ScalingLimited  True    ScaleUpLimit      the desired replica count is increasing faster than the maximum scale rate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">Events</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  Type    Reason             Age   From                       Message</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">---</span><span class="token plain">             </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">  </span><span class="token punctuation" 
style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">                       </span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">---</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">Normal  SuccessfulRescale  33s   horizontal-pod-autoscaler  New size</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">2; reason</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> external metric s0</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">vllm</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">num_requests_waiting(</span><span class="token important">&amp;LabelSelector</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">MatchLabels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">ma</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">p</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">string</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">string</span><span class="token punctuation" style="color:#393A34">{</span><span class="token key atrule" style="color:#00a4db">scaledobject.keda.sh/name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> phi</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span 
class="token punctuation" style="color:#393A34">-</span><span class="token plain">mini</span><span class="token punctuation" style="color:#393A34">,</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">MatchExpressions</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">LabelSelectorRequirement</span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">) above target</span><br></span></code></pre></div></div>
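<p>The <code>ScalingLimited / ScaleUpLimit</code> condition in the output above follows from how the HPA combines an average-value target with the scale-up policy. The sketch below is a simplified model under stated assumptions, not the Kubernetes controller code: it ignores the HPA tolerance band and stabilization windows, and it treats the <code>Pods 1</code> policy as "at most one added replica per step". The function name and parameters are hypothetical.</p>

```python
import math


def desired_replicas(current_replicas, avg_metric, target_avg,
                     max_replicas, min_replicas=1, max_step_up=1):
    """Simplified HPA-style estimate for an average-value metric target."""
    # The HPA reports the per-pod average, so recover the total first.
    total = avg_metric * current_replicas
    # Replicas needed so that total / replicas is at or below the target.
    raw = math.ceil(total / target_avg)
    # Scale-up policy: add at most max_step_up pods in one step.
    limited = min(raw, current_replicas + max_step_up)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(limited, max_replicas))


# Values from the output above: 2 replicas, 58 waiting on average, target 10, max 5.
print(desired_replicas(2, 58, 10, 5))  # prints 3
```

<p>With these numbers the unconstrained answer would be 12 replicas, well above both the maximum of 5 and the one-pod-per-period policy, so in this model the InferenceSet grows one replica at a time.</p>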
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>KAITO's LLM inference service must scale inference instances dynamically to handle a varying number of waiting requests: scaling up to prevent blocking when requests increase, and scaling down to optimize GPU usage when requests decrease. With the newly introduced InferenceSet CRD and KEDA KAITO scaler, configuring this autoscaling in KAITO has become much simpler.</p>
<p>We're just getting started and would love your feedback. To learn more about KAITO inference workloads autoscaling and AI model deployment on AKS, check out the following links:</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resources">Resources<a href="https://blog.aks.azure.com/2026/02/03/autoscale-inference-workloads-with-kaito#resources" class="hash-link" aria-label="Direct link to Resources" title="Direct link to Resources" translate="no">​</a></h2>
<ul>
<li><a href="https://kaito-project.github.io/kaito/docs/keda-autoscaler-inference" target="_blank" rel="noopener noreferrer">KEDA Auto-Scaler for inference workloads</a></li>
<li><a href="https://github.com/kaito-project/kaito/blob/main/docs/proposals/20250918-introduce_inferenceset_autoscaling.md" target="_blank" rel="noopener noreferrer">KAITO InferenceSet</a></li>
<li><a href="https://docs.vllm.ai/en/stable/usage/metrics/#general-metrics" target="_blank" rel="noopener noreferrer">vLLM Production Metrics</a></li>
</ul>]]></content:encoded>
            <category>AI</category>
            <category>KAITO</category>
        </item>
        <item>
            <title><![CDATA[Navigating Capacity Challenges on AKS with Node Auto Provisioning or Virtual Machine Node Pools]]></title>
            <link>https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management</link>
            <guid>https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management</guid>
            <pubDate>Fri, 30 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how Node auto provisioning and virtual machine node pools can address common capacity constraints when scaling an AKS cluster. Also learn best practices for compute scaling in AKS.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="when-growth-meets-a-wall">When Growth Meets a Wall<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#when-growth-meets-a-wall" class="hash-link" aria-label="Direct link to When Growth Meets a Wall" title="Direct link to When Growth Meets a Wall" translate="no">​</a></h2>
<p>Imagine this: your application is thriving, traffic spikes, and Kubernetes promises elasticity. You hit “scale,” expecting seamless provisioning, only to be greeted by errors like:</p>
<ul>
<li><strong>SkuNotAvailable</strong>: The VM size (also referred to as VM SKU) you requested is not available.</li>
<li><strong>AllocationFailed</strong>: Azure can’t allocate the specific VM size with the constraints you requested in a particular region.</li>
<li><strong>Quota exceeded</strong>: Your subscription has hit its compute limits for a particular location or VM size.</li>
<li><strong>ZonalAllocationFailed</strong>: Azure can’t allocate the VM size with the constraints you requested in a particular zone.</li>
<li><strong>OverconstrainedAllocationRequest</strong>: Azure can’t allocate the specific VM size with the constraints you requested in a particular region.</li>
<li><strong>OverconstrainedZonalAllocationRequest</strong>: Azure can’t allocate the VM size with the constraints you requested in a particular zone.</li>
</ul>
<p>For customers, these aren’t just error messages; they’re roadblocks. Pods remain pending, deployments stall, and SLAs tremble. Scaling isn’t just about adding nodes; it’s about finding capacity in a dynamic, multi-tenant cloud where demand often outpaces supply. For quota gaps, users can usually request a quota increase in a particular location. But what happens when a specific virtual machine size (also known as a "VM SKU") is simply unavailable? A quota increase doesn’t help in that case, and pods stay pending until capacity becomes available.</p>
<hr>
<p><img decoding="async" loading="lazy" alt="visual demo of node auto provisioning and virtual machine node pools features. The image shows a karpenter scheduler reacting to unscheduled pods and provisioning multiple sizes of nodes to schedule them. The image also shows a virtual machine node pool that can be scaled up or down with different sizes of nodes" src="https://blog.aks.azure.com/assets/images/nap-vms-hero-image-ee3afdca371a39ef82328e999c0323ab.png" width="1814" height="762" class="img_ev3q"></p>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Learn more in the official documentation: <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning" target="_blank" rel="noopener noreferrer">Node Auto Provisioning</a> or <a href="https://learn.microsoft.com/azure/aks/virtual-machines-node-pools" target="_blank" rel="noopener noreferrer">virtual machine node pools</a></p></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-hidden-complexity-behind-capacity">The Hidden Complexity Behind Capacity<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#the-hidden-complexity-behind-capacity" class="hash-link" aria-label="Direct link to The Hidden Complexity Behind Capacity" title="Direct link to The Hidden Complexity Behind Capacity" translate="no">​</a></h2>
<p>When using Kubernetes, every node pool is typically tied to a specific VM SKU, region, and zone, and changing any of these can take some effort. In some scaling scenarios, high-demand VM SKUs can become unavailable in certain regions or zones. In this case, limiting node pools to a single VM size becomes a bottleneck that can result in capacity errors and an outage. You’re left juggling trade-offs: do you overprovision SKUs “just in case” to ensure availability, or do you risk underprovisioning and being unable to scale? AKS offers two solutions that aim to address these capacity scaling challenges.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="breaking-the-mold-features-that-change-the-game">Breaking the Mold: Features That Change the Game<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#breaking-the-mold-features-that-change-the-game" class="hash-link" aria-label="Direct link to Breaking the Mold: Features That Change the Game" title="Direct link to Breaking the Mold: Features That Change the Game" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="node-auto-provisioning-nap-smarter-scaling">Node Auto Provisioning (NAP): Smarter Scaling<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#node-auto-provisioning-nap-smarter-scaling" class="hash-link" aria-label="Direct link to Node Auto Provisioning (NAP): Smarter Scaling" title="Direct link to Node Auto Provisioning (NAP): Smarter Scaling" translate="no">​</a></h3>
<p>NAP offers a more intelligent scaling experience. Instead of you guessing the right VM size and precreating node pools, NAP uses <strong>pending pod resource requests</strong> to dynamically provision nodes that fit your workloads. Built on the open-source <strong>Karpenter</strong> project, NAP:</p>
<ul>
<li><strong>Automates VM selection</strong>: Chooses optimal SKUs based on CPU, memory, and constraints</li>
<li><strong>Consolidates intelligently</strong>: Removes underutilized nodes, reducing cost</li>
<li><strong>Adapts in real time</strong>: Responds to pod pressure without manual intervention</li>
</ul>
<p>Think of NAP as Kubernetes with foresight: provisioning what you need, when you need it, without the spreadsheet gymnastics. Without NAP, a single unavailable VM SKU can block scaling entirely. With NAP, AKS dynamically adapts to capacity fluctuations, ensuring workloads keep running on available VM sizes, even during regional/zonal shortages.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-nap-handles-capacity-errors">How NAP handles capacity errors<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#how-nap-handles-capacity-errors" class="hash-link" aria-label="Direct link to How NAP handles capacity errors" title="Direct link to How NAP handles capacity errors" translate="no">​</a></h4>
<p>When a requested VM SKU isn’t available due to regional or zonal capacity constraints, NAP doesn’t fail outright. Instead, NAP will automatically:</p>
<ul>
<li>Evaluate pending pod resource requirements (for example CPU, memory, GPU)</li>
<li>Check if pending pods can fit on existing nodes</li>
<li>Search across multiple VM SKUs within the allowed families defined in your NAP configuration files, which are custom resource definitions (CRDs) named NodePool and AKSNodeClass</li>
<li>Provision an alternative SKU that meets the workload requirements and policy constraints</li>
</ul>
<p>Only when no VM size matching your requirements is available does NAP return an error stating that "No available SKU that meets your configuration definition is available." To mitigate this, make sure you reference a broad range of size options in the NAP configuration files (for example D-series, or multiple SKU families).</p>
<p>This flexibility is key to avoiding hard failures during scale-out. If you do receive the error, it typically means your configuration requirements can be broadened to allow for more available VM sizes.</p>
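<p>To observe this behavior on a live cluster, you can look at the pods that are waiting for capacity and the nodes NAP requests for them. The following is a minimal sketch, assuming <code>kubectl</code> is connected to a NAP-enabled cluster; the event source name <code>karpenter</code> and the <code>nodeclaims</code> resource come from the upstream Karpenter project, and the exact output varies by cluster and version:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># List pods that are still waiting for a node</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get pods --all-namespaces --field-selector status.phase=Pending</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># List the NodeClaims, each of which represents a node NAP has requested</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get nodeclaims</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Stream Karpenter events, including provisioning attempts and failures (Ctrl+C to stop)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get events --all-namespaces --field-selector source=karpenter --watch</span><br></span></code></pre></div></div>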
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="nap-vs-cluster-autoscaler">NAP vs Cluster Autoscaler<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#nap-vs-cluster-autoscaler" class="hash-link" aria-label="Direct link to NAP vs Cluster Autoscaler" title="Direct link to NAP vs Cluster Autoscaler" translate="no">​</a></h4>
<p>In traditional Kubernetes, Cluster Autoscaler is the standard autoscaling experience. It scales pre-existing node pools in which every node has the same VM size. Because of this same-size requirement, scaling is subject to the availability of the selected VM size, and Cluster Autoscaler does not allow changing a node pool's VM SKU. If that SKU is unavailable, a capacity error occurs and your workloads are stuck. To work around this limitation, you may have to pre-create multiple node pools with different VM SKUs, which introduces operational complexity. When zonal allocation constraints are also factored in, the complexity of traditional node pools increases further: Cluster Autoscaler configurations may require one node pool per zone for each VM SKU to scale reliably.</p>
<p>NAP offers a new model based on individual virtual machines rather than node pools or Virtual Machine Scale Sets. NAP also provides versatility that can offer more capacity resilience and more cost optimization than traditional node pools using Cluster Autoscaler. NAP addresses many of these capacity limitations and workarounds.</p>
<p>For more on enabling NAP on your cluster, visit our <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning" target="_blank" rel="noopener noreferrer">NAP documentation</a> as well as our docs on configuring the <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools" target="_blank" rel="noopener noreferrer">NodePool CRD</a> and <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning-aksnodeclass" target="_blank" rel="noopener noreferrer">AKSNodeClass CRD</a>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="virtual-machine-node-pools-flexibility-at-scale">Virtual Machine Node Pools: Flexibility at Scale<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#virtual-machine-node-pools-flexibility-at-scale" class="hash-link" aria-label="Direct link to Virtual Machine Node Pools: Flexibility at Scale" title="Direct link to Virtual Machine Node Pools: Flexibility at Scale" translate="no">​</a></h3>
<p>Traditional node pools are rigid: one VM size per node pool. Virtual machine node pools break that limitation. With multi-SKU support, you can:</p>
<ul>
<li>Mix VM sizes within a single node pool for diverse workloads</li>
<li>Fine-tune capacity without creating dozens of node pools</li>
<li>Reduce operational overhead while improving resilience</li>
</ul>
<p>Virtual machine node pools provide flexibility and versatility in capacity-constrained regions.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-virtual-machine-node-pools-handle-capacity-errors">How virtual machine node pools handle capacity errors<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#how-virtual-machine-node-pools-handle-capacity-errors" class="hash-link" aria-label="Direct link to How virtual machine node pools handle capacity errors" title="Direct link to How virtual machine node pools handle capacity errors" translate="no">​</a></h4>
<p>You can manually add or update alternative VM SKUs in your new or existing node pools. When a requested VM SKU isn't available due to a regional or zonal capacity constraint, you receive a capacity error, which you can resolve by adding or updating the VM SKUs in your node pools.</p>
<p>For more on enabling virtual machine node pools on your cluster, visit our <a href="https://learn.microsoft.com/azure/aks/virtual-machines-node-pools" target="_blank" rel="noopener noreferrer">Virtual machine node pools documentation</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="quick-guidance-when-to-use-what">Quick Guidance: When to Use What<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#quick-guidance-when-to-use-what" class="hash-link" aria-label="Direct link to Quick Guidance: When to Use What" title="Direct link to Quick Guidance: When to Use What" translate="no">​</a></h2>
<p>Generally, NAP and virtual machine node pools are mutually exclusive options. You can use NAP to create standalone VMs that NAP manages instead of traditional node pools, allowing for <strong>mixed SKU autoscaling</strong>. Virtual machine node pools use traditional node pools and allow for <strong>mixed SKU manual scaling</strong>.</p>
<ul>
<li>(Recommended) Choose NAP for dynamic environments or specific SKU selection where manual SKU planning is impractical.</li>
<li>Choose virtual machine node pools when you need fine-tuned control with exact VM SKUs for compliance, predictable performance, or cost modeling</li>
</ul>
<p>Avoid NAP if you require strict SKU governance or have regulatory constraints that cannot allow for dynamic autoscaling. Avoid VM node pools if you want full automation without manual profiles.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="best-practices-for-resilience">Best Practices for Resilience<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#best-practices-for-resilience" class="hash-link" aria-label="Direct link to Best Practices for Resilience" title="Direct link to Best Practices for Resilience" translate="no">​</a></h2>
<p>To maximize NAP's ability to handle capacity errors:</p>
<ul>
<li>Define broad SKU families (e.g., D, E) in your NodePool requirements</li>
<li>Avoid overly restrictive affinity rules. Visit our <a href="https://learn.microsoft.com/azure/aks/operator-best-practices-advanced-scheduler#control-pod-scheduling-using-node-selectors-and-affinity" target="_blank" rel="noopener noreferrer">node selector and affinity best practices documentation</a> for more details</li>
<li>Enable multiple NodePools with different priorities for fallback. Visit our <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools" target="_blank" rel="noopener noreferrer">NAP Node Pool documentation</a> to learn more</li>
</ul>
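<p>As a rough illustration of the first and third points, the sketch below writes two NodePool manifests to a file and applies them. It assumes the upstream Karpenter <code>karpenter.sh/v1</code> API version, the <code>karpenter.azure.com/sku-family</code> requirement key, and an AKSNodeClass named <code>default</code>; the NodePool names, weights, and file name are hypothetical, so check the linked documentation for the schema your cluster supports:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Write two NodePools to a file: a preferred one and a broader fallback</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cat &gt; nap-nodepools.yaml &lt;&lt;'EOF'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: karpenter.sh/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: NodePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: preferred-d-family  # hypothetical name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  weight: 100  # higher weights are considered first</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      nodeClassRef:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        group: karpenter.azure.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        kind: AKSNodeClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        name: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      requirements:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - key: karpenter.azure.com/sku-family</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          operator: In</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          values: ["D"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">---</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: karpenter.sh/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: NodePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: fallback-broad  # hypothetical name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  weight: 10  # lower weight, used as a fallback</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      nodeClassRef:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        group: karpenter.azure.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        kind: AKSNodeClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        name: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      requirements:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - key: karpenter.azure.com/sku-family</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          operator: In</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          values: ["D", "E"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Apply both NodePools to the cluster</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f nap-nodepools.yaml</span><br></span></code></pre></div></div>
<p>In upstream Karpenter, NodePools with a higher <code>weight</code> are considered first, so the narrower pool is intended to be preferred while the broader pool gives NAP more VM sizes to fall back on.</p>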
<p>To maximize a virtual machine node pool's ability to adapt to capacity errors:</p>
<ul>
<li>Keep a clear list of the VM SKUs that your workloads can tolerate. Visit our <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/overview#list-of-vm-size-families-by-type" target="_blank" rel="noopener noreferrer">Azure VM Sizes documentation</a> for more details</li>
<li>Create virtual machine node pools to offer resiliency to your workloads. Visit our <a href="https://learn.microsoft.com/azure/aks/virtual-machines-node-pools" target="_blank" rel="noopener noreferrer">virtual machine node pool documentation</a> on how to add a mixed SKU node pool</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="getting-started-with-node-auto-provisioning">Getting Started with Node Auto Provisioning<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#getting-started-with-node-auto-provisioning" class="hash-link" aria-label="Direct link to Getting Started with Node Auto Provisioning" title="Direct link to Getting Started with Node Auto Provisioning" translate="no">​</a></h2>
<p>Before you begin, visit our <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning#limitations-and-unsupported-features" target="_blank" rel="noopener noreferrer">NAP documentation</a> on minimum cluster requirements.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="create-a-new-nap-managed-aks-cluster">Create a new NAP-managed AKS cluster<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#create-a-new-nap-managed-aks-cluster" class="hash-link" aria-label="Direct link to Create a new NAP-managed AKS cluster" title="Direct link to Create a new NAP-managed AKS cluster" translate="no">​</a></h3>
<p>The following command creates a new NAP-managed AKS cluster by setting the <code>--node-provisioning-mode</code> field to <code>Auto</code>. This command also sets the network configuration to the recommended Azure CNI Overlay with a Cilium dataplane (optional). View our <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning-networking#supported-networking-configurations-for-nap" target="_blank" rel="noopener noreferrer">NAP networking documentation</a> for more on supported CNI options.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --node-provisioning-mode Auto --network-plugin azure --network-plugin-mode overlay --network-dataplane cilium</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="update-an-existing-cluster-to-be-a-nap-managed-cluster">Update an existing cluster to be a NAP-managed cluster<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#update-an-existing-cluster-to-be-a-nap-managed-cluster" class="hash-link" aria-label="Direct link to Update an existing cluster to be a NAP-managed cluster" title="Direct link to Update an existing cluster to be a NAP-managed cluster" translate="no">​</a></h3>
<p>The following command updates an existing cluster to enable NAP:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks update --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --node-provisioning-mode Auto</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="configure-nap-customresourcedefinitions">Configure NAP CustomResourceDefinitions<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#configure-nap-customresourcedefinitions" class="hash-link" aria-label="Direct link to Configure NAP CustomResourceDefinitions" title="Direct link to Configure NAP CustomResourceDefinitions" translate="no">​</a></h3>
<p>NAP bases its decisions on custom resources, defined by CustomResourceDefinitions (CRDs), together with the requirements in your application deployment files. The Karpenter controller uses this information to determine which virtual machines to provision for your workloads. Karpenter CRD types include:</p>
<ul>
<li>NodePool - for setting rules around the range of VM sizes, capacity type (spot vs. on-demand), compute architecture, availability zones, etc</li>
<li>AKSNodeClass - for setting rules around certain Azure specific settings such as more detailed networking (virtual networks) setup, node image family type, operating system configurations, and other resource-related definitions</li>
</ul>
<p>Visit our <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools" target="_blank" rel="noopener noreferrer">NAP NodePool Documentation</a> and <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning-aksnodeclass" target="_blank" rel="noopener noreferrer">NAP AKSNodeClass documentation</a> for more on configuring these files.</p>
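<p>Before changing these resources, it can help to look at what already exists on the cluster. The following is a minimal sketch, assuming <code>kubectl</code> is connected to a NAP-enabled cluster and that the cluster has a NodePool and an AKSNodeClass named <code>default</code>; the names on your cluster may differ:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># List the NodePool and AKSNodeClass resources in the cluster</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get nodepools</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get aksnodeclasses</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Print the full definitions, assuming both are named "default"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get nodepool default -o yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get aksnodeclass default -o yaml</span><br></span></code></pre></div></div>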
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="getting-started-with-virtual-machine-node-pools">Getting started with virtual machine node pools<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#getting-started-with-virtual-machine-node-pools" class="hash-link" aria-label="Direct link to Getting started with virtual machine node pools" title="Direct link to Getting started with virtual machine node pools" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="create-a-new-aks-cluster-with-virtual-machine-node-pools">Create a new AKS cluster with virtual machine node pools<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#create-a-new-aks-cluster-with-virtual-machine-node-pools" class="hash-link" aria-label="Direct link to Create a new AKS cluster with virtual machine node pools" title="Direct link to Create a new AKS cluster with virtual machine node pools" translate="no">​</a></h3>
<p>The following example creates a new cluster named myAKSCluster with a virtual machine node pool containing two nodes with size "Standard_D4s_v3", and sets the Kubernetes version to 1.31.0:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create --resource-group myResourceGroup --name myAKSCluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --vm-set-type "VirtualMachines" --vm-sizes "Standard_D4s_v3" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-count 2 --kubernetes-version 1.31.0</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="add-a-new-virtual-machine-node-pool-to-an-existing-cluster">Add a new virtual machine node pool to an existing cluster<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#add-a-new-virtual-machine-node-pool-to-an-existing-cluster" class="hash-link" aria-label="Direct link to Add a new virtual machine node pool to an existing cluster" title="Direct link to Add a new virtual machine node pool to an existing cluster" translate="no">​</a></h3>
<p>The following example adds a virtual machine node pool named myvmpool to the myAKSCluster cluster. The command creates the node pool with a ManualScaleProfile, setting <code>--vm-sizes</code> to <code>Standard_D4s_v3</code> and <code>--node-count</code> to 3:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name myvmpool --vm-set-type "VirtualMachines" --vm-sizes "Standard_D4s_v3" --node-count 3</span><br></span></code></pre></div></div>
<p>With virtual machine node pools, you can also perform operations such as the following:</p>
<ul>
<li>Add multiple VM sizes in an existing or new node pool</li>
<li>Update VM sizes in an existing node pool</li>
<li>Single-SKU autoscaling (public preview)</li>
<li>Delete VM sizes in an existing node pool</li>
</ul>
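<p>As a sketch of the add, update, and delete operations, the commands below use the <code>az aks nodepool manual-scale</code> subcommands against the node pool created earlier. The VM sizes and node counts are illustrative assumptions, and the subcommands may require a recent Azure CLI version or the <code>aks-preview</code> extension, so confirm the exact syntax in the documentation:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Add a second VM size to the existing node pool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool manual-scale add --resource-group myResourceGroup --cluster-name myAKSCluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name myvmpool --vm-sizes "Standard_D2s_v3" --node-count 2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Swap that size for another one, for example after a capacity error</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool manual-scale update --resource-group myResourceGroup --cluster-name myAKSCluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name myvmpool --current-vm-sizes "Standard_D2s_v3" --vm-sizes "Standard_D8s_v3" --node-count 2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Remove a VM size from the node pool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool manual-scale delete --resource-group myResourceGroup --cluster-name myAKSCluster \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name myvmpool --current-vm-sizes "Standard_D8s_v3"</span><br></span></code></pre></div></div>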
<p>Visit our <a href="https://learn.microsoft.com/azure/aks/virtual-machines-node-pools" target="_blank" rel="noopener noreferrer">virtual machine node pools documentation</a> for more info.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="upcoming-experiences-on-the-aks-roadmap">Upcoming experiences on the AKS roadmap<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#upcoming-experiences-on-the-aks-roadmap" class="hash-link" aria-label="Direct link to Upcoming experiences on the AKS roadmap" title="Direct link to Upcoming experiences on the AKS roadmap" translate="no">​</a></h2>
<ul>
<li><strong>NAP:</strong> Expect deeper integration with cost optimization tools and advanced disruption policies for even smarter consolidation.</li>
<li><strong>Virtual machine node pools:</strong> Multi-SKU autoscaling (general availability) is on the horizon, reducing manual configuration and enabling adaptive scaling across mixed SKUs.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="next-steps">Next steps<a href="https://blog.aks.azure.com/2025/12/06/node-auto-provisioning-capacity-management#next-steps" class="hash-link" aria-label="Direct link to Next steps" title="Direct link to Next steps" translate="no">​</a></h2>
<p>Ready to get started?</p>
<ol>
<li><strong>Try one of these features now:</strong> Follow the <a href="https://learn.microsoft.com/azure/aks/use-node-auto-provisioning" target="_blank" rel="noopener noreferrer">Enable Node Auto Provisioning steps</a> or <a href="https://learn.microsoft.com/azure/aks/virtual-machines-node-pools" target="_blank" rel="noopener noreferrer">create a virtual machine node pool</a>.</li>
<li><strong>Share feedback:</strong> Open issues or ideas in <a href="https://github.com/Azure/AKS/issues" target="_blank" rel="noopener noreferrer">AKS GitHub Issues</a>.</li>
<li><strong>Join the community:</strong> Subscribe to the <a href="https://www.youtube.com/@theakscommunity" target="_blank" rel="noopener noreferrer">AKS Community YouTube</a> and follow <a href="https://x.com/theakscommunity" target="_blank" rel="noopener noreferrer">@theakscommunity</a> on X.</li>
</ol>]]></content:encoded>
            <category>Node Auto Provisioning</category>
            <category>Virtual Machine Node Pools</category>
        </item>
        <item>
            <title><![CDATA[Azure Container Registry Repository Permissions with Attribute-based Access Control (ABAC)]]></title>
            <link>https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions</link>
            <guid>https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions</guid>
            <pubDate>Fri, 23 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Azure Container Registry now supports Microsoft Entra ABAC for granular repository permissions in CI/CD pipelines and AKS clusters for least-privilege access.]]></description>
            <content:encoded><![CDATA[<p>Enterprises are converging on centralized container registries that serve multiple business units and application domains. <a href="https://learn.microsoft.com/azure/role-based-access-control/overview" target="_blank" rel="noopener noreferrer">Azure role-based access control (RBAC)</a> uses <a href="https://learn.microsoft.com/azure/role-based-access-control/role-assignments" target="_blank" rel="noopener noreferrer">role assignments</a> to control access to Azure resources. Each Azure RBAC role assignment specifies an identity (who will gain permissions), an Azure role with Entra <a href="https://learn.microsoft.com/azure/role-based-access-control/permissions/containers#microsoftcontainerregistry" target="_blank" rel="noopener noreferrer">actions and data actions</a> (what permissions are granted), and an assignment <a href="https://learn.microsoft.com/azure/role-based-access-control/scope-overview" target="_blank" rel="noopener noreferrer">scope</a> (which resources). For Azure Container Registry (ACR), traditional Azure RBAC scopes are limited to the subscription, resource group, or registry level—meaning permissions apply to all repositories within a registry.</p>
<p>In this shared registry model, traditional Azure role-based access control (RBAC) forces an all-or-nothing choice: either grant broad registry-wide permissions or manage separate registries per team. Neither approach aligns with least-privilege principles or modern zero trust architectures.</p>
<p>Microsoft Entra <a href="https://learn.microsoft.com/azure/role-based-access-control/conditions-format" target="_blank" rel="noopener noreferrer">attribute-based access control (ABAC)</a> for Azure Container Registry solves this challenge. ABAC augments Azure RBAC with fine-grained conditions, enabling platform teams to scope permissions precisely to specific repositories or namespaces within a shared registry. CI/CD pipelines and Azure Kubernetes Service (AKS) clusters can now access only their authorized repositories, eliminating overprivileged authorization while maintaining operational simplicity.</p>
<p><img decoding="async" loading="lazy" alt="AKS cluster pulling from ACR with ABAC" src="https://blog.aks.azure.com/assets/images/aks-cluster-pulling-from-acr-with-abac-f8534471feb79f97c21bbd1a7ce127a4.png" width="1080" height="579" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-abac-works-in-acr">How ABAC works in ACR<a href="https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions#how-abac-works-in-acr" class="hash-link" aria-label="Direct link to How ABAC works in ACR" title="Direct link to How ABAC works in ACR" translate="no">​</a></h2>
<p>Attribute-based access control (ABAC) extends the traditional RBAC model by introducing <strong>attributes</strong> on top of actions and data actions. While RBAC defines <strong>who</strong> (identity) can perform <strong>what permissions</strong> (Entra actions and data actions in a <a href="https://learn.microsoft.com/azure/role-based-access-control/role-definitions" target="_blank" rel="noopener noreferrer">role definition</a>) on <strong>which resource scope</strong> (subscription, resource group, or registry), ABAC adds a fourth dimension: <strong>attributes</strong>. Attributes—such as repository name patterns—are properties of the resource being accessed. ABAC uses <strong>conditions</strong> in role assignments to evaluate these attributes and determine whether access should be granted.</p>
<p>For ACR, this means that while the <em>scope</em> of a role assignment can be at the subscription, resource group, or registry level, ABAC <em>conditions</em> evaluate repository-level attributes like <code>repositories:name</code> to determine whether a specific permission (such as <code>repositories/content/read</code>) is permitted. This allows administrators to scope permissions to specific repositories or namespace prefixes within a single registry, without needing separate registries or overprivileged access.</p>
<p>ACR registries support a permissions mode called "<strong>RBAC Registry + ABAC Repository Permissions</strong>" that makes them ABAC-enabled. Once configured, registry administrators add ABAC conditions to standard Azure RBAC role assignments, scoping permissions to specific repositories or namespace prefixes. This enables:</p>
<ul>
<li><strong>CI/CD pipelines</strong> to push images only to their approved namespaces and repositories</li>
<li><strong>AKS clusters, Azure Container Apps, and Azure Container Instances</strong> to pull only from authorized repositories</li>
<li><strong>Microsoft Entra identities</strong> to enforce permission boundaries through standard role assignments</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="enabling-abac-on-acr">Enabling ABAC on ACR<a href="https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions#enabling-abac-on-acr" class="hash-link" aria-label="Direct link to Enabling ABAC on ACR" title="Direct link to Enabling ABAC on ACR" translate="no">​</a></h2>
<p>By default, ACR registries use the "<strong>RBAC Registry Permissions</strong>" authorization mode, which provides traditional registry-scoped access control. ABAC can be enabled on all new and existing ACR registries across all SKUs by changing the authorization mode to "<strong>RBAC Registry + ABAC Repository Permissions</strong>", either during registry creation or configured on existing registries. You can assign RBAC roles with optional ABAC conditions to users, groups, service principals, and managed identities used by resources such as Azure Kubernetes Service (AKS), Azure Container Apps (ACA), and Azure Container Instances (ACI) when pulling images from ACR.</p>
<p>Here is the Azure Portal experience for enabling ABAC on a new ACR during creation:</p>
<p><img decoding="async" loading="lazy" alt="Enabling ABAC on a new ACR during creation" src="https://blog.aks.azure.com/assets/images/acr-enabling-abac-during-acr-create-3f92c1c60d98e5265b4ca8581ef627cc.png" width="941" height="941" class="img_ev3q"></p>
<p>Here is the Azure Portal experience for enabling ABAC on an existing ACR:</p>
<p><img decoding="async" loading="lazy" alt="Enabling ABAC on an existing ACR" src="https://blog.aks.azure.com/assets/images/acr-enabling-abac-during-acr-update-f0b1f7fd4cd4b71fd7ba023c14fd2c31.png" width="1226" height="734" class="img_ev3q"></p>
<p>ABAC can also be enabled on ACR registries through Azure Resource Manager (ARM), Bicep files, Terraform templates, and Azure CLI.</p>
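<p>For the Azure CLI path, a minimal sketch might look like the following. It assumes a recent Azure CLI version that supports the <code>--role-assignment-mode</code> parameter with the <code>rbac-abac</code> value; the registry name, resource group, and the <code>Standard</code> SKU are placeholders chosen for illustration:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Create a new registry with ABAC repository permissions enabled</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az acr create --name "&lt;registry-name&gt;" --resource-group "&lt;resource-group&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --sku Standard --role-assignment-mode rbac-abac</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Switch an existing registry to ABAC repository permissions</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az acr update --name "&lt;registry-name&gt;" --resource-group "&lt;resource-group&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --role-assignment-mode rbac-abac</span><br></span></code></pre></div></div>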
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="abac-enabled-built-in-roles">ABAC-enabled built-in roles<a href="https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions#abac-enabled-built-in-roles" class="hash-link" aria-label="Direct link to ABAC-enabled built-in roles" title="Direct link to ABAC-enabled built-in roles" translate="no">​</a></h2>
<p>Once a registry is ABAC-enabled (configured to "<strong>RBAC Registry + ABAC Repository Permissions</strong>"), registry admins can use these ABAC-enabled built-in roles to grant repository-scoped permissions:</p>
<ul>
<li><strong><a href="https://learn.microsoft.com/azure/role-based-access-control/built-in-roles/containers#container-registry-repository-reader" target="_blank" rel="noopener noreferrer">Container Registry Repository Reader</a></strong>: grants image pull and metadata read permissions, including permissions for <code>HEAD</code> requests, <code>GET</code> manifest requests, <code>GET</code> layer blob requests, tag resolution, and discovering OCI referrers.</li>
<li><strong><a href="https://learn.microsoft.com/azure/role-based-access-control/built-in-roles/containers#container-registry-repository-writer" target="_blank" rel="noopener noreferrer">Container Registry Repository Writer</a></strong>: grants Repository Reader permissions, as well as image and tag push permissions.</li>
<li><strong><a href="https://learn.microsoft.com/azure/role-based-access-control/built-in-roles/containers#container-registry-repository-contributor" target="_blank" rel="noopener noreferrer">Container Registry Repository Contributor</a></strong>: grants Repository Reader and Repository Writer permissions, as well as image and tag delete permissions.</li>
</ul>
<p>Note that these roles do not grant repository list permissions.</p>
<ul>
<li>The separate <strong><a href="https://learn.microsoft.com/azure/role-based-access-control/built-in-roles/containers#container-registry-repository-catalog-lister" target="_blank" rel="noopener noreferrer">Container Registry Repository Catalog Lister</a></strong> must be assigned to grant repository list permissions.</li>
<li>The <strong><a href="https://learn.microsoft.com/azure/role-based-access-control/built-in-roles/containers#container-registry-repository-catalog-lister" target="_blank" rel="noopener noreferrer">Container Registry Repository Catalog Lister</a></strong> role does not support ABAC conditions in role assignments; assigning this role grants permissions to list all repositories in a registry.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="assigning-roles-with-abac-conditions">Assigning roles with ABAC conditions<a href="https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions#assigning-roles-with-abac-conditions" class="hash-link" aria-label="Direct link to Assigning roles with ABAC conditions" title="Direct link to Assigning roles with ABAC conditions" translate="no">​</a></h2>
<p>After enabling ABAC on your registry, you can create role assignments with conditions that scope permissions to specific repositories. Here's an example using Azure CLI to grant a managed identity read access to repositories matching the <code>myapp/*</code> namespace:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az role assignment create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --assignee "&lt;managed-identity-principal-id&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --role "Container Registry Repository Reader" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --scope "/subscriptions/&lt;subscription-id&gt;/resourceGroups/&lt;resource-group&gt;/providers/Microsoft.ContainerRegistry/registries/&lt;registry-name&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --condition "((!(ActionMatches{'Microsoft.ContainerRegistry/registries/repositories/content/read'})) OR (@Resource[Microsoft.ContainerRegistry/registries/repositories:name] StringStartsWith 'myapp/'))" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --condition-version "2.0"</span><br></span></code></pre></div></div>
<p>In the Azure Portal, navigate to your registry's <strong>Access control (IAM)</strong> blade, select <strong>Add role assignment</strong>, choose an ABAC-enabled role, and configure conditions in the <strong>Conditions</strong> tab to specify repository name patterns.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>ABAC conditions in Azure are <strong>fail-closed</strong> by default. If a condition cannot be evaluated (for example, due to a missing attribute), access is denied. This ensures that misconfigured conditions do not inadvertently grant broader access than intended.</p></div></div>
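<p>The condition in the CLI example has the shape "(NOT the targeted action) OR (repository name starts with the prefix)". The following Python sketch models only that boolean logic to show which requests the condition lets through. It is a simplified illustration, not Azure's condition evaluator: the function name <code>condition_allows</code> is hypothetical, and action matching is reduced to an exact string comparison.</p>

```python
# Simplified model of the ABAC condition from the CLI example above.
# This is NOT Azure's evaluator; names and matching rules are illustrative.

READ_ACTION = "Microsoft.ContainerRegistry/registries/repositories/content/read"


def condition_allows(action: str, repository: str, prefix: str = "myapp/") -> bool:
    """Evaluate (NOT ActionMatches{read}) OR (repository StringStartsWith prefix)."""
    # Assumption: exact string equality stands in for ActionMatches.
    action_is_targeted = action == READ_ACTION
    # Actions the condition does not target pass through unchanged;
    # the targeted action passes only for repositories under the prefix.
    return (not action_is_targeted) or repository.startswith(prefix)


if __name__ == "__main__":
    for repo in ("myapp/frontend", "otherteam/api", "myapp-legacy/tool"):
        print(repo, condition_allows(READ_ACTION, repo))
```

<p>In this model, <code>myapp/frontend</code> passes while <code>otherteam/api</code> and <code>myapp-legacy/tool</code> do not, which is why the trailing slash in the prefix matters. The condition only narrows what the assigned role already grants; it never adds permissions on its own.</p>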
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="important-role-behavior-changes-in-abac-mode">Important role behavior changes in ABAC mode<a href="https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions#important-role-behavior-changes-in-abac-mode" class="hash-link" aria-label="Direct link to Important role behavior changes in ABAC mode" title="Direct link to Important role behavior changes in ABAC mode" translate="no">​</a></h2>
<div class="theme-admonition theme-admonition-caution admonition_xJq3 alert alert--warning"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 16 16"><path fill-rule="evenodd" d="M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"></path></svg></span>caution</div><div class="admonitionContent_BuS1"><p>When a registry is ABAC-enabled by configuring its permissions mode to "<strong>RBAC Registry + ABAC Repository Permissions</strong>", existing built-in roles and role assignments will have different behaviors and will no longer grant the same set of permissions to ACR registries.</p></div></div>
<ul>
<li>Legacy data-plane roles such as <strong>AcrPull</strong>, <strong>AcrPush</strong>, and <strong>AcrDelete</strong> are <em><strong>not honored in ABAC-enabled registries and should not be used</strong></em>. For ABAC-enabled registries, use the ABAC-enabled built-in roles listed above.</li>
<li>Broad roles like <strong>Owner</strong>, <strong>Contributor</strong>, and <strong>Reader</strong> previously granted both control plane and data plane permissions, which is typically overprivileged. In ABAC-enabled registries, these broad roles grant only control plane permissions to the registry. <strong>Owner</strong>, <strong>Contributor</strong>, and <strong>Reader</strong> will <em><strong>no longer grant data plane permissions</strong></em>, such as image push, pull, delete, or repository list permissions.</li>
<li>In ABAC-enabled registries, ACR Tasks, Quick Tasks, Quick Builds, and Quick Runs no longer have default data plane access to source registries. This prevents inadvertent security leaks and overly broad permission grants to ACR Tasks. To grant an ACR Task access to a source ACR registry, assign the ABAC-enabled roles above to the calling identity of the Task or Task Run as needed.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="next-steps">Next steps<a href="https://blog.aks.azure.com/2026/01/23/acr-abac-repository-permissions#next-steps" class="hash-link" aria-label="Direct link to Next steps" title="Direct link to Next steps" translate="no">​</a></h2>
<p>Start using ABAC repository permissions in ACR to enforce least-privilege artifact push, pull, and delete boundaries across your CI/CD systems and container image workloads. This model is now the <em>recommended approach</em> for multi-tenant platform engineering patterns to secure container registry deployments.</p>
<p>To get started, follow the step-by-step guides in the <a href="https://aka.ms/acr/auth/abac" target="_blank" rel="noopener noreferrer">official ACR ABAC documentation</a>.</p>]]></content:encoded>
            <category>Azure Container Registry</category>
            <category>Entra</category>
            <category>Best Practices</category>
            <category>General</category>
            <category>Operations</category>
            <category>Security</category>
        </item>
        <item>
            <title><![CDATA[Scaling multi-node LLM inference with NVIDIA Dynamo and NVIDIA GPUs on AKS (Part 2)]]></title>
            <link>https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2</link>
            <guid>https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2</guid>
            <pubDate>Thu, 22 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to scale multi-node LLM inference on Kubernetes using NVIDIA Dynamo, H100 GPUs, and Dynamo Planner tools to optimize throughput and latency.]]></description>
            <content:encoded><![CDATA[<p><em>This blog post is co-authored with
<a href="https://www.linkedin.com/in/sa126/" target="_blank" rel="noopener noreferrer">Saurabh Aggarwal</a>,
<a href="https://www.linkedin.com/in/anish-maddipoti/" target="_blank" rel="noopener noreferrer">Anish Maddipoti</a>,
<a href="https://www.linkedin.com/in/meleegy/" target="_blank" rel="noopener noreferrer">Amr Elmeleegy</a>, and
<a href="https://www.linkedin.com/in/rohan-s-varma/" target="_blank" rel="noopener noreferrer">Rohan Varma</a> from NVIDIA.</em></p>
<p>In our <a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks" target="_blank" rel="noopener noreferrer">previous post</a>,
we demonstrated the power of the Azure ND GB200-v6 VMs accelerated by
NVIDIA GB200 NVL72, achieving a
staggering <strong>1.2M tokens per second</strong> across 10 nodes using NVIDIA Dynamo.
Today, we're shifting focus from raw throughput to <strong>developer velocity</strong> and
<strong>operational efficiency</strong>.</p>
<p>We will explore how the
<a href="https://docs.nvidia.com/dynamo/v-0-9-0/components/planner" target="_blank" rel="noopener noreferrer"><strong>Dynamo Planner</strong></a>
and
<a href="https://docs.nvidia.com/dynamo/v-0-9-0/components/profiler" target="_blank" rel="noopener noreferrer"><strong>Dynamo Profiler</strong></a>
remove the guesswork from performance tuning on AKS.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-challenge-balancing-the-rate-matching-equation">The Challenge: Balancing the "Rate Matching" Equation<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#the-challenge-balancing-the-rate-matching-equation" class="hash-link" aria-label="Direct link to The Challenge: Balancing the &quot;Rate Matching&quot; Equation" title="Direct link to The Challenge: Balancing the &quot;Rate Matching&quot; Equation" translate="no">​</a></h2>
<p>Disaggregated serving separates the prefill phase (when the model first
processes the entire input sequence at once) and decode phase (when the
model starts sequentially generating output tokens) of inference
across distinct GPU nodes. This allows each phase to be independently
optimized with custom GPU counts and model parallelism configurations.</p>
<p><img decoding="async" loading="lazy" alt="Disaggregated serving with Dynamo" src="https://blog.aks.azure.com/assets/images/dynamo_inference_diagram-a8367e25160a1e7042b7e51a96c4da5b.jpg" width="2469" height="1039" class="img_ev3q"></p>
<p>One of the main challenges in disaggregated serving is <strong>rate matching</strong>:
determining the right GPU allocation between prefill and decode stages to
meet a specific Service Level Objective (SLO). If you miscalculate the GPU
ratio between these stages, you face two "silent killers" of performance:</p>
<ul>
<li><strong>Over-provisioned Prefill</strong>: Your prompt processing is fast, but
requests bottleneck at the generation stage. This spikes <em>Inter-Token
Latency (ITL)</em> and leaves expensive compute nodes idle.</li>
<li><strong>Under-provisioned Prefill</strong>: Your decode GPUs sit starved for data.
This drives up <em>Time-To-First-Token (TTFT)</em> and inflates your
<em>Total Cost of Ownership (TCO)</em>.</li>
</ul>
<p>Beyond rate matching, developers must also optimize model parallelism
parameters (data, tensor, and expert parallelism) to maintain high
<a href="https://arxiv.org/abs/2401.09670" target="_blank" rel="noopener noreferrer">"Goodput"</a> (the rate of
requests served while still meeting latency SLOs such as TTFT and ITL,
as opposed to raw throughput).</p>
<p>Exploring these configurations manually is technically challenging and
time-consuming, and it often results in suboptimal resource utilization.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="dynamic-traffic-the-move-to-slo-driven-scaling">Dynamic Traffic: The Move to SLO-Driven Scaling<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#dynamic-traffic-the-move-to-slo-driven-scaling" class="hash-link" aria-label="Direct link to Dynamic Traffic: The Move to SLO-Driven Scaling" title="Direct link to Dynamic Traffic: The Move to SLO-Driven Scaling" translate="no">​</a></h2>
<p>Static configurations are brittle. In production, traffic is rarely uniform:</p>
<ul>
<li><strong>Volatile Request Volume</strong>: Traditional Horizontal Pod Autoscalers (HPA)
react too slowly to sudden swings in LLM traffic.</li>
<li><strong>Shifting Sequence Patterns</strong>: If your workload shifts from short chat
queries (low input sequence length (ISL)) to long-context document analysis (high ISL), a static
disaggregated split becomes suboptimal instantly (resulting in overworked
prefill GPUs and idle decode GPUs).</li>
</ul>
<p>NVIDIA Dynamo addresses these gaps through two integrated components:
the <strong>Planner Profiler</strong> and the <strong>SLO-based Planner</strong>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="lets-see-it-through-an-example-application-scenario">Let’s see it through an example application scenario<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#lets-see-it-through-an-example-application-scenario" class="hash-link" aria-label="Direct link to Let’s see it through an example application scenario" title="Direct link to Let’s see it through an example application scenario" translate="no">​</a></h3>
<p>Consider a mission-critical AI workload running on AKS: an airline’s
automated rerouting system during a widespread delay. This use case is a
'stress test' for
inference: it is subject to massive, sudden bursts in traffic and highly
variable request patterns, such as a mix of short status queries and
long-context itinerary processing. To prevent latency spikes during these
peaks, the underlying system requires the precise orchestration offered
by a disaggregated architecture.</p>
<p>Using the
<a href="https://huggingface.co/Qwen/Qwen3-32B-FP8" target="_blank" rel="noopener noreferrer">Qwen3-32B-FP8</a>
model, we can deploy an Airline Assistant with
strict SLA targets: TTFT ≤ 500ms and ITL (Inter-Token Latency) ≤ 30ms.</p>
<p>During normal operations, passengers ask short queries like
"What's my flight status?" But when a major weather system causes
flight cancellations, passengers flood the app with complex rerouting
requests—long-context queries (~4,000 tokens) requiring detailed itinerary
responses (~500 tokens). This sudden surge of 200 concurrent users is
exactly the kind of real-world spike that breaks static configurations.</p>
<p>To build a truly efficient disaggregated AI inference system, you
need to transition from manual "guess-and-check" configurations
to an automated, SLO-driven approach. The core of this automation
lies in two distinct but deeply integrated components: the Dynamo
Planner profiler and the Dynamo Planner.</p>
<p>The first step in building your system is determining the "Golden Ratio"
of GPUs: how many should handle prefill versus decode, and what tensor
parallelism (TP) levels each should use.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-architect-dynamo-profiler">The Architect: Dynamo Profiler<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#the-architect-dynamo-profiler" class="hash-link" aria-label="Direct link to The Architect: Dynamo Profiler" title="Direct link to The Architect: Dynamo Profiler" translate="no">​</a></h3>
<p>The Dynamo Planner profiler is your pre-deployment simulation engine.
Instead of burning GPU hours testing every possible configuration, you
define your requirements in a <strong>DynamoGraphDeploymentRequest (DGDR)</strong>
manifest. The profiler then executes an automated
<a href="https://github.com/ai-dynamo/dynamo/blob/release/0.8.1/docs/benchmarks/sla_driven_profiling.md" target="_blank" rel="noopener noreferrer">"sweep"</a>
of the search space:</p>
<ul>
<li><strong>Parallelization Mapping</strong>: It tests different TP sizes for both prefill
and decode stages to find the lowest TTFT and ITL.</li>
<li><strong>Hardware Simulation</strong>: Using the <strong>AI Configurator (AIC)</strong> mode, the
profiler can simulate performance in just 20–30 seconds
based on pre-measured performance data, allowing for rapid
iteration before you ever touch a physical GPU.</li>
<li><strong>Resulting Recommendation</strong>: The output is a highly tuned
configuration that maximizes <a href="https://arxiv.org/abs/2401.09670" target="_blank" rel="noopener noreferrer">"Goodput"</a>,
the maximum throughput
achievable while staying strictly within your latency bounds.</li>
</ul>
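<p>To make the goodput idea concrete, here is a minimal Python sketch that counts only the requests meeting both latency targets from the airline scenario (TTFT ≤ 500ms, ITL ≤ 30ms). It is an illustration of the metric, not the profiler's implementation: the <code>RequestSample</code> type, the <code>goodput</code> function, and the sample values are all made up for this example.</p>

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    ttft_ms: float  # time to first token for this request
    itl_ms: float   # mean inter-token latency for this request


def goodput(samples, window_s, ttft_slo_ms=500.0, itl_slo_ms=30.0):
    """Requests per second that met BOTH latency targets within the window."""
    within_slo = sum(
        1 for s in samples
        if s.ttft_ms <= ttft_slo_ms and s.itl_ms <= itl_slo_ms
    )
    return within_slo / window_s


if __name__ == "__main__":
    # Hypothetical measurements collected over a 2-second window.
    samples = [
        RequestSample(ttft_ms=320.0, itl_ms=22.0),  # meets both targets
        RequestSample(ttft_ms=480.0, itl_ms=29.0),  # meets both targets
        RequestSample(ttft_ms=910.0, itl_ms=25.0),  # TTFT too high
        RequestSample(ttft_ms=300.0, itl_ms=41.0),  # ITL too high
    ]
    print("throughput (req/s):", len(samples) / 2.0)
    print("goodput (req/s):", goodput(samples, window_s=2.0))
```

<p>With these made-up samples, raw throughput is 2 requests per second but goodput is only 1, because half of the requests miss a latency target. A configuration sweep that optimizes for goodput would favor the setup with the most SLO-compliant requests, not the most requests overall.</p>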
<p>Ultimately, app developers and AI engineers spend less time
testing different system setups and can focus on their airline
passengers’ needs.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-pilot-dynamo-planner">The Pilot: Dynamo Planner<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#the-pilot-dynamo-planner" class="hash-link" aria-label="Direct link to The Pilot: Dynamo Planner" title="Direct link to The Pilot: Dynamo Planner" translate="no">​</a></h3>
<p>Once your system is deployed, static configurations can't handle the
"jitter" of real-world traffic. This is where the Dynamo Planner takes
over as a runtime orchestration engine.</p>
<p>Unlike a traditional load balancer, the Dynamo Planner is <strong>LLM-aware</strong>.
It continuously monitors the live state of your cluster, specifically
looking at:</p>
<ul>
<li><strong>KV Cache Load</strong>: It monitors memory utilization in the decode pool.</li>
<li><strong>Prefill Queue Depth</strong>: It tracks how many prompts are waiting to be
processed.</li>
</ul>
<p>Using the performance bounds identified earlier by the profiler
(i.e., TTFT ≤ 500ms and ITL ≤ 30ms), the Planner
proactively scales the number of prefill and decode workers up or down. For
example, if a <em>sudden burst of long-context itinerary queries</em> floods the
system, the Planner detects the spike in the prefill queue and shifts available
GPU resources to the prefill pool <em>before</em> your TTFT violates its SLO.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="seeing-it-in-action">Seeing it in Action<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#seeing-it-in-action" class="hash-link" aria-label="Direct link to Seeing it in Action" title="Direct link to Seeing it in Action" translate="no">​</a></h2>
<p>In our airline scenario, the system starts with 1 prefill worker and
1 decode worker. The Planner re-evaluates demand at a 60-second
adjustment interval. Under the initial light load, it calculates that one worker of each type is enough:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Prefill calculation: 138.55 (p_thpt) / 4838.61 (p_engine_cap) = 1 (num_p)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Decode calculation: 27.27 (d_thpt) / 19381.08 (d_engine_cap) = 1 (num_d)</span><br></span></code></pre></div></div>
<p>As traffic spikes to 200 concurrent passengers, the Planner recalculates:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Prefill calculation: 16177.75 (p_thpt) / 8578.39 (p_engine_cap) = 2 (num_p)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Decode calculation: 400.00 (d_thpt) / 3354.30 (d_engine_cap) = 1 (num_d)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Predicted number of engine replicas: prefill=2, decode=1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Updating prefill component VllmPrefillWorker to desired replica count 2</span><br></span></code></pre></div></div>
<p><a href="https://asciinema.org/a/67XW4yXJIBmIe7bv" target="_blank" rel="noopener noreferrer">See the Dynamo SLA Planner</a>
in action as it automatically scales the Airline Assistant during a
traffic surge. The Planner automatically scales to 2 prefill workers
while keeping 1 decode worker (the optimal configuration to handle the
surge while maintaining SLA targets). Within minutes, the new worker is
online and passengers are getting their rerouting options without
frustrating delays.</p>
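<p>The numbers in the log lines above are consistent with dividing the observed demand by the per-engine capacity and rounding up, with a floor of one replica. The Python sketch below reproduces that arithmetic. This is inferred from the logged values only and is not taken from the Planner's source; the function name <code>replicas_needed</code> is hypothetical.</p>

```python
import math


def replicas_needed(demand: float, engine_capacity: float, minimum: int = 1) -> int:
    """Round demand / capacity up to a whole replica, never below `minimum`."""
    return max(minimum, math.ceil(demand / engine_capacity))


if __name__ == "__main__":
    # Values copied from the Planner log lines shown above.
    print("baseline prefill:", replicas_needed(138.55, 4838.61))
    print("baseline decode: ", replicas_needed(27.27, 19381.08))
    print("surge prefill:   ", replicas_needed(16177.75, 8578.39))
    print("surge decode:    ", replicas_needed(400.00, 3354.30))
```

<p>Under this assumption, the surge prefill demand is roughly 1.9 times what one engine can absorb, so it rounds up to 2 workers, while decode demand stays well below a single engine's capacity.</p>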
<p>Now, you can try this yourself by running the NVIDIA Dynamo Planner Profiler
to capture burst and request behavior, then using the SLO-based Planner to
translate latency targets into placement and scaling decisions on your AKS
cluster. Set it up in this order: profile under stress, define SLOs,
and let the Planner orchestrate your disaggregated inference system so that it
handles sudden traffic spikes without latency spikes.</p>
<p>After deploying Dynamo by following <a href="https://aka.ms/aks-dynamo" target="_blank" rel="noopener noreferrer">these instructions</a>,
get hands-on with the
<a href="https://huggingface.co/Qwen/Qwen3-32B-FP8" target="_blank" rel="noopener noreferrer">Qwen3-32B-FP8</a>
model using the example in <a href="https://aka.ms/aks-dynamo-part-2" target="_blank" rel="noopener noreferrer">AKS Dynamo Part 2 sample</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion-inference-without-the-infrastructure-burden">Conclusion: Inference Without the Infrastructure Burden<a href="https://blog.aks.azure.com/2026/01/22/dynamo-on-aks-part-2#conclusion-inference-without-the-infrastructure-burden" class="hash-link" aria-label="Direct link to Conclusion: Inference Without the Infrastructure Burden" title="Direct link to Conclusion: Inference Without the Infrastructure Burden" translate="no">​</a></h2>
<p>The shift toward disaggregated serving is a necessity for the next
generation of reasoning-heavy and long-context LLMs. However, as we
have seen, the complexity of manually tuning these distributed systems
on Kubernetes can quickly become a bottleneck for even the most
experienced AI teams.</p>
<p>By utilizing the NVIDIA Dynamo Planner Profiler, developers can move
from educated guessing to data-driven certainty, modeling performance
in seconds rather than days. When paired with the Dynamo Planner, this
static optimization becomes a dynamic, SLO-aware reality on AKS, capable of
weathering the unpredictable traffic spikes of production environments.</p>
<p>Ultimately, this suite transforms your inference stack from a series of
fragile configurations into a resilient, self-optimizing engine. For the AI
engineer, this means less time spent managing hardware limits and configuring
system scalability and more time spent delivering the high-quality,
interactive experiences that your users (and your passengers) expect.</p>]]></content:encoded>
            <category>Dynamo on AKS series</category>
            <category>AI</category>
            <category>Performance</category>
            <category>Open Source</category>
        </item>
        <item>
            <title><![CDATA[Deploy Apps to AKS Automatic with Terraform and the Helm Provider]]></title>
            <link>https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm</link>
            <guid>https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm</guid>
            <pubDate>Fri, 09 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to deploy AKS Automatic with the AzApi provider for Terraform and configure the Helm provider for Azure RBAC token-based authentication.]]></description>
            <content:encoded><![CDATA[<p>Deploying applications to AKS Automatic with Terraform requires a different authentication approach than traditional AKS clusters. AKS Automatic uses Azure RBAC exclusively, which means you can't download a kubeconfig file with static credentials. This post explores options for configuring the Helm provider to work with AKS Automatic—or any AKS cluster using Azure RBAC—using Azure CLI, service principals, or managed identities.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-challenge">The challenge<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#the-challenge" class="hash-link" aria-label="Direct link to The challenge" title="Direct link to The challenge" translate="no">​</a></h2>
<p>When you create an AKS Automatic cluster, Azure enables several production-ready defaults, including:</p>
<ul>
<li>Azure RBAC for Kubernetes authorization</li>
<li>Disabled local accounts (no static kubeconfig credentials)</li>
<li>Workload Identity authentication</li>
</ul>
<p>These defaults strengthen security, but they also mean the typical Helm provider configuration that relies on a kubeconfig file won't work. Instead, you need to configure the Helm provider to use token-based authentication.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>The authentication approach demonstrated here also applies to the <a href="https://registry.terraform.io/providers/hashicorp/kubernetes/latest/" target="_blank" rel="noopener noreferrer">Kubernetes provider</a>, which shares the same authentication mechanisms.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="prerequisites">Prerequisites<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#prerequisites" class="hash-link" aria-label="Direct link to Prerequisites" title="Direct link to Prerequisites" translate="no">​</a></h2>
<p>Before you begin, ensure you have:</p>
<ul>
<li><a href="https://developer.hashicorp.com/terraform/install" target="_blank" rel="noopener noreferrer">Terraform</a> 1.14 or later</li>
<li><a href="https://learn.microsoft.com/cli/azure/install-azure-cli" target="_blank" rel="noopener noreferrer">Azure CLI</a> 2.81 or later, installed and authenticated</li>
<li><a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/" target="_blank" rel="noopener noreferrer">kubectl</a> v1.34 or later</li>
<li><a href="https://azure.github.io/kubelogin/install.html" target="_blank" rel="noopener noreferrer">kubelogin</a> v0.2.13 or later</li>
<li><a href="https://helm.sh/docs/intro/install/" target="_blank" rel="noopener noreferrer">Helm</a> v3 or later</li>
<li>An Azure subscription with permissions to create AKS clusters</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-aks-automatic-with-azapi-provider">Deploy AKS Automatic with AzApi provider<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#deploy-aks-automatic-with-azapi-provider" class="hash-link" aria-label="Direct link to Deploy AKS Automatic with AzApi provider" title="Direct link to Deploy AKS Automatic with AzApi provider" translate="no">​</a></h2>
<p>The <a href="https://registry.terraform.io/providers/Azure/azapi/latest/docs" target="_blank" rel="noopener noreferrer">AzApi provider</a> is a lightweight Terraform provider that allows you to deploy Azure resources using the Azure Resource Manager (ARM) API. In most cases, it's simpler to use the <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs" target="_blank" rel="noopener noreferrer">AzureRM provider</a>, but when you need to deploy new Azure services that the AzureRM provider doesn't support yet, AzApi is a great alternative.</p>
<p>Create a new directory and add a Terraform configuration file (for example, <code>main.tf</code>) with the following code to deploy an AKS Automatic cluster:</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">terraform {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  required_version = "&gt;= 1.14, &lt; 2.0"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  required_providers {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    azurerm = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      source  = "hashicorp/azurerm"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      version = "&gt;= 4.0.0, &lt; 5.0.0"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    azapi = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      source  = "azure/azapi"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      version = "&gt;= 2.8.0, &lt; 3.0.0"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    random = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      source  = "hashicorp/random"</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">      version = "&gt;= 3.5.0"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    helm = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      source  = "hashicorp/helm"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      version = "&gt;= 3.0.0, &lt; 4.0.0"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">provider "azurerm" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  features {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resource_group {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      prevent_deletion_if_contains_resources = false</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">data "azurerm_client_config" "current" {}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">resource "random_pet" "this" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  separator = ""</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">resource "azurerm_resource_group" "this" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  location = "swedencentral"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name     = "rg-${random_pet.this.id}"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">resource "azapi_resource" "this" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  type                      = "Microsoft.ContainerService/managedClusters@2025-10-01"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  parent_id                 = azurerm_resource_group.this.id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  location                  = azurerm_resource_group.this.location</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name                      = "aks-${random_pet.this.id}"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  schema_validation_enabled = false # Use when the azapi provider's local schema 
validation doesn't yet support this API version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  body = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    identity = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      type = "SystemAssigned"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    properties = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      agentPoolProfiles = [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          name  = "systempool"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          mode  = "System"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          count = 3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    sku = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      name = "Automatic"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      tier = "Standard"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    
}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">resource "azurerm_role_assignment" "this" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  scope                = azapi_resource.this.id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  principal_id         = data.azurerm_client_config.current.object_id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  role_definition_name = "Azure Kubernetes Service RBAC Cluster Admin"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>This minimal configuration creates an AKS Automatic cluster and assigns the current user the <strong>Azure Kubernetes Service RBAC Cluster Admin</strong> role.</p>
<p>Run the following commands to deploy the cluster:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">terraform init</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">terraform apply</span><br></span></code></pre></div></div>
<p>Provisioning typically takes 7 to 10 minutes, after which your AKS cluster is ready.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="helm-provider-authentication-options">Helm provider authentication options<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#helm-provider-authentication-options" class="hash-link" aria-label="Direct link to Helm provider authentication options" title="Direct link to Helm provider authentication options" translate="no">​</a></h2>
<p>The <a href="https://registry.terraform.io/providers/hashicorp/helm/latest/docs" target="_blank" rel="noopener noreferrer">Helm provider</a> allows you to authenticate to a Kubernetes cluster in several ways:</p>
<ol>
<li><strong>Using a kubeconfig file</strong>: This option doesn't work because local accounts are disabled in AKS Automatic clusters.</li>
<li><strong>Supplying credentials directly</strong>: This option works, but you must first obtain a bearer token and then supply it to the provider.</li>
<li><strong>Using the exec plugin</strong>: This option calls an external program to obtain short-lived credentials. It uses the <a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins" target="_blank" rel="noopener noreferrer">client-go credential plugin</a> mechanism built into <code>kubectl</code> and the Kubernetes client libraries.</li>
</ol>
<p>With that context, let's explore the two viable options for configuring the Helm provider.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>Both options require retrieving the connection details—the host URL and cluster CA certificate—from the AKS cluster. The AzApi provider doesn't expose these values directly; however, you can use the <code>azurerm_kubernetes_cluster</code> data source as a workaround.</p></div></div>
<p>Add the following to the bottom of your <code>main.tf</code> file:</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">data "azurerm_kubernetes_cluster" "this" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name                = azapi_resource.this.name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  resource_group_name = azurerm_resource_group.this.name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="option-1-configure-the-helm-provider-with-azure-bearer-token-authentication">Option 1: Configure the Helm provider with Azure bearer token authentication<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#option-1-configure-the-helm-provider-with-azure-bearer-token-authentication" class="hash-link" aria-label="Direct link to Option 1: Configure the Helm provider with Azure bearer token authentication" title="Direct link to Option 1: Configure the Helm provider with Azure bearer token authentication" translate="no">​</a></h3>
<p>The Helm provider's <a href="https://registry.terraform.io/providers/hashicorp/helm/latest/docs#kubernetes-1" target="_blank" rel="noopener noreferrer"><code>kubernetes</code> block</a> supports a <a href="https://registry.terraform.io/providers/hashicorp/helm/latest/docs#token-1" target="_blank" rel="noopener noreferrer"><code>token</code> argument</a> that lets you supply a bearer token directly for authentication.</p>
<p>You can obtain short-lived access tokens using the Azure CLI. If you're authenticated with <code>az login</code>, you can get a token for the AKS resource like this:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az account get-access-token --resource 6dae42f8-4368-4678-94ff-3960e28e3630</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>The resource ID <code>6dae42f8-4368-4678-94ff-3960e28e3630</code> is the well-known application ID for <strong>Azure Kubernetes Service AAD Server</strong>. This ID is the same for all AKS clusters using Microsoft Entra ID authentication.</p></div></div>
<p>Combining the above command with the <a href="https://developer.hashicorp.com/terraform/language/data-sources/external" target="_blank" rel="noopener noreferrer"><code>external</code> data source</a> in Terraform allows you to retrieve the token dynamically and use it in the Helm provider configuration.</p>
<p>Add the following to your <code>main.tf</code> file:</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">data "external" "this" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  program = ["bash", "-c", "az account get-access-token --resource 6dae42f8-4368-4678-94ff-3960e28e3630 --query '{token: accessToken}' -o json"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>You can now reference the token in the Helm provider configuration like this:</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">provider "helm" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  kubernetes = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    host                   = data.azurerm_kubernetes_cluster.this.kube_config.0.host</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.this.kube_config.0.cluster_ca_certificate)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    token                  = data.external.this.result.token</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>This configuration takes the host and cluster CA certificate from the AKS cluster data source and authenticates to the cluster with the bearer token from the external data source.</p>
<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>danger</div><div class="admonitionContent_BuS1"><p>Terraform stores the access token in plain text in the state file. Although the token expires after about one hour, this poses a security risk in shared environments or CI/CD pipelines. For production use, secure your state file appropriately or use Option 2 instead.</p></div></div>
<p>For quick local demos, this approach is convenient. For CI/CD pipelines or service principal authentication, the next option is more flexible.</p>
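<p>The <code>data "external"</code> block relies on the <code>hashicorp/external</code> provider, which the <code>terraform</code> block at the top of <code>main.tf</code> does not list. Terraform can usually resolve it implicitly, but declaring it lets you pin the version. The following is a minimal sketch, assuming you merge the entry into your existing <code>required_providers</code> block rather than adding a second one; the version constraint is an assumption, not a tested recommendation.</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">terraform {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  required_providers {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    # Keep the existing azurerm, azapi, random, and helm entries here.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    external = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      source  = "hashicorp/external"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      version = "&gt;= 2.3.0" # Assumed constraint; pin to the version you have tested</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>After changing provider requirements, run <code>terraform init -upgrade</code> so Terraform installs the provider and updates the dependency lock file.</p>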
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="option-2-configure-the-helm-provider-to-use-the-exec-plugin-with-kubelogin">Option 2: Configure the Helm provider to use the exec plugin with kubelogin<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#option-2-configure-the-helm-provider-to-use-the-exec-plugin-with-kubelogin" class="hash-link" aria-label="Direct link to Option 2: Configure the Helm provider to use the exec plugin with kubelogin" title="Direct link to Option 2: Configure the Helm provider to use the exec plugin with kubelogin" translate="no">​</a></h3>
<p>The Helm provider also supports the <code>exec</code> plugin mechanism for obtaining credentials dynamically. This approach is more flexible because it works with the authentication methods supported by <a href="https://azure.github.io/kubelogin/index.html" target="_blank" rel="noopener noreferrer">kubelogin</a>, a <a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins" target="_blank" rel="noopener noreferrer">Kubernetes client-go credential plugin</a> for Azure.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>This approach is preferable because it doesn't store tokens in the Terraform state file. Instead, the Helm provider invokes the specified command to obtain fresh credentials each time it needs to authenticate.</p></div></div>
<p>To use this approach, add the following to your <code>main.tf</code>:</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">provider "helm" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  kubernetes = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    host                   = data.azurerm_kubernetes_cluster.this.kube_config.0.host</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.this.kube_config.0.cluster_ca_certificate)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    exec = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      api_version = "client.authentication.k8s.io/v1beta1"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      command     = "kubelogin"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      args = [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "get-token",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--login",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "azurecli",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--server-id",</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">        "6dae42f8-4368-4678-94ff-3960e28e3630"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>This configuration uses kubelogin to obtain an access token from your existing Azure CLI authentication context, so kubelogin must be installed and available on your system's <code>PATH</code>.</p>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>The kubelogin tool is an open-source project maintained by Microsoft that implements the Kubernetes client-go credential plugin interface for Azure authentication. It supports multiple login methods, including Azure CLI, managed identity, and service principal.</p><p>If you're deploying applications from a CI/CD pipeline or HCP Terraform or Terraform Enterprise that uses a service principal or managed identity instead of Azure CLI, you can adjust the command arguments accordingly.</p><p>For example, to use a service principal with a client secret, the Helm provider configuration would look like this:</p><div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">provider "helm" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  kubernetes = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    host                   = data.azurerm_kubernetes_cluster.this.kube_config.0.host</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.this.kube_config.0.cluster_ca_certificate)</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    exec = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      api_version = "client.authentication.k8s.io/v1beta1"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      command     = "kubelogin"             # Make sure kubelogin is installed and accessible in PATH</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      args = [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "get-token",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--login",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "spn",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--environment",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "AzurePublicCloud",                 # Adjust if using a different cloud</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--server-id",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "6dae42f8-4368-4678-94ff-3960e28e3630",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--client-id",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        var.service_principal_client_id,    # Replace with your service principal client ID</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "--tenant-id",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        var.service_principal_tenant_id     # Replace with your service principal tenant ID</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">      ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      env = {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        AAD_SERVICE_PRINCIPAL_CLIENT_SECRET = var.service_principal_client_secret</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div><p>Note the client secret is passed via the <code>AAD_SERVICE_PRINCIPAL_CLIENT_SECRET</code> environment variable instead of the <code>--client-secret</code> command-line argument. This approach avoids exposing the secret in process listings or logs where it could be captured by other users or system tooling.</p><p>There are additional options for using managed identities as well. See the <a href="https://azure.github.io/kubelogin/index.html" target="_blank" rel="noopener noreferrer">kubelogin documentation</a> for more details.</p></div></div>
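<p>The service principal example above references three input variables that the snippet does not declare. The following is a minimal sketch of matching declarations, assuming you keep the same variable names; the descriptions are illustrative. Marking the secret as <code>sensitive</code> keeps Terraform from printing its value in plan and apply output.</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">variable "service_principal_client_id" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  type        = string</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  description = "Client (application) ID of the service principal."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">variable "service_principal_tenant_id" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  type        = string</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  description = "Microsoft Entra tenant ID of the service principal."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">variable "service_principal_client_secret" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  type        = string</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  description = "Client secret of the service principal."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  sensitive   = true # Redacts the value in plan and apply output</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>One way to supply the secret without writing it to a file is the <code>TF_VAR_service_principal_client_secret</code> environment variable, which Terraform reads automatically for a variable of that name.</p>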
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-an-application-with-helm">Deploy an application with Helm<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#deploy-an-application-with-helm" class="hash-link" aria-label="Direct link to Deploy an application with Helm" title="Direct link to Deploy an application with Helm" translate="no">​</a></h2>
<p>Now you can use the Helm provider to deploy applications. Add the following to your <code>main.tf</code> file to deploy the <a href="https://github.com/Azure-Samples/aks-store-demo" target="_blank" rel="noopener noreferrer">AKS Store Demo application</a> Helm chart:</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">resource "helm_release" "example" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name       = "aks-store-demo"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  repository = "https://azure-samples.github.io/aks-store-demo"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  chart      = "aks-store-demo-chart"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  version    = "1.5.0"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>Run the following commands. The Helm provider uses the configured authentication method to connect to the AKS Automatic cluster and deploy the AKS Store Demo application.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">terraform init -upgrade</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">terraform apply</span><br></span></code></pre></div></div>
<p>If all goes well, you should see output indicating the release was deployed successfully 🚀</p>
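<p>To confirm the result from Terraform itself, you can expose attributes of the release as outputs. The following is a minimal sketch, assuming the <code>helm_release.example</code> resource shown earlier; the output names are illustrative.</p>
<div class="language-hcl codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-hcl codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">output "helm_release_status" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  description = "Status that Helm reports for the release."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  value       = helm_release.example.status</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">output "helm_release_chart_version" {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  description = "Chart version configured for the release."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  value       = helm_release.example.version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>After <code>terraform apply</code>, running <code>terraform output helm_release_status</code> prints the status, which is typically <code>deployed</code> for a successful release.</p>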
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>AKS Automatic provides production-ready defaults that improve security but require a different authentication approach for automation tools such as the Helm and Kubernetes providers for Terraform. By using the <code>exec</code> plugin with kubelogin or a bearer token, you can integrate Helm deployments into your Terraform workflow while keeping the security benefits of Azure RBAC.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resources">Resources<a href="https://blog.aks.azure.com/2026/01/09/deploy-aks-automatic-terraform-helm#resources" class="hash-link" aria-label="Direct link to Resources" title="Direct link to Resources" translate="no">​</a></h2>
<ul>
<li><a href="https://learn.microsoft.com/azure/aks/intro-aks-automatic" target="_blank" rel="noopener noreferrer">AKS Automatic documentation</a></li>
<li><a href="https://registry.terraform.io/providers/Azure/azapi/latest/docs" target="_blank" rel="noopener noreferrer">AzApi provider documentation</a></li>
<li><a href="https://registry.terraform.io/providers/hashicorp/helm/latest/docs" target="_blank" rel="noopener noreferrer">Terraform Helm Provider</a></li>
<li><a href="https://azure.github.io/kubelogin/index.html" target="_blank" rel="noopener noreferrer">kubelogin documentation</a></li>
<li><a href="https://learn.microsoft.com/azure/aks/manage-azure-rbac" target="_blank" rel="noopener noreferrer">Azure Kubernetes Service RBAC roles</a></li>
</ul>]]></content:encoded>
            <category>AKS Automatic</category>
            <category>Developer</category>
            <category>Best Practices</category>
        </item>
        <item>
            <title><![CDATA[AI Conformant Azure Kubernetes Service (AKS) clusters]]></title>
            <link>https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks</link>
            <guid>https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks</guid>
            <pubDate>Tue, 09 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn about the Kubernetes AI Conformance Program, why it matters for AI/ML workloads, and how to create AI-conformant AKS clusters.]]></description>
            <content:encoded><![CDATA[<p>As organizations increasingly move AI workloads into production, they need consistent and interoperable infrastructure they can rely on. The Cloud Native Computing Foundation (CNCF) launched the <strong>Kubernetes AI Conformance Program</strong> to address this need by creating open, community-defined standards for running AI workloads on Kubernetes. For details, see the <a href="https://www.cncf.io/announcements/2025/11/11/cncf-launches-certified-kubernetes-ai-conformance-program-to-standardize-ai-workloads-on-kubernetes/" target="_blank" rel="noopener noreferrer">CNCF Kubernetes AI Conformance announcement</a> made at KubeCon North America 2025.</p>
<p>Azure Kubernetes Service (AKS) is proud to be among the first platforms certified for Kubernetes AI Conformance, demonstrating our commitment to providing customers with a verified, standardized platform for running AI workloads.</p>
<p><img decoding="async" loading="lazy" alt="Picture showing the Azure Kubernetes Service (AKS) logo and the AI Conformance badge" src="https://blog.aks.azure.com/assets/images/aks-ai-conformance-crop-8334eaa2b9085721285f6a3977677afb.png" width="1280" height="500" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-is-the-kubernetes-ai-conformance-program">What is the Kubernetes AI Conformance Program?<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#what-is-the-kubernetes-ai-conformance-program" class="hash-link" aria-label="Direct link to What is the Kubernetes AI Conformance Program?" title="Direct link to What is the Kubernetes AI Conformance Program?" translate="no">​</a></h2>
<p>The <a href="https://github.com/cncf/k8s-ai-conformance" target="_blank" rel="noopener noreferrer">Kubernetes AI Conformance Program</a> defines a standard set of capabilities, APIs, and configurations that a Kubernetes cluster must offer to reliably and efficiently run AI and ML workloads. Building on CNCF's successful <a href="https://www.cncf.io/certification/software-conformance/" target="_blank" rel="noopener noreferrer">Certified Kubernetes Conformance Program</a>, which brought together more than 100 certified distributions and platforms, this new initiative applies the same proven model to AI infrastructure.</p>
<p>The program is developed in the open by the Kubernetes community.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-ai-conformance-matters">Why AI Conformance matters<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#why-ai-conformance-matters" class="hash-link" aria-label="Direct link to Why AI Conformance matters" title="Direct link to Why AI Conformance matters" translate="no">​</a></h2>
<p>Running AI workloads on Kubernetes introduces unique challenges. Teams often struggle with GPU driver compatibility, specialized scheduling requirements for distributed training, and the complexity of exposing inference endpoints at scale. Without a common baseline, organizations risk building on fragmented, vendor-specific implementations that limit flexibility and portability.</p>
<p>According to <a href="https://www.linuxfoundation.org/hubfs/Research%20Reports/lfr_sovereign_ai_090525a.pdf" target="_blank" rel="noopener noreferrer">Linux Foundation Research on Sovereign AI</a>, 82% of organizations are already building custom AI solutions, and 58% use Kubernetes to support those workloads. With 90% of enterprises identifying open-source software as critical to their AI strategies, the risk of fragmentation and inconsistent performance is rising. The Kubernetes AI Conformance Program responds directly to this risk.</p>
<p>The AI Conformance Program addresses these challenges by establishing a verified set of capabilities that every conformant platform must support. When you deploy on AKS, a certified AI-conformant platform, you benefit from:</p>
<ul>
<li><strong>Predictable scaling</strong>: Your AI workloads scale consistently using standardized APIs and autoscaling behaviors</li>
<li><strong>Hardware optimization</strong>: GPU and accelerator resources are managed through proven Kubernetes primitives, maximizing utilization</li>
<li><strong>Workload mobility</strong>: Applications built for one conformant platform work on any other, reducing vendor lock-in</li>
<li><strong>Ecosystem compatibility</strong>: Popular ML frameworks, operators, and tools function reliably because they target a known, tested baseline</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-requirements-for-ai-conformance">Key requirements for AI Conformance<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#key-requirements-for-ai-conformance" class="hash-link" aria-label="Direct link to Key requirements for AI Conformance" title="Direct link to Key requirements for AI Conformance" translate="no">​</a></h2>
<p>The Kubernetes AI Conformance Program maintains a versioned specification of requirements for each Kubernetes release, starting with v1.33. Each requirement goes through a graduation process: it starts as a SHOULD recommendation and may eventually become a MUST for certification.</p>
<p>Here are the <a href="https://github.com/cncf/k8s-ai-conformance/blob/main/docs/AIConformance-1.34.yaml" target="_blank" rel="noopener noreferrer">requirements for Kubernetes v1.34</a> and how AKS meets each requirement:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="accelerators">Accelerators<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#accelerators" class="hash-link" aria-label="Direct link to Accelerators" title="Direct link to Accelerators" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="dynamic-resource-allocation-dra">Dynamic Resource Allocation (DRA)<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#dynamic-resource-allocation-dra" class="hash-link" aria-label="Direct link to Dynamic Resource Allocation (DRA)" title="Direct link to Dynamic Resource Allocation (DRA)" translate="no">​</a></h4>
<p>Traditional resource requests in Kubernetes use simple numeric counts (for example, <code>nvidia.com/gpu: 1</code>). DRA introduces a more flexible model where workloads can specify device characteristics, request specific GPU models, or express preferences about memory and compute capabilities. DRA APIs are enabled by default in Kubernetes 1.34 on AKS. For a deep dive into how DRA works and how to use it with GPU drivers, see our blog post on <a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes">DRA: Devices and Drivers on Kubernetes</a>.</p>
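<p>As a minimal sketch of the DRA model described above, the following commands create a <code>ResourceClaimTemplate</code> and a pod that consumes it through the <code>resource.k8s.io/v1</code> API. The device class name <code>gpu.nvidia.com</code> assumes that an NVIDIA DRA driver is installed and publishes that class. The object names and the <code>&lt;your-cuda-image&gt;</code> placeholder are illustrative, so adjust them for your cluster.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Assumption: a DRA driver publishes the gpu.nvidia.com DeviceClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ResourceClaimTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: single-gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    devices:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      requests:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        exactly:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          deviceClassName: gpu.nvidia.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">---</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: dra-gpu-example</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  restartPolicy: Never</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: cuda</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    image: &lt;your-cuda-image&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    command: ["nvidia-smi"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      claims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: gpu   # refers to the pod-level claim below</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  resourceClaims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resourceClaimTemplateName: single-gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<p>Unlike a numeric <code>nvidia.com/gpu</code> count, the claim is a separate API object, so the request can later be refined (for example, with device selectors) without changing how the container references it. If your GPU node pool is tainted, the pod also needs a matching toleration.</p>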
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="networking">Networking<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#networking" class="hash-link" aria-label="Direct link to Networking" title="Direct link to Networking" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="gateway-api-for-ai-inference">Gateway API for AI inference<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#gateway-api-for-ai-inference" class="hash-link" aria-label="Direct link to Gateway API for AI inference" title="Direct link to Gateway API for AI inference" translate="no">​</a></h4>
<p>Inference services often need sophisticated traffic routing: sending a percentage of requests to a new model version, routing based on request headers (including OpenAI protocol headers), or implementing canary deployments. The Kubernetes Gateway API provides richer routing capabilities and a more flexible architecture for traffic management. AKS supports Gateway API through both the <a href="https://learn.microsoft.com/azure/aks/istio-gateway-api" target="_blank" rel="noopener noreferrer">Istio-based service mesh add-on</a> and <a href="https://aka.ms/agc/addon" target="_blank" rel="noopener noreferrer">Application Gateway for Containers</a>.</p>
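<p>To illustrate the percentage-based routing mentioned above, here is a hedged sketch of an <code>HTTPRoute</code> that sends roughly 90% of requests to one model version and 10% to a canary. The route name, the <code>model-v1</code> and <code>model-v2</code> Services, the port, and the <code>&lt;gateway-name&gt;</code> placeholder are all hypothetical and assume a Gateway already exists in the same namespace.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Gateway and Service names are hypothetical placeholders</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: gateway.networking.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: HTTPRoute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: model-canary</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  parentRefs:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: &lt;gateway-name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  rules:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - backendRefs:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - name: model-v1   # current model version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      port: 8000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      weight: 90</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - name: model-v2   # canary model version</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      port: 8000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      weight: 10</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<p>Weights are relative, so shifting more traffic to the canary only requires changing the two <code>weight</code> values.</p>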
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="scheduling-and-orchestration">Scheduling and orchestration<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#scheduling-and-orchestration" class="hash-link" aria-label="Direct link to Scheduling and orchestration" title="Direct link to Scheduling and orchestration" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="gang-scheduling">Gang scheduling<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#gang-scheduling" class="hash-link" aria-label="Direct link to Gang scheduling" title="Direct link to Gang scheduling" translate="no">​</a></h4>
<p>Distributed training jobs often require multiple pods to start simultaneously. If only some pods in a training job get scheduled, they hold their resources idle while waiting indefinitely for the rest. Gang scheduling solves this by treating a group of pods as a single unit: either all pods get scheduled together, or none do. You can run <a href="https://learn.microsoft.com/azure/aks/kueue-overview" target="_blank" rel="noopener noreferrer">Kueue on AKS</a> to enable gang scheduling. To get started, refer to the guidance on <a href="https://learn.microsoft.com/azure/aks/deploy-batch-jobs-with-kueue" target="_blank" rel="noopener noreferrer">deploying batch jobs with Kueue</a>.</p>
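<p>As a rough sketch of how a job is handed to Kueue, the following manifest submits a four-pod Job in a suspended state with a queue label. It assumes Kueue is already installed and that a <code>LocalQueue</code> exists in the namespace. The job name, <code>&lt;local-queue-name&gt;</code>, and <code>&lt;your-training-image&gt;</code> are placeholders.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Assumption: Kueue is installed and the referenced LocalQueue exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: batch/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Job</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: training-job</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    kueue.x-k8s.io/queue-name: &lt;local-queue-name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  suspend: true        # Kueue unsuspends the Job once it is admitted</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  parallelism: 4</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  completions: 4</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      restartPolicy: Never</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: worker</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        image: &lt;your-training-image&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          limits:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            nvidia.com/gpu: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<p>Kueue admits the Job only when quota for the whole workload is available. Stricter all-or-nothing behavior based on pod readiness depends on how Kueue is configured in your cluster.</p>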
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="cluster-autoscaling">Cluster autoscaling<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#cluster-autoscaling" class="hash-link" aria-label="Direct link to Cluster autoscaling" title="Direct link to Cluster autoscaling" translate="no">​</a></h4>
<p>GPU nodes are expensive. The cluster autoscaler must intelligently provision GPU nodes when training or inference pods are pending, and scale them down during idle periods. On AKS, you can combine the <a href="https://learn.microsoft.com/azure/aks/cluster-autoscaler" target="_blank" rel="noopener noreferrer">cluster autoscaler</a> with <a href="https://learn.microsoft.com/azure/aks/autoscale-gpu-workloads-with-keda" target="_blank" rel="noopener noreferrer">KEDA and NVIDIA DCGM metrics</a> to automatically scale GPU node pools based on real-time utilization. You can configure GPU node pools to <a href="https://learn.microsoft.com/azure/aks/autoscale-gpu-workloads-with-keda#scale-down-the-gpu-node-pool" target="_blank" rel="noopener noreferrer">scale down to zero</a> when no workloads are running, minimizing costs.</p>
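<p>For example, an existing autoscaled GPU node pool can be allowed to scale down to zero by lowering its minimum count. This is a sketch that assumes the pool is a user node pool with the cluster autoscaler already enabled. The maximum of 3 is only an illustrative value.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Allow the autoscaler to remove all GPU nodes when nothing is pending</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool update \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --cluster-name &lt;cluster-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name &lt;gpu-node-pool-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --update-cluster-autoscaler \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --min-count 0 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --max-count 3</span><br></span></code></pre></div></div>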
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="pod-autoscaling">Pod autoscaling<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#pod-autoscaling" class="hash-link" aria-label="Direct link to Pod autoscaling" title="Direct link to Pod autoscaling" translate="no">​</a></h4>
<p>Inference workloads need to scale pod replicas based on demand. The Horizontal Pod Autoscaler (HPA) must work correctly with GPU-enabled pods and support custom metrics relevant to AI workloads. On AKS, you can use <a href="https://learn.microsoft.com/azure/aks/autoscale-gpu-workloads-with-keda" target="_blank" rel="noopener noreferrer">KEDA with NVIDIA DCGM metrics</a> to scale pods based on GPU metrics like <code>DCGM_FI_DEV_GPU_UTIL</code> (GPU utilization percentage). This enables scaling decisions based on actual GPU usage rather than just CPU or memory.</p>
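<p>A minimal sketch of such a scaling rule is a KEDA <code>ScaledObject</code> with a Prometheus trigger. The Deployment name, the Prometheus endpoint, the replica bounds, and the 80% threshold are assumptions for illustration. A managed Prometheus endpoint typically also needs authentication settings that are omitted here.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Assumption: KEDA is enabled and DCGM metrics are scraped into Prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: keda.sh/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ScaledObject</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: inference-gpu-scaler</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  scaleTargetRef:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name: &lt;inference-deployment&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  minReplicaCount: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  maxReplicaCount: 4</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  triggers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - type: prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      serverAddress: &lt;prometheus-endpoint&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      query: avg(DCGM_FI_DEV_GPU_UTIL)   # average GPU utilization (%)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      threshold: "80"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<p>In practice, you would narrow the query with label matchers so that it only covers the GPUs serving the target workload.</p>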
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="observability">Observability<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#observability" class="hash-link" aria-label="Direct link to Observability" title="Direct link to Observability" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="accelerator-performance-metrics">Accelerator performance metrics<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#accelerator-performance-metrics" class="hash-link" aria-label="Direct link to Accelerator performance metrics" title="Direct link to Accelerator performance metrics" translate="no">​</a></h4>
<p>Observability is critical for AI workloads. Platforms must expose GPU utilization, memory consumption, temperature, power draw, and other accelerator metrics through standard endpoints. You can deploy the <a href="https://learn.microsoft.com/azure/aks/monitor-gpu-metrics" target="_blank" rel="noopener noreferrer">NVIDIA DCGM exporter</a> on AKS to expose GPU metrics in Prometheus format, and use <a href="https://learn.microsoft.com/azure/aks/gpu-health-monitoring" target="_blank" rel="noopener noreferrer">GPU health monitoring</a> for proactive issue detection with Node Problem Detector (NPD) in your AI pipelines.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="ai-service-metrics">AI service metrics<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#ai-service-metrics" class="hash-link" aria-label="Direct link to AI service metrics" title="Direct link to AI service metrics" translate="no">​</a></h4>
<p>Platforms must provide a monitoring system capable of discovering and collecting metrics from workloads that expose them in standard formats like Prometheus. AKS integrates with <a href="https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable" target="_blank" rel="noopener noreferrer">Azure Monitor for containers</a> and provides <a href="https://learn.microsoft.com/azure/azure-monitor/containers/container-insights-gpu-monitoring" target="_blank" rel="noopener noreferrer">Container Insights GPU monitoring</a> for observability needs in your environment. For AI inference workloads, you can <a href="https://learn.microsoft.com/azure/aks/ai-toolchain-operator-monitoring" target="_blank" rel="noopener noreferrer">monitor and visualize vLLM inference metrics</a> with Azure Managed Prometheus and Azure Managed Grafana when using the AI toolchain operator (KAITO) add-on.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="security">Security<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#security" class="hash-link" aria-label="Direct link to Security" title="Direct link to Security" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="secure-accelerator-access">Secure accelerator access<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#secure-accelerator-access" class="hash-link" aria-label="Direct link to Secure accelerator access" title="Direct link to Secure accelerator access" translate="no">​</a></h4>
<p>Access to accelerators from within containers must be properly isolated and mediated by the Kubernetes resource management framework (device plugin or DRA) and container runtime, preventing unauthorized access or interference between workloads. See the <a href="https://github.com/Azure/AKS/tree/master/ai-conformance/v1.34/secure_accelerator_access" target="_blank" rel="noopener noreferrer">AKS secure accelerator access evidence</a> for implementation details.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="operators">Operators<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#operators" class="hash-link" aria-label="Direct link to Operators" title="Direct link to Operators" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="ai-operator-support">AI operator support<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#ai-operator-support" class="hash-link" aria-label="Direct link to AI operator support" title="Direct link to AI operator support" translate="no">​</a></h4>
<p>Modern AI platforms rely on Kubernetes operators to manage complex resources like training jobs, model servers, and distributed training coordinators. AKS supports the <a href="https://learn.microsoft.com/azure/aks/ai-toolchain-operator" target="_blank" rel="noopener noreferrer">AI Toolchain Operator (KAITO)</a> to run inferencing, fine-tuning, and retrieval augmented generation (RAG). You can also install and run operators with custom resource definitions, such as <a href="https://learn.microsoft.com/azure/aks/ray-overview" target="_blank" rel="noopener noreferrer">Ray on AKS</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="creating-an-ai-conformant-aks-cluster">Creating an AI-conformant AKS cluster<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#creating-an-ai-conformant-aks-cluster" class="hash-link" aria-label="Direct link to Creating an AI-conformant AKS cluster" title="Direct link to Creating an AI-conformant AKS cluster" translate="no">​</a></h2>
<p>To create an AI-conformant AKS cluster, be sure to choose a certified Kubernetes version with the appropriate features enabled.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="step-1-create-an-aks-cluster">Step 1. Create an AKS cluster<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#step-1-create-an-aks-cluster" class="hash-link" aria-label="Direct link to Step 1. Create an AKS cluster" title="Direct link to Step 1. Create an AKS cluster" translate="no">​</a></h3>
<p>AKS achieved AI Conformance certification starting with Kubernetes 1.34. Create an AKS cluster running Kubernetes 1.34 or later:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name &lt;cluster-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-azure-monitor-metrics \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --kubernetes-version 1.34.0</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="step-2-add-a-gpu-node-pool">Step 2. Add a GPU node pool<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#step-2-add-a-gpu-node-pool" class="hash-link" aria-label="Direct link to Step 2. Add a GPU node pool" title="Direct link to Step 2. Add a GPU node pool" translate="no">​</a></h3>
<p>For GPU-accelerated workloads, add a <a href="https://learn.microsoft.com/azure/aks/aks-managed-gpu-nodes" target="_blank" rel="noopener noreferrer">fully managed GPU node pool (preview)</a> with <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/overview#gpu-accelerated" target="_blank" rel="noopener noreferrer">GPU-enabled VMs</a>. For a select set of NVIDIA GPU SKUs, AKS automatically installs the GPU driver, device plugin, and DCGM metrics exporter.</p>
<p>First, register the feature flag:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az feature register --namespace Microsoft.ContainerService --name ManagedGPUExperiencePreview</span><br></span></code></pre></div></div>
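<p>Feature registration can take a few minutes to complete. As a quick check before you continue, you can query the registration state and, once it reports <code>Registered</code>, refresh the resource provider registration:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Check the registration state; repeat until it reports "Registered"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az feature show \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --namespace Microsoft.ContainerService \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ManagedGPUExperiencePreview \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --query properties.state \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --output tsv</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Refresh the resource provider registration so the feature takes effect</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az provider register --namespace Microsoft.ContainerService</span><br></span></code></pre></div></div>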
<p>Then add the GPU node pool with the <code>EnableManagedGPUExperience</code> tag:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool add \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --cluster-name &lt;cluster-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name gpunp \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-count 1 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-vm-size Standard_NC40ads_H100_v5 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-taints sku=gpu:NoSchedule \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-cluster-autoscaler \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --min-count 1 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --max-count 3 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --tags EnableManagedGPUExperience=true</span><br></span></code></pre></div></div>
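<p>Because the node pool above is tainted with <code>sku=gpu:NoSchedule</code>, only pods that tolerate the taint are scheduled onto it. The following smoke test is a sketch under that assumption: the pod name is arbitrary, and <code>&lt;your-cuda-image&gt;</code> stands for any image that includes <code>nvidia-smi</code>.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Run a one-off pod that tolerates the GPU taint and requests one GPU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: gpu-smoke-test</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  restartPolicy: Never</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  tolerations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - key: sku</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    operator: Equal</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    value: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    effect: NoSchedule</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: cuda</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    image: &lt;your-cuda-image&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    command: ["nvidia-smi"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      limits:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        nvidia.com/gpu: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># After the pod completes, its logs should list the GPU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl logs gpu-smoke-test</span><br></span></code></pre></div></div>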
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="step-3-enable-the-istio-service-mesh-with-gateway-api-optional">Step 3. Enable the Istio service mesh with Gateway API (optional)<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#step-3-enable-the-istio-service-mesh-with-gateway-api-optional" class="hash-link" aria-label="Direct link to Step 3. Enable the Istio service mesh with Gateway API (optional)" title="Direct link to Step 3. Enable the Istio service mesh with Gateway API (optional)" translate="no">​</a></h3>
<p>If your AI workloads require advanced traffic management capabilities, you can use the <a href="https://learn.microsoft.com/azure/aks/istio-gateway-api" target="_blank" rel="noopener noreferrer">Istio service mesh add-on with the Gateway API (preview)</a>.</p>
<p>First, install the Azure CLI preview extension and register the feature flag:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Install the aks-preview extension</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az extension add --name aks-preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Register the Gateway API feature flag</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az feature register --namespace "Microsoft.ContainerService" --name "ManagedGatewayAPIPreview"</span><br></span></code></pre></div></div>
<p>Then enable the Istio service mesh and Gateway API on your cluster:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks mesh enable \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name &lt;cluster-name&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks update \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name &lt;cluster-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-gateway-api</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="step-4-enable-prometheus-metrics-optional">Step 4. Enable Prometheus metrics (optional)<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#step-4-enable-prometheus-metrics-optional" class="hash-link" aria-label="Direct link to Step 4. Enable Prometheus metrics (optional)" title="Direct link to Step 4. Enable Prometheus metrics (optional)" translate="no">​</a></h3>
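<p>If you didn't pass <code>--enable-azure-monitor-metrics</code> when you created the cluster in Step 1 and you need to collect metrics about your applications and infrastructure, you can install the metrics add-on that scrapes Prometheus metrics on the existing cluster:</p>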
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks update \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group &lt;resource-group&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name &lt;cluster-name&gt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-azure-monitor-metrics</span><br></span></code></pre></div></div>
<p>Your cluster now meets the requirements for Kubernetes AI Conformance. You can optionally <a href="https://learn.microsoft.com/azure/aks/kueue-overview" target="_blank" rel="noopener noreferrer">install and run Kueue</a> to enable gang scheduling and <a href="https://learn.microsoft.com/azure/aks/deploy-batch-jobs-with-kueue" target="_blank" rel="noopener noreferrer">deploy batch jobs</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>The Kubernetes AI Conformance Program represents an important step forward for the AI ecosystem, aligning the community around shared standards that make deploying AI at scale more consistent and reliable. By using an AI-conformant platform like AKS, you can build AI applications that are production-ready, portable, and efficient without reinventing infrastructure for every deployment. For more information about the program and how to get involved, check out the <a href="https://github.com/cncf/k8s-ai-conformance" target="_blank" rel="noopener noreferrer">CNCF Kubernetes AI Conformance Repository</a>.</p>
<p>AKS's certification demonstrates Microsoft's commitment to open standards and ensures that your AI workloads can run reliably on a verified platform. Start building your AI-conformant AKS cluster today and take advantage of the growing ecosystem of compatible tools and frameworks.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resources">Resources<a href="https://blog.aks.azure.com/2025/12/05/kubernetes-ai-conformance-aks#resources" class="hash-link" aria-label="Direct link to Resources" title="Direct link to Resources" translate="no">​</a></h2>
<ul>
<li><a href="https://github.com/cncf/k8s-ai-conformance" target="_blank" rel="noopener noreferrer">CNCF Kubernetes AI Conformance Repository</a></li>
<li><a href="https://learn.microsoft.com/azure/aks/ai-ml-overview" target="_blank" rel="noopener noreferrer">AKS AI/ML Documentation</a></li>
<li><a href="https://www.cncf.io/announcements/2025/11/11/cncf-launches-certified-kubernetes-ai-conformance-program-to-standardize-ai-workloads-on-kubernetes/" target="_blank" rel="noopener noreferrer">CNCF Kubernetes AI Conformance Announcement</a></li>
</ul>]]></content:encoded>
            <category>AI</category>
            <category>Best Practices</category>
            <category>General</category>
        </item>
        <item>
            <title><![CDATA[Announcing AKS Automatic managed system node pools (preview) and the Pod readiness SLA]]></title>
            <link>https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools</link>
            <guid>https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools</guid>
            <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how AKS Automatic now offers managed system node pools to ship apps faster. The Pod readiness SLA guarantees your apps are serving users, beyond a healthy control plane.]]></description>
            <content:encoded><![CDATA[<p>In Azure Kubernetes Service (AKS), nodes with the same configuration (operating system and VM size) are grouped into <em>node pools</em>. AKS clusters use two node pool modes: <em>system node pools</em> host critical platform components that keep your cluster running, while <em>user node pools</em> run your application workloads. Traditionally, you manage both types yourself. You select VM sizes, set node counts, configure autoscaling, and plan capacity for system components. As your cluster grows or workload requirements change, you must revisit these settings to maintain resiliency.</p>
<p>AKS Automatic simplifies this by enabling teams to ship applications with production-grade defaults from day one. With <strong>managed system node pools (preview)</strong>, AKS takes this further. The system pool is now fully managed by Microsoft. Core cluster components run on Microsoft-owned infrastructure, so you no longer provision, patch, or scale system nodes. You focus on your apps while AKS handles the operational overhead of keeping the cluster healthy.</p>
<p>Automatic clusters with managed system node pools also introduce the <strong>Pod readiness Service Level Agreement (SLA)</strong>. Beyond API server uptime, AKS now guarantees your pods reach readiness and serve users.</p>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Learn more in the official documentation: <a href="https://learn.microsoft.com/azure/aks/automatic/aks-automatic-managed-system-node-pools-about" target="_blank" rel="noopener noreferrer">Managed system node pools on AKS Automatic (preview)</a></p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-it-matters">Why it matters<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#why-it-matters" class="hash-link" aria-label="Direct link to Why it matters" title="Direct link to Why it matters" translate="no">​</a></h2>
<ul>
<li><strong>Reduced operational overhead:</strong> AKS handles provisioning, patching, upgrades, and scaling for the system pool, so you spend less time on infrastructure maintenance.</li>
<li><strong>Managed add-on hosting at lower cost:</strong> Core services like Azure Monitor collectors, CoreDNS, KEDA, VPA, Konnectivity, Eraser, and Metrics Server run on Microsoft-owned infrastructure. Some add-ons and DaemonSets still run on nodes in your subscription.</li>
<li><strong>Built-in security policies:</strong> Deployment Safeguards enforce pod security standards, restrict access to platform namespaces, and block risky configurations by default.</li>
<li><strong>Automatic upgrades:</strong> AKS keeps platform components current, reducing the risk of running outdated or vulnerable system software.</li>
<li><strong>Pod readiness SLA:</strong> A financially backed guarantee that your pods reach readiness and serve traffic, not just that your cluster is healthy. Refer to the <a href="https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services" target="_blank" rel="noopener noreferrer">SLA</a> for details.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="Architecture diagram showing managed system node pools hosted on Microsoft infrastructure with platform components separated from user workloads" src="https://blog.aks.azure.com/assets/images/aks-managed-arch-36c76b9600d6c818707cb060850e4f5c.svg" width="3084" height="1856" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="components-running-on-managed-system-node-pools">Components running on managed system node pools<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#components-running-on-managed-system-node-pools" class="hash-link" aria-label="Direct link to Components running on managed system node pools" title="Direct link to Components running on managed system node pools" translate="no">​</a></h2>
<p>AKS manages the following platform components on the managed system node pool. You don't need to provision capacity for these services.</p>
<table><thead><tr><th>Component</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://learn.microsoft.com/azure/aks/monitor-aks" target="_blank" rel="noopener noreferrer">Azure Monitor</a></td><td>Collects container logs, scrapes Prometheus metrics, and gathers Kubernetes object state for observability and alerting</td></tr><tr><td><a href="https://learn.microsoft.com/azure/aks/coredns-custom" target="_blank" rel="noopener noreferrer">CoreDNS</a></td><td>Provides cluster DNS resolution for service discovery</td></tr><tr><td><a href="https://learn.microsoft.com/azure/aks/image-cleaner" target="_blank" rel="noopener noreferrer">Eraser</a></td><td>Removes unused and vulnerable container images from nodes</td></tr><tr><td><a href="https://learn.microsoft.com/azure/aks/keda-about" target="_blank" rel="noopener noreferrer">KEDA</a></td><td>Scales workloads based on event-driven metrics such as queue length or HTTP traffic</td></tr><tr><td>Konnectivity</td><td>Maintains secure connectivity between the control plane and nodes</td></tr><tr><td><a href="https://learn.microsoft.com/azure/aks/monitor-aks-reference" target="_blank" rel="noopener noreferrer">Metrics Server</a></td><td>Exposes resource metrics for Horizontal Pod Autoscaler and kubectl top</td></tr><tr><td><a href="https://learn.microsoft.com/azure/aks/vertical-pod-autoscaler" target="_blank" rel="noopener noreferrer">VPA</a></td><td>Recommends and applies optimal CPU and memory requests for pods</td></tr><tr><td><a href="https://learn.microsoft.com/azure/aks/workload-identity-overview" target="_blank" rel="noopener noreferrer">Workload Identity webhook</a></td><td>Injects Azure environment variables and projected service account tokens into pods for Microsoft Entra ID authentication</td></tr></tbody></table>
<p>Add-ons and extensions not in this list run on <code>aks-system-surge</code> nodes, with scaling handled by <a href="https://learn.microsoft.com/azure/aks/node-auto-provisioning" target="_blank" rel="noopener noreferrer">Node Auto-Provisioning (NAP)</a>. <code>DaemonSets</code> run on both managed system node pools and nodes in your subscription.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-managed-system-node-pools-differ-from-traditional-system-node-pools">How managed system node pools differ from traditional system node pools<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#how-managed-system-node-pools-differ-from-traditional-system-node-pools" class="hash-link" aria-label="Direct link to How managed system node pools differ from traditional system node pools" title="Direct link to How managed system node pools differ from traditional system node pools" translate="no">​</a></h2>
<table><thead><tr><th>Aspect</th><th>AKS Standard system pool</th><th>AKS Automatic managed system pool</th></tr></thead><tbody><tr><td><strong>Provisioning</strong></td><td>You create the pool, select VM SKUs, set node count, and configure OS disk size</td><td>AKS provisions and sizes the pool for you automatically</td></tr><tr><td><strong>Capacity planning</strong></td><td>You <a href="https://learn.microsoft.com/azure/aks/use-system-pools?tabs=azure-cli#system-and-user-node-pools" target="_blank" rel="noopener noreferrer">estimate headroom for system components</a> like CoreDNS, Konnectivity, metrics-server, and any add-ons; scale manually or configure cluster autoscaler with min/max counts</td><td>AKS right-sizes capacity for platform components and scales automatically when add-ons need more room without taking up quota in your subscription</td></tr><tr><td><strong>Cost</strong></td><td>System nodes are billed as standard VMs to your subscription; you pay for system pool capacity</td><td>System nodes do not run in your subscription</td></tr><tr><td><strong>Service Level Agreements (SLAs)</strong></td><td>API server uptime SLA</td><td>API server uptime SLA and pod readiness SLA</td></tr></tbody></table>
<p><img decoding="async" loading="lazy" alt="Comparison diagram showing AKS Standard requiring manual system pool management versus AKS Automatic with fully managed system pools" src="https://blog.aks.azure.com/assets/images/aks-standard-automatic-62874c91501cc88f99b651765f34bff5.png" width="1280" height="720" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails-for-security-and-reliability">Guardrails for security and reliability<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#guardrails-for-security-and-reliability" class="hash-link" aria-label="Direct link to Guardrails for security and reliability" title="Direct link to Guardrails for security and reliability" translate="no">​</a></h2>
<p>Security misconfigurations are a leading cause of container breaches. AKS Automatic addresses this by enforcing <a href="https://learn.microsoft.com/azure/aks/deployment-safeguards" target="_blank" rel="noopener noreferrer">Deployment Safeguards</a> that validate every workload against the <a href="https://kubernetes.io/docs/concepts/security/pod-security-standards/" target="_blank" rel="noopener noreferrer">Kubernetes Pod Security Standards</a> before it reaches your cluster. Baseline policies block dangerous privilege escalations while restricted policies enforce maximum hardening. Compliance flows into Azure Policy dashboards automatically.</p>
<p>These policies also improve workload reliability. Resource limits prevent runaway containers from starving neighbors. Health probes ensure traffic reaches only healthy pods. Anti-affinity rules spread replicas across failure domains. PodDisruptionBudget validation keeps node maintenance on schedule.</p>
<p>Since AKS manages the system node pool on your behalf, additional restrictions protect platform stability. User workloads cannot run on the managed system node pool. Because Microsoft hosts the pool outside of your subscription, all create, update, and delete operations on managed system pool resources are denied, as are pod <code>exec</code>, <code>attach</code>, and <code>kubectl debug</code> operations.</p>
<p><strong>Preventing container escapes:</strong> Policies block privileged containers, host namespaces, host ports, and hostPath volumes, in line with security best practices.</p>
<p><strong>Reducing attack surface:</strong> Restricting Linux capabilities to a minimal set means processes run with only the permissions they need. Fewer capabilities translate directly to fewer exploitation opportunities.</p>
<p><strong>Enforcing least privilege:</strong> Requiring containers to run as non-root and disabling privilege escalation limits the blast radius of any vulnerability.</p>
<p><strong>Maintaining kernel protections:</strong> Seccomp, AppArmor, and SELinux profiles filter system calls and confine container behavior. Policies ensure these protections stay active.</p>
<p><strong>Enabling safe cluster operations:</strong> Limiting <code>sysctls</code> to safe parameters and protecting node objects ensures platform components run undisturbed and node drains proceed smoothly.</p>
<p>For detailed specifications, see the <a href="https://learn.microsoft.com/azure/aks/deployment-safeguards" target="_blank" rel="noopener noreferrer">Deployment Safeguards documentation</a>.</p>
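<p>As an illustration of the settings these policies look for, the sketch below prints a pod manifest that runs as non-root, disables privilege escalation, drops all Linux capabilities, keeps the default seccomp profile, and declares resource limits and a readiness probe. The function name <code>print_hardened_pod</code>, the pod name, the image <code>registry.example.com/app:1.0.0</code>, the port, and the probe path are all hypothetical placeholders, and the manifest is not guaranteed to satisfy every safeguard on a given cluster.</p>

```shell
# print_hardened_pod: emit an example pod manifest on stdout.
# Hypothetical helper; the image, port, and probe path are placeholders.
print_hardened_pod() {
  printf '%s\n' \
    'apiVersion: v1' \
    'kind: Pod' \
    'metadata:' \
    '  name: hardened-demo' \
    'spec:' \
    '  securityContext:' \
    '    runAsNonRoot: true          # least privilege: no root user' \
    '    runAsUser: 10001' \
    '    seccompProfile:' \
    '      type: RuntimeDefault      # keep system call filtering active' \
    '  containers:' \
    '    - name: app' \
    '      image: registry.example.com/app:1.0.0' \
    '      securityContext:' \
    '        allowPrivilegeEscalation: false' \
    '        capabilities:' \
    '          drop: ["ALL"]         # minimal capability set' \
    '      resources:' \
    '        requests:' \
    '          cpu: 100m' \
    '          memory: 128Mi' \
    '        limits:' \
    '          cpu: 500m' \
    '          memory: 256Mi' \
    '      readinessProbe:' \
    '        httpGet:' \
    '          path: /healthz' \
    '          port: 8080'
}
```

<p>Piping the output to <code>kubectl apply -f -</code> would submit the manifest, and any safeguard it still violates would be reported at admission time.</p>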
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="pod-readiness-sla-for-aks-automatic">Pod Readiness SLA for AKS Automatic<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#pod-readiness-sla-for-aks-automatic" class="hash-link" aria-label="Direct link to Pod Readiness SLA for AKS Automatic" title="Direct link to Pod Readiness SLA for AKS Automatic" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Diagram showing two SLA guarantees for AKS Automatic: 99.95% API server uptime and 99.9% pod readiness within 5 minutes" src="https://blog.aks.azure.com/assets/images/automatic-slas-aa7c4a367e669a58ba4a3623f7e173bc.png" width="1280" height="720" class="img_ev3q"></p>
<p>Uptime means more than a healthy control plane; it means your applications are actually serving users. The <a href="https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services" target="_blank" rel="noopener noreferrer">Pod Readiness SLA</a> guarantees that pods reach readiness targets, closing the gap between "the cluster is healthy" and "my app is ready."</p>
<ul>
<li><strong>Faster recovery during failures:</strong> Node failures and scale events trigger remediation so pods return to a ready state within defined thresholds.</li>
<li><strong>Predictable reliability:</strong> Availability planning aligns with measurable guarantees instead of best-effort behavior.</li>
<li><strong>Reduced operational overhead:</strong> Platform automation handles remediation, eliminating manual firefighting during disruptions.</li>
<li><strong>Business continuity at scale:</strong> Mission-critical services experience minimal disruption even during infrastructure events.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="pricing">Pricing<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#pricing" class="hash-link" aria-label="Direct link to Pricing" title="Direct link to Pricing" translate="no">​</a></h2>
<p>AKS Automatic pricing includes a fixed monthly cluster fee and per-vCPU charges on top of standard VM compute costs. This pricing includes financially backed SLAs for both API server uptime and pod readiness. For current rates and a full breakdown by VM category, see the <a href="https://azure.microsoft.com/pricing/details/kubernetes-service#pricing" target="_blank" rel="noopener noreferrer">Azure Kubernetes Service pricing page</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="getting-started">Getting started<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#getting-started" class="hash-link" aria-label="Direct link to Getting started" title="Direct link to Getting started" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prerequisites">Prerequisites<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#prerequisites" class="hash-link" aria-label="Direct link to Prerequisites" title="Direct link to Prerequisites" translate="no">​</a></h3>
<ul>
<li>Azure CLI 2.77.0 or later.</li>
<li><code>aks-preview</code> extension 19.0.0b15 or later.</li>
</ul>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Install or update the aks-preview extension</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az extension add --name aks-preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az extension update --name aks-preview</span><br></span></code></pre></div></div>
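<p>To check an installed version against the minimum listed above, a small comparison helper can be useful. This is a minimal sketch: <code>version_ge</code> is a hypothetical function, not part of the Azure CLI, and it compares only numeric <code>MAJOR.MINOR.PATCH</code> strings, so it does not handle pre-release tags such as <code>19.0.0b15</code>.</p>

```shell
# version_ge HAVE NEED: succeed when dotted numeric version HAVE is at
# least NEED. Hypothetical helper; compares up to three numeric fields.
version_ge() {
  have="$1"
  need="$2"
  for _field in 1 2 3; do
    have_part="${have%%.*}"
    need_part="${need%%.*}"
    if [ "${have_part:-0}" -gt "${need_part:-0}" ]; then return 0; fi
    if [ "${have_part:-0}" -lt "${need_part:-0}" ]; then return 1; fi
    # Drop the field just compared; an exhausted string becomes empty.
    case "$have" in *.*) have="${have#*.}" ;; *) have="" ;; esac
    case "$need" in *.*) need="${need#*.}" ;; *) need="" ;; esac
  done
  return 0
}
```

<p>One possible use is <code>version_ge "$(az version --query '"azure-cli"' --output tsv)" 2.77.0</code>, which succeeds when the installed Azure CLI is recent enough.</p>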
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="register-the-preview-feature">Register the preview feature<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#register-the-preview-feature" class="hash-link" aria-label="Direct link to Register the preview feature" title="Direct link to Register the preview feature" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az feature register --name AKS-AutomaticHostedSystemProfilePreview --namespace Microsoft.ContainerService</span><br></span></code></pre></div></div>
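<p>Feature registration is asynchronous and can take a few minutes to reach the <code>Registered</code> state. The sketch below polls for that state; <code>wait_for_state</code> and <code>feature_state</code> are hypothetical helper names, and the attempt count and delay are arbitrary choices rather than recommended values.</p>

```shell
# wait_for_state EXPECTED ATTEMPTS DELAY COMMAND...
# Run COMMAND up to ATTEMPTS times, sleeping DELAY seconds after each
# unsuccessful try, and succeed as soon as its output equals EXPECTED.
wait_for_state() {
  expected="$1"
  attempts="$2"
  delay="$3"
  shift 3
  try=1
  while [ "$try" -le "$attempts" ]; do
    state="$("$@")"
    if [ "$state" = "$expected" ]; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}

# feature_state: print the registration state of the preview feature.
feature_state() {
  az feature show \
    --namespace Microsoft.ContainerService \
    --name AKS-AutomaticHostedSystemProfilePreview \
    --query properties.state \
    --output tsv
}
```

<p>Calling <code>wait_for_state Registered 30 20 feature_state</code> would check roughly every 20 seconds for up to 10 minutes. Once the feature is registered, running <code>az provider register --namespace Microsoft.ContainerService</code> refreshes the resource provider registration.</p>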
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="create-the-cluster">Create the cluster<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#create-the-cluster" class="hash-link" aria-label="Direct link to Create the cluster" title="Direct link to Create the cluster" translate="no">​</a></h3>
<p>Select a region where managed system node pools are available. Check the <a href="https://aka.ms/aks/automatic/managed-systempool-regions" target="_blank" rel="noopener noreferrer">supported regions for managed system node pools</a>.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="set-your-variables">Set your variables<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#set-your-variables" class="hash-link" aria-label="Direct link to Set your variables" title="Direct link to Set your variables" translate="no">​</a></h4>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">RESOURCE_GROUP="myResourceGroup"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">CLUSTER_NAME="myAKSCluster"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">LOCATION="westcentralus"  # Choose a supported region (see: https://aka.ms/aks/automatic/managed-systempool-regions)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="create-the-resource-group">Create the resource group<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#create-the-resource-group" class="hash-link" aria-label="Direct link to Create the resource group" title="Direct link to Create the resource group" translate="no">​</a></h4>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az group create --name $RESOURCE_GROUP --location $LOCATION</span><br></span></code></pre></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="create-an-automatic-cluster-with-a-managed-system-node-pool">Create an Automatic cluster with a managed system node pool<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#create-an-automatic-cluster-with-a-managed-system-node-pool" class="hash-link" aria-label="Direct link to Create an Automatic cluster with a managed system node pool" title="Direct link to Create an Automatic cluster with a managed system node pool" translate="no">​</a></h4>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--resource-group $RESOURCE_GROUP \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--name $CLUSTER_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--location $LOCATION \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--sku automatic \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--enable-hosted-system</span><br></span></code></pre></div></div>
<p>The output includes <code>"hostedSystemProfile": { "enabled": true }</code>, confirming that the feature is active.</p>
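<p>The same check can be scripted instead of read by eye. The following is a minimal sketch that does a plain text match on the JSON rather than parsing it: <code>hosted_system_enabled</code> is a hypothetical helper, and it assumes <code>enabled</code> is the first key inside <code>hostedSystemProfile</code>, as in the output shown above.</p>

```shell
# hosted_system_enabled: read cluster JSON on stdin and succeed when it
# contains "hostedSystemProfile": { "enabled": true }.
# Hypothetical helper; whitespace is stripped before a literal match.
hosted_system_enabled() {
  tr -d '[:space:]' | grep -q '"hostedSystemProfile":{"enabled":true'
}
```

<p>For example, <code>az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" | hosted_system_enabled</code> would succeed only when the profile is reported as enabled.</p>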
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="connect-to-the-cluster-and-deploy-an-application">Connect to the cluster and deploy an application<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#connect-to-the-cluster-and-deploy-an-application" class="hash-link" aria-label="Direct link to Connect to the cluster and deploy an application" title="Direct link to Connect to the cluster and deploy an application" translate="no">​</a></h3>
<p>Get credentials for your cluster and deploy the <a href="https://github.com/Azure-Samples/aks-store-demo" target="_blank" rel="noopener noreferrer">AKS Store demo application</a>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl create ns aks-store-demo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -n aks-store-demo -f https://aka.ms/aks/quickstarts/store.yaml</span><br></span></code></pre></div></div>
<p>Check the ingress address and open it in your browser once an IP is assigned:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get ingress store-front -n aks-store-demo --watch</span><br></span></code></pre></div></div>
<p><img decoding="async" loading="lazy" alt="Screenshot of the deployed application on an AKS Automatic cluster" src="https://blog.aks.azure.com/assets/images/contoso-pet-store-e50932ff57b958c0ed3bc7c9e29254b9.png" width="1263" height="823" class="img_ev3q"></p>
<p>Your workload runs on user node pools that Node Auto-Provisioning (NAP) creates in your subscription, while system services stay on the managed pool.</p>
<p><img decoding="async" loading="lazy" alt="Screenshot of AKS desktop application showing the nodes in the cluster" src="https://blog.aks.azure.com/assets/images/aks-desktop-nodes-c61ab307f0beed6e2744c514b79574ea.png" width="3630" height="1896" class="img_ev3q"></p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>Prefer a graphical experience? <a href="https://learn.microsoft.com/azure/aks/aks-desktop-overview" target="_blank" rel="noopener noreferrer">AKS Desktop</a> lets you manage clusters, view workloads, and troubleshoot issues without leaving your desktop.</p></div></div>
<p>The managed system nodes do not run in your Azure subscription.</p>
<p><img decoding="async" loading="lazy" alt="Screenshot of the Azure portal showing that the managed system nodes are not there" src="https://blog.aks.azure.com/assets/images/portal-vms-3ecb15c9ffb98d2d32cd0cbbbd52f451.png" width="1832" height="842" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="looking-ahead">Looking ahead<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#looking-ahead" class="hash-link" aria-label="Direct link to Looking ahead" title="Direct link to Looking ahead" translate="no">​</a></h2>
<p>Upcoming improvements include custom virtual network support, optimized platform components with reduced resource overhead, faster cluster provisioning, and a streamlined path to Deployment Safeguards compliance. Longer term, managed system node pools will extend to all existing AKS Automatic clusters.</p>
<p>Follow the <a href="https://aka.ms/aks/roadmap" target="_blank" rel="noopener noreferrer">AKS public roadmap</a> for updates on these features.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="next-steps">Next steps<a href="https://blog.aks.azure.com/2025/11/26/aks-automatic-managed-system-node-pools#next-steps" class="hash-link" aria-label="Direct link to Next steps" title="Direct link to Next steps" translate="no">​</a></h2>
<p>Ready to get started?</p>
<ol>
<li><strong>Try it now:</strong> Follow the <a href="https://learn.microsoft.com/azure/aks/automatic/aks-automatic-managed-system-node-pools" target="_blank" rel="noopener noreferrer">managed system node pools quickstart</a>.</li>
<li><strong>Share feedback:</strong> Open issues or ideas in <a href="https://github.com/Azure/AKS/issues" target="_blank" rel="noopener noreferrer">AKS GitHub Issues</a>.</li>
<li><strong>Join the community:</strong> Subscribe to the <a href="https://www.youtube.com/@theakscommunity" target="_blank" rel="noopener noreferrer">AKS Community YouTube</a> and follow <a href="https://x.com/theakscommunity" target="_blank" rel="noopener noreferrer">@theakscommunity</a> on X.</li>
</ol>
<p>Share your experience with how managed system node pools simplify your operations and where the service can continue to improve.</p>]]></content:encoded>
            <category>AKS Automatic</category>
        </item>
        <item>
            <title><![CDATA[Recommendations for container and security optimized OS options on Azure Kubernetes Service (AKS)]]></title>
            <link>https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks</link>
            <guid>https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks</guid>
            <pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover best practices and actionable guidance to help you select a container or security optimized OS for your AKS deployments.]]></description>
            <content:encoded><![CDATA[<p>Selecting an operating system for your Kubernetes deployments may appear straightforward; however, this decision can significantly influence both security and operational complexity. In this blog, we’ll share key recommendations to help you select a container optimized OS for your AKS deployments.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="selecting-a-linux-os-option">Selecting a Linux OS option<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#selecting-a-linux-os-option" class="hash-link" aria-label="Direct link to Selecting a Linux OS option" title="Direct link to Selecting a Linux OS option" translate="no">​</a></h2>
<p>AKS has just released support for two new Linux OS options:</p>
<ul>
<li><a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard (preview)</a> is Microsoft-created and optimized for Azure. OS Guard is built on top of Azure Linux with specialized configuration to support containerized workloads with security optimizations.</li>
<li><a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux for AKS (preview)</a> is a CNCF-governed, vendor-neutral, container-optimized immutable OS, best suited for running on multi-cloud and on-prem environments.</li>
</ul>
<p>As contributors to both projects, we understand the distinct customer needs each solution addresses. Customers running containerized workloads across multiple clouds and seeking consistency during critical OS updates typically choose Ubuntu or Flatcar Container Linux. Security-focused enterprises operating primarily on Azure and prioritizing a unified support experience opt for Azure Linux or OS Guard. Both approaches are valid, distinct, and fully supported by Microsoft.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-different-about-optimized-linux-os-options">What's different about optimized Linux OS options?<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#whats-different-about-optimized-linux-os-options" class="hash-link" aria-label="Direct link to What's different about optimized Linux OS options?" title="Direct link to What's different about optimized Linux OS options?" translate="no">​</a></h2>
<p>The main optimization in OS options like <a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard</a> and <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux</a> is their immutability.</p>
<p>An immutable operating system is one that cannot be modified at runtime. All OS binaries, libraries, and static configuration are read-only, and their bit-for-bit integrity is often cryptographically protected. These special purpose operating systems usually come without any kind of package management or other traditional means of altering the OS, shipping as self-contained images. User workloads run in isolated environments like containers, sandboxed from the OS.</p>
<p>While these are certainly limiting factors compared to general purpose operating systems, immutable systems are unparalleled in security and compliance:</p>
<ul>
<li>Binaries cannot be changed, eliminating whole classes of sandbox escapes and exploits.</li>
<li>Special purpose operating systems include only what’s absolutely necessary, minimizing the attack surface.</li>
<li>As individual parts of the OS cannot be swapped in or out, any given OS release always corresponds to the full version-set of all software and libraries included with that release. This significantly eases software inventory management and makes version drift impossible.</li>
</ul>
<p>What’s more, immutable operating systems can bring similar benefits to node configuration. By applying node configuration at provisioning time only, there is no configuration drift. Put differently, a node does not hold <em>state</em>; its state is defined by the configuration passed during provisioning, which makes node provisioning reproducible.</p>
<p>While immutability is the core difference, Kubernetes-optimized OS options typically offer additional security features:</p>
<table><thead><tr><th></th><th><a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard</a></th><th><a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux for AKS</a></th><th>General purpose Linux OS</th></tr></thead><tbody><tr><td>Filesystem</td><td>Immutable (read-only)</td><td>Immutable (read-only)</td><td>Writable (read-write)</td></tr><tr><td>Focus on</td><td>Trusted code execution backed by IPE (Integrity Policy Enforcement)</td><td>Multi-cloud, on-prem, adaptability, and sovereignty</td><td>Extensibility, flexibility, and choice</td></tr><tr><td>Mandatory Access Control</td><td>SELinux</td><td>SELinux</td><td>AppArmor</td></tr><tr><td>Secure Boot</td><td>Supported by default with UKI (Unified Kernel Image)</td><td>Not yet supported with AKS</td><td>Supported with certain VM sizes</td></tr></tbody></table>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="recommendations-for-linux-os-on-aks">Recommendations for Linux OS on AKS<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#recommendations-for-linux-os-on-aks" class="hash-link" aria-label="Direct link to Recommendations for Linux OS on AKS" title="Direct link to Recommendations for Linux OS on AKS" translate="no">​</a></h2>
<p>When deciding which Linux OS option to use, we recommend the following:</p>
<ul>
<li>Use <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer"><strong>Flatcar Container Linux for AKS (preview)</strong></a> if you're looking for a vendor-neutral, community-stewarded immutable OS with cross-cloud support.</li>
<li>Use <a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer"><strong>Azure Linux OS Guard (preview)</strong></a> if you're looking for an immutable OS that is Microsoft-created and optimized for Azure.</li>
<li>Use <a href="https://aka.ms/aks/supported-ubuntu-versions" target="_blank" rel="noopener noreferrer">Ubuntu</a> if you're looking for a portable, general purpose OS with cross-cloud support.</li>
<li>Use <a href="https://aka.ms/aks/use-azure-linux" target="_blank" rel="noopener noreferrer">Azure Linux</a> if you're looking for a general purpose OS that is Microsoft-created and optimized for Azure.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="OS Recommendations on AKS" src="https://blog.aks.azure.com/assets/images/os-comparison-f67a6721931279416a6662af47301e56.png" width="1676" height="518" class="img_ev3q"></p>
<p><em>Figure 1: Comparison across OS options supported on AKS, including Flatcar Container Linux for AKS, Azure Linux OS Guard, Ubuntu, and Azure Linux.</em></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="migration-to-an-optimized-linux-os-option">Migration to an optimized Linux OS option<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#migration-to-an-optimized-linux-os-option" class="hash-link" aria-label="Direct link to Migration to an optimized Linux OS option" title="Direct link to Migration to an optimized Linux OS option" translate="no">​</a></h2>
<p>If you'd like to migrate to <a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard (preview)</a> or <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux for AKS (preview)</a>, you'll want to keep in mind the following limitations and recommendations.</p>
<p>Immutable operating systems, by implication, make large parts of a node’s file system read-only. While Kubernetes workloads in general should not break abstraction and interfere with a node’s OS, the reality is often different. Take care when migrating from general purpose operating systems. We have observed that workloads’ expectations are not always met on immutable systems, particularly (but not only) in these cases:</p>
<ul>
<li>Containers that require access to the host filesystem (for example, via a <code>/host/...</code> mount), in particular init containers and DaemonSets.</li>
<li>Containers that must run in the host PID and/or network namespace.</li>
</ul>
<p>Some AKS features may not be supported when using <a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard (preview)</a> or <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux for AKS (preview)</a>. If you are using a feature that is not supported by the new OS, you will not be able to migrate your existing clusters/node pools. Ensure you review the limitations called out in their respective pages thoroughly.</p>
<p>When planning to migrate to an optimized OS option, we recommend the following:</p>
<ul>
<li>Ensure your workloads configure and run successfully on the new OS in test/dev before migrating any production clusters.</li>
<li>If you'd like to migrate existing Linux clusters or node pools to <a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard (preview)</a>, you can use <a href="https://learn.microsoft.com/azure/azure-linux/tutorial-azure-linux-os-guard-migration" target="_blank" rel="noopener noreferrer">in-place OS SKU migration</a>. This process has prerequisites and limitations; see the documentation for details.</li>
<li>If you'd like to migrate to <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux for AKS (preview)</a>, you'll need to create new clusters and/or node pools and migrate existing workloads. <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar</a> is available on all AKS supported Kubernetes versions.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="community-stewardship">Community Stewardship<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#community-stewardship" class="hash-link" aria-label="Direct link to Community Stewardship" title="Direct link to Community Stewardship" translate="no">​</a></h2>
<p>AKS is built on community-stewarded open source projects. Our teams maintain <a href="https://aka.ms/aks/azure-linux-os-guard" target="_blank" rel="noopener noreferrer">Azure Linux OS Guard</a>, contribute significantly to <a href="https://aka.ms/aks/flatcar" target="_blank" rel="noopener noreferrer">Flatcar Container Linux</a>, and actively collaborate with the Immutable Linux community, the UAPI group, and other open source initiatives. Our continued engagement with projects like Flatcar improves the ecosystem for everybody. It also empowers our users and customers to participate in both development and project stewardship, driving the technology and helping determine the direction of these projects.</p>
<p>Join us in the Flatcar Container Linux open source project, which is community-driven and governed by the Cloud Native Computing Foundation. Get involved, contribute, and help shape the future of Flatcar Container Linux:</p>
<ul>
<li><a href="https://github.com/flatcar/Flatcar?tab=readme-ov-file#participate-and-contribute" target="_blank" rel="noopener noreferrer">Flatcar's participation how-to</a></li>
<li><a href="https://app.element.io/?#/room/#flatcar:matrix.org" target="_blank" rel="noopener noreferrer">Chat with Flatcar contributors over at Matrix</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="roadmap">Roadmap<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#roadmap" class="hash-link" aria-label="Direct link to Roadmap" title="Direct link to Roadmap" translate="no">​</a></h2>
<p>We’re excited to continue to extend AKS support for these optimized OS options. Our long-term goals include:</p>
<ul>
<li>In-place updates of OS and Kubernetes: faster, safer, less resource constraining</li>
<li>Trusted and Confidential computing, locked-down execution through code signing</li>
<li>Making signed execution available to everyone, by means of multiple trust levels and the option for users to use their own signing keys for their workloads</li>
</ul>
<p>As we build these new features, we look forward to sharing them with the broader Linux and Kubernetes ecosystem by contributing back and by making building blocks available.</p>
<p>To follow along with our backlog and progress, please see our public roadmaps:</p>
<ul>
<li><a href="https://github.com/orgs/flatcar/projects/7/views/9" target="_blank" rel="noopener noreferrer">Flatcar Container Linux roadmap</a></li>
<li><a href="https://github.com/orgs/Azure/projects/685" target="_blank" rel="noopener noreferrer">AKS Public Roadmap</a></li>
<li><a href="https://github.com/orgs/microsoft/projects/970" target="_blank" rel="noopener noreferrer">Azure Linux roadmap</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="questions">Questions?<a href="https://blog.aks.azure.com/2025/11/20/recommendations-for-container-and-security-optimized-os-options-on-aks#questions" class="hash-link" aria-label="Direct link to Questions?" title="Direct link to Questions?" translate="no">​</a></h2>
<p>Connect with the AKS and Azure Linux teams and communities through our <a href="https://github.com/Azure/AKS/discussions" target="_blank" rel="noopener noreferrer">GitHub discussions</a> or share your <a href="https://github.com/Azure/AKS/issues" target="_blank" rel="noopener noreferrer">feedback and suggestions</a>.</p>]]></content:encoded>
            <category>Azure Linux</category>
            <category>Flatcar Container Linux</category>
            <category>Best Practices</category>
            <category>Operations</category>
            <category>Security</category>
        </item>
        <item>
            <title><![CDATA[Preview: nftables support for kube-proxy in Azure Kubernetes Service (AKS)]]></title>
            <link>https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy</link>
            <guid>https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy</guid>
            <pubDate>Wed, 19 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn about how to use the nftables mode of kube-proxy on AKS.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>We're announcing the preview availability of <strong>nftables</strong> mode for kube-proxy in Azure Kubernetes Service (AKS). This feature was requested in <a href="https://github.com/Azure/AKS/issues/5061" target="_blank" rel="noopener noreferrer">GitHub issue #5061</a> and is now aligned with the upstream Kubernetes GA release of the nftables backend.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background" translate="no">​</a></h2>
<p>Kubernetes 1.33 introduced <strong>nftables</strong> as a fully supported kube-proxy mode. It serves as the modern replacement for iptables, offering a more efficient rule model and improved performance characteristics on newer Linux kernels. As highlighted by the upstream project, nftables reduces rule churn and avoids the scaling and latency limitations seen in large clusters using iptables.</p>
<p>Unlike iptables, which implements the ruleset in an O(n) manner that slows down processing as the number of services grows, nftables utilizes a structure with a roughly O(1) map lookup. As a result, packet processing time is more or less constant regardless of cluster size, and the best/average/worst cases are very similar:</p>
<p><img decoding="async" loading="lazy" alt="kube-proxy nftables first packet latency at various percentiles in clusters of various sizes" src="https://blog.aks.azure.com/assets/images/nftables-only-7d0dba2f4868af7c76a29b3748980e52.svg" width="600" height="371" class="img_ev3q"></p>
<p>In clusters with 5,000 and 10,000 Services, the p50 (median) latency for nftables is approximately the same as the p01 (best-case) latency for iptables. In the 30,000 Service cluster, the p99 (worst-case) latency for nftables manages to beat the p01 latency for iptables by a few microseconds! Here are both sets of data together, though you may have to squint to see the nftables results:</p>
<p><img decoding="async" loading="lazy" alt="kube-proxy iptables vs nftables first packet latency at various percentiles in clusters of various sizes" src="https://blog.aks.azure.com/assets/images/iptables-vs-nftables-d287c4baca937019cbbe5c6d5d54d6e7.svg" width="600" height="371" class="img_ev3q"></p>
<p>For additional context, see the upstream GA announcement: <a href="https://kubernetes.io/blog/2025/02/28/nftables-kube-proxy/" target="_blank" rel="noopener noreferrer">Kubernetes blog: NFTables mode for kube-proxy</a>.</p>
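<p>To make the O(n) versus O(1) difference concrete, here is a toy Python sketch. It is not how kube-proxy or the kernel is implemented; it only counts comparisons for a sequential rule scan versus a single map lookup. The function names, Service IPs, and backend names are invented for illustration.</p>

```python
# Toy model only: real kube-proxy programs kernel rules; this just counts comparisons.

def linear_match(rules, dest):
    """Scan an ordered rule list, the way a chain of per-Service rules is walked."""
    comparisons = 0
    for service_ip, backend in rules:
        comparisons += 1
        if service_ip == dest:
            return backend, comparisons
    return None, comparisons


def map_match(verdict_map, dest):
    """Resolve the destination with a single hash-map lookup."""
    return verdict_map.get(dest), 1


def build_rules(count):
    """Generate hypothetical (Service IP, backend) pairs."""
    return [(f"10.0.{i // 256}.{i % 256}", f"backend-{i}") for i in range(count)]


for size in (100, 10_000):
    rules = build_rules(size)
    verdict_map = dict(rules)
    last_ip = rules[-1][0]  # worst case for the linear scan
    _, linear_cost = linear_match(rules, last_ip)
    _, map_cost = map_match(verdict_map, last_ip)
    print(f"{size} services: linear={linear_cost} comparisons, map={map_cost} lookup")
```

<p>In this model the scan cost grows with the number of Services while the map lookup stays at one step, which mirrors the flat latency curve in the charts above.</p>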
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-available-in-aks">What's available in AKS<a href="https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy#whats-available-in-aks" class="hash-link" aria-label="Direct link to What's available in AKS" title="Direct link to What's available in AKS" translate="no">​</a></h2>
<p>AKS now exposes nftables through the <strong>kube-proxy configuration preview feature</strong>. You can configure kube-proxy in one of three modes:</p>
<ul>
<li><code>IPTABLES</code></li>
<li><code>IPVS</code></li>
<li><code>NFTABLES</code> <em>(new – preview)</em></li>
</ul>
<p>This configuration is applied through <code>--kube-proxy-config</code> during cluster creation or update.</p>
<p>Example:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"enabled"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"mode"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"NFTABLES"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
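<p>If you generate this file from a script, a minimal Python sketch could look like the following. The helper name <code>write_kube_proxy_config</code> and the file name <code>kube-proxy.json</code> are assumptions made for this example; only the two fields and the three mode names come from this post.</p>

```python
import json
from pathlib import Path

# Mode names as listed in this post.
SUPPORTED_MODES = {"IPTABLES", "IPVS", "NFTABLES"}


def write_kube_proxy_config(path, mode="NFTABLES", enabled=True):
    """Write a kube-proxy config file containing the two fields shown above."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported kube-proxy mode: {mode!r}")
    config = {"enabled": enabled, "mode": mode}
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
    return config


# 'kube-proxy.json' is a hypothetical file name.
print(write_kube_proxy_config("kube-proxy.json"))
```

<p>The resulting file can then be passed to <code>--kube-proxy-config</code>. Rejecting unknown modes locally surfaces typos before the request reaches AKS.</p>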
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="enabling-the-preview">Enabling the preview<a href="https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy#enabling-the-preview" class="hash-link" aria-label="Direct link to Enabling the preview" title="Direct link to Enabling the preview" translate="no">​</a></h2>
<ol>
<li>Install the latest <code>aks-preview</code> CLI extension.</li>
<li>Register the <code>KubeProxyConfigurationPreview</code> feature flag.</li>
<li>Create or update your cluster with the nftables kube-proxy configuration.</li>
</ol>
<p>Full details are in the updated documentation: <a href="https://learn.microsoft.com/azure/aks/configure-kube-proxy" target="_blank" rel="noopener noreferrer">Configure kube-proxy in AKS</a>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="notes-for-operators">Notes for operators<a href="https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy#notes-for-operators" class="hash-link" aria-label="Direct link to Notes for operators" title="Direct link to Notes for operators" translate="no">​</a></h2>
<ul>
<li>Switching kube-proxy modes may cause brief disruptions as rules are reprogrammed.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="feedback">Feedback<a href="https://blog.aks.azure.com/2025/11/19/nftables-in-kube-proxy#feedback" class="hash-link" aria-label="Direct link to Feedback" title="Direct link to Feedback" translate="no">​</a></h2>
<p>We encourage you to try the feature in non-production clusters and share feedback in the original GitHub thread:<br>
<a href="https://github.com/Azure/AKS/issues/5061" target="_blank" rel="noopener noreferrer">GitHub issue #5061</a></p>]]></content:encoded>
            <category>nftables</category>
            <category>kube-proxy</category>
        </item>
        <item>
            <title><![CDATA[Fully Managed GPU workloads with Azure Linux on Azure Kubernetes Service (AKS)]]></title>
            <link>https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks</link>
            <guid>https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks</guid>
            <pubDate>Tue, 18 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn about how managed NVIDIA GPU nodes with Azure Linux OS deliver efficiency and streamlined operations for high-performance computing workloads on AKS.]]></description>
            <content:encoded><![CDATA[<h3 class="anchor anchorWithStickyNavbar_LWe7" id="introduction">Introduction<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h3>
<p>Running GPU workloads on AKS enables scalable, automated data processing and AI applications across Windows, Ubuntu, or Azure Linux nodes. <a href="https://learn.microsoft.com/azure/aks/use-azure-linux" target="_blank" rel="noopener noreferrer">Azure Linux</a>, Microsoft’s minimal and secure OS, simplifies GPU setup with validated drivers and seamless integration, reducing operational efforts. This blog covers how AKS supports GPU nodes on various OS platforms and highlights the security and performance benefits of Azure Linux for GPU workloads.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="unique-challenges-of-gpu-nodes">Unique challenges of GPU nodes<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#unique-challenges-of-gpu-nodes" class="hash-link" aria-label="Direct link to Unique challenges of GPU nodes" title="Direct link to Unique challenges of GPU nodes" translate="no">​</a></h3>
<p>Deploying a GPU workload isn’t just about picking the right VM size. There is also significant operational overhead that developers and platform engineers need to manage.</p>
<p>We’ve found that many of our customers struggled to manage GPU device discoverability/scheduling and observability, especially across different OS images. Platform teams spent cycles maintaining custom node images and post-deployment scripts to ensure CUDA compatibility, while developers had to debug “GPU not found” errors or stalled workloads that consumed GPU capacity with limited visibility into utilization.</p>
<p>The inconsistent experience across OS options on AKS was a major challenge that we sought to improve. We wanted to encourage our customers to use the OS that best fits their needs, rather than blocking them with feature parity gaps.</p>
<p>For example, Azure Linux support for GPU-enabled VM sizes on AKS was historically limited to NVIDIA V100 and T4, creating a gap for Azure Linux customers requiring higher-performance options. Platform teams looking to run compute-intensive workloads, such as general-purpose AI/ML workloads or large-scale simulations, were unable to do so with Azure Linux and <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nca100v4-series?tabs=sizebasic" target="_blank" rel="noopener noreferrer">NVIDIA NC A100 GPU node pools</a> -- until now.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="aks-expanding-azure-linux-gpu-support">AKS expanding Azure Linux GPU support<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#aks-expanding-azure-linux-gpu-support" class="hash-link" aria-label="Direct link to AKS expanding Azure Linux GPU support" title="Direct link to AKS expanding Azure Linux GPU support" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="nc-a100-gpu-support">NC A100 GPU support<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#nc-a100-gpu-support" class="hash-link" aria-label="Direct link to NC A100 GPU support" title="Direct link to NC A100 GPU support" translate="no">​</a></h4>
<p>The introduction of Azure Linux 3.0 support for NC A100 GPU node pools in AKS starts to close many of these gaps. For platform engineers, the new OS image standardizes the underlying kernel, container runtime, and driver stack while enabling GPU provisioning in a single declarative step. Instead of layering custom extensions or maintaining golden images, engineers can now define a node pool with <code>--os-sku AzureLinux</code> and get a consistent, secure, and AKS-managed runtime that includes NVIDIA drivers/plugin setup and GPU telemetry out of the box. The Azure Linux 3.0 image also aligns with the AKS release cadence, which means fewer compatibility issues when upgrading clusters or deploying existing workloads onto GPU nodes.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="aks-fully-managed-gpu-nodes-preview">AKS fully managed GPU nodes (preview)<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#aks-fully-managed-gpu-nodes-preview" class="hash-link" aria-label="Direct link to AKS fully managed GPU nodes (preview)" title="Direct link to AKS fully managed GPU nodes (preview)" translate="no">​</a></h3>
<p>Using NVIDIA GPUs with Azure Linux on AKS requires the installation of several components for GPU-enabled AKS nodes to function properly, including GPU drivers, the NVIDIA Kubernetes device plugin, and a GPU metrics exporter for telemetry. Previously, these components were installed either manually or via the open-source NVIDIA GPU Operator, creating operational overhead for platform engineers. To ease this complexity and overhead, AKS has released support for <a href="https://learn.microsoft.com/azure/aks/aks-managed-gpu-nodes" target="_blank" rel="noopener noreferrer">fully managed GPU nodes (preview)</a>, which installs the NVIDIA GPU driver, device plugin, and Data Center GPU Manager (DCGM) metrics exporter by default.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="deploying-gpu-workloads-on-aks-with-azure-linux-30">Deploying GPU workloads on AKS with Azure Linux 3.0<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#deploying-gpu-workloads-on-aks-with-azure-linux-30" class="hash-link" aria-label="Direct link to Deploying GPU workloads on AKS with Azure Linux 3.0" title="Direct link to Deploying GPU workloads on AKS with Azure Linux 3.0" translate="no">​</a></h3>
<p>Customers choose to run their GPU workloads on Azure Linux for many reasons, such as the security posture, support model, resiliency, and/or performance optimizations that the OS provides. Some of the benefits that Azure Linux provides for your GPU workloads include:</p>
<table><thead><tr><th><strong>Values</strong></th><th><strong>Azure Linux</strong></th><th><strong>Other Distributions</strong></th></tr></thead><tbody><tr><td><strong>Security and Compliance</strong></td><td>Azure Linux is a minimal, hardened OS built from source in Microsoft’s trusted pipeline. It includes only essential packages for Kubernetes and GPU workloads, reducing CVEs and patching overhead. All kernel modules installed on Azure Linux AKS nodes must be signed using a trusted Microsoft secure key. FIPS-compliant images and CIS benchmarks further strengthen the security posture of your GPU node pools with out of the box compliance.</td><td>Other distributions often include broader package sets and dependencies, which can increase the attack surface and CVE exposure. Other distributions allow kernel modules to be installed on nodes that are not signed by Microsoft. Further, FIPS-compliant images or CIS benchmarks may require additional configuration or customizations.</td></tr><tr><td><strong>Operational Efficiency</strong></td><td>Azure Linux images are lightweight and optimized for AKS, enabling quick node provisioning and upgrade times. GPU drivers also come pre-installed for Azure Linux NVIDIA GPU node pools, ensuring smooth GPU enablement without manual intervention.</td><td>Other distributions have larger image footprints which can lead to slower node provisioning and upgrade times. Like Azure Linux, other distributions also come with GPU drivers preinstalled in NVIDIA GPU node pools.</td></tr><tr><td><strong>Resiliency and Reliability</strong></td><td>Each Azure Linux image undergoes rigorous validation by the Azure Linux team, including GPU-specific scenarios, to prevent regressions and ensure stability before the image is released to AKS.</td><td>Other distributions cannot run AKS end-to-end tests prior to releasing their images to the AKS team.</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-a-gpu-workload-with-azure-linux-on-aks">Deploy a GPU workload with Azure Linux on AKS<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#deploy-a-gpu-workload-with-azure-linux-on-aks" class="hash-link" aria-label="Direct link to Deploy a GPU workload with Azure Linux on AKS" title="Direct link to Deploy a GPU workload with Azure Linux on AKS" translate="no">​</a></h4>
<p>Deploying your GPU workloads on AKS with Azure Linux 3.0 is simple. Let’s use the newly supported NVIDIA NC A100 GPU as our example.</p>
<ol>
<li>
<p>To add an NVIDIA NC A100 node pool running on Azure Linux to your AKS cluster using the fully managed GPU node experience you can follow <a href="https://learn.microsoft.com/azure/aks/aks-managed-gpu-nodes?tabs=add-ubuntu-gpu-node-pool" target="_blank" rel="noopener noreferrer">these instructions</a>. Please note, the following parameters must be specified in your <code>az aks nodepool add</code> command to create an NVIDIA NC A100 node pool running on Azure Linux:</p>
<ul>
<li><code>--os-sku AzureLinux</code>: provisions a node pool with the Azure Linux container host as the node OS.</li>
<li><code>--node-vm-size Standard_nc24ads_A100_v4</code>: provisions a node pool using the <code>Standard_nc24ads_A100_v4</code> VM size. Please note, any of the sizes in the Azure <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nca100v4-series?tabs=sizebasic" target="_blank" rel="noopener noreferrer">NC_A100_v4</a> series are supported.</li>
</ul>
</li>
<li>
<p>With the DCGM exporter installed by default, you can observe detailed GPU metrics such as utilization, memory consumption, and error states.</p>
</li>
</ol>
<p>If you prefer not to use a preview feature, you can follow <a href="https://learn.microsoft.com/azure/aks/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool#manually-install-the-nvidia-device-plugin" target="_blank" rel="noopener noreferrer">these instructions</a> on AKS to create an NVIDIA NC A100 node pool with Azure Linux by manually installing the NVIDIA device plugin via a DaemonSet. You’ll also need to manually install the DCGM exporter to consume GPU metrics.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="observability--monitoring">Observability &amp; monitoring<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#observability--monitoring" class="hash-link" aria-label="Direct link to Observability &amp; monitoring" title="Direct link to Observability &amp; monitoring" translate="no">​</a></h3>
<p>Monitoring GPU performance is critical for optimizing utilization, troubleshooting workloads, and enabling cost-efficient scaling in AKS clusters. Traditionally, NVIDIA GPU node pools were treated as opaque resources: jobs would succeed or fail without visibility into whether GPUs were fully utilized or misallocated.</p>
<p>With the DCGM exporter now managed on AKS, cluster operators can collect detailed GPU metrics such as utilization, memory consumption, and error states for analysis. These metrics can integrate naturally with existing observability pipelines, providing a foundation for intelligent automation and alerting.</p>
<p>As an example, a platform team can configure scaling logic in the <a href="https://learn.microsoft.com/azure/aks/cluster-autoscaler" target="_blank" rel="noopener noreferrer">Cluster Autoscaler (CAS)</a> or <a href="https://learn.microsoft.com/azure/aks/keda-about" target="_blank" rel="noopener noreferrer">Kubernetes Event-Driven Autoscaling (KEDA)</a> to add A100 nodes when GPU utilization exceeds 70%, or scale down when utilization remains low for a defined interval. This enables GPU infrastructure to operate as a dynamic, demand-driven resource rather than a static, high-cost allocation.</p>
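<p>The decision rule in that example can be sketched in a few lines of Python. This is only an illustration of the logic, not a Cluster Autoscaler or KEDA configuration. The function name, the 20% scale-down threshold, and the five-sample window are assumptions; only the 70% scale-up threshold comes from the example above.</p>

```python
def scaling_decision(samples, up_threshold=70.0, down_threshold=20.0, window=5):
    """Return 'scale_up', 'scale_down', or 'hold' from GPU utilization samples (%).

    samples is ordered oldest to newest, for example values scraped from
    the DCGM exporter at a fixed interval.
    """
    if not samples:
        return "hold"
    # Scale up as soon as the newest sample exceeds the upper threshold.
    if samples[-1] > up_threshold:
        return "scale_up"
    # Scale down only when a full window of samples all stayed low.
    recent = samples[-window:]
    if len(recent) == window and all(down_threshold >= s for s in recent):
        return "scale_down"
    return "hold"


print(scaling_decision([40, 55, 82]))         # scale_up
print(scaling_decision([10, 12, 8, 15, 9]))   # scale_down
print(scaling_decision([50, 60, 65, 30, 10])) # hold
```

<p>Requiring a full window of low samples before scaling down avoids removing expensive GPU nodes in response to a brief dip in utilization.</p>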
<p>For more conceptual guidance on GPU metrics in AKS, visit these <a href="https://aka.ms/aks/managed-gpu-metrics" target="_blank" rel="noopener noreferrer">docs</a>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's next?<a href="https://blog.aks.azure.com/2025/11/18/azure-linux-gpu-on-aks#whats-next" class="hash-link" aria-label="Direct link to What's next?" title="Direct link to What's next?" translate="no">​</a></h3>
<p>The Azure Linux and AKS teams are actively working on expanding support for additional GPU VM sizes and managed GPU features on AKS. You can expect to see Azure Linux support for the NVIDIA <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/ndma100v4-series" target="_blank" rel="noopener noreferrer">ND A100</a>, <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/ncadsh100v5-series" target="_blank" rel="noopener noreferrer">NC H100</a>, and <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nd-h200-v5-series" target="_blank" rel="noopener noreferrer">ND H200</a> families landing in the near future, as well as Azure Linux support for managed AKS GPU features like <a href="https://learn.microsoft.com/azure/aks/gpu-multi-instance" target="_blank" rel="noopener noreferrer">multi-instance GPU (MIG)</a>, built-in GPU metrics in Azure Managed Prometheus and Grafana, and <a href="https://learn.microsoft.com/azure/aks/ai-toolchain-operator" target="_blank" rel="noopener noreferrer">KAITO</a>.</p>]]></content:encoded>
            <category>Azure Linux</category>
            <category>Best Practices</category>
            <category>GPU</category>
            <category>Monitoring</category>
        </item>
        <item>
            <title><![CDATA[Delve into Dynamic Resource Allocation, devices, and drivers on Kubernetes]]></title>
            <link>https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes</link>
            <guid>https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes</guid>
            <pubDate>Mon, 17 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[An interactive and 'digestible' way to learn about DRA, followed up with a practical example utilizing NVIDIA's DRA drivers]]></description>
            <content:encoded><![CDATA[<p>Dynamic Resource Allocation (DRA) is often mentioned in discussions about GPUs and specialized devices designed for high-performance AI and video processing jobs. But what exactly is it?</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="using-gpus-in-kubernetes">Using GPUs in Kubernetes<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#using-gpus-in-kubernetes" class="hash-link" aria-label="Direct link to Using GPUs in Kubernetes" title="Direct link to Using GPUs in Kubernetes" translate="no">​</a></h2>
<p>Today, you need a few vendor-specific software components to even get started with a GPU node pool on Kubernetes: in particular, the Kubernetes device plugin and the GPU drivers. While installing GPU drivers makes sense, you might ask: why do we need a specific device plugin as well?</p>
<p>Since Kubernetes does not have native support for special devices like GPUs, the device plugin surfaces these resources and makes them available to your application. The device plugin exposes the number of GPUs on a node by passing a list of allocatable devices to the kubelet through the device plugin API. The kubelet then tracks this set and reports the count of each such resource type on the node to the API server, for kube-scheduler to use in pod scheduling decisions.</p>
<p><img decoding="async" loading="lazy" alt="image" src="https://blog.aks.azure.com/assets/images/device-plugin-diagram-4127ad9fc6abd88aacb94b8cf64f32d5.png" width="1043" height="289" class="img_ev3q"></p>
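<p>As a rough mental model of this flow, the Python sketch below treats each node as advertising an integer count for the <code>nvidia.com/gpu</code> resource and places requests against those counts. It is a toy, not the real kubelet or kube-scheduler logic; the node names, counts, and the <code>schedule</code> function are made up for illustration.</p>

```python
# Toy model: each node advertises an integer count per extended resource name.
GPU_RESOURCE = "nvidia.com/gpu"

nodes = {
    "gpu-node-a": {GPU_RESOURCE: 2},  # hypothetical node names and counts
    "gpu-node-b": {GPU_RESOURCE: 1},
    "cpu-node-c": {},
}


def schedule(nodes, requested_gpus):
    """Pick the first node with enough free whole GPUs and reserve them."""
    for name, allocatable in nodes.items():
        free = allocatable.get(GPU_RESOURCE, 0)
        if free >= requested_gpus:
            allocatable[GPU_RESOURCE] = free - requested_gpus
            return name
    return None


print(schedule(nodes, 2))  # gpu-node-a
print(schedule(nodes, 1))  # gpu-node-b
print(schedule(nodes, 1))  # None: no whole GPU is left on any node
```

<p>Because the only information available here is a whole-number count, a request can only be satisfied with entire GPUs.</p>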
<p>However, there are some limitations with this device plugin approach. It only allows GPUs to be statically assigned to a workload, without any ability for fine-grained sharing, partitioning, or hot-swapping/reconfiguring of the GPUs.</p>
<p>The introduction of DRA provides a path to addressing some of these limitations, by ensuring resources can be dynamically categorized, requested, and used in a cluster. The <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/" target="_blank" rel="noopener noreferrer">Dynamic Resource Allocation API</a> generalizes the Persistent Volumes API for generic resources, like GPUs. It allows for resource adjustment based on real-time demand and proper configuration without manual intervention.</p>
<p>NVIDIA's DRA driver extends this capability for their GPUs, introducing <em>GPUs</em> and <em>ComputeDomains</em> as two types of resources users can manage through DRA handles.</p>
<div class="theme-admonition theme-admonition-warning admonition_xJq3 alert alert--warning"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 16 16"><path fill-rule="evenodd" d="M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"></path></svg></span>warning</div><div class="admonitionContent_BuS1"><p>Dynamic resource allocation is currently an <strong>alpha feature</strong> and only enabled when the <em>DynamicResourceAllocation</em> <a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/" target="_blank" rel="noopener noreferrer">feature gate</a> and the <em>resource.k8s.io/v1alpha3</em> <a href="https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-groups-and-versioning" target="_blank" rel="noopener noreferrer">API group</a> are enabled.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="lets-backtrack-with-a-simpler-scenario-im-confused">Let’s backtrack with a simpler scenario… I’m confused<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#lets-backtrack-with-a-simpler-scenario-im-confused" class="hash-link" aria-label="Direct link to Let’s backtrack with a simpler scenario… I’m confused" title="Direct link to Let’s backtrack with a simpler scenario… I’m confused" translate="no">​</a></h2>
<p>Imagine that Kubernetes DRA is a working kitchen, and its staff is preparing a multi-course meal. In this example, the ingredients include:</p>
<ul>
<li><strong>Pods</strong>: The individual dishes that need to be prepared</li>
<li><strong>Nodes</strong>: Various kitchen stations where the dishes are cooked</li>
<li><strong>API Server</strong>: The head chef, directing the entire operation</li>
<li><strong>Scheduler</strong>: The sous-chef, making sure each dish is cooked at the right station</li>
<li><strong>Resource Quotas</strong>: Portion sizes, ensuring each dish uses a specific amount of ingredients</li>
<li><strong>Horizontal Pod Autoscaler (HPA)</strong>: Dynamic serving staff, adjusting the number of dishes served based on customer demand</li>
</ul>
<p>The instructions for this DRA recipe are then to:</p>
<ol>
<li><strong>Prep</strong>: Start by defining the recipes (Pods) and gathering all ingredients (resources like CPU and memory). Each recipe is assigned to a station (Node) by the head chef (API Server) through the sous-chef (Scheduler).</li>
<li><strong>Cooking</strong>: The stations (Nodes) start cooking the dishes (Pods) as directed. The head chef ensures everything is progressing smoothly and the dishes are being prepared according to the recipes.</li>
<li><strong>Portion Control</strong>: Use Resource Quotas to manage the portion sizes of each dish. This ensures that no single dish uses up too many ingredients, maintaining a balanced kitchen.</li>
<li><strong>Dynamic Serving</strong>: The HPA acts like a waiter, monitoring the number of customers (resource usage metrics) and adjusting the number of dishes prepared (Pod count) dynamically. If more customers arrive, more dishes are prepared; if the demand decreases, fewer dishes are made.</li>
<li><strong>Serving the Meal</strong>: The kitchen operates efficiently, with each dish receiving the right amount of attention and resources needed to ensure a delightful dining experience for the customers.</li>
</ol>
<p>Just like in a restaurant kitchen, Kubernetes DRA makes sure your applications (dishes) get the right amount of resources (ingredients) at the right time, providing efficient resource management in a complex environment.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="now-we-can-dig-into-the-nitty-gritty">Now, we can dig into the nitty-gritty<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#now-we-can-dig-into-the-nitty-gritty" class="hash-link" aria-label="Direct link to Now, we can dig into the nitty-gritty" title="Direct link to Now, we can dig into the nitty-gritty" translate="no">​</a></h2>
<p>DRA involves several key components to efficiently manage resources within a cluster, illustrated by the following sample manifests:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: resource.k8s.io/v1alpha3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: DeviceClass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: resource.example.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selectors:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - cel:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      expression: device.driver == "resource-driver.example.com"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">---</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: resource.k8s.io/v1alpha3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ResourceClaimTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: gpu-template</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    devices:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      requests:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        deviceClassName: resource.example.com</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">---</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: container0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    image: gpuDriverImage</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    command: ["cmd0"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      claims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: container1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    image: gpuDriverImage</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    command: ["cmd0"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      claims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  resourceClaims:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    resourceClaimTemplateName: gpu-template</span><br></span></code></pre></div></div>
<p><em>Adapted from: <a href="https://docs.google.com/document/d/1BNWqgx_SmZDi-va_V31v3DnuVwYnF2EmN7D-O_fB6Oo/edit?tab=t.0" target="_blank" rel="noopener noreferrer">DRA for GPUs in Kubernetes</a></em></p>
<p>Here's what each component does:</p>
<table><thead><tr><th>Component</th><th>What does it do?</th></tr></thead><tbody><tr><td>Resource Claims and Templates</td><td>The <em>ResourceClaim API</em> lets workloads request specific resources, such as GPUs, by defining their requirements. A <em>ResourceClaimTemplate</em> creates these claims automatically when workloads are deployed</td></tr><tr><td>Device classes</td><td>Predefined criteria for selecting and configuring devices. Each resource request references a <em>DeviceClass</em></td></tr><tr><td>Pod scheduling context</td><td>Coordinates pod scheduling when <em>ResourceClaims</em> need to be allocated and ensures that the resources requested by the pods are available and properly allocated</td></tr><tr><td>Resource slices</td><td>Publish information about the available CPU, GPU, and memory resources in the cluster to help manage and track resource allocation efficiently</td></tr><tr><td>Control plane controller</td><td>When the DRA driver provides a control plane controller, it handles the allocation of resources in cooperation with the Kubernetes scheduler. This ensures that resources are allocated based on structured parameters</td></tr></tbody></table>
<p>The example above demonstrates GPU sharing within a pod, where two containers get access to one ResourceClaim object created for the pod.</p>
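<p>As a complementary sketch, a claim can also be created once by name and referenced from a pod with <code>resourceClaimName</code> instead of a template. The snippet below only writes a hypothetical manifest to <code>shared-claim.yaml</code>. The names <code>shared-gpu</code> and <code>pod-a</code> are made up for illustration, and the <code>v1alpha3</code> API version and field layout are assumptions that mirror the example above, so check what your cluster serves before applying it.</p>

```shell
# Write a hypothetical manifest: one named ResourceClaim plus a pod that references it.
# Nothing is sent to a cluster here; the file is only created locally.
cat > shared-claim.yaml <<'EOF'
apiVersion: resource.k8s.io/v1alpha3   # assumption: use the version your cluster serves
kind: ResourceClaim
metadata:
  name: shared-gpu                     # hypothetical claim name
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: resource.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-a                          # hypothetical pod name
spec:
  containers:
  - name: container0
    image: gpuDriverImage
    command: ["cmd0"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: shared-gpu      # refers to the existing claim by name
EOF
```

<p>Applying the file with <code>kubectl apply -f shared-claim.yaml</code> and adding a second pod that uses the same <code>resourceClaimName</code> would let both pods reference the same claim, provided the driver allows the device to be shared.</p>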
<p>The Kubernetes scheduler will try multiple times to identify the right node and let the DRA resource driver(s) know that the node is ready for resource allocation.
The <em>ResourceClaim</em> then records that the resource has been allocated on a particular node, and the scheduler is signaled to run the pod. (This prevents pods from being scheduled onto “unprepared” nodes that cannot accept new resources.)
Leveraging the Named Resources model, the DRA resource driver can specify:</p>
<ol>
<li>How to get a hold of a specific GPU type, and</li>
<li>How to configure the GPU that has been allocated to a pod and assigned to a node.</li>
</ol>
<p>In place of an arbitrary resource count, an entire object now represents the choice of resource. This object is passed to the scheduler at node start time and may stream updates if resources become unhealthy.</p>
<p>Pulling this all together, the key components of DRA look like:</p>
<p><img decoding="async" loading="lazy" alt="image" src="https://blog.aks.azure.com/assets/images/dra-driver-diagram-78f7237fef67ee88854de239e62ca308.png" width="624" height="234" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="vendor-specific-drivers">Vendor specific drivers<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#vendor-specific-drivers" class="hash-link" aria-label="Direct link to Vendor specific drivers" title="Direct link to Vendor specific drivers" translate="no">​</a></h2>
<p>Vendors can provide driver packages that extend the base DRA capabilities to interact with their own resources. We will take a look at NVIDIA's <a href="https://github.com/NVIDIA/k8s-dra-driver-gpu" target="_blank" rel="noopener noreferrer">DRA drivers</a> and explore how they allow for flexible and dynamic allocation of NVIDIA GPUs.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="seeing-nvidias-dra-driver-in-action">Seeing NVIDIA's DRA driver in action<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#seeing-nvidias-dra-driver-in-action" class="hash-link" aria-label="Direct link to Seeing NVIDIA's DRA driver in action" title="Direct link to Seeing NVIDIA's DRA driver in action" translate="no">​</a></h2>
<p>Now, let’s see how NVIDIA's <a href="https://github.com/NVIDIA/k8s-dra-driver/tree/main" target="_blank" rel="noopener noreferrer">open-source k8s DRA driver</a> works on a Kubernetes cluster. We'll walk through:</p>
<ol>
<li>Setting up your cluster and DRA drivers</li>
<li>Verifying your drivers are installed properly</li>
<li>Running a sample workload to illustrate a GPU being flexibly assigned</li>
</ol>
<div class="theme-admonition theme-admonition-warning admonition_xJq3 alert alert--warning"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 16 16"><path fill-rule="evenodd" d="M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"></path></svg></span>warning</div><div class="admonitionContent_BuS1"><p>The following is an <strong>experimental</strong> demo using an Azure Kubernetes Service cluster; the open-source k8s DRA resource driver is under active development and <strong>not yet supported for production use</strong>.</p></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="before-you-begin">Before you begin<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#before-you-begin" class="hash-link" aria-label="Direct link to Before you begin" title="Direct link to Before you begin" translate="no">​</a></h3>
<ul>
<li>
<p>If you don't have a cluster, create one using the <a href="https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-cli" target="_blank" rel="noopener noreferrer">Azure CLI</a>, <a href="https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-powershell" target="_blank" rel="noopener noreferrer">Azure PowerShell</a>, <a href="https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-portal?tabs=azure-cli" target="_blank" rel="noopener noreferrer">Azure portal</a>, or the IaC tool of your choice. Here's an example using the Azure CLI. Note that the cluster must run Kubernetes v1.34 or later to have the DRA feature gate enabled.</p>
<div class="language-azurecli-interactive codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-azurecli-interactive codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks create --name myAKSCluster --resource-group myResourceGroup --location &lt;region&gt;  --kubernetes-version 1.34</span><br></span></code></pre></div></div>
</li>
<li>
<p>Your GPU node pool should be provisioned with an <a href="https://learn.microsoft.com/en-us/azure/aks/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool#options-for-using-nvidia-gpus" target="_blank" rel="noopener noreferrer">NVIDIA GPU enabled VM size</a>.</p>
<ul>
<li>Make sure you also <a href="https://learn.microsoft.com/en-us/azure/aks/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool#skip-gpu-driver-installation" target="_blank" rel="noopener noreferrer">skip GPU driver installation</a>, as we install the drivers via the NVIDIA GPU operator in this tutorial.</li>
</ul>
<div class="language-azurecli-interactive codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-azurecli-interactive codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Standard_NC6s_v3 is an example; substitute another NVIDIA GPU VM size if needed</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks nodepool add --cluster-name myAKSCluster --resource-group myResourceGroup --name gpunodepool --node-count 1 --gpu-driver none --node-vm-size Standard_NC6s_v3</span><br></span></code></pre></div></div>
</li>
</ul>
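<p>The version requirement above can be checked before you continue. The following is a minimal sketch of a hypothetical helper, <code>version_at_least</code>, that compares only the major and minor parts of two version strings. It assumes purely numeric <code>major.minor[.patch]</code> input with an optional leading <code>v</code>.</p>

```shell
# version_at_least ACTUAL MINIMUM
# Succeeds (exit status 0) when ACTUAL's major.minor is >= MINIMUM's major.minor.
# Hypothetical helper; assumes numeric "major.minor[.patch]" strings, optional leading "v".
version_at_least() {
  actual="${1#v}"
  minimum="${2#v}"
  # Split off the major and minor components with POSIX parameter expansion.
  actual_major="${actual%%.*}"
  actual_rest="${actual#*.}"
  actual_minor="${actual_rest%%.*}"
  minimum_major="${minimum%%.*}"
  minimum_rest="${minimum#*.}"
  minimum_minor="${minimum_rest%%.*}"
  if [ "$actual_major" -gt "$minimum_major" ]; then
    return 0
  fi
  [ "$actual_major" -eq "$minimum_major" ] && [ "$actual_minor" -ge "$minimum_minor" ]
}
```

<p>One way to use it is to pass in the version reported by the cluster, for example <code>version_at_least "$(az aks show --resource-group myResourceGroup --name myAKSCluster --query kubernetesVersion --output tsv)" 1.34</code>, and only proceed when it succeeds.</p>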
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="get-the-credentials-for-your-cluster">Get the credentials for your cluster<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#get-the-credentials-for-your-cluster" class="hash-link" aria-label="Direct link to Get the credentials for your cluster" title="Direct link to Get the credentials for your cluster" translate="no">​</a></h3>
<p>Get the credentials for your AKS cluster using the <a href="https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials" target="_blank" rel="noopener noreferrer"><code>az aks get-credentials</code></a> command. The following example command gets the credentials for the <em>myAKSCluster</em> cluster in the <em>myResourceGroup</em> resource group:</p>
<div class="language-azurecli-interactive codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-azurecli-interactive codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks get-credentials --resource-group myResourceGroup --name myAKSCluster</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verify-dra-is-enabled">Verify DRA is enabled<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#verify-dra-is-enabled" class="hash-link" aria-label="Direct link to Verify DRA is enabled" title="Direct link to Verify DRA is enabled" translate="no">​</a></h3>
<p>You can confirm whether or not DRA is enabled on your cluster by checking <code>deviceclasses</code> and <code>resourceslices</code>.</p>
<p>Check <code>deviceclasses</code> via <code>kubectl get deviceclasses</code> or check <code>resourceslices</code> via <code>kubectl get resourceslices</code>.</p>
<p>The results for both should look similar to:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">No resources found</span><br></span></code></pre></div></div>
<p>If DRA isn't enabled on your cluster, you may instead get a response similar to <code>error: the server doesn't have a resource type "deviceclasses"/"resourceslices"</code>.</p>
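<p>If you want to script this check, the two outcomes described above can be told apart by matching on the message text. This is a minimal sketch with a hypothetical helper, <code>dra_status</code>. It assumes the messages look like the ones quoted in this section, and it reports <code>unknown</code> for anything else (for example, a connection error).</p>

```shell
# dra_status OUTPUT
# Classifies the text printed by `kubectl get deviceclasses` (or `resourceslices`).
# Hypothetical helper; the matched strings come from the messages quoted in this section.
dra_status() {
  case "$1" in
    *"doesn't have a resource type"*) echo "disabled" ;;  # API group not served
    *"No resources found"*|NAME*)     echo "enabled" ;;   # empty list or a table with a header
    *)                                echo "unknown" ;;   # anything else, such as a network error
  esac
}
```

<p>For example, <code>dra_status "$(kubectl get deviceclasses 2&gt;&amp;1)"</code> prints a single word that a setup script can branch on.</p>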
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="install-the-nvidia-gpu-operator">Install the NVIDIA GPU operator<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#install-the-nvidia-gpu-operator" class="hash-link" aria-label="Direct link to Install the NVIDIA GPU operator" title="Direct link to Install the NVIDIA GPU operator" translate="no">​</a></h3>
<p>Set up the GPU operator, then confirm that GPUs are schedulable and that GPU workloads run successfully.</p>
<blockquote>
<p><strong>Note:</strong> Make sure you use a version of the GPU operator that matches or exceeds the version you specify when installing the DRA driver.</p>
</blockquote>
<ol>
<li>
<p>Install the GPU operator. We will be using the DRA drivers to manage our GPUs, so we also want to disable the Kubernetes device plugin during install. We will use an <code>operator-install.yaml</code> file to specify the parameters we'd like the operator to be installed with.</p>
<ul>
<li>
<p>Create <code>operator-install.yaml</code> like so:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">devicePlugin</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">enabled</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">false</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">driver</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">enabled</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">toolkit</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">env</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Prevents containers running in _unprivileged_ mode from requesting access to arbitrary GPU devices</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"false"</span><br></span></code></pre></div></div>
</li>
<li>
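<p>Install the GPU operator. This command assumes the <code>nvidia</code> Helm repository has already been added; if it hasn't, run <code>helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &amp;&amp; helm repo update</code> first.</p>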
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm install --wait --generate-name -n gpu-operator \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--create-namespace nvidia/gpu-operator \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--version=v25.10.0 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-f operator-install.yaml</span><br></span></code></pre></div></div>
</li>
</ul>
</li>
<li>
<p>Make sure all your GPU operator components are running and ready via <code>kubectl get pod -n gpu-operator</code>. The result should look similar to:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">NAME                                                              READY   STATUS      RESTARTS   AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu-feature-discovery-t9xs5                                       1/1     Running     0          2m9s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu-operator-1761843468-node-feature-discovery-gc-6648dd8449tbx   1/1     Running     0          2m27s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu-operator-1761843468-node-feature-discovery-master-597bhvwmm   1/1     Running     0          2m27s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu-operator-1761843468-node-feature-discovery-worker-mvbbt       1/1     Running     0          2m27s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gpu-operator-f8577988-p2k9x                                       1/1     Running     0          2m27s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-driver-daemonset-tgf78                                     1/1     Running     0          1m30s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-container-toolkit-daemonset-sqchb                          1/1     Running     0          2m10s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-cuda-validator-f7g97                               
        0/1     Completed   0          77s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-dcgm-exporter-6lbxc                                        1/1     Running     0          2m9s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-device-plugin-daemonset-v74ww                              1/1     Running     0          2m9s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-mig-manager-jsnkr                                          1/1     Running     0          16s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-operator-validator-h8s5n                                   1/1     Running     0          2m10s</span><br></span></code></pre></div></div>
</li>
</ol>
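<p>Scanning the pod table by eye gets tedious on larger node pools. The following is a minimal sketch of a hypothetical helper, <code>not_ready_pods</code>, that reads the table from standard input and prints only the pods that still need attention. It assumes the default <code>NAME</code>, <code>READY</code>, <code>STATUS</code> column order shown above and treats <code>Completed</code> pods (such as the CUDA validator) as healthy.</p>

```shell
# not_ready_pods
# Reads `kubectl get pod` table output on stdin and prints the names of pods that are
# neither Completed nor Running with every container ready.
# Hypothetical helper; assumes the default NAME / READY / STATUS column order.
not_ready_pods() {
  awk 'NR > 1 {                       # skip the header row
    if ($3 == "Completed") next       # one-shot validator pods finish as Completed
    split($2, ready, "/")             # READY looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

<p>For example, <code>kubectl get pod -n gpu-operator | not_ready_pods</code> prints nothing once every component is ready. The same helper works for any other namespace.</p>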
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="installing-nvidia-dra-drivers">Installing NVIDIA DRA drivers<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#installing-nvidia-dra-drivers" class="hash-link" aria-label="Direct link to Installing NVIDIA DRA drivers" title="Direct link to Installing NVIDIA DRA drivers" translate="no">​</a></h2>
<p>The recommended way to install the driver is via Helm. Make sure your Helm version is <a href="https://helm.sh/docs/topics/version_skew/#supported-version-skew" target="_blank" rel="noopener noreferrer">supported</a> for your cluster's Kubernetes version.</p>
<ol>
<li>
<p>Add the Helm repository that contains the DRA driver chart.</p>
<div class="language-azurecli-interactive codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-azurecli-interactive codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &amp;&amp; helm repo update</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create a <code>dra-install.yaml</code> to specify parameters during the installation.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">gpuResourcesEnabledOverride</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">resources-computeDomains</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   </span><span class="token key atrule" style="color:#00a4db">enabled</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">false</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># We'll be using GPUs, not compute domains.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">controller</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   </span><span class="token key atrule" style="color:#00a4db">affinity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     </span><span class="token key atrule" style="color:#00a4db">nodeAffinity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       </span><span class="token key atrule" style="color:#00a4db">requiredDuringSchedulingIgnoredDuringExecution</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         </span><span class="token key atrule" style="color:#00a4db">nodeSelectorTerms</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">matchExpressions</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">           </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kubernetes.azure.com/mode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             </span><span class="token key atrule" style="color:#00a4db">operator</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> In</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             </span><span class="token key 
atrule" style="color:#00a4db">values</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> system   </span><span class="token comment" style="color:#999988;font-style:italic"># Makes sure system nodes are utilized </span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">nvidiaDriverRoot</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"/run/nvidia/driver"</span><br></span></code></pre></div></div>
<ol>
<li>Install the DRA driver.</li>
</ol>
<div class="language-azurecli-interactive codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-azurecli-interactive codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Ensure you select an appropriate version (https://github.com/NVIDIA/k8s-dra-driver-gpu/releases)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       --version="25.8.0" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       --create-namespace \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       --namespace nvidia-dra-driver-gpu \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       -f dra-install.yaml</span><br></span></code></pre></div></div>
</li>
</ol>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verify-your-installation">Verify your installation<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#verify-your-installation" class="hash-link" aria-label="Direct link to Verify your installation" title="Direct link to Verify your installation" translate="no">​</a></h3>
<p>Once set up, double-check that all DRA driver components are ready and running:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">$ kubectl get pod -n nvidia-dra-driver-gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NAME                                               READY   STATUS    RESTARTS   AGE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nvidia-dra-driver-gpu-kubelet-plugin-[...]         1/1     Running   0          61m</span><br></span></code></pre></div></div>
<p>The cluster's <code>DeviceClass</code> and <code>ResourceSlice</code> objects should also reflect the new GPU devices. Run <code>kubectl get deviceclasses</code> or <code>kubectl get resourceslices</code> to confirm.</p>
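<p>For a scripted check, the pod listing can be reduced to a pass/fail result. The following is a minimal sketch, not part of kubectl or the NVIDIA tooling: <code>all_pods_ready</code> is a hypothetical helper name, and it assumes the default <code>kubectl get pod</code> table layout shown above, with NAME, READY, and STATUS as the first three columns.</p>

```shell
# Hypothetical helper (not part of kubectl or the NVIDIA DRA driver).
# Reads `kubectl get pod` table output on stdin and exits 0 only if at
# least one pod is listed and every pod is Running with all containers ready.
all_pods_ready() {
  awk '
    NR > 1 {                       # skip the header row
      split($2, ready, "/")        # READY column looks like "1/1"
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
      pods++
    }
    END { if (pods > 0 && !bad) exit 0; exit 1 }
  '
}
```

<p>To use it, pipe the output of <code>kubectl get pod -n nvidia-dra-driver-gpu</code> into <code>all_pods_ready</code>; the exit status is 0 only when every listed pod is fully ready.</p>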
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="run-a-gpu-workload-using-dra-drivers">Run a GPU workload using DRA drivers<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#run-a-gpu-workload-using-dra-drivers" class="hash-link" aria-label="Direct link to Run a GPU workload using DRA drivers" title="Direct link to Run a GPU workload using DRA drivers" translate="no">​</a></h3>
<p>You can run some sample workloads to confirm that DRA drivers are installed and behave as expected.</p>
<ol>
<li>
<p>Create a namespace that houses the resources for our sample workloads.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl create namespace dra-gpu-share-test</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create a new <code>ResourceClaimTemplate</code> that generates a <code>ResourceClaim</code> for one GPU for each workload that references it. Save this manifest as <code>my-rct.yaml</code>.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> resource.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ResourceClaimTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> dra</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">share</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">test</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> single</span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain">gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">devices</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">exactly</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">count</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">deviceClassName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpu.nvidia.com</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create a pod manifest, <code>dra-rct-pod.yaml</code>, that takes advantage of our <code>ResourceClaimTemplate</code>. We spin up a pod that holds two containers, <code>ctr0</code> and <code>ctr1</code>. Both containers reference the same <code>ResourceClaim</code> and therefore share access to the same GPU device.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> dra</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">share</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">test</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" 
style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> pod</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">containers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ctr0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ubuntu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">22.04</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">command</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span 
class="token string" style="color:#e3116c">"bash"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"-c"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 &amp; wait"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">claims</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> shared</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 
</span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ctr1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ubuntu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">22.04</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">command</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"bash"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"-c"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 &amp; wait"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">claims</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> shared</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">resourceClaims</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> shared</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">resourceClaimTemplateName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> single</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gpu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">tolerations</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"nvidia.com/gpu"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">operator</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Exists"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">effect</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"NoSchedule"</span><br></span></code></pre></div></div>
</li>
<li>
<p>Apply both manifests with <code>kubectl apply -f</code>, then fetch the containers' logs to check the GPU UUID that each container reports.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl logs pod -n dra-gpu-share-test --all-containers --prefix</span><br></span></code></pre></div></div>
</li>
<li>
<p>The output should look similar to:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">[pod/pod/ctr0] GPU 0: NVIDIA H100 NVL (UUID: GPU-c552c7e1-3d44-482e-aaaf-507944ab75f7)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[pod/pod/ctr1] GPU 0: NVIDIA H100 NVL (UUID: GPU-c552c7e1-3d44-482e-aaaf-507944ab75f7)</span><br></span></code></pre></div></div>
</li>
</ol>
<p>The GPU UUID is identical for both containers, confirming that they access the same GPU device.</p>
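<p>The UUID comparison can also be scripted rather than checked by eye. The following is a minimal sketch under stated assumptions: <code>same_gpu_uuid</code> is a hypothetical helper name, and it assumes each log line carries a UUID in the <code>GPU-&lt;hex&gt;</code> form shown in the sample output above.</p>

```shell
# Hypothetical helper (not part of kubectl or nvidia-smi).
# Reads `nvidia-smi -L` log lines on stdin and exits 0 only if they
# contain exactly one distinct GPU UUID.
same_gpu_uuid() {
  # Extract every "GPU-<hex>" token, de-duplicate, then count what remains.
  grep -o 'GPU-[0-9a-f-]*' | sort -u | awk 'END { if (NR == 1) exit 0; exit 1 }'
}
```

<p>To use it, pipe the output of <code>kubectl logs pod -n dra-gpu-share-test --all-containers --prefix</code> into <code>same_gpu_uuid</code>. A non-zero exit status means the containers reported different devices or that no UUID was found in the logs.</p>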
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="next-steps">Next steps<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#next-steps" class="hash-link" aria-label="Direct link to Next steps" title="Direct link to Next steps" translate="no">​</a></h2>
<ul>
<li>Further validate your installation of the DRA drivers with <a href="https://github.com/NVIDIA/k8s-dra-driver-gpu/wiki/Installation#validate-installation" target="_blank" rel="noopener noreferrer">sample workloads</a></li>
<li>Learn more about <a href="https://github.com/NVIDIA/k8s-dra-driver-gpu" target="_blank" rel="noopener noreferrer">NVIDIA DRA drivers</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="questions">Questions?<a href="https://blog.aks.azure.com/2025/11/17/dra-devices-and-drivers-on-kubernetes#questions" class="hash-link" aria-label="Direct link to Questions?" title="Direct link to Questions?" translate="no">​</a></h2>
<p>Connect with the AKS team through our <a href="https://github.com/Azure/AKS/discussions" target="_blank" rel="noopener noreferrer">GitHub discussions</a> or <a href="https://github.com/Azure/AKS/issues" target="_blank" rel="noopener noreferrer">share your feedback and suggestions</a>.</p>]]></content:encoded>
            <category>AI</category>
        </item>
        <item>
            <title><![CDATA[Update on the Azure Kubernetes Service (AKS) Application Routing add-on, the Ingress API, and Ingress-NGINX]]></title>
            <link>https://blog.aks.azure.com/2025/11/13/ingress-nginx-update</link>
            <guid>https://blog.aks.azure.com/2025/11/13/ingress-nginx-update</guid>
            <pubDate>Thu, 13 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Microsoft's commitment to Azure Kubernetes Service (AKS) customers using the Application Routing add-on with Ingress-NGINX and guidance on migrating to modern Gateway API solutions.]]></description>
            <content:encoded><![CDATA[<p>The <a href="https://github.com/kubernetes/community/blob/master/sig-network/README.md" target="_blank" rel="noopener noreferrer">Kubernetes SIG Network</a> and the Security Response Committee have <a href="https://www.kubernetes.dev/blog/2025/11/12/ingress-nginx-retirement/" target="_blank" rel="noopener noreferrer">announced</a> the upcoming retirement of the <a href="https://github.com/kubernetes/ingress-nginx/" target="_blank" rel="noopener noreferrer">Ingress NGINX project</a>, with maintenance ending in <strong>March 2026</strong>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="no-immediate-action-required">No immediate action required<a href="https://blog.aks.azure.com/2025/11/13/ingress-nginx-update#no-immediate-action-required" class="hash-link" aria-label="Direct link to No immediate action required" title="Direct link to No immediate action required" translate="no">​</a></h2>
<p>Microsoft understands that customers value continuity and clarity around the maintenance and evolution of the components that power their workloads. There is no change or immediate action required today for AKS clusters using the <a href="https://learn.microsoft.com/azure/aks/app-routing" target="_blank" rel="noopener noreferrer">Application Routing add-on with NGINX</a> to manage Ingress NGINX resources. Microsoft will provide official support for Application Routing add-on Ingress NGINX resources through <strong>November 2026</strong>, limited to critical security patches during this period.</p>
<p>We are actively investing in the future of application connectivity in Azure Kubernetes Service (AKS), centered on the Gateway API. This includes support for the Gateway API in the Istio-based service mesh add-on, expanding the Application Routing add-on to support the Gateway API, and continuing investment in Application Gateway for Containers.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-future-of-the-application-routing-add-on">The future of the Application Routing add-on<a href="https://blog.aks.azure.com/2025/11/13/ingress-nginx-update#the-future-of-the-application-routing-add-on" class="hash-link" aria-label="Direct link to The future of the Application Routing add-on" title="Direct link to The future of the Application Routing add-on" translate="no">​</a></h3>
<p><strong>Application Routing with Gateway API</strong>, which will be powered by the Istio control plane for Gateway API-based ingress only, is <strong>planned for the first half of 2026</strong>, along with migration guidance documentation. The Kubernetes Gateway API represents the next generation of Kubernetes ingress traffic management, evolving from the Ingress API by offering richer routing capabilities, standardized extensibility, and a more secure, role-oriented design.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="alternative-migration-paths">Alternative migration paths<a href="https://blog.aks.azure.com/2025/11/13/ingress-nginx-update#alternative-migration-paths" class="hash-link" aria-label="Direct link to Alternative migration paths" title="Direct link to Alternative migration paths" translate="no">​</a></h3>
<p>Existing users of Ingress NGINX, including those who provisioned it through the Application Routing add-on, can also migrate to one of the following options:</p>
<ul>
<li>The <a href="https://learn.microsoft.com/azure/aks/istio-gateway-api" target="_blank" rel="noopener noreferrer">Istio-based service mesh add-on</a> using the Gateway API</li>
<li><a href="https://aka.ms/agc/addon" target="_blank" rel="noopener noreferrer">Application Gateway for Containers</a> using either the Ingress API or the Gateway API</li>
</ul>
<p>For further questions, you can reach us on <a href="https://github.com/Azure/AKS/issues" target="_blank" rel="noopener noreferrer">GitHub</a> or create a <a href="https://learn.microsoft.com/azure/azure-portal/supportability/how-to-create-azure-support-request" target="_blank" rel="noopener noreferrer">support case</a>.</p>]]></content:encoded>
            <category>App Routing</category>
            <category>Istio</category>
            <category>Traffic Management</category>
        </item>
        <item>
            <title><![CDATA[PostgreSQL + CloudNativePG guidance for AKS]]></title>
            <link>https://blog.aks.azure.com/2025/11/10/announce-pgsql-howto</link>
            <guid>https://blog.aks.azure.com/2025/11/10/announce-pgsql-howto</guid>
            <pubDate>Mon, 10 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to set up an AKS cluster, deploy PostgreSQL, and explore CloudNativePG running on AKS.]]></description>
            <content:encoded><![CDATA[<p>We're pleased to share the <a href="https://learn.microsoft.com/azure/aks/postgresql-ha-overview" target="_blank" rel="noopener noreferrer">newly updated guidance</a> on deploying PostgreSQL with CloudNativePG on Azure Kubernetes Service (AKS).</p>
<p><a href="https://cloudnative-pg.io/" target="_blank" rel="noopener noreferrer">CloudNativePG</a> now anchors our recommended pattern for running production-ready PostgreSQL on AKS. With the latest feedback from EnterpriseDB (EDB), the documentation aligns with the refreshed container image catalogs, safer operator rollouts, and updates to the <a href="https://pgbarman.org/" target="_blank" rel="noopener noreferrer">Barman backup tool</a> that help teams meet availability targets from day one.</p>
<p>The updated set of articles walks through the journey end-to-end: an overview of the architecture, infrastructure setup with workload identity and storage RBAC, deployment with the new PostgreSQL 18 image, and day-two validation. You'll see how we folded in the CNPG controller tuning, the move toward self-managed PodMonitors, and guidance on planning for the Barman Cloud plugin as upstream support evolves.</p>
<p>We also expanded the operational coverage: monitoring with Prometheus and Grafana, exercising failover across availability zones, and validating backup and restore flows that rely on AKS workload identity. The how-tos keep the commands concise so you can get into the CLI quickly without losing sight of the bigger architectural decisions.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cloudnativepg-on-aks-with-edb">CloudNativePG on AKS, with EDB<a href="https://blog.aks.azure.com/2025/11/10/announce-pgsql-howto#cloudnativepg-on-aks-with-edb" class="hash-link" aria-label="Direct link to CloudNativePG on AKS, with EDB" title="Direct link to CloudNativePG on AKS, with EDB" translate="no">​</a></h2>
<p>For production support for CloudNativePG on AKS, visit <a href="https://marketplace.microsoft.com/en-us/product/saas/enterprisedb-corp.edb-cnpg?tab=Overview" target="_blank" rel="noopener noreferrer">EDB's Azure Marketplace offering</a>. As maintainers of the operator, they provide services to keep clusters compliant, optimized, and ready for future releases.</p>
<p>Thanks again to the CloudNativePG team at EDB for their collaboration, reviews, and continued support as we bring the latest PostgreSQL best practices to AKS users. Dive in and see how quickly you can stand up a resilient PostgreSQL footprint on AKS.</p>]]></content:encoded>
            <category>General</category>
            <category>Developer</category>
            <category>Storage</category>
            <category>Databases</category>
        </item>
        <item>
            <title><![CDATA[Collecting Custom Metrics on AKS with Telegraf]]></title>
            <link>https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf</link>
            <guid>https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf</guid>
            <pubDate>Thu, 06 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to set up and deploy Telegraf on Azure Kubernetes Service (AKS) to monitor your Kubernetes clusters efficiently by collecting custom metrics and exposing them to Prometheus. This guide covers best practices, step-by-step instructions, and common troubleshooting tips for AKS monitoring with Telegraf.]]></description>
            <content:encoded><![CDATA[<p>What if you need to collect <strong>your own custom metrics</strong> from workloads or nodes in AKS, but don't want to run a full monitoring stack?
In this post, we will discuss how to integrate custom metrics into Azure's managed monitoring stack with minimal setup using three components: a <code>Telegraf DaemonSet</code> for flexible metric collection, <a href="https://learn.microsoft.com/en-us/azure/azure-monitor/metrics/prometheus-metrics-overview" target="_blank" rel="noopener noreferrer">Azure Monitor managed service for Prometheus</a> for scraping and storage, and <a href="https://learn.microsoft.com/en-us/azure/managed-grafana/overview" target="_blank" rel="noopener noreferrer">Azure Managed Grafana</a> for visualization and alerting.</p>
<p>By default, AKS and Azure Monitor give you a rich set of out-of-the-box insights: CPU and memory utilization, pod restarts, node health, and Kubernetes control plane metrics. But many teams need more visibility into what’s happening <em>inside</em> their workloads — for example:</p>
<ul>
<li>Application-specific metrics such as API request latency or queue depth</li>
<li>Custom business metrics like transactions per second or user sessions</li>
<li>System-level data such as network interface stats, disk I/O, or custom log counters</li>
</ul>
<p>Traditionally, this kind of deep observability meant deploying and managing a full Prometheus stack: configuring storage, scaling scrapers, and handling upgrades. That adds operational complexity, especially when all you need is a few targeted custom metrics. This is where Azure Monitor managed service for Prometheus comes in. It takes care of high availability, storage, and scaling, so you can focus on defining the metrics that matter most. With <strong>Telegraf</strong> as a lightweight collector, you can publish custom metrics from your workloads or nodes into your managed monitoring environment, with no self-managed Prometheus servers required.</p>
<p>While our example uses network metrics, the same pattern applies to any custom data source you want to monitor in AKS. To take this example one step further, work through the hands-on <a href="https://azure-samples.github.io/aks-labs/docs/operations/observability-and-monitoring" target="_blank" rel="noopener noreferrer">AKS Labs: Advanced Observability Concepts</a> and watch <a href="https://www.youtube.com/watch?v=Dc0TqbAkQX0" target="_blank" rel="noopener noreferrer">Observability with Managed Prometheus and Managed Grafana at the Microsoft Reactor</a>.</p>
<p>A common question we hear from Kubernetes users is:</p>
<p>"How can I scrape a specific set of custom metrics from my cluster without adding too much operational overhead?"</p>
<p>Before we dive into the setup, let’s look at why this approach is so effective for extending AKS monitoring.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-this-matters">Why This Matters<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#why-this-matters" class="hash-link" aria-label="Direct link to Why This Matters" title="Direct link to Why This Matters" translate="no">​</a></h2>
<p>If you run workloads on AKS, you already get a solid baseline of metrics out of the box — node and pod resource usage, cluster health, and control plane telemetry through Azure Monitor. But those built-in signals don’t always tell the full story.</p>
<p>Engineers often need visibility into what’s actually happening inside their workloads or on the host — things like:</p>
<ul>
<li>Network throughput or packet drops on specific interfaces.</li>
<li>Application-level metrics like queue depth or request latency.</li>
<li>Custom counters from scripts, logs, or local daemons.</li>
</ul>
<p>You could deploy a full Prometheus stack to get those insights, but that means managing storage, scaling scrapers, maintaining alert rules, and patching over time. For many teams, that’s more operational effort than it’s worth — especially when you just need a handful of custom metrics.</p>
<p>This approach combines <strong>Telegraf, Azure Monitor managed service for Prometheus,</strong> and <strong>Azure Managed Grafana</strong> to bridge that gap. Telegraf runs as a <strong>DaemonSet</strong>, collecting metrics on every node (from any command or script you define) and exposing them in Prometheus format. Azure Monitor managed service for Prometheus then handles <strong>scraping, scaling, and storage</strong>, so there is no local Prometheus to manage, and Grafana provides dashboards and alerting without extra infrastructure.</p>
<p>The result is a lightweight, fully managed way to extend AKS observability with exactly the metrics you care about, using standard open-source tools and Azure’s managed services.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="solution-at-a-glance">Solution at a Glance<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#solution-at-a-glance" class="hash-link" aria-label="Direct link to Solution at a Glance" title="Direct link to Solution at a Glance" translate="no">​</a></h2>
<p>Here’s the workflow we’re setting up:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">+-----------------+    +------------------+    +-----------------+ </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">|   AKS Nodes     |    |  Azure Managed   |    |  Azure Managed  | </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">|                 |    |   Prometheus     |    |    Grafana      | </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| +-------------+ |    |                  |    |                 | </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| |  Telegraf   | |---&gt;|  Scrapes via     |---&gt;|  Dashboards &amp;   | </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| | DaemonSet   | |    |  PodMonitor      |    |  Alerting       | </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| |:2112/metrics| |    |                  |    |                 | </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">| +-------------+ |    +------------------+    +-----------------+ </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">+-----------------+                                                </span><br></span></code></pre></div></div>
<ul>
<li><strong>Telegraf DaemonSet</strong>: runs on every node and collects <em>your</em> custom metrics.</li>
<li><strong>Prometheus PodMonitor</strong>: tells Azure Monitor managed service for Prometheus which pod metrics endpoints to scrape.</li>
<li><strong>Azure Managed Grafana</strong>: visualizes and alerts on metrics without extra servers.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="understanding-the-solution">Understanding the solution<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#understanding-the-solution" class="hash-link" aria-label="Direct link to Understanding the solution" title="Direct link to Understanding the solution" translate="no">​</a></h2>
<p>For our example, we will create a custom collection of the following metrics for each network interface using <code>ip -s link</code>:</p>
<table><thead><tr><th>Metric</th><th>Type</th><th>Description</th></tr></thead><tbody><tr><td><code>network_interface_stats_mtu</code></td><td>gauge</td><td>Maximum Transmission Unit</td></tr><tr><td><code>network_interface_stats_rx_bytes</code></td><td>counter</td><td>Received bytes</td></tr><tr><td><code>network_interface_stats_rx_packets</code></td><td>counter</td><td>Received packets</td></tr><tr><td><code>network_interface_stats_rx_errors</code></td><td>counter</td><td>Receive errors</td></tr><tr><td><code>network_interface_stats_rx_dropped</code></td><td>counter</td><td>Received packets dropped</td></tr><tr><td><code>network_interface_stats_rx_missed</code></td><td>counter</td><td>Received packets missed (true missed field from ip -s link)</td></tr><tr><td><code>network_interface_stats_rx_multicast</code></td><td>counter</td><td>Received multicast packets</td></tr><tr><td><code>network_interface_stats_tx_bytes</code></td><td>counter</td><td>Transmitted bytes</td></tr><tr><td><code>network_interface_stats_tx_packets</code></td><td>counter</td><td>Transmitted packets</td></tr><tr><td><code>network_interface_stats_tx_errors</code></td><td>counter</td><td>Transmission errors</td></tr><tr><td><code>network_interface_stats_tx_dropped</code></td><td>counter</td><td>Transmitted packets dropped</td></tr><tr><td><code>network_interface_stats_tx_carrier</code></td><td>counter</td><td>Carrier errors</td></tr><tr><td><code>network_interface_stats_tx_collisions</code></td><td>counter</td><td>Collision errors</td></tr></tbody></table>
<p>Each metric includes the following labels:</p>
<ul>
<li><code>cluster</code>: AKS cluster identifier</li>
<li><code>environment</code>: Environment tag (configurable)</li>
<li><code>host</code>: Node hostname</li>
<li><code>hostname</code>: Node hostname (duplicate for compatibility)</li>
<li><code>interface</code>: Network interface name (eth0, eth1, etc.)</li>
<li><code>state</code>: Interface operational state</li>
</ul>
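<p>The metric names in the table are the measurement name joined to each field key with an underscore, which is how Telegraf's Prometheus output typically names metrics. As a rough sketch of that mapping, the hypothetical helper <code>metric_names</code> below (not part of the manifests in this post) takes one line-protocol record with illustrative values and prints the names it implies. It assumes tag values contain no spaces.</p>

```shell
# Hypothetical helper: print the Prometheus-style metric names implied by one
# line-protocol record of the form "measurement,tags field=value,... timestamp".
metric_names() {
  awk '{
    split($1, head, ",")          # head[1] is the measurement name
    n = split($2, fields, ",")    # $2 is the comma-separated field set
    for (i = 1; i <= n; i++) {
      split(fields[i], kv, "=")   # kv[1] is the field key
      print head[1] "_" kv[1]
    }
  }'
}

# Illustrative record only; the values are made up.
echo 'network_interface_stats,interface=eth0,state=up mtu=1500i,rx_bytes=1024i 1700000000000000000' | metric_names
# prints network_interface_stats_mtu and network_interface_stats_rx_bytes, one per line
```

<p>The tags in the record (<code>interface</code>, <code>state</code>, and so on) are what surface as the labels listed above.</p>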
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="setup-your-environment-variables-and-placeholders">Set up your environment variables and placeholders<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#setup-your-environment-variables-and-placeholders" class="hash-link" aria-label="Direct link to Setup your environment variables and placeholders" title="Direct link to Setup your environment variables and placeholders" translate="no">​</a></h2>
<p>In these next steps, we will set up a new AKS cluster, an Azure Managed Grafana instance, and an Azure Monitor Workspace.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export RG_NAME="rg-telegraf-on-aks"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export LOCATION="westus3"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Azure Kubernetes Service Cluster</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export AKS_CLUSTER_NAME="telegraf-on-aks"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Azure Managed Grafana</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export GRAFANA_NAME="aks-blog-${RANDOM}"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Azure Monitor Workspace</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export AZ_MONITOR_WORKSPACE_NAME="telegraf-on-aks"</span><br></span></code></pre></div></div>
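<p>Because every later command depends on these variables, it can help to confirm they are set before continuing, for example after opening a new shell. The sketch below uses a hypothetical helper, <code>require_vars</code>, which is not part of the Azure CLI; it only checks that each named variable is non-empty.</p>

```shell
# Hypothetical helper: fail if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"      # look up the variable by name
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Two of the values from this post, set here so the sketch is self-contained.
RG_NAME="rg-telegraf-on-aks"
LOCATION="westus3"
require_vars RG_NAME LOCATION && echo "all required variables are set"
```

<p>In practice you would pass all five names (<code>RG_NAME</code>, <code>LOCATION</code>, <code>AKS_CLUSTER_NAME</code>, <code>GRAFANA_NAME</code>, <code>AZ_MONITOR_WORKSPACE_NAME</code>).</p>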
<p>Next, let's create our solution:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Create resource group</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az group create --name ${RG_NAME} --location ${LOCATION}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Create an Azure Monitor Workspace</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az monitor account create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group ${RG_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --location ${LOCATION} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${AZ_MONITOR_WORKSPACE_NAME}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Get the Azure Monitor Workspace ID</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">AZ_MONITOR_WORKSPACE_ID=$(az monitor account show \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group ${RG_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${AZ_MONITOR_WORKSPACE_NAME} \</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">  --query id -o tsv)</span><br></span></code></pre></div></div>
<p>Next, create a Grafana instance using the Azure CLI extension for Azure Managed Grafana (<code>amg</code>):</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Add the Azure Managed Grafana extension to az cli:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az extension add --name amg</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Create an Azure Managed Grafana instance:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az grafana create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${GRAFANA_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group $RG_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --location $LOCATION</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Once created, save the Grafana resource ID</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">GRAFANA_RESOURCE_ID=$(az grafana show \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${GRAFANA_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group ${RG_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">  --query id -o tsv)</span><br></span></code></pre></div></div>
<p>We can now create the cluster, passing both <code>--grafana-resource-id</code> and <code>--azure-monitor-workspace-resource-id</code> during cluster creation:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Create the AKS cluster</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${AKS_CLUSTER_NAME}  \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group ${RG_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --node-count 1 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-managed-identity  \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --enable-azure-monitor-metrics \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --grafana-resource-id ${GRAFANA_RESOURCE_ID} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --azure-monitor-workspace-resource-id ${AZ_MONITOR_WORKSPACE_ID}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Get the cluster credentials</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">az aks get-credentials \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${AKS_CLUSTER_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  
--resource-group ${RG_NAME}</span><br></span></code></pre></div></div>
<p>Verify that the PodMonitor CRD is now available in your cluster:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Check if PodMonitor CRD exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get crd | grep podmonitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Expected output (Azure Monitor managed service for Prometheus):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># podmonitors.azmonitoring.coreos.com                  2025-07-23T19:12:02Z</span><br></span></code></pre></div></div>
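<p>Note that the expected CRD lives in the <code>azmonitoring.coreos.com</code> group, so a plain <code>grep podmonitor</code> would also match a PodMonitor CRD from a different group if one is installed. A stricter check could look like the sketch below; <code>has_azmonitor_podmonitor</code> is a hypothetical helper that reads <code>kubectl get crd</code> output on standard input, and the sample line is the expected output shown above.</p>

```shell
# Hypothetical helper: succeed only if the Azure Monitor PodMonitor CRD
# appears in `kubectl get crd` output supplied on stdin.
has_azmonitor_podmonitor() {
  grep -q '^podmonitors\.azmonitoring\.coreos\.com'
}

# Demonstration with the sample line from this post instead of a live cluster.
printf '%s\n' 'podmonitors.azmonitoring.coreos.com                  2025-07-23T19:12:02Z' \
  | has_azmonitor_podmonitor && echo "PodMonitor CRD found"

# Against a real cluster (not run here):
#   kubectl get crd | has_azmonitor_podmonitor || echo "PodMonitor CRD not found"
```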
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploying-the-solution">Deploying the solution<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#deploying-the-solution" class="hash-link" aria-label="Direct link to Deploying the solution" title="Direct link to Deploying the solution" translate="no">​</a></h2>
<p>We’ll deploy a single YAML manifest that contains:</p>
<ul>
<li><code>ConfigMap</code> for Telegraf config + your custom metric script (<code>parse_ip_stats.sh</code>)</li>
<li><code>DaemonSet</code> to run Telegraf on each node</li>
<li><code>ServiceAccount</code>, <code>Service</code>, and <code>PodMonitor</code></li>
</ul>
<ol>
<li>
<p>Create the Telegraf Configuration</p>
<p>First, let's create the main Telegraf configuration:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF &gt; 01-configmap.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ConfigMap</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: telegraf-config</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">data:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  telegraf.conf: |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   [global_tags]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      environment = "aks"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      cluster = "aks-telegraf"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   [agent]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      interval = "30s"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      round_interval = true</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">      metric_batch_size = 1000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      metric_buffer_limit = 10000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      collection_jitter = "5s"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      flush_interval = "30s"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      flush_jitter = "5s"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      precision = ""</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      hostname = "\$HOSTNAME"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      omit_hostname = false</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Custom script to parse ip -s link output</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   [[inputs.exec]]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      commands = ["/usr/local/bin/parse_ip_stats.sh"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      timeout = "10s"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      data_format = "influx"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      name_override = "network_interface_stats"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">   # Prometheus metrics output</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   [[outputs.prometheus_client]]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      listen = ":2112"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      metric_version = 2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      path = "/metrics"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      expiration_interval = "60s"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      collectors_exclude = ["gocollector", "process"]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
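<p>One detail worth calling out: the heredoc delimiter is unquoted, so the shell expands variables while writing the file. That is why the config uses <code>\$HOSTNAME</code>. The backslash keeps a literal <code>$HOSTNAME</code> in the file, which Telegraf can then resolve from the pod's environment when it loads the config (assuming the variable is set there). The minimal sketch below shows both behaviours side by side; <code>CLUSTER_TAG</code> and <code>render_sample</code> are hypothetical names used only for illustration.</p>

```shell
CLUSTER_TAG="aks-telegraf"   # hypothetical variable, for illustration only

render_sample() {
  # Unquoted delimiter: ${CLUSTER_TAG} expands now, \$HOSTNAME stays literal.
  cat << EOF
cluster = "${CLUSTER_TAG}"
hostname = "\$HOSTNAME"
EOF
}

render_sample
# prints:
#   cluster = "aks-telegraf"
#   hostname = "$HOSTNAME"
```

<p>If you drop the backslash, the file will contain whatever <code>HOSTNAME</code> is on the machine where you run <code>cat</code>, not the node's hostname.</p>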
</li>
<li>
<p>Create the Network Parsing Script</p>
<p>Now create the ConfigMap containing our custom script that parses network interface statistics:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF &gt; 02-scripts-configmap.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ConfigMap</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: telegraf-scripts</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">data:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  parse_ip_stats.sh: |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   #!/bin/bash</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Script to parse ip -s link output and convert to InfluxDB line protocol</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Get the current timestamp in nanoseconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   timestamp=\$(date +%s%N)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   hostname=\$(hostname)</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">   </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Parse ip -s link output for network statistics</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   ip -s link | awk -v ts="\$timestamp" -v host="\$hostname" '</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   BEGIN {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      interface = "";</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      state = "";</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      mtu = 0;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Parse interface line (e.g., "2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 ...")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   /^[0-9]+:/ {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      # Extract interface name (handle both regular and @ notation)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      if (match(\$0, /^[0-9]+: ([^:@]+)/)) {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            interface_match = substr(\$0, RSTART, RLENGTH);</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            # Remove the number and colon prefix, then trim spaces</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            gsub(/^[0-9]+: */, "", interface_match);</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">            interface = interface_match;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      # Extract state from flags</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      if (match(\$0, /&lt;[^&gt;]+&gt;/)) {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            flags = substr(\$0, RSTART+1, RLENGTH-2);</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            if (index(flags, "UP")) {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">               state = "up";</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            } else {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">               state = "down";</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      # Extract MTU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      if (match(\$0, /mtu [0-9]+/)) {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            mtu_str = substr(\$0, RSTART+4, RLENGTH-4);</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            mtu = mtu_str + 
0;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Parse RX line header (RX: bytes packets errors dropped missed mcast)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   /^[[:space:]]*RX:.*bytes.*packets.*errors.*dropped.*missed.*mcast/ {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      getline; # Get the next line with the actual numbers</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      gsub(/^[[:space:]]+/, ""); # Remove leading spaces</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      n = split(\$0, rx_fields);</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      if (n &gt;= 6) {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            rx_bytes = rx_fields[1];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            rx_packets = rx_fields[2];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            rx_errors = rx_fields[3];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            rx_dropped = rx_fields[4];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            rx_missed = rx_fields[5];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            rx_multicast = rx_fields[6];</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   # Parse TX line header (TX: bytes packets errors dropped carrier collsns)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   /^[[:space:]]*TX:.*bytes.*packets.*errors.*dropped.*carrier.*collsns/ {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      getline; # Get the next line with the actual numbers</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      gsub(/^[[:space:]]+/, ""); # Remove leading spaces</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      n = split(\$0, tx_fields);</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      if (n &gt;= 6 &amp;&amp; interface != "" &amp;&amp; interface != "lo") {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            tx_bytes = tx_fields[1];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            tx_packets = tx_fields[2];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            tx_errors = tx_fields[3];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            tx_dropped = tx_fields[4];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            tx_carrier = tx_fields[5];</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            tx_collisions = tx_fields[6];</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">            </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            # Output metrics after processing both RX and TX (skip loopback)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            printf "network_interface_stats,interface=%s,hostname=%s,state=\"%s\" ", interface, host, state;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            printf "mtu=%si,", mtu;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            printf "rx_bytes=%si,rx_packets=%si,rx_errors=%si,rx_dropped=%si,rx_missed=%si,rx_multicast=%si,", rx_bytes, rx_packets, rx_errors, rx_dropped, rx_missed, rx_multicast;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            printf "tx_bytes=%si,tx_packets=%si,tx_errors=%si,tx_dropped=%si,tx_carrier=%si,tx_collisions=%si ", tx_bytes, tx_packets, tx_errors, tx_dropped, tx_carrier, tx_collisions;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            printf "%s\n", ts;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   '</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create the Service Account</p>
<p>Create a service account for RBAC permissions:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF &gt; 03-serviceaccount.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ServiceAccount</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: telegraf-sa</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create the DaemonSet</p>
<p>Now create the main DaemonSet that runs Telegraf on each node:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF &gt; 04-daemonset.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: apps/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: DaemonSet</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    matchLabels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      serviceAccountName: telegraf-sa</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      hostNetwork: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      hostPID: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      tolerations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - key: node-role.kubernetes.io/master</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        operator: Exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        effect: NoSchedule</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - key: node-role.kubernetes.io/control-plane</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        operator: Exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        effect: NoSchedule</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        image: telegraf:1.28</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        env:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: HOSTNAME</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          valueFrom:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            fieldRef:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              fieldPath: spec.nodeName</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ports:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          containerPort: 2112</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          protocol: TCP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        securityContext:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          privileged: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          runAsUser: 0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        volumeMounts:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: telegraf-config</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          mountPath: /etc/telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: telegraf-scripts</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          mountPath: /scripts</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: proc</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          mountPath: /host/proc</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          readOnly: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: sys</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          mountPath: /host/sys</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          readOnly: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - name: var-run-docker</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          mountPath: /var/run/docker.sock</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          readOnly: true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        resources:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          requests:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            memory: "64Mi"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            cpu: "100m"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          limits:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            memory: "128Mi"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            cpu: "200m"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        command:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - /bin/bash</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - -c</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          # Install iproute2 if not present</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          if ! command -v ip &gt; /dev/null 2&gt;&amp;1; then</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            apt-get update &amp;&amp; apt-get install -y iproute2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          fi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          # Copy the parsing script to the expected location</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          cp /scripts/parse_ip_stats.sh /usr/local/bin/parse_ip_stats.sh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          chmod +x /usr/local/bin/parse_ip_stats.sh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          # Start telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          exec telegraf --config /etc/telegraf/telegraf.conf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      volumes:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: telegraf-config</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        configMap:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          name: telegraf-config</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          defaultMode: 0755</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: telegraf-scripts</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        configMap:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          name: telegraf-scripts</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          defaultMode: 0755</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: proc</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        hostPath:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          path: /proc</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: sys</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        hostPath:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          path: /sys</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: var-run-docker</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        hostPath:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          path: /var/run/docker.sock</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      terminationGracePeriodSeconds: 30</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create the Service</p>
<p>Create a service to expose the Prometheus metrics endpoint:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF &gt; 05-service.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: telegraf-metrics</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ports:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    port: 2112</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    targetPort: 2112</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    protocol: TCP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  type: ClusterIP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
</li>
<li>
<p>Create the PodMonitor</p>
<p>Finally, create the PodMonitor that tells Azure Monitor managed service for Prometheus to scrape our metrics:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">cat &lt;&lt;EOF &gt; 06-podmonitor.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: azmonitoring.coreos.com/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: PodMonitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: telegraf-podmonitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    matchLabels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      app: telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  podMetricsEndpoints:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - port: prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    interval: 30s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    path: /metrics</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
</li>
<li>
<p>Deploy All Components</p>
<p>Now deploy all the components in order:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f 01-configmap.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f 02-scripts-configmap.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f 03-serviceaccount.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f 04-daemonset.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f 05-service.yaml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f 06-podmonitor.yaml</span><br></span></code></pre></div></div>
</li>
<li>
<p>Verification</p>
<p>After a minute or two, verify everything is running:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get daemonset telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get pods -l app=telegraf</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get service telegraf-metrics</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get podmonitor telegraf-podmonitor</span><br></span></code></pre></div></div>
</li>
<li>
<p>Validate the Metrics</p>
<p>To check that the new metrics are being collected, forward the <code>telegraf-metrics</code> service port to your machine and run <code>curl</code> against it:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl port-forward svc/telegraf-metrics 2112:2112 &amp;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl http://localhost:2112/metrics | head -20</span><br></span></code></pre></div></div>
<p>Sample output:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># HELP network_interface_stats_rx_bytes Telegraf collected metric</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># TYPE network_interface_stats_rx_bytes untyped</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">network_interface_stats_rx_bytes{interface="eth0",host="aks-node-1"} 16876971289</span><br></span></code></pre></div></div>
<p>Great! At this point we know that our collection is working. Next we will look into how to visualize these new metrics in Grafana.</p>
</li>
</ol>
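<p>Once the endpoint responds, it can help to summarize which <code>network_interface_stats_*</code> metrics are exposed and how many series each one has. The following is a minimal sketch, not part of the original setup: the <code>summarize_interface_metrics</code> function name and the sample values are assumptions made for illustration.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original post): reads
# Prometheus exposition text on stdin and prints "<metric> <series_count>"
# for every network_interface_stats_* metric it finds.
summarize_interface_metrics() {
  grep -E '^network_interface_stats_[a-z_]+\{' \
    | sed -E 's/\{.*//' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Illustrative input modeled on the sample output above; values are made up.
sample_metrics='# HELP network_interface_stats_rx_bytes Telegraf collected metric
# TYPE network_interface_stats_rx_bytes untyped
network_interface_stats_rx_bytes{interface="eth0",host="aks-node-1"} 16876971289
network_interface_stats_rx_bytes{interface="eth1",host="aks-node-1"} 1024
network_interface_stats_tx_bytes{interface="eth0",host="aks-node-1"} 2048'

printf '%s\n' "$sample_metrics" | summarize_interface_metrics
```

<p>While the port-forward is active, the same function could read live data, for example by piping <code>curl -s http://localhost:2112/metrics</code> into it.</p>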
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="visualize-in-grafana">Visualize in Grafana<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#visualize-in-grafana" class="hash-link" aria-label="Direct link to Visualize in Grafana" title="Direct link to Visualize in Grafana" translate="no">​</a></h2>
<p>You can now go to your new <strong>Azure Managed Grafana</strong> instance and try some queries. To get the URL for your <strong>Azure Managed Grafana</strong>, run the following command:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">GRAFANA_UI=$(az grafana show \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name ${GRAFANA_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group ${RG_NAME} \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --query "properties.endpoint" -o tsv)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">echo "Your Azure Managed Grafana is accessible at: $GRAFANA_UI"</span><br></span></code></pre></div></div>
<p>Now that you know the URL, open Azure Managed Grafana and go to the <code>Drilldown</code> tab.</p>
<p><img decoding="async" loading="lazy" alt="Explore" src="https://blog.aks.azure.com/assets/images/grafana-drilldown-e26935f2a7377d3a82d46a7a40bc31f6.png" width="302" height="396" class="img_ev3q"></p>
<p>Make sure the Data Source is <code>Managed_Prometheus_telegraf-on-aks</code>.</p>
<p><img decoding="async" loading="lazy" alt="Datasource" src="https://blog.aks.azure.com/assets/images/grafana-datasource-80685941f4e3b40e36c97d5867be93b9.png" width="614" height="193" class="img_ev3q"></p>
<p>Search for <code>network_interface_</code> metrics. You should see all of the new metrics that Telegraf is collecting.</p>
<p><img decoding="async" loading="lazy" alt="Drilldown" src="https://blog.aks.azure.com/assets/images/drilldown-metrics-8425036437d88bc016f831f1d8620501.png" width="1890" height="1001" class="img_ev3q"></p>
<p>Next, you can create <strong>table panels</strong> with <code>Instant</code> queries for top-N views or <strong>time series panels</strong> for trends over time. Here are some suggested queries:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Network throughput by node</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sum(rate(network_interface_stats_rx_bytes[5m])) by (host)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Top interfaces by traffic</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">topk(10, network_interface_stats_tx_bytes)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Packet drops</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sum(rate(network_interface_stats_rx_dropped[5m])) by (interface)</span><br></span></code></pre></div></div>
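<p>The <code>rate(...[5m])</code> expressions above turn ever-growing counters into per-second values. As a rough illustration of the arithmetic, here is a simplified sketch. It assumes only two samples and skips the extrapolation that PromQL's <code>rate()</code> applies across the window; the <code>counter_rate</code> function name and the sample values are hypothetical.</p>

```shell
#!/usr/bin/env bash
# Simplified per-second rate between two counter samples.
# Assumptions: integer division, no window extrapolation, and a decrease is
# treated as a counter reset (the new value is counted from zero).
counter_rate() {
  local prev="$1" curr="$2" seconds="$3"
  local delta
  if [ "$curr" -ge "$prev" ]; then
    delta=$(( curr - prev ))
  else
    delta="$curr"
  fi
  echo $(( delta / seconds ))
}

# Two hypothetical rx_bytes samples taken 30 seconds apart (one scrape interval):
# 300000 bytes received in 30 seconds.
counter_rate 16876971289 16877271289 30   # prints 10000
```

<p>Because the underlying values are cumulative counters, wrapping them in <code>rate()</code> is usually what you want for throughput and drop panels, while the raw counter shows totals since the interface counters last reset.</p>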
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cleaning-up">Cleaning up<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#cleaning-up" class="hash-link" aria-label="Direct link to Cleaning up" title="Direct link to Cleaning up" translate="no">​</a></h2>
<p>To remove these resources, delete the resource group. This removes every resource in the group, so make sure it contains nothing you want to keep:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az group delete --name ${RG_NAME} --yes --no-wait</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://blog.aks.azure.com/2025/11/06/custom-monitoring-with-telegraf#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>In this post, we walked through one approach to bringing custom metrics into Azure’s managed monitoring stack with minimal setup: <code>Telegraf DaemonSet</code> for flexible metric collection, <code>Azure Monitor managed service for Prometheus</code> for scraping and storage, and <a href="https://learn.microsoft.com/en-us/azure/managed-grafana/overview" target="_blank" rel="noopener noreferrer">Azure Managed Grafana</a> for visualization and alerting.</p>
<p>While our example used network metrics, the same pattern applies to any custom data source you want to monitor in AKS. If you want to take this example one step further, try the hands-on <a href="https://azure-samples.github.io/aks-labs/docs/operations/observability-and-monitoring" target="_blank" rel="noopener noreferrer">AKS Labs: Advanced Observability Concepts</a> and watch <a href="https://www.youtube.com/watch?v=Dc0TqbAkQX0" target="_blank" rel="noopener noreferrer">Observability with Managed Prometheus and Managed Grafana at the Microsoft Reactor</a>.</p>]]></content:encoded>
            <category>General</category>
            <category>Operations</category>
            <category>Observability</category>
        </item>
        <item>
            <title><![CDATA[Scaling multi-node LLM inference with NVIDIA Dynamo and ND GB200 NVL72 GPUs on AKS]]></title>
            <link>https://blog.aks.azure.com/2025/10/24/dynamo-on-aks</link>
            <guid>https://blog.aks.azure.com/2025/10/24/dynamo-on-aks</guid>
            <pubDate>Fri, 24 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how optimizing routing, planning, and system setup boosts the efficiency and reliability of multi-node LLM inference on AKS with OSS Dynamo and ND GB200-v6 GPUs.]]></description>
            <content:encoded><![CDATA[<p><em>This blog post is co-authored with
<a href="https://www.linkedin.com/in/rohan-s-varma/" target="_blank" rel="noopener noreferrer">Rohan Varma</a>,
<a href="https://www.linkedin.com/in/sa126/" target="_blank" rel="noopener noreferrer">Saurabh Aggarwal</a>,
<a href="https://www.linkedin.com/in/anish-maddipoti/" target="_blank" rel="noopener noreferrer">Anish Maddipoti</a>, and
<a href="https://www.linkedin.com/in/meleegy/" target="_blank" rel="noopener noreferrer">Amr Elmeleegy</a> from NVIDIA
to showcase solutions
that help customers run AI inference at scale using Azure Kubernetes Service
(AKS) and NVIDIA’s advanced hardware and distributed inference frameworks.</em></p>
<p>Modern language models now routinely exceed the compute and memory capacity of
a single GPU or even a whole node with multiple GPUs on Kubernetes.
Consequently, inference at the
scale of billions of model parameters demands multi-node, distributed
deployment. Frameworks like the <a href="https://github.com/ai-dynamo/dynamo" target="_blank" rel="noopener noreferrer">open-source NVIDIA Dynamo platform</a>
play a crucial role by coordinating execution across nodes, managing
memory resources efficiently, and accelerating data transfers between GPUs
to keep latency low.</p>
<p>However, software alone cannot solve these challenges. The underlying hardware
must also support this level of scale and throughput. Rack-scale systems like
<a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nd-gb200-v6-series" target="_blank" rel="noopener noreferrer">Azure ND GB200-v6</a>
VMs, accelerated by NVIDIA GB200 NVL72, meet this need by integrating
72 NVIDIA Blackwell
GPUs in a distributed GPU setup connected via high-bandwidth, low-latency
interconnect. This architecture uses the rack as a unified compute engine
and enables fast, efficient communication and scaling that traditional
multi-node setups struggle to achieve.</p>
<p>For more demanding or unpredictable workloads, even combining advanced hardware with a distributed inference framework
is not sufficient on its own. Inference traffic can spike without warning, and
static configurations with predetermined resource
allocation can lead to GPU underutilization or overprovisioning. Instead,
inference infrastructure must adjust in real time, scaling
resources up or down to match current demand without wasting GPU capacity
or risking performance degradation.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-holistic-solution-nd-gb200-v6-vms-and-dynamo-on-aks">A holistic solution: ND GB200-v6 VMs and Dynamo on AKS<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#a-holistic-solution-nd-gb200-v6-vms-and-dynamo-on-aks" class="hash-link" aria-label="Direct link to A holistic solution: ND GB200-v6 VMs and Dynamo on AKS" title="Direct link to A holistic solution: ND GB200-v6 VMs and Dynamo on AKS" translate="no">​</a></h2>
<p>To effectively address the variability in inference traffic in distributed
deployments, our approach combines three key components: ND GB200-v6
VMs, the NVIDIA Dynamo inference framework, and an Azure Kubernetes
Service (AKS) cluster. Together, these technologies provide the scale,
flexibility, and responsiveness necessary to meet the demands of modern,
large-scale inference workloads.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="nd-gb200-v6-rack-scale-accelerated-hardware">ND GB200-v6: Rack-Scale Accelerated Hardware<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#nd-gb200-v6-rack-scale-accelerated-hardware" class="hash-link" aria-label="Direct link to ND GB200-v6: Rack-Scale Accelerated Hardware" title="Direct link to ND GB200-v6: Rack-Scale Accelerated Hardware" translate="no">​</a></h3>
<p>At the core of Azure’s <a href="https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nd-gb200-v6-series" target="_blank" rel="noopener noreferrer">ND GB200-v6 VM series</a>
is the liquid-cooled NVIDIA GB200 NVL72 system, a rack-scale architecture
that integrates 72 NVIDIA Blackwell GPUs and 36 NVIDIA Grace™ CPUs into a
single, tightly coupled domain.</p>
<p>The rack-scale design of ND GB200-v6 unlocks model serving patterns that were
previously infeasible due to interconnect and memory bandwidth constraints.</p>
<p><img decoding="async" loading="lazy" alt="NVIDIA GB200 NVL72 system" src="https://blog.aks.azure.com/assets/images/GB200_v6_arch-866a00c62b79d6de64ad53ac70613849.png" width="1600" height="898" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="nvidia-dynamo-a-distributed-inference-framework">NVIDIA Dynamo: a distributed inference framework<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#nvidia-dynamo-a-distributed-inference-framework" class="hash-link" aria-label="Direct link to NVIDIA Dynamo: a distributed inference framework" title="Direct link to NVIDIA Dynamo: a distributed inference framework" translate="no">​</a></h3>
<p><a href="https://www.nvidia.com/en-us/ai/dynamo/" target="_blank" rel="noopener noreferrer">NVIDIA Dynamo</a> is an open-source
distributed inference serving framework that supports multiple engine
backends, including <a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer">vLLM</a>,
<a href="https://github.com/NVIDIA/TensorRT-LLM" target="_blank" rel="noopener noreferrer">TensorRT-LLM</a>, and
<a href="https://github.com/sgl-project/sglang" target="_blank" rel="noopener noreferrer">SGLang</a>. It disaggregates the
prefill (compute-bound) and decode (memory-bound) phases across separate GPUs,
enabling independent scaling and phase-specific parallelism strategies.
For example, the memory-bound decode phase can leverage wide
<a href="https://developer.nvidia.com/blog/how-nvidia-gb200-nvl72-and-nvidia-dynamo-boost-inference-performance-for-moe-models/#boosting_moe_model_performance_with_disaggregated_serving%C2%A0" target="_blank" rel="noopener noreferrer">expert parallelism</a> (EP)
without constraining the compute-heavy prefill phase, improving overall
resource utilization and performance.</p>
<p>Dynamo includes an
<a href="https://docs.nvidia.com/dynamo/v-0-9-0/components/planner" target="_blank" rel="noopener noreferrer">SLA-based Planner</a>
that proactively manages GPU scaling for prefill/decode (PD) disaggregated
inference. Using pre-deployment profiling, it evaluates how model parallelism
and batching affect performance, recommending configurations that meet
latency targets like Time to First Token (TTFT) and Inter-Token Latency (ITL)
within a given GPU budget. At runtime, the Planner forecasts traffic with
time-series models, dynamically adjusting PD worker counts based on predicted
demand and real-time metrics.</p>
<p>The Dynamo
<a href="https://docs.nvidia.com/dynamo/v-0-9-0/components/router" target="_blank" rel="noopener noreferrer">LLM-aware Router</a>
manages the key-value (KV) cache across large GPU clusters by hashing requests
and tracking cache locations. It calculates overlap scores between incoming
requests and cached KV blocks, routing requests to GPUs that maximize cache
reuse while balancing workload. This cache-aware routing reduces costly KV
recomputation and avoids bottlenecks, which in turn improves performance, especially for
large models with long context windows.</p>
<p>To reduce GPU memory overhead, the Dynamo
<a href="https://github.com/ai-dynamo/dynamo/blob/f93b619ad9c6dfe820fbf08b79f1f9eedec4a62c/docs/kvbm/kvbm_architecture.md" target="_blank" rel="noopener noreferrer">KV Block Manager</a>
offloads infrequently accessed KV blocks to CPU RAM, SSDs, or object storage.
It supports hierarchical caching and intelligent eviction policies across
nodes, scaling cache storage to petabyte levels while preserving reuse
efficiency.</p>
<p>Dynamo’s disaggregated execution model is especially effective for large,
dynamic inference workloads where compute and memory demands shift across
phases. The Azure Research paper <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2023/12/Splitwise_ISCA24.pdf" target="_blank" rel="noopener noreferrer">"Splitwise: Efficient generative LLM
inference using phase splitting"</a>
demonstrated the benefits of separating the compute-intensive prefill and
memory-bound decode phases of LLM inference onto different hardware. We will
explore this disaggregated model in detail in an upcoming blog post.</p>
<p><img decoding="async" loading="lazy" alt="Dynamo project key features" src="https://blog.aks.azure.com/assets/images/dynamo_features-04dd30fd6edac98ed86c14ec21fca1af.png" width="1600" height="896" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-dynamo-can-optimize-ai-product-recommendations-in-e-commerce-apps">How Dynamo can optimize AI product recommendations in e-commerce apps<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#how-dynamo-can-optimize-ai-product-recommendations-in-e-commerce-apps" class="hash-link" aria-label="Direct link to How Dynamo can optimize AI product recommendations in e-commerce apps" title="Direct link to How Dynamo can optimize AI product recommendations in e-commerce apps" translate="no">​</a></h3>
<p>Let’s put Dynamo’s features in context by walking through a realistic app
scenario and exploring how the framework addresses common inference challenges
on AKS.</p>
<p>Imagine you operate a large e-commerce platform (or provide infrastructure for
one), where customers browse thousands of products in real time. The app runs
on AKS and experiences traffic surges during sales, launches, and seasonal
events. The app also leverages LLMs to generate natural language outputs,
such as:</p>
<ul>
<li>Context-aware product recommendations</li>
<li>Dynamic product descriptions</li>
<li>AI-generated upsells based on behavior, reviews, or search queries</li>
</ul>
<p>This architecture powers user experiences like: “Customers who viewed this
camera also looked at these accessories, chosen for outdoor use and battery
compatibility.” Personalized product copy is dynamically rewritten for
different segments, such as “For photographers” vs. “For frequent travelers.”</p>
<p>Behind the scenes, this requires a multi-stage LLM pipeline: retrieving
product/user context, running prompted inference, and generating natural
language outputs per session.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="common-pain-points-and-how-dynamo-tackles-them">Common pain points and how Dynamo tackles them<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#common-pain-points-and-how-dynamo-tackles-them" class="hash-link" aria-label="Direct link to Common pain points and how Dynamo tackles them" title="Direct link to Common pain points and how Dynamo tackles them" translate="no">​</a></h3>
<ol>
<li>
<p><strong>Heavy Prefill + Lightweight Decode = GPU Waste</strong></p>
<p>Generating personalized recommendations requires a heavy prefill stage
(processing more than 8,000 tokens of context) but results in short outputs
(~50 tokens). Running both on a single GPU can be inefficient.</p>
<p><strong>Dynamo Solution</strong>: The pipeline is split into two distinct stages, each
deployed on separate GPUs. This allows independent configuration of GPU
count and model parallelism for each phase. It also enables the use of
different GPU types—for example, GPUs with high compute capability but
lower memory for the prefill stage, and GPUs with both high compute and
large memory capacity for the decode stage.</p>
<p>In our e-commerce example, when a user lands on a product page:</p>
<ul>
<li>
<p>Prefill runs uninterrupted on dedicated GPUs using model parallelism
degrees optimized for accelerating math-intensive attention GEMM
operations. This enables fast processing of 8,000 tokens of user context
and product metadata.</p>
</li>
<li>
<p>Decode runs on a GPU pool with different counts and parallelism degrees
designed and tuned to maximize memory bandwidth and capacity for
generating the short product blurb.</p>
</li>
</ul>
<p><strong>Result</strong>: This approach maximizes GPU utilization and reduces per-request
cost.</p>
</li>
<li>
<p><strong>Meeting SLOs and handling traffic spikes without overprovisioning</strong></p>
<p>Your SLO might define time-to-first-token &lt; 300ms and 99th percentile
latency &lt; 500ms, but maintaining this across dynamic workloads is tough.
Static GPU allocation either bottlenecks during traffic spikes, causing
SLO violations, or leaves capacity idle when traffic is low.</p>
<p><strong>Dynamo Solution</strong>: The Planner continuously monitors metrics and auto-scales GPU
replicas or reallocates GPUs between prefill and decode stages based on
real-time traffic patterns, queue depth, and latency targets.</p>
<p>In our e-commerce example:</p>
<ul>
<li>During Black Friday, Dynamo observes latency climbing due to a surge in
prefill demand. It responds by increasing prefill GPU replicas by 50%,
shifting GPUs from decode or spinning up additional ones.</li>
<li>At night, when email generation jobs dominate, Dynamo reallocates GPUs
back to decode to optimize throughput.</li>
<li>When load drops, resources scale back down.</li>
</ul>
<p><strong>Result</strong>: SLOs are met consistently without over- or under-provisioning,
controlling costs while maintaining performance.</p>
</li>
<li>
<p><strong>Recomputing shared context is wasteful</strong></p>
<p>Many requests within the same session reuse the same product or user
context but unnecessarily recompute the KV cache each time, wasting
valuable GPU resources that could be spent serving other user requests.</p>
<p><strong>Dynamo Solution</strong>: LLM-aware routing maintains a map of KV cache across
large GPU clusters and directs requests to the GPUs that already hold the
relevant KV cache, avoiding redundant computation.</p>
<p>In our e-commerce example:</p>
<ul>
<li>A user browses five similar items in one session.</li>
<li>Dynamo routes all requests to the same GPU that already has the user’s
or product’s context cached.</li>
</ul>
<p><strong>Result</strong>: Faster response times and reduced GPU usage.</p>
</li>
<li>
<p><strong>KV cache growth blows past GPU memory</strong></p>
<p>With many concurrent sessions and large input sequence lengths, the
KV cache (product data + user history) can exceed available GPU memory.
This can trigger evictions, leading to costly re-computations or inference
errors.</p>
<p><strong>Dynamo Solution</strong>: The KV Block Manager (KVBM) offloads cold or unused KV
cache data to CPU RAM, NVMe, or networked storage, freeing valuable GPU
memory for active requests.</p>
<p>In our e-commerce example:</p>
<ul>
<li>Without cache offloading: increasing the number of concurrent sessions
per GPU increases latency because of KV cache evictions and recomputation.</li>
<li>With Dynamo: GPUs can support higher concurrency while maintaining
low latency.</li>
</ul>
<p><strong>Result</strong>: Higher concurrency at lower cost, without degrading user
experience.</p>
</li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="enterprise-scale-inference-experiments--dynamo-with-gb200-running-on-aks">Enterprise-scale inference experiments:  Dynamo with GB200, running on AKS<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#enterprise-scale-inference-experiments--dynamo-with-gb200-running-on-aks" class="hash-link" aria-label="Direct link to Enterprise-scale inference experiments:  Dynamo with GB200, running on AKS" title="Direct link to Enterprise-scale inference experiments:  Dynamo with GB200, running on AKS" translate="no">​</a></h2>
<p>We set out to deploy the popular open-source
<a href="https://huggingface.co/openai/gpt-oss-120b" target="_blank" rel="noopener noreferrer">GPT-OSS 120B</a> reasoning model
using Dynamo on AKS on GB200 NVL72, adapting the
<a href="https://inferencemax.semianalysis.com/" target="_blank" rel="noopener noreferrer">SemiAnalysis InferenceMAX</a> recipe
for a large-scale, production-grade environment.</p>
<p><strong>Our approach</strong>: leverage Dynamo as the inference server and swap GB200 NVL72
nodes in place of NVIDIA HGX™ B200, scaling the deployment
across multiple nodes.</p>
<p>Our goal was to replicate the performance results reported by SemiAnalysis,
but at a larger scale within an AKS environment, proving that enterprise-scale
inference with cutting-edge hardware and open-source models is not only
possible, but practical.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="aks-deployment-overview">AKS Deployment Overview<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#aks-deployment-overview" class="hash-link" aria-label="Direct link to AKS Deployment Overview" title="Direct link to AKS Deployment Overview" translate="no">​</a></h3>
<p>Ready to build the same setup? Our comprehensive guide walks you through
each stage of the deployment:</p>
<ol>
<li><em>Set up your foundation:</em> Configure GPU node pools and prepare your
inference setup with the prerequisites you will need.</li>
<li><em>Deploy Dynamo via Helm:</em> Get the inference server running with the right
configurations for GB200 NVL72.</li>
<li><em>Benchmark performance with your serving engine:</em> Test and optimize latency/throughput under production conditions.</li>
</ol>
<p>Find the complete recipe for GPT-OSS 120B at
<a href="https://aka.ms/dynamo-recipe-gpt-oss-120b" target="_blank" rel="noopener noreferrer">aka.ms/dynamo-recipe-gpt-oss-120b</a>
and get hands-on with the deployment guide at
<a href="https://aka.ms/aks-dynamo" target="_blank" rel="noopener noreferrer">aka.ms/aks-dynamo</a>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-results">The results<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#the-results" class="hash-link" aria-label="Direct link to The results" title="Direct link to The results" translate="no">​</a></h3>
<p>By following this approach, we achieved 1.2 million tokens per second,
meeting our goal of replicating SemiAnalysis InferenceMAX results at enterprise
scale. This demonstrates that Dynamo on AKS running on ND GB200-v6 instances
can deliver the performance needed for production inference workloads.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="looking-ahead">Looking ahead<a href="https://blog.aks.azure.com/2025/10/24/dynamo-on-aks#looking-ahead" class="hash-link" aria-label="Direct link to Looking ahead" title="Direct link to Looking ahead" translate="no">​</a></h2>
<p>This work reflects a deep collaboration between Azure and NVIDIA to reimagine
how large-scale inference is built and operated, from the hardware up through
the software stack. By combining GB200 NVL72 nodes and the open-source Dynamo
project on AKS, we’ve taken a step toward making distributed inference faster,
more efficient, and more responsive to real-world demands.</p>
<p>This post focused on the foundational serving stack. In upcoming
blogs, we will build on this foundation and explore more of Dynamo's
advanced features, such as
<a href="https://github.com/ai-dynamo/dynamo/blob/9defc01b9b9c51a4a21abbb02907a4f1d5d2a2d2/examples/basics/disaggregated_serving/README.md#L4" target="_blank" rel="noopener noreferrer">Disaggregated Serving</a>
and <a href="https://docs.nvidia.com/dynamo/v-0-9-0/components/planner" target="_blank" rel="noopener noreferrer">SLA-based Planner</a>.
We'll demonstrate how these features allow for even greater efficiency, moving
from a static, holistic deployment to a flexible, phase-split architecture.
Moving forward, we also plan to extend our testing to include larger
mixture-of-experts (MoE) reasoning models such as DeepSeek R1.
We encourage you to try out the
<a href="https://aka.ms/dynamo-recipe-gpt-oss-120b" target="_blank" rel="noopener noreferrer">Dynamo recipe</a>
in this blog on <a href="https://aka.ms/aks-dynamo" target="_blank" rel="noopener noreferrer">AKS</a> and share your feedback!</p>]]></content:encoded>
            <category>Dynamo on AKS series</category>
            <category>AI</category>
            <category>GPU</category>
            <category>oss</category>
            <category>GB200</category>
        </item>
        <item>
            <title><![CDATA[How to Deploy AKS MCP Server on AKS with Workload Identity]]></title>
            <link>https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity</link>
            <guid>https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity</guid>
            <pubDate>Wed, 22 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Deploy the AKS MCP server directly on Azure Kubernetes Service with secure Workload Identity authentication. Learn how to centralize MCP server management, leverage AKS scalability, and maintain fine-grained access control with this complete guide.]]></description>
            <content:encoded><![CDATA[<p>It's been a few months since the <a href="https://blog.aks.azure.com/2025/08/06/aks-mcp-server">AKS-MCP server was announced</a>. Since then, there have been several updates and improvements. The MCP server can be easily installed on a local machine using the <a href="https://blog.aks.azure.com/2025/08/06/aks-mcp-server#getting-started-with-aks-mcp">AKS Extension for VS Code</a>, or via the <a href="https://github.com/mcp?q=AKS" target="_blank" rel="noopener noreferrer">GitHub MCP registry</a>, or even using the <a href="https://hub.docker.com/mcp/explore?search=AKS" target="_blank" rel="noopener noreferrer">Docker MCP hub</a>.</p>
<p>In this blog post, I'll show you one approach to running the AKS MCP server: deploying it inside an AKS cluster as a Streamable HTTP service. This pattern demonstrates how MCP servers can be centrally managed and made accessible to multiple clients—including AI assistants, automation tools, and even autonomous agents.</p>
<p>Before we get started, let's explore why you might consider this deployment pattern.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-deploy-mcp-servers-on-aks">Why Deploy MCP Servers on AKS?<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#why-deploy-mcp-servers-on-aks" class="hash-link" aria-label="Direct link to Why Deploy MCP Servers on AKS?" title="Direct link to Why Deploy MCP Servers on AKS?" translate="no">​</a></h2>
<p>Running MCP servers on AKS offers several potential advantages, depending on your requirements:</p>
<ul>
<li><strong>Centralized deployment</strong>: Host multiple MCP servers in a single cluster, making them accessible to various clients without requiring local installation on each machine</li>
<li><strong>Scalability and reliability</strong>: Leverage AKS's built-in features for scaling, monitoring, logging, and high availability</li>
<li><strong>Secure authentication</strong>: Use Workload Identity for passwordless authentication with fine-grained access control</li>
<li><strong>Multi-client access</strong>: Enable different teams, applications, or autonomous agents to connect to the same MCP server instance over HTTP</li>
<li><strong>Standardized patterns</strong>: Establish a repeatable deployment pattern that can be applied to other MCP servers beyond just the AKS MCP server</li>
<li><strong>Governance and auditability</strong>: Maintain centralized control over authentication and audit trails by collecting MCP server logs in one place, providing visibility into all tool invocations across your organization</li>
</ul>
<p>That said, this approach isn't the only way—or necessarily the best way—to run MCP servers. Local installations may be simpler for individual users, while other deployment patterns might better suit different use cases. The key is choosing the right approach for your specific requirements.</p>
<p>In this setup, we'll use a User-Assigned Managed Identity with Workload Identity to authenticate the MCP server to Azure, giving it the necessary permissions to manage both Kubernetes and Azure resources.</p>
<p>Let's dive in!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="prerequisites">Prerequisites<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#prerequisites" class="hash-link" aria-label="Direct link to Prerequisites" title="Direct link to Prerequisites" translate="no">​</a></h2>
<p>Before you begin, ensure you have the following:</p>
<ul>
<li><a href="https://azure.microsoft.com/free/" target="_blank" rel="noopener noreferrer">Azure subscription</a></li>
<li><a href="https://learn.microsoft.com/cli/azure/install-azure-cli" target="_blank" rel="noopener noreferrer">Azure CLI</a></li>
<li><a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/" target="_blank" rel="noopener noreferrer">kubectl</a></li>
<li><a href="https://nodejs.org/" target="_blank" rel="noopener noreferrer">Node.js and npm</a> for running MCP Inspector</li>
<li>POSIX-compliant shell (e.g., bash, zsh) or use <a href="https://shell.azure.com/" target="_blank" rel="noopener noreferrer">Azure Cloud Shell</a></li>
</ul>
<p>With an Azure subscription and the Azure CLI installed, log into your Azure account.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az login</span><br></span></code></pre></div></div>
<p>This walkthrough uses one AKS preview feature, so run the following command to install the <code>aks-preview</code> Azure CLI extension.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az extension add --name aks-preview</span><br></span></code></pre></div></div>
<p>Register the <strong>DisableSSHPreview</strong> feature in your subscription. We don't need SSH access for this cluster, so we'll disable it at cluster creation time.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az feature register --name DisableSSHPreview --namespace Microsoft.ContainerService</span><br></span></code></pre></div></div>
<p>Wait for the feature to be registered. You can check the status with the following command:</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az feature show --name DisableSSHPreview --namespace Microsoft.ContainerService</span><br></span></code></pre></div></div>
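<p>If you'd rather not re-run the check by hand, a small polling loop can wait for you. This is a minimal sketch, not part of the original walkthrough: it reads the <code>properties.state</code> field from the command output, and the 30-second interval is an arbitrary choice you can adjust.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Poll until the feature reports the Registered state</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">until [ "$(az feature show --name DisableSSHPreview --namespace Microsoft.ContainerService --query properties.state -o tsv)" = "Registered" ]; do</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  echo "Waiting for DisableSSHPreview to finish registering..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  sleep 30 # arbitrary polling interval</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">done</span><br></span></code></pre></div></div>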
<p>Once the feature is registered, refresh the registration of the Microsoft.ContainerService resource provider.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az provider register --namespace Microsoft.ContainerService</span><br></span></code></pre></div></div>
<p>With the prerequisites in place, you're ready to start setting up the AKS MCP server on AKS.</p>
<p>Run the following command to export a random name variable to use for resource names to avoid naming conflicts.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export RANDOM_NAME=$(petname) # If you don't have petname, replace this entire command with e.g. export RANDOM_NAME=myrandomname</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>I am working on an Ubuntu 24.04 LTS machine with the <a href="https://manpages.ubuntu.com/manpages/jammy/en/man1/petname.1.html" target="_blank" rel="noopener noreferrer">petname</a> utility installed to generate random names. If you don't have access to this package, you can simply replace the <code>petname</code> command with any random name of your choice.</p></div></div>
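<p>As an alternative, here is a minimal sketch that falls back to a timestamp-based name when <code>petname</code> isn't installed. The <code>mcp</code> prefix and the five-digit suffix are illustrative choices, not part of the original setup; any short, unique string should work just as well.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Use petname when available; otherwise build a short suffix from the clock</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">if command -v petname &gt;/dev/null 2&gt;&amp;1; then</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  export RANDOM_NAME=$(petname)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">else</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  export RANDOM_NAME="mcp$(date +%s | tail -c 6)" # illustrative fallback: last five digits of the epoch time</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">fi</span><br></span></code></pre></div></div>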
<p>Set a few environment variables to use throughout.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export LOCATION=westus3 # or any Azure region of your choice</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export RESOURCE_GROUP_NAME=rg-$RANDOM_NAME</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export AKS_CLUSTER_NAME=aks-$RANDOM_NAME</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">export MANAGED_IDENTITY_NAME=mi-$RANDOM_NAME</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="provision-azure-resources">Provision Azure resources<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#provision-azure-resources" class="hash-link" aria-label="Direct link to Provision Azure resources" title="Direct link to Provision Azure resources" translate="no">​</a></h2>
<p>You're now ready to create the necessary Azure resources. Run the following command to create a resource group and retrieve the resource group ID. This will be used later when assigning Azure RBAC roles to the managed identity.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">read -r RESOURCE_GROUP_ID &lt;&lt;&lt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "$(az group create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name $RESOURCE_GROUP_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --location $LOCATION \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --query '{id:id}' -o tsv)"</span><br></span></code></pre></div></div>
<p>For demonstration purposes, a single-node AKS cluster is enough. Run the following command to create one with Workload Identity enabled and capture the OIDC issuer URL. This value is used later when configuring the federated identity credential.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">read -r AKS_OIDC_ISSUER_URL &lt;&lt;&lt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "$(az aks create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --resource-group $RESOURCE_GROUP_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --name $AKS_CLUSTER_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --enable-workload-identity \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --enable-oidc-issuer \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --ssh-access disabled \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --node-count 1 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --query '{oidcIssuerUrl:oidcIssuerProfile.issuerUrl}' -o tsv)"</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>The <code>--ssh-access disabled</code> flag is used to disable SSH access to the nodes in the AKS cluster. This is a security best practice as it reduces the attack surface of the cluster.</p></div></div>
<p>To allow the MCP server to authenticate to Azure, you'll need to create a User-Assigned Managed Identity and configure it for Workload Identity.</p>
<p>Create a new User-Assigned Managed Identity in the same resource group as your AKS cluster and capture its principal ID and client ID. The principal ID is used for the Azure role assignment, and the client ID is used for the Service Account annotation.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">read -r PRINCIPAL_ID CLIENT_ID &lt;&lt;&lt; \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "$(az identity create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --resource-group $RESOURCE_GROUP_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --name $MANAGED_IDENTITY_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    --query '{principalId:principalId, clientId:clientId}' -o tsv)"</span><br></span></code></pre></div></div>
<p>Create a federated identity credential for the managed identity. This allows the Service Account in the AKS cluster to authenticate as the managed identity.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az identity federated-credential create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name $MANAGED_IDENTITY_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --identity-name $MANAGED_IDENTITY_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group $RESOURCE_GROUP_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --issuer $AKS_OIDC_ISSUER_URL \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --subject system:serviceaccount:default:aks-mcp \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --audiences api://AzureADTokenExchange</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>The following parameters are used in the command above:</p><ul>
<li><code>--issuer</code>: The OIDC issuer URL of the AKS cluster. This is obtained from the AKS cluster properties.</li>
<li><code>--subject</code>: The subject claim that identifies the Service Account in the AKS cluster. In this case, it's <code>system:serviceaccount:default:aks-mcp</code>, which corresponds to the <code>aks-mcp</code> Service Account in the <code>default</code> namespace.</li>
<li><code>--audiences</code>: The audience for the token. For Workload Identity, this should be <code>api://AzureADTokenExchange</code>.</li>
</ul></div></div>
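<p>The <code>--subject</code> value has to match the Service Account's namespace and name exactly, so it can help to build it from those two parts rather than typing it by hand. The sketch below illustrates the format only; <code>wi_subject</code> is a hypothetical helper name.</p>

```sh
# Build the federated credential subject for a Kubernetes Service Account.
# wi_subject is a hypothetical helper used only for illustration.
wi_subject() {
  namespace=$1
  service_account=$2
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

wi_subject default aks-mcp
# prints: system:serviceaccount:default:aks-mcp
```

<p>If you later move the Service Account to another namespace, the federated credential needs a matching subject, for example <code>--subject "$(wi_subject my-namespace aks-mcp)"</code>, where <code>my-namespace</code> is a placeholder.</p>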
<p>Grant the managed identity the <strong>Contributor</strong> role for the resource group so it can manage resources associated with the cluster. This includes permissions to manage the AKS cluster itself as well as any other resources in the resource group such as monitoring, networking, and storage resources.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az role assignment create \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --assignee $PRINCIPAL_ID \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --role "Contributor" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --scope $RESOURCE_GROUP_ID</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>danger</div><div class="admonitionContent_BuS1"><p>You are granting the managed identity a broad set of permissions that may exceed the minimum required for your MCP server to function. This is only for demonstration purposes. In a production environment, you should follow the principle of least privilege and assign only the necessary permissions required for your use case.</p></div></div>
<p>Connect to the AKS cluster.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az aks get-credentials \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --resource-group $RESOURCE_GROUP_NAME \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --name $AKS_CLUSTER_NAME</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>If you don't have <code>kubectl</code> installed, you can use Azure Cloud Shell in the Azure portal, which has <code>kubectl</code> pre-installed, or run the <code>az aks install-cli</code> command to install it locally.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-kubernetes-resources">Deploy Kubernetes resources<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#deploy-kubernetes-resources" class="hash-link" aria-label="Direct link to Deploy Kubernetes resources" title="Direct link to Deploy Kubernetes resources" translate="no">​</a></h2>
<p>Create a Service Account in the AKS cluster and annotate it with the managed identity client ID.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ServiceAccount</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  annotations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    azure.workload.identity/client-id: $CLIENT_ID</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>The annotation <code>azure.workload.identity/client-id</code> is used to associate the Service Account with the managed identity. This allows the pods that use this Service Account to authenticate to Azure using the managed identity.</p></div></div>
<p>Create a ClusterRoleBinding to bind the <code>cluster-admin</code> role to the Service Account. This allows the MCP server to manage resources in the AKS cluster.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: rbac.authorization.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ClusterRoleBinding</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: aks-mcp-cluster-admin</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">roleRef:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  apiGroup: rbac.authorization.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  kind: ClusterRole</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: cluster-admin</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">subjects:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - kind: ServiceAccount</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>danger</div><div class="admonitionContent_BuS1"><p>You are granting the Service Account a broad set of permissions that may exceed the minimum required for your MCP server to function. This is only for demonstration purposes. In a production environment, you should follow the principle of least privilege and assign only the necessary permissions required for your use case.</p></div></div>
<p>Deploy the AKS MCP server as a Deployment in the AKS cluster. The Service that exposes it is created in a later step.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: apps/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Deployment</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  replicas: 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    matchLabels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      app: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  template:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        app: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        azure.workload.identity/use: "true"</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">    spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      serviceAccountName: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      containers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - name: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        image: ghcr.io/azure/aks-mcp:v0.0.9</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        args:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - --access-level=readwrite</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - --transport=streamable-http</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - --host=0.0.0.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - --port=8000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - --timeout=600</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ports:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - containerPort: 8000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          name: http</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        resources: {}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Let's break down some of the key parameters used in the Deployment manifest above:</p><ul>
<li><code>azure.workload.identity/use: "true"</code>: This label indicates that the pod should use Workload Identity for authentication.</li>
<li><code>serviceAccountName: aks-mcp</code>: This specifies the Service Account that the pod will use. This Service Account is annotated with the managed identity client ID, allowing the pod to authenticate to Azure using the managed identity.</li>
<li><code>image: ghcr.io/azure/aks-mcp:v0.0.9</code>: This is the container image for the AKS MCP server. You can check for the latest version on the <a href="https://github.com/Azure/aks-mcp/releases" target="_blank" rel="noopener noreferrer">AKS MCP releases page</a>.</li>
<li><code>args</code>: These are the command-line arguments passed to the MCP server:<!-- -->
<ul>
<li><code>--access-level=readwrite</code>: This sets the access level of the MCP server to read-write, allowing it to manage resources.</li>
<li><code>--transport=streamable-http</code>: This configures the MCP server to use Streamable HTTP as the transport protocol.</li>
<li><code>--host=0.0.0.0</code>: This binds the MCP server to all network interfaces in the pod.</li>
<li><code>--port=8000</code>: This sets the port on which the MCP server will listen for incoming requests.</li>
<li><code>--timeout=600</code>: This sets the timeout for requests to 600 seconds.</li>
</ul>
</li>
</ul><p>You can refer to the <a href="https://github.com/Azure/aks-mcp?tab=readme-ov-file#options" target="_blank" rel="noopener noreferrer">AKS MCP documentation</a> for additional configuration options.</p></div></div>
<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>danger</div><div class="admonitionContent_BuS1"><p>The AKS MCP server is deployed with <strong>readwrite</strong> access, which enables the MCP server to manage resources in the cluster and on Azure. This is only for demonstration purposes. In a production environment, you should follow the principle of least privilege and configure the MCP server with only the necessary access level required for your use case.</p></div></div>
<p>Wait for the MCP server pod to be in the <code>Running</code> state.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl get pods -l app=aks-mcp -w</span><br></span></code></pre></div></div>
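<p>The <code>-w</code> flag keeps watching until you press Ctrl+C. If you'd rather have a command that returns on its own, for example in a script, one option is <code>kubectl rollout status</code> with a timeout. This is a sketch; the helper name <code>wait_for_mcp</code> and the 120-second default are assumptions.</p>

```sh
# wait_for_mcp is a hypothetical helper; the default timeout is an assumption.
wait_for_mcp() {
  wait_timeout=${1:-120s}
  # Blocks until the Deployment's rollout completes, or returns non-zero
  # if the timeout expires first.
  kubectl rollout status deployment/aks-mcp --timeout="$wait_timeout"
}

# Usage: wait_for_mcp 180s
```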
<p>You can also check the logs of the MCP server pod to ensure it's running correctly. Here is an example of what the logs should look like:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">$ kubectl logs deploy/aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:02" level=info msg="Initializing AKS MCP service..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:02" level=info msg="Azure client initialized successfully"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Azure CLI initialized successfully (federated_token)"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="MCP server initialized successfully"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Registering Azure Components..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Azure Components registered successfully"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Registering Kubernetes Components..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="No optional Kubernetes components enabled"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Kubernetes Components registered successfully"</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">time="02:57:06" level=info msg="Registering Prompts..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="AKS MCP service initialization completed successfully"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="AKS MCP version: 0.0.9+1758679679"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Streamable HTTP server listening on 0.0.0.0:8000"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="MCP endpoint available at: http://0.0.0.0:8000/mcp"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">time="02:57:06" level=info msg="Send POST requests to /mcp to initialize session and obtain Mcp-Session-Id"</span><br></span></code></pre></div></div>
<p>Note the <code>(federated_token)</code> marker on the "Azure CLI initialized successfully" line. It shows that the MCP server authenticated with the federated token, which means Workload Identity is working correctly.</p>
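<p>To check for that log line without reading the whole output, you could filter for it. The sketch below assumes the message text matches the sample above, which may differ in other AKS MCP versions; <code>uses_federated_token</code> is a hypothetical helper name.</p>

```sh
# Reads log lines on stdin and succeeds only if the federated-token
# initialization message from the sample output is present.
# Assumption: the message wording matches the v0.0.9 sample shown above.
uses_federated_token() {
  grep -qF 'Azure CLI initialized successfully (federated_token)'
}

# Usage:
# kubectl logs deploy/aks-mcp | uses_federated_token && echo "Workload Identity OK"
```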
<p>Finally, create a Service to expose the AKS MCP server.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f - &lt;&lt;EOF</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: Service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  type: ClusterIP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ports:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - port: 8000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    targetPort: 8000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    protocol: TCP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name: http</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  selector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EOF</span><br></span></code></pre></div></div>
<p>This will create an internal ClusterIP service that exposes the MCP server on port 8000.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="verify-the-deployment">Verify the deployment<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#verify-the-deployment" class="hash-link" aria-label="Direct link to Verify the deployment" title="Direct link to Verify the deployment" translate="no">​</a></h2>
<p>Now, we can verify the MCP server is working correctly by port-forwarding the service to your local machine and using the <a href="https://github.com/modelcontextprotocol/inspector" target="_blank" rel="noopener noreferrer">MCP Inspector</a> tool to connect to it.</p>
<p>Run the following command to port-forward the AKS MCP service to your local machine. The trailing <code>&amp;</code> runs the process in the background so you can keep using the same terminal.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">kubectl port-forward svc/aks-mcp 8000:8000 &amp;</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>This service is only accessible within the AKS cluster. Port-forwarding allows you to access the service from your local machine for testing purposes. In a production environment, you would typically expose the service using an Ingress controller. See the AKS documentation on <a href="https://learn.microsoft.com/azure/aks/app-routing" target="_blank" rel="noopener noreferrer">managed NGINX ingress with the application routing add-on</a> for more details.</p></div></div>
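<p>Before opening a UI, you can smoke-test the endpoint from the terminal. The server log above says to send a POST request to <code>/mcp</code> to initialize a session and obtain an <code>Mcp-Session-Id</code>. The following is a minimal sketch of such a request; the helper names, the client name, and the <code>2025-03-26</code> protocol version are assumptions, and the server may negotiate a different version.</p>

```sh
# Assumption: the port-forward from the previous step is running.
MCP_URL=${MCP_URL:-http://localhost:8000/mcp}

# JSON-RPC "initialize" request body. The protocolVersion and clientInfo
# values are illustrative assumptions, not taken from the AKS MCP docs.
mcp_initialize_payload() {
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-smoke-test","version":"0.0.1"}}}'
}

# Hypothetical helper: POST the initialize request and print the response
# headers (-i) so that the Mcp-Session-Id header is visible.
mcp_initialize() {
  curl -sS -i -X POST "$MCP_URL" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d "$(mcp_initialize_payload)"
}

# Usage: mcp_initialize
```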
<p>To test the MCP server, we'll use MCP Inspector, a web-based tool for interacting with MCP servers.</p>
<p>Run the following command to start the MCP Inspector tool.</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">npx @modelcontextprotocol/inspector</span><br></span></code></pre></div></div>
<p>The command will install and run the MCP Inspector tool. Once it's running, your default web browser should automatically open to <code>http://localhost:6274</code>. If it doesn't, you can manually open your browser and navigate to the URL provided in the terminal output.</p>
<p>In the MCP Inspector tool, you will need to populate a few fields to connect:</p>
<ul>
<li><strong>Transport Type</strong>: Select <code>Streamable HTTP</code>.</li>
<li><strong>URL</strong>: Enter <code>http://localhost:8000/mcp</code>.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="MCP Inspector connection settings" src="https://blog.aks.azure.com/assets/images/mcp-inspector-connection-2b905d148c94054cebf386849e3f445d.png" width="2495" height="1379" class="img_ev3q"></p>
<p>Click the <strong>Connect</strong> button to connect to the AKS MCP server.</p>
<p>In the menu at the top of the page, click "<strong>Tools</strong>", then click "<strong>List Tools</strong>" to see the available tools.</p>
<p><img decoding="async" loading="lazy" alt="Tools list in MCP Inspector" src="https://blog.aks.azure.com/assets/images/mcp-inspector-tools-ecf746ad08d7a6ce4aa2d6a428125139.png" width="2495" height="1379" class="img_ev3q"></p>
<p>Click the "<strong>az_aks_operations</strong>" tool to see its details.</p>
<p><img decoding="async" loading="lazy" alt="az_aks_operations tool in MCP Inspector" src="https://blog.aks.azure.com/assets/images/mcp-inspector-az-aks-operations-19ce57053539d58a5c70f036f5a7ed6f.png" width="2495" height="1379" class="img_ev3q"></p>
<p>In the panel to the right, you can enter the arguments for the tool. Enter the following arguments to list the AKS clusters in your subscription:</p>
<ul>
<li><strong>operation</strong>: Enter <code>list</code>.</li>
<li><strong>resource_type</strong>: Enter <code>cluster</code>.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="az_aks_operations tool arguments in MCP Inspector" src="https://blog.aks.azure.com/assets/images/mcp-inspector-az-aks-operations-args-7f2fb997b5aba84cf0b9edc8bd92699e.png" width="2495" height="1379" class="img_ev3q"></p>
<p>Click the <strong>Run Tool</strong> button to run the tool.</p>
<p><img decoding="async" loading="lazy" alt="az_aks_operations tool output in MCP Inspector" src="https://blog.aks.azure.com/assets/images/mcp-inspector-az-aks-operations-output-1a769bf6f8adbe160472969e6a5b861a.png" width="2495" height="1379" class="img_ev3q"></p>
<p>You should see the list of AKS clusters in your subscription.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Congratulations! You've successfully deployed the AKS MCP server on AKS as a Streamable HTTP service with Workload Identity.</p>
<p>In this walkthrough, you've learned how to:</p>
<ul>
<li>Enable Workload Identity on an AKS cluster for secure, passwordless authentication</li>
<li>Configure a User-Assigned Managed Identity with federated credentials</li>
<li>Deploy the AKS MCP server as a containerized workload with fine-grained Azure and Kubernetes permissions</li>
<li>Test the deployment using the MCP Inspector tool</li>
</ul>
<p>More importantly, you've seen a deployment pattern that extends beyond the AKS MCP server. The same approach can be applied to deploy any MCP server as a Streamable HTTP service on AKS, creating a centralized hub where multiple clients, from AI assistants to autonomous agents, can connect and interact with your infrastructure tools.</p>
<p>This pattern is particularly valuable when:</p>
<ul>
<li>Multiple teams need access to the same MCP server capabilities</li>
<li>You want to reduce the overhead of managing individual client-side installations</li>
<li>Your use case involves autonomous agents or automated workflows that need programmatic access to MCP tools</li>
<li>You require enterprise-grade monitoring, logging, and security controls</li>
</ul>
<p>However, keep in mind that this is one approach among many. Depending on your specific requirements, you might choose local installations for simplicity, serverless deployments for cost optimization, or hybrid approaches that combine multiple patterns.</p>
<p>Remember, the configuration used in this guide grants broad permissions for demonstration purposes. In production scenarios, always follow the principle of least privilege by granting only the specific permissions your MCP server needs to operate.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's next<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#whats-next" class="hash-link" aria-label="Direct link to What's next" title="Direct link to What's next" translate="no">​</a></h2>
<p>Now that you have the AKS MCP server running on AKS, I'll leave it to you to explore the various tools and capabilities available through the MCP server. Consider these next steps:</p>
<ul>
<li><strong>Expand to other MCP servers</strong>: Apply this same pattern to deploy other MCP servers (database tools, cloud providers, etc.) to create a centralized MCP hub</li>
<li><strong>Configure monitoring and logging</strong>: Set up Azure Monitor and Container Insights to track the health and performance of your MCP servers, and enable centralized audit logging for governance and compliance</li>
<li><strong>Implement network policies</strong>: Add Kubernetes Network Policies to control traffic flow to and from your MCP servers</li>
<li><strong>Set up ingress</strong>: Replace the port-forward approach with an Ingress controller to enable remote access for multiple clients or autonomous agents</li>
<li><strong>Scale for production</strong>: Adjust replica counts and resource requests/limits, and implement autoscaling based on your workload requirements</li>
</ul>
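<p>As a starting point for the network policy step, here is a minimal sketch of a policy that only admits traffic from one client namespace. Everything specific in it is an assumption for illustration: the <code>app: aks-mcp</code> pod label, the <code>default</code> and <code>mcp-clients</code> namespaces, and port <code>8000</code> as the container port. Substitute the values from your own deployment, and keep in mind that policies are only enforced when a network policy engine is enabled on the cluster.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Sketch only: labels, namespaces, and the port are placeholders.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: networking.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: NetworkPolicy</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: allow-mcp-clients</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">spec:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  # Select the MCP server pods this policy protects.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  podSelector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    matchLabels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      app: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  policyTypes:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - Ingress</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ingress:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    # Allow only pods from the client namespace, and only on the MCP port.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - from:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - namespaceSelector:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            matchLabels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              kubernetes.io/metadata.name: mcp-clients</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      ports:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - protocol: TCP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          port: 8000</span><br></span></code></pre></div></div>
<p>Because the policy lists <code>Ingress</code> under <code>policyTypes</code>, any inbound traffic to the selected pods that does not match the rule is dropped. If you later put an Ingress controller in front of the server, you would likely need an additional <code>from</code> entry for the controller's namespace.</p>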
<p>One area I'm particularly excited about is how autonomous agents can leverage this centralized deployment to orchestrate complex infrastructure operations. Stay tuned for a future post where I'll dive into using this AKS MCP server deployment with autonomous agents to automate real-world scenarios.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="troubleshooting">Troubleshooting<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#troubleshooting" class="hash-link" aria-label="Direct link to Troubleshooting" title="Direct link to Troubleshooting" translate="no">​</a></h2>
<p>If you encounter a tool call error with a message stating <em>Please run <code>az login</code> to setup account</em>, it means that the Workload Identity was not properly configured for the Service Account. Double-check the following:</p>
<ul>
<li>Ensure that the federated identity credential was created with the correct issuer URL, subject, and audiences.</li>
<li>Verify that the Service Account is annotated with the correct client ID of the managed identity.</li>
</ul>
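<p>For comparison, a correctly annotated Service Account looks roughly like the sketch below. The name <code>aks-mcp</code>, the <code>default</code> namespace, and the client ID value are placeholders rather than values taken from this post, so compare against what you actually deployed.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Sketch only: the name, namespace, and client ID are placeholders.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">apiVersion: v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kind: ServiceAccount</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">metadata:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  name: aks-mcp</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  namespace: default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  annotations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    # Must match the client ID of the user-assigned managed identity.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    azure.workload.identity/client-id: "REPLACE_WITH_CLIENT_ID"</span><br></span></code></pre></div></div>
<p>With these placeholder values, the federated identity credential's subject would need to be <code>system:serviceaccount:default:aks-mcp</code>, and its audience is normally <code>api://AzureADTokenExchange</code>. The pod template also needs the label <code>azure.workload.identity/use: "true"</code>; without it, the token is not injected into the pod.</p>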
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cleanup">Cleanup<a href="https://blog.aks.azure.com/2025/10/22/deploy-mcp-server-aks-workload-identity#cleanup" class="hash-link" aria-label="Direct link to Cleanup" title="Direct link to Cleanup" translate="no">​</a></h2>
<p>To stop the MCP Inspector tool, press <code>Ctrl+C</code> in the terminal where it is running.</p>
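<p>To stop the port forwarding, run the following command. Note that it stops every <code>kubectl port-forward</code> process you own, not just the one started for this post:</p>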
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pkill -f 'kubectl port-forward'</span><br></span></code></pre></div></div>
<p>To delete the resources created in this post, run the following command:</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">az group delete --name $RESOURCE_GROUP_NAME --yes --no-wait</span><br></span></code></pre></div></div>]]></content:encoded>
            <category>AKS-MCP</category>
            <category>AI</category>
            <category>MCP</category>
            <category>Workload Identity</category>
        </item>
    </channel>
</rss>