Skip to main content

6 posts tagged with "Open Source"

Open source projects, community tooling, and contributions related to AKS.

View All Tags

Scaling multi-node LLM inference with NVIDIA Dynamo and NVIDIA GPUs on AKS (Part 3)

· 8 min read
Sachi Desai
Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service
Devi Vasudevan
Principal Engineering Manager for Azure Upstream

This blog post is co-authored with Nikhar Maheshwari, Anish Maddipoti, Rohan Varma, Clement Pakkam Isaac, and Stephen Mccoulough from NVIDIA.

We kicked things off in Part 1 by introducing NVIDIA Dynamo on AKS and demonstrating 1.2 million tokens per second across 10 GPU nodes of GB200 NVL72. In Part 2, we explored the Dynamo Planner and Profiler for SLO-driven scaling.

In this blog, we explore how Dynamo’s KV Router makes multi-worker LLM deployments significantly more efficient, demonstrating over 20x faster Time To First Token (TTFT) and over 4x faster end-to-end latency on real-world production traces. These latency reductions not only improve the end-user experience but also maximize GPU utilization and lower the Total Cost of Ownership (TCO).

Scaling multi-node LLM inference with NVIDIA Dynamo and NVIDIA GPUs on AKS (Part 2)

· 8 min read
Sachi Desai
Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service
Sertaç Özercan
Principal Engineering Manager for Azure Upstream

This blog post is co-authored with Saurabh Aggarwal, Anish Maddipoti, Amr Elmeleegy, and Rohan Varma from NVIDIA.

In our previous post, we demonstrated the power of the Azure ND GB200-v6 VMs accelerated by NVIDIA GB200 NVL72, achieving a staggering 1.2M tokens per second across 10 nodes using NVIDIA Dynamo. Today, we're shifting focus from raw throughput to developer velocity and operational efficiency.

We will explore how the Dynamo Planner and Dynamo Profiler remove the guesswork from performance tuning on AKS.

Observe Smarter: Leveraging Real-Time insights via the AKS-MCP Server

· 9 min read
Qasim Sarfraz
Software Engineer at Microsoft

Introduction

Recently, we released the AKS-MCP server, which enables AKS customers to automate diagnostics, troubleshooting, and cluster management using natural language. One of its key capabilities is real-time observability using inspektor_gadget_observability MCP tool, which leverages a technology called eBPF to help customers quickly inspect and debug applications running in AKS clusters.

Announcing the CLI Agent for AKS: Agentic AI-powered operations and diagnostics at your fingertips

· 9 min read
Pavneet Ahluwalia
Principal PM Lead for the Azure Kubernetes Service
Julia Yin
Product Manager at Microsoft
Aritra Ghosh
Senior Product Manager at Microsoft

At KubeCon India earlier this month, the AKS team shared our newest Agentic AI-powered feature with the broader Kubernetes community: the CLI Agent for AKS. CLI Agent for AKS is a new AI-powered command-line experience designed to help Azure Kubernetes Service (AKS) users troubleshoot, optimize, and operate their clusters with unprecedented ease and intelligence.

Announcing the AKS-MCP Server: Unlock Intelligent Kubernetes Operations

· 9 min read
Pavneet Ahluwalia
Principal PM Lead for the Azure Kubernetes Service

We're excited to announce the launch of the AKS-MCP Server. An open source Model Context Protocol (MCP) server designed to make your Azure Kubernetes Service (AKS) clusters AI-native and more accessible for developers, SREs, and platform engineers through Agentic AI workflows.

AKS-MCP isn't just another integration layer. It empowers cutting-edge AI assistants (such as Claude, Cursor, and GitHub Copilot) to interact with AKS through a secure, standards-based protocol—opening new possibilities for automation, observability, and collaborative cloud operations.

Streamlining Temporal Worker Deployments on AKS

· 6 min read
Steve Womack
Solutions Architect at Temporal
Brian Redmond
AKS and Azure Cloud Native Platforms

Temporal is an open source platform that helps developers build and scale resilient Enterprise and AI applications. Complex and long-running processes are easily orchestrated with durable execution, ensuring they never fail or lose state. Every step is tracked in an Event History that lets developers easily observe and debug applications. In this guide, we will help you understand how to run and scale your workers on Azure Kubernetes Service (AKS).