Optimizing RDMA performance for AI workloads on AKS with DRANET
RDMA (Remote Direct Memory Access) is critical for unlocking the full potential of GPU infrastructure, enabling the high-throughput, low-latency GPU-to-GPU communication that large-scale AI workloads demand. In distributed training, collective operations like all-reduce and all-gather synchronize gradients and activations across GPUs — any communication bottleneck stalls the entire training pipeline. In disaggregated inference, RDMA provides the fast inter-node transfers needed to move KV-cache data between prefill and decode phases running on separate GPU pools.
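The all-reduce collective mentioned above can be illustrated with a minimal, single-process sketch of the ring algorithm. This is pure Python with no GPUs, NCCL, or RDMA involved; it only models the data movement (n workers, 2*(n-1) steps of chunk exchange), and the function name `ring_allreduce` is illustrative, not an API from any library.

```python
def ring_allreduce(grads):
    """Simulate ring all-reduce: each of n workers starts with its own
    gradient vector; afterwards every worker holds the element-wise sum.

    The real algorithm sends chunk-sized messages between ring neighbors;
    here each 'send' is just a slice copy between Python lists."""
    n = len(grads)
    dim = len(grads[0])
    assert dim % n == 0, "gradient length must divide evenly into n chunks"
    chunk = dim // n
    bufs = [list(g) for g in grads]  # each worker's local buffer

    def span(c):
        # index range of chunk c
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. At step s, worker i sends chunk (i - s) % n
    # to its ring neighbor, which accumulates it. After n - 1 steps,
    # worker i holds the fully reduced chunk (i + 1) % n.
    for step in range(n - 1):
        for i in range(n):
            c = (i - step) % n
            dst = (i + 1) % n
            for j in span(c):
                bufs[dst][j] += bufs[i][j]

    # Phase 2: all-gather. Each worker forwards its reduced chunk around
    # the ring; after n - 1 steps everyone holds the complete sum.
    for step in range(n - 1):
        for i in range(n):
            c = (i + 1 - step) % n
            dst = (i + 1) % n
            for j in span(c):
                bufs[dst][j] = bufs[i][j]

    return bufs

# Four workers, each contributing one gradient vector.
workers = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [1000, 2000, 3000, 4000],
]
result = ring_allreduce(workers)  # every row is now [1111, 2222, 3333, 4444]
```

Because every worker sends and receives one chunk per step, the per-worker traffic is independent of the ring size, which is why the algorithm is bandwidth-optimal and why per-step latency (the thing RDMA attacks) dominates at scale.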

DRANET is an open-source Dynamic Resource Allocation (DRA) network driver that discovers RDMA-capable devices, advertises them as ResourceSlices, and injects the allocated devices into each pod and container. Combined with the NVIDIA GPU DRA driver, it enables topology-aware co-scheduling of GPUs and NICs for high-performance AI networking on Kubernetes.
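To make the DRA flow above concrete, here is a hedged sketch of how a workload could request one of the NICs DRANET advertises: a DeviceClass that selects devices published by the driver, a ResourceClaimTemplate that requests one such device per pod, and a pod that consumes the claim. The `resource.k8s.io` API version, the `dra.net` driver name, and the image are assumptions here; check them against your Kubernetes version and DRANET release.

```yaml
# Select devices published by the DRANET driver (driver name assumed "dra.net").
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: rdma-nic
spec:
  selectors:
    - cel:
        expression: device.driver == "dra.net"
---
# One claim per pod, resolved at scheduling time, so the NIC and the GPU
# can be co-allocated by their respective DRA drivers.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: rdma-nic-template
spec:
  spec:
    devices:
      requests:
        - name: rdma-nic
          deviceClassName: rdma-nic
---
apiVersion: v1
kind: Pod
metadata:
  name: nccl-worker
spec:
  containers:
    - name: trainer
      image: example.com/nccl-test:latest  # placeholder image
      resources:
        claims:
          - name: rdma
  resourceClaims:
    - name: rdma
      resourceClaimTemplateName: rdma-nic-template
```

Using a ResourceClaimTemplate rather than a standalone ResourceClaim gives each pod replica its own device allocation, which is what a multi-worker training job needs; the CEL selector can be extended with device-attribute expressions to filter on RDMA capability or NUMA/PCIe topology as advertised in DRANET's ResourceSlices.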
