Skip to main content

8 posts tagged with "Networking"

Cluster and application networking, eBPF, traffic management, and connectivity patterns.

View All Tags

Accelerate DNS Performance with LocalDNS

· 6 min read
Vaibhav Arora
Product Manager for Azure Kubernetes Service

DNS performance issues can cripple production Kubernetes clusters, causing application timeouts and service outages. LocalDNS in AKS solves this by moving DNS resolution directly to each node, delivering 10x faster queries and improved reliability. In this post, we share the results from our internal tests showing exactly how much of an improvement LocalDNS can make and how it can benefit your cluster.

Debugging DNS in AKS with Inspektor Gadget

· 7 min read
Jose Blanquicet
Senior Software Engineer at Microsoft

If you're reading this, you likely have heard the phrase "It's always DNS." This is a common joke amongst developers that the root of many issues is related to DNS.

In this blog we aim to empower you to identify the root cause of DNS issues and get back to green. You can also watch the video walkthrough from Microsoft Build Breakout Session #181 starting at the 5-minute mark.

Performance Tuning AKS for Network Intensive Workloads

· 6 min read
Anson Qian
Software Engineer at Microsoft
Alyssa Vu
Software Engineer at Microsoft

As more intelligent applications are deployed and hosted on Azure Kubernetes Service (AKS), network performance becomes increasingly critical to ensuring a seamless user experience. For example, a chatbot server running in an AKS cluster needs to handle high volumes of network traffic with low latency, while retrieving contextual data — such as conversation history and user feedback from a database or cache, and interacting with a LLM (Large Language Model) endpoint through prompt requests and streamed inference responses.

In this blog post, we share how we conducted simple benchmarks to evaluate and compare network performance across various VM (Virtual Machine) SKUs and series. We also provide recommendations on key kernel settings to help you explore the trade-offs between network performance and resource usage.

Simplifying InfiniBand on AKS

· 5 min read
Sachi Desai
Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service
Suraj Deshmukh
Software Engineer at Microsoft
Ernest Wong
Software Engineer at Microsoft

High performance computing (HPC) workloads, like large-scale distributed AI training and inferencing, often require fast, reliable data transfer and synchronization across the underlying compute. Model training, for example, requires shared memory across GPUs because the parameters and gradients need to be constantly shared. For models with billions of parameters, the available memory in a single GPU node may not be enough, so "pooling" the memory across multiple nodes also requires high memory bandwidth due to the sheer volume of data involved. A common way to achieve this at scale is with a high-speed, low-latency network interconnect technology called InfiniBand (IB).

Optimize AKS Traffic with externalTrafficPolicy Local

· 10 min read
Mitch Shao
Senior Software Engineer for Azure Kubernetes Service
Vaibhav Arora
Product Manager for Azure Kubernetes Service

Managing external traffic in Kubernetes clusters can be a complex task, especially when striving to maintain service reliability, optimize performance, and ensure seamless user experiences. With the increasing adoption of Kubernetes in production environments, understanding and implementing best practices for external traffic management when using the Azure Load Balancer has become essential.

End-to-End TLS Encryption with AKS App Routing and AFD

· 14 min read
Steve Griffith
Microsoft App Innovation Global Blackbelt team

When running globally distributed public applications in Kubernetes, having access to a global traffic management solution is critical to ensuring high availability and security at the edge. Fortunately, Azure Front Door provides an easy-to-use global traffic routing capability, with integrated Content Delivery Network and Web Application Firewall.

Local Development on AKS with mirrord

· 11 min read
Gemma Tipper
Software Engineer at MetalBear
Quentin Petraroia
Product Manager for Azure Kubernetes Service

Developing applications for Kubernetes can mean a lot of time spent waiting and relatively little time spent writing code. Whenever you want to test your code changes in the cluster, you usually have to build your application, deploy it to the cluster, and attach a remote debugger (or add a bunch of logs). These iterations can be incredibly time-consuming. Thankfully, there is a way to bridge the gap between your local environment and a remote cluster, making them feel seamlessly connected. mirrord, which can be used as a plugin for VSCode or IntelliJ or directly in the CLI, is an open-source tool that does exactly that (and much more).