16 posts tagged with "Operations"

Operational best practices and management strategies for AKS.


Recommendations for container and security optimized OS options on Azure Kubernetes Service (AKS)

November 20, 2025 · 7 min read

Ally Ford

Product Manager 2 at Microsoft

Thilo Fromm

Principal Software Engineering Manager at Microsoft

Sudhanva Huruli

Principal Program Manager at Microsoft

Selecting an operating system for your Kubernetes deployments may appear straightforward; however, this decision can significantly influence both security and operational complexity. In this blog, we’ll share key recommendations to help you select a container-optimized OS for your AKS deployments.

Collecting Custom Metrics on AKS with Telegraf

November 6, 2025 · 13 min read

Diego Casati

Microsoft App Innovation Global Blackbelt team

What if you need to collect your own custom metrics from workloads or nodes in AKS but don't want to run a full monitoring stack? In this post, we will discuss how to integrate custom metrics into Azure's managed monitoring stack with minimal setup. We use a Telegraf DaemonSet for flexible metric collection, Azure Monitor managed service for Prometheus for scraping and storage, and Azure Managed Grafana for visualization and alerting.

Recommendations for Major OS Version Upgrades with Azure Kubernetes Service (AKS)

October 7, 2025 · 11 min read

Flora Taagen

Product Manager 2 at Microsoft

Ally Ford

Product Manager 2 at Microsoft

Introduction

Upgrading the operating system version on your AKS nodes is a critical step that can impact workload security, stability, and performance. In this blog, we’ll share key recommendations to help you plan and execute major OS version upgrades smoothly and confidently on AKS.

Azure Monitor dashboards with Grafana in Azure Portal

September 18, 2025 · 7 min read

Aritra Ghosh

Senior Product Manager at Microsoft

Kayode Prince

Senior Program Manager at Microsoft

Introduction

As Kubernetes adoption accelerates, engineers need streamlined, cost-effective tools for cluster observability. Until now, this often meant deploying and managing separate monitoring stacks. Azure Monitor's latest integration with Grafana changes this: cluster insights are now just a click away in the Azure portal.

Announcing the CLI Agent for AKS: Agentic AI-powered operations and diagnostics at your fingertips

August 15, 2025 · 9 min read

Pavneet Ahluwalia

Principal PM Lead for the Azure Kubernetes Service

Julia Yin

Product Manager at Microsoft

Aritra Ghosh

Senior Product Manager at Microsoft

At KubeCon India earlier this month, the AKS team shared our newest Agentic AI-powered feature with the broader Kubernetes community: the CLI Agent for AKS. CLI Agent for AKS is a new AI-powered command-line experience designed to help Azure Kubernetes Service (AKS) users troubleshoot, optimize, and operate their clusters with unprecedented ease and intelligence.

Announcing the AKS-MCP Server: Unlock Intelligent Kubernetes Operations

August 6, 2025 · 9 min read

Pavneet Ahluwalia

Principal PM Lead for the Azure Kubernetes Service

We're excited to announce the launch of the AKS-MCP Server, an open-source Model Context Protocol (MCP) server. It is designed to make your Azure Kubernetes Service (AKS) clusters AI-native and more accessible to developers, SREs, and platform engineers through agentic AI workflows.

AKS-MCP isn't just another integration layer. It empowers cutting-edge AI assistants (such as Claude, Cursor, and GitHub Copilot) to interact with AKS through a secure, standards-based protocol—opening new possibilities for automation, observability, and collaborative cloud operations.

Using Stream Analytics to Filter AKS Control Plane Logs

May 30, 2025 · 11 min read

Steve Griffith

Microsoft App Innovation Global Blackbelt team

While AKS does not provide access to the cluster's managed control plane, it does provide access to the control plane component logs via diagnostic settings. The easiest way to persist and search this data is to send it directly to Azure Log Analytics. However, these logs contain a large volume of data, which can make Log Analytics cost-prohibitive. Alternatively, you can send all the data to an Azure Storage account, but then searching and alerting become challenging.

To address this challenge, one option is to stream the data to Azure Event Hubs. From there, you can use Azure Stream Analytics to pick out the events you deem important and send the rest to cheaper storage (for example, Azure Storage) for potential future diagnostic needs.

In this walkthrough, we'll create an AKS cluster, stream its diagnostic logs to Azure Event Hubs, and then demonstrate how to use Azure Stream Analytics to filter for some key records.
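The split between "important" records and "everything else" can be pictured in plain Python. This is a minimal sketch, not Stream Analytics query language and not code from the walkthrough.

It assumes the Azure resource-log shape, in which each Event Hubs message body carries a `records` array. Each item in that array has a `category` (for example `kube-audit` or `kube-apiserver`) and a `properties.log` field. The routing rules `HOT_CATEGORIES` and `HOT_AUDIT_VERBS` are hypothetical choices made for illustration.

```python
import json

# Hypothetical routing rules for this sketch; pick categories and verbs that matter to you.
HOT_CATEGORIES = {"guard", "cluster-autoscaler"}
HOT_AUDIT_VERBS = {"delete", "deletecollection"}


def is_important(record: dict) -> bool:
    """Return True if a control plane log record should take the 'important' path."""
    category = record.get("category", "")
    if category in HOT_CATEGORIES:
        return True
    if category in ("kube-audit", "kube-audit-admin"):
        # Assumption: properties.log holds the Kubernetes audit event as a JSON string.
        try:
            event = json.loads(record.get("properties", {}).get("log", "{}"))
        except (json.JSONDecodeError, TypeError):
            return False
        return isinstance(event, dict) and event.get("verb") in HOT_AUDIT_VERBS
    return False


def route(batch: dict) -> tuple[list[dict], list[dict]]:
    """Split one (assumed) Event Hubs message body into (important, rest)."""
    important, rest = [], []
    for record in batch.get("records", []):
        (important if is_important(record) else rest).append(record)
    return important, rest


if __name__ == "__main__":
    sample = {"records": [
        {"category": "kube-audit", "properties": {"log": json.dumps({"verb": "delete"})}},
        {"category": "kube-audit", "properties": {"log": json.dumps({"verb": "get"})}},
        {"category": "kube-apiserver", "properties": {"log": "I1120 sample message"}},
    ]}
    important, rest = route(sample)
    print(len(important), len(rest))  # 1 2
```

In the architecture described above, the "important" list corresponds to what a Stream Analytics query would forward for searching and alerting. The "rest" list corresponds to what lands in cheaper storage.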

Azure VM Generations and AKS

April 23, 2025 · 6 min read

Jack Jiang

Product Manager at Microsoft

Ally Ford

Product Manager 2 at Microsoft

Sarah Zhou

Product Manager at Microsoft

What are Virtual Machine Generations?

If you are an Azure user, you may be familiar with virtual machines. What you may not know is that Azure offers two generations of virtual machines!

Before going further, let's first break down virtual machines. Azure virtual machines are offered in various "sizes," which differ in the amount and type of each resource allocated, such as CPU, memory, storage, and network bandwidth. These resources are tied to a portion of a physical server's hardware capabilities. A single physical server's resources may be divided among many different VM size series or configurations.

As the physical hardware ages and newer components become available, older hardware and VMs get retired, while newer generation hardware and VM products are made available.

In this blog, we will go over Generation 1 and newer Generation 2 virtual machines. Both have their own use cases, and picking the right one to suit your workloads is critical in ensuring you get the best possible experience, capabilities, and cost.

Enhancing Your Operating System's Security with OS Security Patches in AKS

April 22, 2025 · 6 min read

Kaarthikeyan Subramanian

Senior Product Manager for the Azure Kubernetes Service

Traditional patching and the need for managed patching

Operating system (OS) security patches are critical for safeguarding systems against vulnerabilities that malicious actors could exploit. These patches help ensure your system remains protected against emerging threats. Traditionally, customers have relied on nightly updates, such as unattended upgrades in Ubuntu or Automatic Guest OS Patching at the virtual machine (VM) level. However, kernel security updates often required a node reboot, which was typically managed using tools like kured (Kubernetes Reboot Daemon).

Limitless Kubernetes Scaling for AI and Data-intensive Workloads: The AKS Fleet Strategy

April 2, 2025 · 7 min read

Pavneet Ahluwalia

Principal PM Lead for the Azure Kubernetes Service

AI workloads are advancing rapidly, multi-modal models must be built and fine-tuned, and batch data processing jobs keep growing. As a result, more and more enterprises are leaning into Kubernetes platforms to take advantage of their ability to scale and optimize compute resources. With AKS, you can manage up to 5,000 nodes (the upstream Kubernetes limit) in a single cluster under optimal conditions, but for some large enterprises, that might not be enough.

Building Community with CRDs: Kube Resource Orchestrator

January 30, 2025 · 3 min read

Matthew Christopher

Bridget Kromhout

Principal Product Manager at Microsoft Azure

Kube Resource Orchestrator (kro) introduces a Kubernetes-native, cloud-agnostic way to define groupings of Kubernetes resources. With kro, you can group an application and its dependencies into a single resource that end users can easily consume.

Just as we collaborate in upstream Kubernetes, Azure is partnering with AWS and Google Cloud on kro (pronounced “crow”) to make Kubernetes APIs simpler for all Kubernetes users. We’re centering the needs of customers and the cloud native community to offer tooling that works seamlessly no matter where you run your K8s clusters.

Multi-Cluster Management with KubeFleet

January 27, 2025 · 4 min read

Ryan Zhang

Principal Software Engineer for Azure Kubernetes Service

In the ever-evolving world of cloud native technologies, managing multiple Kubernetes clusters efficiently is a common challenge that still lacks a well-received, community-driven solution.

Apache Airflow Guidance for AKS

January 20, 2025 · 2 min read

Kenneth Kilty

Technical Program Manager for Cloud Native Platforms

We're pleased to share new guidance on deploying open-source Apache Airflow on Azure Kubernetes Service (AKS).

Local Development on AKS with mirrord

December 4, 2024 · 11 min read

Gemma Tipper

Software Engineer at MetalBear

Quentin Petraroia

Product Manager for Azure Kubernetes Service

Developing applications for Kubernetes can mean a lot of time spent waiting and relatively little time spent writing code. Whenever you want to test your code changes in the cluster, you usually have to build your application, deploy it to the cluster, and attach a remote debugger (or add a bunch of logs). These iterations can be incredibly time-consuming. Thankfully, there is a way to bridge the gap between your local environment and a remote cluster, making them feel seamlessly connected. mirrord is an open-source tool that does exactly that (and much more). You can use it as a plugin for VS Code or IntelliJ, or directly from the CLI.

Deploy and take Flyte with an end-to-end ML orchestration solution on AKS

November 20, 2024 · 7 min read

Sachi Desai

Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service

Data is often at the heart of application design and development. It fuels user-centric design, provides insights for feature enhancements, and represents the value of an application as a whole. If so, shouldn’t we use data science tools and workflows that are flexible and scalable on a platform like Kubernetes, across a range of application types?

In collaboration with David Espejo and Shalabh Chaudhri from Union.ai, we’ll dive into an example using Flyte, a platform built on Kubernetes itself. Flyte can help you manage and scale out data processing and machine learning pipelines through a simple user interface.

Service Connector Introduction

May 23, 2024 · 2 min read

Coco Wang

AKS Product Manager

Workloads deployed on an Azure Kubernetes Service (AKS) cluster often need to access Azure backing resources, such as Azure Key Vault, databases, or AI services like Azure OpenAI Service. Users are required to manually configure Microsoft Entra Workload ID or Managed Identities so their AKS workloads can securely access these protected resources.
