AI Inference on AKS enabled by Azure Arc: Generative AI using Triton and TensorRT‑LLM
In this post, you’ll deploy NVIDIA Triton Inference Server on your Azure Kubernetes Service (AKS) cluster enabled by Azure Arc to serve a Qwen‑based generative model using the TensorRT‑LLM backend. By the end, you’ll have a working generative AI inference pipeline running on your on‑premises GPU hardware.
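To give a sense of where this ends up, below is a minimal sketch of the kind of Kubernetes Deployment that the walkthrough builds toward: a Triton Inference Server pod with a GPU request and a model repository mounted from storage. The image tag, the PersistentVolumeClaim name `triton-model-repo`, and the mount path are assumptions for illustration; the manifest used later in this post may differ.

```yaml
# Sketch only: a Triton Inference Server Deployment on a GPU node.
# Assumes a TensorRT-LLM-enabled Triton image and a PVC named
# "triton-model-repo" that already contains the compiled model repository.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-trtllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-trtllm
  template:
    metadata:
      labels:
        app: triton-trtllm
    spec:
      containers:
        - name: triton
          # Example image tag; pick a TensorRT-LLM-capable Triton release.
          image: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
          args: ["tritonserver", "--model-repository=/models"]
          ports:
            - containerPort: 8000   # HTTP inference endpoint
            - containerPort: 8001   # gRPC inference endpoint
            - containerPort: 8002   # Prometheus metrics
          resources:
            limits:
              nvidia.com/gpu: 1     # schedule onto a GPU node
          volumeMounts:
            - name: model-repo
              mountPath: /models
      volumes:
        - name: model-repo
          persistentVolumeClaim:
            claimName: triton-model-repo   # hypothetical PVC name
```

The rest of the post covers the pieces this sketch takes for granted: preparing the AKS enabled by Azure Arc cluster and its GPU nodes, building the TensorRT‑LLM engine for the Qwen model, and populating the model repository that Triton serves.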
