One post tagged with "llm-d" | AKS Engineering Blog

Pair llm-d Inference with KAITO RAG Advanced Search to Enhance your AI Workflows

September 12, 2025 · 10 min read

Ernest Wong

Software Engineer at Microsoft

Sachi Desai

Product Manager for AI/ML, GPU workloads on Azure Kubernetes Service

Overview

In this blog post, we'll guide you through setting up an OpenAI API-compatible inference endpoint with llm-d and integrating it with retrieval-augmented generation (RAG) on AKS. We'll showcase its value in a key finance use case: indexing the latest SEC 10-K filings for two S&P 500 companies and querying them. We'll also highlight the benefits of llm-d based on its architecture and its synergy with RAG.
