Scaling Anyscale Ray Workloads on AKS
This post focuses on running Anyscale's managed Ray service on AKS, using the Anyscale Runtime (formerly RayTurbo) for an optimized Ray experience. For open-source Ray on AKS, see our Ray on AKS overview.
Ray is an open-source distributed compute framework for scaling Python and AI workloads from a laptop to clusters with thousands of nodes. Anyscale provides a managed ML/AI platform and an optimized Ray runtime that offers better scalability, observability, and operability than self-managed open-source Ray on KubeRay, including intelligent autoscaling, enhanced monitoring, and fault-tolerant training.
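To make Ray's scaling model concrete, here is a minimal sketch of fanning work out as Ray tasks; the `square` function and the input range are illustrative placeholders, and the same code runs unchanged on a laptop or a multi-node cluster:

```python
import ray

ray.init()  # connects to an existing Ray cluster, or starts a local one

@ray.remote
def square(x: int) -> int:
    # Any plain Python function becomes a distributable task via @ray.remote.
    return x * x

# Each .remote() call is scheduled on some node in the cluster;
# ray.get() blocks until all results are back.
futures = [square.remote(i) for i in range(1000)]
print(sum(ray.get(futures)))
```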
As part of Microsoft and Anyscale's strategic collaboration to deliver Azure-native distributed AI/ML computing at scale, we've been working closely with Anyscale to enhance the production readiness of Ray workloads on Azure Kubernetes Service (AKS) in three critical areas:
- Elastic scalability through multi-cluster, multi-region capacity aggregation
- Data persistence with unified storage across the ML/AI development and operations lifecycle
- Operational simplicity through automated credential management with service principals (see the sketch after this list)
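To show what service-principal credentials look like in practice (a generic Azure illustration, not the Anyscale-managed flow itself, which automates this for you), here is a minimal sketch using the azure-identity Python SDK; the placeholder tenant, client, and secret values are assumptions you would replace with your own:

```python
from azure.identity import ClientSecretCredential

# Placeholder service principal credentials; in an automated setup these
# are issued and rotated for you rather than handled by hand.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Request an Azure Resource Manager token to confirm the principal can authenticate.
token = credential.get_token("https://management.azure.com/.default")
print("token acquired, expires at:", token.expires_on)
```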
Whether you're fine-tuning models with DeepSpeed or LLaMA-Factory, or deploying inference endpoints for LLMs ranging from small models to large-scale reasoning models, Anyscale on AKS delivers a production-grade ML/AI platform that scales with your needs.
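As a flavor of what an inference endpoint looks like in Ray terms, here is a minimal sketch using open-source Ray Serve; the `Echo` deployment, route, and replica count are illustrative placeholders rather than a serving recipe from this post:

```python
from starlette.requests import Request

from ray import serve

@serve.deployment(num_replicas=2)  # replicas scale out across cluster nodes
class Echo:
    async def __call__(self, request: Request) -> dict:
        # A real deployment would run model inference here.
        payload = await request.json()
        return {"echo": payload}

# Bind and run the deployment; Serve exposes it over HTTP (port 8000 by default).
serve.run(Echo.bind(), route_prefix="/echo")
```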