
2 posts tagged with "Ray"

KubeRay operator usage for distributed AI/ML workloads on AKS.


Scaling Anyscale Ray Workloads on AKS

7 min read

Anson Qian, Software Engineer at Azure Kubernetes Service
Bob Mital, Principal Product Manager at Microsoft Azure
Kenneth Kilty, Technical Program Manager for Cloud Native Platforms

This post focuses on running Anyscale's managed Ray service on AKS, using the Anyscale Runtime (formerly RayTurbo) for an optimized Ray experience. For open-source Ray on AKS, see our Ray on AKS overview.

Ray is an open-source distributed compute framework for scaling Python and AI workloads from a laptop to clusters with thousands of nodes. Anyscale provides a managed ML/AI platform and an optimized Ray runtime that delivers better scalability, observability, and operability than self-managed open-source Ray on KubeRay, including intelligent autoscaling, enhanced monitoring, and fault-tolerant training.
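
To make the programming model concrete, here is a minimal sketch using the open-source Ray API (the `square` function and the workload size are illustrative). The same code runs unchanged whether `ray.init()` starts a local runtime on a laptop or connects to a multi-node cluster on AKS:

```python
import ray

# Start a local Ray runtime, or connect to an existing cluster
# if one is configured (e.g. via the RAY_ADDRESS environment variable).
ray.init()

@ray.remote
def square(x: int) -> int:
    # An ordinary Python function, turned into a distributed task
    # by the @ray.remote decorator.
    return x * x

# Launch 1,000 tasks; Ray schedules them across every available
# CPU in the cluster and immediately returns futures.
futures = [square.remote(i) for i in range(1000)]

# Block until all results are ready, then aggregate them.
print(sum(ray.get(futures)))
```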

As part of Microsoft and Anyscale's strategic collaboration to deliver Azure-native distributed AI/ML computing at scale, we've been working closely with Anyscale to enhance the production readiness of Ray workloads on Azure Kubernetes Service (AKS) in three critical areas:

  • Elastic scalability through multi-cluster, multi-region capacity aggregation
  • Data persistence with unified storage across the ML/AI development and operations lifecycle
  • Operational simplicity through automated credential management with service principals

Whether you're fine-tuning models with DeepSpeed or LLaMA-Factory, or deploying inference endpoints for LLMs ranging from small models to large-scale reasoning models, Anyscale on AKS delivers a production-grade ML/AI platform that scales with your needs.