Control where your pods land on AKS with NAP

March 30, 2026 · 15 min read

Product Manager at Microsoft

With Kubernetes, you control what your workloads (or pods) need with resource requests. But how do you control where they land? On AKS, three scheduling levers work with Node Auto-Provisioning (NAP) to give you predictable placement: taints, affinity, and topology spread constraints.

Diagram showing NAP topology spread behavior

Learn more in the official documentation: Node Auto Provisioning and AKS Operator Best Practices

Background

You want workloads to schedule, scale, and be disrupted only when and where you intend. Kubernetes offers many settings for this, and it is not always clear which ones to use. Node Auto-Provisioning optimizes bin-packing of your compute, but to get predictable behavior from it, you need to follow a few best practices.

When adopting Kubernetes at scale, the hardest operational questions often aren’t “How do I scale nodes (or VMs)?” — they’re:

Where will my workload replicas land (zones / nodes)?
How do I express node preferences without accidentally blocking scheduling?
If I’m using Node Auto-Provisioning (NAP), how does it interpret the rules I set?

Kubernetes scheduling is a negotiation between:

Workload intent (what your pod spec asks for)
Available capacity (what nodes exist, and what the platform can create)

You express this workload intent in your workload manifest using three levers:

Taints and Tolerations – control which pods can go to which nodes
Affinity/Anti-Affinity – control where workloads can (or should not) run
Topology Spread Constraints – control replica distribution across failure domains

This post covers the three most important workload-level tools for shaping predictable node provisioning outcomes on AKS, then connects the dots to explain what AKS Node Auto-Provisioning (NAP) does with those signals to manage your workloads.

If you’re new to these Kubernetes features, this post will give you “good defaults” as a starting point. If you’re already deep into scheduling, treat it as a checklist for the behaviors AKS users most commonly ask about.

For more on operator and scheduling best practices, visit operator best-practices guidance or configure the AKS scheduler using Configurable Scheduler Profiles. Watch for an upcoming post that explains how to align pod placement to achieve cost efficiencies with Configurable Scheduler Profiles on AKS.

How NAP handles node selection

Node auto-provisioning provisions, scales, and manages nodes. NAP watches for pending pods and provisions new nodes that satisfy both the workload spec and the options the NodePool allows. The Kubernetes scheduler then places pods onto those nodes.

NAP relies on the following resources to control workload scheduling:

NodePool CRD (policies / constraints) - Node settings such as SKU selection, capacity type, zones, labels, and node-level resource limits
AKSNodeClass CRD (policies / constraints) - Azure-specific node settings such as subnet behavior, image, OS disk, and kubelet configuration
NodeClaims - Record the state of nodes that are provisioned or being provisioned
Workload spec / deployment file - The Kubernetes manifest that defines your workload's resource requirements and scheduling constraints (Node Affinity, Tolerations, and Topology Spread Constraints)

Simply put, the workload spec expresses “where and how this pod should run”, the NodePool and AKSNodeClass express “what nodes should exist for this class of workloads”, and NodeClaims track which nodes are being provisioned or are currently running.

You can think of the NodePool/AKSNodeClass as your “acceptable range of compute options”: your workload intent has to fit inside it. Check out our documentation for a good default spec for your NodePool CRD and the options for your AKSNodeClass CRD configuration.
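
As a rough illustration of that “acceptable range”, a NodePool might look like the sketch below. The SKU family, capacity type, and CPU limit are assumptions chosen for illustration, not recommended values, and the nodeClassRef assumes an AKSNodeClass named default already exists. Check the NAP documentation for the fields your cluster version supports.

```yaml
# Sketch only: values are illustrative assumptions, not recommended defaults.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default            # assumes an AKSNodeClass with this name
      requirements:
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]  # or "spot"
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D"]          # assumed VM family
  limits:
    cpu: "100"                   # assumed cap on total vCPUs for this NodePool
```

A pod whose affinity or resource requests cannot be met by any node inside these requirements stays Pending, which is why the workload intent has to fit inside the NodePool.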

Note: NAP is a node-level (or infrastructure) autoscaler that provisions and manages nodes (VMs) based on pending pods and their requirements, while the Kubernetes scheduler continues to schedule pods onto those nodes. For application level autoscaling, you can use KEDA with NAP. We also suggest using Vertical Pod Autoscaler (VPA) in recommendation-only mode (for example, with updateMode: Off in the VPA custom resource) for resource sizing recommendations.
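
A recommendation-only VPA could look like the following minimal sketch. The Deployment name web and the VPA name web-vpa are hypothetical, and the sketch assumes the VPA components are installed in the cluster.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                  # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical workload to observe
  updatePolicy:
    updateMode: "Off"            # record recommendations only; never evict or resize pods
```

With updateMode set to "Off", the recommendations appear in the VPA object's status, and you apply them to your resource requests yourself.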

Part 1 — Topology Spread Constraints: tool for zone-aware replicas

Topology Spread Constraints let you tell the scheduler: “Keep these replicas balanced across domains like zones or nodes.” The Kubernetes documentation describes them as a way to spread pods across failure domains such as regions, zones, nodes, and custom topology keys.

How NAP handles Topology Spread

NAP honors workload topologySpreadConstraints. You can list the allowed zones in the NodePool CRD, but that only limits where nodes may be created. topologySpreadConstraints are what ensure replicas are actually spread across those zones.

Without pod-level topologySpreadConstraints: NAP provisions wherever the preferred VM SKU is available. It may, for example, provision every node in zone 1 and none in zones 2 and 3.
With pod-level topologySpreadConstraints: NAP provisions nodes so that the spread can be satisfied. It honors the pod-level constraints (replica count and topology spread behavior) in the workload deployment file. See the Kubernetes docs on topology spread for more examples.
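
The NodePool side of this might look like the fragment below, a sketch that assumes a region with three availability zones. The zone values are hypothetical; use the zone label values your nodes actually report.

```yaml
# NodePool fragment: spec.template.spec.requirements
requirements:
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["eastus-1", "eastus-2", "eastus-3"]  # hypothetical region and zones
```

This only defines where NAP may create nodes. The pod-level constraint is still what drives the spread.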

A good default: spread across Availability Zones

Here’s a typical “3-zone spread” pattern for a Deployment:

spec:
  replicas: 6
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          minDomains: 3
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web

What these fields mean (in plain language):

topologyKey: topology.kubernetes.io/zone → spread across zones (not just nodes).
maxSkew: 1 → keep zone counts close. With DoNotSchedule, the difference between the most and least loaded domains can’t exceed 1.
minDomains: 3 (only valid with DoNotSchedule) → require at least 3 eligible domains to participate. If fewer than minDomains are eligible, Kubernetes treats the “global minimum” as 0 when calculating skew, so each existing domain can hold at most maxSkew matching pods and the rest stay Pending.
whenUnsatisfiable: DoNotSchedule → enforce the rule strictly; if it can’t be met, pods stay Pending.

“Hard” vs “soft” topology spreading

Kubernetes gives you two behaviors:

DoNotSchedule: strict; better for HA-critical workloads, but can stall rollouts (pods stay pending) if capacity is constrained.
ScheduleAnyway: best-effort; scheduler still places pods wherever there is capacity but prioritizes choices that reduce skew.

The following example uses the "soft rule" of whenUnsatisfiable: ScheduleAnyway which will attempt to spread workloads across zones evenly, but will prioritize scale-up success over even topology spread. This method does not guarantee topology spread, so consider tradeoffs between zonal resiliency and scheduling success.

spec:
  replicas: 6
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web

Practical guidance:

Start with DoNotSchedule for Tier-0 services where zonal placement is critical and more important than scheduling speed. Use ScheduleAnyway if you’d rather progress than block workload readiness during partial zone pressure.

For more info, visit the upstream Kubernetes docs on topology spread constraints.

Part 2 — Node Affinity / Anti-Affinity: shaping which nodes are eligible

Node affinity is the evolution of nodeSelector: it’s more expressive and lets you define hard requirements vs soft preferences.

Common use cases:

Standard Example (with nodeAffinity) - sets a hard rule using requiredDuringSchedulingIgnoredDuringExecution that requires nodes labeled accelerator=gpu:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: accelerator
              operator: In
              values:
                - gpu
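
For NAP to satisfy this rule, some NodePool has to be able to produce nodes that carry the accelerator=gpu label. One way to sketch that is shown below. The NodePool name, the AKSNodeClass name, and the SKU family are assumptions for illustration, and accelerator is a custom label rather than one AKS sets for you.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu                      # hypothetical name
spec:
  template:
    metadata:
      labels:
        accelerator: gpu         # matches the nodeAffinity key/value above
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default            # assumes an AKSNodeClass with this name
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["N"]          # assumed GPU-capable VM family
```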

Standard Example (with nodeAffinity) - “Prefer this node type, but don’t block if it’s unavailable” - uses a soft rule, preferredDuringSchedulingIgnoredDuringExecution, that prefers a specific SKU but schedules elsewhere on a best-effort basis if that SKU is unavailable:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - Standard_D16ds_v5

Standard Example - “Never co-locate replicas on the same node”

That’s usually podAntiAffinity or topology spread across hostname. This scenario uses a hard rule DoNotSchedule to spread pods using kubernetes.io/hostname:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
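
The podAntiAffinity alternative could be sketched as follows. This variant is deliberately the soft form, so replicas avoid sharing a node when possible but still schedule if no other node is available. It reuses the app: web label from the example above.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname   # "same node" is the domain to avoid
```

Swap preferredDuringSchedulingIgnoredDuringExecution for the required form if co-location must never happen, accepting that pods can then stay Pending on a small cluster.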

Node Affinity Example - Gaming Workload

The following example workload uses best-effort topology spread with a soft (preferred) node affinity.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-gaming-low-latency-prefer-local
spec:
  replicas: 12
  selector:
    matchLabels:
      app: sample-gaming-low-latency-prefer-local
  template:
    metadata:
      labels:
        app: sample-gaming-low-latency-prefer-local
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              preference:
                matchExpressions:
                  - key: example.com/network-tier
                    operator: In
                    values: ["low-latency"]

      topologySpreadConstraints:
        - maxSkew: 2
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: sample-gaming-low-latency-prefer-local

      containers:
        - name: session
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"

How do Topology Spread Constraints interact with Node Affinity rules?

Both Topology Spread Constraints and Node Affinity have hard and soft controls. If you set both, Kubernetes and AKS factor them into scheduling together, and the outcome depends on which of the two is hard and which is soft.

Node Affinity rules can either be:

Hard Rule - requiredDuringSchedulingIgnoredDuringExecution
Soft Rule (best effort) - preferredDuringSchedulingIgnoredDuringExecution

Topology Spread Constraint can either be:

Hard Rule - whenUnsatisfiable: DoNotSchedule
Soft Rule (best effort) - whenUnsatisfiable: ScheduleAnyway

The following table lists what to expect when you set these two constraints together in common scenarios, and our recommended setting:

Topology Spread Configuration	Affinity Configuration	Observed Scheduling Behavior	Recommendation
Hard (`whenUnsatisfiable: DoNotSchedule`)	Hard Node Affinity (`requiredDuringSchedulingIgnoredDuringExecution`)	Pod remains Pending if no node satisfies both constraints. The scheduler filters out all nodes that violate either rule.	Use only when you are certain the constraints are always compatible (for example, multi‑zone node affinity plus multi‑zone spread). Avoid mixing single‑zone affinity with multi‑zone spread.
Soft (`whenUnsatisfiable: ScheduleAnyway`)	Hard Node Affinity (`requiredDuringSchedulingIgnoredDuringExecution`)	Pod schedules only on nodes matching affinity. Topology spread is applied as best‑effort, and distribution may be uneven.	✅ Recommended default for most workloads. Enforce strict placement requirements while keeping high availability best‑effort.
Hard (`whenUnsatisfiable: DoNotSchedule`)	Soft Node Affinity (`preferredDuringSchedulingIgnoredDuringExecution`)	Pod schedules only if topology spread constraints are met. Affinity acts only as a preference among valid nodes.	Use when even distribution across zones or nodes is more important than node‑level preferences.
Soft (`whenUnsatisfiable: ScheduleAnyway`)	Soft Node Affinity (`preferredDuringSchedulingIgnoredDuringExecution`)	Neither constraint blocks scheduling. Both only influence scoring; placement is flexible and may be imbalanced.	Suitable for dev/test, batch, or low‑criticality workloads.
Hard multi‑zone spread (`whenUnsatisfiable: DoNotSchedule` and `minDomains` >= 2)	Single‑zone hard affinity	Pod enters a permanent Pending state due to a logical contradiction between constraints.	Align affinity and spread to the same topology domains, or relax one of the constraints.
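
The recommended combination from the table (hard node affinity with soft topology spread) might look like this pod template sketch. The architecture requirement is just a stand-in for whatever strict placement rule your workload has, and the app: web label is illustrative.

```yaml
spec:
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard: must match
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values: ["amd64"]
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway                 # soft: best-effort spread
          labelSelector:
            matchLabels:
              app: web
```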

Practical Guidance

Consider your priorities for topology spread and node affinity.

Decide which requirement is truly “must-have.”
- Make that one hard (required… or DoNotSchedule).
- Make the other a preference (preferred… or ScheduleAnyway).
If you combine strict affinity with strict multi-zone spread, double-check feasibility:
- If affinity restricts you to 1 zone, you cannot also require even spread across 3 zones with DoNotSchedule.
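Use nodeAffinityPolicy: Honor (the default) when your intent is “spread within the nodes I’ve made eligible via affinity.” With Honor, only nodes that match the pod's nodeAffinity or nodeSelector are counted when skew is calculated.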
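
A feasible “hard plus hard” combination could be sketched like this: affinity limits the pod to two zones, and the spread constraint balances replicas within those two zones only. The zone values are hypothetical, and the fragment belongs in the pod template spec.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["eastus-1", "eastus-2"]   # hypothetical zones
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    nodeAffinityPolicy: Honor                    # count only nodes the affinity allows
    labelSelector:
      matchLabels:
        app: web
```

Note that adding minDomains: 3 here would recreate the contradiction described above, because affinity makes only two zones eligible.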

Part 3 - Taints and tolerations

Taints and tolerations control which pods can be scheduled onto which nodes. Think of a taint as a "keep out" sign on a node: if your pod doesn't carry a matching toleration, the scheduler skips that node. This lets you keep certain pods off particular nodes and gives you more fine-grained control over your clusters.

A Practical AKS/NAP Mental Model for taints and tolerations

If a node has a taint, a pod must tolerate it to be scheduled on that node. A toleration only permits placement; it does not attract the pod to the tainted node, so pair it with node affinity when a workload must run there. Used this way, taints keep groups of workloads within their operational boundaries and on appropriate resources.

Taints are defined in your NodePool CRD, and Tolerations are defined in your workload deployment file.

Taint your NAP-managed nodes

In NAP, you can provide taints in your NodePool CRD, and every node created based on this NodePool CRD will have this taint. If you only want specific nodes to have this taint, make sure you have a specific NodePool CRD file created for this purpose (as you can have multiple NodePool CRDs).

The following example shows a taint called test.com/custom-taint that is added in the spec.template.spec.taints field in a NodePool CRD:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      taints:
        - key: test.com/custom-taint
          effect: NoSchedule

Note: Taints can prevent pods from being scheduled to these nodes if they are not tolerated by the pods. A proper toleration must be added to your specific pods to allow them to be scheduled to nodes that are based on this NodePool CRD.
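
A toleration that matches the taint in the NodePool example could be sketched as follows. Because that taint has a key but no value, the toleration uses operator: Exists. The nodeSelector is optional and assumes NAP-managed nodes carry the karpenter.sh/nodepool label with the NodePool's name; verify the label on your own nodes before relying on it.

```yaml
# Pod template fragment
tolerations:
  - key: test.com/custom-taint
    operator: Exists             # the taint defines no value, so match on key only
    effect: NoSchedule
nodeSelector:
  karpenter.sh/nodepool: default # assumption: steers the pod to nodes from this NodePool
```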

Tolerations are a field in your pod spec that declares which taints a pod can accept. The two most common taint effects are:

NoSchedule: strict. Only pods with a matching toleration can land on the tainted node.
PreferNoSchedule: best-effort. The scheduler tries to avoid placing pods that don't tolerate the taint on the node, but doesn't guarantee it.

Hard Rule - NoSchedule Toleration example:

tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"

Best-effort rule - PreferNoSchedule Toleration example:

tolerations:
  - key: "key2"
    operator: "Equal"
    value: "value2"
    effect: "PreferNoSchedule"

Healthcare workload example

The following example workload combines a hard topology spread rule, hard pod anti-affinity, and a toleration to ensure resiliency and limit node co-location.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-healthcare-phi-zone-hardened
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-healthcare-phi-zone-hardened
  template:
    metadata:
      labels:
        app: sample-healthcare-phi-zone-hardened
    spec:
      # If you taint compliance nodes like: example.com/compliance=phi:NoSchedule
      tolerations:
        - key: "example.com/compliance"
          operator: "Equal"
          value: "phi"
          effect: "NoSchedule"

      affinity:
        podAntiAffinity:
          # HARD anti-affinity at hostname: don't put 2 replicas on same node.
          # Tradeoff: can go Pending if cluster is small.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: sample-healthcare-phi-zone-hardened
              topologyKey: kubernetes.io/hostname

      topologySpreadConstraints:
        - maxSkew: 1
          minDomains: 3
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule  # HARD: ensure zone spread for HA
          labelSelector:
            matchLabels:
              app: sample-healthcare-phi-zone-hardened

      containers:
        - name: api
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
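
The node side of this example might be a dedicated NodePool like the sketch below. The NodePool name, the AKSNodeClass name, and the example.com/compliance label are hypothetical; only the taint itself comes from the comment in the Deployment above.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: compliance-phi           # hypothetical name
spec:
  template:
    metadata:
      labels:
        example.com/compliance: phi   # hypothetical label for affinity or nodeSelector
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default            # assumes an AKSNodeClass with this name
      taints:
        - key: example.com/compliance
          value: phi
          effect: NoSchedule     # keeps pods without the toleration off these nodes
```

The toleration lets the Deployment's pods onto these nodes but does not force them there. If they must run only on compliance nodes, add a nodeSelector or required nodeAffinity on the label as well.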

Common Taint + Toleration Pitfalls

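Over-tainting nodes: Every taint excludes all pods that lack a matching toleration, so overusing taints can leave pods Pending with no eligible node.
Complexity in management: Each taint needs a matching toleration on every workload that should run on those nodes, so a large set of taints and tolerations makes debugging and management harder.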

For more on Taints and Tolerations, visit our operator best practices docs or the Kubernetes documentation.

Recap

General Recommendations:

Configure your NAP NodePool CRD spec to allow as flexible (or as specific) a range of nodes as your workloads should be able to schedule to. You can also create multiple NodePool CRDs; just make sure they are mutually exclusive.
Configure your AKSNodeClass CRD spec to define any Azure-specific settings.
Define desired scheduling behavior in the deployment spec with Topology Spread Constraints, Node Affinity, and/or Taints and Tolerations.
Consider your priorities between Topology Spread and certain Node Affinities, and set one with a hard rule and the other with best effort.
Use Taints and Tolerations to reserve nodes for special workloads and keep other pods off them, and add Node Affinity when those workloads must run only on those nodes. Use taints in moderation so you don't overcomplicate your cluster.

FAQ

How can I overprovision to respond to spikes of traffic?

When using NAP, you can set your resource requests slightly higher than you expect to actually use. NAP responds to pending pod pressure, so by default it provisions nodes to match the amount you request in your deployment file. When not using an autoscaler, you can overprovision nodes so that excess compute is available to respond quickly to spikes of traffic.
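
Another general Kubernetes pattern for headroom, not specific to this post, is a low-priority placeholder Deployment: the placeholder pods reserve capacity and are preempted as soon as real workloads need it. The sketch below assumes NAP treats the placeholder pods like any other pending pods when sizing nodes. All names, the priority value, the replica count, and the request sizes are hypothetical.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder              # hypothetical name
value: -10                       # lower than the default pod priority of 0
globalDefault: false
description: "Low priority for placeholder pods that reserve headroom."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation     # hypothetical name
spec:
  replicas: 2                    # assumed amount of headroom
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: placeholder
      terminationGracePeriodSeconds: 0
      containers:
        - name: pause
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6
          resources:
            requests:
              cpu: "500m"        # size each placeholder like a typical replica
              memory: "256Mi"
```

The tradeoff is cost: the reserved capacity is paid for whether or not a spike arrives, so size the placeholder requests to the burst you actually expect.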

How can I reduce latency when trying to schedule nodes?

You can consider enabling features such as Artifact Streaming which can decrease pod readiness time.

For more, visit our documentation on performance and scaling best practices.

Next steps

Ready to get started?

Try NAP today: Follow the Enable Node Auto Provisioning steps.
Learn more: Visit our AKS operator best-practices guidance.
Share feedback: Open issues or ideas in AKS GitHub Issues.
Join the community: Subscribe to the AKS Community YouTube and follow @theakscommunity on X.
