Apply Copy Fail and DirtyFrag CVE mitigations at-scale using Azure Kubernetes Fleet Manager
This post shows how to use Azure Kubernetes Fleet Manager to simplify the safe rollout of mitigations for CVE-2026-31431 ("Copy Fail") and CVE-2026-43284 / CVE-2026-43500 ("DirtyFrag") across multiple AKS clusters. These vulnerabilities allow a container to escalate to root on the node and affect AKS Linux nodes until mitigations are applied. Existing nodes require either a node image upgrade or a self-service DaemonSet mitigation.
The approaches covered in this post apply mitigations as follows:
- Immediate mitigation: deploy a DaemonSet using Fleet Manager resource placement. Best for clusters where a node image upgrade can't yet be applied.
- Permanent mitigation: upgrade node images using Fleet Manager update runs. Applies patches once a new node image version is available in the cluster's region.
The original AKS advisory and detailed mitigation guide for per-cluster application can be found in GitHub Issue 5753.
Before you begin
- Azure CLI version 2.86.0 or later installed (instructions).
- Fleet Manager Azure CLI extension installed and updated.
az extension add --name fleet
az extension update --name fleet
- A Fleet Manager with a hub cluster (instructions). You can add a hub cluster if you are already using Fleet without one.
- AKS clusters added as member clusters with update group and member labels set. Groups and labels are required for controlling the rollout via placement and update runs.
az fleet member create \
  --resource-group $GROUP \
  --fleet-name $FLEET \
  --name $MEMBER_CLUSTER_NAME \
  --member-cluster-id $MEMBER_CLUSTER_ID \
  --update-group canary \
  --labels upgroup=canary
- kubectl configured for the hub cluster (instructions).
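If you still need credentials for the hub cluster, you can retrieve them with the Fleet Manager CLI (a minimal sketch; substitute your own resource group and Fleet Manager name):
az fleet get-credentials \
  --resource-group $GROUP \
  --name $FLEET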
Immediate mitigation - DaemonSet rollout
Make sure to review the steps in the Before you begin section, especially adding labels to member clusters.
For immediate protection, apply a DaemonSet that disables the vulnerable kernel modules. The DaemonSet is deployed into its own Namespace on each cluster, making its intent easy to discern and making it easier to distribute across clusters. Because of how DaemonSets function, the protection is still applied at the node level even though we don't use the kube-system Namespace shown in the original AKS mitigation guide.
Fleet Manager's cluster-scoped resource placement enables:
- Multi-cluster selection for a resource via ClusterResourcePlacement.
- Controlled rollout of the resource via ClusterStagedUpdateRun.
Step 1: Create the mitigation Namespace and DaemonSet
Create the 01-kernel-lpe-mitigate.yaml file with the following contents. The DaemonSet blocks vulnerable modules covered by the CVEs.
apiVersion: v1
kind: Namespace
metadata:
  name: kernel-lpe-cve-mitigate-ns
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kernel-lpe-mitigate
  namespace: kernel-lpe-cve-mitigate-ns
  labels:
    app: kernel-lpe-mitigate
    purpose: security-mitigation
spec:
  selector:
    matchLabels:
      app: kernel-lpe-mitigate
  template:
    metadata:
      labels:
        app: kernel-lpe-mitigate
    spec:
      hostPID: true
      priorityClassName: system-node-critical
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - operator: Exists
      containers:
        - name: mitigate
          image: mcr.microsoft.com/cbl-mariner/busybox:2.0
          command:
            - /bin/sh
            - -c
            - |
              echo "=== Kernel LPE Module Mitigation ==="
              echo "Covers: CVE-2026-31431 (algif_aead), DirtyFrag (esp4/esp6/rxrpc)"
              MODULES="algif_aead esp4 esp6 rxrpc"
              for mod in $MODULES; do
                if ! grep -qs "install ${mod} /bin/false" /host/etc/modprobe.d/*.conf 2>/dev/null; then
                  printf "install %s /bin/false\nblacklist %s\n" "$mod" "$mod" >> /host/etc/modprobe.d/disable-kernel-lpe.conf
                  echo "Blocked ${mod}"
                else
                  echo "${mod} already blocked"
                fi
                if chroot /host grep -q "^${mod} " /proc/modules 2>/dev/null; then
                  if chroot /host modprobe -r "$mod" 2>/dev/null; then
                    echo "Unloaded ${mod}"
                  else
                    echo "WARNING: Could not unload ${mod} (in use). Reboot node."
                  fi
                fi
              done
              echo "=== Mitigation complete. Sleeping ==="
              sleep infinity
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              memory: 32Mi
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /
            type: Directory
Stage the mitigation namespace and DaemonSet on the Fleet Manager hub cluster.
kubectl apply -f 01-kernel-lpe-mitigate.yaml
The DaemonSet is not instantiated (0 pods are scheduled) on the Fleet Manager hub cluster. This is expected behavior.
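To confirm the resources are staged on the hub without any running pods, you can check the DaemonSet directly (an optional sanity check):
kubectl get daemonset -n kernel-lpe-cve-mitigate-ns
It should report 0 pods scheduled, matching the behavior described above.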
Step 2: Create a ClusterResourcePlacement
Create a placement manifest 02-fleet-pick-clusters-mitigation.yaml that selects all Fleet Manager member clusters (PickAll) as suitable to receive the Namespace and its DaemonSet. We set the strategy to External so we can define and control the rollout order for clusters, which we will do in the following steps.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: kernel-lpe-cve-mit-ns-place
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: kernel-lpe-cve-mitigate-ns
  policy:
    placementType: PickAll
  strategy:
    type: External
Apply the placement on the Fleet Manager hub cluster.
kubectl apply -f 02-fleet-pick-clusters-mitigation.yaml
You can validate how many clusters will receive the mitigation using this command.
kubectl get clusterresourceplacement kernel-lpe-cve-mit-ns-place -o jsonpath="{.status.conditions[?(@.type=='ClusterResourcePlacementScheduled')].message}"
The message in the response shows how many clusters were selected.
found all cluster needed as specified by the scheduling policy, found 2 cluster(s)
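To see which clusters were selected, you can also list the per-cluster entries in the placement status (a sketch assuming the placementStatuses field; check the full status with -o yaml if the path differs in your KubeFleet version):
kubectl get clusterresourceplacement kernel-lpe-cve-mit-ns-place -o jsonpath="{.status.placementStatuses[*].clusterName}"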
Step 3: Create a ClusterStagedUpdateStrategy
A strategy defines the rollout order and the soak duration for each rollout stage. Note that every selected cluster must belong to a stage; stage membership is determined by the labels set on the cluster when it was added to the Fleet Manager.
In this sample, 03-fleet-mitigation-rollout-strategy.yaml, we have three stages with a 4-hour soak time between each stage. You can also add approvals if required (see the sketch after the manifest).
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterStagedUpdateStrategy
metadata:
  name: kernel-lpe-cve-mit-ds-strategy
spec:
  stages:
    - name: stage1
      labelSelector:
        matchLabels:
          upgroup: canary
      afterStageTasks:
        - type: TimedWait
          waitTime: 4h
      maxConcurrency: 1
    - name: stage2
      labelSelector:
        matchLabels:
          upgroup: nonprod
      afterStageTasks:
        - type: TimedWait
          waitTime: 4h
      maxConcurrency: 1
    - name: stage3
      labelSelector:
        matchLabels:
          upgroup: prod
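If you prefer a manual gate instead of (or in addition to) a timed wait, an afterStageTasks entry of type Approval pauses the run at the end of a stage until the approval request generated on the hub cluster is approved. A minimal sketch of how stage1 could look with an approval added:
- name: stage1
  labelSelector:
    matchLabels:
      upgroup: canary
  afterStageTasks:
    - type: Approval
    - type: TimedWait
      waitTime: 4h
  maxConcurrency: 1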
Apply the strategy on the Fleet Manager hub cluster.
kubectl apply -f 03-fleet-mitigation-rollout-strategy.yaml
Step 4: Start rollout
Finally, create a ClusterStagedUpdateRun 04-fleet-mitigation-rollout-run.yaml to begin the rollout.
If you want to stage the run without starting it immediately, set state to Initialize instead; you can then patch the resource to Run when you're ready to begin the rollout (see Step 5).
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterStagedUpdateRun
metadata:
  name: kernel-lpe-cve-mit-ds-rollout
spec:
  placementName: kernel-lpe-cve-mit-ns-place
  stagedRolloutStrategyName: kernel-lpe-cve-mit-ds-strategy
  state: Run
Apply the update run on the Fleet Manager hub cluster. If the state is set to Run, the rollout begins immediately.
kubectl apply -f 04-fleet-mitigation-rollout-run.yaml
Step 5: Monitor and control rollout
Use the following command to retrieve the status of the rollout. Inspect the conditions and stageStatuses arrays to review progress.
kubectl get clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout -o yaml
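If you prefer to watch the run rather than re-query it, standard kubectl watching works here as well:
kubectl get clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout -w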
You can also use the preview version of KubeFleet's Headlamp plugin to help monitor and control the rollout. Connect it to the Fleet Manager hub cluster to use.
If you want to stop the rollout, you can patch the update run:
kubectl patch clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout --type merge -p '{"spec":{"state":"Stop"}}'
Restart the rollout by patching the update run:
kubectl patch clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout --type merge -p '{"spec":{"state":"Run"}}'
Step 6: Validate mitigation is applied
Select clusters in your fleet that have received the mitigation and run the following check.
kubectl logs -n kernel-lpe-cve-mitigate-ns -l app=kernel-lpe-mitigate
If the mitigation is applied successfully, you should see output similar to the following.
=== Kernel LPE Module Mitigation ===
Covers: CVE-2026-31431 (algif_aead), DirtyFrag (esp4/esp6/rxrpc)
Blocked algif_aead
Blocked esp4
Blocked esp6
Blocked rxrpc
=== Mitigation complete. Sleeping ===
In cases where a module is in use, the node will require a reboot. The log message indicates when this is required.
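To quickly find pods reporting that a reboot is needed, filter the mitigation logs for the warning text emitted by the script above (the --prefix flag adds the pod name so you can map warnings back to nodes):
kubectl logs -n kernel-lpe-cve-mitigate-ns -l app=kernel-lpe-mitigate --prefix | grep "WARNING: Could not unload"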
Step 7: Roll back or remove
If issues occur, or the DaemonSet is no longer required because the node image has been updated to a permanently patched release, you can remove the mitigation by deleting the ClusterResourcePlacement on the Fleet Manager hub cluster.
kubectl delete clusterresourceplacement kernel-lpe-cve-mit-ns-place
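Deleting the placement should remove the Namespace and DaemonSet from the member clusters it was placed on. To confirm, switch kubectl context to a member cluster and check that the Namespace is gone:
kubectl get namespace kernel-lpe-cve-mitigate-ns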
DaemonSet Rollout Outcome
- Immediate mitigation applied across clusters via DaemonSet.
- Controlled rollout reduces risk, allows testing on clusters in batches.
- No node reboot required when module not in use.
Permanent mitigation - node image upgrade
- Make sure to review the steps in the Before you begin section, especially adding update groups to member clusters.
- Depending on the regional spread of your clusters, the update run may pause for periods as it waits for the node image to be published to a region. This is expected behavior. You can apply the DaemonSet to your clusters to keep them protected while waiting for the updated node image to become available.
Once AKS provides a permanent mitigation via updated node images, you can use Fleet Manager update runs to roll out the new node image across your clusters.
Fleet Manager's safe multi-cluster updates feature enables:
- Application of a consistent node image across clusters using the Node Image channel.
- Controlled rollout of the node image update via Update Strategies.
Step 1: Define an update strategy
An update strategy defines the order in which clusters will receive the node image update. Strategies are defined using a simple JSON format as shown next.
In the following sample we have one group of clusters per stage, with clusters grouped based on the update group set on the member cluster. We continue with the 4-hour soak time and use 100% concurrency, which allows up to 50 clusters to update at once. You can also set these up via the Azure portal if desired.
{
  "stages": [
    {
      "name": "stage-1",
      "maxConcurrency": "100%",
      "groups": [
        {
          "name": "canary",
          "maxConcurrency": "100%"
        }
      ],
      "afterStageWaitInSeconds": 14400
    },
    {
      "name": "stage-2",
      "maxConcurrency": "100%",
      "groups": [
        {
          "name": "nonprod",
          "maxConcurrency": "100%"
        }
      ],
      "afterStageWaitInSeconds": 14400
    },
    {
      "name": "stage-3",
      "maxConcurrency": "100%",
      "groups": [
        {
          "name": "prod",
          "maxConcurrency": "100%"
        }
      ]
    }
  ]
}
Save the strategy as a file (nodeimage-strategy.json), then submit it via the Azure CLI to create the strategy, ready for use in Fleet Manager.
export GROUP=resource-group
export FLEET=fleet-name
export STRATEGY=copyfail-mitigation-strategy
az fleet updatestrategy create \
--resource-group $GROUP \
--fleet-name $FLEET \
--name $STRATEGY \
--stages nodeimage-strategy.json
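You can optionally confirm the strategy and review its stages before using it in an update run:
az fleet updatestrategy show \
  --resource-group $GROUP \
  --fleet-name $FLEET \
  --name $STRATEGY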
To update a strategy after you have created it, make changes to the JSON file and simply call the create command again.
Step 2: Create an update run for node image upgrades
Next, let's create a new update run for updating just the node image on selected clusters.
The update run won't automatically start, so you can create this once AKS releases the patched node images, then start the run at a later time that suits your schedule.
az fleet updaterun create \
--resource-group $GROUP \
--fleet-name $FLEET \
--name copyfail-nodeimage-update \
--upgrade-type NodeImageOnly \
--update-strategy-name $STRATEGY
Step 3: Start the update run
Once you are satisfied that all clusters to be upgraded are included, you can start the update run.
az fleet updaterun start \
--fleet-name $FLEET \
--resource-group $GROUP \
--name copyfail-nodeimage-update
Step 4: Monitor progress
You can review the status using the Azure CLI shown below, or use the Azure portal's web interface to view the rollout graphically.
az fleet updaterun show \
--resource-group $GROUP \
--fleet-name $FLEET \
--name copyfail-nodeimage-update
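Once a cluster's update completes, you can spot-check that its node pools are running the new image using the AKS CLI (a sketch; the resource group, cluster, and node pool names below are placeholders for one of your member clusters):
az aks nodepool show \
  --resource-group $MEMBER_CLUSTER_GROUP \
  --cluster-name $MEMBER_CLUSTER_NAME \
  --name $NODEPOOL_NAME \
  --query nodeImageVersion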
Node Image Upgrade Outcome
- All nodes are updated to the patched node image.
- DaemonSet no longer required and can be removed.
Summary and next steps
In this post we've looked at the multiple ways in which Azure Kubernetes Fleet Manager can help you safely apply mitigations for the Copy Fail and DirtyFrag CVEs across all your clusters.
Fleet Manager continues to add new capabilities regularly, with the goal of simplifying at-scale Kubernetes cluster management for everyone.
Here are some recommended next steps:
- Review AKS Advisory and Mitigation for CVE-2026-31431 (Copy Fail).
- Learn more about Fleet Manager Safe Multi-cluster Updates.
- Learn more about Fleet Manager resource placement.
- View the Fleet Manager public roadmap. Feature requests always welcome!
- Join a KubeFleet community call to learn about Fleet Manager's open source resource placement.
