Apply Copy Fail and DirtyFrag CVE mitigations at-scale using Azure Kubernetes Fleet Manager
This post shows how to use Azure Kubernetes Fleet Manager to simplify the safe rollout of mitigations for CVE-2026-31431 ("Copy Fail") and CVE-2026-43284 / CVE-2026-43500 ("DirtyFrag") across multiple AKS clusters. These vulnerabilities allow a container to escalate to root on the node and affect AKS Linux nodes until mitigations are applied. Existing nodes require either a node image upgrade or a self-service DaemonSet mitigation.
The approaches covered in this post apply mitigations as follows:
- Immediate mitigation: deploy a DaemonSet using Fleet Manager resource placement. Best for clusters where a node image upgrade can't yet be applied.
- Permanent mitigation: upgrade node images using Fleet Manager update runs. Applies patches once a new node image version is available in the cluster's region.
The original AKS advisory and detailed mitigation guide for per-cluster application can be found in GitHub Issue 5753.
Before you begin
- Azure CLI version 2.86.0 or later installed (instructions).
- Fleet Manager Azure CLI extension installed and updated.
az extension add --name fleet
az extension update --name fleet
- A Fleet Manager with a hub cluster (instructions). You can add a hub cluster if you are already using Fleet without one.
- AKS clusters added as member clusters with update group and member labels set. Groups and labels are required for controlling the rollout via placement and update runs.
az fleet member create \
  --resource-group $GROUP \
  --fleet-name $FLEET \
  --name $MEMBER_CLUSTER_NAME \
  --member-cluster-id $MEMBER_CLUSTER_ID \
  --update-group canary \
  --labels upgroup=canary
- kubectl configured for the hub cluster (instructions).
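If you still need credentials for the hub cluster, you can retrieve them with the Fleet Manager CLI (a minimal sketch; substitute your own resource group and Fleet Manager name):
az fleet get-credentials \
  --resource-group $GROUP \
  --name $FLEET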
Immediate mitigation - DaemonSet rollout
Make sure to review the steps in the Before you begin section, especially adding labels to member clusters.
For immediate protection, apply a DaemonSet that disables the vulnerable kernel modules. The DaemonSet is deployed into its own Namespace on each cluster, making its intent easy to discern and making it easier to distribute across clusters. Because of how DaemonSets function, the protection is still applied at the node level even though we don't use the kube-system Namespace shown in the original AKS mitigation guide.
Fleet Manager's cluster-scoped resource placement enables:
- Multi-cluster selection for a resource via ClusterResourcePlacement.
- Controlled rollout of the resource via ClusterStagedUpdateRun.
Step 1: Create the mitigation Namespace and DaemonSet
Create the 01-kernel-lpe-mitigate.yaml file with the following contents. The DaemonSet blocks vulnerable modules covered by the CVEs.
apiVersion: v1
kind: Namespace
metadata:
  name: kernel-lpe-cve-mitigate-ns
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kernel-lpe-mitigate
  namespace: kernel-lpe-cve-mitigate-ns
  labels:
    app: kernel-lpe-mitigate
    purpose: security-mitigation
spec:
  selector:
    matchLabels:
      app: kernel-lpe-mitigate
  template:
    metadata:
      labels:
        app: kernel-lpe-mitigate
    spec:
      hostPID: true
      priorityClassName: system-node-critical
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - operator: Exists
      containers:
        - name: mitigate
          image: mcr.microsoft.com/cbl-mariner/busybox:2.0
          command:
            - /bin/sh
            - -c
            - |
              echo "=== Kernel LPE Module Mitigation ==="
              echo "Covers: CVE-2026-31431 (algif_aead), DirtyFrag (esp4/esp6/rxrpc)"
              MODULES="algif_aead esp4 esp6 rxrpc"
              for mod in $MODULES; do
                if ! grep -qs "install ${mod} /bin/false" /host/etc/modprobe.d/*.conf 2>/dev/null; then
                  printf "install %s /bin/false\nblacklist %s\n" "$mod" "$mod" >> /host/etc/modprobe.d/disable-kernel-lpe.conf
                  echo "Blocked ${mod}"
                else
                  echo "${mod} already blocked"
                fi
                if chroot /host grep -q "^${mod} " /proc/modules 2>/dev/null; then
                  if chroot /host modprobe -r "$mod" 2>/dev/null; then
                    echo "Unloaded ${mod}"
                  else
                    echo "WARNING: Could not unload ${mod} (in use). Reboot node."
                  fi
                fi
              done
              echo "=== Mitigation complete. Sleeping ==="
              sleep infinity
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              memory: 32Mi
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /
            type: Directory
Stage the mitigation namespace and DaemonSet on the Fleet Manager hub cluster.
kubectl apply -f 01-kernel-lpe-mitigate.yaml
The DaemonSet is not instantiated (0 pods are scheduled) on the Fleet Manager hub cluster. This is expected behavior.
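To confirm the resources are staged on the hub without any running pods, you can check the DaemonSet directly (an optional sanity check):
kubectl get daemonset -n kernel-lpe-cve-mitigate-ns
It should report 0 pods scheduled, matching the behavior described above.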
Step 2: Create a ClusterResourcePlacement
Create a placement manifest 02-fleet-pick-clusters-mitigation.yaml that selects all Fleet Manager member clusters (PickAll) as suitable to receive the Namespace and its DaemonSet. We set the strategy to External so we can define and control the rollout order for clusters, which we will do in the following steps.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: kernel-lpe-cve-mit-ns-place
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: kernel-lpe-cve-mitigate-ns
  policy:
    placementType: PickAll
  strategy:
    type: External
Apply the placement on the Fleet Manager hub cluster.
kubectl apply -f 02-fleet-pick-clusters-mitigation.yaml
You can validate how many clusters will receive the mitigation using this command.
kubectl get clusterresourceplacement kernel-lpe-cve-mit-ns-place -o jsonpath="{.status.conditions[?(@.type=='ClusterResourcePlacementScheduled')].message}"
The message in the response shows how many clusters were selected.
found all cluster needed as specified by the scheduling policy, found 2 cluster(s)
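To see which clusters were selected, you can also list the per-cluster entries in the placement status (a sketch assuming the placementStatuses field; check the full status with -o yaml if the path differs in your KubeFleet version):
kubectl get clusterresourceplacement kernel-lpe-cve-mit-ns-place -o jsonpath="{.status.placementStatuses[*].clusterName}"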
Step 3: Create a ClusterStagedUpdateStrategy
A strategy defines the rollout order and the soak duration for each rollout stage. Note that every selected cluster must belong to a stage; stage membership is determined by the labels set on the cluster when it was added to the Fleet Manager.
In this sample, 03-fleet-mitigation-rollout-strategy.yaml, we have three stages with a 4-hour soak time between each stage. You can also add approvals if required (see the sketch after the manifest).
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterStagedUpdateStrategy
metadata:
  name: kernel-lpe-cve-mit-ds-strategy
spec:
  stages:
    - name: stage1
      labelSelector:
        matchLabels:
          upgroup: canary
      afterStageTasks:
        - type: TimedWait
          waitTime: 4h
      maxConcurrency: 1
    - name: stage2
      labelSelector:
        matchLabels:
          upgroup: nonprod
      afterStageTasks:
        - type: TimedWait
          waitTime: 4h
      maxConcurrency: 1
    - name: stage3
      labelSelector:
        matchLabels:
          upgroup: prod
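If you prefer a manual gate instead of (or in addition to) a timed wait, an afterStageTasks entry of type Approval pauses the run at the end of a stage until the approval request generated on the hub cluster is approved. A minimal sketch of how stage1 could look with an approval added:
- name: stage1
  labelSelector:
    matchLabels:
      upgroup: canary
  afterStageTasks:
    - type: Approval
    - type: TimedWait
      waitTime: 4h
  maxConcurrency: 1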
Apply the strategy on the Fleet Manager hub cluster.
kubectl apply -f 03-fleet-mitigation-rollout-strategy.yaml
Step 4: Start rollout
Finally, create a ClusterStagedUpdateRun 04-fleet-mitigation-rollout-run.yaml to begin the rollout.
If you want to stage the run without starting it immediately, set state to Initialize instead; you can then patch the resource to Run when you're ready to begin the rollout (see Step 5).
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterStagedUpdateRun
metadata:
  name: kernel-lpe-cve-mit-ds-rollout
spec:
  placementName: kernel-lpe-cve-mit-ns-place
  stagedRolloutStrategyName: kernel-lpe-cve-mit-ds-strategy
  state: Run
Apply the update run on the Fleet Manager hub cluster. If the state is set to Run, the rollout begins immediately.
kubectl apply -f 04-fleet-mitigation-rollout-run.yaml
Step 5: Monitor and control rollout
Use the following command to retrieve the status of the rollout. Inspect the conditions and stageStatuses arrays to review progress.
kubectl get clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout -o yaml
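If you prefer to watch the run rather than re-query it, standard kubectl watching works here as well:
kubectl get clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout -w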
You can also use the preview version of KubeFleet's Headlamp plugin to help monitor and control the rollout. Connect it to the Fleet Manager hub cluster to use.
If you want to stop the rollout, you can patch the update run:
kubectl patch clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout --type merge -p '{"spec":{"state":"Stop"}}'
Restart the rollout by patching the update run:
kubectl patch clusterstagedupdaterun kernel-lpe-cve-mit-ds-rollout --type merge -p '{"spec":{"state":"Run"}}'
Step 6: Validate mitigation is applied
Select clusters in your fleet that have received the mitigation and run the following check.
kubectl logs -n kernel-lpe-cve-mitigate-ns -l app=kernel-lpe-mitigate
If the mitigation is applied successfully, you should see output similar to the following.
=== Kernel LPE Module Mitigation ===
Covers: CVE-2026-31431 (algif_aead), DirtyFrag (esp4/esp6/rxrpc)
Blocked algif_aead
Blocked esp4
Blocked esp6
Blocked rxrpc
=== Mitigation complete. Sleeping ===
In cases where a module is in use, the node will require a reboot. The log message indicates when this is required.
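To quickly find pods reporting that a reboot is needed, filter the mitigation logs for the warning text emitted by the script above (the --prefix flag adds the pod name so you can map warnings back to nodes):
kubectl logs -n kernel-lpe-cve-mitigate-ns -l app=kernel-lpe-mitigate --prefix | grep "WARNING: Could not unload"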
Step 7: Roll back or remove
If issues occur, or the DaemonSet is no longer required because the node image has been updated to a permanently patched release, you can remove the mitigation by deleting the ClusterResourcePlacement on the Fleet Manager hub cluster.
kubectl delete clusterresourceplacement kernel-lpe-cve-mit-ns-place
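Deleting the placement should remove the Namespace and DaemonSet from the member clusters it was placed on. To confirm, switch kubectl context to a member cluster and check that the Namespace is gone:
kubectl get namespace kernel-lpe-cve-mitigate-ns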
DaemonSet Rollout Outcome
- Immediate mitigation applied across clusters via DaemonSet.
- Controlled rollout reduces risk, allows testing on clusters in batches.
- No node reboot required when module not in use.
Permanent mitigation - node image upgrade
- Make sure to review the steps in the Before you begin section, especially adding update groups to member clusters.
- Depending on the regional spread of your clusters, the update run may pause for periods as it waits for the node image to be published to a region. This is expected behavior. You can apply the DaemonSet to your clusters to keep them protected while waiting for the updated node image to become available.
Once AKS provides a permanent mitigation via updated node images, you can use Fleet Manager update runs to roll out the new node image across your clusters.
Fleet Manager's safe multi-cluster updates feature enables:
- Application of a consistent node image across clusters using the Node Image channel.
- Controlled rollout of the node image update via Update Strategies.
Step 1: Define an update strategy
An update strategy defines the order in which clusters will receive the node image update. Strategies are defined using a simple JSON format as shown next.
In the following sample we have one group of clusters per stage, with clusters grouped based on the update group set on the member cluster. We continue with the 4-hour soak time and use 100% concurrency, which allows up to 50 clusters to update at once. You can also set these up via the Azure portal if desired.
{
  "stages": [
    {
      "name": "stage-1",
      "maxConcurrency": "100%",
      "groups": [
        {
          "name": "canary",
          "maxConcurrency": "100%"
        }
      ],
      "afterStageWaitInSeconds": 14400
    },
    {
      "name": "stage-2",
      "maxConcurrency": "100%",
      "groups": [
        {
          "name": "nonprod",
          "maxConcurrency": "100%"
        }
      ],
      "afterStageWaitInSeconds": 14400
    },
    {
      "name": "stage-3",
      "maxConcurrency": "100%",
      "groups": [
        {
          "name": "prod",
          "maxConcurrency": "100%"
        }
      ]
    }
  ]
}
Save the strategy as a file (nodeimage-strategy.json), then submit it via the Azure CLI to create the strategy, ready for use in Fleet Manager.
export GROUP=resource-group
export FLEET=fleet-name
export STRATEGY=copyfail-mitigation-strategy
az fleet updatestrategy create \
--resource-group $GROUP \
--fleet-name $FLEET \
--name $STRATEGY \
--stages nodeimage-strategy.json
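You can optionally confirm the strategy and review its stages before using it in an update run:
az fleet updatestrategy show \
  --resource-group $GROUP \
  --fleet-name $FLEET \
  --name $STRATEGY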
To update a strategy after you have created it, make changes to the JSON file and simply call the create command again.
Step 2: Create an update run for node image upgrades
Next, let's create a new update run for updating just the node image on selected clusters.
The update run won't automatically start, so you can create this once AKS releases the patched node images, then start the run at a later time that suits your schedule.
az fleet updaterun create \
--resource-group $GROUP \
--fleet-name $FLEET \
--name copyfail-nodeimage-update \
--upgrade-type NodeImageOnly \
--update-strategy-name $STRATEGY
Step 3: Start the update run
Once you are satisfied that all clusters to be upgraded are included, you can start the update run.
az fleet updaterun start \
--fleet-name $FLEET \
--resource-group $GROUP \
--name copyfail-nodeimage-update
Step 4: Monitor progress
You can review the status using the Azure CLI shown below, or use the Azure portal's web interface to view the rollout graphically.
az fleet updaterun show \
--resource-group $GROUP \
--fleet-name $FLEET \
--name copyfail-nodeimage-update
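Once a cluster's update completes, you can spot-check that its node pools are running the new image using the AKS CLI (a sketch; the resource group, cluster, and node pool names below are placeholders for one of your member clusters):
az aks nodepool show \
  --resource-group $MEMBER_CLUSTER_GROUP \
  --cluster-name $MEMBER_CLUSTER_NAME \
  --name $NODEPOOL_NAME \
  --query nodeImageVersion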
Node Image Upgrade Outcome
- All nodes are updated to the patched node image.
- DaemonSet no longer required and can be removed.
Summary and next steps
In this post we've looked at the multiple ways in which Azure Kubernetes Fleet Manager can help you safely apply mitigations for the Copy Fail and DirtyFrag CVEs across all your clusters.
Fleet Manager continues to add new capabilities regularly, with the goal of simplifying at-scale Kubernetes cluster management for everyone.
Here are some recommended next steps:
- Review AKS Advisory and Mitigation for CVE-2026-31431 (Copy Fail).
- Learn more about Fleet Manager Safe Multi-cluster Updates.
- Learn more about Fleet Manager resource placement.
- View the Fleet Manager public roadmap. Feature requests always welcome!
- Join a KubeFleet community call to learn about Fleet Manager's open source resource placement.
