Is your feature request related to a problem? Please describe.
Auto-upgrade in Azure Kubernetes Fleet Manager is currently event-driven rather than state-reconciling.
If an auto-upgrade profile is created after a Kubernetes patch version has already been released, no updaterun is triggered. As a result, clusters can remain on an outdated patch version indefinitely, until the next patch version happens to be released.
Example:
- Cluster version: 1.33.5
- Available versions: 1.33.6, 1.33.7
- Auto-upgrade profile created after the 1.33.7 rollout
- Result: no upgrade is triggered
Additionally:
- If a cluster is stopped during its maintenance window, the upgrade is missed and not retried
- There is no mechanism to “catch up” to the latest allowed patch version
This leads to non-deterministic behavior and breaks the expectation of automatic patching.
Describe the solution you'd like
Auto-upgrade should behave as a desired-state reconciliation system, not only as an event-driven one.
Expected behavior:
- When an auto-upgrade profile is created or updated:
  - the system evaluates the current cluster version
  - if the cluster is behind, an updaterun is triggered immediately
- The system should reconcile periodically:
  - if the cluster version is lower than the latest version allowed by policy, an upgrade is triggered
- If an upgrade attempt fails (e.g. the cluster is stopped):
  - it is retried within the maintenance window until it succeeds

A minimal sketch of this reconcile logic is included after the configuration list below.
Optional configuration:
- `reconcileOnCreate: true`
- `catchUpToLatestPatch: true`
- `retryUntilSuccess: true`
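
To make the proposed behavior concrete, here is a minimal, self-contained sketch (Go, standard library only) of the reconcile loop described above. Every name in it, including `ClusterState`, `UpgradePolicy`, `reconcile`, and `triggerUpdateRun`, is a hypothetical illustration of the decision logic, not an existing Fleet Manager type or API.

```go
// Hypothetical sketch of the desired-state reconcile loop described above.
// None of these types or functions are Fleet Manager APIs.
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

type ClusterState struct {
	Name           string
	CurrentVersion string // e.g. "1.33.5"
	Stopped        bool
}

type UpgradePolicy struct {
	LatestAllowedVersion string               // latest patch permitted by the auto-upgrade profile
	InMaintenanceWindow  func(time.Time) bool // maintenance-window check
}

// versionLess reports whether a is an older Kubernetes version than b
// (simple numeric comparison of "major.minor.patch").
func versionLess(a, b string) bool {
	pa, pb := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(pa) && i < len(pb); i++ {
		x, _ := strconv.Atoi(pa[i])
		y, _ := strconv.Atoi(pb[i])
		if x != y {
			return x < y
		}
	}
	return len(pa) < len(pb)
}

// reconcile compares actual state with the desired state and triggers an
// updaterun whenever the cluster is behind, regardless of when the profile
// was created or whether a release event was observed.
func reconcile(c ClusterState, p UpgradePolicy, now time.Time) {
	if !versionLess(c.CurrentVersion, p.LatestAllowedVersion) {
		return // already converged to the desired version
	}
	if !p.InMaintenanceWindow(now) {
		return // outside the window: re-evaluated on the next periodic tick
	}
	if c.Stopped {
		// Do not treat the upgrade as "missed": the cluster stays behind,
		// so the next reconcile inside a window retries it (retryUntilSuccess).
		fmt.Printf("%s is stopped, will retry on the next reconcile\n", c.Name)
		return
	}
	triggerUpdateRun(c, p.LatestAllowedVersion)
}

func triggerUpdateRun(c ClusterState, target string) {
	fmt.Printf("triggering updaterun for %s: %s -> %s\n", c.Name, c.CurrentVersion, target)
}

func main() {
	policy := UpgradePolicy{
		LatestAllowedVersion: "1.33.7",
		InMaintenanceWindow:  func(time.Time) bool { return true },
	}
	// Profile created after 1.33.6/1.33.7 already shipped: the cluster is
	// still behind, so the very first reconcile triggers a catch-up updaterun.
	reconcile(ClusterState{Name: "member-1", CurrentVersion: "1.33.5"}, policy, time.Now())
}
```

Run once when a profile is created or updated, and again on a periodic tick, this loop makes two identical clusters converge to the same version regardless of rollout timing or missed events.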
Describe alternatives you've considered
Current workarounds:
- Manually triggering upgrades via the CLI/API
- Changing the upgrade policy to force re-evaluation
- Building custom automation (pipelines/scripts) to detect and upgrade outdated clusters
These approaches:
- add operational overhead
- defeat the purpose of “auto-upgrade”
- introduce additional failure points
Additional context
Current behavior depends on:
- the timing of the Kubernetes version rollout
- the timing of auto-upgrade profile creation
- cluster availability during the maintenance window
This makes the system non-deterministic: two identical clusters may end up on different versions depending purely on timing.
The system already supports manually triggered updateruns, so the underlying capability exists; it is simply not invoked automatically for reconciliation.
Expected behavior for users: clusters should converge to the latest version allowed by policy, regardless of when the profile was created or whether a release event was missed.