
[Feature] Fleet: Auto-upgrade is not retroactive and does not reconcile clusters to latest patch version #5741

@sphtd

Is your feature request related to a problem? Please describe.

Auto-upgrade in Azure Kubernetes Fleet Manager is currently event-driven rather than state-reconciling.
If an auto-upgrade profile is created after a Kubernetes patch version has already been released, no updaterun is triggered. As a result, clusters can remain on outdated patch versions indefinitely, until the next version release happens to produce an event.

Example:

Cluster version: 1.33.5
Available versions: 1.33.6, 1.33.7
Auto-upgrade profile created after 1.33.7 rollout
Result: no upgrade happens

Additionally:

If a cluster is stopped during its maintenance window, the upgrade is missed and is not retried
There is no mechanism to “catch up” to the latest allowed patch version

This leads to non-deterministic behavior and breaks the expectation of automatic patching.

Describe the solution you'd like

Auto-upgrade should behave as a desired-state reconciliation system, not a purely event-driven one.

Expected behavior:

When an auto-upgrade profile is created or updated:
  the system evaluates the current cluster version
  if the cluster is behind → trigger an updaterun immediately
The system should periodically reconcile (see the sketch after this list):
  if cluster version < latest allowed by policy → trigger an upgrade
If an upgrade fails (e.g. the cluster is stopped):
  retry within the maintenance window until successful
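
A minimal sketch of the requested reconciliation semantics, in plain Python. Everything here is illustrative: `Cluster`, `reconcile`, and the printed updaterun trigger are stand-ins, not Fleet APIs. The key property is that convergence depends only on observed state, never on having seen a release event:

```python
import time
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    version: str          # current Kubernetes patch version, e.g. "1.33.5"
    running: bool = True  # a stopped cluster cannot be upgraded

def parse(v: str) -> tuple[int, ...]:
    """'1.33.5' -> (1, 33, 5), so tuples compare in version order."""
    return tuple(int(p) for p in v.split("."))

def reconcile(cluster: Cluster, allowed: list[str]) -> None:
    """Converge one cluster to the latest patch version allowed by policy."""
    target = max(allowed, key=parse)
    if parse(cluster.version) >= parse(target):
        return  # already converged, nothing to do
    if not cluster.running:
        return  # missed window; the next pass retries automatically
    print(f"trigger updaterun: {cluster.name} {cluster.version} -> {target}")
    cluster.version = target  # stand-in for the real upgrade machinery

def reconcile_loop(clusters: list[Cluster], allowed: list[str],
                   passes: int = 3, interval: float = 1.0) -> None:
    """Periodic reconciliation: a cluster that is still behind is picked
    up again on every pass, so missed events and failed upgrades retry
    without any extra bookkeeping."""
    for _ in range(passes):
        for c in clusters:
            reconcile(c, allowed)
        time.sleep(interval)

# Mirrors the example above: the profile appears after 1.33.7 already shipped.
fleet = [Cluster("prod-a", "1.33.5"), Cluster("prod-b", "1.33.5", running=False)]
reconcile_loop(fleet, ["1.33.6", "1.33.7"])
```

Note that the stopped cluster (`prod-b`) simply stays behind and is retried on the next pass; once started, it converges with no special-case logic.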

Optional configuration:

reconcileOnCreate: true
catchUpToLatestPatch: true
retryUntilSuccess: true
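
Sketched as options on the profile, the knobs could gate triggering roughly like this. Field and function names mirror the proposed keys and are purely hypothetical, not an existing Fleet schema:

```python
from dataclasses import dataclass

@dataclass
class AutoUpgradeOptions:
    """Proposed knobs from above; names illustrative, not an existing Fleet API."""
    reconcile_on_create: bool = True       # evaluate versions when the profile is created/updated
    catch_up_to_latest_patch: bool = True  # converge to latest allowed patch, not just new releases
    retry_until_success: bool = True       # keep retrying inside the maintenance window

def should_trigger(opts: AutoUpgradeOptions, behind: bool,
                   profile_just_created: bool, prior_attempt_failed: bool) -> bool:
    """How the proposed knobs would gate an updaterun."""
    if profile_just_created:
        return behind and opts.reconcile_on_create
    if prior_attempt_failed:
        return behind and opts.retry_until_success
    return behind and opts.catch_up_to_latest_patch
```
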
Describe alternatives you've considered

Current workarounds:
Manually triggering upgrades via CLI/API
Changing upgrade policy to force re-evaluation
Building custom automation (pipelines/scripts) to detect and upgrade outdated clusters (see the sketch below)
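
For concreteness, the scripted workaround typically ends up looking like the sketch below: poll with the Azure CLI, compare versions, upgrade whatever is behind. Assumptions to note: the `kubernetesVersion` and `resourceGroup` fields as returned by `az aks list`, and a per-cluster `az aks upgrade` rather than a Fleet updaterun.

```python
import json
import subprocess

TARGET = "1.33.7"  # latest patch version allowed by policy

def az(*args: str):
    """Run an Azure CLI command and parse its JSON output."""
    out = subprocess.run(["az", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)

def parse(v: str) -> tuple[int, ...]:
    return tuple(int(p) for p in v.split("."))

# Detect: list clusters and find any behind the target patch version.
clusters = az("aks", "list")
behind = [c for c in clusters if parse(c["kubernetesVersion"]) < parse(TARGET)]

# Upgrade: manually do what auto-upgrade was expected to do. A Fleet setup
# would create an updaterun instead of calling `az aks upgrade` per cluster.
for c in behind:
    print(f'upgrading {c["name"]} -> {TARGET}')
    subprocess.run(["az", "aks", "upgrade",
                    "--resource-group", c["resourceGroup"],
                    "--name", c["name"],
                    "--kubernetes-version", TARGET,
                    "--yes"], check=True)
```

This is exactly the operational overhead the list above describes: the script must itself run on a schedule, handle stopped clusters, and be maintained as the CLI evolves.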

These approaches:
add operational overhead
defeat the purpose of “auto-upgrade”
introduce additional failure points

Additional context

Current behavior depends on:
timing of Kubernetes version rollout
timing of auto-upgrade profile creation
cluster availability during maintenance window

This makes the system non-deterministic:

two identical clusters may end up on different versions depending on timing

The system already supports manually triggering an updaterun, which shows that the capability exists; it is simply not used automatically for reconciliation.

Expected behavior for users:
clusters should converge to the latest allowed version defined by policy, regardless of when the profile was created or whether an event was missed.

Metadata

Labels: Fleet (Azure Kubernetes Fleet Manager, multi-cluster management), Upgrade, feature-request (Requested Features)
Status: Backlog
Milestone: none