aks: support BYO VNet for Automatic Managed System Pool clusters #33259

Open
wenhug wants to merge 20 commits into Azure:dev from wenhug:wenhug/aks-hobo-byo-vnet

Conversation

@wenhug

@wenhug wenhug commented Apr 23, 2026

Summary

Adds core az aks create support for Automatic SKU Managed System Pool BYO VNet, plus fixes follow-up AKS commands against clusters whose server-side agentPoolProfiles is null.

New az aks create flags:

  • --system-node-subnet-id
  • --node-subnet-id
  • --enable-hosted-system

Behavior:

  • --enable-hosted-system is valid only with --sku automatic. It sets hostedSystemProfile.enabled=true and skips synthesizing the CLI default agentPoolProfiles, because the RP provisions the Managed System Pool server-side.
  • BYO VNet is implied when --system-node-subnet-id, --node-subnet-id, and the existing --apiserver-subnet-id are supplied together on --sku automatic.
  • BYO VNet populates hostedSystemProfile.systemNodeSubnetID, hostedSystemProfile.nodeSubnetID, and apiServerAccessProfile.subnetId; it also sets apiServerAccessProfile.enableVnetIntegration=true.
  • BYO VNet defaults to loadBalancer outbound instead of the normal Automatic no-VNet managedNATGateway default. BYO subnets also satisfy the VNet requirement for userAssignedNATGateway and userDefinedRouting; managedNATGateway is rejected with BYO/custom VNet input.
  • The CLI grants Network Contributor on all BYO subnets for service-principal/user-assigned identity creates, and defers the grants for system-assigned identity creates until the cluster identity exists.
  • az aks upgrade, az aks scale, and az aks update now handle Managed System Pool clusters with agentPoolProfiles=null; update also preserves the existing outbound type instead of reapplying the Automatic create default.
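
The outbound-type rule in the Behavior list can be read as a small decision function. The following is a minimal sketch of that rule as described here, not the code in this PR; the function name `resolve_create_outbound_type` and the use of `ValueError` are assumptions made for illustration.

```python
from typing import Optional

# Defaults named in the description above.
BYO_VNET_DEFAULT = "loadBalancer"
NO_VNET_DEFAULT = "managedNATGateway"


def resolve_create_outbound_type(user_value: Optional[str], byo_vnet: bool) -> str:
    """Pick the outbound type for an Automatic SKU create (hypothetical helper)."""
    if byo_vnet:
        if user_value == "managedNATGateway":
            # The description states this combination is rejected.
            raise ValueError(
                "managedNATGateway is not allowed with BYO/custom VNet input"
            )
        # Explicit values such as userAssignedNATGateway or
        # userDefinedRouting pass through unchanged.
        return user_value or BYO_VNET_DEFAULT
    # Without BYO subnets, keep the normal Automatic default.
    return user_value or NO_VNET_DEFAULT
```

On update, the description says the existing outbound type is preserved, so a helper like this would apply only to the create path.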

Validation:

  • Partial BYO trio -> RequiredArgumentMissingError listing missing flags.
  • BYO trio without --sku automatic -> RequiredArgumentMissingError.
  • --enable-hosted-system without --sku automatic -> RequiredArgumentMissingError.
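
The three validation rules above might be sketched as follows. This is an illustration under stated assumptions rather than the validator from this PR: `RequiredArgumentMissingError` is redefined locally as a stand-in so the snippet runs on its own, `validate_byo_vnet_args` is a hypothetical name, and the order of the checks is a guess. Following a later commit message in this thread, only `--system-node-subnet-id` and `--node-subnet-id` are treated as triggers for BYO VNet.

```python
from typing import Optional


class RequiredArgumentMissingError(Exception):
    """Local stand-in for the CLI error type named in the description."""


def validate_byo_vnet_args(
    sku: Optional[str],
    system_node_subnet_id: Optional[str],
    node_subnet_id: Optional[str],
    apiserver_subnet_id: Optional[str],
    enable_hosted_system: bool = False,
) -> bool:
    """Return True when a valid BYO VNet trio was supplied (hypothetical helper)."""
    is_automatic = (sku or "").lower() == "automatic"
    if enable_hosted_system and not is_automatic:
        raise RequiredArgumentMissingError(
            "--enable-hosted-system requires --sku automatic"
        )
    # --apiserver-subnet-id alone keeps its existing meaning and does not
    # trigger BYO VNet validation.
    if not (system_node_subnet_id or node_subnet_id):
        return False
    trio = {
        "--system-node-subnet-id": system_node_subnet_id,
        "--node-subnet-id": node_subnet_id,
        "--apiserver-subnet-id": apiserver_subnet_id,
    }
    missing = [flag for flag, value in trio.items() if not value]
    if missing:
        raise RequiredArgumentMissingError(
            "BYO VNet requires all three subnet flags; missing: " + ", ".join(missing)
        )
    if not is_automatic:
        raise RequiredArgumentMissingError(
            "BYO VNet subnet flags require --sku automatic"
        )
    return True
```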

Test plan

  • python -m pytest src/azure-cli/azure/cli/command_modules/acs/tests/latest/test_managed_cluster_decorator.py - 248 passed
  • git diff --check
  • Verified local core CLI import path is src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py
  • Verified aks-preview is not installed; az aks create -h shows the new core CLI flags from this PR
  • Live EUAP BYO VNet create in eastus2euap succeeded with hostedSystemProfile.enabled=true, agentPoolProfiles=null, nodeProvisioningProfile.mode=Auto, and networkProfile.outboundType=loadBalancer
  • Verified the user-assigned identity has Network Contributor assignments on the system-node, node, and apiserver BYO subnets
  • Live az aks nodepool list returned []; az aks get-credentials succeeded
  • Live az aks scale returned a friendly "no scalable node pools" error instead of crashing
  • Live az aks upgrade --node-image-only --yes completed without a NoneType crash
  • Live az aks update --tags ... was accepted by the RP; the smoke-test tag is visible and outboundType remains loadBalancer while the RP operation continues

… crash

Add --system-node-subnet-id, --node-subnet-id, --disable-hosted-system
to 'az aks create'. When the subnet trio (system-node, node, apiserver)
is supplied on --sku automatic, the cluster is created with an MC
hosted_system_profile carrying BYO subnets; the Enabled flag is left
unset so the server decides the default. --disable-hosted-system
deterministically opts an Automatic cluster out of HOBO.

Validate the BYO VNet trio up front:
- Partial trio -> RequiredArgumentMissingError listing missing flags.
- Trio without --sku automatic -> RequiredArgumentMissingError.
- --disable-hosted-system + any subnet flag -> MutuallyExclusiveArgumentError.
- --disable-hosted-system without --sku automatic -> RequiredArgumentMissingError.

Fix 'az aks upgrade' / 'az aks scale' crash on HOBO clusters where
agent_pool_profiles can be None server-side ('NoneType is not iterable').

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 22:32
@azure-client-tools-bot-prd

Validation for Azure CLI Full Test Starting...

Thanks for your contribution!

@azure-client-tools-bot-prd

Hi @wenhug,
Since fewer than 7 days remain in the current milestone, this PR will be reviewed in the next milestone.

@azure-client-tools-bot-prd

Validation for Breaking Change Starting...

Thanks for your contribution!

@yonzhan
Collaborator

yonzhan commented Apr 23, 2026

Thank you for your contribution! We will review the pull request and get back to you soon.

@github-actions

The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR.

Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).
After that please run the following commands to enable git hooks:

pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds CLI support for BYO VNet on AKS Automatic SKU (HOBO) clusters via new az aks create flags, and hardens az aks upgrade against RP responses where agentPoolProfiles is null.

Changes:

  • Introduce --system-node-subnet-id/--sys-node-subnet-id, --node-subnet-id, and --disable-hosted-system for az aks create, plus validation and request shaping via hosted_system_profile.
  • Update help/params/validators to expose and validate the new subnet arguments.
  • Prevent az aks upgrade from crashing when instance.agent_pool_profiles is None.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py | Adds new context getters, validation, and hosted_system_profile setup during create flow. |
| src/azure-cli/azure/cli/command_modules/acs/custom.py | Adds new aks_create parameters and guards aks_upgrade iterations over agent_pool_profiles. |
| src/azure-cli/azure/cli/command_modules/acs/_validators.py | Adds subnet ID validators for the new flags. |
| src/azure-cli/azure/cli/command_modules/acs/_params.py | Wires new flags into aks create argument registration. |
| src/azure-cli/azure/cli/command_modules/acs/_help.py | Documents the new create flags and intended usage. |

Comment thread src/azure-cli/azure/cli/command_modules/acs/custom.py
Comment thread src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py Outdated
* Simplify --system-node-subnet-id registration (drop the --sys-node-subnet-id
  alias so the linter picks up help correctly).
* Relax _get_apiserver_subnet_id CREATE-time check: don't require
  --vnet-subnet-id when BYO HOBO subnets are set, since system-node/node
  subnets replace vnet-subnet-id on --sku automatic.
* Run _validate_byo_hobo_subnets up front in set_up_api_server_access_profile
  so the targeted "require --sku automatic" error beats the generic
  --apiserver-subnet-id messaging.
* Also fix aks_scale against HOBO clusters where agent_pool_profiles is None
  (same crash Qizhe hit with aks_upgrade): guard with `or []` and return a
  user-friendly error for empty pools.
* Add linter_exclusions entries for the three new parameters
  (missing_parameter_test_coverage) to keep azdev-linter green without
  recorded scenario tests at this stage.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
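
The `or []` guard mentioned in the commit message above can be illustrated with a short sketch. It assumes only that the cluster object exposes `agent_pool_profiles`, which may be `None`; the helper name `pools_available_for_scale` and the exception class `NoScalablePoolsError` are hypothetical and not taken from the PR.

```python
from types import SimpleNamespace


class NoScalablePoolsError(Exception):
    """Hypothetical stand-in for the user-friendly error described above."""


def pools_available_for_scale(instance):
    """Return the cluster's node pools, treating a null list as empty."""
    # The RP can return agent_pool_profiles=None for Managed System Pool
    # clusters; iterating over None raises "NoneType is not iterable".
    pools = instance.agent_pool_profiles or []
    if not pools:
        raise NoScalablePoolsError("This cluster has no scalable node pools.")
    return pools


# Usage: a cluster whose server-side agentPoolProfiles is null.
managed_pool_cluster = SimpleNamespace(agent_pool_profiles=None)
try:
    pools_available_for_scale(managed_pool_cluster)
except NoScalablePoolsError as err:
    print(f"error: {err}")
```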
wenhug added 5 commits April 23, 2026 23:38
When customers pass --system-node-subnet-id / --node-subnet-id /
--apiserver-subnet-id on --sku automatic to bring their own VNet for
HOBO, the CLI was producing a payload the RP rejected:

1. apiServerAccessProfile.enableVnetIntegration was not set, so the RP
   treated the cluster as default-VNet while subnetId was populated and
   returned ApiserverSubnetConfigError. Auto-wire enable_vnet_integration
   whenever the BYO HOBO subnet trio is present.
2. hostedSystemProfile.enabled was left unset, so the RP could not
   distinguish BYO HOBO from default mode. Set enabled=True when the
   subnet trio is provided.
3. agentPoolProfiles contained the default system pool, which the RP
   rejected because HOBO manages node pools itself. Clear
   agent_pool_profiles in BYO HOBO mode, matching the preview path.
4. outbound_type defaulted to managedNATGateway for Automatic SKU, which
   the RP disallows on BYO VNet. Keep the user's explicit value (or let
   it default to loadBalancer) when the BYO trio is provided.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
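
Taken together, the four fixes in the commit message above imply a request body roughly like the one built below. This is a sketch using plain dictionaries and the property names quoted in the PR description; the function `shape_byo_create_payload` is hypothetical, and the real CLI builds SDK model objects rather than dicts.

```python
from typing import Any, Dict, Optional


def shape_byo_create_payload(
    system_node_subnet_id: str,
    node_subnet_id: str,
    apiserver_subnet_id: str,
    outbound_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the BYO VNet portion of a create request (illustrative only)."""
    return {
        # Fix 2: mark the request as Managed System Pool with BYO subnets.
        "hostedSystemProfile": {
            "enabled": True,
            "systemNodeSubnetID": system_node_subnet_id,
            "nodeSubnetID": node_subnet_id,
        },
        # Fix 1: a populated subnetId needs VNet integration switched on.
        "apiServerAccessProfile": {
            "subnetId": apiserver_subnet_id,
            "enableVnetIntegration": True,
        },
        # Fix 3: drop the CLI-synthesized default system pool.
        "agentPoolProfiles": None,
        # Fix 4: keep an explicit value, otherwise default to loadBalancer.
        "networkProfile": {"outboundType": outbound_type or "loadBalancer"},
    }
```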
HOBO (Automatic SKU Hosted Overlay System Pool) clusters have
agent_pool_profiles=null on the RP side because node pools are
server-managed. update_agentpool_profile was raising
'Encounter an unexpected error while getting agent pool profiles...'
on any 'az aks update' against a HOBO cluster (including 'az aks
update --sku base' for Automatic-to-Base downgrade). Skip that step
when hostedSystemProfile.enabled is true.

Also refines the Automatic-SKU outbound-type override: keep the
existing 'default to ManagedNATGateway when no user value and no
vnet subnet' behavior unchanged; the BYO-HOBO exemption added in
the prior commit is already enough.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Address Copilot review feedback on PR Azure#33259:

- Clarify _validate_byo_hobo_subnets docstring: BYO VNet HOBO is triggered
  only by --system-node-subnet-id / --node-subnet-id. --apiserver-subnet-id
  keeps its existing general-purpose meaning for --enable-apiserver-vnet-integration
  flows on non-HOBO clusters, so it is deliberately not part of the trigger
  or the mutual-exclusion set.
- Remove the unused 'any_trio_set' placeholder variable.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Fix pylint W0212 (protected-access) reported by CI: the validator is
called across classes (AKSManagedClusterCreateDecorator accessing
AKSManagedClusterContext), so it should be a public method.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Docstring still described the earlier 'enabled left unset' behavior,
but the code now sets enabled=True on BYO VNet HOBO trio (required so
the RP treats the request as BYO rather than default-VNet mode) and
clears agent_pool_profiles because HOBO manages node pools server-side.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Comment thread src/azure-cli/azure/cli/command_modules/acs/_help.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/_help.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/_help.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/_params.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/custom.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/custom.py Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/linter_exclusions.yml Outdated
Comment thread src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py Outdated
wenhug added 7 commits April 24, 2026 05:04
Per review feedback:
- Remove --disable-hosted-system flag entirely (PM decision).
- Rename user-visible HOBO / Hosted Overlay System Pool terminology to
  Managed System Pool for Automatic cluster.
- Drop associated getter, validator branch, param, linter exclusion,
  and related test cases.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Rework short/long summaries for --system-node-subnet-id and --node-subnet-id
so each flag clearly explains which pool it maps to (Managed System Pool vs
user node pools) and states that the full three-subnet trio (including
--apiserver-subnet-id) must belong to the same VNet and requires --sku
automatic.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
…eset

- Rewrite --system-node-subnet-id and --node-subnet-id short summaries to
  follow the 'The ID of a subnet in an existing VNet to be used by ...'
  style already used for --vnet-subnet-id.
- Rewrite the comment above the BYO-path 'agent_pool_profiles = None'
  assignment to explain the real reason: on an Automatic cluster with BYO
  VNet, the RP provisions the system pool from hosted_system_profile, so
  the CLI-synthesized default agent pool entry conflicts with the BYO
  trio and must be cleared.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Give power users an explicit way to request a Managed System Pool on
Automatic SKU clusters, independent of the region-level default toggle.

- `--enable-hosted-system` sets `hosted_system_profile.enabled=True` and
  clears the CLI-synthesized default agent pool. This avoids the
  ghost-pool problem on non-BYO Automatic clusters in toggle-ON regions
  where the RP auto-enables HOBO but the CLI still ships a default pool.
- The BYO VNet subnet trio implies `--enable-hosted-system`, so existing
  BYO flows keep working unchanged.
- `--enable-hosted-system` is gated to `--sku automatic`.

Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
@wenhug wenhug changed the title from "aks: BYO VNet support for Automatic SKU HOBO clusters + fix upgrade crash" to "aks: support BYO VNet for Automatic Managed System Pool clusters" Apr 24, 2026
Comment thread src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py Outdated
wenhug added 4 commits April 24, 2026 23:36
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Signed-off-by: wenhug <50309350+wenhug@users.noreply.github.com>
Comment thread src/azure-cli/azure/cli/command_modules/acs/managed_cluster_decorator.py Outdated

Labels

  • act-observability-squad
  • AKS (az aks/acs/openshift)
  • Auto-Assign (Auto assign by bot)

6 participants