How to categorize this issue?
/area auto-scaling
/kind bug
/priority 3
What happened:
We recently identified a bug: when an external actor (for example cluster-autoscaler) performs multiple sequential MCD replica scale-downs, the MCS controller repeatedly picks the same machine for deletion.
This causes problems, especially with how the external actor processes the scale-down. For example, when using cluster-autoscaler with the mcm provider, cluster-autoscaler will cordon nodes that it has marked for termination. This leads to cluster states with multiple cordoned nodes that are not deleted.
What you expected to happen:
When selecting a node for deletion due to reducing replicas, MCM should not select nodes already in a terminating state.
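As a rough illustration of the expected behavior, the selection step could skip machines that are already on their way out. This is only a sketch, not the actual MCM code: the `Machine` struct, the `Terminating` field, and `pickForDeletion` are all hypothetical names, standing in for whatever the real controller uses (e.g. a set `DeletionTimestamp` or a terminating machine phase).

```go
// Hypothetical sketch of terminating-aware scale-down selection.
// Type and field names are illustrative, not from the MCM codebase.
package main

import "fmt"

type Machine struct {
	Name        string
	Terminating bool // assumed stand-in for DeletionTimestamp set / terminating phase
}

// pickForDeletion returns the first machine that is not already terminating,
// so repeated scale-downs do not keep targeting the same machine.
func pickForDeletion(machines []Machine) (Machine, bool) {
	for _, m := range machines {
		if !m.Terminating {
			return m, true
		}
	}
	return Machine{}, false
}

func main() {
	machines := []Machine{
		{Name: "machine-a", Terminating: true}, // already picked in a previous scale-down
		{Name: "machine-b", Terminating: false},
	}
	if m, ok := pickForDeletion(machines); ok {
		fmt.Println(m.Name) // machine-b, not the already-terminating machine-a
	}
}
```

With this filter, a second scale-down arriving before the first machine is fully gone would pick a fresh machine instead of re-selecting the cordoned one.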
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- Others: