Skip to content

OCPNODE-3880: Add criocredentialproviderconfig event handler#5487

Open
QiWang19 wants to merge 1 commit intoopenshift:mainfrom
QiWang19:addcriocpcontroller
Open

OCPNODE-3880: Add criocredentialproviderconfig event handler#5487
QiWang19 wants to merge 1 commit intoopenshift:mainfrom
QiWang19:addcriocpcontroller

Conversation

@QiWang19
Copy link
Member

@QiWang19 QiWang19 commented Dec 11, 2025

- What I did
Implement criocredentialprovierconfig that is used by crio-credential-provider plugin to fetch private mirror image pull secrets from the secret object.

The handler creates 97-pool-generated-credentialproviderconfig to rollout configurations to file /etc/kubernetes/credential-providers/[platform]-credential-provider.yaml .

workflow: https://github.com/openshift/enhancements/blob/master/enhancements/api-review/criocredentialproviderconfig-for-namespace-scoped-mirror-authentication.md#workflow-description
- How to verify it

  1. create cluster CRIOCredentialProviderConfig resource, file updated with a new section name: crio-credential-provider
apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
  name: cluster
spec:
  matchImages:
  - docker.io
  - 123456789.dkr.ecr.us-east-1.amazonaws.com
  - "*.azurecr.io"
  - gcr.io
  - "*.*.registry.io"
  - registry.io:8080/path
$ oc get mc
97-master-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s
97-worker-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s

$ oc describe criocredentialproviderconfig.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1alpha1
Kind:         CRIOCredentialProviderConfig
Metadata:
  Creation Timestamp:  2026-02-16T21:26:17Z
  Generation:          1
  Resource Version:    42066
  UID:                 05d173ad-7b8d-4a28-afb9-b587fe565e16
Spec:
  Match Images:
    docker.io
    123456789.dkr.ecr.us-east-1.amazonaws.com
    *.azurecr.io
    gcr.io
    *.*.registry.io
    registry.io:8080/path
Status:
  Conditions:
    Last Transition Time:  2026-02-16T21:26:17Z
    Message:               Success
    Observed Generation:   1
    Reason:                MachineConfigRenderingSucceeded
    Status:                True
    Type:                  MachineConfigRendered
Events:                    <none>

sh-5.1# cat /etc/kubernetes/credential-providers/ecr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
  defaultCacheDuration: 12h0m0s
  matchImages:
  - '*.dkr.ecr.*.amazonaws.com'
  - '*.dkr.ecr.*.amazonaws.com.cn'
  - '*.dkr.ecr-fips.*.amazonaws.com'
  - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
  - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
  name: ecr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
  defaultCacheDuration: 1s
  matchImages:
  - docker.io
  - 123456789.dkr.ecr.us-east-1.amazonaws.com
  - '*.azurecr.io'
  - gcr.io
  - '*.*.registry.io'
  - registry.io:8080/path
  name: crio-credential-provider
  tokenAttributes:
    cacheType: Token
    requireServiceAccount: false
    serviceAccountTokenAudience: https://kubernetes.default.svc

  1. create namespace rolebing allow serviceaccount to get secrect in the namespace e.g. namespace:mynamespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: credential-provider-secret-reader
  namespace: mynamespace 
rules:
- apiGroups: [""]
  resources: ["secrets"] 
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: credential-provider-secret-reader-binding
  namespace: mynamespace 
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: credential-provider-secret-reader
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: system:serviceaccount:mynamespace:default
  1. Create ImageDigestMirrorSet/ImageTagMirrorSet mirror configurations
  2. Create a pod containers.image is from mirror source registry
  3. Check crio-credential-provider log journalctl _COMM=crio-credential on the scheduled node

- Description for the changelog

Summary by CodeRabbit

  • New Features

    • Manage container runtime credential providers via a new CRIOCredentialProviderConfig and support for kubelet credential-provider flags on master, worker, and arbiter nodes.
  • Chores / Defaults

    • Cluster defaults updated to reload/restart kubelet when credential provider configs change.
  • Security & Permissions

    • Added cluster-scoped RBAC to permit node credential provider operations.
  • Tests

    • Added extensive tests covering CRIO credential provider creation, updates, merging, and machineconfig integration.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2025
@QiWang19
Copy link
Member Author

/test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19
Once this PR has been reviewed and has the lgtm label, please assign cheesesashimi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch 3 times, most recently from 865fa47 to 8a072e4 Compare December 13, 2025 06:29
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 changed the title Addcriocpcontroller OCPNODE-3880: Add criocredentialprovierconfig pcontroller Dec 23, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 23, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Dec 23, 2025

@QiWang19: This pull request references OCPNODE-3880 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

- What I did

- How to verify it

apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
 name: cluster
spec:
 matchImages:
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - "*.azurecr.io"
 - "*.*.registry.io"
 - registry.io:8080/path
sh-5.1# cat /etc/kubernetes/credential-providers/gcr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 args:
 - get-credentials
 - --v=3
 defaultCacheDuration: 1m0s
 matchImages:
 - gcr.io
 - '*.gcr.io'
 - '*.pkg.dev'
 - container.cloud.google.com
 name: gcr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 1s
 matchImages:
 - registry.io:8080/path
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - '*.azurecr.io'
 - '*.*.registry.io'
 name: crio-credential-provider
 tokenAttributes:
   cacheType: Token
   requireServiceAccount: false
   serviceAccountTokenAudience: https://kubernetes.default.svc

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2026
@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 8a072e4 to 75dfbfd Compare January 26, 2026 15:35
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 26, 2026
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 75dfbfd to 511fde6 Compare January 27, 2026 01:46
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 511fde6 to 0ff0fe8 Compare January 27, 2026 18:42
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 0ff0fe8 to 5d5008b Compare January 28, 2026 15:25
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 5d5008b to 7d79743 Compare January 30, 2026 14:15
@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch 2 times, most recently from 147fbf2 to 3027d6a Compare February 15, 2026 16:17
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 3027d6a to 9a477a8 Compare February 16, 2026 19:53
@QiWang19
Copy link
Member Author

/test all

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 9a477a8 to 58fa6bc Compare February 16, 2026 20:59
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 16, 2026

@QiWang19: This pull request references OCPNODE-3880 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

- What I did

- How to verify it

create cluster CRIOCredentialProviderConfig resource

apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
 name: cluster
spec:
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - "*.azurecr.io"
 - gcr.io
 - "*.*.registry.io"
 - registry.io:8080/path
$ oc describe criocredentialproviderconfig.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1alpha1
Kind:         CRIOCredentialProviderConfig
Metadata:
 Creation Timestamp:  2026-02-16T21:26:17Z
 Generation:          1
 Resource Version:    42066
 UID:                 05d173ad-7b8d-4a28-afb9-b587fe565e16
Spec:
 Match Images:
   docker.io
   123456789.dkr.ecr.us-east-1.amazonaws.com
   *.azurecr.io
   gcr.io
   *.*.registry.io
   registry.io:8080/path
Status:
 Conditions:
   Last Transition Time:  2026-02-16T21:26:17Z
   Message:               Success
   Observed Generation:   1
   Reason:                MachineConfigRenderingSucceeded
   Status:                True
   Type:                  MachineConfigRendered
Events:                    <none>

sh-5.1# cat /etc/kubernetes/credential-providers/ecr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 12h0m0s
 matchImages:
 - '*.dkr.ecr.*.amazonaws.com'
 - '*.dkr.ecr.*.amazonaws.com.cn'
 - '*.dkr.ecr-fips.*.amazonaws.com'
 - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
 - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
 name: ecr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 1s
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - '*.azurecr.io'
 - gcr.io
 - '*.*.registry.io'
 - registry.io:8080/path
 name: crio-credential-provider
 tokenAttributes:
   cacheType: Token
   requireServiceAccount: false
   serviceAccountTokenAudience: https://kubernetes.default.svc

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 16, 2026

@QiWang19: This pull request references OCPNODE-3880 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

- What I did

- How to verify it

  1. create cluster CRIOCredentialProviderConfig resource
apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
 name: cluster
spec:
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - "*.azurecr.io"
 - gcr.io
 - "*.*.registry.io"
 - registry.io:8080/path
$ oc describe criocredentialproviderconfig.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1alpha1
Kind:         CRIOCredentialProviderConfig
Metadata:
 Creation Timestamp:  2026-02-16T21:26:17Z
 Generation:          1
 Resource Version:    42066
 UID:                 05d173ad-7b8d-4a28-afb9-b587fe565e16
Spec:
 Match Images:
   docker.io
   123456789.dkr.ecr.us-east-1.amazonaws.com
   *.azurecr.io
   gcr.io
   *.*.registry.io
   registry.io:8080/path
Status:
 Conditions:
   Last Transition Time:  2026-02-16T21:26:17Z
   Message:               Success
   Observed Generation:   1
   Reason:                MachineConfigRenderingSucceeded
   Status:                True
   Type:                  MachineConfigRendered
Events:                    <none>

sh-5.1# cat /etc/kubernetes/credential-providers/ecr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 12h0m0s
 matchImages:
 - '*.dkr.ecr.*.amazonaws.com'
 - '*.dkr.ecr.*.amazonaws.com.cn'
 - '*.dkr.ecr-fips.*.amazonaws.com'
 - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
 - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
 name: ecr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 1s
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - '*.azurecr.io'
 - gcr.io
 - '*.*.registry.io'
 - registry.io:8080/path
 name: crio-credential-provider
 tokenAttributes:
   cacheType: Token
   requireServiceAccount: false
   serviceAccountTokenAudience: https://kubernetes.default.svc

  1. create namespace rolebing e.g. namespace:mynamespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
 name: credential-provider-secret-reader
 namespace: mynamespace 
rules:
- apiGroups: [""]
 resources: ["secrets"] 
 verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: credential-provider-secret-reader-binding
 namespace: mynamespace 
roleRef:
 apiGroup: rbac.authorization.k8s.io
 kind: Role
 name: credential-provider-secret-reader
subjects:
 - apiGroup: rbac.authorization.k8s.io
   kind: User
   name: system:serviceaccount:mynamespace:default
  1. Create ImageDigestMirrorSet/ImageTagMirrorSet mirror configurations
  2. Create a pod containers.image is from source
  3. Check crio-credential-provider log journalctl _COMM=crio-credential on the scheduled node

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


for _, pool := range mcpPools {
applied := true
role := pool.Name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems only used once. do we need?

klog.Warningf("error updating CRIOCredentialProviderConfig status: %v", statusUpdateErr)
}
// Want to return the actual error received from the sync function
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this pattern but i'm sure there's precedent. seems better to have this function not take nor return an error and then return in caller rather than double returning with no modification

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. This Sync status function focuses on status update, let the caller return the actual err.

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 0227c1a to 4be43bd Compare March 5, 2026 21:25
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Mar 5, 2026
Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid start, the controller plumbing follows existing patterns well and the handleQueueErr refactor is a nice cleanup. Left some inline comments.

}

if err := retry.RetryOnConflict(updateBackoff, func() error {
if crioCredentialProviderConfig != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always true at this point since the function already returned early for IsNotFound and the disabled case. Remove the check and flatten the retry body.

crioCredentialProviderConfig, err = ctrl.criocpLister.Get("cluster")
if errors.IsNotFound(err) {
klog.V(2).Infof("CRIOCredentialProviderConfig 'cluster' does not exist or has been deleted")
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enhancement requires cleanup of generated MachineConfigs when the CR is deleted. The 97-<pool>-generated-credentialproviderconfig MCs are left orphaned here. On downgrade, nodes would still reference a crio-credential-provider binary that may no longer exist. This should iterate pools and delete the managed MCs.

case apicfgv1.AzurePlatformType:
credProviderConfigPath = fmt.Sprintf(credProviderConfigPathFormat, "acr")
default:
return nil, "", fmt.Errorf("unsupported platform type: %s", cc.Spec.Infra.Status.PlatformStatus.Type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hard-errors on any platform that isn't AWS/GCP/Azure. The upstream crio-credential-provider works standalone: its reference credential-provider-config.yml has only the crio-credential-provider entry with no platform-specific provider. For bare-metal, vSphere, OpenStack, etc., there's no pre-existing credential provider config to merge into, but you could generate a standalone config with just the crio-credential-provider entry. As-is, the feature is broken on non-cloud platforms even though the enhancement says it applies to standalone OCP and SNO.

// Get ControllerConfig
controllerConfig, err := ctrl.ccLister.Get(ctrlcommon.ControllerConfigName)
if err != nil {
return fmt.Errorf("could not get ControllerConfig %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing colon before %w. Should be "could not get ControllerConfig: %w".


// matchImages is not expected to be empty here as the caller should skip calling this function if there are no images
images := []string{}
for image := range matchImages {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map iteration order in Go is non-deterministic. Each sync could produce different YAML, causing unnecessary MC updates and node rollouts (kubelet restarts). Add sort.Strings(images) after building the slice.

}

type credentialProviderConfigWithVersion struct {
APIVersion string `json:"apiVersion"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enhancement requires: "validate the matchImages field to ensure no conflicts with existing non-CRI-O credential providers. ignores conflicting registries and adds logs to CR's status condition." No validation exists. If a user specifies images overlapping with the platform provider (e.g. *.dkr.ecr.*.amazonaws.com), both providers match the same images, causing undefined kubelet behavior.

return nil, fmt.Errorf("could not decode CRIOCredentialProvider config for role %s: %w", role, err)
}

matchImages := make(map[string]bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converts slice to map[string]bool only for updateCredentialProviderConfig to convert it back to a slice. The map is the source of the non-determinism bug (see other comment on helpers.go). Pass the slice directly and deduplicate with a sorted slice if needed.

fmt.Sprintf("Error: %v", err),
)
}
if len(args) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When err == nil and len(args) > 0, condition is never assigned (the first branch requires len(args) == 0, the second requires err != nil), so condition.Message will nil-pointer panic. This if len(args) > 0 block should be nested inside if err != nil.

}

func TestCrioCredentialProviderConfigCreate(t *testing.T) {
for _, platform := range []apicfgv1.PlatformType{apicfgv1.AWSPlatformType} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only tests AWS. GCP and Azure should be covered given the platform switch. More importantly, a non-cloud platform test would catch the hard-error issue in generateOriginalCredentialProviderConfig.

}
}

func TestCrioCredentialProviderConfigUpdate(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test for CR deletion. The enhancement and the downgrade strategy require that generated MachineConfigs are cleaned up when the CR is removed.

Copy link
Member Author

@QiWang19 QiWang19 Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During API review, this CRD was explicitly constrained to a singleton and OpenShift managed resource, we do not support user delete/recreate, and it is not required to remove this singleton as part of a downgrade flow.
The enhancement should be updated to clarify that.
Instead of testing CR deletion, added TestCrioCredentialProviderConfigCreateEmpty to verify that shipping a CR with an empty spec correctly handles the no-op scenario.

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch 7 times, most recently from c09c4eb to ce0b3cb Compare March 8, 2026 20:40
@QiWang19 QiWang19 changed the title OCPNODE-3880: Add criocredentialprovierconfig event handler OCPNODE-3880: Add criocredentialproviderconfig event handler Mar 8, 2026
@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch 2 times, most recently from 6229834 to 89ae4bf Compare March 9, 2026 02:55
}()

if !ctrl.criocpEnabled() || !ctrl.addedCRIOCPObservers {
klog.V(2).Infof("CRIOCredentialProviderConfig observer not added, skipping sync")
Copy link
Member

@saschagrunert saschagrunert Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The owner reference does not set Controller: true, so GC won't cascade-delete the managed MCs. ContainerRuntimeConfig handles this with finalizers and cascadeDelete. Consider doing the same here: iterate pools, delete the 97-<pool>-generated-credentialproviderconfig MCs, then return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This controller only reconciles the singleton CRIOCredentialProviderConfig named cluster, and cluster objects are OpenShift managed and not user deletable, enforced in openshift-apiserver. Because there is no supported delete path for this CR, we do not have an orphaned MC in this case.

sort.Strings(conflictList)

if len(matchImages) > 0 {
for img := range matchImages {
Copy link
Member

@saschagrunert saschagrunert Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When all images overlap, matchImages is empty and the else branch re-adds every newImage unfiltered, contradicting the overlap status reported to the user.

Suggested change
for img := range matchImages {
if len(matchImages) > 0 {
for img := range matchImages {
images = append(images, img)
}
} else if len(overlappedEntries) == 0 {
// No existing providers to conflict with (e.g. non-cloud platform)
for _, img := range newImages {
images = append(images, string(img))
}
}

)
} else {
condition = apihelpers.NewCondition(
apicfgv1alpha1.ConditionTypeMachineConfigRendered,
Copy link
Member

@saschagrunert saschagrunert Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The partially-applied call site passes the Reason constant as args[0], which gets used as the fmt.Sprintf format string here. Since the Reason has no % verbs, the actual message and overlappedEntries are silently discarded. The condition Reason also stays ReasonMachineConfigRenderingSucceeded instead of ReasonConfigurationPartiallyApplied. Consider splitting Reason and Message into separate parameters.

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 89ae4bf to 0d2b089 Compare March 9, 2026 20:06
@coderabbitai
Copy link

coderabbitai bot commented Mar 9, 2026

Walkthrough

Adds CRIOCredentialProviderConfig support: cluster RBAC, controller observers/queues and status handling, rendering and Ignition generation (including kubelet drop-in and flag wiring), constants and default policies, tests, and CRD registration for the test environment.

Changes

Cohort / File(s) Summary
RBAC / Cluster permissions
install/0000_80_machine-config_00_rbac.yaml, manifests/machineconfigcontroller/clusterrole.yaml
Add ClusterRole node-credential-providers and ClusterRoleBinding node-credential-providers-binding; extend machineconfigcontroller ClusterRole to include criocredentialproviderconfigs and its status subresource.
Controller: container-runtime-config
pkg/controller/container-runtime-config/container_runtime_config_controller.go
Add CRIOCredentialProviderConfig observers, listers, new workqueue/worker, sync handlers, status updates, and integration of CRIO credential provider generation into Ignition/render pipeline.
Controller tests
pkg/controller/container-runtime-config/container_runtime_config_controller_test.go
Wire CRIOCredentialProviderConfig into test fixtures, enable feature gate in tests, and add create/update/empty-create tests plus verification helpers.
Helpers and unit tests
pkg/controller/container-runtime-config/helpers.go, pkg/controller/container-runtime-config/helpers_test.go
Add credential-provider utilities: find, ownerRef, managed key, update/merge logic, drop-in unit generation, error→condition wrapper; add unit tests for merging, overlaps, and condition wrapping.
Daemon constants & cluster policies
pkg/daemon/constants/constants.go, pkg/apihelpers/apihelpers.go
Introduce KubernetesCredentialProvidersDir and KubeletCrioImageCredProviderConfPath constants; add default cluster policy entries (daemon-reload/restart kubelet) for those paths.
Kubelet service templates
templates/arbiter/01-arbiter-kubelet/.../kubelet.service.yaml, templates/master/01-master-kubelet/on-prem/.../kubelet.service.yaml, templates/worker/01-worker-kubelet/on-prem/.../kubelet.service.yaml
Append $KUBELET_CRIO_IMAGE_CREDENTIAL_PROVIDER_FLAGS to kubelet ExecStart across templates to expose CRIO image credential provider flags.
Test environment CRDs
test/framework/envtest.go
Register three CRD manifest paths for CRIOCredentialProviderConfig variants (CustomNoUpgrade, DevPreviewNoUpgrade, TechPreviewNoUpgrade) for envtest.

Sequence Diagram(s)

sequenceDiagram
    participant API as "K8s API\n(CRIOCredentialProviderConfig)"
    participant Controller as "machine-config-controller\n(container-runtime-config)"
    participant Renderer as "Ignition Generator\n(merge & drop-in)"
    participant MCO as "MachineConfig/ MCO"
    participant Node as "Node kubelet\n(systemd)"

    API->>Controller: watch/create/update CRIOCredentialProviderConfig
    Controller->>Controller: enqueue & process work item
    Controller->>Renderer: generate platform-aware provider config
    Renderer->>Controller: return merged Ignition + drop-in unit
    Controller->>MCO: create/update MachineConfig with rendered Ignition
    MCO->>Node: apply MachineConfig (write files, add unit)
    Node->>Node: systemctl daemon-reload && systemctl restart kubelet
    Controller->>API: update CRIO CP status (ObservedGeneration, Conditions)
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'OCPNODE-3880: Add criocredentialproviderconfig event handler' accurately describes the main change—adding support for handling CRIOCredentialProviderConfig events in the container runtime config controller.
Stable And Deterministic Test Names ✅ Passed This pull request does not contain any Ginkgo tests. All new tests use standard Go testing framework with deterministic enum constants, not dynamic values.
Test Structure And Quality ✅ Passed PR contains no Ginkgo tests, only standard Go table-driven tests, so Ginkgo-specific requirements are not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 9, 2026

@QiWang19: This pull request references OCPNODE-3880 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

- What I did
Implement criocredentialprovierconfig that is used by crio-credential-provider plugin to fetch private mirror image pull secrets from the secret object.

The handler creates 97-pool-generated-credentialproviderconfig to rollout configurations to file /etc/kubernetes/credential-providers/[platform]-credential-provider.yaml .

workflow: https://github.com/openshift/enhancements/blob/master/enhancements/api-review/criocredentialproviderconfig-for-namespace-scoped-mirror-authentication.md#workflow-description
- How to verify it

  1. create cluster CRIOCredentialProviderConfig resource, file updated with a new section name: crio-credential-provider
apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
 name: cluster
spec:
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - "*.azurecr.io"
 - gcr.io
 - "*.*.registry.io"
 - registry.io:8080/path
$ oc get mc
97-master-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s
97-worker-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s

$ oc describe criocredentialproviderconfig.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1alpha1
Kind:         CRIOCredentialProviderConfig
Metadata:
 Creation Timestamp:  2026-02-16T21:26:17Z
 Generation:          1
 Resource Version:    42066
 UID:                 05d173ad-7b8d-4a28-afb9-b587fe565e16
Spec:
 Match Images:
   docker.io
   123456789.dkr.ecr.us-east-1.amazonaws.com
   *.azurecr.io
   gcr.io
   *.*.registry.io
   registry.io:8080/path
Status:
 Conditions:
   Last Transition Time:  2026-02-16T21:26:17Z
   Message:               Success
   Observed Generation:   1
   Reason:                MachineConfigRenderingSucceeded
   Status:                True
   Type:                  MachineConfigRendered
Events:                    <none>

sh-5.1# cat /etc/kubernetes/credential-providers/ecr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 12h0m0s
 matchImages:
 - '*.dkr.ecr.*.amazonaws.com'
 - '*.dkr.ecr.*.amazonaws.com.cn'
 - '*.dkr.ecr-fips.*.amazonaws.com'
 - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
 - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
 name: ecr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 1s
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - '*.azurecr.io'
 - gcr.io
 - '*.*.registry.io'
 - registry.io:8080/path
 name: crio-credential-provider
 tokenAttributes:
   cacheType: Token
   requireServiceAccount: false
   serviceAccountTokenAudience: https://kubernetes.default.svc

  1. create namespace rolebing allow serviceaccount to get secrect in the namespace e.g. namespace:mynamespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
 name: credential-provider-secret-reader
 namespace: mynamespace 
rules:
- apiGroups: [""]
 resources: ["secrets"] 
 verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: credential-provider-secret-reader-binding
 namespace: mynamespace 
roleRef:
 apiGroup: rbac.authorization.k8s.io
 kind: Role
 name: credential-provider-secret-reader
subjects:
 - apiGroup: rbac.authorization.k8s.io
   kind: User
   name: system:serviceaccount:mynamespace:default
  1. Create ImageDigestMirrorSet/ImageTagMirrorSet mirror configurations
  2. Create a pod containers.image is from mirror source registry
  3. Check crio-credential-provider log journalctl _COMM=crio-credential on the scheduled node

- Description for the changelog

Summary by CodeRabbit

Release Notes

  • New Features

  • Added support for managing container runtime credential provider configurations through a new CRIOCredentialProviderConfig resource.

  • Enhanced kubelet startup configuration to support credential provider flags across node types.

  • Security & Permissions

  • Added cluster-level RBAC permissions for node credential provider management.

  • Tests

  • Added comprehensive test coverage for credential provider configuration lifecycle and handling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
pkg/controller/container-runtime-config/container_runtime_config_controller.go (1)

1168-1172: ⚠️ Potential issue | 🟠 Major

Delete the managed MachineConfigs when the singleton CR disappears.

The IsNotFound path returns immediately, so the existing 97-<pool>-generated-credentialproviderconfig MachineConfigs survive CR deletion. That leaves stale kubelet credential-provider config on the pool until something else cleans it up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 1168 - 1172, The IsNotFound branch for
ctrl.criocpLister.Get("cluster") currently returns immediately and leaves
generated MachineConfigs behind; instead, when the singleton CR is missing,
enumerate and delete the managed MachineConfigs (those matching the generated
name pattern "97-<pool>-generated-credentialproviderconfig") using the
MachineConfig lister/client (e.g., ctrl.mcLister or
ctrl.machineConfigClient/MachineConfigs().Delete), handling NotFound errors
gracefully and logging failures; update the errors.IsNotFound branch in the
function around ctrl.criocpLister.Get("cluster") to perform this cleanup before
returning.
pkg/controller/container-runtime-config/helpers.go (1)

1354-1366: ⚠️ Potential issue | 🟠 Major

Prepend crio-credential-provider instead of appending it.

Provider precedence is order-sensitive here. Appending the new entry makes the CRI-O plugin lowest priority, which can change which provider wins for nested or overlapping matches.

Suggested fix
-		credProviderConfigObject.Providers = append(credProviderConfigObject.Providers, newProvider)
+		credProviderConfigObject.Providers = append(
+			[]*credentialProviderWithTag{newProvider},
+			credProviderConfigObject.Providers...,
+		)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/container-runtime-config/helpers.go` around lines 1354 - 1366,
The credential provider for CRI-O is being appended which makes
crio-credential-provider lowest priority; change the logic that adds the new
credentialProviderWithTag (crioCredentialProviderName) so it is prepended to
credProviderConfigObject.Providers (i.e., insert newProvider at index 0) rather
than using append, preserving all existing providers after it to ensure correct
precedence for matching.
🧹 Nitpick comments (1)
install/0000_80_machine-config_00_rbac.yaml (1)

149-178: Consider adding ibm-cloud-managed annotation for consistency.

Other RBAC resources in this file include the include.release.openshift.io/ibm-cloud-managed: "true" annotation. If the CRIO credential provider feature should be available on IBM Cloud Managed deployments, this annotation should be added to both the ClusterRole and ClusterRoleBinding.

Proposed fix to add ibm-cloud-managed annotation
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
   name: node-credential-providers
   annotations:
+    include.release.openshift.io/ibm-cloud-managed: "true"
     include.release.openshift.io/self-managed-high-availability: "true"
     include.release.openshift.io/single-node-developer: "true"
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
   name: node-credential-providers-binding
   annotations:
+    include.release.openshift.io/ibm-cloud-managed: "true"
     include.release.openshift.io/self-managed-high-availability: "true"
     include.release.openshift.io/single-node-developer: "true"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_80_machine-config_00_rbac.yaml` around lines 149 - 178, Add the
missing IBM Cloud managed annotation to the ClusterRole and ClusterRoleBinding
by inserting include.release.openshift.io/ibm-cloud-managed: "true" under
metadata.annotations for the ClusterRole named node-credential-providers and the
ClusterRoleBinding named node-credential-providers-binding so their annotations
match other RBAC entries and ensure the CRIO credential provider is marked for
IBM Cloud Managed deployments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@install/0000_80_machine-config_00_rbac.yaml`:
- Around line 156-162: The RBAC rule grants the custom verb
request-serviceaccounts-token-audience to resources: ["*"], which is too broad;
update the rules for serviceaccounts so the
request-serviceaccounts-token-audience verb is only applied to resources:
["serviceaccounts"] (consolidate the two serviceaccount rules so the
"serviceaccounts" resource has verbs
"get","list","request-serviceaccounts-token-audience")—locate the rules block
containing resources: ["*"] and the serviceaccounts rule and adjust accordingly.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go`:
- Around line 651-654: The current verifyCRIOCredentialProviderConfigContents
function short-circuits when criocp.Spec.MatchImages is empty and returns before
running the empty-spec assertions; change the logic so that when
len(criocp.Spec.MatchImages) == 0 you call the empty-MC assertion
(expectMCNilContent or equivalent) and still run any drop-in/file checks from
the verifyOpts (criocpConfigVerifyOptions) before returning; in short, do not
simply return immediately—invoke expectMCNilContent(mcName) and execute the
drop-in verification paths in verifyOpts when MatchImages is empty, then return.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 562-566: The current logic calls queue.Forget(key) before
queue.AddAfter(...), which prevents honoring maxRetries and causes permanent
1-minute resyncs; change the retry handling in this error branch to check the
retry count (use queue.NumRequeues(key) or the controller's maxRetries variable)
and only call queue.AddAfter or queue.AddRateLimited when retries < maxRetries,
otherwise call queue.Forget(key) and stop requeuing; update the block around
utilruntime.HandleError(err), klog.V(2).Infof(...), queue.Forget, and
queue.AddAfter to implement this conditional retry logic using the existing
maxRetries, resourceType, key, and err symbols.
- Around line 1235-1249: The code fetches an informer-cached object via
ctrl.criocpLister.Get("cluster") and then mutates it in-place
(meta.SetStatusCondition) which can corrupt the shared cache; fix by
deep-copying the returned object immediately (e.g., call DeepCopy on the result
of criocpLister.Get inside the RetryOnConflict closure), perform the
ObservedGeneration update and meta.SetStatusCondition on the copy, and pass that
copy to
ctrl.configClient.ConfigV1alpha1().CRIOCredentialProviderConfigs().UpdateStatus;
keep the use of wrapErrorWithCRIOCredentialProviderConfigCondition,
retry.RetryOnConflict, and UpdateStatus but ensure mutations occur only on the
DeepCopy.

In `@pkg/controller/container-runtime-config/helpers.go`:
- Around line 1309-1327: The current loop resets skip per provider causing an
image to be considered non-conflicting if it doesn't overlap a later provider;
change the logic to detect conflicts per image across all providers by iterating
newImages outermost and then checking every provider in
credProviderConfigObject.Providers (skipping crioCredentialProviderName) and
using runtimeutils.ScopeIsNestedInsideScope to mark overlappedEntries[img]=true
and set skip=true then break out of the provider loop; only after checking all
providers (or breaking on first overlap) add img to matchImages if skip is
false. Ensure you reference and update the same variables: newImages,
matchImages, overlappedEntries and preserve skipping of
crioCredentialProviderName.

In `@test/framework/envtest.go`:
- Around line 119-121: The shared envtest helpers are not handling the
config.openshift.io/v1alpha1 CRIOCredentialProviderConfig type; update
NewTestEnv and the helper functions CheckCleanEnvironment, CleanEnvironment, and
CreateObjects to include CRIOCredentialProviderConfig in the CRD/object lists
they install, delete, and create. Specifically, add the CRD file entries (the
three zz_generated.crd-manifests paths) to the set NewTestEnv installs, ensure
CheckCleanEnvironment includes
config.openshift.io/v1alpha1.CRIOCredentialProviderConfig when scanning for
leftover resources, make CleanEnvironment delete any
CRIOCredentialProviderConfig instances it finds, and allow CreateObjects to
accept/seed CRIOCredentialProviderConfig objects so tests can create them via
the common helper.

---

Duplicate comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 1168-1172: The IsNotFound branch for
ctrl.criocpLister.Get("cluster") currently returns immediately and leaves
generated MachineConfigs behind; instead, when the singleton CR is missing,
enumerate and delete the managed MachineConfigs (those matching the generated
name pattern "97-<pool>-generated-credentialproviderconfig") using the
MachineConfig lister/client (e.g., ctrl.mcLister or
ctrl.machineConfigClient/MachineConfigs().Delete), handling NotFound errors
gracefully and logging failures; update the errors.IsNotFound branch in the
function around ctrl.criocpLister.Get("cluster") to perform this cleanup before
returning.

In `@pkg/controller/container-runtime-config/helpers.go`:
- Around line 1354-1366: The credential provider for CRI-O is being appended
which makes crio-credential-provider lowest priority; change the logic that adds
the new credentialProviderWithTag (crioCredentialProviderName) so it is
prepended to credProviderConfigObject.Providers (i.e., insert newProvider at
index 0) rather than using append, preserving all existing providers after it to
ensure correct precedence for matching.

---

Nitpick comments:
In `@install/0000_80_machine-config_00_rbac.yaml`:
- Around line 149-178: Add the missing IBM Cloud managed annotation to the
ClusterRole and ClusterRoleBinding by inserting
include.release.openshift.io/ibm-cloud-managed: "true" under
metadata.annotations for the ClusterRole named node-credential-providers and the
ClusterRoleBinding named node-credential-providers-binding so their annotations
match other RBAC entries and ensure the CRIO credential provider is marked for
IBM Cloud Managed deployments.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e40c7900-9f33-4929-8e5c-f255d95f064f

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8a698 and 0d2b089.

📒 Files selected for processing (13)
  • install/0000_80_machine-config_00_rbac.yaml
  • manifests/machineconfigcontroller/clusterrole.yaml
  • pkg/apihelpers/apihelpers.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller_test.go
  • pkg/controller/container-runtime-config/helpers.go
  • pkg/controller/container-runtime-config/helpers_test.go
  • pkg/daemon/constants/constants.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/framework/envtest.go

Comment on lines +156 to +162
rules:
- apiGroups: [""]
resources: ["serviceaccounts"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["*"]
verbs: ["request-serviceaccounts-token-audience"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

OpenShift request-serviceaccounts-token-audience RBAC verb

💡 Result:

In OpenShift/Kubernetes RBAC, request-serviceaccounts-token-audience is itself the RBAC verb string (it’s a non-standard/custom verb) that you place under rules[].verbs—typically alongside get / list on the serviceaccounts resource. Examples in the wild grant it like this: verbs: ["request-serviceaccounts-token-audience"] (often with get,list) on resources: ["serviceaccounts"]. [1], [2]

If what you actually need is permission to call the upstream TokenRequest endpoint (.../serviceaccounts/<name>/token), the RBAC permission is the standard Kubernetes pattern: verb create on the token subresource (resource: serviceaccounts/token). [3]

Example ClusterRole rule (custom verb):

rules:
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["get","list","request-serviceaccounts-token-audience"]

Example ClusterRole rule (TokenRequest API):

rules:
- apiGroups: [""]
  resources: ["serviceaccounts/token"]
  verbs: ["create"]

Sources: [1] [2] [3]


Fix overly broad resource scope for request-serviceaccounts-token-audience verb.

The verb request-serviceaccounts-token-audience is a valid OpenShift custom RBAC verb, but it's currently applied to resources: ["*"], granting it on all resources. According to OpenShift patterns, this verb should be restricted to resources: ["serviceaccounts"] only. The second rule should be:

Suggested fix
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["request-serviceaccounts-token-audience"]

This consolidates both rules into a single rule on the serviceaccounts resource.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_80_machine-config_00_rbac.yaml` around lines 156 - 162, The RBAC
rule grants the custom verb request-serviceaccounts-token-audience to resources:
["*"], which is too broad; update the rules for serviceaccounts so the
request-serviceaccounts-token-audience verb is only applied to resources:
["serviceaccounts"] (consolidate the two serviceaccount rules so the
"serviceaccounts" resource has verbs
"get","list","request-serviceaccounts-token-audience")—locate the rules block
containing resources: ["*"] and the serviceaccounts rule and adjust accordingly.

Comment on lines +651 to +654
func (f *fixture) verifyCRIOCredentialProviderConfigContents(t *testing.T, mcName string, criocp *apicfgv1alpha1.CRIOCredentialProviderConfig, verifyOpts criocpConfigVerifyOptions) {
if len(criocp.Spec.MatchImages) == 0 {
return
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don't short-circuit the empty-spec assertions.

This returns before expectMCNilContent or any drop-in checks run, so TestCrioCredentialProviderConfigCreateEmpty currently passes without validating the generated MachineConfig at all.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go`
around lines 651 - 654, The current verifyCRIOCredentialProviderConfigContents
function short-circuits when criocp.Spec.MatchImages is empty and returns before
running the empty-spec assertions; change the logic so that when
len(criocp.Spec.MatchImages) == 0 you call the empty-MC assertion
(expectMCNilContent or equivalent) and still run any drop-in/file checks from
the verifyOpts (criocpConfigVerifyOptions) before returning; in short, do not
simply return immediately—invoke expectMCNilContent(mcName) and execute the
drop-in verification paths in verifyOpts when MatchImages is empty, then return.

Comment on lines +119 to +121
filepath.Join("..", "..", "vendor", "github.com", "openshift", "api", "config", "v1alpha1", "zz_generated.crd-manifests", "0000_10_config-operator_01_criocredentialproviderconfigs-CustomNoUpgrade.crd.yaml"),
filepath.Join("..", "..", "vendor", "github.com", "openshift", "api", "config", "v1alpha1", "zz_generated.crd-manifests", "0000_10_config-operator_01_criocredentialproviderconfigs-DevPreviewNoUpgrade.crd.yaml"),
filepath.Join("..", "..", "vendor", "github.com", "openshift", "api", "config", "v1alpha1", "zz_generated.crd-manifests", "0000_10_config-operator_01_criocredentialproviderconfigs-TechPreviewNoUpgrade.crd.yaml"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Wire CRIOCredentialProviderConfig through the shared envtest helpers.

NewTestEnv now installs this CRD, but CheckCleanEnvironment, CleanEnvironment, and CreateObjects still don't list/delete/create config.openshift.io/v1alpha1.CRIOCredentialProviderConfig. Any envtest that exercises this type can leak state across cases and can't seed the object through the common helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/framework/envtest.go` around lines 119 - 121, The shared envtest helpers
are not handling the config.openshift.io/v1alpha1 CRIOCredentialProviderConfig
type; update NewTestEnv and the helper functions CheckCleanEnvironment,
CleanEnvironment, and CreateObjects to include CRIOCredentialProviderConfig in
the CRD/object lists they install, delete, and create. Specifically, add the CRD
file entries (the three zz_generated.crd-manifests paths) to the set NewTestEnv
installs, ensure CheckCleanEnvironment includes
config.openshift.io/v1alpha1.CRIOCredentialProviderConfig when scanning for
leftover resources, make CleanEnvironment delete any
CRIOCredentialProviderConfig instances it finds, and allow CreateObjects to
accept/seed CRIOCredentialProviderConfig objects so tests can create them via
the common helper.

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 0d2b089 to 9b97140 Compare March 9, 2026 20:33
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 9, 2026

@QiWang19: This pull request references OCPNODE-3880 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

- What I did
Implement criocredentialprovierconfig that is used by crio-credential-provider plugin to fetch private mirror image pull secrets from the secret object.

The handler creates 97-pool-generated-credentialproviderconfig to rollout configurations to file /etc/kubernetes/credential-providers/[platform]-credential-provider.yaml .

workflow: https://github.com/openshift/enhancements/blob/master/enhancements/api-review/criocredentialproviderconfig-for-namespace-scoped-mirror-authentication.md#workflow-description
- How to verify it

  1. create cluster CRIOCredentialProviderConfig resource, file updated with a new section name: crio-credential-provider
apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
 name: cluster
spec:
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - "*.azurecr.io"
 - gcr.io
 - "*.*.registry.io"
 - registry.io:8080/path
$ oc get mc
97-master-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s
97-worker-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s

$ oc describe criocredentialproviderconfig.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1alpha1
Kind:         CRIOCredentialProviderConfig
Metadata:
 Creation Timestamp:  2026-02-16T21:26:17Z
 Generation:          1
 Resource Version:    42066
 UID:                 05d173ad-7b8d-4a28-afb9-b587fe565e16
Spec:
 Match Images:
   docker.io
   123456789.dkr.ecr.us-east-1.amazonaws.com
   *.azurecr.io
   gcr.io
   *.*.registry.io
   registry.io:8080/path
Status:
 Conditions:
   Last Transition Time:  2026-02-16T21:26:17Z
   Message:               Success
   Observed Generation:   1
   Reason:                MachineConfigRenderingSucceeded
   Status:                True
   Type:                  MachineConfigRendered
Events:                    <none>

sh-5.1# cat /etc/kubernetes/credential-providers/ecr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 12h0m0s
 matchImages:
 - '*.dkr.ecr.*.amazonaws.com'
 - '*.dkr.ecr.*.amazonaws.com.cn'
 - '*.dkr.ecr-fips.*.amazonaws.com'
 - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
 - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
 name: ecr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 1s
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - '*.azurecr.io'
 - gcr.io
 - '*.*.registry.io'
 - registry.io:8080/path
 name: crio-credential-provider
 tokenAttributes:
   cacheType: Token
   requireServiceAccount: false
   serviceAccountTokenAudience: https://kubernetes.default.svc

  1. create namespace rolebing allow serviceaccount to get secrect in the namespace e.g. namespace:mynamespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
 name: credential-provider-secret-reader
 namespace: mynamespace 
rules:
- apiGroups: [""]
 resources: ["secrets"] 
 verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: credential-provider-secret-reader-binding
 namespace: mynamespace 
roleRef:
 apiGroup: rbac.authorization.k8s.io
 kind: Role
 name: credential-provider-secret-reader
subjects:
 - apiGroup: rbac.authorization.k8s.io
   kind: User
   name: system:serviceaccount:mynamespace:default
  1. Create ImageDigestMirrorSet/ImageTagMirrorSet mirror configurations
  2. Create a pod containers.image is from mirror source registry
  3. Check crio-credential-provider log journalctl _COMM=crio-credential on the scheduled node

- Description for the changelog

Summary by CodeRabbit

  • New Features

  • Support for managing container runtime credential provider configs via a new CRIOCredentialProviderConfig and kubelet flag support across node types.

  • Cluster defaults updated to ensure kubelet reload/restart when credential provider configs change.

  • Security & Permissions

  • Added cluster-wide RBAC to allow node credential provider management.

  • Tests

  • Added comprehensive tests covering CRIO credential provider lifecycle and config merging.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
install/0000_80_machine-config_00_rbac.yaml (1)

160-162: ⚠️ Potential issue | 🟠 Major

Scope request-serviceaccounts-token-audience to serviceaccounts only.

This rule still grants the custom verb across resources: ["*"], which is broader than the credential-provider flow needs. Please fold it into the existing serviceaccounts rule instead of granting it cluster-wide on every core resource.

Suggested fix
   - apiGroups: [""]
     resources: ["serviceaccounts"]
-    verbs: ["get", "list"]
-  - apiGroups: [""]
-    resources: ["*"]
-    verbs: ["request-serviceaccounts-token-audience"]
+    verbs: ["get", "list", "request-serviceaccounts-token-audience"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_80_machine-config_00_rbac.yaml` around lines 160 - 162, The
manifest currently grants the custom verb
"request-serviceaccounts-token-audience" to resources: ["*"] under apiGroups:
[""], which is too broad; remove that standalone rule and instead add
"request-serviceaccounts-token-audience" to the existing serviceaccounts rule
(the rule that has apiGroups: [""], resources: ["serviceaccounts"] or resources:
["serviceaccounts", ...]) so the verb is scoped only to serviceaccounts; ensure
you remove the resources: ["*"] entry and update the serviceaccounts rule's
verbs array to include "request-serviceaccounts-token-audience" without
duplicating other verbs.
pkg/controller/container-runtime-config/container_runtime_config_controller.go (1)

562-565: ⚠️ Potential issue | 🟠 Major

maxRetries still never becomes terminal here.

After Forget(key), re-adding the item with AddAfter turns persistent failures into an endless 1-minute retry loop for every queue using this helper. That defeats the contract implied by maxRetries.

Suggested fix
 	utilruntime.HandleError(err)
 	klog.V(2).Infof("Dropping %s %q out of the queue: %v", resourceType, key, err)
 	queue.Forget(key)
-	queue.AddAfter(key, 1*time.Minute)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 562 - 565, The code calls queue.Forget(key) and then
queue.AddAfter(key, 1*time.Minute) which resets retry state and defeats the
intended maxRetries behavior; instead, check the retry count (use
queue.NumRequeues(key) or the workqueue.RateLimitingInterface helper) against
maxRetries and only re-enqueue with AddAfter/AddRateLimited when the count is
below maxRetries, otherwise call queue.Forget(key) and stop requeuing (log
terminal failure). Update the error-handling block around
utilruntime.HandleError and klog.V(2).Infof to implement this conditional
requeue logic using the existing maxRetries variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 1224-1228: The status update is only executed when the reconcile
actually applied changes (applied == true), so no-op reconciles don't advance
CRIOCredentialProviderConfig status; move or add a call to
syncCRIOCredentialProviderConfigStatusOnly(...) so it runs even when applied is
false (i.e., after the applied conditional), ensuring you invoke
syncCRIOCredentialProviderConfigStatusOnly(nil,
apicfgv1alpha1.ConditionTypeMachineConfigRendered,
apicfgv1alpha1.ReasonMachineConfigRenderingSucceeded) on successful
reconciliation/no-op so ObservedGeneration and the rendered condition are
advanced regardless of whether MachineConfigs were modified.

---

Duplicate comments:
In `@install/0000_80_machine-config_00_rbac.yaml`:
- Around line 160-162: The manifest currently grants the custom verb
"request-serviceaccounts-token-audience" to resources: ["*"] under apiGroups:
[""], which is too broad; remove that standalone rule and instead add
"request-serviceaccounts-token-audience" to the existing serviceaccounts rule
(the rule that has apiGroups: [""], resources: ["serviceaccounts"] or resources:
["serviceaccounts", ...]) so the verb is scoped only to serviceaccounts; ensure
you remove the resources: ["*"] entry and update the serviceaccounts rule's
verbs array to include "request-serviceaccounts-token-audience" without
duplicating other verbs.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 562-565: The code calls queue.Forget(key) and then
queue.AddAfter(key, 1*time.Minute) which resets retry state and defeats the
intended maxRetries behavior; instead, check the retry count (use
queue.NumRequeues(key) or the workqueue.RateLimitingInterface helper) against
maxRetries and only re-enqueue with AddAfter/AddRateLimited when the count is
below maxRetries, otherwise call queue.Forget(key) and stop requeuing (log
terminal failure). Update the error-handling block around
utilruntime.HandleError and klog.V(2).Infof to implement this conditional
requeue logic using the existing maxRetries variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e364b1d-5e56-4d04-922d-2361caed86f3

📥 Commits

Reviewing files that changed from the base of the PR and between 0d2b089 and 9b97140.

📒 Files selected for processing (13)
  • install/0000_80_machine-config_00_rbac.yaml
  • manifests/machineconfigcontroller/clusterrole.yaml
  • pkg/apihelpers/apihelpers.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller_test.go
  • pkg/controller/container-runtime-config/helpers.go
  • pkg/controller/container-runtime-config/helpers_test.go
  • pkg/daemon/constants/constants.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/framework/envtest.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • manifests/machineconfigcontroller/clusterrole.yaml
  • pkg/controller/container-runtime-config/helpers_test.go
  • pkg/daemon/constants/constants.go
  • test/framework/envtest.go
  • pkg/apihelpers/apihelpers.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml

@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from 9b97140 to dc39060 Compare March 9, 2026 20:57
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 9, 2026

@QiWang19: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-2of2 b4132ef link true /test e2e-gcp-op-2of2

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 9, 2026

@QiWang19: This pull request references OCPNODE-3880 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

- What I did
Implement criocredentialprovierconfig that is used by crio-credential-provider plugin to fetch private mirror image pull secrets from the secret object.

The handler creates 97-pool-generated-credentialproviderconfig to rollout configurations to file /etc/kubernetes/credential-providers/[platform]-credential-provider.yaml .

workflow: https://github.com/openshift/enhancements/blob/master/enhancements/api-review/criocredentialproviderconfig-for-namespace-scoped-mirror-authentication.md#workflow-description
- How to verify it

  1. create cluster CRIOCredentialProviderConfig resource, file updated with a new section name: crio-credential-provider
apiVersion: config.openshift.io/v1alpha1
kind: CRIOCredentialProviderConfig
metadata:
 name: cluster
spec:
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - "*.azurecr.io"
 - gcr.io
 - "*.*.registry.io"
 - registry.io:8080/path
$ oc get mc
97-master-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s
97-worker-generated-credentialproviderconfig       f6dc328e4c370e377ea5c878aa4c0b8feeb1b181   3.5.0             2s

$ oc describe criocredentialproviderconfig.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1alpha1
Kind:         CRIOCredentialProviderConfig
Metadata:
 Creation Timestamp:  2026-02-16T21:26:17Z
 Generation:          1
 Resource Version:    42066
 UID:                 05d173ad-7b8d-4a28-afb9-b587fe565e16
Spec:
 Match Images:
   docker.io
   123456789.dkr.ecr.us-east-1.amazonaws.com
   *.azurecr.io
   gcr.io
   *.*.registry.io
   registry.io:8080/path
Status:
 Conditions:
   Last Transition Time:  2026-02-16T21:26:17Z
   Message:               Success
   Observed Generation:   1
   Reason:                MachineConfigRenderingSucceeded
   Status:                True
   Type:                  MachineConfigRendered
Events:                    <none>

sh-5.1# cat /etc/kubernetes/credential-providers/ecr-credential-provider.yaml 
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 12h0m0s
 matchImages:
 - '*.dkr.ecr.*.amazonaws.com'
 - '*.dkr.ecr.*.amazonaws.com.cn'
 - '*.dkr.ecr-fips.*.amazonaws.com'
 - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
 - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
 name: ecr-credential-provider
- apiVersion: credentialprovider.kubelet.k8s.io/v1
 defaultCacheDuration: 1s
 matchImages:
 - docker.io
 - 123456789.dkr.ecr.us-east-1.amazonaws.com
 - '*.azurecr.io'
 - gcr.io
 - '*.*.registry.io'
 - registry.io:8080/path
 name: crio-credential-provider
 tokenAttributes:
   cacheType: Token
   requireServiceAccount: false
   serviceAccountTokenAudience: https://kubernetes.default.svc

  1. create namespace rolebing allow serviceaccount to get secrect in the namespace e.g. namespace:mynamespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
 name: credential-provider-secret-reader
 namespace: mynamespace 
rules:
- apiGroups: [""]
 resources: ["secrets"] 
 verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: credential-provider-secret-reader-binding
 namespace: mynamespace 
roleRef:
 apiGroup: rbac.authorization.k8s.io
 kind: Role
 name: credential-provider-secret-reader
subjects:
 - apiGroup: rbac.authorization.k8s.io
   kind: User
   name: system:serviceaccount:mynamespace:default
  1. Create ImageDigestMirrorSet/ImageTagMirrorSet mirror configurations
  2. Create a pod containers.image is from mirror source registry
  3. Check crio-credential-provider log journalctl _COMM=crio-credential on the scheduled node

- Description for the changelog

Summary by CodeRabbit

  • New Features

  • Manage container runtime credential provider configs via a new CRIOCredentialProviderConfig and kubelet flag support across node types.

  • Kubelet startup now accepts additional flags to enable CRIO image credential provider behavior.

  • Chores / Defaults

  • Cluster defaults updated to ensure kubelet reload/restart when credential provider configs change.

  • Security & Permissions

  • Added cluster-wide RBAC to allow node credential provider management.

  • Tests

  • Added comprehensive tests covering CRIO credential provider lifecycle and config merging.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
pkg/controller/container-runtime-config/container_runtime_config_controller.go (1)

562-565: ⚠️ Potential issue | 🟠 Major

Don't requeue from the terminal retry path.

After maxRetries, this still does Forget and then AddAfter, so broken objects never actually drop out of the queue and will retry forever every minute.

Suggested fix
 	utilruntime.HandleError(err)
 	klog.V(2).Infof("Dropping %s %q out of the queue: %v", resourceType, key, err)
 	queue.Forget(key)
-	queue.AddAfter(key, 1*time.Minute)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 562 - 565, The terminal retry path currently calls
utilruntime.HandleError(err), logs via klog.V(2).Infof, then calls
queue.Forget(key) and queue.AddAfter(key, 1*time.Minute) causing infinite
requeues; update the terminal path so after reaching maxRetries you call
utilruntime.HandleError(err) and queue.Forget(key) (or just Forget and return)
but do NOT call queue.AddAfter; locate the logic around maxRetries handling in
container_runtime_config_controller.go where queue.Forget and queue.AddAfter are
invoked and remove the AddAfter call so failed objects are not requeued forever.
pkg/controller/container-runtime-config/container_runtime_config_controller_test.go (1)

659-664: ⚠️ Potential issue | 🟡 Minor

Don't short-circuit the empty-spec verification path.

This still returns before any later assertions run, so TestCrioCredentialProviderConfigCreateEmpty can pass while silently masking mismatches in criocpConfigVerifyOptions for the non-cloud path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go`
around lines 659 - 664, The test currently returns early when
verifyOpts.expectMCNilContent is true, which skips later assertions and can mask
mismatches for TestCrioCredentialProviderConfigCreateEmpty; change the block in
the test that checks verifyOpts.expectMCNilContent so it only asserts that
ignCfg.Storage.Files is empty (when expectMCNilContent is true) and do not
return from the test—remove the early return (or scope it so only the specific
check is skipped) so the remaining criocpConfigVerifyOptions-related assertions
still execute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go`:
- Around line 685-692: The current loop sets allMatch to true when any
criocp.Spec.MatchImages entry is found in provider.MatchImages, so a partial
match passes; change the logic so allMatch reflects that every expected image is
present: initialize allMatch = true before the loop over
criocp.Spec.MatchImages, and for each mi check assert.Contains(t,
provider.MatchImages, string(mi)); if the check fails set allMatch = false and
break (or return) immediately. Use the same symbols (allMatch,
criocp.Spec.MatchImages, provider.MatchImages) so the test correctly requires
every expected image to be present.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 1208-1210: When handling CRIOCredentialProviderConfig overlaps in
syncCRIOCredentialProviderConfigStatusOnly, also clear the stale
ConditionTypeValidated/ReasonConfigurationPartiallyApplied when overlaps are
resolved: if len(overlappedEntries) == 0 call
syncCRIOCredentialProviderConfigStatusOnly with a success/validated condition
(e.g., apicfgv1alpha1.ConditionTypeValidated, ReasonConfigurationApplied or
similar) and a message indicating rendering succeeded so the previous
ConfigurationPartiallyApplied condition is removed; ensure you update the same
condition type (ConditionTypeValidated) rather than only setting it when
overlaps exist.

In `@pkg/controller/container-runtime-config/helpers.go`:
- Line 1383: The parameter name in function generateDropinUnitCredProviderConfig
is misspelled as generticCredProviderConfigPath; rename it to
genericCredProviderConfigPath in the function signature and update all internal
references and callers to use the corrected identifier so the symbol matches
expected spelling (ensure generateDropinUnitCredProviderConfig and any places
that call it compile after the rename).

---

Duplicate comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go`:
- Around line 659-664: The test currently returns early when
verifyOpts.expectMCNilContent is true, which skips later assertions and can mask
mismatches for TestCrioCredentialProviderConfigCreateEmpty; change the block in
the test that checks verifyOpts.expectMCNilContent so it only asserts that
ignCfg.Storage.Files is empty (when expectMCNilContent is true) and do not
return from the test—remove the early return (or scope it so only the specific
check is skipped) so the remaining criocpConfigVerifyOptions-related assertions
still execute.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 562-565: The terminal retry path currently calls
utilruntime.HandleError(err), logs via klog.V(2).Infof, then calls
queue.Forget(key) and queue.AddAfter(key, 1*time.Minute) causing infinite
requeues; update the terminal path so after reaching maxRetries you call
utilruntime.HandleError(err) and queue.Forget(key) (or just Forget and return)
but do NOT call queue.AddAfter; locate the logic around maxRetries handling in
container_runtime_config_controller.go where queue.Forget and queue.AddAfter are
invoked and remove the AddAfter call so failed objects are not requeued forever.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6b802971-f882-404a-af1a-cf29785423f4

📥 Commits

Reviewing files that changed from the base of the PR and between 9b97140 and dc39060.

📒 Files selected for processing (13)
  • install/0000_80_machine-config_00_rbac.yaml
  • manifests/machineconfigcontroller/clusterrole.yaml
  • pkg/apihelpers/apihelpers.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller_test.go
  • pkg/controller/container-runtime-config/helpers.go
  • pkg/controller/container-runtime-config/helpers_test.go
  • pkg/daemon/constants/constants.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/framework/envtest.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • pkg/daemon/constants/constants.go
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • install/0000_80_machine-config_00_rbac.yaml
  • test/framework/envtest.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • pkg/apihelpers/apihelpers.go

Comment on lines +1208 to +1210
if len(overlappedEntries) > 0 {
ctrl.syncCRIOCredentialProviderConfigStatusOnly(nil, apicfgv1alpha1.ConditionTypeValidated, apicfgv1alpha1.ReasonConfigurationPartiallyApplied, "CRIOCredentialProviderConfig has one or multiple entries that overlap with the original credential provider config. Skip rendering entries: %v.", overlappedEntries)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clear the stale Validated=False condition after overlaps are fixed.

This only updates ConditionTypeValidated when overlaps exist. If a user removes the conflicting entries later, the old ConfigurationPartiallyApplied condition is left behind indefinitely even though rendering now succeeds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 1208 - 1210, When handling CRIOCredentialProviderConfig overlaps in
syncCRIOCredentialProviderConfigStatusOnly, also clear the stale
ConditionTypeValidated/ReasonConfigurationPartiallyApplied when overlaps are
resolved: if len(overlappedEntries) == 0 call
syncCRIOCredentialProviderConfigStatusOnly with a success/validated condition
(e.g., apicfgv1alpha1.ConditionTypeValidated, ReasonConfigurationApplied or
similar) and a message indicating rendering succeeded so the previous
ConfigurationPartiallyApplied condition is removed; ensure you update the same
condition type (ConditionTypeValidated) rather than only setting it when
overlaps exist.

Signed-off-by: Qi Wang <qiwan@redhat.com>
@QiWang19 QiWang19 force-pushed the addcriocpcontroller branch from dc39060 to d1eef7d Compare March 9, 2026 21:13
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
install/0000_80_machine-config_00_rbac.yaml (1)

156-162: ⚠️ Potential issue | 🟠 Major

Scope request-serviceaccounts-token-audience to serviceaccounts only.

Granting this custom verb on resources: ["*"] gives system:nodes that capability across every core resource. The new credential-provider flow only needs it on serviceaccounts, so this should be merged into the preceding rule instead of expanding the resource scope cluster-wide.

Suggested fix
 rules:
   - apiGroups: [""]
     resources: ["serviceaccounts"]
-    verbs: ["get", "list"]
-  - apiGroups: [""]
-    resources: ["*"]
-    verbs: ["request-serviceaccounts-token-audience"]
+    verbs: ["get", "list", "request-serviceaccounts-token-audience"]
In OpenShift RBAC, should the custom verb `request-serviceaccounts-token-audience` be granted on `resources: ["serviceaccounts"]` rather than `resources: ["*"]`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_80_machine-config_00_rbac.yaml` around lines 156 - 162, The
second rule grants the custom verb "request-serviceaccounts-token-audience"
across all core resources; narrow its scope by moving that verb into the prior
rule that already targets apiGroups: [""] and resources: ["serviceaccounts"] so
the combined rule lists verbs:
["get","list","request-serviceaccounts-token-audience"] (i.e., remove the
separate rule with resources: ["*"] and ensure only serviceaccounts retain the
custom verb).
pkg/controller/container-runtime-config/container_runtime_config_controller.go (2)

549-565: ⚠️ Potential issue | 🟠 Major

Stop requeueing once maxRetries is exhausted.

queue.Forget(key) followed by queue.AddAfter(key, 1*time.Minute) turns permanent failures into an endless 1-minute retry loop, so invalid objects never actually drop out of the queue despite the controller contract above.

Suggested fix
 	utilruntime.HandleError(err)
 	klog.V(2).Infof("Dropping %s %q out of the queue: %v", resourceType, key, err)
 	queue.Forget(key)
-	queue.AddAfter(key, 1*time.Minute)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 549 - 565, The current handleQueueErr implementation requeues keys
after maxRetries by calling queue.Forget(key) then queue.AddAfter(key,
1*time.Minute), creating an endless retry loop; modify the logic in
Controller.handleQueueErr so that when queue.NumRequeues(key) >= maxRetries it
only calls queue.Forget(key) (and utilruntime.HandleError(err) / klog as
present) and does not call queue.AddAfter(key, 1*time.Minute) so the item is
dropped permanently instead of being requeued.

1208-1228: ⚠️ Potential issue | 🟠 Major

Clear the stale Validated=False condition when overlaps disappear.

The validated condition is only updated inside the overlap branch. If a user fixes the conflicting matchImages later, this reconcile only writes MachineConfigRendered=True, so the old ConditionTypeValidated / partial-applied status remains stuck on the CR. Update ConditionTypeValidated on the non-overlap path too so meta.SetStatusCondition can replace the stale failure.

Suggested fix
 			if len(overlappedEntries) > 0 {
 				ctrl.syncCRIOCredentialProviderConfigStatusOnly(nil, apicfgv1alpha1.ConditionTypeValidated, apicfgv1alpha1.ReasonConfigurationPartiallyApplied, "CRIOCredentialProviderConfig has one or multiple entries that overlap with the original credential provider config. Skip rendering entries: %v.", overlappedEntries)
+			} else {
+				ctrl.syncCRIOCredentialProviderConfigStatusOnly(nil, apicfgv1alpha1.ConditionTypeValidated, "")
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 1208 - 1228, The code only calls
syncCRIOCredentialProviderConfigStatusOnly(...,
apicfgv1alpha1.ConditionTypeValidated,
apicfgv1alpha1.ReasonConfigurationPartiallyApplied, ...) inside the "overlap"
branch, leaving a stale Validated=False when overlaps are later resolved; modify
the non-overlap path (the branch that proceeds to call syncIgnitionConfig and
returns nil/applied) to explicitly clear or set ConditionTypeValidated to a
success state (e.g., call syncCRIOCredentialProviderConfigStatusOnly(nil,
apicfgv1alpha1.ConditionTypeValidated,
apicfgv1alpha1.ReasonConfigurationValidated, "validated") or similar) so
meta.SetStatusCondition replaces the previous failure—add this call just before
returning nil or after successful MachineConfig rendering in the code paths that
use syncIgnitionConfig and when applied is true, referencing
syncCRIOCredentialProviderConfigStatusOnly and ConditionTypeValidated to locate
the insertion point.
🧹 Nitpick comments (1)
pkg/controller/container-runtime-config/container_runtime_config_controller.go (1)

67-71: Reuse the exported credential-provider path constants here.

These literals now duplicate pkg/daemon/constants/constants.go, so a later path change can silently desync controller rendering from the daemon-side policy wiring. Please build these from constants.KubernetesCredentialProvidersDir and constants.KubeletCrioImageCredProviderConfPath instead of hardcoding them again.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 67 - 71, Replace the hardcoded credential provider path constants
in this file: instead of defining genericCredProviderConfigPath and
kubeletCrioImageCredProviderConfPath as string literals, build them from the
exported constants in pkg/daemon/constants: use
constants.KubernetesCredentialProvidersDir to construct the generic credential
provider config filename (generic-credential-provider.yaml) and use
constants.KubeletCrioImageCredProviderConfPath for the kubelet crio image
credential provider config; update the declarations where builtInLabelKey and
these two path constants are defined so the controller reuses
constants.KubernetesCredentialProvidersDir and
constants.KubeletCrioImageCredProviderConfPath rather than duplicating the
literal paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@install/0000_80_machine-config_00_rbac.yaml`:
- Around line 156-162: The second rule grants the custom verb
"request-serviceaccounts-token-audience" across all core resources; narrow its
scope by moving that verb into the prior rule that already targets apiGroups:
[""] and resources: ["serviceaccounts"] so the combined rule lists verbs:
["get","list","request-serviceaccounts-token-audience"] (i.e., remove the
separate rule with resources: ["*"] and ensure only serviceaccounts retain the
custom verb).

In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 549-565: The current handleQueueErr implementation requeues keys
after maxRetries by calling queue.Forget(key) then queue.AddAfter(key,
1*time.Minute), creating an endless retry loop; modify the logic in
Controller.handleQueueErr so that when queue.NumRequeues(key) >= maxRetries it
only calls queue.Forget(key) (and utilruntime.HandleError(err) / klog as
present) and does not call queue.AddAfter(key, 1*time.Minute) so the item is
dropped permanently instead of being requeued.
- Around line 1208-1228: The code only calls
syncCRIOCredentialProviderConfigStatusOnly(...,
apicfgv1alpha1.ConditionTypeValidated,
apicfgv1alpha1.ReasonConfigurationPartiallyApplied, ...) inside the "overlap"
branch, leaving a stale Validated=False when overlaps are later resolved; modify
the non-overlap path (the branch that proceeds to call syncIgnitionConfig and
returns nil/applied) to explicitly clear or set ConditionTypeValidated to a
success state (e.g., call syncCRIOCredentialProviderConfigStatusOnly(nil,
apicfgv1alpha1.ConditionTypeValidated,
apicfgv1alpha1.ReasonConfigurationValidated, "validated") or similar) so
meta.SetStatusCondition replaces the previous failure—add this call just before
returning nil or after successful MachineConfig rendering in the code paths that
use syncIgnitionConfig and when applied is true, referencing
syncCRIOCredentialProviderConfigStatusOnly and ConditionTypeValidated to locate
the insertion point.

---

Nitpick comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 67-71: Replace the hardcoded credential provider path constants in
this file: instead of defining genericCredProviderConfigPath and
kubeletCrioImageCredProviderConfPath as string literals, build them from the
exported constants in pkg/daemon/constants: use
constants.KubernetesCredentialProvidersDir to construct the generic credential
provider config filename (generic-credential-provider.yaml) and use
constants.KubeletCrioImageCredProviderConfPath for the kubelet crio image
credential provider config; update the declarations where builtInLabelKey and
these two path constants are defined so the controller reuses
constants.KubernetesCredentialProvidersDir and
constants.KubeletCrioImageCredProviderConfPath rather than duplicating the
literal paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 049b026b-d651-43da-9a25-0d4750052dd9

📥 Commits

Reviewing files that changed from the base of the PR and between dc39060 and d1eef7d.

📒 Files selected for processing (13)
  • install/0000_80_machine-config_00_rbac.yaml
  • manifests/machineconfigcontroller/clusterrole.yaml
  • pkg/apihelpers/apihelpers.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller.go
  • pkg/controller/container-runtime-config/container_runtime_config_controller_test.go
  • pkg/controller/container-runtime-config/helpers.go
  • pkg/controller/container-runtime-config/helpers_test.go
  • pkg/daemon/constants/constants.go
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
  • test/framework/envtest.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
  • pkg/apihelpers/apihelpers.go
  • test/framework/envtest.go
  • templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
  • pkg/controller/container-runtime-config/container_runtime_config_controller_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants