Releases: Cloudzero/cloudzero-charts
1.2.10
1.2.10 (2026-03-23)
Release 1.2.10 brings significant new integrations, security improvements, enhanced diagnostics configurability, and numerous bug fixes. Highlights include Istio service mesh support, a more secure cAdvisor collection mode, granular control over validator diagnostic checks, and the switch to a CloudZero-maintained Alloy binary.
Key Features
- Istio Service Mesh Integration: Added comprehensive Istio support with automatic detection and runtime validation. The agent detects sidecar and ambient mesh modes, validates cluster ID configuration for multicluster environments, and automatically applies Istio port exclusion annotations. Includes DestinationRule and VirtualService templates for traffic fencing. Configure with `integrations.istio.enabled` (defaults to auto-detection) and `integrations.istio.clusterID` for multicluster setups.
- Direct Kubelet cAdvisor Collection: New `integrations.cAdvisor.directNodeAccess.enabled` option collects cAdvisor metrics directly from node kubelets (port 10250) instead of through the Kubernetes API server proxy. This significantly improves security posture by requiring only `nodes/metrics` RBAC permission instead of `nodes/proxy`, which grants cluster-wide remote code execution capability. Works in both Prometheus and Alloy modes. The previous `prometheusConfig.scrapeJobs.cadvisor.enabled` is now deprecated in favor of `integrations.cAdvisor.enabled`.
- Per-Check Validator Diagnostics: Replaced the per-stage `enforce` boolean with a granular per-check type system. Each diagnostic check can now be configured as `required` (blocks pod startup on failure), `optional` (warns but allows startup), `informative` (always passes, gathers information), or `disabled` (skipped). Configure via `components.validator.checks.<stage>.<check>: <type>` in values.yaml.
- CloudZero Alloy Fork: Clustered mode now uses a CloudZero-maintained Alloy binary embedded directly in the agent image, eliminating the separate Grafana Alloy container image pull. Users can still override via `clusteredNode.image` and the new `clusteredNode.command` value.
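As a rough illustration, the Istio, cAdvisor, and validator settings above might be combined in a values override like the sketch below. The key paths come from these notes; the cluster ID value and the `postStart` stage key are hypothetical placeholders (only the `webhook_server_reachable` check name appears in these notes), so consult the chart's values.yaml for the actual stage and check names.

```yaml
# values-override.yaml (illustrative sketch, not copied from the chart)
integrations:
  istio:
    # Omit this key to rely on auto-detection (the default per these notes).
    enabled: true
    # Only relevant for multicluster meshes; "cluster-east-1" is a made-up ID.
    clusterID: cluster-east-1
  cAdvisor:
    enabled: true
    directNodeAccess:
      # Scrape node kubelets directly (port 10250) instead of the API server proxy.
      enabled: true
components:
  validator:
    checks:
      # "postStart" is an assumed stage key; the real stage names may differ.
      postStart:
        # One of: required, optional, informative, disabled.
        webhook_server_reachable: optional
```

A file like this would be passed with an additional `-f` argument to the `helm upgrade` command shown under Upgrade Steps.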
Bug Fixes
- Webhook Service Name in Validator: Fixed incorrect service name reference in the validator ConfigMap (`cloudzero-agent-cz-webhook-svc` vs actual `cloudzero-agent-cz-webhook`). This caused the `webhook_server_reachable` post-start check to fail with DNS lookup errors, adding ~70 seconds of unnecessary delay to pod startup.
- Image Pull Secrets Inheritance: Fixed `defaults.image.pullSecrets` not being applied to 6 of 8 workload templates. All templates now correctly use the fallback chain: component-specific `image.pullSecrets` → `defaults.image.pullSecrets` → deprecated top-level `imagePullSecrets`.
- cAdvisor Schema Constraint: Removed an `enum: [true]` constraint on `prometheusConfig.scrapeJobs.cadvisor.enabled` that made it impossible to disable the cAdvisor scrape job via values without bypassing schema validation.
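The pull-secret fallback chain described above can be pictured with a values fragment like the following. This is a sketch under assumptions: `components.agent` stands in as a hypothetical component-specific location, the secret names are invented, and the list-of-`name` entry format mirrors the Kubernetes `imagePullSecrets` convention rather than anything stated in these notes.

```yaml
# Highest precedence: component-specific setting (component path is assumed).
components:
  agent:
    image:
      pullSecrets:
        - name: agent-registry-cred   # hypothetical secret name
# Used when a component does not set its own image.pullSecrets.
defaults:
  image:
    pullSecrets:
      - name: shared-registry-cred    # hypothetical secret name
# Deprecated top-level key, consulted last.
imagePullSecrets:
  - name: legacy-registry-cred        # hypothetical secret name
```

In practice, setting only `defaults.image.pullSecrets` should be enough for most installations, since every workload template now falls back to it.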
Improvements
- Scout Configuration Override: Scout-detected values (region, cloud account ID, cluster name) from instance metadata now always override customer-provided values, with customer values used only as a fallback when detection returns empty. A warning is logged when detected and configured values differ.
- EndpointSlice Service Discovery: Migrated Prometheus and Alloy internal service discovery from the deprecated `role: endpoints` to `role: endpointslice`, ensuring forward compatibility with Kubernetes 1.33+ where the Endpoints API is deprecated.
- Active Series Metric: Now collects `prometheus_remote_write_wal_storage_active_series` from Alloy's self-scrape job, useful for diagnosing memory usage in high-volume deployments.
- Configuration Documentation: Added extensive inline documentation to Alloy and Prometheus configuration templates, including architecture diagrams, component reference tables, and data flow explanations for each scrape job.
Support Tooling
- Label/Annotation Enumeration: New `scripts/kube-list-labels-annotations.sh` script enumerates all labels and annotations in use across a cluster with frequency counts, useful for debugging label-based cost allocation.
- Diagnostic Script Improvements: The anaximander diagnostic script now tracks command success/failure with a summary report, and uses updated label selectors matching the current naming conventions.
Experimental Features
- KubeState Plugin: Added an experimental option to replace the kube-state-metrics subchart with an embedded Alloy plugin for Kubernetes state metrics collection. Enable with `components.agent.kubeState.enabled: true` in clustered mode.
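A minimal values sketch for trying the experimental plugin might look like this. It assumes clustered mode is selected through `components.agent.mode`, as described in the 1.2.9 notes; no other keys are implied.

```yaml
components:
  agent:
    # The plugin applies to clustered (Alloy) mode only.
    mode: clustered
    kubeState:
      # Experimental: replaces the kube-state-metrics subchart.
      enabled: true
```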
Build and Infrastructure
- Go version updated to 1.25.7
- Base image updated to latest distroless/static-debian12
- Updated copyright year to 2026
- Numerous dependency updates (Prometheus, Kubernetes client libraries, etc.)
Upgrade Steps
To upgrade to version 1.2.10, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.10
1.2.9
1.2.9 (2025-11-25)
Release 1.2.9 focuses on quality, scalability, configurability, and consistency. It includes significant improvements to the organization and configurability of the Helm chart, Prometheus 3.x support (with 2.x support preserved for now), an HPA for the webhook server, as well as some early previews of experimental functionality we hope to stabilize over the next few releases.
Key Improvements
- Prometheus 3.x Support: Upgraded default Prometheus version from 2.55.1 to 3.7.3 with automatic backward compatibility. The chart detects Prometheus version and uses the appropriate agent mode flag (`--agent` for 3.x, `--enable-feature=agent` for 2.x). Customers using custom Prometheus 2.x images will continue to work without changes.
- Webhook Server Autoscaling: Added Horizontal Pod Autoscaler support for the webhook server, enabling automatic scaling based on CPU and memory utilization. Enable with `insightsController.autoscaling.enabled: true`.
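For reference, enabling the autoscaler is a one-key change in a values override. The sketch below uses only the key named above; replica bounds and utilization targets are not listed in these notes, so they are left at whatever the chart defaults to.

```yaml
insightsController:
  autoscaling:
    # Creates a Horizontal Pod Autoscaler for the webhook server.
    enabled: true
```

Because the autoscaler acts on CPU and memory utilization, the cluster needs a working resource metrics source (typically metrics-server) for it to have any effect.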
Configuration Improvements
- Unified Label/Annotation System: Refactored label and annotation generation with new `generateLabels` and `generateAnnotations` helpers. Labels now follow Kubernetes recommended practices with `app.kubernetes.io/name` for component identity and `app.kubernetes.io/part-of: cloudzero-agent` for chart membership.
- Component-Specific Metadata: Added support for component-specific `labels`, `annotations`, `podLabels`, and `podAnnotations` across all workload types, providing fine-grained control over Kubernetes metadata.
- Centralized Resource Names: Implemented unified resource naming pattern (`{release-name}-cz-{component}`) across all Kubernetes resources, improving consistency and enabling programmatic name reconstruction.
- Unified Mode Configuration: New `components.agent.mode` property consolidates deployment mode selection with values: `agent`, `server`, `federated`, and `clustered`. Legacy properties continue to work with automatic derivation.
- Cohesive Replicas System: New `defaults.replicas` property provides global default with mode-specific constraints.
- Persistent Volume Strategy: Changed deployment strategy to `Recreate` when persistent volumes are enabled, preventing volume mount conflicts during rolling updates.
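The mode and replica settings above could be expressed in a values override roughly as follows. The replica count of 2 is an arbitrary placeholder, and the notes do not spell out the mode-specific constraints, so the effective replica count for a given mode may differ from the global default.

```yaml
defaults:
  # Global default replica count; each mode may constrain the effective value.
  replicas: 2
components:
  agent:
    # One of: agent, server, federated, clustered.
    mode: federated
```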
Reliability Improvements
- Subchart Isolation: Excluded `.global` sections from configuration checksum calculation, preventing unnecessary pod restarts when parent chart globals change in subchart deployments.
- Cert-Manager Compatibility: Fixed ArgoCD reconciliation failures by removing empty `caBundle` key from webhook configuration when using cert-manager (which injects the CA bundle via annotation).
- DNS Resolution: Updated cAdvisor configuration to use fully qualified domain name `kubernetes.default.svc.cluster.local:443` for improved DNS resolution reliability.
Support Tooling
- Diagnostic Script: Added `scripts/anaximander.sh` for comprehensive diagnostic information gathering. Customers can run this script to collect logs, configurations, resource status, and environment context for CloudZero support.
- Post-Install Guidance: Updated Helm NOTES.txt with improved post-installation guidance and next steps.
Experimental Features
The following features are experimental and may change in future releases:
- Grafana Alloy Integration: Added Grafana Alloy as an alternative to Prometheus for metrics collection in high-volume environments. Configure with `components.agent.mode: clustered` to enable.
- GPU Metrics Collection: Added NVIDIA DCGM GPU metrics scraping. Enable with `prometheusConfig.scrapeJobs.gpu.enabled: true`. Note that this covers collection only; CloudZero does not yet support cost allocation based on GPU.
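A values sketch for opting into GPU scraping is shown below. It uses only the key named above and assumes, without confirmation from these notes, that an NVIDIA DCGM exporter is already running in the cluster for the job to scrape.

```yaml
prometheusConfig:
  scrapeJobs:
    gpu:
      # Experimental: scrapes NVIDIA DCGM metrics (collection only).
      enabled: true
```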
Upgrade Steps
This release includes changes to immutable Kubernetes selectors, requiring the --force flag to recreate affected resources:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.9 --force
1.2.8
1.2.8 (2025-10-10)
Release 1.2.8 is a maintenance release focused on quality assurance improvements, with enhanced testing infrastructure, better configuration validation, and useful configuration enhancements.
Key Features
- Configurable Labels and Annotations: All Kubernetes resources now support customizable labels and annotations through Helm values, providing better integration with organizational policies and tooling.
- Enhanced Build System: Specialized build configurations now support environment-specific customization (e.g., Replicated builds), simplifying multi-environment deployments.
Configuration Improvements
- Centralized Validation: Moved `apiKey`/`existingSecretName` validation from Helm templates to JSON Schema, centralizing all configuration validation in a single location for improved maintainability.
- Service Port Protocol: Added configurable `protocol` field for webhook server service ports, improving compatibility with service mesh configurations.
Quality Assurance
- CI/CD Infrastructure Overhaul: Restructured the entire CI testing infrastructure to support more comprehensive testing during development, including expanded Kubernetes version coverage (now testing against 1.33 and 1.34) and improved test isolation.
- Unified Testing Framework: Introduced consolidated testing infrastructure with new `test-all` target covering unit tests, integration tests, Helm tests, and KUTTL end-to-end tests.
- Workflow Validation: Added actionlint for GitHub Actions workflow validation and markdownlint-cli2 for documentation quality checks.
- Documentation Expansion: Significantly expanded project documentation with comprehensive guides for development, testing, architecture, and troubleshooting.
Upgrade Steps
To upgrade to version 1.2.8, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.8
1.2.7
1.2.7 (2025-08-22)
Release 1.2.7 removes the dependency on the Bitnami kubectl container image, replacing it with a new cloudzero-certifik8s executable. This is critical because Broadcom (which controls Bitnami through VMware) is doing away with the old Bitnami images.
Key Features
- Go-Based Certificate Management: Complete transformation from bash scripts to modern Go-based certificate management with the new `cloudzero-certifik8s` tool, providing enhanced security, testability, and maintainability.
- Comprehensive Security Context: Added security context to all Kubernetes resources (pods, containers, jobs, deployments, daemonsets) with secure defaults and component-specific overrides.
- Enhanced Shipper Reliability: Improved shipper logging and fixed a replay file processing bug that could cause successful uploads to be incorrectly abandoned.
Security Enhancements
- Certificate Management Security: Replaced bash scripts with secure Go-based certificate generation, eliminating dependency on deprecated bitnami/kubectl Docker image and implementing proper RBAC with reduced permissions.
- Security Context Implementation: Added comprehensive security context to all Helm templates with secure defaults (`runAsUser: 65534`, `runAsNonRoot: true`) and proper property filtering for pod vs container contexts.
- Checkov Security Compliance: Enabled security context rules (CKV_K8S_29, CKV_K8S_30, CKV_K8S_23) after implementing proper security contexts across all resources.
- RBAC Improvements: Enhanced cluster-scoped permissions for certificate management with resource-specific restrictions and proper Kubernetes client integration.
Shipper Reliability Improvements
Replay File Processing Fix:
- Fixed critical bug where successfully uploaded files were incorrectly abandoned
- Corrected replay request loop to iterate over reference IDs instead of URLs
- Enhanced abandon operation logging with file-specific details (reference_id and reason)
- Added comprehensive debug logging for replay request processing
Enhanced Logging:
- Improved abandon operation logging to include file-specific details
- Added debug logging for replay request processing
- Fixed smoke test failures related to replay request processing
Configuration Enhancements
CloudAccountId Validation:
- Enhanced JSON schema to allow quoted values for better user experience
- Added support for quoted numeric and UUID values (e.g., '1234567890', '123e4567-e89b-12d3-a456-426614174000')
- Implemented comprehensive test coverage for all quote scenarios
- Added warning notes discouraging manual configuration of auto-detectable properties
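To make the quoting behavior concrete, a values entry along these lines should now pass schema validation. The example IDs are the ones quoted above; placing `cloudAccountId` at the top level of the values file is an assumption, and per the note above the value is usually best left to auto-detection.

```yaml
# Quoted numeric account ID (example value from these notes).
cloudAccountId: '1234567890'
# A quoted UUID is accepted as well, for example:
# cloudAccountId: '123e4567-e89b-12d3-a456-426614174000'
```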
Upgrade Steps
To upgrade to version 1.2.7, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.7
1.2.6
1.2.6 (2025-08-05)
Release 1.2.6 introduces a new CronJob-based backfill system, comprehensive resource management improvements, and enhanced security and reliability features.
Key Features
- Enhanced Shutdown Coordination: Implemented robust file-based shutdown coordination between collector and shipper containers with intelligent waiting mechanisms and timeout protection, ensuring graceful shutdown sequences.
- Dual Backfill System: Implemented both a CronJob for scheduled runs (default: every 12 hours) and an immediate Job for instant execution on install, providing both immediate execution and configurable recurring runs for ongoing data collection.
- Comprehensive Resource Management: Systematic refactoring of all components with centralized resource generation, providing consistent resource request/limit configurations across all containers.
Security Enhancements
- Checkov Security Integration: Added comprehensive security analysis with Checkov to build system and CI, fixing multiple Kubernetes security violations including missing liveness and readiness probes.
- Fail-Open Webhook Validation: Implemented true fail-open behavior for webhook validation with always-allow behavior, ensuring webhook validation never blocks Kubernetes resource operations.
- Enhanced Health Monitoring: Added proper liveness and readiness probes to prometheus-config-reloader container, improving container health monitoring and automatic restart capabilities.
- Comprehensive Security Context Implementation: Added configurable security context support to all Kubernetes pods and containers.
Additional Enhancements
- Observability Improvements: Removed observability files on upload to prevent storage bloat and improve performance.
- Scout Configuration Enhancement: Updated scout to return Google project number instead of project ID for improved metadata accuracy.
- Cloud Account Validation: Added JSON Schema validation for cloudAccountId contents to ensure proper configuration.
- Image Pull Secrets: Added image pull secrets support for config loader and helmless jobs for enhanced security.
Technical Improvements
- Centralized Resource Generation: Created reusable helper functions for consistent resource configuration patterns across all templates.
- Backward Compatibility: Maintained full backward compatibility through legacy precedence logic, ensuring existing deployments continue to work without changes.
- Comprehensive Testing: Added 20 test suites with 87 total tests covering all fallback scenarios, security context functionality, and edge cases.
Resource Configuration Details
New Component Structure:
- Core Components: `components.agent.resources`, `components.aggregator.collector.resources`, `components.aggregator.shipper.resources`, `components.webhookServer.resources`
- Job Components: `components.miscellaneous.configLoader.resources`, `components.webhookServer.backfill.resources`, `components.agent.federatedNode.resources`
- New Components: `components.helmless.resources`, `components.initCertJob.resources`
- Specialized Components: `components.agent.configmapReloader.resources`, `components.validator.resources`, `components.kubeStateMetrics.resources`
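As an illustration of the component structure listed above, two of the paths might be set in a values override as sketched here. The `requests`/`limits` shape is the standard Kubernetes resource format, which the chart is assumed to accept unchanged, and all quantities are placeholder numbers rather than recommendations.

```yaml
components:
  agent:
    resources:
      requests:
        cpu: 250m        # placeholder value
        memory: 512Mi    # placeholder value
      limits:
        memory: 1Gi      # placeholder value
  aggregator:
    shipper:
      resources:
        requests:
          cpu: 100m      # placeholder value
          memory: 64Mi   # placeholder value
```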
Upgrade Steps
To upgrade to version 1.2.6, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.6
1.2.5
1.2.5 (2025-07-25)
Release 1.2.5 is a critical maintenance release that fixes a webhook configuration issue affecting resource metadata collection. Due to a single-character difference in resource names (using singular instead of plural), the webhook server was not collecting the necessary information for labels and annotations. Customers on versions 1.2.3 and 1.2.4 should upgrade immediately.
Critical Fix
- Webhook Configuration Fix: Fixed a critical bug where the webhook server was not collecting resource metadata due to incorrect resource name configuration. This affected label and annotation collection for all resources processed by the webhook.
Key Features
- Enhanced Webhook Configuration: Fixed webhook misconfiguration issues and improved integration testing infrastructure with comprehensive validation and debugging capabilities.
- AWS IMDSv1 Fallback Support: The CloudZero Agent's AWS scout implementation now gracefully falls back from IMDSv2 to IMDSv1 when the token endpoint is unavailable, ensuring compatibility with clusters that don't have IMDSv2 enabled. This maintains security preference for IMDSv2 while providing compatibility with IMDSv1-only environments.
- Comprehensive Troubleshooting Guide: Added a troubleshooting guide covering quick diagnosis, component-specific troubleshooting, network policies, certificate issues, and scaling problems with clear escalation paths.
Additional Enhancements
- Security Documentation: Significantly expanded SECURITY.md with detailed security considerations, vulnerability reporting procedures, and best practices for secure deployment.
- Scout Error Messages: Enhanced scout configuration error messages with specific Helm chart parameter guidance, making troubleshooting more actionable.
- Cloud Provider Detection: Added cloud provider information to cluster configuration for improved metadata collection and environment awareness.
- Test Infrastructure: Improved webhook integration testing with centralized Kind cluster configuration, enhanced test maintainability, and comprehensive validation.
- Dependency Updates: All third-party dependencies have been updated to the latest versions.
Technical Improvements
- Webhook Reliability: Fixed service name resolution and improved webhook test validation with comprehensive debugging capabilities
- Documentation Quality: Added systematic troubleshooting approach with label selector commands and component-specific diagnostic procedures
- Build System: Enhanced test infrastructure with better organization and maintainability
- AWS Metadata Service Compatibility: Implemented robust fallback mechanism for AWS metadata retrieval with clear error distinction between IMDSv2 and IMDSv1 failures
Upgrade Steps
To upgrade to version 1.2.5, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.5
1.2.4
1.2.4 (2025-07-17)
Release 1.2.4 is a maintenance release that improves metrics filtering and adjusts collector intervals for better performance. This release focuses on operational improvements, build efficiency, and enhanced visibility into metric processing.
Key Features
- Optimized Collection Intervals: Increased cost metrics collection interval from 10 minutes to 30 minutes for better performance in smaller clusters, while reducing observability metrics timeout to 10 minutes to maintain cluster connectivity visibility.
- Enhanced Scout Auto-Detection: The confload job now leverages the Scout system to automatically detect cloud environment metadata (region, account ID, cluster name) when these values are not explicitly provided, significantly simplifying deployment configuration.
- Dramatic Docker Build Performance: Build times reduced from 2:30-3:00 minutes to ~12 seconds through multi-stage builds with platform-specific caching, selective file copying, and conditional dependency generation.
- Dropped Metrics Tracking: The metric filter now provides visibility into filtered-out metrics through debug logging, making it easier to debug filter configurations and understand metric processing behavior.
Additional Enhancements
- Backfiller Reliability: Fixed GroupVersionKind issues and race conditions in namespace and node processing, with comprehensive integration testing.
- Test Infrastructure: Improved test reliability by fixing flaky tests related to file monitoring, file locking, and SQL timestamp formatting.
- Development Tooling: Added semantic diff targets (`*.{yaml,json}-semdiff`) for better visibility into Helm template changes during development.
- Dependency Management: Updated Dependabot to run on Wednesdays instead of Fridays for better alignment with patch release cycles.
Upgrade Steps
To upgrade to version 1.2.4, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.4
1.2.3
1.2.3 (2025-07-02)
Release 1.2.3 introduces Cloud Service Provider Auto-Detection, significant Performance Optimizations for the admission controller, enhanced Istio Integration, and numerous reliability improvements. This release dramatically simplifies deployment configuration while improving performance and compatibility with service mesh environments.
Key Features
- Cloud Service Provider Auto-Detection: The CloudZero agent now includes a comprehensive "scout" system that automatically detects cloud environment metadata including provider, region, account ID, and cluster name. This eliminates the need to manually configure these values in many deployments.
  - AWS Support: Automatically detects region and account ID from EC2 instance metadata
  - Google Cloud Support: Automatically detects region, project ID, and cluster name from GCE metadata
  - Azure Support: Automatically detects region and subscription ID from Azure IMDS
- Webhook Server Optimization: The webhook server now explicitly requests only the Kubernetes resource types it needs instead of receiving all resources, significantly reducing network traffic and improving performance.
- Enhanced Istio Integration: The webhook server now automatically includes `sidecar.istio.io/inject: "false"` annotation by default, providing seamless out-of-the-box compatibility with Istio service mesh environments without requiring manual configuration.
Additional Enhancements
- Improved Load Balancing: Enhanced webhook server connection handling with periodic connection rotation to ensure proper load distribution across service replicas in multi-replica deployments.
- Configurable Webhook Timeout: Added ability to configure webhook admission controller timeout values, and changed the default from 15 seconds to 1 second.
- Enhanced Pod Disruption Budget (PDB) Configuration: Completely reworked PDB validation and override behavior to prevent common configuration errors and provide more intuitive component-level overrides.
Upgrade Steps
To upgrade to version 1.2.3, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.3
1.2.2
1.2.2 (2025-06-24)
This is a maintenance release that includes important bug fixes and dependency updates to improve reliability and stability.
Bug Fixes
- Configuration Management: Fixed an issue where component-specific configuration merging was incorrectly modifying default values, potentially causing unexpected behavior.
- ConfigMap References: Updated ConfigMap name references in the loader job to use the correct naming convention, preventing resource lookup failures.
- JSON Schema Validation: Added support for properties which were previously not present in `values.yaml`, but were used in the template.
- Invalid Template Fixes: Fixed template generation for options that were causing invalid Kubernetes resources to be generated.
- Allow resource_type Labels: The Aggregator no longer filters out "resource_type" and "workload" labels.
Enhancements
- Helmless Tool: Improved the helmless implementation by splitting it out from the CLI with enhanced testing coverage and removal of unnecessary functionality.
- Testing Infrastructure: Added checks to verify that all Kubernetes resources are created successfully during deployment validation.
- Testing Template Generation: Added kubeconform tests to validate generated templates.
Upgrade Steps
To upgrade to version 1.2.2, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.2
1.2.1
1.2.1 (2025-06-17)
This is primarily a bugfix release that resolves JSON Schema validation issues when the cloudzero-agent Helm chart is used as a subchart.
Bug Fixes
- Subchart Schema Validation: Fixed JSON Schema validation error that occurred when the cloudzero-agent chart was used as a subchart. Helm automatically adds a top-level 'global' property for subcharts, which was not previously allowed by the schema, causing validation failures.
Additional Enhancements
- Helmless Job: Added a Helm job that runs the helmless tool, providing an easy way to determine minimal configuration overrides by checking the job logs.
- Improved Logging: Both the collector and shipper now emit regular info-level log messages, providing positive confirmation that the agent is working correctly.
Testing Improvements
- Subchart Testing: Added comprehensive test coverage for subchart scenarios to prevent regression of schema validation issues.
Upgrade Steps
To upgrade to version 1.2.1, run the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.2.1