Releases: projectsveltos/addon-controller
v1.12.0
🚀 New Features
Supply Chain Verification for Helm Charts
Sveltos can now verify the integrity and origin of a Helm chart before deploying it, with two mechanisms targeting different chart sources. For charts pulled from OCI registries, Sveltos verifies the Cosign signature attached to the chart: a PublicKey provider checks it against a static key stored in a Kubernetes Secret, while a Keyless provider verifies the Fulcio-issued certificate against an expected OIDC issuer/subject and confirms the signing event was recorded in the Rekor transparency log, so the chart must have been signed by a specific pipeline in a specific repository. Both providers support the Sigstore Bundle v0.3 OCI referrer format and fall back to the legacy tag-based signature format. For charts pulled from HTTP repositories, Sveltos verifies the Helm .prov provenance file against a GPG keyring stored in a Secret. In both cases a failed verification blocks the deployment and the reason is recorded on the ClusterSummary status; charts without a verification field deploy as before.
PRs: addon-controller #1842, sveltos #753
Workload Identity Support
SveltosCluster now supports authenticating to a managed cluster using the cloud provider's native workload identity instead of a stored kubeconfig Secret: AWS (IRSA / EKS Pod Identity), GCP (Workload Identity Federation), and Azure (Azure Workload Identity). When configured, Sveltos obtains short-lived credentials directly from the cloud provider, caching them in-process and refreshing proactively before expiry. sveltosctl register cluster has been extended to configure workload identity when registering a cluster.
PRs: libsveltos #636, sveltosctl #434
OCI Support in PolicyRef
RemoteURL in PolicyRef now accepts oci:// URLs in addition to http:// and https://. Sveltos pulls the OCI artifact from the registry on each reconciliation at the configured interval, computes a content hash, and redeploys when the content changes, identical to the existing HTTP polling behavior. Authentication uses the same secretRef field, supporting a bearer token, basic auth, or a custom CA certificate. Both a tar archive (the standard ORAS/Flux format) and a raw YAML/JSON blob are supported as artifact layouts.
PR: addon-controller #1851
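The hash-and-redeploy step described above can be pictured roughly as follows. This is a minimal sketch, not Sveltos code: `ChangeDetector` is a hypothetical name, `fetch` stands in for whatever pulls the artifact (HTTP or OCI), and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Callable, Optional


class ChangeDetector:
    """Remembers the hash of the last fetched content (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], bytes]) -> None:
        self._fetch = fetch                      # stand-in for the HTTP/OCI pull
        self._last_hash: Optional[str] = None    # nothing fetched yet

    def poll(self) -> bool:
        """Fetch once; return True when the content differs from the last poll."""
        digest = hashlib.sha256(self._fetch()).hexdigest()
        changed = digest != self._last_hash
        self._last_hash = digest
        return changed
```

A caller would invoke `poll()` once per interval and redeploy only when it returns `True`; the first poll always reports a change because no previous hash exists.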
Classify Clusters from Management Cluster Resources
Classifier evaluates rules against resources inside each managed cluster, which leaves a gap when the classification signal instead lives on the management cluster itself, such as a Crossplane Composite Resource created when a team orders an addon on an Internal Developer Platform. A new ManagementClusterClassifier resource closes that gap: it watches resources on the management cluster and runs a Lua function that receives the full set of matched resources and returns which managed clusters should be labeled. A ManagementClusterClassifierReport tracks label ownership per classifier/cluster pair, giving the same conflict detection the existing Classifier provides.
PR: classifier #482
sveltosctl: show classifier-labels
A new sveltosctl show classifier-labels command displays the labels that Classifier and ManagementClusterClassifier instances are actively managing on each cluster, along with the name of the instance that owns each label.
PR: sveltosctl #437
⚙️ Improvements
sveltos-agent: Reduced Memory Usage in Agentless Mode
In agentless mode, each sveltos-agent instance runs in the management cluster and is responsible for a single managed cluster, but its controller-runtime cache had no namespace or label restrictions, so every agent's informer held HealthCheckReport, EventReport, and ConfigMap objects for all managed clusters, causing O(N) memory per agent and O(N²) total. The cache is now scoped per agent: HealthCheckReport and EventReport are restricted with a label selector matching the agent's cluster name and type, and the per-cluster ConfigMap is restricted with a field selector on its name.
PR: sveltos-agent #493
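To make the O(N) versus O(N²) difference concrete, here is a small back-of-the-envelope sketch. The function name and the object counts are illustrative assumptions, not measurements from Sveltos.

```python
def cached_objects(num_clusters: int, objects_per_cluster: int, scoped: bool) -> int:
    """Total objects held across all agent caches (one agent per managed cluster)."""
    # Unscoped: every agent caches the objects of every cluster.
    # Scoped: every agent caches only its own cluster's objects.
    clusters_seen_per_agent = 1 if scoped else num_clusters
    return num_clusters * clusters_seen_per_agent * objects_per_cluster
```

With an assumed 100 managed clusters and 50 objects each, the unscoped caches hold 100 × 100 × 50 = 500,000 objects in total, while the scoped caches hold 100 × 1 × 50 = 5,000.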
addon-controller: Condition-Based Wait for CRD Reapply
When a Helm chart is deployed with UpgradeCRDs: true, the controller used to sleep for 30 seconds after applying the chart's CRD files on every reconcile, regardless of whether the CRDs had actually changed, which was especially costly in ContinuousWithDriftDetection sync mode. The sleep is replaced with a poll against the destination cluster's CRD status, returning as soon as each CRD's Established and NamesAccepted conditions are true, effectively zero delay when CRDs are already established and no more than necessary for a genuine upgrade.
PR: addon-controller #1844
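The difference between a fixed sleep and a condition-based wait can be sketched as below. This is an illustration only: `wait_for_crd`, `get_conditions`, the timeout, and the polling interval are all assumptions, and the real controller reads the conditions from the destination cluster's CRD status.

```python
import time
from typing import Callable, Dict

REQUIRED = ("Established", "NamesAccepted")


def crd_ready(conditions: Dict[str, str]) -> bool:
    """True when both required conditions report status "True"."""
    return all(conditions.get(name) == "True" for name in REQUIRED)


def wait_for_crd(get_conditions: Callable[[], Dict[str, str]],
                 timeout: float = 30.0, interval: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the CRD is ready or the timeout elapses; return readiness."""
    waited = 0.0
    while True:
        if crd_ready(get_conditions()):
            return True              # no delay at all when already established
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

When the CRD is already established the first check succeeds and `sleep` is never called, which is the "effectively zero delay" case mentioned above.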
🐞 Bug Fixes
sveltos-agent: Reloader Feature Broken in Agentless Mode
In agentless mode, sveltos-agent watched every Reloader instance in the management cluster instead of only the subset belonging to its own managed cluster, and watched ConfigMap/Secret objects in the management cluster instead of the managed cluster. Both are now scoped correctly.
PR: sveltos-agent #494
addon-controller: Tier Change Not Triggering Takeover
Once a ClusterProfile was managing a chart, its tier was never compared against other conflicting profiles on later reconciliations, so raising a profile's tier above a challenger stuck in FailedNonRetriable never woke the challenger up. Tier comparison now runs on every reconcile even for the profile currently managing the chart, so a higher-tier challenger correctly reclaims it.
PR: addon-controller #1837
access-manager: Configurable Namespace
The Sveltos namespace was hardcoded to projectsveltos in access-manager. It is now configurable, closing a gap missed in the previous release as part of the broader effort to remove that limitation across components.
PR: access-manager #348
classifier: Upgrade Blocked by Deleted Clusters
The migration init container that upgrades deprecated ClusterInfo entries into ClassifierReport objects would crash-loop and block the upgrade entirely if a managed cluster had been deleted before the upgrade, since its namespace no longer existed. Entries whose cluster namespace is gone are now logged at debug level and skipped, while all other entries migrate normally.
PR: classifier #480
sveltos-agent: Wait for Informer to Be Synced
In agentless mode, sveltos-agent can restart its internal controller-manager without a pod restart, for example when a managed cluster's kubeconfig token expires or a CRD change is detected. If an EventSource was evaluated in the window before the new informers had synced, it returned an empty result that was misread as zero matching resources, causing event-manager to delete previously created ClusterProfiles. An unsynced informer is now treated as a transient error and the evaluation is retried once the informers have synced.
PR: sveltos-agent #495
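The core of this fix is keeping "not known yet" distinct from "zero matches". A minimal sketch of that idea, using hypothetical `Cache` and `NotSyncedError` types rather than the actual controller-runtime informers:

```python
from typing import List


class NotSyncedError(Exception):
    """Raised when the cache cannot answer yet; callers should retry."""


class Cache:
    """Toy stand-in for an informer cache."""

    def __init__(self) -> None:
        self.synced = False
        self.items: List[str] = []


def matching_resources(cache: Cache) -> List[str]:
    # An unsynced cache has no opinion: raise instead of returning [],
    # so "don't know yet" is never mistaken for "zero matches".
    if not cache.synced:
        raise NotSyncedError("informer cache has not synced")
    return list(cache.items)
```

A caller that catches `NotSyncedError` can requeue the evaluation, whereas an empty list returned from a synced cache is a genuine "nothing matches" result.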
🔧 Maintenance
- All components advanced to Cluster API v1.13.3
- All components advanced to Kubernetes client-go v1.36.2
v1.11.1
🐞 Bug Fixes
ClusterProfile deletion no longer stalls (addon-controller #1829)
Fixed a bug where deleting a ClusterProfile could get stuck indefinitely. The allMatchingProfilesProcessed check was not skipping other profiles that were themselves already being deleted, causing the controller to wait forever for work that would never complete.
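The shape of the corrected check can be sketched as follows. The `Profile` type and its fields are hypothetical simplifications; only the name of the check comes from the release note.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Profile:
    name: str
    processed: bool
    deleting: bool = False   # stands in for a set deletionTimestamp


def all_matching_profiles_processed(profiles: Iterable[Profile]) -> bool:
    """True when every profile that is not itself being deleted has been processed."""
    return all(p.processed for p in profiles if not p.deleting)
```

Without the `if not p.deleting` filter, a profile that is being deleted and will never be processed would keep the check false forever.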
MCP compare-clusters tool reliability (mcp-server #57)
Corrected several issues in the compare_managed_clusters MCP tool. A not-found cluster now surfaces a clear error instead of silently returning an empty result. Additionally, when a cluster exists but its ClusterConfiguration has not yet been created (resources still deploying), the tool now returns a non-fatal warning in the output rather than returning misleading empty comparison data.
Spurious pending-updates indicator in the UI (ui-backend #171)
Fixed incorrect cluster status reporting in the UI backend. A cluster was wrongly flagged as having pending updates when a profile matched it and that profile had declared dependencies. The status is now computed correctly in those cases.
v1.11.0
🚀 New Features
Health Checks: Metric-Based Validation
ValidateHealth now supports querying a Prometheus-compatible endpoint as an additional data source. Named scalar values are exposed to the Lua evaluate() function via a metrics table, enabling checks such as "error rate below 5 %". In push mode the endpoint must be reachable from the management cluster; in pull mode the sveltos-applier agent running inside the managed cluster reaches it directly via in-cluster DNS.
PR: addon-controller #1816
Kubernetes Events for Deployment Failures
The addon-controller now raises Kubernetes Warning events to make failure causes immediately visible via kubectl describe or any event-watching tool. Events are emitted for conflicts, missing referenced resources, template instantiation errors, and when the controller gives up after reaching the maximum consecutive failure count.
PR: addon-controller #1815
Dashboard: Cluster Deployment Health Signals
The cluster list now surfaces deployment state at a glance. An amber alert icon appears when one or more profile deployments are failing; a blue clock icon appears when deployment is actively in progress with no failures. The ui-backend keeps these signals in in-memory indexes with O(1) lookups, updated by the existing ClusterSummary watcher, so no per-request scanning is required.
PRs: dashboard #173, ui-backend #167
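One way to picture a watcher-maintained index with constant-time lookups is sketched below. `DeploymentIndex`, its method names, and the state strings are assumptions made for illustration; the ui-backend's actual data structures may differ.

```python
from typing import Dict, Set


class DeploymentIndex:
    """Per-cluster health signals kept current by watch events (hypothetical)."""

    def __init__(self) -> None:
        self._failing: Dict[str, Set[str]] = {}      # cluster -> failing summaries
        self._in_progress: Dict[str, Set[str]] = {}  # cluster -> provisioning summaries

    def on_summary(self, cluster: str, summary: str, state: str) -> None:
        """Record one ClusterSummary event; state is "failed", "provisioning", or "ok"."""
        for index, wanted in ((self._failing, "failed"),
                              (self._in_progress, "provisioning")):
            members = index.setdefault(cluster, set())
            if state == wanted:
                members.add(summary)
            else:
                members.discard(summary)

    def signal(self, cluster: str) -> str:
        """Constant-time lookup; "failing" takes precedence over "in-progress"."""
        if self._failing.get(cluster):
            return "failing"
        if self._in_progress.get(cluster):
            return "in-progress"
        return "none"
```

Each event touches only the sets for one cluster, and serving a request is a pair of dictionary lookups rather than a scan over all summaries.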
HealthCheck: Surface "No Resources Found" as a Degraded Status
When a HealthCheck's resource selectors matched nothing, the resulting HealthCheckReport was silently empty and sveltosctl show resources showed nothing. The Lua evaluate() function can now detect an empty resources table and return a top-level degraded status (e.g. "No deployments found in namespace metrics"), which is surfaced in the report and in sveltosctl.
PR: sveltos-agent #489
MCP Server: Classifier Pipeline Analysis Tool
A new tool has been added to the Sveltos MCP server to analyze classifier pipelines, making it easier to inspect and reason about classifier configuration via AI-assisted workflows.
PR: mcp-server #55
Shard-Controller: Patch Support
A new --shard-components-config flag accepts the name of a ConfigMap in the Sveltos namespace. The ConfigMap holds one or more patches (JSON6902 or strategic-merge) that are applied to the five Deployments shard-controller creates per shard. Target selectors allow a single ConfigMap to patch only specific components. When the ConfigMap changes, a dedicated reconciler re-deploys all active shards immediately so patches take effect without restarting shard-controller.
PR: shard-controller #216
🐞 Bug Fixes
addon-controller: Logging, Pull-Mode Status Handler, Helm Data Race, and Dependency Manager
Fixed a logging mistake where cluster name was populated with the cluster namespace in two places. Corrected the pull-mode agent status handler, which could reach a nil dereference when the agent returned an error without a status payload. Fixed a data race in the Helm chart manager where a shared map was read without holding the mutex. The dependency manager's background update loop now snapshots work, releases the write lock before making API calls, and re-acquires it only to clear completed entries — unblocking concurrent reconcilers. Its startup rebuild loop now waits 5 seconds between retries instead of spinning at full speed on API errors.
PR: addon-controller #1823
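The snapshot-then-release pattern used by the dependency manager's update loop can be sketched like this. It is an illustration under assumptions: `UpdateQueue` is a hypothetical name, a plain `threading.Lock` stands in for the write lock, and the generation counter is one possible way to avoid clearing entries that were re-queued while a call was in flight.

```python
import threading
from typing import Callable, Dict, Set


class UpdateQueue:
    """Pending work guarded by a lock that is never held during slow calls."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Dict[str, int] = {}   # key -> generation requested

    def add(self, key: str) -> None:
        with self._lock:
            self._pending[key] = self._pending.get(key, 0) + 1

    def process(self, api_call: Callable[[str], bool]) -> None:
        with self._lock:                      # 1. snapshot under the lock
            snapshot = dict(self._pending)
        # 2. slow calls run with no lock held, so add() is never blocked
        done = {k: gen for k, gen in snapshot.items() if api_call(k)}
        with self._lock:                      # 3. re-acquire only to clear
            for key, gen in done.items():
                # Keep entries that were re-queued while the call was in flight.
                if self._pending.get(key) == gen:
                    del self._pending[key]

    def pending(self) -> Set[str]:
        with self._lock:
            return set(self._pending)
```

Because step 2 runs outside the lock, a concurrent `add()` proceeds immediately instead of waiting for every API call in the batch to finish.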
sveltos-applier: ClassifierReport Retry, Discovery Loop, and Namespace Client Reuse
ClassifierReports were silently marked as delivered even when the push to the management cluster failed, so they were never retried. The applier would also continue with an empty Classifier object on non-NotFound API errors. The discovery retry loop in the REST mapper refresh executed exactly once regardless of the loop variable, causing immediate failure on transient API server slowness. A new Kubernetes client, and with it a new HTTP connection pool, was being created for every single resource that needed a namespace ensured; it is now created once per reconciliation pass.
PR: sveltos-applier #89
sveltos-agent: A Handful of Bugs in the Evaluation Package
Nine bugs in the evaluation package have been corrected.
PR: sveltos-agent #490
🔧 Maintenance
- All components advanced to Go v1.26.4
- All components advanced to golangci-lint v1.12.1
v1.10.0
🚀 New Features
OIDC Authentication in Dashboard
Users can now log into the Sveltos dashboard using OpenID Connect (Authorization Code Flow with a public client), as an alternative to manual token authentication.
PR: dashboard #160
🐞 Bug Fixes
EventManager: Ordered Removal of Resources
When stale ClusterProfile resources were being cleaned up by the event-manager, referenced resources could be removed before the ClusterProfile was fully deleted, causing ordering violations. Sveltos now waits for stale ClusterProfiles to be fully deleted before removing their referenced resources.
PR: event-manager #485
addon-controller: Guaranteed Helm Chart Handoff Between Profiles
When a cluster atomically switched from one ClusterProfile to another (both referencing the same Helm chart), a race condition could cause a delete-and-reinstall instead of an in-place upgrade. Sveltos now verifies that every matching profile has had its ClusterSummary fully processed by the chart manager before allowing an uninstall, avoiding unnecessary downtime.
PR: addon-controller #1780
addon-controller: Surface Errors for Missing Non-Optional TemplateResourceRefs
When a non-optional resource referenced in TemplateResourceRefs was missing, no error was reported in the ClusterSummary, making it difficult to diagnose why a profile was not being deployed. The failure message is now surfaced directly in the ClusterSummary status.
PR: addon-controller #1790
shard-controller: Correct Flags for Init Container in Agentless Mode
When running in agentless mode, the agent-in-mgmt-cluster flag was not being set correctly for the addon-controller init container.
v1.9.0
🚀 New Features
Remote URL Support in PolicyRefs
Reference YAML content directly from HTTP/HTTPS endpoints: Previously, PolicyRef was limited to ConfigMaps and Secrets, which imposed an ~1 MB size cap. You can now define a remoteURL field pointing to any HTTP/HTTPS URL. Sveltos fetches and redeploys automatically whenever the content changes, driven by a configurable polling interval (default: 5 minutes).
Optional authentication via secretRef (token, username/password, or CA file) and Go template rendering are fully supported.
preDeployChecks
Gate deployments on cluster readiness: A new preDeployChecks field on ClusterProfile/Profile lets you define conditions that must pass before Sveltos deploys any resource. This provides a built-in operational gate — for example, blocking rollouts until a cluster reaches a healthy state.
Avoid Spurious Helm Upgrades
Stable revision counters after management cluster takeover: When a new management cluster reconciled clusters that already had Helm charts deployed, the absence of stored state caused Sveltos to run helm upgrade on every reconcile even when nothing had changed. Revision counters now remain stable on takeover. Charts with patches: configured and subsequent reconciliations in ContinuousWithDriftDetection mode are intentionally unaffected.
Show Addons: Filter by Helm Charts or Resources
Targeted addon inspection in sveltosctl: sveltosctl show addons gains two new flags — --helm-charts to display only Helm releases and --resources to display only Kubernetes resources. This makes it easier to inspect large deployments without noise from unrelated resource types.
PR: sveltosctl #427
Dashboard DryRun Information
Simulation results in the Sveltos dashboard: The Sveltos dashboard now surfaces DryRun simulation results. Operators can review exactly what changes would be applied to each cluster before committing a profile to active mode — without leaving the dashboard.
🐞 Bug Fixes
Drift Detection and KustomizationRefs
Configuration drift for KustomizationRef-deployed resources was not being detected or repaired. Resources deployed via KustomizationRefs (e.g. through a Flux GitRepository) are now correctly tracked by the drift-detection agent and reconciled when changed out-of-band.
Helm Chart Errors in Pull Mode
Partial ConfigurationBundle instances no longer reach the applier: When processing Helm charts in pull mode, an error mid-flight could cause a partially populated ConfigurationBundle to be committed. The applier would then treat missing resources as deleted, pruning live workloads or deploying broken stacks. Bundle preparation is now atomic: any error discards all partial state before it can be committed.
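The all-or-nothing preparation described above can be sketched as follows. `prepare_bundle` and `render` are hypothetical names; the point is only that results are staged in a scratch list and handed over when every chart has rendered successfully.

```python
from typing import Callable, Iterable, List, Optional


def prepare_bundle(charts: Iterable[str],
                   render: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Render every chart into a scratch list; return it only if all succeed."""
    staged: List[str] = []
    for chart in charts:
        try:
            staged.extend(render(chart))
        except Exception:
            return None      # discard everything staged so far
    return staged
```

A caller would commit the bundle only when the result is not `None`, so the applier never sees a bundle with some resources missing.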
ClusterPromotion/ClusterProfile Ordering
When ClusterPromotion creates ClusterProfile resources, the order of HelmCharts, KustomizationRefs, and PolicyRefs must be preserved. A bug was causing the order in generated ClusterProfile resources to diverge from the ClusterPromotion definition. This is now fixed.
EventTrigger: Stale Profiles on EventSource Change
When an EventTrigger's referenced EventSource was updated, the ClusterProfile, ConfigMap, and Secret resources created for the previous EventSource were left as stale orphans in the management cluster. These are now correctly cleaned up whenever the EventSource reference changes. ...
v1.8.0
🚀 New Features
GitOps Friendly Kubeconfig Renewal
Support for In-place Secret Updates: Previously, Sveltos rotated kubeconfigs by creating a new key (re-kubeconfig) and updating the SveltosCluster spec. This caused drift in GitOps tools like ArgoCD or Flux.
You can now define spec.tokenRequestRenewalOption.kubeconfigKeyName. If set to the original key name, Sveltos will overwrite the existing Secret in-place and skip updating the Spec, keeping your live state and Git source-of-truth in sync.
PR: sveltoscluster-manager #361
Enhanced Helm Testing
Native Helm Tests: Introduced RunTests in the Helm configuration. When set to true, Sveltos will automatically run Helm test hooks (helm.sh/hook: test) after successful installs or upgrades.
Failing tests will surface as deployment failures in the ClusterSummary status, providing an automated operational gate.
Flexible Namespace Management
Skip Namespace Creation: To support multi-tenant environments where Sveltos may have restricted RBAC (lacking cluster-wide Namespace permissions), we’ve introduced SkipNamespaceCreation to PolicyRef and KustomizationRef.
When enabled, Sveltos bypasses the check/creation logic and attempts to deploy resources directly into pre-provisioned namespaces.
🐞 Bug Fixes
Helm & Patching Consistency
- Patch-Triggered Redeployments: Fixed an issue where modifying Helm patches did not trigger a redeploy because the chart version and values remained unchanged. Sveltos now tracks a hash of the patches to accurately detect changes.
- Persistence of Failure Messages: Resolved a sequencing bug where FailureMessage for Helm releases was being overwritten in the API server before it could be persisted.
- Template Resolution in Summaries: Fixed a lookup failure where failure messages weren't being correctly mapped when ReleaseName or ReleaseNamespace contained Go templates.
Controller Robustness
- Out-of-band Deletions: Fixed a reconciliation deadlock that occurred if a ClusterSummary was deleted manually while the cluster was in an UpdatingClusters state. The controller now recovers gracefully by dropping the cluster from the update list and forcing a requeue.
v1.7.0
✨ Key Highlights
- Reconciliation Stability (#1657): Fixed "reconciliation storms" in production environments by implementing a robust NextReconcileTime guard and an in-memory cooldown map to prevent tight loops and high CPU usage.
- Discovery Optimization (#1644): Drastically reduced API overhead by caching DiscoveryClient and RESTMapper. Added targeted invalidation to discover new CRDs instantly without restarts.
- Granular Helm Debugging (#1663): Added FailureMessage to individual Helm release summaries, making it easier to pinpoint exactly which chart failed and why.
- Unified Promotion Logic (#1666): Shared validation and health check logic across both Auto and Manual promotion modes for consistent guardrails.
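As an illustration of the cooldown idea from the Reconciliation Stability item, the sketch below keeps an in-memory map of the earliest time each object may be reconciled again. `ReconcileGuard` and the cooldown value are assumptions; the actual guard in the controller may be structured differently.

```python
from typing import Dict


class ReconcileGuard:
    """Skips reconciles that arrive before a per-object cooldown has elapsed."""

    def __init__(self, cooldown: float) -> None:
        self._cooldown = cooldown
        self._next_allowed: Dict[str, float] = {}   # object key -> earliest next run

    def should_reconcile(self, key: str, now: float) -> bool:
        if now < self._next_allowed.get(key, 0.0):
            return False                 # too soon: break the tight loop
        self._next_allowed[key] = now + self._cooldown
        return True
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would supply a monotonic clock reading.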
Advanced to:
- Helm v4
- Flux v1 OCIRepository/Bucket sources
- Go v1.26.1 and CAPI v1.12.4
v1.6.1
🐞 Bug Fixes
#1641: Hardened Helm Lifecycle: Resolved issues causing Helm releases to stall or fail during upgrades.
- Improved handling of missing sub-charts within the dependency management flow.
- Added metadata sanitization to prevent "invalid semantic version" errors during the upgrade process.
✨ Improvements
Intelligent Resource Filtering: Optimized the EventSource and HealthCheck reconcilers. The system now performs a pre-evaluation check on Name and Namespace before triggering a full evaluation. This eliminates redundant processing for resources that don't match your ResourceSelector criteria.
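The pre-evaluation check can be pictured as a cheap filter that runs before the expensive evaluation. The `Selector` type below is a hypothetical simplification of a ResourceSelector, reduced to the two fields mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Selector:
    namespace: Optional[str] = None   # None means "any namespace"
    name: Optional[str] = None        # None means "any name"


def might_match(sel: Selector, namespace: str, name: str) -> bool:
    """Cheap check run before the expensive full evaluation."""
    if sel.namespace is not None and sel.namespace != namespace:
        return False
    if sel.name is not None and sel.name != name:
        return False
    return True
```

Only resources for which `might_match` returns `True` would go on to the full evaluation; everything else is dropped immediately.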
v1.6.0
🚀 Release Notes: Performance & Stability Update
This release focuses heavily on infrastructure efficiency and core stability. We have significantly optimized the resource footprint of our edge components and addressed several critical bugs in the addon-controller.
⚡ Performance Optimizations
We have optimized the resource management for sveltos-agent and drift-detection-manager. These components are now leaner and more efficient, particularly in large-scale environments.
- Memory Efficiency: Drastically reduced memory consumption, specifically targeting system admin memory overhead. This ensures a smaller footprint on managed nodes.
- CPU Optimization: Refined execution loops to lower CPU cycles during idle and reconciliation phases.
🐞 Bug Fixes
This version resolves several edge-case behaviors and stability issues:
- #1635: Clean up Stale ResourceSummaries (Agentless): Fixed an issue in agentless mode where ResourceSummary objects were not being properly cleaned up, leading to stale data in the management cluster.
- #1632: Resolve Helm Installation Deadlock: Addressed a critical bug where Helm installations could enter a deadlock state, preventing the deployment from moving forward.
- #1630: Fix Drift Detection Upgrade (Agentless): Resolved a failure during the upgrade process of the drift detection mechanism when running in agentless mode.
✨ Improvements
- #1625: New FailedClusters Status Field: Surfaced orchestration-level errors (e.g., failure to create/update a ClusterSummary) directly in the ClusterProfile status. This eliminates "blind spots" where users previously had to check controller logs to understand why a profile wasn't progressing.
- #1620: Specialized Health Check Error Handling: Introduced a dedicated HealthCheckError type to distinguish between deployment failures and functional health check failures. Added the --health-error-retry-time CLI flag (default: 90s). This allows the controller to back off specifically on health failures without affecting standard reconciliation requeue logic.
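The effect of a dedicated error type on requeue timing can be sketched as below. The 90-second value is the documented default of --health-error-retry-time; `requeue_after` and the 10-second default requeue are assumptions made purely for illustration.

```python
class HealthCheckError(Exception):
    """Deployment succeeded but a health check did not pass."""


HEALTH_ERROR_RETRY_SECONDS = 90      # default of --health-error-retry-time
DEFAULT_REQUEUE_SECONDS = 10         # assumed value, for illustration only


def requeue_after(err: Exception) -> int:
    """Pick the requeue delay from the error type."""
    if isinstance(err, HealthCheckError):
        return HEALTH_ERROR_RETRY_SECONDS
    return DEFAULT_REQUEUE_SECONDS
```

Keeping the two delays separate means a slow-to-become-healthy workload backs off on its own schedule without slowing down retries for ordinary deployment errors.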