Releases · projectsveltos/addon-controller

v1.12.0 Latest

gianlucam76 released this 03 Jul 12:35

v1.12.0

b528b72

🚀 New Features

Supply Chain Verification for Helm Charts
Sveltos can now verify the integrity and origin of a Helm chart before deploying it, with two mechanisms targeting different chart sources. For charts pulled from OCI registries, Sveltos verifies the Cosign signature attached to the chart: a PublicKey provider checks it against a static key stored in a Kubernetes Secret, while a Keyless provider verifies the Fulcio-issued certificate against an expected OIDC issuer/subject and confirms the signing event was recorded in the Rekor transparency log, so the chart must have been signed by a specific pipeline in a specific repository. Both providers support the Sigstore Bundle v0.3 OCI referrer format and fall back to the legacy tag-based signature format. For charts pulled from HTTP repositories, Sveltos verifies the Helm .prov provenance file against a GPG keyring stored in a Secret. In both cases a failed verification blocks the deployment and the reason is recorded on the ClusterSummary status; charts without a verification field deploy as before.
PRs: addon-controller #1842, sveltos #753

Workload Identity Support
SveltosCluster now supports authenticating to a managed cluster using the cloud provider's native workload identity instead of a stored kubeconfig Secret: AWS (IRSA / EKS Pod Identity), GCP (Workload Identity Federation), and Azure (Azure Workload Identity). When configured, Sveltos obtains short-lived credentials directly from the cloud provider, caching them in-process and refreshing proactively before expiry. sveltosctl register cluster has been extended to configure workload identity when registering a cluster.
PRs: libsveltos #636, sveltosctl #434

OCI Support in PolicyRef
RemoteURL in PolicyRef now accepts oci:// URLs in addition to http:// and https://. Sveltos pulls the OCI artifact from the registry on each reconciliation at the configured interval, computes a content hash, and redeploys when the content changes, identical to the existing HTTP polling behavior. Authentication uses the same secretRef field, supporting a bearer token, basic auth, or a custom CA certificate. Both a tar archive (the standard ORAS/Flux format) and a raw YAML/JSON blob are supported as artifact layouts.
PR: addon-controller #1851
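
The change detection described above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the release note only says "content hash", so SHA-256 is an assumption, and `content_hash` and `needs_redeploy` are hypothetical names.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """Return a hex digest of the fetched artifact bytes (SHA-256 is assumed)."""
    return hashlib.sha256(payload).hexdigest()


def needs_redeploy(previous_hash, payload: bytes):
    """Compare the digest of the freshly pulled content with the stored one.

    Returns (changed, new_hash) so the caller can persist new_hash.
    """
    new_hash = content_hash(payload)
    return new_hash != previous_hash, new_hash
```

On each polling interval the caller would pull the artifact, call `needs_redeploy` with the last stored digest, and redeploy only when the first return value is true.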

Classify Clusters from Management Cluster Resources
Classifier evaluates rules against resources inside each managed cluster, which leaves a gap when the classification signal instead lives on the management cluster itself, such as a Crossplane Composite Resource created when a team orders an addon on an Internal Developer Platform. A new ManagementClusterClassifier resource closes that gap: it watches resources on the management cluster and runs a Lua function that receives the full set of matched resources and returns which managed clusters should be labeled. A ManagementClusterClassifierReport tracks label ownership per classifier/cluster pair, giving the same conflict detection the existing Classifier provides.
PR: classifier #482

sveltosctl: show classifier-labels
A new sveltosctl show classifier-labels command displays the labels that Classifier and ManagementClusterClassifier instances are actively managing on each cluster, along with the name of the instance that owns each label.
PR: sveltosctl #437

⚙️ Improvements

sveltos-agent: Reduced Memory Usage in Agentless Mode
In agentless mode, each sveltos-agent instance runs in the management cluster and is responsible for a single managed cluster, but its controller-runtime cache had no namespace or label restrictions, so every agent's informer held HealthCheckReport, EventReport, and ConfigMap objects for all managed clusters, causing O(N) memory per agent and O(N²) total. The cache is now scoped per agent: HealthCheckReport and EventReport are restricted with a label selector matching the agent's cluster name and type, and the per-cluster ConfigMap is restricted with a field selector on its name.
PR: sveltos-agent #493
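
To make the scaling argument concrete, here is a small back-of-the-envelope calculation. The function name and the figures (100 clusters, 50 objects per cluster) are made up for illustration only.

```python
def cached_objects(num_clusters: int, objects_per_cluster: int, scoped: bool) -> int:
    """Total objects held across all agent caches, one agent per managed cluster."""
    # Unscoped: every agent caches the objects of every cluster.
    # Scoped: every agent caches only the objects of its own cluster.
    per_agent = objects_per_cluster if scoped else num_clusters * objects_per_cluster
    return num_clusters * per_agent


before = cached_objects(100, 50, scoped=False)  # 100 agents x 5000 objects each
after = cached_objects(100, 50, scoped=True)    # 100 agents x 50 objects each
```

With these illustrative numbers the total drops from 500,000 cached objects to 5,000, which is the difference between quadratic and linear growth in the number of clusters.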

addon-controller: Condition-Based Wait for CRD Reapply
When a Helm chart is deployed with UpgradeCRDs: true, the controller used to sleep for 30 seconds after applying the chart's CRD files on every reconcile, regardless of whether the CRDs had actually changed, which was especially costly in ContinuousWithDriftDetection sync mode. The sleep is replaced with a poll against the destination cluster's CRD status, returning as soon as each CRD's Established and NamesAccepted conditions are true, effectively zero delay when CRDs are already established and no more than necessary for a genuine upgrade.
PR: addon-controller #1844
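
The shape of such a condition-based wait can be sketched as below. This is an assumption-laden illustration, not the controller's implementation: `get_conditions` is a hypothetical callback standing in for a read of a CRD's `status.conditions`, and the timeout and interval values are arbitrary.

```python
import time

REQUIRED = ("Established", "NamesAccepted")


def crd_ready(conditions):
    """conditions: list of {"type": ..., "status": ...} dicts from status.conditions."""
    status = {c["type"]: c["status"] for c in conditions}
    return all(status.get(t) == "True" for t in REQUIRED)


def wait_for_crds(get_conditions, names, timeout=30.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until every named CRD is ready; return False if the timeout expires."""
    deadline = clock() + timeout
    pending = set(names)
    while True:
        # Re-check only the CRDs that were not ready on the previous pass.
        pending = {n for n in pending if not crd_ready(get_conditions(n))}
        if not pending:
            return True  # returns immediately when everything is already established
        if clock() >= deadline:
            return False
        sleep(interval)
```

The first check happens before any sleep, so already-established CRDs cost a single read per CRD rather than a fixed delay.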

🐞 Bug Fixes

sveltos-agent: Reloader Feature Broken in Agentless Mode
In agentless mode, sveltos-agent watched every Reloader instance in the management cluster instead of only the subset belonging to its own managed cluster, and watched ConfigMap/Secret objects in the management cluster instead of the managed cluster. Both are now scoped correctly.
PR: sveltos-agent #494

addon-controller: Tier Change Not Triggering Takeover
Once a ClusterProfile was managing a chart, its tier was never compared against other conflicting profiles on later reconciliations, so raising a profile's tier above a challenger stuck in FailedNonRetriable never woke the challenger up. Tier comparison now runs on every reconcile even for the profile currently managing the chart, so a higher-tier challenger correctly reclaims it.
PR: addon-controller #1837

access-manager: Configurable Namespace
The Sveltos namespace was hardcoded to projectsveltos in access-manager. It is now configurable, closing a gap missed in the previous release as part of the broader effort to remove that limitation across components.
PR: access-manager #348

classifier: Upgrade Blocked by Deleted Clusters
The migration init container that upgrades deprecated ClusterInfo entries into ClassifierReport objects would crash-loop and block the upgrade entirely if a managed cluster had been deleted before the upgrade, since its namespace no longer existed. Entries whose cluster namespace is gone are now logged at debug level and skipped, while all other entries migrate normally.
PR: classifier #480
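
The skip-instead-of-fail behavior can be pictured with the following sketch. The entry fields (`clusterNamespace`, `clusterName`) and the `migrate_one` callback are hypothetical stand-ins, not the real migration types.

```python
import logging

log = logging.getLogger("migration")


def migrate_entries(entries, existing_namespaces, migrate_one):
    """Migrate each entry whose cluster namespace still exists; skip the rest."""
    migrated, skipped = [], []
    for entry in entries:
        ns = entry["clusterNamespace"]
        if ns not in existing_namespaces:
            # The cluster was deleted before the upgrade: log and move on
            # instead of failing the whole migration.
            log.debug("skipping %s: namespace %s no longer exists",
                      entry["clusterName"], ns)
            skipped.append(entry)
            continue
        migrate_one(entry)
        migrated.append(entry)
    return migrated, skipped
```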

sveltos-agent: Wait for Informer to Be Synced
In agentless mode, sveltos-agent can restart its internal controller-manager without a pod restart, for example when a managed cluster's kubeconfig token expires or a CRD change is detected. If an EventSource was evaluated in the window before the new informers had synced, it returned an empty result that was misread as zero matching resources, causing event-manager to delete previously created ClusterProfiles. An unsynced informer is now treated as a transient error and the evaluation is retried once the informers have synced.
PR: sveltos-agent #495
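
The distinction between "no matches" and "not ready to answer" is the heart of this fix. A minimal sketch, with hypothetical names (`InformerNotSynced`, `evaluate_event_source`, `reconcile`) that are not taken from the sveltos-agent code:

```python
class InformerNotSynced(Exception):
    """Transient condition: the cache cannot be trusted yet, retry later."""


def evaluate_event_source(cache_synced: bool, list_matching):
    """Return matching resources, or raise if the informer has not synced."""
    if not cache_synced:
        # An empty list here would be indistinguishable from "zero matches".
        raise InformerNotSynced("informer cache not synced yet")
    return list_matching()


def reconcile(cache_synced, list_matching, previous_matches):
    """Return (matches, requeue); keep the previous result on a transient error."""
    try:
        return evaluate_event_source(cache_synced, list_matching), False
    except InformerNotSynced:
        return previous_matches, True
```

Because the unsynced case surfaces as an error rather than an empty list, downstream consumers never see a spurious "nothing matches" result.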

🔧 Maintenance

All components advanced to Cluster API v1.13.3
All components advanced to Kubernetes client-go v1.36.2

v1.11.1

gianlucam76 released this 11 Jun 14:17

v1.11.1

6a0831a

🐞 Bug Fixes

ClusterProfile deletion no longer stalls (addon-controller #1829)
Fixed a bug where deleting a ClusterProfile could get stuck indefinitely. The allMatchingProfilesProcessed check was not skipping other profiles that were themselves already being deleted, causing the controller to wait forever for work that would never complete.
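
The corrected check can be illustrated as follows. Only the function name comes from the release note; the signature and the dictionary fields (`name`, `processed`, `deletionTimestamp`) are assumptions made for this sketch.

```python
def all_matching_profiles_processed(profiles, current_name):
    """True when every other live profile matching the cluster has been processed."""
    for profile in profiles:
        if profile["name"] == current_name:
            continue
        if profile.get("deletionTimestamp") is not None:
            # A profile that is itself being deleted will never report as
            # processed, so waiting on it would block forever.
            continue
        if not profile.get("processed", False):
            return False
    return True
```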

MCP compare-clusters tool reliability (mcp-server #57)
Corrected several issues in the compare_managed_clusters MCP tool. A not-found cluster now surfaces a clear error instead of silently returning an empty result. Additionally, when a cluster exists but its ClusterConfiguration has not yet been created (resources still deploying), the tool now returns a non-fatal warning in the output rather than returning misleading empty comparison data.

Spurious pending-updates indicator in the UI (ui-backend #171)
Fixed incorrect cluster status reporting in the UI backend. A cluster was wrongly flagged as having pending updates when a profile matched it and that profile had declared dependencies. The status is now computed correctly in those cases.

v1.11.0

gianlucam76 released this 10 Jun 05:57

v1.11.0

daf7b13

🚀 New Features

Health Checks: Metric-Based Validation
ValidateHealth now supports querying a Prometheus-compatible endpoint as an additional data source. Named scalar values are exposed to the Lua evaluate() function via a metrics table, enabling checks such as "error rate below 5%". In push mode the endpoint must be reachable from the management cluster; in pull mode the sveltos-applier agent running inside the managed cluster reaches it directly via in-cluster DNS.
PR: addon-controller #1816

Kubernetes Events for Deployment Failures
The addon-controller now raises Kubernetes Warning events to make failure causes immediately visible via kubectl describe or any event-watching tool. Events are emitted for conflicts, missing referenced resources, template instantiation errors, and when the controller gives up after reaching the maximum consecutive failure count.
PR: addon-controller #1815

Dashboard: Cluster Deployment Health Signals
The cluster list now surfaces deployment state at a glance. An amber alert icon appears when one or more profile deployments are failing; a blue clock icon appears when deployment is actively in progress with no failures. The ui-backend computes these signals as O(1) in-memory indexes updated by the existing ClusterSummary watcher, so no per-request scanning is required.
PRs: dashboard #173, ui-backend #167
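
One way to picture such watcher-maintained indexes is sketched below. The class, method names, and state strings are hypothetical and do not reflect the ui-backend's actual data structures.

```python
class ClusterHealthIndex:
    """Per-cluster sets updated on every watch event, so reads need no scanning."""

    def __init__(self):
        self._failing = {}      # cluster -> names of failing summaries
        self._in_progress = {}  # cluster -> names of summaries still deploying

    def on_summary_update(self, cluster, summary, state):
        """Called by the watcher whenever a summary changes state."""
        for index, wanted in ((self._failing, "failing"),
                              (self._in_progress, "provisioning")):
            members = index.setdefault(cluster, set())
            if state == wanted:
                members.add(summary)
            else:
                members.discard(summary)
            if not members:
                del index[cluster]

    def signal(self, cluster):
        """Dictionary lookups only: failures take precedence over in-progress."""
        if cluster in self._failing:
            return "alert"
        if cluster in self._in_progress:
            return "in-progress"
        return None
```

The cost of keeping the indexes current is paid once per watch event, while each request for the cluster list performs only constant-time membership checks.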

HealthCheck: Surface "No Resources Found" as a Degraded Status
When a HealthCheck's resource selectors matched nothing, the resulting HealthCheckReport was silently empty and sveltosctl show resources showed nothing. The Lua evaluate() function can now detect an empty resources table and return a top-level degraded status (e.g. "No deployments found in namespace metrics"), which is surfaced in the report and in sveltosctl.
PR: sveltos-agent #489

MCP Server: Classifier Pipeline Analysis Tool
A new tool has been added to the Sveltos MCP server to analyze classifier pipelines, making it easier to inspect and reason about classifier configuration via AI-assisted workflows.
PR: mcp-server #55

Shard-Controller: Patch Support
A new --shard-components-config flag accepts the name of a ConfigMap in the Sveltos namespace. The ConfigMap holds one or more patches (JSON6902 or strategic-merge) that are applied to the five Deployments shard-controller creates per shard. Target selectors allow a single ConfigMap to patch only specific components. When the ConfigMap changes, a dedicated reconciler re-deploys all active shards immediately so patches take effect without restarting shard-controller.
PR: shard-controller #216

🐞 Bug Fixes

addon-controller: Logging, Pull-Mode Status Handler, Helm Data Race, and Dependency Manager
Fixed a logging mistake where cluster name was populated with the cluster namespace in two places. Corrected the pull-mode agent status handler, which could reach a nil dereference when the agent returned an error without a status payload. Fixed a data race in the Helm chart manager where a shared map was read without holding the mutex. The dependency manager's background update loop now snapshots work, releases the write lock before making API calls, and re-acquires it only to clear completed entries — unblocking concurrent reconcilers. Its startup rebuild loop now waits 5 seconds between retries instead of spinning at full speed on API errors.
PR: addon-controller #1823
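
The snapshot-then-release locking pattern described for the dependency manager can be sketched like this. It is a generic illustration with hypothetical names (`DependencyUpdater`, `apply_update`), not the addon-controller's code.

```python
import threading


class DependencyUpdater:
    def __init__(self, apply_update):
        self._lock = threading.Lock()
        self._pending = {}
        self._apply_update = apply_update  # stands in for a slow API call

    def enqueue(self, key, value):
        with self._lock:
            self._pending[key] = value

    def process_once(self):
        # 1. Snapshot the work while holding the lock only briefly.
        with self._lock:
            snapshot = dict(self._pending)
        # 2. Make the slow calls with the lock released, so other
        #    callers of enqueue() are not blocked in the meantime.
        done = []
        for key, value in snapshot.items():
            self._apply_update(key, value)
            done.append(key)
        # 3. Re-acquire the lock only to clear completed entries, keeping
        #    any entry that was re-queued with a new value meanwhile.
        with self._lock:
            for key in done:
                if key in self._pending and self._pending[key] == snapshot[key]:
                    del self._pending[key]
        return done
```

Because the lock is not held during step 2, a call to `enqueue` made while updates are in flight completes immediately instead of waiting for the whole batch.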

sveltos-applier: ClassifierReport Retry, Discovery Loop, and Namespace Client Reuse
ClassifierReports were silently marked as delivered even when the push to the management cluster failed, so they were never retried. The applier could also continue with an empty Classifier object on non-NotFound API errors. The discovery retry loop in the REST mapper refresh executed exactly once regardless of the loop variable, causing immediate failure on transient API server slowness. A new Kubernetes client (and with it a new HTTP connection pool) was being created for every single resource that needed a namespace ensured; it is now created once per reconciliation pass.
PR: sveltos-applier #89

sveltos-agent: A Handful of Bugs in the Evaluation Package
Nine bugs in the evaluation package have been corrected.
PR: sveltos-agent #490

🔧 Maintenance

All components advanced to Go v1.26.4
All components advanced to golangci-lint v1.12.1

v1.10.0

gianlucam76 released this 15 May 18:20

v1.10.0

95bb46f

🚀 New Features

OIDC Authentication in Dashboard

Users can now log into the Sveltos dashboard using OpenID Connect (Authorization Code Flow with a public client), as an alternative to manual token authentication.

PR: dashboard #160

🐞 Bug Fixes

EventManager: Ordered Removal of Resources

When stale ClusterProfile resources were being cleaned up by the event-manager, referenced resources could be removed before the ClusterProfile was fully deleted, causing ordering violations. Sveltos now waits for stale ClusterProfiles to be fully deleted before removing their referenced resources.

PR: event-manager #485

addon-controller: Guaranteed Helm Chart Handoff Between Profiles

When a cluster atomically switched from one ClusterProfile to another (both referencing the same Helm chart), a race condition could cause a delete-and-reinstall instead of an in-place upgrade. Sveltos now verifies that every matching profile has had its ClusterSummary fully processed by the chart manager before allowing an uninstall, avoiding unnecessary downtime.

PR: addon-controller #1780

addon-controller: Surface Errors for Missing Non-Optional TemplateResourceRefs

When a non-optional resource referenced in TemplateResourceRefs was missing, no error was reported in the ClusterSummary, making it difficult to diagnose why a profile was not being deployed. The failure message is now surfaced directly in the ClusterSummary status.

PR: addon-controller #1790

shard-controller: Correct Flags for Init Container in Agentless Mode

When running in agentless mode, the agent-in-mgmt-cluster flag was not being set correctly for the addon-controller init container.

v1.9.0

gianlucam76 released this 08 May 13:23

v1.9.0

fae5349

🚀 New Features

Remote URL Support in PolicyRefs

Reference YAML content directly from HTTP/HTTPS endpoints: Previously, PolicyRef was limited to ConfigMaps and Secrets, which imposed an ~1 MB size cap. You can now define a remoteURL field pointing to any HTTP/HTTPS URL. Sveltos fetches and redeploys automatically whenever the content changes, driven by a configurable polling interval (default: 5 minutes).

Optional authentication via secretRef (token, username/password, or CA file) and Go template rendering are fully supported.

PR: addon-controller #1721

preDeployChecks

Gate deployments on cluster readiness: A new preDeployChecks field on ClusterProfile/Profile lets you define conditions that must pass before Sveltos deploys any resource. This provides a built-in operational gate — for example, blocking rollouts until a cluster reaches a healthy state.

PR: addon-controller #1753

Avoid Spurious Helm Upgrades

Stable revision counters after management cluster takeover: When a new management cluster reconciled clusters that already had Helm charts deployed, the absence of stored state caused Sveltos to run helm upgrade on every reconcile even when nothing had changed. Revision counters now remain stable on takeover. Charts with patches: configured and subsequent ContinuousWithDriftDetection reconciliations are intentionally unaffected.

PR: addon-controller #1731

Show Addons: Filter by Helm Charts or Resources

Targeted addon inspection in sveltosctl: sveltosctl show addons gains two new flags — --helm-charts to display only Helm releases and --resources to display only Kubernetes resources. This makes it easier to inspect large deployments without noise from unrelated resource types.

PR: sveltosctl #427

Dashboard DryRun Information

Simulation results in the Sveltos dashboard: The Sveltos dashboard now surfaces DryRun simulation results. Operators can review exactly what changes would be applied to each cluster before committing a profile to active mode — without leaving the dashboard.

🐞 Bug Fixes

Drift Detection and KustomizationRefs

Configuration drift for KustomizationRef-deployed resources was not being detected or repaired. Resources deployed via KustomizationRefs (e.g. through a Flux GitRepository) are now correctly tracked by the drift-detection agent and reconciled when changed out-of-band.

PR: addon-controller #1723

Helm Chart Errors in Pull Mode

Partial ConfigurationBundle instances no longer reach the applier: When processing Helm charts in pull mode, an error mid-flight could cause a partially populated ConfigurationBundle to be committed. The applier would then treat missing resources as deleted, pruning live workloads or deploying broken stacks. Bundle preparation is now atomic: any error discards all partial state before it can be committed.

PR: addon-controller #1725

ClusterPromotion/ClusterProfile Ordering

When ClusterPromotion creates ClusterProfile resources, the order of HelmCharts, KustomizationRefs, and PolicyRefs must be preserved. A bug was causing the order in generated ClusterProfile resources to diverge from the ClusterPromotion definition. This is now fixed.

PR: addon-controller #1736

EventTrigger: Stale Profiles on EventSource Change

When an EventTrigger's referenced EventSource was updated, the ClusterProfile, ConfigMap, and Secret resources created for the previous EventSource were left as stale orphans in the management cluster. These are now correctly cleaned up whenever the EventSource reference changes. ...

v1.8.0

gianlucam76 released this 11 Apr 15:15

v1.8.0

3381607

🚀 New Features

GitOps Friendly Kubeconfig Renewal

Support for In-place Secret Updates: Previously, Sveltos rotated kubeconfigs by creating a new key (re-kubeconfig) and updating the SveltosCluster spec. This caused drift in GitOps tools like ArgoCD or Flux.

You can now define spec.tokenRequestRenewalOption.kubeconfigKeyName. If set to the original key name, Sveltos will overwrite the existing Secret in-place and skip updating the Spec, keeping your live state and Git source-of-truth in sync.

PR: sveltoscluster-manager #361

Enhanced Helm Testing

Native Helm Tests: Introduced RunTests in the Helm configuration. When set to true, Sveltos will automatically run Helm test hooks (helm.sh/hook: test) after successful installs or upgrades.

Failing tests will surface as deployment failures in the ClusterSummary status, providing an automated operational gate.

PR: addon-controller #1687

Flexible Namespace Management

Skip Namespace Creation: To support multi-tenant environments where Sveltos may have restricted RBAC (lacking cluster-wide Namespace permissions), we’ve introduced SkipNamespaceCreation to PolicyRef and KustomizationRef.

When enabled, Sveltos bypasses the check/creation logic and attempts to deploy resources directly into pre-provisioned namespaces.

PR: addon-controller #1685

🐞 Bug Fixes

Helm & Patching Consistency

Patch-Triggered Redeployments: Fixed an issue where modifying Helm patches did not trigger a redeploy because the chart version and values remained unchanged. Sveltos now tracks a hash of the patches to accurately detect changes.

PR: addon-controller #1686
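
Why tracking a patch hash closes this gap can be seen in a short sketch. The record layout (`version`, `values`, `patchesHash`, `patches`) and the choice of SHA-256 over canonical JSON are assumptions for illustration, not the controller's actual representation.

```python
import hashlib
import json


def patches_hash(patches):
    """Digest of the patch list; JSON with sorted keys keeps it deterministic."""
    encoded = json.dumps(patches, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def release_changed(deployed, desired):
    """A release needs redeploying if the version, values, or patches differ."""
    return (
        deployed["version"] != desired["version"]
        or deployed["values"] != desired["values"]
        # Without this third comparison, a patch-only edit goes unnoticed.
        or deployed["patchesHash"] != patches_hash(desired["patches"])
    )
```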

Persistence of Failure Messages: Resolved a sequencing bug where FailureMessage for Helm releases was being overwritten in the API server before it could be persisted.

PR: addon-controller #1693

Template Resolution in Summaries: Fixed a lookup failure where failure messages weren't being correctly mapped when ReleaseName or ReleaseNamespace contained Go templates.

PR: addon-controller #1693

Controller Robustness

Out-of-band Deletions: Fixed a reconciliation deadlock that occurred if a ClusterSummary was deleted manually while the cluster was in an UpdatingClusters state. The controller now recovers gracefully by dropping the cluster from the update list and forcing a requeue.

PR: addon-controller #1704

v1.7.0

gianlucam76 released this 30 Mar 17:06

v1.7.0

ead84ac

✨ Key Highlights

Reconciliation Stability (#1657): Fixed "reconciliation storms" in production environments by implementing a robust NextReconcileTime guard and an in-memory cooldown map to prevent tight loops and high CPU usage.
Discovery Optimization (#1644): Drastically reduced API overhead by caching DiscoveryClient and RESTMapper. Added targeted invalidation to discover new CRDs instantly without restarts.
Granular Helm Debugging (#1663): Added FailureMessage to individual Helm release summaries, making it easier to pinpoint exactly which chart failed and why.
Unified Promotion Logic (#1666): Shared validation and health check logic across both Auto and Manual promotion modes for consistent guardrails.
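
As an illustration of the in-memory cooldown map mentioned under Reconciliation Stability, a generic sketch follows. The `Cooldown` class and its period are hypothetical; the highlight does not describe the actual data structure or timings.

```python
import time


class Cooldown:
    """Refuse to process the same key again until its cooldown has elapsed."""

    def __init__(self, period, clock=time.monotonic):
        self._period = period
        self._clock = clock
        self._next_allowed = {}  # key -> earliest time the key may run again

    def allow(self, key):
        now = self._clock()
        if now < self._next_allowed.get(key, float("-inf")):
            return False  # still cooling down: skip to avoid a tight loop
        self._next_allowed[key] = now + self._period
        return True
```

A reconciler would call `allow` with the cluster key at the top of each pass and requeue for later when it returns false.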

Advanced to:

Helm v4
Updated to Flux v1 OCIRepository/Bucket sources.
Upgraded to Go v1.26.1 and CAPI v1.12.4.

v1.6.1

gianlucam76 released this 15 Mar 10:45

v1.6.1

f02fabc

🐞 Bug Fixes

#1641: Hardened Helm Lifecycle: Resolved issues causing Helm releases to stall or fail during upgrades.

Improved handling of missing sub-charts within the dependency management flow.
Added metadata sanitization to prevent "invalid semantic version" errors during the upgrade process.

✨ Improvements

Intelligent Resource Filtering: Optimized the EventSource and HealthCheck reconcilers. The system now performs a pre-evaluation check on Name and Namespace before triggering a full evaluation. This eliminates redundant processing for resources that don't match your ResourceSelector criteria.
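
The pre-evaluation idea can be sketched as a cheap filter in front of an expensive check. The selector and resource shapes below, and the `full_evaluation` callback, are simplified assumptions rather than the real ResourceSelector type.

```python
def might_match(resource, selector):
    """Cheap check on namespace and name before running the full evaluation."""
    namespace = selector.get("namespace")
    name = selector.get("name")
    if namespace and resource["namespace"] != namespace:
        return False
    if name and resource["name"] != name:
        return False
    return True


def evaluate(resource, selectors, full_evaluation):
    """Run the expensive evaluation only for selectors that pass the pre-check."""
    candidates = [s for s in selectors if might_match(resource, s)]
    if not candidates:
        return False  # nothing could match: skip the full evaluation entirely
    return any(full_evaluation(resource, s) for s in candidates)
```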

v1.6.0

gianlucam76 released this 07 Mar 16:36

v1.6.0

811a297

🚀 Release Notes: Performance & Stability Update

This release focuses heavily on infrastructure efficiency and core stability. We have significantly optimized the resource footprint of our edge components and addressed several critical bugs in the addon-controller.

⚡ Performance Optimizations

We have optimized the resource management for sveltos-agent and drift-detection-manager. These components are now leaner and more efficient, particularly in large-scale environments.

Memory Efficiency: Drastically reduced memory consumption, specifically targeting system admin memory overhead. This ensures a smaller footprint on managed nodes.
CPU Optimization: Refined execution loops to lower CPU cycles during idle and reconciliation phases.

🐞 Bug Fixes

This version resolves several edge-case behaviors and stability issues:

#1635: Clean up Stale ResourceSummaries (Agentless): Fixed an issue in agentless mode where ResourceSummary objects were not being properly cleaned up, leading to stale data in the management cluster.
#1632: Resolve Helm Installation Deadlock: Addressed a critical bug where Helm installations could enter a deadlock state, preventing the deployment from moving forward.
#1630: Fix Drift Detection Upgrade (Agentless): Resolved a failure during the upgrade process of the drift detection mechanism when running in agentless mode.

✨ Improvements

#1625: New FailedClusters Status Field: Surfaced orchestration-level errors (e.g., failure to create/update a ClusterSummary) directly in the ClusterProfile status. This eliminates "blind spots" where users previously had to check controller logs to understand why a profile wasn't progressing.
#1620: Specialized Health Check Error Handling: Introduced a dedicated HealthCheckError type to distinguish between deployment failures and functional health check failures. Added the --health-error-retry-time CLI flag (default: 90s). This allows the controller to back off specifically on health failures without affecting standard reconciliation requeue logic.

v1.5.1

gianlucam76 released this 15 Feb 13:11

v1.5.1

06b3a13

🐞 Bug Fixes

Fix ClusterSummary hash evaluation when running with agents in the management cluster. PR
Honor Kubernetes cluster domain PR
