Karpenter add-on missing INTERRUPTION_QUEUE: NodeTerminationHandler must be disabled, leaving the cluster without AWS interruption events #18016

@nickdallavalentina

/kind bug

1. What kops version are you running? The command kops version, will display
this information.

1.34.1

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.33.7

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops update cluster

5. What happened after the commands executed?

The kOps Karpenter addon does not enable Karpenter's native interruption handling by default.

In addition, if Karpenter and the AWS nodeTerminationHandler are enabled at the same time, kOps fails with:

Error: error replacing cluster: spec.cloudProvider.aws.nodeTerminationHandler: Forbidden: nodeTerminationHandler cannot be used in conjunction with Karpenter

This comes from the kOps validation logic here:
https://github.com/kubernetes/kops/blob/v1.35.0-beta.1/pkg/apis/kops/validation/validation.go#L1889

This check is perfectly reasonable, since AWS guidance is not to run two interruption handlers in the same cluster, as described in the Karpenter best-practices documentation:
https://docs.aws.amazon.com/eks/latest/best-practices/karpenter.html
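To make the mutual-exclusion rule concrete, here is a minimal Go sketch of the kind of check that produces the error above. It is not the kOps source: `ClusterSpec` and `validateInterruptionHandlers` are hypothetical, simplified names, and only the error text is taken from the report.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterSpec is a hypothetical, heavily simplified stand-in for the kOps
// cluster spec. The real types live in pkg/apis/kops.
type ClusterSpec struct {
	KarpenterEnabled              bool
	NodeTerminationHandlerEnabled bool
}

// validateInterruptionHandlers models the behaviour described in this issue:
// the two interruption handlers are treated as mutually exclusive.
func validateInterruptionHandlers(spec ClusterSpec) error {
	if spec.KarpenterEnabled && spec.NodeTerminationHandlerEnabled {
		return errors.New("spec.cloudProvider.aws.nodeTerminationHandler: Forbidden: " +
			"nodeTerminationHandler cannot be used in conjunction with Karpenter")
	}
	return nil
}

func main() {
	// Both handlers enabled: the update is rejected.
	err := validateInterruptionHandlers(ClusterSpec{
		KarpenterEnabled:              true,
		NodeTerminationHandlerEnabled: true,
	})
	fmt.Println(err)
}
```

With only Karpenter enabled the check passes, which is exactly the configuration that ends up with no interruption handler at all.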

The bigger problem is that the current kOps Karpenter addon gives us no way to configure Karpenter's native interruption handling: the required interruption queue parameter (e.g. INTERRUPTION_QUEUE) cannot be passed through the addon configuration. Interruption handling will therefore most likely stay disabled, as this code suggests:

https://github.com/aws/karpenter-provider-aws/blob/v1.9.0/pkg/operator/options/options.go#L55

So we’re stuck:

We can’t enable nodeTerminationHandler (kOps blocks it if Karpenter is enabled), and we can’t configure Karpenter interruption handling through kOps either, which leaves the cluster without any interruption handling.

Suggested change:

The kOps Karpenter addon should provide a way to configure interruption handling for Karpenter. It could either:

allow setting INTERRUPTION_QUEUE, leaving the user to set up the prerequisites for Karpenter's native interruption handling manually, or

provide a higher-level flag (e.g. INTERRUPTION=true) that automatically provisions the SQS queue and EventBridge rules, and updates IRSA/IAM as needed.
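As a rough idea of what the second option would have to provision, the sketch below lists the EventBridge event patterns that, to my understanding, Karpenter's getting-started CloudFormation template forwards to the interruption queue. The type and function names (`eventRule`, `interruptionRules`, `pattern`) and the rule names are hypothetical, and kOps might well choose a different set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventRule describes one EventBridge rule that would target the SQS queue.
// The rule names are illustrative only.
type eventRule struct {
	Name       string
	Source     string
	DetailType string
}

// interruptionRules lists the event types assumed to be relevant for
// Karpenter's native interruption handling.
func interruptionRules() []eventRule {
	return []eventRule{
		{"SpotInterruption", "aws.ec2", "EC2 Spot Instance Interruption Warning"},
		{"Rebalance", "aws.ec2", "EC2 Instance Rebalance Recommendation"},
		{"InstanceStateChange", "aws.ec2", "EC2 Instance State-change Notification"},
		{"ScheduledChange", "aws.health", "AWS Health Event"},
	}
}

// pattern renders the rule as an EventBridge event pattern in JSON.
func (r eventRule) pattern() (string, error) {
	b, err := json.Marshal(map[string][]string{
		"source":      {r.Source},
		"detail-type": {r.DetailType},
	})
	return string(b), err
}

func main() {
	for _, r := range interruptionRules() {
		p, err := r.pattern()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %s\n", r.Name, p)
	}
}
```

Besides these rules, the flag would still need to create the queue itself, a queue policy allowing EventBridge to send messages, and the IAM permissions for Karpenter to read from it.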

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

spec:
  karpenter:
    enabled: true

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

N/A

9. Anything else do we need to know?
N/A
