/kind bug
1. What kops version are you running? The command kops version, will display
this information.
1.34.1
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.33.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops update cluster
5. What happened after the commands executed?
The kOps Karpenter addon does not enable Karpenter’s native interruption handling by default.
On top of that, if Karpenter and the AWS nodeTerminationHandler are enabled at the same time, kOps fails with:
Error: error replacing cluster: spec.cloudProvider.aws.nodeTerminationHandler: Forbidden: nodeTerminationHandler cannot be used in conjunction with Karpenter
This comes from the kOps validation logic here:
https://github.com/kubernetes/kops/blob/v1.35.0-beta.1/pkg/apis/kops/validation/validation.go#L1889
That check is perfectly reasonable: AWS guidance is not to run two interruption handlers in the same cluster, as described in the Karpenter best-practices documentation:
https://docs.aws.amazon.com/eks/latest/best-practices/karpenter.html
The bigger problem is that the current kOps Karpenter addon gives us no way to configure Karpenter’s native interruption handling: the required interruption queue parameter (e.g. INTERRUPTION_QUEUE) cannot be passed through the addon configuration, so native interruption handling potentially remains disabled, as per this code:
https://github.com/aws/karpenter-provider-aws/blob/v1.9.0/pkg/operator/options/options.go#L55
So we’re stuck:
We can’t enable nodeTerminationHandler (kOps blocks it if Karpenter is enabled), and we can’t configure Karpenter interruption handling through kOps either, which leaves the cluster without any interruption handling.
Suggested change:
The kOps Karpenter addon should provide a way to configure interruption handling for Karpenter. Either:
allow setting INTERRUPTION_QUEUE, leaving it to us to manually set up the requirements for Karpenter’s native interruption handling, or
provide a higher-level flag (e.g. INTERRUPTION=true) that automatically provisions the SQS queue and EventBridge rules, and updates IRSA/IAM as needed.
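To illustrate the two options, here is a rough sketch of how they could appear in the cluster spec. This is only an assumption about a possible shape: neither `interruptionQueue` nor `interruption` exists in kOps today, both field names and the queue name are made up for illustration, and only `spec.karpenter.enabled` is taken from my actual manifest.

```yaml
# Option 1 (hypothetical field): point the addon at a queue we provision ourselves.
spec:
  karpenter:
    enabled: true
    # Hypothetical: would be handed to the Karpenter controller as INTERRUPTION_QUEUE.
    interruptionQueue: my-cluster-karpenter-interruption
---
# Option 2 (hypothetical field): let kOps take care of everything.
spec:
  karpenter:
    enabled: true
    # Hypothetical: kOps would provision the SQS queue and EventBridge rules
    # and update IRSA/IAM as needed.
    interruption: true
```

Either shape would be fine from a user's point of view; the first is presumably the smaller change, since kOps would only need to pass one value through to the controller.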
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
spec:
  karpenter:
    enabled: true
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
N/A
9. Anything else do we need to know?
N/A