feat: soft pod anti-affinity for broker, etcd, and proxy #153

Open
kamir wants to merge 2 commits into KafScale:main from kamir:pr/anti-affinity

@kamir kamir commented Jun 4, 2026

Collaborator

What

Add soft (preferred) pod anti-affinity so replicas of the same component avoid co-locating on one node:

  • operator-managed broker and etcd StatefulSets (pkg/operator)
  • proxy Deployment (chart), plus a topologySpreadConstraints hook in values

All rules are soft (preferredDuringSchedulingIgnoredDuringExecution), so single-node clusters still schedule every replica; switch to requiredDuringSchedulingIgnoredDuringExecution in multi-node production.
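
For reference, a minimal sketch of what the hard (required) variant could look like in a multi-node production overlay. The label key and value are hypothetical; use whatever labels the workload's pods actually carry. Note that the required form takes the pod affinity term directly, with no weight wrapper.

```yaml
# Hard anti-affinity: the scheduler refuses to place two matching pods
# on the same node. Label key/value below are assumptions for illustration.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: proxy   # hypothetical label
        topologyKey: kubernetes.io/hostname
```

With this form, any replica beyond the number of eligible nodes stays Pending, which is why the PR keeps the soft form as the default.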

Why

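Without it, the scheduler can place all replicas of a component on one node, which defeats the purpose of replication: a single node failure takes down every replica. Soft anti-affinity is a common HA default because it spreads replicas where possible without blocking scheduling.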

Test

go build ./pkg/operator/... passes; helm template renders the proxy podAntiAffinity block.

Part of a small series upstreaming deployment-hardening deltas we currently carry.

kamir and others added 2 commits June 4, 2026 19:38
…— BUG-0012

Operator-managed broker and etcd StatefulSets shipped with no anti-affinity,
so all replicas could schedule onto one node and a single node loss took out
the whole quorum. Add a preferred (soft) podAntiAffinity over each
StatefulSet's own pod labels keyed on kubernetes.io/hostname. Soft so the
single-node KIND demo still schedules every replica; on a multi-node cluster
the scheduler spreads them.
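
As an illustration of the shape described above, the pod template of an operator-managed StatefulSet might end up with an affinity block like the following. This is a sketch, not the operator's actual output: the label is hypothetical, and the weight of 100 is an assumption borrowed from the proxy defaults in the second commit.

```yaml
# Soft anti-affinity over the StatefulSet's own pod labels.
# The scheduler prefers nodes that do not already run a matching pod,
# but still schedules the pod if no such node exists.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                      # assumed; valid range is 1-100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: kafscale-broker       # hypothetical pod label
          topologyKey: kubernetes.io/hostname
```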

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hook

PLAN-06 iter-7 E-14 / G-007. The chart-templated proxy is the only
multi-replica workload bp-001 controls through this chart (operator-
managed brokers + etcd are tracked in BUG-0012). Defaults match the
soft-anti-affinity shape we want everywhere:

  * podAntiAffinity / preferredDuringSchedulingIgnoredDuringExecution,
    weight 100, topologyKey kubernetes.io/hostname. Soft so single-node
    KIND clusters still schedule both proxy replicas; switch to
    requiredDuringSchedulingIgnoredDuringExecution in a multi-node
    production overlay.
  * topologySpreadConstraints: [] by default; populate per-cluster
    when multi-zone topology is available.
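
A per-cluster override for a multi-zone cluster could look roughly like this. The `proxy.` values path and the label are assumptions, since the chart's values layout is not shown here; the constraint fields themselves are standard Kubernetes.

```yaml
# Hypothetical values overlay: spread proxy pods evenly across zones.
proxy:
  topologySpreadConstraints:
    - maxSkew: 1                                  # at most 1 pod difference between zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway           # soft, consistent with the anti-affinity default
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: proxy      # hypothetical label
```

Using ScheduleAnyway keeps the constraint soft; DoNotSchedule would make it a hard requirement.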

Templates extended:
  * proxy-deployment.yaml: added topologySpreadConstraints hook
    next to the existing affinity hook (both gated by `with`).
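
A sketch of how two `with`-gated hooks typically sit side by side in a Deployment template. The `.Values.proxy.*` paths and the indentation depth are assumptions, not the chart's actual contents.

```yaml
# Fragment of a Deployment pod spec (hypothetical values paths).
# `with` skips the whole block when the value is empty, so the default
# `topologySpreadConstraints: []` renders nothing.
    spec:
      {{- with .Values.proxy.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.proxy.topologySpreadConstraints }}
      topologySpreadConstraints:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```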

Chart bump 0.4.1 -> 0.4.2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>