[PRODENG-3342] Validate --pod-cidr does not overlap Swarm overlay address pool#636
[PRODENG-3342] Validate --pod-cidr does not overlap Swarm overlay address pool#636james-nesbitt wants to merge 2 commits into
Conversation
Configs with --pod-cidr that overlaps the Swarm overlay address pool (default 10.0.0.0/8) cause the Docker daemon to restart into a broken network state during MKE bootstrap. The SSH session rides that network, so the connection drops silently and launchpad reports a timeout after 20+ minutes of install. Adds ValidateFacts.validatePodCIDR() which: - Parses --pod-cidr from mke.installFlags - Uses the Swarm default pool (10.0.0.0/8) unless mcr.swarmInstallFlags contains --default-addr-pool, in which case that value is used instead - Fails immediately with a clear, actionable error if the CIDRs overlap Tests cover: overlap with default pool, no overlap, absent flag, custom pool no overlap, custom pool overlap. PRODENG-3342
f11af3b to
be103ef
Compare
| swarmPoolStr := p.Config.Spec.MCR.SwarmInstallFlags.GetValue("--default-addr-pool") | ||
| if swarmPoolStr == "" { | ||
| swarmPoolStr = swarmDefaultAddrPool | ||
| } | ||
|
|
||
| _, podNet, err := net.ParseCIDR(podCIDRStr) | ||
| if err != nil { | ||
| return fmt.Errorf("%w: cannot parse --pod-cidr %q: %w", errInvalidPodCIDR, podCIDRStr, err) | ||
| } | ||
|
|
||
| _, swarmNet, err := net.ParseCIDR(swarmPoolStr) | ||
| if err != nil { | ||
| return fmt.Errorf("%w: cannot parse Swarm address pool %q: %w", errInvalidPodCIDR, swarmPoolStr, err) | ||
| } | ||
|
|
||
| if swarmNet.Contains(podNet.IP) || podNet.Contains(swarmNet.IP) { | ||
| return fmt.Errorf( | ||
| "%w: --pod-cidr %s overlaps with the Swarm overlay address pool %s; "+ | ||
| "choose a non-overlapping range or set mcr.swarmInstallFlags --default-addr-pool to a non-conflicting pool", | ||
| errInvalidPodCIDR, podCIDRStr, swarmPoolStr, | ||
| ) | ||
| } | ||
|
|
||
| log.Debugf("pod CIDR %s does not overlap with Swarm pool %s", podCIDRStr, swarmPoolStr) | ||
| return nil |
There was a problem hiding this comment.
Fixed in bacd3ef. Added Flags.GetValues to collect every --default-addr-pool value, and validatePodCIDR now loops over all configured pools (falling back to the compiled-in 10.0.0.0/8 only when none are set), so an overlap with any pool fails validation.
Written by AI: claude-opus-4-8
| func TestValidatePodCIDRCustomSwarmPoolOverlap(t *testing.T) { | ||
| // Custom pool 192.168.0.0/16 overlaps with pod-cidr 192.168.1.0/24. | ||
| p := makePhaseWithPodCIDR("192.168.1.0/24", "192.168.0.0/16") | ||
| err := p.validatePodCIDR() | ||
| require.Error(t, err) |
There was a problem hiding this comment.
Done in bacd3ef. Added TestValidatePodCIDROverlapsLaterSwarmPool (pod-cidr overlaps only the second of two pools, so it fails only if every pool is checked) and TestValidatePodCIDRMultipleSwarmPoolsNoOverlap. Also added TestFlags_GetValues covering repeated flags in both separator forms and the --default-addr-pool / --default-addr-pool-mask-length prefix collision.
Written by AI: claude-opus-4-8
docker swarm init accepts --default-addr-pool repeated, but validatePodCIDR only inspected the first pool via GetValue, so a --pod-cidr overlapping a later pool passed validation. Add Flags.GetValues to collect every matching value and loop over all configured pools, parsing --pod-cidr once up front. Addresses Copilot review feedback on #636.
What
Add early validation that rejects configs where
--pod-cidrinmke.installFlagsoverlaps with the Swarm overlay address pool.Why
A
--pod-cidrthat overlaps10.0.0.0/8(Swarm's default overlay pool) causes the Docker daemon to restart into a broken network state during MKE bootstrap. The SSH session rides that network, so the connection drops silently and launchpad reports a 20-minute timeout with no useful diagnosis.How
validatePodCIDR()toValidateFactsphase — parses--pod-cidrfrommke.installFlagsand checks for CIDR overlap with the Swarm poolmcr.swarmInstallFlagscontains--default-addr-pool, that value is used as the Swarm pool instead of the compiled-in default (10.0.0.0/8)Testing
make unit-testpasses (24/24 packages)Links
Checklist