feat(cf): cross-env reconcile plan, dry-run (POST /admin/cf/reconcile)#284

Open
posix4e wants to merge 2 commits into main from feat/cf-reconcile-plan

posix4e commented Jun 3, 2026

PR-5 of the CF-reconcile arc — the read-only plan. Operator-gated apply lands in the next PR.

What

POST /admin/cf/reconcile computes, read-only, what a reconcile would do across the whole CF map, env-labelled:

  • adopt: live (status==healthy) CF agent tunnels in the serving env the CP store is missing → fill-only rebuild from CF (recovery).
  • prune: a serving-env agent tunnel that's unclaimed AND not healthy; an unexpected serving-env CNAME; every resource of an env with no live control plane (closed PR); and the whole (unattributed) leak bucket.
  • refill: hostnames the serving CP expects but CF has no CNAME for.
  • A live foreign env (another CP's — store not held here) is left untouched with a note; a degraded map yields an empty plan + refusal note.
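
The tunnel side of the buckets above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Tunnel`, `EnvKind`, `Action`, `classify_tunnel`, and `plan_tunnels` are hypothetical names, and treating a claimed tunnel as `Keep` regardless of health is an assumption drawn from the "unclaimed AND not healthy" wording.

```rust
/// Hypothetical, simplified view of one CF agent tunnel (not the PR's CfTunnel).
#[derive(Debug, Clone)]
pub struct Tunnel {
    pub id: String,
    /// True when the tunnel's status is "healthy".
    pub healthy: bool,
    /// True when the serving CP store already holds this tunnel.
    pub claimed: bool,
}

/// Hypothetical env labels matching the cases listed in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnvKind {
    Serving,
    LiveForeign,
    NoLiveControlPlane,
    Unattributed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Keep,
    Adopt,
    Prune,
    LeaveUntouched,
}

/// Decide what the plan would say about a single tunnel.
pub fn classify_tunnel(env: EnvKind, t: &Tunnel) -> Action {
    match env {
        // Another CP's env: its store is not held here, so do nothing.
        EnvKind::LiveForeign => Action::LeaveUntouched,
        // Closed-PR envs and the unattributed leak bucket are pruned whole.
        EnvKind::NoLiveControlPlane | EnvKind::Unattributed => Action::Prune,
        EnvKind::Serving => match (t.claimed, t.healthy) {
            (true, _) => Action::Keep,       // assumption: claimed tunnels are never pruned
            (false, true) => Action::Adopt,  // live but missing from the CP store
            (false, false) => Action::Prune, // unclaimed AND not healthy
        },
    }
}

/// A degraded map yields an empty plan; otherwise classify every tunnel.
pub fn plan_tunnels(map_degraded: bool, tunnels: &[(EnvKind, Tunnel)]) -> Vec<(String, Action)> {
    if map_degraded {
        return Vec::new();
    }
    tunnels
        .iter()
        .map(|(env, t)| (t.id.clone(), classify_tunnel(*env, t)))
        .collect()
}
```

Keeping the per-tunnel decision a pure function of env kind, health, and claim state makes the plan easy to unit-test without touching CF.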

Adds status/created_at to CfTunnel (populated in both the per-env snapshot and the map) so adopt-vs-prune can distinguish a live agent from a dead/leaked tunnel; exposes build_cp_state for reuse.

Dry-run ONLY: ?apply=true is acknowledged (apply_requested echoed) but performs no mutations. Same auth as the other /admin/cf/* surfaces.
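
As a rough sketch of the dry-run contract, the handler can echo the flag while hard-coding the non-mutating fields. The field names `dry_run`, `applied`, and `apply_requested` come from this description; the struct, the helper functions, and the hand-rolled query parsing are assumptions for illustration only.

```rust
/// Hypothetical response envelope; only the three field names come from the PR text.
#[derive(Debug, PartialEq, Eq)]
pub struct ReconcileResponse {
    pub dry_run: bool,
    pub applied: bool,
    pub apply_requested: bool,
}

/// Returns true when the raw query string contains `apply=true`.
/// Deliberately naive parsing, for illustration only.
pub fn parse_apply_flag(query: &str) -> bool {
    query
        .trim_start_matches('?')
        .split('&')
        .any(|kv| kv == "apply=true")
}

/// In this PR the endpoint never mutates: the flag is echoed and nothing else changes.
pub fn respond(query: &str) -> ReconcileResponse {
    ReconcileResponse {
        dry_run: true,
        applied: false,
        apply_requested: parse_apply_flag(query),
    }
}
```

Echoing `apply_requested` lets a caller confirm the flag was parsed before the guarded apply path exists.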

Validation

  • cargo fmt clean; compiles locally (macOS sessiond.rs noise only; CI builds musl).
  • CI build + preview deploy green; POST /admin/cf/reconcile on the preview returns dry_run:true, applied:false and a plan whose prune bucket lists the real (unattributed) leaks (the ~121 stale CNAMEs the map surfaced) — with zero CF mutations.

Next: PR-6 adds ?apply=true with the guards (skip in-flight-deploy env, TTL, zero-conn), fill-only adopt, audit log.

🤖 Generated with Claude Code

Computes — read-only — what a reconcile WOULD do across the whole CF map,
env-labelled, in three buckets:
- adopt: live (healthy) CF agent tunnels in the serving env the CP store
  is missing → fill-only rebuild from CF.
- prune: a serving-env agent tunnel that's unclaimed AND not healthy, an
  unexpected serving-env CNAME, every resource of an env with no live
  control plane (closed PR), and the whole (unattributed) leak bucket.
- refill: hostnames the serving CP expects but CF has no CNAME for.
A live foreign env (another CP's, store not held here) is left untouched
with a note. A degraded map yields an empty plan + refusal note.

Adds tunnel `status`/`created_at` to CfTunnel (populated in both the
per-env snapshot and the map) so adopt-vs-prune can tell a live agent
from a dead/leaked tunnel; exposes `build_cp_state` for reuse. The
endpoint is dry-run ONLY — `?apply=true` is acknowledged but performs no
mutations (the guarded, operator-gated apply lands next). Same auth as
the other /admin/cf/* surfaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 3, 2026

DD preview ready

URL: https://pr-284.devopsdefender.com

Browser login: visit https://pr-284.devopsdefender.com — DD redirects you to
the GitHub App auth broker. A DD session cookie scoped
to .devopsdefender.com lets the preview, fleet, and
shell hosts share the same login.

Machine-to-machine: GitHub Actions workflows in the
DD_OWNER org pass their per-job OIDC JWT as
Authorization: Bearer … (audience dd-agent).

Register endpoint for a local agent: https://pr-284.devopsdefender.com/register
(authenticated by ITA attestation).

The dry-run plan flagged 3 live pr-N agent CNAMEs (its -api-/oracle/shell
vanity hosts) for prune: 'expected hostnames' was derived from the CP
store's extras, but the CP creates more CNAMEs than it records there
(and agent-api uses a different name format), so live records looked
orphaned — a delete-a-healthy-agent bug at apply time.

Rewrite as two passes: decide tunnel actions first (recording pruned
tunnel ids), then prune a CNAME only if its target tunnel is gone
(unattributed) or is itself being pruned. A CNAME pointing at a live/kept
tunnel is always kept, regardless of whether we can re-derive its name.
Refill is limited to the reliably-known primary agent hostname.
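
The second pass described in this commit can be sketched as below, under stated assumptions: `Cname` and `cnames_to_prune` are hypothetical names, and the first pass is assumed to have already produced the set of pruned tunnel ids. A CNAME is pruned only when its target tunnel no longer exists or is itself being pruned.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified CNAME record: a hostname and the tunnel id it points at.
pub struct Cname {
    pub name: String,
    pub target_tunnel_id: String,
}

/// Pass 2: given the tunnels that exist in CF and the ids pass 1 decided to prune,
/// return the names of the CNAMEs that should be pruned.
pub fn cnames_to_prune<'a>(
    cnames: &'a [Cname],
    existing_tunnels: &HashSet<String>,
    pruned_tunnels: &HashSet<String>,
) -> Vec<&'a str> {
    cnames
        .iter()
        .filter(|c| {
            let target = c.target_tunnel_id.as_str();
            // Target is gone (unattributed) or is being pruned in pass 1.
            !existing_tunnels.contains(target) || pruned_tunnels.contains(target)
        })
        .map(|c| c.name.as_str())
        .collect()
}
```

Because the decision depends only on the target tunnel's fate, a CNAME pointing at a kept tunnel survives even when its name cannot be re-derived from the CP store.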