Skip to content

fix(derive): guarantee unique step IDs per path#111

Open
akesling wants to merge 2 commits into
mainfrom
akesling/derive-unique-step-ids
Open

fix(derive): guarantee unique step IDs per path#111
akesling wants to merge 2 commits into
mainfrom
akesling/derive-unique-step-ids

Conversation

@akesling

@akesling akesling commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Problem

A source can reuse an ID across distinct records. Claude Code, for one, reuses the line uuid on attachment events — so two unrelated events arrive with the same ID. derive_path carried those straight through, emitting two steps that share a step ID within one path.

That violates the path invariant (one step per ID) and breaks any consumer that keys on it. Concretely, pathbase stores steps under UNIQUE (path_id, step_id), so uploading such a document failed with an opaque 409 Conflict: already exists (the path_id is the fresh server UUID — the collision is intra-path, from the duplicate IDs in the document itself). A real session derived 332 duplicate step IDs (148 byte-identical repeats, 184 genuinely-distinct events colliding on a reused uuid).

Fix

derive_path now resolves collisions at generation time, so every consumer gets a valid document:

  • keep the first step for an ID,
  • drop byte-identical repeats (the same record emitted twice),
  • re-ID genuinely-distinct collisions to <id>#<n> so no information is lost.

Parent references resolve to the first occurrence, which always retains the original ID; head is computed after the pass, so it references a surviving step. The original source ID is preserved in event_source_id, so round-trip/export is unaffected.

This lives in the shared derive_path, so it covers every provider — not just Claude.

Verification

  • Unit test duplicate_event_ids_are_resolved_to_unique_step_ids: a byte-identical repeat is dropped, a distinct collision is re-ID'd, all IDs unique, head valid. Full toolpath-convo suite (119 tests) green; clippy/fmt clean.
  • Re-deriving the real Claude session that previously produced 332 duplicate IDs now yields 11462 steps, 0 duplicates, 184 collisions disambiguated, head intact.

Note

The duplicate IDs originate upstream in Claude Code (reused uuids on attachment lines). This makes toolpath robust to that; a complementary pathbase change turns the opaque 409 into a clear validation error for any other malformed document.


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

A source can reuse an id across distinct records -- Claude Code reuses the
line `uuid` on `attachment` events, so two unrelated events arrive with the
same id. derive_path carried those through, emitting two steps that share a
step id within one path. That violates the path invariant and breaks
consumers that key on it (e.g. a store with a UNIQUE (path_id, step_id)
constraint, which rejected such uploads with an opaque conflict).

derive_path now resolves collisions at generation time: keep the first step
for an id, drop byte-identical repeats (the same record emitted twice), and
re-id genuinely-distinct collisions to `<id>#<n>` so no information is lost.
Parent references resolve to the first occurrence, which retains the
original id.

Verified on a real Claude session that previously derived 332 duplicate step
ids (148 exact repeats, 184 distinct collisions): output now has zero
duplicates and head still references a surviving step.
@akesling akesling changed the title fix(derive): guarantee unique step ids per path fix(derive): guarantee unique step IDs per path Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant