fix(derive): guarantee unique step IDs per path #111
Open
akesling wants to merge 2 commits into
A source can reuse an ID across distinct records: Claude Code reuses the line `uuid` on `attachment` events, so two unrelated events arrive with the same ID. `derive_path` carried those through, emitting two steps that share a step ID within one path. That violates the path invariant and breaks consumers that key on it (e.g. a store with a `UNIQUE (path_id, step_id)` constraint, which rejected such uploads with an opaque conflict). `derive_path` now resolves collisions at generation time: it keeps the first step for an ID, drops byte-identical repeats (the same record emitted twice), and re-IDs genuinely distinct collisions to `<id>#<n>` so no information is lost. Parent references resolve to the first occurrence, which retains the original ID. Verified on a real Claude session that previously derived 332 duplicate step IDs (148 exact repeats, 184 distinct collisions): the output now has zero duplicates and `head` still references a surviving step.
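The collision pass described above can be sketched as a single ordered scan. This is a minimal illustration, not the code in this PR: the `Step` struct, its `payload` field, and the `dedupe_step_ids` function are hypothetical stand-ins (only `event_source_id` and the `<id>#<n>` scheme come from the description), and whole-struct equality stands in for "byte-identical". Parent fields are never rewritten, so a reference to `x` lands on the first kept `x`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified step shape; the real toolpath types are not shown in the PR.
#[derive(Clone, Debug, PartialEq)]
struct Step {
    id: String,
    parent: Option<String>,
    event_source_id: String, // original source ID, never rewritten
    payload: String,
}

/// Sketch of the collision pass: keep the first step per ID, drop exact
/// repeats, and re-ID distinct collisions to `<id>#<n>`.
fn dedupe_step_ids(steps: Vec<Step>) -> Vec<Step> {
    // Raw (pre-re-ID) steps already kept, grouped by their incoming ID.
    let mut kept_by_id: HashMap<String, Vec<Step>> = HashMap::new();
    // Every ID emitted so far, including generated `<id>#<n>` ones.
    let mut used: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(steps.len());

    for step in steps {
        let variants = kept_by_id.entry(step.id.clone()).or_default();
        // Exact repeat of a record we already kept under this ID: drop it.
        if variants.contains(&step) {
            continue;
        }
        let mut emitted = step.clone();
        if used.contains(&step.id) {
            // Distinct collision: take the first free `<id>#<n>`, n >= 1.
            // In this sketch a raw ID that clashes with a generated one is re-ID'd too.
            let mut n = 1;
            while used.contains(&format!("{}#{}", step.id, n)) {
                n += 1;
            }
            emitted.id = format!("{}#{}", step.id, n);
        }
        variants.push(step);
        used.insert(emitted.id.clone());
        out.push(emitted);
    }
    out
}
```

Run on `[a, a (exact copy), a (different payload), c (parent a)]`, this sketch yields IDs `a`, `a#1`, `c`, with `c` still pointing at the original `a`. Whatever computes `head` would then run on this output, so it can only name a surviving step.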
Problem
A source can reuse an ID across distinct records. Claude Code, for one, reuses the line `uuid` on `attachment` events, so two unrelated events arrive with the same ID. `derive_path` carried those straight through, emitting two steps that share a step ID within one path.

That violates the path invariant (one step per ID) and breaks any consumer that keys on it. Concretely, pathbase stores steps under `UNIQUE (path_id, step_id)`, so uploading such a document failed with an opaque `409 Conflict: already exists`. The `path_id` is the fresh server UUID, so the collision is intra-path: it comes from the duplicate IDs in the document itself. A real session derived 332 duplicate step IDs (148 byte-identical repeats, 184 genuinely distinct events colliding on a reused `uuid`).

Fix
`derive_path` now resolves collisions at generation time, so every consumer gets a valid document:

- keep the first step for an ID;
- drop byte-identical repeats (the same record emitted twice);
- re-ID genuinely distinct collisions to `<id>#<n>` so no information is lost.

Parent references resolve to the first occurrence, which always retains the original ID; `head` is computed after the pass, so it references a surviving step. The original source ID is preserved in `event_source_id`, so round-trip/export is unaffected.

This lives in the shared `derive_path`, so it covers every provider, not just Claude.

Verification
`duplicate_event_ids_are_resolved_to_unique_step_ids`: a byte-identical repeat is dropped, a distinct collision is re-ID'd, all IDs are unique, and `head` is valid. Full `toolpath-convo` suite (119 tests) green; `clippy`/`fmt` clean.

Note
The duplicate IDs originate upstream in Claude Code (reused `uuid`s on attachment lines). This change makes toolpath robust to that; a complementary pathbase change turns the opaque 409 into a clear validation error for any other malformed document.
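The pathbase side is not part of this PR, but the kind of pre-insert check that replaces an opaque 409 with a readable error can be sketched. This is a hypothetical illustration written in Rust for consistency (pathbase's actual language, function names, and error format are not shown here): `validate_unique_step_ids` counts each step ID and reports every duplicate with its multiplicity, instead of letting the `UNIQUE (path_id, step_id)` constraint fail on the first one. It only covers duplicate IDs, not other kinds of malformed documents.

```rust
use std::collections::HashMap;

/// Hypothetical pre-insert validation: reject a path whose step IDs are not
/// unique, naming each offending ID and how many times it appears.
fn validate_unique_step_ids<'a>(ids: impl IntoIterator<Item = &'a str>) -> Result<(), String> {
    // Count occurrences of every step ID in the document.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for id in ids {
        *counts.entry(id).or_insert(0) += 1;
    }
    // Keep only IDs seen more than once; sort for a stable message.
    let mut dupes: Vec<(&str, usize)> = counts.into_iter().filter(|&(_, c)| c > 1).collect();
    if dupes.is_empty() {
        return Ok(());
    }
    dupes.sort();
    let detail: Vec<String> = dupes.iter().map(|(id, c)| format!("{id} (x{c})")).collect();
    Err(format!(
        "path contains {} duplicate step id(s): {}",
        dupes.len(),
        detail.join(", ")
    ))
}
```

For example, IDs `a, b, a, c, b, a` would be rejected with `path contains 2 duplicate step id(s): a (x3), b (x2)`, which points straight at the document rather than at the database constraint.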