Cross-harness context-compaction provenance (kind v1.2.0) by benbaarber · Pull Request #108 · empathic/toolpath

benbaarber · 2026-06-23T16:40:11Z

Records context-compaction as first-class provenance: when an agent summarizes and drops earlier context, we capture the boundary, what was kept, and why — instead of silently losing it.

What changed

Unified IR: ConversationView.turns/events collapse into one ordered items: Vec<Item> stream (turns()/events()/compactions() accessors). New Compaction/CompactionTrigger types.
New step type conversation.compact, placed between the turns it separates so the head-ancestry walk crosses it in order.
Kind → v1.2.0 (agent-coding-session): extends main's v1.1.0 (token usage) with conversation.compact; v1.0.0/v1.1.0 schemas retained.
derive_path resolves duplicate step ids as it emits steps — a byte-identical re-emission is dropped, a same-id-but-different step is re-IDed to <id>#<n> — so it stays infallible (returns Path) and always yields a collision-free path. The per-provider derive::derive_path/derive_project wrappers shed Result too. Subsumes fix(derive): guarantee unique step IDs per path #111.

Per-harness coverage

Harness	Compaction	Notes
Claude, Codex, opencode, pi	✅ full round-trip	read marker → IR → reproject to disk
Gemini, Cursor	⛔ by design	format records none (Gemini) / summary+kept live server-side and are unrecoverable from local data (Cursor) — boundary marker skipped on read

Claude's pre-boundary re-emission (the replay block it re-logs before a boundary) is stripped on read and folded into the boundary's kept set. On projection the replay block is not re-emitted: a resume test confirmed it's dead weight, since Claude rebuilds context from the summary plus post-boundary turns. Instead, kept rides in compactMetadata.preservedMessages, so re-reading reconstructs the same boundary.

Tested

42 compaction-named tests, incl. dedicated round-trip suites (claude/codex/opencode/pi) over real on-disk fixtures.
Cross-harness matrix proves A→IR→B→IR→A compaction survival; also fixes token-usage double-counting the matrix exposed (per-group_id canonicalization across the whole sequence; Gemini split-message grouping; Codex group-once accounting).
clippy --all-targets clean; full suite green.

Rebased onto main's token-usage work (#106). Format references under docs/agents/formats/ updated.


…fixtures
Document how each agent harness records context compaction, correcting claims that were based on synthetic/outdated fixtures (verified against source and freshly captured real sessions):
- claude-code: full compactMetadata shape (preservedSegment/preservedMessages, postTokens, durationMs) + the duplicate-UUID re-emission known issue
- codex: real `compacted` payload is {message, replacement_history}, not the synthetic {trigger, preTokens, summary}; trigger is analytics-only and never persisted
- gemini: compresses in-memory but persists nothing (was "no compaction")
- pi: Compaction entry fields; fromHook is extension-vs-default, not auto-vs-manual
- opencode: stays one session, contiguous tail_start_id, no id reuse
- README: cross-harness comparison; manual == auto record everywhere, only the trigger's visibility differs
Capture script: add a compaction second pass per harness (claude `/compact`, codex tiny `model_context_window`, pi raised `reserveTokens`, opencode `summarize` route) writing convo-compacted.*, plus auto-delete of scratch sessions after capture (KEEP_SESSIONS=1 to opt out, SKIP_COMPACTION=1 to skip). Includes the captured fixtures.

A conversation can carry the same turn id twice — Claude re-emits a block of earlier messages with their original uuids just before a compaction boundary — which produced duplicate step.id values and broke any store with a (path, step_id) primary key (e.g. Pathbase's bulk COPY ingest). derive_path now drops later duplicates, keeping the first occurrence: it carries the true parent lineage, while replayed copies are re-parented into a synthetic linear chain. Parent/head references by id resolve to the kept step, so no remapping is needed. Adds a regression test.

Replace the separate `turns: Vec<Turn>` and `events: Vec<ConversationEvent>` fields with a single ordered `items: Vec<Item>`, where `Item = Turn | Event | Compaction`. Holding the conversation as one ordered stream lets derive_path <-> extract_conversation round-trip losslessly and gives compaction boundaries (populated in a later phase) a true position. Adds the Compaction / KeptRange / CompactionTrigger types (defined, not yet emitted). Reads go through new turns()/events()/compactions() iterator accessors; turns_since now returns Vec<&Turn>. All five providers' to_view build `items` (all turns, then events — preserving the prior layout); gemini and codex disambiguate reused turn ids so the keep-first uniqueness pass doesn't silently drop turns.
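The single ordered stream described above can be pictured roughly as follows. This is a minimal sketch, not the toolpath-convo source: the field sets on `Turn`, `Event`, and `Compaction` are pared-down placeholders, and only the names `Item`, `ConversationView.items`, and the `turns()`/`events()`/`compactions()` accessors come from the commit message.

```rust
// Simplified stand-ins; the real types carry many more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub id: String,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Compaction {
    pub summary: Option<String>,
}

/// One entry in the ordered conversation stream.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Turn(Turn),
    Event(Event),
    Compaction(Compaction),
}

#[derive(Debug, Default)]
pub struct ConversationView {
    pub items: Vec<Item>,
}

impl ConversationView {
    /// Turns only, in stream order.
    pub fn turns(&self) -> impl Iterator<Item = &Turn> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Turn(turn) => Some(turn),
            _ => None,
        })
    }

    /// Events only, in stream order.
    pub fn events(&self) -> impl Iterator<Item = &Event> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Event(event) => Some(event),
            _ => None,
        })
    }

    /// Compaction boundaries only, in stream order.
    pub fn compactions(&self) -> impl Iterator<Item = &Compaction> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Compaction(compaction) => Some(compaction),
            _ => None,
        })
    }
}
```

Because the accessors are filters over one vector, the relative order of turns, events, and boundaries is stored exactly once, which is what gives a boundary a position between two turns.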

derive_path now makes a single ordered pass over ConversationView.items, emitting a `conversation.compact` step for each Item::Compaction at its true position between the turns it separates. The step carries trigger / summary / pre_tokens / kept (each only when present) and resolves its parent through the turn map; later turns that reference the boundary rewire onto it, so the DAG threads through the compaction. extract_conversation reconstructs the Compaction from the step. Synthetic turn/event step ids are byte-identical to the old two-loop layout (per-variant counters), so every provider round-trip passes unchanged. Providers don't emit compactions yet (next phase); this wires the core derive/extract path with unit + round-trip tests.
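A rough sketch of that single ordered pass, under strong simplifying assumptions: the conversation is treated as a linear chain (the real code resolves parents through a turn map), `derive_steps` and `Step` are hypothetical names, the `compact-<n>` id scheme is invented for illustration, and `conversation.turn` is a placeholder step type. Only `conversation.compact` is taken from the commit message.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Turn { id: String },
    Compaction { summary: Option<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub id: String,
    pub step_type: &'static str,
    pub parent: Option<String>,
}

/// Walk the items once, in order. Each item becomes a step whose parent is
/// the previously emitted step, so a compaction sits between the turns it
/// separates and later turns hang off the boundary.
pub fn derive_steps(items: &[Item]) -> Vec<Step> {
    let mut steps = Vec::with_capacity(items.len());
    let mut head: Option<String> = None;
    let mut compactions = 0usize;

    for item in items {
        let (id, step_type) = match item {
            // Placeholder step type name for ordinary turns.
            Item::Turn { id } => (id.clone(), "conversation.turn"),
            Item::Compaction { .. } => {
                compactions += 1;
                // Invented synthetic id scheme, for illustration only.
                (format!("compact-{compactions}"), "conversation.compact")
            }
        };
        steps.push(Step {
            id: id.clone(),
            step_type,
            parent: head.take(),
        });
        head = Some(id);
    }
    steps
}
```

With items `[turn a, compaction, turn b]`, turn `b` ends up parented on `compact-1`, which in turn is parented on `a`: the chain threads through the boundary rather than around it.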

Each provider's view builder now detects its compaction marker and emits an Item::Compaction at its true position in the stream, mapped per docs/agents/formats:
- claude: compact_boundary -> Compaction (trigger from compactMetadata, pre_tokens=preTokens, kept from preservedSegment head/tail); the isCompactSummary entry is folded into summary, not emitted as a turn.
- codex: `compacted` rollout item -> Compaction (summary=payload.message, trigger/pre_tokens=None, kept empty); synthetic fixture updated to the real {message, replacement_history} payload shape.
- opencode: `compaction` part -> Compaction (auto bool -> trigger, tail_start_id -> kept range); fixed tail_start_id to deserialize the camelCase `tailStartID` wire key.
- pi: Compaction entry -> Item::Compaction (summary, pre_tokens=tokensBefore, trigger=None since fromHook is extension-vs-default, not auto-vs-manual) replacing the old synthetic System turn.
Per-provider round-trip tests assert the mapping against the real test-fixtures/<harness>/convo-compacted.* captures and that derive -> extract preserves each Compaction. gemini unchanged (no compaction on disk).

Bump PATH_KIND_AGENT_CODING_SESSION to .../v1.1.0 and ship the new kind spec + bundled schema documenting the conversation.compact step type (optional trigger / summary / pre_tokens / kept). v1.0.0 stays registered and documented for backward compatibility; the base schema treats meta.kind as a free-form URI, so paths tagged either version validate. Updates the path-cli kind-schema registry, the site kind pages + registry index, and the RFC / CLAUDE.md kind references.

Minor bumps for the crates touched by the items/compaction work, with workspace deps and site/_data/crates.json kept consistent, plus a CHANGELOG entry: toolpath 0.6.0->0.7.0, toolpath-convo 0.10.0->0.11.0, toolpath-claude 0.11.0->0.12.0, toolpath-codex 0.5.0->0.6.0, toolpath-gemini 0.5.0->0.6.0, toolpath-opencode 0.4.0->0.5.0, toolpath-pi 0.5.0->0.6.0, path-cli 0.13.0->0.14.0

The provider projectors (view -> harness on-disk format, used by `path resume` / `path export`) walked turns() and dropped compaction boundaries. They now iterate `view.items` and reconstruct each harness's marker at its true position -- the inverse of the forward mapping:
- claude: compact_boundary entry (+ isCompactSummary summary) with reconstructed compactMetadata (trigger / preTokens / preservedSegment).
- codex: `compacted` rollout line (payload.message = summary).
- opencode: `compaction` part (auto from trigger, tailStartID from kept) plus the synthetic summary message when present.
- pi: Entry::Compaction (summary, tokensBefore, firstKeptEntryId from kept), replacing the now-dead turn-extra reconstruction path.
Each is verified by a projection round-trip against the real convo-compacted fixtures (view -> project -> re-read -> same Item::Compaction). Forward derive/extract and reverse projection are now both lossless for compaction.

…them Replaces the keep-first dedup (7f05b83): silently truncating a path's steps is surprising, undefined behavior nobody expects. derive_path now returns Result<Path> and fails with ConvoError::DuplicateStepId when two steps would share an id; path-cli surfaces it as a clean error rather than producing a quietly-wrong (empty/truncated) upload. Producing unique ids is the provider's job: gemini and codex already disambiguate their format-reused ids, so they derive fine. A Claude compaction replay (which re-emits earlier messages with their original uuids) now errors loudly -- e.g. `path share` of such a session prints "duplicate step id <uuid>: ..." instead of uploading an empty path. Cascades the fallible signature through the five provider derive wrappers and all path-cli call sites (errors propagate via `?` to anyhow). Also runs rustfmt across the items/compaction work.

…fidelity
Reworks compaction so it survives a round-trip through toolpath into a different harness and renders natively there. The portable payload is the summary plus the KEPT SET -- which prior turns survive verbatim into the post-compaction window.
- Compaction.kept is now Vec<String> (surviving turn-ids), replacing the contiguous KeptRange (deleted): Claude's kept set is non-contiguous -- a preserved tail PLUS a scattered set of pinned tool results.
- Claude forward strips the re-emitted replay block (duplicate-uuid entries before the boundary; Claude records no marker for them, so we detect duplicates) and records kept = preservedMessages ∪ replayed. The real a813677e session now derives cleanly (2805 steps, 0 dup ids).
- Each projector renders the kept set in its own form: Claude re-emits the kept turns on-chain before the boundary; opencode/Pi anchor a tail at the earliest kept id; Codex keeps none (wholesale).
Verified end-to-end: a Claude compaction projected into Codex emits a native `compacted` rollout line, and into Pi a native `compaction` entry (firstKeptEntryId + tokensBefore).
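The Claude forward step above (detect the replay block by duplicate uuid, strip it, and record kept = preservedMessages ∪ replayed) might look something like this. It is a sketch under assumptions: `Entry`, `Item`, and `fold_replay` are hypothetical names, entries are reduced to bare uuids, and replayed entries with no following boundary are simply dropped here, which may not match the real reader.

```rust
use std::collections::HashSet;

/// Pared-down view of a session log entry (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Message { uuid: String },
    Boundary { preserved_messages: Vec<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Turn(String),
    Compaction { kept: Vec<String> },
}

/// Strip re-emitted (duplicate-uuid) messages and fold their ids into the
/// next boundary's kept set, alongside the ids the boundary already lists.
pub fn fold_replay(entries: &[Entry]) -> Vec<Item> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut replayed: Vec<String> = Vec::new();
    let mut items = Vec::new();

    for entry in entries {
        match entry {
            Entry::Message { uuid } => {
                if seen.insert(uuid.as_str()) {
                    // First sighting: a real turn.
                    items.push(Item::Turn(uuid.clone()));
                } else {
                    // Repeat sighting: part of the replay block.
                    replayed.push(uuid.clone());
                }
            }
            Entry::Boundary { preserved_messages } => {
                // Union, preserving order and skipping repeats.
                let mut kept = preserved_messages.clone();
                for id in replayed.drain(..) {
                    if !kept.contains(&id) {
                        kept.push(id);
                    }
                }
                items.push(Item::Compaction { kept });
            }
        }
    }
    items
}
```

Because the replayed ids never become turns, the derived path carries each uuid once, which is consistent with the "0 dup ids" result reported for the real session.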

Claude stamps harness-injected assistant entries (API errors, rate-limit notices) with model "<synthetic>". actor_for_turn passed it straight through as agent:<synthetic>, whose angle brackets violate the actor-id pattern, so real derived sessions failed `path validate`. Attribute these to the harness (tool:claude-code) like System turns instead. The a813677e session now validates.
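As an illustration of the fix, the attribution rule could be reduced to a helper like the one below. `actor_for_assistant_model` is a hypothetical name (the commit only mentions `actor_for_turn`), and the sketch covers assistant entries only; the actor strings `tool:claude-code` and the `agent:` prefix are the ones quoted above.

```rust
/// Map an assistant entry's model string to an actor id.
/// Harness-injected entries carry the literal model "<synthetic>" and are
/// attributed to the harness, so no angle brackets reach the actor id.
pub fn actor_for_assistant_model(model: &str) -> String {
    if model == "<synthetic>" {
        "tool:claude-code".to_string()
    } else {
        format!("agent:{model}")
    }
}
```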

Folds compaction into the existing A -> IR -> B -> IR -> B translation matrix rather than a bespoke round-trip test. run_cell gains a compaction_survives invariant (boundary count + summary presence survive the A -> B leg), guarded by a persists_compaction() capability so gemini is exempt -- it compresses context in memory but never writes a boundary to the chat file, so there's nothing on disk to round-trip. matrix_translation_compacted runs the full invariant set over each harness's convo-compacted fixture, so the boundary is checked alongside turns, text, tools, and tokens.
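The invariant described above (boundary count and summary presence survive the A -> B leg, with an exemption for harnesses that persist nothing) can be sketched as a pure check. The signature is an assumption: the real `compaction_survives` lives inside `run_cell` and its exact shape is not shown in this PR text.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Compaction {
    pub summary: Option<String>,
}

/// Check that compaction boundaries survive a translation leg.
/// `persists_compaction` mirrors the capability guard: when false,
/// there is nothing on disk to compare and the check passes vacuously.
pub fn compaction_survives(
    source: &[Compaction],
    translated: &[Compaction],
    persists_compaction: bool,
) -> Result<(), String> {
    if !persists_compaction {
        return Ok(());
    }
    if source.len() != translated.len() {
        return Err(format!(
            "boundary count changed: {} -> {}",
            source.len(),
            translated.len()
        ));
    }
    for (index, (before, after)) in source.iter().zip(translated).enumerate() {
        if before.summary.is_some() != after.summary.is_some() {
            return Err(format!("summary presence changed at boundary {index}"));
        }
    }
    Ok(())
}
```

Comparing presence rather than text keeps the check tolerant of per-harness summary formatting, at the cost of not catching a summary attached to the wrong boundary.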

A resume test confirmed Claude rebuilds context from the summary plus post-boundary turns only -- everything before the boundary's parentUuid: null is unreachable -- so the re-logged replay block is dead weight on resume. And `kept` already round-trips through compactMetadata.preservedMessages, so the replay is redundant for the derive<->project loop too. Drop it, along with the now-unused turn_to_entry and turn_index plumbing. Projected sessions shrink; resume is unaffected.

main added the toolpath-cursor crate while this branch was changing the shared toolpath-convo API; migrate cursor onto it:
- ConversationView.turns/events -> items: Vec<Item> (provider builds Item::Turn; project/derive read via the turns() accessor).
- derive_path -> Result<Path> (errors on duplicate step ids); propagate through the cursor path-cli call sites.
- bump toolpath-cursor 0.1.0 -> 0.2.0 for the breaking signature change.
In the cross-harness matrix, cursor is exempt from compaction survival. Renamed the capability persists_compaction -> roundtrips_compaction: the exemption is about our pipeline (the cursor provider doesn't derive or render compaction yet), not a claim about the format -- Cursor appears to persist summarization, so it's a gap to revisit. Also normalizes rustfmt drift in the recently-added sources.

Cursor compacts (/summarize + auto + Composer self-summarization) and writes a capabilityType:22 boundary marker bubble, but the summary text and kept set live server-side -- not in the local store. Verified against a live /summarize'd session: no latestConversationSummary field, the composer's conversationState protobuf holds only system prompt + tool/skill definitions, and the speculativeSummarizationEncryptionKey payload isn't stored locally. So there's nothing reconstructable to derive; like gemini, we model no compaction for Cursor. The provider recognizes the cap22 marker (Bubble::is_summarization) and skips it -- no turn, no compaction -- rather than surface a content-less boundary or an empty turn. Full finding documented in docs/agents/formats/cursor.md.
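The skip-on-read behaviour can be illustrated with a small filter. Only `Bubble::is_summarization` and the capabilityType value 22 come from the text above; the `Bubble` fields and the `visible_texts` helper are hypothetical simplifications of whatever the cursor provider actually does.

```rust
/// Pared-down bubble record (hypothetical field names).
#[derive(Debug, Clone, PartialEq)]
pub struct Bubble {
    pub capability_type: Option<u32>,
    pub text: String,
}

impl Bubble {
    /// True for the capabilityType 22 summarization boundary marker.
    pub fn is_summarization(&self) -> bool {
        self.capability_type == Some(22)
    }
}

/// Texts of the bubbles that become turns; the marker yields neither a
/// turn nor a compaction, it is simply skipped.
pub fn visible_texts(bubbles: &[Bubble]) -> Vec<&str> {
    bubbles
        .iter()
        .filter(|bubble| !bubble.is_summarization())
        .map(|bubble| bubble.text.as_str())
        .collect()
}
```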

The compaction branch was rebased onto main's per-message token-usage work (#106). This makes the merged result build and pass:
- update test code for the merged API: Turn's new group_id/attributed_token_usage fields, ConversationView.items + turns()/events() accessors, and derive_path's Result return (toolpath-convo, -claude, -codex, -opencode)
- fix token-usage double-counting the cross-harness round-trip matrix exposed once compaction fixtures were added:
  - canonicalize message totals per group_id across the whole turn sequence, not per consecutive run (toolpath-claude)
  - group consecutive same-id Gemini lines (one split message) so the repeated tokens snapshot counts once; un-fold on the reverse path (toolpath-gemini)
  - advance Codex's cumulative token_count by a group total once, on the group's last turn (toolpath-codex)
- correct the CHANGELOG entry: kind agent-coding-session v1.2.0 and the final crate version list
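To make the first double-counting fix concrete, here is a sketch of counting a message total once per group_id over the whole sequence. It assumes every turn in a group repeats that message's total and that ungrouped turns count individually; `Turn` is a simplified stand-in, and `message_tokens` and `total_tokens` are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    /// Turns split from one message share a group id.
    pub group_id: Option<String>,
    /// The whole message's token total, repeated on each turn of the group.
    pub message_tokens: u64,
}

/// Sum token totals, counting each group once no matter where its turns
/// appear in the sequence.
pub fn total_tokens(turns: &[Turn]) -> u64 {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut total = 0u64;
    for turn in turns {
        let first_sighting = match turn.group_id.as_deref() {
            Some(group) => seen.insert(group),
            None => true,
        };
        if first_sighting {
            total += turn.message_tokens;
        }
    }
    total
}
```

A per-consecutive-run version would count a group twice whenever another turn is interleaved between its members, which is the kind of overcount the matrix surfaced.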

github-actions · 2026-06-23T16:44:46Z

🔍 Preview deployed: https://6179f26c.toolpath.pages.dev

…on-global one The reader computed one session-level summary and stamped it onto every compaction, so a session with multiple boundaries collapsed all summaries onto the first. Track a pending-boundary cursor and attach each summary message to the boundary awaiting it. Tighten the cross-harness matrix to compare summary text (not mere presence) so this can't regress.
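The pending-boundary cursor can be sketched like this. It assumes each summary message follows the boundary it belongs to in read order; `Entry` and `attach_summaries` are hypothetical names and the entry shapes are reduced to the minimum needed.

```rust
/// Pared-down reader input (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Boundary,
    Summary(String),
    Message(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Compaction {
    pub summary: Option<String>,
}

/// Attach each summary to the boundary that is still awaiting one,
/// instead of stamping a single session-level summary on every boundary.
pub fn attach_summaries(entries: &[Entry]) -> Vec<Compaction> {
    let mut compactions: Vec<Compaction> = Vec::new();
    // Index of the most recent boundary that has no summary yet.
    let mut pending: Option<usize> = None;

    for entry in entries {
        match entry {
            Entry::Boundary => {
                compactions.push(Compaction { summary: None });
                pending = Some(compactions.len() - 1);
            }
            Entry::Summary(text) => {
                // take() clears the cursor so a stray second summary
                // cannot overwrite the first.
                if let Some(index) = pending.take() {
                    compactions[index].summary = Some(text.clone());
                }
            }
            Entry::Message(_) => {}
        }
    }
    compactions
}
```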

pi drops trigger, defaults pre_tokens to 0, and never leaves kept empty (mandatory tokensBefore / firstKeptEntryId). opencode's trigger is a bool (only auto vs not-auto survives), and each boundary now carries its own summary rather than a session-global one.

The "known limitation" note claimed the compact_boundary marker was dropped on read and isCompactSummary unrecognized — both untrue since the boundary became a first-class Item::Compaction (with the summary folded in). Replace it with what the test now covers, pointing at compaction_view.rs for the boundary assertions.

…oring derive_path is infallible (returns Path, not Result). On a step-id collision it drops a byte-identical re-emission and re-IDs a same-id-but-different step to `<id>#<n>`, so the result is always collision-free without surfacing an error to the caller. Removes ConvoError::DuplicateStepId. The per-provider derive::derive_path / derive_project wrappers (and pi's derive_graph) are infallible too — only the disk-reading entry points (e.g. pi's derive_project) still return Result. toolpath-git's own derive_path does real fallible I/O and is unchanged.
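The collision rule can be sketched as a small pass over emitted steps. This is an illustration, not the toolpath-convo implementation: `Step` is reduced to an id plus an opaque body, `resolve_duplicates` is a hypothetical name, "byte-identical" is modelled as body equality, and numbering the `<id>#<n>` suffix from 1 is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub id: String,
    /// Stand-in for the step's serialized content.
    pub body: String,
}

/// Emit steps in order, guaranteeing unique ids without ever failing:
/// an identical re-emission is dropped, a different step that reuses an
/// id is renamed to `<id>#<n>`.
pub fn resolve_duplicates(steps: Vec<Step>) -> Vec<Step> {
    // Final id -> body of the step already emitted under that id.
    let mut seen: HashMap<String, String> = HashMap::new();
    let mut out = Vec::new();

    'steps: for mut step in steps {
        let base = step.id.clone();
        let mut n = 0usize;
        loop {
            match seen.get(&step.id) {
                // Id is free: emit under it.
                None => break,
                // Same id, same content: drop the re-emission.
                Some(body) if *body == step.body => continue 'steps,
                // Same id, different content: try the next suffix.
                Some(_) => {
                    n += 1;
                    step.id = format!("{base}#{n}");
                }
            }
        }
        seen.insert(step.id.clone(), step.body.clone());
        out.push(step);
    }
    out
}
```

Since every branch either emits under a fresh id or drops the step, the function has no error case, which is what lets the derive wrappers return a plain Path.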

benbaarber added 17 commits June 22, 2026 15:59

docs(changelog): correct uniqueness behavior to error, not keep-first

7556419

benbaarber added 3 commits June 24, 2026 16:00

ben-emp force-pushed the ben/compaction branch 2 times, most recently from 8d86c26 to e0d1bab, June 26, 2026 16:49

ben-emp force-pushed the ben/compaction branch from e0d1bab to 7cb45a7, June 26, 2026 20:48

benbaarber wants to merge 21 commits into main from ben/compaction
