Cross-harness context-compaction provenance (kind v1.2.0) #108

Open
benbaarber wants to merge 21 commits into main from ben/compaction

Conversation

@benbaarber benbaarber commented Jun 23, 2026

Collaborator

Records context-compaction as first-class provenance: when an agent summarizes and drops earlier context, we capture the boundary, what was kept, and why — instead of silently losing it.

What changed

  • Unified IR: ConversationView.turns/events collapse into one ordered items: Vec<Item> stream (turns()/events()/compactions() accessors). New Compaction/CompactionTrigger types.
  • New step type conversation.compact, placed between the turns it separates so the head-ancestry walk crosses it in order.
  • Kind → v1.2.0 (agent-coding-session): extends main's v1.1.0 (token usage) with conversation.compact; v1.0.0/v1.1.0 schemas retained.
  • derive_path resolves duplicate step ids as it emits steps — a byte-identical re-emission is dropped, a same-id-but-different step is re-IDed to <id>#<n> — so it stays infallible (returns Path) and always yields a collision-free path. The per-provider derive::derive_path/derive_project wrappers shed Result too. Subsumes fix(derive): guarantee unique step IDs per path #111.
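
The collision rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `Step`, its `body` field, and `dedupe_steps` are hypothetical stand-ins, and "byte-identical" is modeled as plain equality.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a path step; the real type carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub id: String,
    pub body: String,
}

/// Emit steps in order, resolving id collisions so the output is collision-free.
/// Assumption: an identical re-emission means full equality with the first
/// step stored under that id.
pub fn dedupe_steps(steps: Vec<Step>) -> Vec<Step> {
    // id -> step already emitted under that id
    let mut seen: HashMap<String, Step> = HashMap::new();
    // original id -> last numeric suffix handed out
    let mut counters: HashMap<String, usize> = HashMap::new();
    let mut out = Vec::with_capacity(steps.len());

    for mut step in steps {
        let collision = match seen.get(&step.id) {
            None => false,
            // Identical re-emission: drop it.
            Some(prev) if *prev == step => continue,
            // Same id, different content: needs a fresh id.
            Some(_) => true,
        };
        if collision {
            let base = step.id.clone();
            let counter = counters.entry(base.clone()).or_insert(0);
            loop {
                *counter += 1;
                let candidate = format!("{}#{}", base, *counter);
                if !seen.contains_key(&candidate) {
                    step.id = candidate;
                    break;
                }
            }
        }
        seen.insert(step.id.clone(), step.clone());
        out.push(step);
    }
    out
}
```

Because every branch either drops or renames, the function has no error path, which is what lets the signature return a plain value rather than a `Result`.
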

Per-harness coverage

| Harness | Compaction | Notes |
|---|---|---|
| Claude, Codex, opencode, pi | ✅ full round-trip | read marker → IR → reproject to disk |
| Gemini, Cursor | ⛔ by design | Gemini: format records none. Cursor: summary and kept set live server-side and are unrecoverable from local data, so the boundary marker is skipped on read. |

Claude's pre-boundary re-emission (the replay block it re-logs just before a boundary) is stripped on read and folded into the boundary's kept set. On projection the replay block is not re-emitted: a resume test confirmed it is dead weight, since Claude rebuilds context from the summary plus post-boundary turns. Instead, kept rides in compactMetadata.preservedMessages, so re-reading reconstructs the same boundary.

Tested

  • 42 compaction-named tests, incl. dedicated round-trip suites (claude/codex/opencode/pi) over real on-disk fixtures.
  • Cross-harness matrix proves A→IR→B→IR→A compaction survival; also fixes token-usage double-counting the matrix exposed (per-group_id canonicalization across the whole sequence; Gemini split-message grouping; Codex group-once accounting).
  • clippy --all-targets clean; full suite green.

Rebased onto main's token-usage work (#106). Format references under docs/agents/formats/ updated.



…fixtures

Document how each agent harness records context compaction, correcting
claims that were based on synthetic/outdated fixtures (verified against
source and freshly captured real sessions):

- claude-code: full compactMetadata shape (preservedSegment/
  preservedMessages, postTokens, durationMs) + the duplicate-UUID
  re-emission known issue
- codex: real `compacted` payload is {message, replacement_history},
  not the synthetic {trigger, preTokens, summary}; trigger is
  analytics-only and never persisted
- gemini: compresses in-memory but persists nothing (was "no compaction")
- pi: Compaction entry fields; fromHook is extension-vs-default, not
  auto-vs-manual
- opencode: stays one session, contiguous tail_start_id, no id reuse
- README: cross-harness comparison; manual == auto record everywhere,
  only the trigger's visibility differs

Capture script: add a compaction second pass per harness (claude
`/compact`, codex tiny `model_context_window`, pi raised
`reserveTokens`, opencode `summarize` route) writing convo-compacted.*,
plus auto-delete of scratch sessions after capture (KEEP_SESSIONS=1 to
opt out, SKIP_COMPACTION=1 to skip). Includes the captured fixtures.

A conversation can carry the same turn id twice: Claude re-emits a block
of earlier messages with their original uuids just before a compaction
boundary. This produced duplicate step.id values and broke any store
with a (path, step_id) primary key (e.g. Pathbase's bulk COPY ingest).

derive_path now drops later duplicates, keeping the first occurrence: it
carries the true parent lineage, while replayed copies are re-parented
into a synthetic linear chain. Parent/head references by id resolve to
the kept step, so no remapping is needed. Adds a regression test.

Replace the separate `turns: Vec<Turn>` and `events: Vec<ConversationEvent>`
fields with a single ordered `items: Vec<Item>`, where
`Item = Turn | Event | Compaction`. Holding the conversation as one ordered
stream lets derive_path <-> extract_conversation round-trip losslessly and
gives compaction boundaries (populated in a later phase) a true position.
Adds the Compaction / KeptRange / CompactionTrigger types (defined, not yet
emitted).

Reads go through new turns()/events()/compactions() iterator accessors;
turns_since now returns Vec<&Turn>. All five providers' to_view build
`items` (all turns, then events — preserving the prior layout); gemini and
codex disambiguate reused turn ids so the keep-first uniqueness pass doesn't
silently drop turns.
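
A rough sketch of the unified stream and its accessors, as described above. The struct fields here are trimmed-down, hypothetical stand-ins (the real `Turn`, `ConversationEvent`, and `Compaction` carry more), and the accessor bodies are an assumption about one straightforward way to filter the stream.

```rust
/// Hypothetical, minimal stand-ins for the real IR types.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub id: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConversationEvent {
    pub id: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Compaction {
    pub summary: Option<String>,
}

/// One ordered stream: Item = Turn | Event | Compaction.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Turn(Turn),
    Event(ConversationEvent),
    Compaction(Compaction),
}

#[derive(Debug, Clone, Default)]
pub struct ConversationView {
    pub items: Vec<Item>,
}

impl ConversationView {
    /// Turns in stream order, skipping events and compactions.
    pub fn turns(&self) -> impl Iterator<Item = &Turn> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Turn(turn) => Some(turn),
            _ => None,
        })
    }

    /// Events in stream order.
    pub fn events(&self) -> impl Iterator<Item = &ConversationEvent> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Event(event) => Some(event),
            _ => None,
        })
    }

    /// Compaction boundaries in stream order.
    pub fn compactions(&self) -> impl Iterator<Item = &Compaction> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Compaction(compaction) => Some(compaction),
            _ => None,
        })
    }
}
```

Keeping a single `Vec<Item>` means a boundary's position relative to its neighboring turns is stored rather than inferred, which two separate vectors cannot express.
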

derive_path now makes a single ordered pass over ConversationView.items,
emitting a `conversation.compact` step for each Item::Compaction at its true
position between the turns it separates. The step carries trigger / summary /
pre_tokens / kept (each only when present) and resolves its parent through
the turn map; later turns that reference the boundary rewire onto it, so the
DAG threads through the compaction. extract_conversation reconstructs the
Compaction from the step.

Synthetic turn/event step ids are byte-identical to the old two-loop layout
(per-variant counters), so every provider round-trip passes unchanged.
Providers don't emit compactions yet (next phase); this wires the core
derive/extract path with unit + round-trip tests.
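
To illustrate how a boundary threads into the DAG, here is a simplified sketch of a single ordered pass. It assumes a purely linear parent chain, whereas the commit resolves parents through a turn map. The `"turn"` type label, the `compact-<n>` id format, and the `derive_steps` name are placeholders invented for this example.

```rust
/// Hypothetical, reduced item type for this sketch only.
pub enum Item {
    Turn { id: String },
    Compaction { summary: Option<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub id: String,
    pub step_type: &'static str,
    pub parent: Option<String>,
}

/// Walk the items once, emitting one step per item. Each step's parent is the
/// previously emitted step, so a compaction sits between the turns it
/// separates and the following turn rewires onto it.
pub fn derive_steps(items: &[Item]) -> Vec<Step> {
    let mut steps = Vec::with_capacity(items.len());
    let mut head: Option<String> = None;
    let mut compactions = 0usize;

    for item in items {
        let (id, step_type) = match item {
            // "turn" is a placeholder label, not a name from the kind spec.
            Item::Turn { id } => (id.clone(), "turn"),
            Item::Compaction { .. } => {
                compactions += 1;
                // Synthetic id format is an assumption for illustration.
                (format!("compact-{}", compactions), "conversation.compact")
            }
        };
        steps.push(Step {
            id: id.clone(),
            step_type,
            parent: head.take(),
        });
        head = Some(id);
    }
    steps
}
```
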

Each provider's view builder now detects its compaction marker and emits
an Item::Compaction at its true position in the stream, mapped per
docs/agents/formats:

- claude: compact_boundary -> Compaction (trigger from compactMetadata,
  pre_tokens=preTokens, kept from preservedSegment head/tail); the
  isCompactSummary entry is folded into summary, not emitted as a turn.
- codex: `compacted` rollout item -> Compaction (summary=payload.message,
  trigger/pre_tokens=None, kept empty); synthetic fixture updated to the
  real {message, replacement_history} payload shape.
- opencode: `compaction` part -> Compaction (auto bool -> trigger,
  tail_start_id -> kept range); fixed tail_start_id to deserialize the
  camelCase `tailStartID` wire key.
- pi: Compaction entry -> Item::Compaction (summary, pre_tokens=tokensBefore,
  trigger=None since fromHook is extension-vs-default, not auto-vs-manual)
  replacing the old synthetic System turn.

Per-provider round-trip tests assert the mapping against the real
test-fixtures/<harness>/convo-compacted.* captures and that derive ->
extract preserves each Compaction. gemini unchanged (no compaction on disk).

Bump PATH_KIND_AGENT_CODING_SESSION to .../v1.1.0 and ship the new kind
spec + bundled schema documenting the conversation.compact step type
(optional trigger / summary / pre_tokens / kept). v1.0.0 stays registered
and documented for backward compatibility; the base schema treats
meta.kind as a free-form URI, so paths tagged either version validate.

Updates the path-cli kind-schema registry, the site kind pages + registry
index, and the RFC / CLAUDE.md kind references.

Minor bumps for the crates touched by the items/compaction work, with
workspace deps and site/_data/crates.json kept consistent, plus a
CHANGELOG entry:

  toolpath 0.6.0->0.7.0, toolpath-convo 0.10.0->0.11.0,
  toolpath-claude 0.11.0->0.12.0, toolpath-codex 0.5.0->0.6.0,
  toolpath-gemini 0.5.0->0.6.0, toolpath-opencode 0.4.0->0.5.0,
  toolpath-pi 0.5.0->0.6.0, path-cli 0.13.0->0.14.0

The provider projectors (view -> harness on-disk format, used by
`path resume` / `path export`) walked turns() and dropped compaction
boundaries. They now iterate `view.items` and reconstruct each harness's
marker at its true position -- the inverse of the forward mapping:

- claude: compact_boundary entry (+ isCompactSummary summary) with
  reconstructed compactMetadata (trigger / preTokens / preservedSegment).
- codex: `compacted` rollout line (payload.message = summary).
- opencode: `compaction` part (auto from trigger, tailStartID from kept)
  plus the synthetic summary message when present.
- pi: Entry::Compaction (summary, tokensBefore, firstKeptEntryId from
  kept), replacing the now-dead turn-extra reconstruction path.

Each is verified by a projection round-trip against the real
convo-compacted fixtures (view -> project -> re-read -> same
Item::Compaction). Forward derive/extract and reverse projection are now
both lossless for compaction.

…them

Replaces the keep-first dedup (7f05b83): silently truncating a path's
steps is surprising behavior that callers do not expect. derive_path now
returns Result<Path> and fails with ConvoError::DuplicateStepId when two
steps would share an id; path-cli surfaces it as a clean error rather than
producing a quietly-wrong (empty/truncated) upload.

Producing unique ids is the provider's job: gemini and codex already
disambiguate their format-reused ids, so they derive fine. A Claude
compaction replay (which re-emits earlier messages with their original
uuids) now errors loudly -- e.g. `path share` of such a session prints
"duplicate step id <uuid>: ..." instead of uploading an empty path.

Cascades the fallible signature through the five provider derive wrappers
and all path-cli call sites (errors propagate via `?` to anyhow). Also
runs rustfmt across the items/compaction work.

…fidelity

Reworks compaction so it survives a round-trip through toolpath into a
different harness and renders natively there. The portable payload is the
summary plus the KEPT SET -- which prior turns survive verbatim into the
post-compaction window.

- Compaction.kept is now Vec<String> (surviving turn-ids), replacing the
  contiguous KeptRange (deleted): Claude's kept set is non-contiguous -- a
  preserved tail PLUS a scattered set of pinned tool results.
- Claude forward strips the re-emitted replay block (duplicate-uuid
  entries before the boundary; Claude records no marker for them, so we
  detect duplicates) and records kept = preservedMessages ∪ replayed.
  The real a813677e session now derives cleanly (2805 steps, 0 dup ids).
- Each projector renders the kept set in its own form: Claude re-emits the
  kept turns on-chain before the boundary; opencode/Pi anchor a tail at the
  earliest kept id; Codex keeps none (wholesale).

Verified end-to-end: a Claude compaction projected into Codex emits a
native `compacted` rollout line, and into Pi a native `compaction` entry
(firstKeptEntryId + tokensBefore).
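
The Claude forward pass described above (strip the duplicate-uuid replay block, then record kept = preservedMessages ∪ replayed) might look roughly like this. `Entry`, `Item`, and `strip_replay` are hypothetical simplifications; the real reader works on full JSONL entries, and the union ordering chosen here is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced view of a session log entry.
pub enum Entry {
    Message { uuid: String },
    Boundary { preserved: Vec<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Turn(String),
    Compaction { kept: Vec<String> },
}

/// Drop re-emitted messages (uuids already seen) and fold their ids into the
/// next boundary's kept set, together with the boundary's preserved ids.
pub fn strip_replay(entries: &[Entry]) -> Vec<Item> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut replayed: Vec<String> = Vec::new();
    let mut items = Vec::new();

    for entry in entries {
        match entry {
            Entry::Message { uuid } => {
                if seen.insert(uuid.as_str()) {
                    items.push(Item::Turn(uuid.clone()));
                } else {
                    // No marker exists for replayed entries, so a repeated
                    // uuid is the only signal.
                    replayed.push(uuid.clone());
                }
            }
            Entry::Boundary { preserved } => {
                let replay = std::mem::take(&mut replayed);
                // Order-preserving union: preserved ids first, then replayed.
                let mut kept: Vec<String> = Vec::new();
                for id in preserved.iter().chain(replay.iter()) {
                    if !kept.contains(id) {
                        kept.push(id.clone());
                    }
                }
                items.push(Item::Compaction { kept });
            }
        }
    }
    items
}
```

Storing `kept` as a flat list of ids rather than a range is what allows the non-contiguous case (a preserved tail plus scattered pinned tool results) to be represented at all.
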

Claude stamps harness-injected assistant entries (API errors, rate-limit
notices) with model "<synthetic>". actor_for_turn passed it straight
through as agent:<synthetic>, whose angle brackets violate the actor-id
pattern, so real derived sessions failed `path validate`. Attribute these
to the harness (tool:claude-code) like System turns instead. The a813677e
session now validates.
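
A minimal sketch of the attribution fix, assuming a simplified signature. The real `actor_for_turn` takes a turn, and the `human:user` and `agent:unknown` fallbacks below are placeholders invented for this example; only `agent:<model>` and `tool:claude-code` come from the commit message.

```rust
/// Hypothetical, reduced role type.
pub enum Role {
    User,
    Assistant,
    System,
}

/// Map a turn to an actor id. Harness-injected assistant entries, which
/// Claude stamps with the model "<synthetic>", are attributed to the harness
/// so the angle brackets never reach the actor-id pattern.
pub fn actor_for_turn(role: &Role, model: Option<&str>) -> String {
    match (role, model) {
        // Placeholder actor id for user turns (assumption).
        (Role::User, _) => "human:user".to_string(),
        (Role::System, _) | (Role::Assistant, Some("<synthetic>")) => {
            "tool:claude-code".to_string()
        }
        (Role::Assistant, Some(name)) => format!("agent:{}", name),
        // Placeholder fallback when no model is recorded (assumption).
        (Role::Assistant, None) => "agent:unknown".to_string(),
    }
}
```
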

Folds compaction into the existing A -> IR -> B -> IR -> B translation
matrix rather than a bespoke round-trip test. run_cell gains a
compaction_survives invariant (boundary count + summary presence survive
the A -> B leg), guarded by a persists_compaction() capability so gemini is
exempt -- it compresses context in memory but never writes a boundary to
the chat file, so there's nothing on disk to round-trip.
matrix_translation_compacted runs the full invariant set over each
harness's convo-compacted fixture, so the boundary is checked alongside
turns, text, tools, and tokens.
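
The invariant and its capability guard could be sketched as below. The `Harness` enum and the free-function shape are assumptions for illustration; in the test suite the check lives inside `run_cell`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Harness {
    Claude,
    Codex,
    Gemini,
    Opencode,
    Pi,
}

impl Harness {
    /// Gemini compresses in memory and writes no boundary to disk,
    /// so there is nothing for it to round-trip.
    pub fn persists_compaction(self) -> bool {
        !matches!(self, Harness::Gemini)
    }
}

/// Hypothetical, reduced compaction record.
pub struct Compaction {
    pub summary: Option<String>,
}

/// Boundary count and summary presence must survive the A -> B leg,
/// unless either side is exempt.
pub fn compaction_survives(
    src: Harness,
    dst: Harness,
    before: &[Compaction],
    after: &[Compaction],
) -> bool {
    if !(src.persists_compaction() && dst.persists_compaction()) {
        return true; // exempt cell: invariant holds vacuously
    }
    before.len() == after.len()
        && before
            .iter()
            .zip(after)
            .all(|(a, b)| a.summary.is_some() == b.summary.is_some())
}
```
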

A resume test confirmed Claude rebuilds context from the summary plus
post-boundary turns only -- everything before the boundary's parentUuid:
null is unreachable -- so the re-logged replay block is dead weight on
resume. And `kept` already round-trips through
compactMetadata.preservedMessages, so the replay is redundant for the
derive<->project loop too. Drop it, along with the now-unused turn_to_entry
and turn_index plumbing. Projected sessions shrink; resume is unaffected.

main added the toolpath-cursor crate while this branch was changing the
shared toolpath-convo API; migrate cursor onto it:

- ConversationView.turns/events -> items: Vec<Item> (provider builds
  Item::Turn; project/derive read via the turns() accessor).
- derive_path -> Result<Path> (errors on duplicate step ids); propagate
  through the cursor path-cli call sites.
- bump toolpath-cursor 0.1.0 -> 0.2.0 for the breaking signature change.

In the cross-harness matrix, cursor is exempt from compaction survival.
Renamed the capability persists_compaction -> roundtrips_compaction: the
exemption is about our pipeline (the cursor provider doesn't derive or
render compaction yet), not a claim about the format -- Cursor appears to
persist summarization, so it's a gap to revisit. Also normalizes rustfmt
drift in the recently-added sources.

Cursor compacts (/summarize + auto + Composer self-summarization) and
writes a capabilityType:22 boundary marker bubble, but the summary text
and kept set live server-side -- not in the local store. Verified against
a live /summarize'd session: no latestConversationSummary field, the
composer's conversationState protobuf holds only system prompt + tool/skill
definitions, and the speculativeSummarizationEncryptionKey payload isn't
stored locally. So there's nothing reconstructable to derive; like gemini,
we model no compaction for Cursor.

The provider recognizes the cap22 marker (Bubble::is_summarization) and
skips it -- no turn, no compaction -- rather than surface a content-less
boundary or an empty turn. Full finding documented in
docs/agents/formats/cursor.md.
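
As a sketch of the skip behavior: the `Bubble` fields below are hypothetical (only `Bubble::is_summarization` and the capabilityType value 22 come from the commit message), and `visible_text` is an invented helper standing in for the provider's turn builder.

```rust
/// Hypothetical, reduced bubble record.
pub struct Bubble {
    pub capability_type: Option<u32>,
    pub text: String,
}

impl Bubble {
    /// The cap22 bubble marks a summarization boundary whose content
    /// is not available in the local store.
    pub fn is_summarization(&self) -> bool {
        self.capability_type == Some(22)
    }
}

/// Collect turn text, skipping boundary markers entirely:
/// no turn and no compaction is produced for them.
pub fn visible_text(bubbles: &[Bubble]) -> Vec<&str> {
    bubbles
        .iter()
        .filter(|bubble| !bubble.is_summarization())
        .map(|bubble| bubble.text.as_str())
        .collect()
}
```
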

The compaction branch was rebased onto main's per-message token-usage
work (#106). This makes the merged result build and pass:

- update test code for the merged API: Turn's new group_id/
  attributed_token_usage fields, ConversationView.items + turns()/
  events() accessors, and derive_path's Result return
  (toolpath-convo, -claude, -codex, -opencode)
- fix token-usage double-counting the cross-harness round-trip matrix
  exposed once compaction fixtures were added:
  - canonicalize message totals per group_id across the whole turn
    sequence, not per consecutive run (toolpath-claude)
  - group consecutive same-id Gemini lines (one split message) so the
    repeated tokens snapshot counts once; un-fold on the reverse path
    (toolpath-gemini)
  - advance Codex's cumulative token_count by a group total once, on
    the group's last turn (toolpath-codex)
- correct the CHANGELOG entry: kind agent-coding-session v1.2.0 and the
  final crate version list
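
The per-group_id fix can be illustrated with a small sketch. It assumes every turn in a group repeats the same message total, so each group should count once across the whole sequence even when its turns are not adjacent. `Turn` is reduced to two fields and `total_tokens` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced turn: only the fields this sketch needs.
pub struct Turn {
    pub group_id: Option<String>,
    pub message_tokens: u64,
}

/// Sum message totals, counting each group_id once over the whole sequence
/// rather than once per consecutive run. Ungrouped turns always count.
pub fn total_tokens(turns: &[Turn]) -> u64 {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut total = 0;
    for turn in turns {
        let first_sighting = match &turn.group_id {
            Some(group) => seen.insert(group.as_str()),
            None => true,
        };
        if first_sighting {
            total += turn.message_tokens;
        }
    }
    total
}
```

With turns grouped g1, g2, g1, a per-run approach would count g1 twice because its turns are split by g2; tracking seen groups across the sequence avoids that.
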
github-actions Bot commented Jun 23, 2026


🔍 Preview deployed: https://6179f26c.toolpath.pages.dev

…on-global one

The reader computed one session-level summary and stamped it onto every
compaction, so a session with multiple boundaries collapsed all summaries
onto the first. Track a pending-boundary cursor and attach each summary
message to the boundary awaiting it. Tighten the cross-harness matrix to
compare summary text (not mere presence) so this can't regress.
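
A minimal sketch of the pending-boundary cursor, under the assumption that each summary message follows the boundary it belongs to. `Entry` and `attach_summaries` are hypothetical simplifications of the reader.

```rust
/// Hypothetical, reduced log entry.
pub enum Entry {
    Boundary,
    Summary(String),
    Message(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Compaction {
    pub summary: Option<String>,
}

/// Attach each summary to the boundary still awaiting one, instead of
/// stamping a single session-level summary onto every boundary.
pub fn attach_summaries(entries: &[Entry]) -> Vec<Compaction> {
    let mut compactions: Vec<Compaction> = Vec::new();
    // Index of the most recent boundary that has no summary yet.
    let mut pending: Option<usize> = None;

    for entry in entries {
        match entry {
            Entry::Boundary => {
                compactions.push(Compaction { summary: None });
                pending = Some(compactions.len() - 1);
            }
            Entry::Summary(text) => {
                // take() clears the cursor so a stray second summary
                // cannot overwrite the first.
                if let Some(index) = pending.take() {
                    compactions[index].summary = Some(text.clone());
                }
            }
            Entry::Message(_) => {}
        }
    }
    compactions
}
```
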

pi drops trigger, defaults pre_tokens to 0, and never leaves kept empty
(mandatory tokensBefore / firstKeptEntryId). opencode's trigger is a bool
(only auto vs not-auto survives), and each boundary now carries its own
summary rather than a session-global one.

The "known limitation" note claimed the compact_boundary marker was dropped
on read and isCompactSummary unrecognized — both untrue since the boundary
became a first-class Item::Compaction (with the summary folded in). Replace
it with what the test now covers, pointing at compaction_view.rs for the
boundary assertions.
ben-emp force-pushed the ben/compaction branch 2 times, most recently from 8d86c26 to e0d1bab, June 26, 2026 16:49

…oring

derive_path is infallible (returns Path, not Result). On a step-id collision
it drops a byte-identical re-emission and re-IDs a same-id-but-different step
to `<id>#<n>`, so the result is always collision-free without surfacing an
error to the caller. Removes ConvoError::DuplicateStepId.

The per-provider derive::derive_path / derive_project wrappers (and pi's
derive_graph) are infallible too — only the disk-reading entry points (e.g.
pi's derive_project) still return Result. toolpath-git's own derive_path does
real fallible I/O and is unchanged.