Cross-harness context-compaction provenance (kind v1.2.0) #108
Open
benbaarber wants to merge 21 commits into
…fixtures
Document how each agent harness records context compaction, correcting
claims that were based on synthetic/outdated fixtures (verified against
source and freshly captured real sessions):
- claude-code: full compactMetadata shape (preservedSegment/
preservedMessages, postTokens, durationMs) + the duplicate-UUID
re-emission known issue
- codex: real `compacted` payload is {message, replacement_history},
not the synthetic {trigger, preTokens, summary}; trigger is
analytics-only and never persisted
- gemini: compresses in-memory but persists nothing (was "no compaction")
- pi: Compaction entry fields; fromHook is extension-vs-default, not
auto-vs-manual
- opencode: stays one session, contiguous tail_start_id, no id reuse
- README: cross-harness comparison; manual == auto record everywhere,
only the trigger's visibility differs
Capture script: add a compaction second pass per harness (claude
`/compact`, codex tiny `model_context_window`, pi raised
`reserveTokens`, opencode `summarize` route) writing convo-compacted.*,
plus auto-delete of scratch sessions after capture (KEEP_SESSIONS=1 to
opt out, SKIP_COMPACTION=1 to skip). Includes the captured fixtures.
A conversation can carry the same turn id twice — Claude re-emits a block of earlier messages with their original uuids just before a compaction boundary — which produced duplicate step.id values and broke any store with a (path, step_id) primary key (e.g. Pathbase's bulk COPY ingest). derive_path now drops later duplicates, keeping the first occurrence: it carries the true parent lineage, while replayed copies are re-parented into a synthetic linear chain. Parent/head references by id resolve to the kept step, so no remapping is needed. Adds a regression test.
Replace the separate `turns: Vec<Turn>` and `events: Vec<ConversationEvent>` fields with a single ordered `items: Vec<Item>`, where `Item = Turn | Event | Compaction`. Holding the conversation as one ordered stream lets derive_path <-> extract_conversation round-trip losslessly and gives compaction boundaries (populated in a later phase) a true position. Adds the Compaction / KeptRange / CompactionTrigger types (defined, not yet emitted). Reads go through new turns()/events()/compactions() iterator accessors; turns_since now returns Vec<&Turn>. All five providers' to_view build `items` (all turns, then events — preserving the prior layout); gemini and codex disambiguate reused turn ids so the keep-first uniqueness pass doesn't silently drop turns.
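A minimal sketch of the single ordered stream described above, assuming heavily simplified stand-ins for the real types. The field sets on `Turn`, `Event`, and `Compaction` here are hypothetical; only the `items` field, the `Item` variants, and the accessor names come from the commit message.

```rust
// Hypothetical, simplified stand-ins; the real toolpath-convo types carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub id: String,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: String,
    pub kind: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Compaction {
    pub summary: Option<String>,
}

// One ordered stream: a compaction sits at its true position between turns.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Turn(Turn),
    Event(Event),
    Compaction(Compaction),
}

#[derive(Debug, Default)]
pub struct ConversationView {
    pub items: Vec<Item>,
}

impl ConversationView {
    /// Turns only, in stream order.
    pub fn turns(&self) -> impl Iterator<Item = &Turn> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Turn(t) => Some(t),
            _ => None,
        })
    }

    /// Events only, in stream order.
    pub fn events(&self) -> impl Iterator<Item = &Event> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Event(e) => Some(e),
            _ => None,
        })
    }

    /// Compaction boundaries only, in stream order.
    pub fn compactions(&self) -> impl Iterator<Item = &Compaction> + '_ {
        self.items.iter().filter_map(|item| match item {
            Item::Compaction(c) => Some(c),
            _ => None,
        })
    }
}
```

Readers that only care about turns keep a flat view through `turns()`, while the position of each `Item::Compaction` relative to its neighbours is still recoverable from `items`.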
derive_path now makes a single ordered pass over ConversationView.items, emitting a `conversation.compact` step for each Item::Compaction at its true position between the turns it separates. The step carries trigger / summary / pre_tokens / kept (each only when present) and resolves its parent through the turn map; later turns that reference the boundary rewire onto it, so the DAG threads through the compaction. extract_conversation reconstructs the Compaction from the step. Synthetic turn/event step ids are byte-identical to the old two-loop layout (per-variant counters), so every provider round-trip passes unchanged. Providers don't emit compactions yet (next phase); this wires the core derive/extract path with unit + round-trip tests.
Each provider's view builder now detects its compaction marker and emits
an Item::Compaction at its true position in the stream, mapped per
docs/agents/formats:
- claude: compact_boundary -> Compaction (trigger from compactMetadata,
pre_tokens=preTokens, kept from preservedSegment head/tail); the
isCompactSummary entry is folded into summary, not emitted as a turn.
- codex: `compacted` rollout item -> Compaction (summary=payload.message,
trigger/pre_tokens=None, kept empty); synthetic fixture updated to the
real {message, replacement_history} payload shape.
- opencode: `compaction` part -> Compaction (auto bool -> trigger,
tail_start_id -> kept range); fixed tail_start_id to deserialize the
camelCase `tailStartID` wire key.
- pi: Compaction entry -> Item::Compaction (summary, pre_tokens=tokensBefore,
trigger=None since fromHook is extension-vs-default, not auto-vs-manual)
replacing the old synthetic System turn.
Per-provider round-trip tests assert the mapping against the real
test-fixtures/<harness>/convo-compacted.* captures and that derive ->
extract preserves each Compaction. gemini unchanged (no compaction on disk).
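The per-harness mapping above can be sketched roughly as follows. This is an illustration under assumptions, not the providers' code: the `CompactionTrigger` variant names, the constructor functions, and the choice to map opencode's `auto = false` to `Manual` are all hypothetical, and `kept` is shown as a plain list holding only the anchor id.

```rust
// Hypothetical shapes; variant names and constructors are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompactionTrigger {
    Auto,
    Manual,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Compaction {
    pub trigger: Option<CompactionTrigger>,
    pub summary: Option<String>,
    pub pre_tokens: Option<u64>,
    pub kept: Vec<String>,
}

/// codex: only `payload.message` is persisted, so every other field stays empty.
pub fn from_codex(message: &str) -> Compaction {
    Compaction {
        summary: Some(message.to_string()),
        ..Default::default()
    }
}

/// opencode: the `auto` bool is the only trigger information on disk, and
/// `tailStartID` anchors what was kept (simplified here to the anchor id alone).
pub fn from_opencode(auto: bool, tail_start_id: Option<&str>) -> Compaction {
    let trigger = if auto {
        CompactionTrigger::Auto
    } else {
        CompactionTrigger::Manual // assumption: "not auto" is treated as manual
    };
    Compaction {
        trigger: Some(trigger),
        kept: tail_start_id.map(|id| vec![id.to_string()]).unwrap_or_default(),
        ..Default::default()
    }
}

/// pi: `fromHook` is extension-vs-default, so no trigger is derived from it.
pub fn from_pi(summary: &str, tokens_before: u64, first_kept_entry_id: &str) -> Compaction {
    Compaction {
        trigger: None,
        summary: Some(summary.to_string()),
        pre_tokens: Some(tokens_before),
        kept: vec![first_kept_entry_id.to_string()],
    }
}
```

Keeping every field optional lets a harness that persists less (codex) and one that persists more (pi) share one record without inventing values.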
Bump PATH_KIND_AGENT_CODING_SESSION to .../v1.1.0 and ship the new kind spec + bundled schema documenting the conversation.compact step type (optional trigger / summary / pre_tokens / kept). v1.0.0 stays registered and documented for backward compatibility; the base schema treats meta.kind as a free-form URI, so paths tagged either version validate. Updates the path-cli kind-schema registry, the site kind pages + registry index, and the RFC / CLAUDE.md kind references.
Minor bumps for the crates touched by the items/compaction work, with workspace deps and site/_data/crates.json kept consistent, plus a CHANGELOG entry: toolpath 0.6.0->0.7.0, toolpath-convo 0.10.0->0.11.0, toolpath-claude 0.11.0->0.12.0, toolpath-codex 0.5.0->0.6.0, toolpath-gemini 0.5.0->0.6.0, toolpath-opencode 0.4.0->0.5.0, toolpath-pi 0.5.0->0.6.0, path-cli 0.13.0->0.14.0
The provider projectors (view -> harness on-disk format, used by `path resume` / `path export`) walked turns() and dropped compaction boundaries. They now iterate `view.items` and reconstruct each harness's marker at its true position -- the inverse of the forward mapping:
- claude: compact_boundary entry (+ isCompactSummary summary) with reconstructed compactMetadata (trigger / preTokens / preservedSegment).
- codex: `compacted` rollout line (payload.message = summary).
- opencode: `compaction` part (auto from trigger, tailStartID from kept) plus the synthetic summary message when present.
- pi: Entry::Compaction (summary, tokensBefore, firstKeptEntryId from kept), replacing the now-dead turn-extra reconstruction path.
Each is verified by a projection round-trip against the real convo-compacted fixtures (view -> project -> re-read -> same Item::Compaction). Forward derive/extract and reverse projection are now both lossless for compaction.
…them
Replaces the keep-first dedup (7f05b83): silently truncating a path's steps is surprising, undefined behavior nobody expects. derive_path now returns Result<Path> and fails with ConvoError::DuplicateStepId when two steps would share an id; path-cli surfaces it as a clean error rather than producing a quietly-wrong (empty/truncated) upload. Producing unique ids is the provider's job: gemini and codex already disambiguate their format-reused ids, so they derive fine. A Claude compaction replay (which re-emits earlier messages with their original uuids) now errors loudly -- e.g. `path share` of such a session prints "duplicate step id <uuid>: ..." instead of uploading an empty path. Cascades the fallible signature through the five provider derive wrappers and all path-cli call sites (errors propagate via `?` to anyhow). Also runs rustfmt across the items/compaction work.
…fidelity
Reworks compaction so it survives a round-trip through toolpath into a different harness and renders natively there. The portable payload is the summary plus the KEPT SET -- which prior turns survive verbatim into the post-compaction window.
- Compaction.kept is now Vec<String> (surviving turn-ids), replacing the contiguous KeptRange (deleted): Claude's kept set is non-contiguous -- a preserved tail PLUS a scattered set of pinned tool results.
- Claude forward strips the re-emitted replay block (duplicate-uuid entries before the boundary; Claude records no marker for them, so we detect duplicates) and records kept = preservedMessages ∪ replayed. The real a813677e session now derives cleanly (2805 steps, 0 dup ids).
- Each projector renders the kept set in its own form: Claude re-emits the kept turns on-chain before the boundary; opencode/Pi anchor a tail at the earliest kept id; Codex keeps none (wholesale).
Verified end-to-end: a Claude compaction projected into Codex emits a native `compacted` rollout line, and into Pi a native `compaction` entry (firstKeptEntryId + tokensBefore).
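The replay stripping and the `kept = preservedMessages ∪ replayed` rule can be sketched like this. It is a simplified illustration: `Entry` and `strip_replay` are hypothetical names, and the real reader works on full Claude log entries rather than bare uuids.

```rust
use std::collections::HashSet;

// Hypothetical minimal entry; real Claude entries carry far more than a uuid.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub uuid: String,
}

/// Splits the entries preceding a boundary into (first occurrences, kept ids).
/// A uuid seen a second time is treated as part of the replay block: the copy
/// is dropped and its id is added to the kept set alongside `preserved`.
pub fn strip_replay(entries: Vec<Entry>, preserved: &[String]) -> (Vec<Entry>, Vec<String>) {
    let mut seen = HashSet::new();
    let mut in_kept = HashSet::new();
    let mut kept: Vec<String> = Vec::new();

    // Start from preservedMessages, keeping their order and skipping repeats.
    for id in preserved {
        if in_kept.insert(id.clone()) {
            kept.push(id.clone());
        }
    }

    let mut unique = Vec::new();
    for entry in entries {
        if seen.insert(entry.uuid.clone()) {
            // First occurrence: carries the true parent lineage, so keep it.
            unique.push(entry);
        } else if in_kept.insert(entry.uuid.clone()) {
            // Replayed copy: drop the entry, remember that the turn survived.
            kept.push(entry.uuid);
        }
    }
    (unique, kept)
}
```

Because no marker is recorded for the replay block, duplicate detection is the only signal used here, which matches the approach the commit message describes.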
Claude stamps harness-injected assistant entries (API errors, rate-limit notices) with model "<synthetic>". actor_for_turn passed it straight through as agent:<synthetic>, whose angle brackets violate the actor-id pattern, so real derived sessions failed `path validate`. Attribute these to the harness (tool:claude-code) like System turns instead. The a813677e session now validates.
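A minimal sketch of that attribution rule, assuming a hypothetical helper `actor_for_model` that takes the model string directly (the real `actor_for_turn` signature is not shown in this PR text):

```rust
/// Hypothetical helper: maps a Claude model string to an actor id.
/// Entries stamped "<synthetic>" are harness-injected, so they are attributed
/// to the harness rather than to an agent whose id would contain angle brackets.
pub fn actor_for_model(model: &str) -> String {
    if model == "<synthetic>" {
        "tool:claude-code".to_string()
    } else {
        format!("agent:{model}")
    }
}
```

For example, `actor_for_model("<synthetic>")` yields `tool:claude-code`, while any other model name is passed through under the `agent:` prefix.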
Folds compaction into the existing A -> IR -> B -> IR -> B translation matrix rather than a bespoke round-trip test. run_cell gains a compaction_survives invariant (boundary count + summary presence survive the A -> B leg), guarded by a persists_compaction() capability so gemini is exempt -- it compresses context in memory but never writes a boundary to the chat file, so there's nothing on disk to round-trip. matrix_translation_compacted runs the full invariant set over each harness's convo-compacted fixture, so the boundary is checked alongside turns, text, tools, and tokens.
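The invariant can be sketched as a plain function. This is an assumption-laden illustration: each view is reduced to a list of per-boundary summaries, and the capability is passed in as a bool rather than queried from a harness.

```rust
/// Sketch of a `compaction_survives` check between a source view (A) and a
/// translated view (B). Each slice holds one entry per compaction boundary,
/// with `None` meaning the boundary has no summary.
pub fn compaction_survives(
    source: &[Option<String>],
    target: &[Option<String>],
    persists_compaction: bool,
) -> bool {
    // Harnesses that never write a boundary to disk are exempt.
    if !persists_compaction {
        return true;
    }
    // Boundary count must match, and so must summary presence per boundary.
    source.len() == target.len()
        && source
            .iter()
            .zip(target)
            .all(|(a, b)| a.is_some() == b.is_some())
}
```

Comparing `a == b` instead of `a.is_some() == b.is_some()` would tighten the check from summary presence to summary text.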
A resume test confirmed Claude rebuilds context from the summary plus post-boundary turns only -- everything before the boundary's parentUuid: null is unreachable -- so the re-logged replay block is dead weight on resume. And `kept` already round-trips through compactMetadata.preservedMessages, so the replay is redundant for the derive<->project loop too. Drop it, along with the now-unused turn_to_entry and turn_index plumbing. Projected sessions shrink; resume is unaffected.
main added the toolpath-cursor crate while this branch was changing the shared toolpath-convo API; migrate cursor onto it:
- ConversationView.turns/events -> items: Vec<Item> (provider builds Item::Turn; project/derive read via the turns() accessor).
- derive_path -> Result<Path> (errors on duplicate step ids); propagate through the cursor path-cli call sites.
- bump toolpath-cursor 0.1.0 -> 0.2.0 for the breaking signature change.
In the cross-harness matrix, cursor is exempt from compaction survival. Renamed the capability persists_compaction -> roundtrips_compaction: the exemption is about our pipeline (the cursor provider doesn't derive or render compaction yet), not a claim about the format -- Cursor appears to persist summarization, so it's a gap to revisit. Also normalizes rustfmt drift in the recently-added sources.
Cursor compacts (/summarize + auto + Composer self-summarization) and writes a capabilityType:22 boundary marker bubble, but the summary text and kept set live server-side -- not in the local store. Verified against a live /summarize'd session: no latestConversationSummary field, the composer's conversationState protobuf holds only system prompt + tool/skill definitions, and the speculativeSummarizationEncryptionKey payload isn't stored locally. So there's nothing reconstructable to derive; like gemini, we model no compaction for Cursor. The provider recognizes the cap22 marker (Bubble::is_summarization) and skips it -- no turn, no compaction -- rather than surface a content-less boundary or an empty turn. Full finding documented in docs/agents/formats/cursor.md.
The compaction branch was rebased onto main's per-message token-usage work (#106). This makes the merged result build and pass:
- update test code for the merged API: Turn's new group_id/attributed_token_usage fields, ConversationView.items + turns()/events() accessors, and derive_path's Result return (toolpath-convo, -claude, -codex, -opencode)
- fix token-usage double-counting the cross-harness round-trip matrix exposed once compaction fixtures were added:
  - canonicalize message totals per group_id across the whole turn sequence, not per consecutive run (toolpath-claude)
  - group consecutive same-id Gemini lines (one split message) so the repeated tokens snapshot counts once; un-fold on the reverse path (toolpath-gemini)
  - advance Codex's cumulative token_count by a group total once, on the group's last turn (toolpath-codex)
- correct the CHANGELOG entry: kind agent-coding-session v1.2.0 and the final crate version list
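The double-counting fix for Claude can be illustrated with a small sketch. It assumes, hypothetically, that every turn of a split message repeats that message's full token total; `total_tokens` as a field and function name is invented here, while `group_id` comes from the commit message.

```rust
use std::collections::HashSet;

// Hypothetical minimal turn: only the fields needed for the accounting.
pub struct Turn {
    pub group_id: Option<String>,
    pub total_tokens: u64,
}

/// Sums token usage, counting each `group_id` once across the whole sequence.
/// Deduplicating per consecutive run instead would count a group again
/// whenever another turn is interleaved between its members.
pub fn total_tokens(turns: &[Turn]) -> u64 {
    let mut counted: HashSet<&str> = HashSet::new();
    let mut total = 0;
    for turn in turns {
        match turn.group_id.as_deref() {
            // Grouped turn: add the total only the first time the group appears.
            Some(group) => {
                if counted.insert(group) {
                    total += turn.total_tokens;
                }
            }
            // Ungrouped turn: always counts on its own.
            None => total += turn.total_tokens,
        }
    }
    total
}
```

With turns `g1 (100)`, `g2 (50)`, `g1 (100)`, a per-run scheme would report 250, whereas this sketch reports 150.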
🔍 Preview deployed: https://6179f26c.toolpath.pages.dev
…on-global one
The reader computed one session-level summary and stamped it onto every compaction, so a session with multiple boundaries collapsed all summaries onto the first. Track a pending-boundary cursor and attach each summary message to the boundary awaiting it. Tighten the cross-harness matrix to compare summary text (not mere presence) so this can't regress.
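The pending-boundary cursor can be sketched as below. The `Msg` enum and `attach_summaries` are hypothetical reductions of the reader's input, kept just detailed enough to show the pairing logic.

```rust
// Hypothetical reduction of a session stream to the three cases that matter.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    Boundary,
    Summary(String),
    Other,
}

/// Returns one summary slot per boundary, in order. Each summary message is
/// attached to the boundary that is still waiting for one, instead of a
/// single session-level summary being stamped onto every boundary.
pub fn attach_summaries(msgs: &[Msg]) -> Vec<Option<String>> {
    let mut boundaries: Vec<Option<String>> = Vec::new();
    let mut pending: Option<usize> = None; // index of the boundary awaiting a summary

    for msg in msgs {
        match msg {
            Msg::Boundary => {
                boundaries.push(None);
                pending = Some(boundaries.len() - 1);
            }
            Msg::Summary(text) => {
                // `take` clears the cursor so a later summary cannot overwrite this one.
                if let Some(index) = pending.take() {
                    boundaries[index] = Some(text.clone());
                }
            }
            Msg::Other => {}
        }
    }
    boundaries
}
```

A session with two boundaries and two summaries then yields two distinct summaries rather than the first one repeated.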
pi drops trigger, defaults pre_tokens to 0, and never leaves kept empty (mandatory tokensBefore / firstKeptEntryId). opencode's trigger is a bool (only auto vs not-auto survives), and each boundary now carries its own summary rather than a session-global one.
The "known limitation" note claimed the compact_boundary marker was dropped on read and isCompactSummary unrecognized — both untrue since the boundary became a first-class Item::Compaction (with the summary folded in). Replace it with what the test now covers, pointing at compaction_view.rs for the boundary assertions.
8d86c26 to e0d1bab
…oring
derive_path is infallible (returns Path, not Result). On a step-id collision it drops a byte-identical re-emission and re-IDs a same-id-but-different step to `<id>#<n>`, so the result is always collision-free without surfacing an error to the caller. Removes ConvoError::DuplicateStepId. The per-provider derive::derive_path / derive_project wrappers (and pi's derive_graph) are infallible too; only the disk-reading entry points (e.g. pi's derive_project) still return Result. toolpath-git's own derive_path does real fallible I/O and is unchanged.
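A minimal sketch of that collision rule, under stated assumptions: `Step` is reduced to an id and a body, "byte-identical" is modelled as body equality, and the suffix counter is assumed to start at 2. The real `derive_path` may number and compare differently.

```rust
use std::collections::HashMap;

// Hypothetical minimal step: an id plus an opaque serialized body.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub id: String,
    pub body: String,
}

/// Returns steps with unique ids. An identical re-emission is dropped; a step
/// that reuses an id with different content is re-IDed to `<id>#<n>`.
pub fn resolve_collisions(steps: Vec<Step>) -> Vec<Step> {
    let mut emitted: HashMap<String, String> = HashMap::new(); // final id -> body
    let mut out = Vec::with_capacity(steps.len());

    for step in steps {
        let mut id = step.id.clone();
        let mut n = 1;
        let mut duplicate = false;

        // Walk `<id>`, `<id>#2`, `<id>#3`, ... until a free id or an identical copy is found.
        while let Some(body) = emitted.get(&id) {
            if *body == step.body {
                duplicate = true;
                break;
            }
            n += 1;
            id = format!("{}#{}", step.id, n);
        }
        if duplicate {
            continue;
        }

        emitted.insert(id.clone(), step.body.clone());
        out.push(Step { id, body: step.body });
    }
    out
}
```

Because every emitted id is checked against the map before insertion, the output is collision-free by construction and the function never needs to return an error.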
Records context-compaction as first-class provenance: when an agent summarizes and drops earlier context, we capture the boundary, what was kept, and why — instead of silently losing it.
What changed
- `ConversationView.turns`/`events` collapse into one ordered `items: Vec<Item>` stream (`turns()`/`events()`/`compactions()` accessors). New `Compaction`/`CompactionTrigger` types.
- New `conversation.compact` step, placed between the turns it separates so the `head`-ancestry walk crosses it in order.
- Kind v1.2.0 (`agent-coding-session`): extends main's v1.1.0 (token usage) with `conversation.compact`; v1.0.0/v1.1.0 schemas retained.
- `derive_path` resolves duplicate step ids as it emits steps (a byte-identical re-emission is dropped; a same-id-but-different step is re-IDed to `<id>#<n>`), so it stays infallible (returns `Path`) and always yields a collision-free path. The per-provider `derive::derive_path`/`derive_project` wrappers shed `Result` too. Subsumes fix(derive): guarantee unique step IDs per path #111.
Per-harness coverage
Claude's re-emission (the replay block it re-logs before a boundary) is stripped on read and folded into the boundary's `kept` set. On projection the replay block is not re-emitted: a resume test confirmed it's dead weight (Claude rebuilds context from the summary plus post-boundary turns), and `kept` rides in `compactMetadata.preservedMessages` instead, so re-reading reconstructs the same boundary.
Tested
- … `group_id` canonicalization across the whole sequence; Gemini split-message grouping; Codex group-once accounting).
- `--all-targets` clean; full suite green.
Rebased onto main's token-usage work (#106). Format references under `docs/agents/formats/` updated.