diff --git a/.gitignore b/.gitignore index c50ee01..8f10b34 100644 --- a/.gitignore +++ b/.gitignore @@ -10,3 +10,6 @@ site/wasm/ .claude/worktrees/ # Synthetic benchmark fixtures — generate locally via gen_synthetic_path. /bench/fixtures/ + +# macOS +.DS_Store diff --git a/CHANGELOG.md b/CHANGELOG.md index 9e740c9..895f5e0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,114 @@ All notable changes to the Toolpath workspace are documented here. +## Token usage: once per message, with per-step attribution + kind v1.1.0 — 2026-06-17 + +Fixes token over-counting in derived documents (~3× output-token +inflation on real Claude sessions, unbounded on Codex) and adds per-step +token attribution where the source genuinely reports it (Codex). Two +over-counting bugs, one spec gap, plus a capability the corrected reads +make possible. Verified against every Claude session and all Codex +sessions on disk, and cross-checked against the Anthropic streaming API +reference and OpenAI's codex issue tracker. + +- **Claude**: Claude Code writes one JSONL line per content block of an + assistant API message, repeating the message-level `usage` on every + line. `toolpath-claude` emitted one step per line, each carrying the + full usage — so summing `token_usage` per step over-counted by the + block count, and the disambiguating `message.id` was dropped. +- **Codex**: `toolpath-codex` stamped the *cumulative* session counter + (`total_token_usage`) onto each assistant turn instead of per-step + spend, so per-step sums grew quadratically. + +Core model (kind `agent-coding-session` **v1.1.0**, both fields optional +so any producer can populate per-step attribution later with no further +kind version): + +- `token_usage` always means **the total for a message**, on the + group's final step (`Σ token_usage` over a path = session total). +- `attributed_token_usage` (new) is **this step's own attributed + spend**, on its own key so the sum above is unaffected. 
Whether a + number is a total or a share is structural (the key), never + positional. The unattributed remainder + (`group token_usage − Σ attributed`) is computed by consumers, never + recorded — stored values stay verbatim source observations. +- `breakdowns` (new, optional) is a **decomposition of a top-level + class into named sub-classes** — keyed by the class being broken down (e.g. + `"output"`), inner map sub-class → tokens (e.g. `{"output": + {"reasoning": 243}}`). It is **informational and never summed into + any total** — the parent class already counts those tokens — so the + session-total guarantee is untouched. Invariant: `Σ(inner) ≤` the + parent class's value; the field is omitted when empty. It rides both + `token_usage` and `attributed_token_usage`. + +Changes: + +- `toolpath_convo::TokenUsage` gains `breakdowns` + (`BTreeMap>`); the kind + `tokenUsage` `$def` gains a matching optional `breakdowns` property. +- **Gemini under-count FIX**: Gemini reports `thoughts` (reasoning) as + an additive sibling of `output_tokens` that the derivation was + **dropping** — so Gemini output totals were under-counted by the + reasoning spend. `thoughts` is now **folded into `output_tokens`** + (correcting the total) *and* recorded under + `breakdowns["output"]["reasoning"]`; the projector **un-folds** it on + the reverse path for a lossless round-trip (`Some(0)` is preserved as + a real Gemini-3 zero-reasoning signal, not collapsed to absent). +- **OpenCode**: continues folding `reasoning` into `output_tokens`, and + now also records it under `breakdowns["output"]["reasoning"]`. +- **Codex**: `reasoning_output_tokens` (a subset of `output_tokens`, + cumulative → differenced like the other counters) is surfaced under + `breakdowns["output"]["reasoning"]` on both the per-step + `attributed_token_usage` and the per-round `token_usage`. +- **Claude**: records no breakdown — its JSONL `usage` does not itemize + thinking tokens. 
+- `toolpath_convo::Turn` gains `group_id` (grouping key) and + `attributed_token_usage`. `derive_path` writes `token_usage` once per + `group_id` group and `attributed_token_usage` on each step that has + it; `extract_conversation` reads both back. +- `toolpath-claude`: a split message's lines carry `message.usage` as a + **cumulative streaming snapshot**, not a per-line bill — per the + Anthropic streaming API, `message_start` seeds `output_tokens` near + zero and each `message_delta` reports the running cumulative total + (confirmed across every session sampled: input/cache constant, output + climbing to the final-line total; ~27% of multi-line messages vary). + Each `group_id` run is reduced to the **field-wise maximum** total + (never under-counts whatever the line order) on its final turn. The + intermediate snapshots are flush-time artifacts, *not* per-block costs + (a real prose block routinely shows `output_tokens: 1`), so Claude + emits **no** `attributed_token_usage`. `total_usage` is deduped by + group; the projector re-expands the total onto every line of a split. +- `toolpath-codex`: per-step spend is the increase in the cumulative + `total_token_usage` since the previous count — **differencing the + cumulative is dedup-safe**, where summing `last_token_usage` would + double-count because Codex re-emits a stale `last_token_usage` on + repeated `token_count` events (a documented trap: openai/codex #14489, + #17539). Each per-call delta is attributed to the step it follows as + `attributed_token_usage`; a round's `token_usage` total is the sum of + its steps' attributions (one source of truth — total and shares cannot + drift). The projector emits a `turn_context` per group and a cumulative + `token_count` after each step, so grouping and attribution survive the + round-trip. 
+- `toolpath-pi` and `toolpath-opencode` decode absent/all-zero wire + usage counters as `token_usage: None` ("spend unknown") instead of + `Some(zeros)` — their wires require usage fields, which + foreign-source projections zero-fill. +- `PATH_KIND_AGENT_CODING_SESSION` now points at v1.1.0; + `PATH_KIND_AGENT_CODING_SESSION_V1_0_0` names the old URI. `path p + validate` bundles both schemas. The v1.0.0 spec page gains an erratum + documenting the historical duplication (consumers of v1.0.0 documents + still need dedup heuristics; the byte-identical-tuple heuristic does + not repair Codex documents). + +Crates bumped (every crate that depends on `toolpath`, matching the +domain-rename precedent since the emitted kind URI changes): `toolpath` +0.7.0, `toolpath-convo` 0.11.0, `toolpath-git` 0.6.0, `toolpath-github` +0.6.0, `toolpath-claude` 0.12.0, `toolpath-gemini` 0.6.0, +`toolpath-codex` 0.6.0, `toolpath-opencode` 0.5.0, `toolpath-cursor` +0.2.0, `toolpath-pi` 0.6.0, `toolpath-dot` 0.5.0, `toolpath-md` 0.7.0, +`path-cli` 0.14.0, `toolpath-cli` 0.14.0. `pathbase-client` is +unaffected. + ## toolpath-claude 0.11.1 + path-cli 0.13.1 + toolpath-cli 0.13.1: derive `project_path` from the file's parent directory — 2026-06-09 `ConversationReader::read_conversation_metadata` used to set diff --git a/CLAUDE.md b/CLAUDE.md index bbdea69..01c70e9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -175,18 +175,18 @@ the server publishes that operation. Tests live alongside the code (`#[cfg(test)] mod tests`), plus `path-cli` has integration tests in `tests/`. 
Per-crate counts: -- `toolpath`: 32 unit + 9 doc tests (serde roundtrip, builders, query) -- `toolpath-convo`: 58 unit + 1 doc test (types, enrichment, display, ConversationView -> Path derivation) +- `toolpath`: 69 unit + 11 doc tests (serde roundtrip, builders, query) +- `toolpath-convo`: 118 unit + 4 doc tests (types, enrichment, display, ConversationView -> Path derivation, message-group usage accounting, breakdowns) - `toolpath-git`: 33 unit + 3 doc tests (derive, branch detection, diffstat) -- `toolpath-github`: 28 unit + 2 doc tests (mapping, DAG construction, fixtures) -- `toolpath-claude`: 278 unit + 6 doc tests (path resolution, conversation reading, query, chaining, watcher, derive, metadata first-user-message) -- `toolpath-gemini`: 163 unit + 12 integration + 4 doc tests (path resolution, chat-file parsing, query, watcher, derive, provider, round-trip fidelity) -- `toolpath-codex`: 69 unit + 33 integration + 1 doc test (rollout parsing, provider assembly, patch-fidelity derive, real-session fixture, source→path fidelity invariants, JSON wire-level round-trip) -- `toolpath-opencode`: 43 unit + 1 doc test (SQLite reader, JSON payload serde, provider assembly, snapshot-based derive, tool-input fallback for gitignored paths) -- `toolpath-cursor`: 70 unit + 8 integration round-trip + 1 real-DB sanity + 1 doc test (state.vscdb SQLite reader, bubble store + composer header parsing, content-addressed blob lookup, projector with full TOOL_TABLE coverage, JSONL transcript ingest in `examples/dump_fixture.rs`) -- `toolpath-pi`: 123 unit + 4 doc tests (types, paths, error, reader, io, provider) +- `toolpath-github`: 32 unit + 3 doc tests (mapping, DAG construction, fixtures) +- `toolpath-claude`: 229 unit + 18 integration + 6 doc tests (path resolution, conversation reading, query, chaining, watcher, derive, metadata first-user-message, group_id grouping + once-per-message usage totals) +- `toolpath-gemini`: 161 unit + 29 integration + 5 doc tests (path resolution, 
chat-file parsing, query, watcher, derive, provider, round-trip fidelity, thoughts-folded-into-output + reasoning breakdown round-trip) +- `toolpath-codex`: 80 unit + 51 integration + 2 doc tests (rollout parsing, provider assembly, patch-fidelity derive, real-session fixture, source→path fidelity invariants, JSON wire-level round-trip, per-turn token deltas from cumulative counters, reasoning breakdown) +- `toolpath-opencode`: 52 unit + 19 integration + 1 doc test (SQLite reader, JSON payload serde, provider assembly, snapshot-based derive, tool-input fallback for gitignored paths, reasoning breakdown) +- `toolpath-cursor`: 78 unit + 8 integration round-trip + 1 real-DB sanity + 1 doc test (state.vscdb SQLite reader, bubble store + composer header parsing, content-addressed blob lookup, projector with full TOOL_TABLE coverage, JSONL transcript ingest in `examples/dump_fixture.rs`) +- `toolpath-pi`: 133 unit + 26 integration + 5 doc tests (types, paths, error, reader, io, provider) - `toolpath-dot`: 30 unit + 2 doc tests (render, visual conventions, escaping) -- `path-cli`: 260 unit + 63 integration tests (import/export/cache, track sessions, merge, validate, roundtrip, render-md snapshots, deprecation aliases, pathbase HTTP mock-server tests, fzf-friendly TSV output, `path resume` orchestration with injectable `ExecStrategy`). For an end-to-end check against a real Pathbase deployment, run `scripts/test-pathbase-live.sh ` — it does an anon round-trip in a sandboxed config dir and, if you're logged into that URL, an authed pathstash round-trip too. +- `path-cli`: 294 unit + 65 integration tests (import/export/cache, track sessions, merge, validate, roundtrip, render-md snapshots, deprecation aliases, pathbase HTTP mock-server tests, fzf-friendly TSV output, `path resume` orchestration with injectable `ExecStrategy`). 
For an end-to-end check against a real Pathbase deployment, run `scripts/test-pathbase-live.sh ` — it does an anon round-trip in a sandboxed config dir and, if you're logged into that URL, an authed pathstash round-trip too. - `toolpath-cli`: 0 tests (it's a one-line `path_cli::run()` shim crate that exists only so `cargo install toolpath-cli` keeps installing the `path` binary) Validate example documents: `for f in examples/*.json; do cargo run -p path-cli -- p validate --input "$f"; done` @@ -229,7 +229,7 @@ When changing a crate's public API (new types, new trait impls, new public metho The `toolpath-cli` shim lives **outside** the workspace (`exclude = ["crates/toolpath-cli"]` in the root `Cargo.toml`). Both `toolpath-cli` and `path-cli` produce a binary literally named `path`, and cargo can't write two bin targets to the same workspace `target/debug/path` — so the shim opts out and gets its own `crates/toolpath-cli/target/` (covered by the `crates/*/target` line in `.gitignore`). Practical consequences: `cargo build --workspace`, `cargo test --workspace`, and `cargo run -p toolpath-cli` from the repo root **do not** include the shim. To touch it, use `--manifest-path crates/toolpath-cli/Cargo.toml`. The release script special-cases the shim in `get_version` and `publish` so the workflow is otherwise unchanged. -Build the site after changes: `cd site && pnpm run build` (should produce 7 pages). +Build the site after changes: `cd site && pnpm run build` (should produce 11 pages). ## Things to know @@ -242,7 +242,9 @@ Build the site after changes: `cd site && pnpm run build` (should produce 7 page - `toolpath-gemini` treats main file + sibling sub-agent UUID dir as one conversation. Sub-agent files are folded into `DelegatedWork` with populated `turns` (unlike `toolpath-claude`, whose sub-agent turns live in separate session files and stay empty). See `docs/agents/formats/gemini.md` for the full format reference. 
- Provider-specific extras convention: `Turn.extra` and `WatcherEvent::Progress.data` use provider-namespaced keys (e.g. `extra["claude"]`, `extra["gemini"]`). `toolpath-claude` populates `Turn.extra["claude"]` from `ConversationEntry.extra`; `toolpath-gemini` populates `Turn.extra["gemini"]` with the full `tokens` struct, per-thought metadata, and tool-call status. This lets trait-only consumers access provider metadata without importing provider types. - Shared derivation: `toolpath-convo` provides a provider-agnostic `ConversationView → Path` mapping via `toolpath_convo::derive_path`. New conversation providers should build on it rather than re-implementing the mapping. -- Path kinds: `toolpath::v1::PathMeta.kind` is an optional URI naming a hosted kind spec; URIs are immutable and semver-versioned. The only one defined so far is `https://toolpath.net/kinds/agent-coding-session/v1.0.0` (constant `toolpath::v1::PATH_KIND_AGENT_CODING_SESSION`); every conversation → `Path` derivation sets it via the shared `toolpath_convo::derive_path` or each provider crate's own. Carried through the JSONL form via `PathOpen.meta` and `PathMeta` patch lines. Spec sources live in `site/kinds///{index.md,schema.json}` and publish under `https://toolpath.net/kinds/`; the registry index is `site/kinds/index.md`. RFC: "Document Kind". JSON Schema: `$defs/pathMeta`. +- Path kinds: `toolpath::v1::PathMeta.kind` is an optional URI naming a hosted kind spec; URIs are immutable and semver-versioned. The only one defined so far is `https://toolpath.net/kinds/agent-coding-session/v1.1.0` (constant `toolpath::v1::PATH_KIND_AGENT_CODING_SESSION`; `…_V1_0_0` names the superseded URI); every conversation → `Path` derivation sets it via the shared `toolpath_convo::derive_path` or each provider crate's own. Carried through the JSONL form via `PathOpen.meta` and `PathMeta` patch lines. 
Spec sources live in `site/kinds///{index.md,schema.json}` (schema.json is a symlink into `crates/path-cli/kinds/`, which `path p validate` bundles — both versions) and publish under `https://toolpath.net/kinds/`; the registry index is `site/kinds/index.md`. RFC: "Document Kind". JSON Schema: `$defs/pathMeta`. +- Token accounting (kind v1.1.0): two keys on `conversation.append`/`Turn`, both optional. `token_usage` = "the total for a message" (on the group's final step; `Σ` over a path = session total). `attributed_token_usage` = "this step's own attributed spend", populated only where the source genuinely reports per-step spend (its own key, so the sum is unaffected; remainder = group total − Σ attributed, computed not stored). One provider message can span several steps (Claude writes one JSONL line per content block); `Turn.group_id` groups them. `toolpath-claude` fills `group_id` from `message.id` and takes the **field-wise-max** group total (line order not trusted). Claude's per-line `usage` is a cumulative *streaming snapshot* (Anthropic streaming API: `message_start` seeds output near 0, `message_delta` is cumulative), NOT a per-block cost — so Claude emits no `attributed_token_usage`; the projector re-expands the total onto every line. `toolpath-codex` differences the cumulative `total_token_usage` (dedup-safe: never sum `last_token_usage` — Codex re-emits it stale; openai/codex #14489), attributes each per-call delta to the step it follows, and derives the round total from those attributions. pi/opencode decode all-zero wire counters as `None`. Never stamp a cumulative counter, a repeated message total, or zero-filled placeholders onto a step; never derive attribution from Claude's streaming snapshots. +- Token usage `breakdowns` (kind v1.1.0, additive): an optional third key on `TokenUsage` — a decomposition of a top-level class into named sub-classes, keyed by class (e.g. `"output"`), inner map sub-class → tokens (e.g. 
`breakdowns["output"]["reasoning"] = 243`). INFORMATIONAL ONLY: **never summed into any total** (the parent class already counts those tokens, so the session-total guarantee is untouched); invariant `Σ(inner) ≤ parent`; omitted when empty; rides both `token_usage` and `attributed_token_usage`. Per-provider reality: **Gemini** reports `thoughts` (reasoning) as an additive sibling that the derivation used to **drop** (under-counting output) — it's now folded into `output_tokens` *and* recorded as `breakdowns["output"]["reasoning"]`, with the projector un-folding it on the reverse path for a lossless round-trip (`Some(0)` preserved as a real Gemini-3 zero-reasoning signal). **OpenCode** folds `reasoning` into output and records the same breakdown. **Codex** differences `reasoning_output_tokens` (⊆ output, cumulative) into `breakdowns["output"]["reasoning"]` on both per-step `attributed_token_usage` and per-round `token_usage`. **Claude** records no breakdown (its JSONL `usage` doesn't itemize thinking tokens). - Pi provider: `toolpath-pi` reads Pi session JSONL from `~/.pi/agent/sessions/`. Sessions use a tree (id/parentId) in a single file, and may link to a parent file via `parentSession` in the header. The tree is preserved as a DAG in the derived `Path`. - Codex provider: `toolpath-codex` reads Codex CLI rollout files from `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. Sessions are date-bucketed (not project-keyed). File-change fidelity is excellent — Codex's `patch_apply_end` events carry either the unified diff (for updates) or the full file content (for adds), so the derived `Path` gets a real `raw` perspective on every file artifact. See `docs/agents/formats/codex.md` for the full format reference. - opencode provider: `toolpath-opencode` reads a SQLite database at `~/.local/share/opencode/opencode.db` (opened read-only). 
Each session's messages and 12 typed part variants (text, reasoning, tool, step-start/-finish, snapshot, patch, file, agent, subtask, retry, compaction) land as one step per message with tool invocations attached. File diffs come from a sibling bare git repo at `snapshot//[]/` via `git2` tree↔tree diffs — opencode respects the user's `.gitignore`, so changes under gitignored paths fall back to tool-input-derived structural changes with no `raw` perspective. Project id is the SHA of the repo's first root commit. See `docs/agents/formats/opencode.md` for the full format reference. diff --git a/Cargo.lock b/Cargo.lock index 2453f87..77e8650 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -2308,7 +2308,7 @@ dependencies = [ [[package]] name = "path-cli" -version = "0.13.1" +version = "0.14.0" dependencies = [ "anyhow", "assert_cmd", @@ -3885,7 +3885,7 @@ dependencies = [ [[package]] name = "toolpath" -version = "0.6.0" +version = "0.7.0" dependencies = [ "serde", "serde_json", @@ -3893,7 +3893,7 @@ dependencies = [ [[package]] name = "toolpath-claude" -version = "0.11.1" +version = "0.12.0" dependencies = [ "anyhow", "chrono", @@ -3910,7 +3910,7 @@ dependencies = [ [[package]] name = "toolpath-codex" -version = "0.5.0" +version = "0.6.0" dependencies = [ "anyhow", "chrono", @@ -3924,7 +3924,7 @@ dependencies = [ [[package]] name = "toolpath-convo" -version = "0.10.0" +version = "0.11.0" dependencies = [ "chrono", "jsonschema", @@ -3937,7 +3937,7 @@ dependencies = [ [[package]] name = "toolpath-cursor" -version = "0.1.0" +version = "0.2.0" dependencies = [ "anyhow", "chrono", @@ -3953,14 +3953,14 @@ dependencies = [ [[package]] name = "toolpath-dot" -version = "0.4.0" +version = "0.5.0" dependencies = [ "toolpath", ] [[package]] name = "toolpath-gemini" -version = "0.5.0" +version = "0.6.0" dependencies = [ "anyhow", "chrono", @@ -3975,7 +3975,7 @@ dependencies = [ [[package]] name = "toolpath-git" -version = "0.5.0" +version = "0.6.0" dependencies = [ "anyhow", "chrono", 
@@ -3986,7 +3986,7 @@ dependencies = [ [[package]] name = "toolpath-github" -version = "0.5.0" +version = "0.6.0" dependencies = [ "anyhow", "chrono", @@ -3998,7 +3998,7 @@ dependencies = [ [[package]] name = "toolpath-md" -version = "0.6.0" +version = "0.7.0" dependencies = [ "serde_json", "toolpath", @@ -4006,7 +4006,7 @@ dependencies = [ [[package]] name = "toolpath-opencode" -version = "0.4.0" +version = "0.5.0" dependencies = [ "anyhow", "chrono", @@ -4024,7 +4024,7 @@ dependencies = [ [[package]] name = "toolpath-pi" -version = "0.5.0" +version = "0.6.0" dependencies = [ "anyhow", "chrono", diff --git a/Cargo.toml b/Cargo.toml index f92778f..b03c2d2 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -23,19 +23,19 @@ edition = "2024" license = "Apache-2.0" [workspace.dependencies] -toolpath = { version = "0.6.0", path = "crates/toolpath" } -toolpath-convo = { version = "0.10.0", path = "crates/toolpath-convo" } -toolpath-git = { version = "0.5.0", path = "crates/toolpath-git" } -toolpath-claude = { version = "0.11.1", path = "crates/toolpath-claude", default-features = false } -toolpath-gemini = { version = "0.5.0", path = "crates/toolpath-gemini", default-features = false } -toolpath-codex = { version = "0.5.0", path = "crates/toolpath-codex" } -toolpath-opencode = { version = "0.4.0", path = "crates/toolpath-opencode" } -toolpath-cursor = { version = "0.1.0", path = "crates/toolpath-cursor" } -toolpath-github = { version = "0.5.0", path = "crates/toolpath-github" } -toolpath-dot = { version = "0.4.0", path = "crates/toolpath-dot" } -toolpath-md = { version = "0.6.0", path = "crates/toolpath-md" } -toolpath-pi = { version = "0.5.0", path = "crates/toolpath-pi" } -path-cli = { version = "0.13.1", path = "crates/path-cli" } +toolpath = { version = "0.7.0", path = "crates/toolpath" } +toolpath-convo = { version = "0.11.0", path = "crates/toolpath-convo" } +toolpath-git = { version = "0.6.0", path = "crates/toolpath-git" } +toolpath-claude = { version = "0.12.0", path = 
"crates/toolpath-claude", default-features = false } +toolpath-gemini = { version = "0.6.0", path = "crates/toolpath-gemini", default-features = false } +toolpath-codex = { version = "0.6.0", path = "crates/toolpath-codex" } +toolpath-opencode = { version = "0.5.0", path = "crates/toolpath-opencode" } +toolpath-cursor = { version = "0.2.0", path = "crates/toolpath-cursor" } +toolpath-github = { version = "0.6.0", path = "crates/toolpath-github" } +toolpath-dot = { version = "0.5.0", path = "crates/toolpath-dot" } +toolpath-md = { version = "0.7.0", path = "crates/toolpath-md" } +toolpath-pi = { version = "0.6.0", path = "crates/toolpath-pi" } +path-cli = { version = "0.14.0", path = "crates/path-cli" } pathbase-client = { version = "0.2.0", path = "crates/pathbase-client" } reqwest = { version = "0.13", default-features = false, features = ["blocking", "json", "rustls"] } diff --git a/crates/path-cli/Cargo.toml b/crates/path-cli/Cargo.toml index a9160b5..01eea17 100644 --- a/crates/path-cli/Cargo.toml +++ b/crates/path-cli/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "path-cli" -version = "0.13.1" +version = "0.14.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/path-cli/kinds/agent-coding-session/v1.1.0/schema.json b/crates/path-cli/kinds/agent-coding-session/v1.1.0/schema.json new file mode 100644 index 0000000..90e5816 --- /dev/null +++ b/crates/path-cli/kinds/agent-coding-session/v1.1.0/schema.json @@ -0,0 +1,246 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://toolpath.net/kinds/agent-coding-session/v1.1.0/schema.json", + "title": "Toolpath kind: agent-coding-session v1.1.0", + "description": "Additive constraints on a Toolpath `Path` whose `meta.kind` is the agent-coding-session v1.1.0 URI. Apply alongside the base Toolpath schema; the path is valid when both pass. 
These constraints are by structural `type`, not by artifact key — a `change` entry is checked only when its `structural.type` matches one this kind defines (`conversation.append`, `file.write`, `conversation.event`). Everything is additive: unmentioned properties are allowed, so producer-specific extras never invalidate a path. New in v1.1.0: `group_id` on the turn payload, and the group-level accounting rule — within a `group_id` group, the last step (document order) carries the group's total `token_usage` verbatim from the source and the other steps carry none; values are per-group amounts (never cumulative session counters); summing `token_usage` over a path's steps therefore yields the session totals. The once-per-group rule is normative prose (JSON Schema cannot express it); producers must enforce it. The human-readable contract is at https://toolpath.net/kinds/agent-coding-session/v1.1.0/.", + "type": "object", + + "$defs": { + "tokenUsage": { + "type": "object", + "description": "Token accounting for one source group (a message for Claude, a round for Codex). `token_usage` always means the total for a group, verbatim from the source — never a cumulative session counter, never a per-step share. Within a `group_id` group the total sits on the group's last step (document order) and the other steps carry none; a step without a `group_id` is a group of one, so its value is that group's total. Summing over a path's steps therefore yields the session totals. `input_tokens`/`output_tokens` are always emitted (possibly null); cache counters appear only when the source records them.", + "properties": { + "input_tokens": { "type": ["integer", "null"] }, + "output_tokens": { "type": ["integer", "null"] }, + "cache_read_tokens": { "type": "integer" }, + "cache_write_tokens": { "type": "integer" }, + "breakdowns": { + "type": "object", + "description": "Optional decomposition of a top-level class into named sub-classes, keyed by the class being broken down (e.g. 
\"output\"); each value maps sub-class → tokens. INFORMATIONAL: never summed into the total — the parent class already counts these. Invariant: Σ(inner) ≤ the parent class's value.", + "additionalProperties": { "type": "object", "additionalProperties": { "type": "integer" } } + } + }, + "required": ["input_tokens", "output_tokens"] + }, + + "attributedTokenUsage": { + "type": "object", + "description": "This step's own attributed spend, when the source provides step-aligned data — distinct from `token_usage` (the group total). Optional and orthogonal: it rides its own key so summing `token_usage` over steps is unaffected. Within a `group_id` group, the sum of `attributed_token_usage` over its steps is the group's attributed spend; the unattributed remainder (`group token_usage − Σ attributed`) is computed by consumers, never recorded. Same field shape as `tokenUsage`. A producer populates it only when the source genuinely reports per-step spend — among current producers, Codex does (each step is a separate API call with a reported per-call delta); Claude does not (its per-block usage is a cumulative streaming snapshot, not a per-block cost), so Claude-derived steps carry the group total only.", + "properties": { + "input_tokens": { "type": ["integer", "null"] }, + "output_tokens": { "type": ["integer", "null"] }, + "cache_read_tokens": { "type": "integer" }, + "cache_write_tokens": { "type": "integer" } + } + }, + + "toolResult": { + "type": "object", + "properties": { + "content": { "type": "string" }, + "is_error": { "type": "boolean" } + }, + "required": ["content", "is_error"] + }, + + "toolUse": { + "type": "object", + "description": "One tool invocation. `input` is producer-specific JSON (left unconstrained). `category` is Toolpath's classification, or null when the tool is unrecognized. 
`result` is present only when the result was available in the same turn.", + "properties": { + "id": { "type": "string" }, + "name": { "type": "string" }, + "input": true, + "category": { + "type": ["string", "null"], + "enum": [ + "file_read", + "file_write", + "file_search", + "shell", + "network", + "delegation", + null + ] + }, + "result": { "$ref": "#/$defs/toolResult" } + }, + "required": ["id", "name", "input", "category"] + }, + + "environment": { + "type": "object", + "description": "Working environment captured at the turn. All fields optional.", + "properties": { + "working_dir": { "type": "string" }, + "vcs_branch": { "type": "string" }, + "vcs_revision": { "type": "string" } + } + }, + + "delegation": { + "type": "object", + "description": "Sub-agent work spawned from a turn. `turns` carries the sub-agent's own turns when the producer inlines them.", + "properties": { + "agent_id": { "type": "string" }, + "prompt": { "type": "string" }, + "turns": { "type": "array" }, + "result": { "type": "string" } + }, + "required": ["agent_id", "prompt"] + }, + + "conversationAppend": { + "type": "object", + "description": "The turn payload: the `structural` object of the one `change` entry whose `type` is `conversation.append`. `role` and `text` are always present (text may be empty); everything else appears only when the turn carries it. `group_id` is the provider's identifier for the source accounting unit this turn was derived from — a message for Claude (`message.id`), a round for Codex (`turn_id`). 
A grouping key, not a step identifier: steps sharing a `group_id` came from one accounting unit (Claude Code writes one JSONL line per content block; a Codex round emits several turns).", + "properties": { + "type": { "const": "conversation.append" }, + "role": { "type": "string" }, + "text": { "type": "string" }, + "thinking": { "type": "string" }, + "group_id": { "type": "string" }, + "tool_uses": { + "type": "array", + "items": { "$ref": "#/$defs/toolUse" } + }, + "token_usage": { "$ref": "#/$defs/tokenUsage" }, + "attributed_token_usage": { "$ref": "#/$defs/attributedTokenUsage" }, + "stop_reason": { "type": "string" }, + "delegations": { + "type": "array", + "items": { "$ref": "#/$defs/delegation" } + }, + "environment": { "$ref": "#/$defs/environment" } + }, + "required": ["type", "role", "text"] + }, + + "fileWrite": { + "type": "object", + "description": "The `structural` object of a sibling `file.write` change keyed by file path. The unified diff (when present) lives on the artifact change's `raw`, not here. `tool_id`/`tool` link the mutation to the `ToolInvocation` that caused it when attributable.", + "properties": { + "type": { "const": "file.write" }, + "tool_id": { "type": "string" }, + "tool": { "type": "string" }, + "operation": { "type": "string" }, + "before": { "type": "string" }, + "after": { "type": "string" }, + "rename_to": { "type": "string" } + }, + "required": ["type"] + }, + + "conversationEvent": { + "type": "object", + "description": "The `structural` object of a `conversation.event` change — a non-turn entry (attachment, preamble line, snapshot, …) preserved for round-trip fidelity. 
`entry_type` names the source entry kind; the producer's flattened event data rides alongside.", + "properties": { + "type": { "const": "conversation.event" }, + "entry_type": { "type": "string" }, + "event_source_id": { "type": "string" } + }, + "required": ["type", "entry_type"] + }, + + "artifactChange": { + "type": "object", + "description": "An artifact change, constrained only when its `structural.type` is one this kind defines.", + "allOf": [ + { + "if": { + "type": "object", + "properties": { + "structural": { + "type": "object", + "properties": { "type": { "const": "conversation.append" } }, + "required": ["type"] + } + }, + "required": ["structural"] + }, + "then": { + "properties": { + "structural": { "$ref": "#/$defs/conversationAppend" } + } + } + }, + { + "if": { + "type": "object", + "properties": { + "structural": { + "type": "object", + "properties": { "type": { "const": "file.write" } }, + "required": ["type"] + } + }, + "required": ["structural"] + }, + "then": { + "properties": { + "structural": { "$ref": "#/$defs/fileWrite" } + } + } + }, + { + "if": { + "type": "object", + "properties": { + "structural": { + "type": "object", + "properties": { "type": { "const": "conversation.event" } }, + "required": ["type"] + } + }, + "required": ["structural"] + }, + "then": { + "properties": { + "structural": { "$ref": "#/$defs/conversationEvent" } + } + } + } + ] + } + }, + + "properties": { + "meta": { + "type": "object", + "description": "Path metadata. 
`kind` pins this spec; `source` names the producing harness; `producer`/`files_changed`/`vcs_remote` are flattened session-level fields (PathMeta carries `extra` via serde flatten, so they sit directly under `meta`, not under `meta.extra`).", + "properties": { + "kind": { + "const": "https://toolpath.net/kinds/agent-coding-session/v1.1.0" + }, + "source": { "type": "string" }, + "files_changed": { + "type": "array", + "items": { "type": "string" } + }, + "vcs_remote": { "type": "string" }, + "producer": { + "type": "object", + "properties": { + "name": { "type": "string" }, + "version": { "type": "string" } + }, + "required": ["name"] + } + }, + "required": ["kind"] + }, + + "steps": { + "type": "array", + "items": { + "type": "object", + "properties": { + "change": { + "type": "object", + "additionalProperties": { "$ref": "#/$defs/artifactChange" } + } + } + } + } + }, + + "required": ["meta"] +} diff --git a/crates/path-cli/src/schema.rs b/crates/path-cli/src/schema.rs index 2c035d1..381ef7d 100644 --- a/crates/path-cli/src/schema.rs +++ b/crates/path-cli/src/schema.rs @@ -24,10 +24,16 @@ const SCHEMA_SOURCE: &str = toolpath::SCHEMA_JSON; /// `meta.kind` URI → bundled kind-schema source. Bundled (rather than /// fetched from `toolpath.net` at validation time) so validation is /// offline and deterministic. 
-const KIND_SCHEMAS: &[(&str, &str)] = &[( - "https://toolpath.net/kinds/agent-coding-session/v1.0.0", - include_str!("../kinds/agent-coding-session/v1.0.0/schema.json"), -)]; +const KIND_SCHEMAS: &[(&str, &str)] = &[ + ( + "https://toolpath.net/kinds/agent-coding-session/v1.0.0", + include_str!("../kinds/agent-coding-session/v1.0.0/schema.json"), + ), + ( + "https://toolpath.net/kinds/agent-coding-session/v1.1.0", + include_str!("../kinds/agent-coding-session/v1.1.0/schema.json"), + ), +]; fn validator() -> &'static Validator { static VALIDATOR: OnceLock<Validator> = OnceLock::new(); @@ -223,7 +229,7 @@ mod tests { validate(&doc).expect("base is optional on path identity"); } - const ACS_KIND: &str = "https://toolpath.net/kinds/agent-coding-session/v1.0.0"; + const ACS_KIND: &str = "https://toolpath.net/kinds/agent-coding-session/v1.1.0"; fn acs_graph(append: serde_json::Value) -> serde_json::Value { json!({ diff --git a/crates/path-cli/tests/cross_harness_matrix.rs b/crates/path-cli/tests/cross_harness_matrix.rs index 9c11aee..34e55fe 100644 --- a/crates/path-cli/tests/cross_harness_matrix.rs +++ b/crates/path-cli/tests/cross_harness_matrix.rs @@ -639,23 +639,31 @@ mod invariants { after_target: &ConversationView, failures: &mut Vec<String>, ) { - let pre: Vec<&Turn> = before_target - .turns - .iter() - .filter(|t| matches!(t.role, Role::Assistant)) - .collect(); - let post: Vec<&Turn> = after_target - .turns - .iter() - .filter(|t| matches!(t.role, Role::Assistant)) - .collect(); - for (i, (a, b)) in pre.iter().zip(post.iter()).enumerate() { - if a.token_usage.is_some() && b.token_usage.is_none() { - failures.push(format!( - "token_usage at assistant #{} dropped (had {:?})", - i, a.token_usage - )); - } + // Harnesses legitimately fold or split turns in translation + // (e.g. thinking-only claude turns merge into codex `reasoning` + // lines), so assistant indexes don't align across harnesses.
+ // The accounting invariant is order-preserving instead: the + // sequence of usage values on assistant turns survives, compared + // on input/output — the fields every wire carries (codex has no + // cache_write analog, cursor carries no cache counters at all). + let usage_seq = |v: &ConversationView| -> Vec<(Option<u32>, Option<u32>)> { + v.turns + .iter() + .filter(|t| matches!(t.role, Role::Assistant)) + .filter_map(|t| t.token_usage.as_ref()) + .map(|u| (u.input_tokens, u.output_tokens)) + .collect() + }; + let pre = usage_seq(before_target); + let post = usage_seq(after_target); + if pre != post { + failures.push(format!( + "assistant usage sequence diverged ({} -> {} entries)\n first: {:?}\n second: {:?}", + pre.len(), + post.len(), + pre, + post + )); } if before_target.total_usage.is_some() && after_target.total_usage.is_none() { failures.push(format!( diff --git a/crates/toolpath-claude/Cargo.toml b/crates/toolpath-claude/Cargo.toml index de1ca31..5f46341 100644 --- a/crates/toolpath-claude/Cargo.toml +++ b/crates/toolpath-claude/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-claude" -version = "0.11.1" +version = "0.12.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-claude/src/project.rs b/crates/toolpath-claude/src/project.rs index ef27332..85969d7 100644 --- a/crates/toolpath-claude/src/project.rs +++ b/crates/toolpath-claude/src/project.rs @@ -105,6 +105,24 @@ fn project_view(view: &ConversationView) -> std::result::Result = HashMap::new(); + // Message-group accounting. The IR carries a message's total + // `token_usage` only on the group's final turn; real Claude JSONL stamps + // usage on every line of a split (streaming snapshots that climb to the + // total — see `provider::canonicalize_message_usage`). We re-expand the + // group total onto every line, matching the common after-generation + // pattern where the lines repeat one value.
(We don't reconstruct the + // intermediate streaming snapshots: they carry no per-step meaning, the + // IR doesn't retain them, and the final total is what consumers sum.) + let mut group_total: HashMap<&str, toolpath_convo::TokenUsage> = HashMap::new(); + for turn in &view.turns { + if let (Some(mid), Some(usage)) = (turn.group_id.as_deref(), &turn.token_usage) { + group_total + .entry(mid) + .and_modify(|acc| *acc = crate::provider::max_usage(acc, usage)) + .or_insert_with(|| usage.clone()); + } + } + for turn in &view.turns { // Pre-rewrite this turn's parent_id if a synthesized tool_result // was emitted between it and its IR-recorded parent. @@ -122,7 +140,15 @@ fn project_view(view: &ConversationView) -> std::result::Result { - let mut assistant_entry = assistant_turn_to_entry(turn, &view.id); + // Grouped: the message total on every line of the split. + // Ungrouped: the turn's own usage. + let wire_usage: Option<toolpath_convo::TokenUsage> = match turn.group_id.as_deref() + { + Some(mid) => group_total.get(mid).cloned(), + None => turn.token_usage.clone(), + }; + let mut assistant_entry = + assistant_turn_to_entry_with_usage(turn, &view.id, wire_usage.as_ref()); apply_turn_metadata(&mut assistant_entry, turn); assistant_entry.parent_uuid = effective_parent; convo.add_entry(assistant_entry); @@ -341,11 +367,19 @@ fn user_turn_to_entry(turn: &Turn, session_id: &str) -> ConversationEntry { } } -/// Build a `ConversationEntry` for an assistant turn. -fn assistant_turn_to_entry(turn: &Turn, session_id: &str) -> ConversationEntry { +/// Build a `ConversationEntry` for an assistant turn. `wire_usage` is the +/// usage to write on the JSONL line: the IR carries a message's total only +/// on the group's final turn, but real Claude Code repeats `message.usage` +/// on every line of a split message, so `project_view` passes the group +/// total for every member turn.
+fn assistant_turn_to_entry_with_usage( + turn: &Turn, + session_id: &str, + wire_usage: Option<&toolpath_convo::TokenUsage>, +) -> ConversationEntry { let content = build_assistant_content(turn); - let usage = turn.token_usage.as_ref().map(|u| Usage { + let usage = wire_usage.map(|u| Usage { input_tokens: u.input_tokens, output_tokens: u.output_tokens, // TokenUsage uses cache_write_tokens; Usage uses cache_creation_input_tokens @@ -368,7 +402,7 @@ fn assistant_turn_to_entry(turn: &Turn, session_id: &str) -> ConversationEntry { role: MessageRole::Assistant, content: Some(content), model: turn.model.clone(), - id: None, + id: turn.group_id.clone(), message_type: None, stop_reason: turn.stop_reason.clone(), stop_sequence: None, @@ -1001,6 +1035,7 @@ mod tests { Turn { id: id.to_string(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2024-01-01T00:00:00Z".to_string(), text: text.to_string(), @@ -1009,6 +1044,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -1019,6 +1055,7 @@ mod tests { Turn { id: id.to_string(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2024-01-01T00:00:01Z".to_string(), text: text.to_string(), @@ -1027,6 +1064,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -1038,6 +1076,81 @@ mod tests { &convo.entries } + // ── Message-group usage re-expansion ───────────────────────────── + + #[test] + fn test_projector_reexpands_group_usage_onto_every_line() { + // The IR carries a message's total only on the group's final turn; + // real Claude Code JSONL repeats `message.usage` (and `message.id`) + // on every line of the split. The projector must restore that. 
+ let usage = toolpath_convo::TokenUsage { + input_tokens: Some(6), + output_tokens: Some(997), + cache_read_tokens: Some(14_842), + cache_write_tokens: Some(429_831), + ..Default::default() + }; + let mut a1 = assistant_turn("a1", "Working on it."); + a1.group_id = Some("msg_A".into()); + let mut a2 = assistant_turn("a2", ""); + a2.group_id = Some("msg_A".into()); + a2.token_usage = Some(usage); + + let view = make_view("sess-1", vec![user_turn("u1", "Go"), a1, a2]); + let convo = ClaudeProjector.project(&view).unwrap(); + + let assistants: Vec<&ConversationEntry> = content_entries(&convo) + .iter() + .filter(|e| e.entry_type == "assistant") + .collect(); + assert_eq!(assistants.len(), 2); + for entry in &assistants { + let msg = entry.message.as_ref().unwrap(); + assert_eq!(msg.id.as_deref(), Some("msg_A")); + let u = msg.usage.as_ref().expect("every line carries usage"); + assert_eq!(u.output_tokens, Some(997)); + assert_eq!(u.cache_creation_input_tokens, Some(429_831)); + } + } + + #[test] + fn test_group_total_survives_projector_roundtrip() { + // The IR carries a message's total on the group's final turn only. + // The projector re-expands it onto every line of the split (with the + // shared message.id); re-reading collapses it back to the same total + // on the final turn. Summing per step yields the session total + // either way. + let mut a1 = assistant_turn("a1", "first"); + a1.group_id = Some("msg_A".into()); + let mut a2 = assistant_turn("a2", "second"); + a2.group_id = Some("msg_A".into()); + a2.token_usage = Some(toolpath_convo::TokenUsage { + input_tokens: Some(6), + output_tokens: Some(164), + cache_read_tokens: Some(100), + cache_write_tokens: Some(200), + ..Default::default() + }); + + let view = make_view("sess-1", vec![user_turn("u1", "Go"), a1, a2]); + let convo = ClaudeProjector.project(&view).unwrap(); + + // Wire: the total is stamped on every line of the split, each tagged + // with the shared message.id. 
+ for entry in content_entries(&convo).iter().filter(|e| e.entry_type == "assistant") { + let msg = entry.message.as_ref().unwrap(); + assert_eq!(msg.id.as_deref(), Some("msg_A")); + assert_eq!(msg.usage.as_ref().unwrap().output_tokens, Some(164)); + } + + // Re-read: total back on the final turn only; no fabricated attribution. + let back = crate::provider::to_view(&convo); + let a: Vec<&Turn> = back.turns.iter().filter(|t| t.role == Role::Assistant).collect(); + assert!(a[0].token_usage.is_none()); + assert_eq!(a[1].token_usage.as_ref().unwrap().output_tokens, Some(164)); + assert!(a.iter().all(|t| t.attributed_token_usage.is_none())); + } + // ── Permission-mode preamble ───────────────────────────────────── #[test] @@ -1258,6 +1371,7 @@ mod tests { output_tokens: Some(50), cache_read_tokens: Some(500), // → cache_read_input_tokens cache_write_tokens: Some(200), // → cache_creation_input_tokens + ..Default::default() }); let view = make_view("sess-1", vec![turn]); diff --git a/crates/toolpath-claude/src/provider.rs b/crates/toolpath-claude/src/provider.rs index a619e8c..1896d1e 100644 --- a/crates/toolpath-claude/src/provider.rs +++ b/crates/toolpath-claude/src/provider.rs @@ -105,6 +105,7 @@ fn message_to_turn(entry: &ConversationEntry, msg: &Message) -> Turn { output_tokens: u.output_tokens, cache_read_tokens: u.cache_read_input_tokens, cache_write_tokens: u.cache_creation_input_tokens, + ..Default::default() }); let environment = if entry.cwd.is_some() || entry.git_branch.is_some() { @@ -122,6 +123,11 @@ fn message_to_turn(entry: &ConversationEntry, msg: &Message) -> Turn { Turn { id: entry.uuid.clone(), parent_id: entry.parent_uuid.clone(), + // The API message ID (`msg_…`). Claude Code writes one JSONL line + // per content block, so several turns can share one group_id — + // and each repeats the message-level `usage`. Downstream accounting + // (sum_usage, derive_path) counts a message group once. 
+ group_id: msg.id.clone(), role: claude_role_to_role(&msg.role), timestamp: entry.timestamp.clone(), text, @@ -130,6 +136,7 @@ fn message_to_turn(entry: &ConversationEntry, msg: &Message) -> Turn { model: msg.model.clone(), stop_reason: msg.stop_reason.clone(), token_usage, + attributed_token_usage: None, environment, delegations, file_mutations, @@ -372,6 +379,8 @@ fn conversation_to_view(convo: &Conversation) -> ConversationView { turns.push(turn); } + canonicalize_message_usage(&mut turns); + // Re-derive delegation results now that tool results are merged for turn in &mut turns { for delegation in &mut turn.delegations { @@ -517,11 +526,93 @@ fn entry_to_event(entry: &ConversationEntry) -> toolpath_convo::ConversationEven } } +/// Field-wise maximum of two usage tuples. `None` is "absent", not 0, so a +/// field present in only one operand survives. +pub(crate) fn max_usage(a: &TokenUsage, b: &TokenUsage) -> TokenUsage { + fn m(x: Option<u32>, y: Option<u32>) -> Option<u32> { + match (x, y) { + (Some(a), Some(b)) => Some(a.max(b)), + (Some(v), None) | (None, Some(v)) => Some(v), + (None, None) => None, + } + } + TokenUsage { + input_tokens: m(a.input_tokens, b.input_tokens), + output_tokens: m(a.output_tokens, b.output_tokens), + cache_read_tokens: m(a.cache_read_tokens, b.cache_read_tokens), + cache_write_tokens: m(a.cache_write_tokens, b.cache_write_tokens), + ..Default::default() + } +} + +/// Canonicalize message-level accounting for split messages. +/// +/// Claude Code writes one JSONL line per content block of an assistant API +/// message, each stamped with `message.usage`. That `usage` is a **streaming +/// snapshot**, not a per-line bill: per the Anthropic streaming API, +/// `message_start` seeds `output_tokens` near zero and each `message_delta` +/// reports the running **cumulative** total, with the final value being the +/// message total.
So across a split message's lines, `input`/`cache` are +/// constant and `output_tokens` climbs to the total on the final line — +/// confirmed across every session sampled (~27% of multi-line messages vary; +/// the rest repeat one value stamped after generation). The intermediate +/// values are flush-time snapshots, **not** per-content-block costs (a real +/// prose block routinely shows `output_tokens: 1`), so we do not derive +/// per-step attribution from them, and — the format being undocumented — we +/// do not trust line order. +/// +/// For each consecutive `group_id` run this sets `token_usage` on the run's +/// **final** turn to the field-wise **maximum** across the run (the message +/// total — never under-counts whatever the stream order) and clears it from +/// the others, so summing `token_usage` over turns yields session totals. +fn canonicalize_message_usage(turns: &mut [Turn]) { + let mut i = 0; + while i < turns.len() { + let Some(mid) = turns[i].group_id.clone() else { + i += 1; + continue; + }; + let mut j = i; + while j < turns.len() && turns[j].group_id.as_deref() == Some(mid.as_str()) { + j += 1; + } + + // Message total = field-wise max across the run (the final streaming + // snapshot, found without trusting line order). + let mut total: Option<TokenUsage> = None; + for t in &turns[i..j] { + if let Some(u) = &t.token_usage { + total = Some(match total { + Some(acc) => max_usage(&acc, u), + None => u.clone(), + }); + } + } + + for t in &mut turns[i..j] { + t.token_usage = None; + } + if let Some(total) = total { + turns[j - 1].token_usage = Some(total); + } + i = j; + } +} + /// Sum token usage across all turns. fn sum_usage(turns: &[Turn]) -> Option<TokenUsage> { let mut total = TokenUsage::default(); let mut any = false; - for turn in turns { + for (idx, turn) in turns.iter().enumerate() { + // Turns split from one provider message all repeat that message's + // usage; count it once, on the run's last turn.
+ if let Some(mid) = &turn.group_id + && turns + .get(idx + 1) + .is_some_and(|next| next.group_id.as_ref() == Some(mid)) + { + continue; + } if let Some(u) = &turn.token_usage { any = true; total.input_tokens = @@ -744,6 +835,96 @@ mod tests { use std::fs; use tempfile::TempDir; + /// One assistant turn carrying a cumulative usage snapshot (only + /// output varies across a split, so input/cache are fixed here). + fn grp_turn(id: &str, mid: &str, output: u32) -> Turn { + let mut t = message_turn_stub(id); + t.group_id = Some(mid.into()); + t.token_usage = Some(TokenUsage { + input_tokens: Some(6), + output_tokens: Some(output), + cache_read_tokens: Some(14_842), + cache_write_tokens: Some(429_831), + ..Default::default() + }); + t + } + + fn message_turn_stub(id: &str) -> Turn { + Turn { + id: id.into(), + parent_id: None, + group_id: None, + role: Role::Assistant, + timestamp: "2024-01-01T00:00:00Z".into(), + text: String::new(), + thinking: None, + tool_uses: vec![], + model: None, + stop_reason: None, + token_usage: None, + attributed_token_usage: None, + environment: None, + delegations: vec![], + file_mutations: vec![], + } + } + + #[test] + fn canonicalize_streamed_group_keeps_total_only_on_final_turn() { + // Streaming snapshots climb 55 -> 164 across two lines of one + // message. The final turn carries the message total (the final + // snapshot); earlier turns carry nothing. The intermediate snapshot + // (55) is NOT per-block attribution — it's where generation happened + // to be when the line was flushed — so we never record it. 
+ let mut turns = vec![grp_turn("t1", "msg_A", 55), grp_turn("t2", "msg_A", 164)]; + canonicalize_message_usage(&mut turns); + + assert!(turns[0].token_usage.is_none(), "total only on final turn"); + assert_eq!(turns[1].token_usage.as_ref().unwrap().output_tokens, Some(164)); + assert_eq!(turns[1].token_usage.as_ref().unwrap().input_tokens, Some(6)); + for t in &turns { + assert!( + t.attributed_token_usage.is_none(), + "Claude per-line snapshots are not per-step attribution" + ); + } + } + + #[test] + fn canonicalize_does_not_trust_line_order() { + // Defensive: the complete total arrives FIRST (out of order). We + // must still report 164 as the message total — the field-wise max, + // not the last line's snapshot. + let mut turns = vec![grp_turn("t1", "msg_A", 164), grp_turn("t2", "msg_A", 55)]; + canonicalize_message_usage(&mut turns); + + assert_eq!( + turns[1].token_usage.as_ref().unwrap().output_tokens, + Some(164), + "field-wise max, not the last line" + ); + } + + #[test] + fn canonicalize_collapses_repeated_total_to_one_turn() { + // Byte-identical lines (the ~73% case): the total lands once, on the + // final turn; no attribution either way. + let mut turns = vec![ + grp_turn("t1", "msg_A", 997), + grp_turn("t2", "msg_A", 997), + grp_turn("t3", "msg_A", 997), + ]; + canonicalize_message_usage(&mut turns); + + assert!(turns[0].token_usage.is_none()); + assert!(turns[1].token_usage.is_none()); + assert_eq!(turns[2].token_usage.as_ref().unwrap().output_tokens, Some(997)); + for t in &turns { + assert!(t.attributed_token_usage.is_none()); + } + } + fn setup_provider() -> (TempDir, ClaudeConvo) { let temp = TempDir::new().unwrap(); let claude_dir = temp.path().join(".claude"); @@ -765,6 +946,86 @@ mod tests { (temp, ClaudeConvo::with_resolver(resolver)) } + /// A session whose first assistant API message is split across three + /// JSONL lines (text, then one per tool_use) — the on-disk shape Claude + /// Code writes. 
Each line repeats the same `message.id` and the full + /// message-level `usage`, followed by a singleton assistant message. + fn setup_split_message_provider() -> (TempDir, ClaudeConvo) { + let temp = TempDir::new().unwrap(); + let claude_dir = temp.path().join(".claude"); + let project_dir = claude_dir.join("projects/-test-project"); + fs::create_dir_all(&project_dir).unwrap(); + + let usage_a = r#"{"input_tokens":6,"output_tokens":997,"cache_read_input_tokens":14842,"cache_creation_input_tokens":429831}"#; + let entries = [ + r#"{"uuid":"uuid-1","type":"user","timestamp":"2024-01-01T00:00:00Z","message":{"role":"user","content":"Fix the bug"}}"#.to_string(), + format!( + r#"{{"uuid":"uuid-2","type":"assistant","parentUuid":"uuid-1","timestamp":"2024-01-01T00:00:01Z","message":{{"id":"msg_A","role":"assistant","content":[{{"type":"text","text":"Working on it."}}],"model":"claude-opus-4-7","stop_reason":null,"usage":{usage_a}}}}}"# + ), + format!( + r#"{{"uuid":"uuid-3","type":"assistant","parentUuid":"uuid-2","timestamp":"2024-01-01T00:00:02Z","message":{{"id":"msg_A","role":"assistant","content":[{{"type":"tool_use","id":"t1","name":"Read","input":{{"file_path":"a.rs"}}}}],"model":"claude-opus-4-7","stop_reason":null,"usage":{usage_a}}}}}"# + ), + format!( + r#"{{"uuid":"uuid-4","type":"assistant","parentUuid":"uuid-3","timestamp":"2024-01-01T00:00:03Z","message":{{"id":"msg_A","role":"assistant","content":[{{"type":"tool_use","id":"t2","name":"Read","input":{{"file_path":"b.rs"}}}}],"model":"claude-opus-4-7","stop_reason":"tool_use","usage":{usage_a}}}}}"# + ), + r#"{"uuid":"uuid-5","type":"assistant","parentUuid":"uuid-4","timestamp":"2024-01-01T00:00:04Z","message":{"id":"msg_B","role":"assistant","content":[{"type":"text","text":"Done."}],"model":"claude-opus-4-7","stop_reason":"end_turn","usage":{"input_tokens":5,"output_tokens":11}}}"#.to_string(), + ]; + fs::write(project_dir.join("session-2.jsonl"), entries.join("\n")).unwrap(); + + let resolver = 
PathResolver::new().with_claude_dir(&claude_dir); + (temp, ClaudeConvo::with_resolver(resolver)) + } + + #[test] + fn test_split_message_turns_share_group_id() { + let (_temp, provider) = setup_split_message_provider(); + let view = ConversationProvider::load_conversation(&provider, "/test/project", "session-2") + .unwrap(); + + assert_eq!(view.turns.len(), 5); + assert!(view.turns[0].group_id.is_none(), "user lines carry no ID"); + for turn in &view.turns[1..=3] { + assert_eq!(turn.group_id.as_deref(), Some("msg_A")); + } + assert_eq!(view.turns[4].group_id.as_deref(), Some("msg_B")); + } + + #[test] + fn test_view_usage_is_canonical_total_on_group_final_turn() { + // IR contract: `Turn.token_usage` always means "the message's + // total" and appears only on the message's final turn. The wire + // repeats the total on every line of a split; the view must not. + let (_temp, provider) = setup_split_message_provider(); + let view = ConversationProvider::load_conversation(&provider, "/test/project", "session-2") + .unwrap(); + + assert!(view.turns[1].token_usage.is_none()); + assert!(view.turns[2].token_usage.is_none()); + assert_eq!( + view.turns[3].token_usage.as_ref().unwrap().output_tokens, + Some(997) + ); + assert_eq!( + view.turns[4].token_usage.as_ref().unwrap().output_tokens, + Some(11) + ); + } + + #[test] + fn test_total_usage_counts_each_message_once() { + let (_temp, provider) = setup_split_message_provider(); + let view = ConversationProvider::load_conversation(&provider, "/test/project", "session-2") + .unwrap(); + + // msg_A's usage appears on three lines but is one API message; + // totals must be msg_A + msg_B, not 3×msg_A + msg_B. 
+ let total = view.total_usage.as_ref().unwrap(); + assert_eq!(total.output_tokens, Some(997 + 11)); + assert_eq!(total.input_tokens, Some(6 + 5)); + assert_eq!(total.cache_read_tokens, Some(14_842)); + assert_eq!(total.cache_write_tokens, Some(429_831)); + } + #[test] fn test_load_conversation_assembles_tool_results() { let (_temp, provider) = setup_provider(); @@ -1166,6 +1427,7 @@ mod tests { let mut turns = vec![Turn { id: "t1".into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2024-01-01T00:00:00Z".into(), text: "test".into(), @@ -1189,6 +1451,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), diff --git a/crates/toolpath-cli/Cargo.toml b/crates/toolpath-cli/Cargo.toml index f103325..da3b709 100644 --- a/crates/toolpath-cli/Cargo.toml +++ b/crates/toolpath-cli/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-cli" -version = "0.13.1" +version = "0.14.0" edition = "2024" license = "Apache-2.0" repository = "https://github.com/empathic/toolpath" @@ -14,7 +14,7 @@ name = "path" path = "src/main.rs" [dependencies] -path-cli = { path = "../path-cli", version = "0.13.1" } +path-cli = { path = "../path-cli", version = "0.14.0" } anyhow = "1.0" [workspace] diff --git a/crates/toolpath-codex/Cargo.toml b/crates/toolpath-codex/Cargo.toml index 67c01d4..edf70bb 100644 --- a/crates/toolpath-codex/Cargo.toml +++ b/crates/toolpath-codex/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-codex" -version = "0.5.0" +version = "0.6.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-codex/src/project.rs b/crates/toolpath-codex/src/project.rs index 8efc2c1..2ec3dd5 100644 --- a/crates/toolpath-codex/src/project.rs +++ b/crates/toolpath-codex/src/project.rs @@ -139,18 +139,6 @@ fn project_view( // Line 1: session_meta. Codex always writes this first. 
lines.push(make_session_meta_line(cfg, view, &session_timestamp, &cwd)); - // Line 2: turn_context. Codex writes one per turn; we emit a - // single one up front since cross-harness turns don't change - // per-turn context. (For Codex→View→Codex, the original per-turn - // contexts live on `view.events` and are NOT round-tripped today - // — a pragmatic loss for cross-harness output.) - lines.push(make_turn_context_line( - view, - &session_timestamp, - &cwd, - &model, - )); - // Find the last assistant turn so we can mark it `phase: "final"`. // Codex annotates every other assistant turn with `phase: "commentary"`, // matching what real rollouts look like. @@ -159,10 +147,46 @@ fn project_view( .iter() .rposition(|t| matches!(t.role, Role::Assistant)); + // A turn's group ID is its `group_id`; an assistant turn without one + // is its own group (a unique synthesized ID) so its own total survives. + let group_of = |idx: usize, turn: &Turn| -> String { + turn.group_id + .clone() + .unwrap_or_else(|| format!("{}-t{}", view.id, idx)) + }; + + // Line 2: an opening turn_context (real Codex writes it right after + // session_meta, before the first user turn). Its turn_id is the first + // group's, so leading user turns and the first assistant share it; later + // group boundaries emit their own. This is what makes the source's + // grouping survive the round-trip — the reader keys `Turn.group_id` + // off the turn_context `turn_id`. + let first_group = view + .turns + .iter() + .enumerate() + .find(|(_, t)| matches!(t.role, Role::Assistant)) + .map(|(i, t)| group_of(i, t)) + .unwrap_or_else(|| view.id.clone()); + lines.push(make_turn_context_line(&first_group, &session_timestamp, &cwd, &model)); + let mut current_group = Some(first_group); + + // Running session-cumulative usage. 
Codex's `total_token_usage` is + // cumulative; we advance it by each turn's per-step contribution and + // emit it after the turn, so a re-read differences it back to the same + // per-step spend. + let mut running = toolpath_convo::TokenUsage::default(); for (idx, turn) in view.turns.iter().enumerate() { + if matches!(turn.role, Role::Assistant) { + let group = group_of(idx, turn); + if current_group.as_deref() != Some(&group) { + lines.push(make_turn_context_line(&group, &turn.timestamp, &cwd, &model)); + current_group = Some(group); + } + } let codex = codex_extras(turn).cloned().unwrap_or_default(); let is_final_assistant = Some(idx) == last_assistant_idx; - emit_turn_lines(turn, &codex, is_final_assistant, &cwd, &mut lines); + emit_turn_lines(turn, &codex, is_final_assistant, &cwd, &mut lines, &mut running); } Ok(crate::types::Session { @@ -211,14 +235,13 @@ fn make_session_meta_line( } fn make_turn_context_line( - view: &ConversationView, + turn_id: &str, timestamp: &str, cwd: &str, model: &str, ) -> RolloutLine { - let turn_id = view.id.clone(); let tc = TurnContext { - turn_id, + turn_id: turn_id.to_string(), cwd: PathBuf::from(cwd), current_date: None, timezone: None, @@ -251,10 +274,13 @@ fn emit_turn_lines( is_final_assistant: bool, session_cwd: &str, lines: &mut Vec<RolloutLine>, + running: &mut toolpath_convo::TokenUsage, ) { match &turn.role { Role::User => emit_user_message(turn, lines), - Role::Assistant => emit_assistant(turn, codex, is_final_assistant, session_cwd, lines), + Role::Assistant => { + emit_assistant(turn, codex, is_final_assistant, session_cwd, lines, running) + } Role::System => emit_developer_message(turn, lines), Role::Other(_) => { // Unknown roles don't have a clean Codex analog; emit them @@ -326,6 +352,7 @@ fn emit_assistant( is_final_assistant: bool, session_cwd: &str, lines: &mut Vec<RolloutLine>, + running: &mut toolpath_convo::TokenUsage, ) { // Order matches what Codex itself emits per turn: // reasoning?
→ message → (function_call → function_call_output)* @@ -376,23 +403,6 @@ fn emit_assistant( )); } - // The forward path's `pending_token_usage` attaches to the next turn - // pushed, so this `token_count` event must precede the assistant - // message line below. - if let Some(usage) = &turn.token_usage { - lines.push(event_msg_line( - &turn.timestamp, - json!({ - "type": "token_count", - "info": { - "total_token_usage": convo_usage_to_codex_json(usage), - "last_token_usage": convo_usage_to_codex_json(usage), - }, - "rate_limits": Value::Null, - }), - )); - } - // The TUI gates scrollback rendering on `final_answer` exactly — // `final` would silently drop the closing message from view. let phase = Some(if is_final_assistant { @@ -447,6 +457,42 @@ fn emit_assistant( let name = tool_native_name(tu); emit_tool_call(turn, tu, &name, &tool_extras, session_cwd, lines); } + + // Advance the session-cumulative counter by this step's contribution + // (its attributed per-step spend, or its group total when no per-step + // split exists), then emit `token_count` AFTER the turn — the reader + // differences the cumulative and attributes the delta to the step it + // follows. Mirrors how real Codex streams cumulative counts per step. + if let Some(contribution) = turn + .attributed_token_usage + .as_ref() + .or(turn.token_usage.as_ref()) + { + add_codex_usage(running, contribution); + lines.push(event_msg_line( + &turn.timestamp, + json!({ + "type": "token_count", + "info": { + "total_token_usage": convo_usage_to_codex_json(running), + }, + "rate_limits": Value::Null, + }), + )); + } +} + +/// Component-wise `acc += delta` on the convo usage type (None as 0). 
+fn add_codex_usage(acc: &mut toolpath_convo::TokenUsage, delta: &toolpath_convo::TokenUsage) { + let add = |a: &mut Option<u32>, b: Option<u32>| { + if let Some(b) = b { + *a = Some(a.unwrap_or(0) + b); + } + }; + add(&mut acc.input_tokens, delta.input_tokens); + add(&mut acc.output_tokens, delta.output_tokens); + add(&mut acc.cache_read_tokens, delta.cache_read_tokens); + add(&mut acc.cache_write_tokens, delta.cache_write_tokens); } fn emit_tool_call( @@ -655,6 +701,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-04-20T16:00:00.000Z".into(), text: text.into(), @@ -663,6 +710,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -673,6 +721,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-04-20T16:00:01.000Z".into(), text: text.into(), @@ -685,7 +734,9 @@ mod tests { output_tokens: Some(50), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -786,26 +837,27 @@ mod tests { .project(&view_with(vec![t])) .unwrap(); let inner = inner_types(&s); - // session_meta + turn_context + token_count + message + agent_message - // + function_call + function_call_output + exec_command_end. - // The token_count must precede the assistant message so the - // forward path's pending_token_usage attaches to this turn. + // session_meta + turn_context + message + agent_message + // + function_call + function_call_output + exec_command_end + // + token_count. The token_count follows the turn (the reader + // differences the cumulative and attributes the delta to the step + // it follows); the per-step spend is the turn's contribution.
assert_eq!( inner, vec![ "", "", - "token_count", "message", "agent_message", "function_call", "function_call_output", - "exec_command_end" + "exec_command_end", + "token_count" ] ); // FunctionCall.arguments is a JSON STRING, not a parsed value. - let fc_payload = &s.lines[5].payload; + let fc_payload = &s.lines[4].payload; assert_eq!(fc_payload["type"], "function_call"); assert_eq!(fc_payload["call_id"], "call_001"); assert_eq!(fc_payload["name"], "exec_command"); @@ -813,13 +865,13 @@ mod tests { let parsed: Value = serde_json::from_str(args).unwrap(); assert_eq!(parsed["cmd"], "pwd"); - let fco_payload = &s.lines[6].payload; + let fco_payload = &s.lines[5].payload; assert_eq!(fco_payload["type"], "function_call_output"); assert_eq!(fco_payload["call_id"], "call_001"); assert_eq!(fco_payload["output"], "/tmp\n"); // exec_command_end: TUI counterpart with the aggregated output. - let exec = &s.lines[7].payload; + let exec = &s.lines[6].payload; assert_eq!(exec["type"], "exec_command_end"); assert_eq!(exec["call_id"], "call_001"); assert_eq!(exec["aggregated_output"], "/tmp\n"); diff --git a/crates/toolpath-codex/src/provider.rs b/crates/toolpath-codex/src/provider.rs index a0e2bc7..00bcdf4 100644 --- a/crates/toolpath-codex/src/provider.rs +++ b/crates/toolpath-codex/src/provider.rs @@ -17,8 +17,17 @@ //! 6. `event_msg.patch_apply_end` is captured on the current turn's //! `extra["codex"]["patch_changes"]` — the derive layer consumes it //! for file-artifact sibling changes. -//! 7. `event_msg.token_count` populates `Turn.token_usage` on the next -//! assistant turn emitted. +//! 7. Token accounting. `turn_context` / `task_started` open an API round +//! (`turn_id`); assistant turns in it share that ID as `Turn.group_id`. +//! `event_msg.token_count` carries the SESSION-cumulative +//! `total_token_usage`; each step's spend is the increase since the +//! previous count — differencing the cumulative is dedup-safe (Codex +//! 
emits each count twice; a repeated total is a 0 delta) where summing +//! `last_token_usage` would double. Each delta is attributed to the step +//! it follows (`Turn.attributed_token_usage`); `finalize_usage` then +//! sets each group's total `Turn.token_usage` to the sum of its +//! attributions, on the group's final turn — one source of truth, so +//! `Σ token_usage == Σ attributed ==` session total. //! 8. Everything else (`task_started`, `task_complete`, `turn_context`, //! `user_message`/`agent_message` duplicates, unknown events) lands //! in `ConversationView.events` as a typed [`ConversationEvent`]. @@ -28,7 +37,7 @@ use std::collections::HashMap; use crate::io::ConvoIO; use crate::types::{ EventMsg, ExecCommandEnd, Message, PatchApplyEnd, PatchChange, ResponseItem, RolloutItem, - Session, TokenCountInfo, TokenUsage as CodexTokenUsage, + Session, TokenCountInfo, }; use serde_json::Value; use toolpath_convo::{ @@ -180,7 +189,13 @@ struct Builder<'a> { /// Plaintext reasoning summaries (rare — only in configurations where /// OpenAI exposes public reasoning). These land on `Turn.thinking`. pending_reasoning_plaintext: Vec, - pending_token_usage: Option, + /// The current API round (Codex "turn"), from `turn_context` / + /// `task_started`. Assistant turns emitted during a round share it as + /// their `group_id`. + current_round_id: Option, + /// Per-step spend awaiting an assistant turn to attach to (a token_count + /// arriving before this round's first assistant turn exists). 
+ pending_attributed: Option, working_dir: Option, current_model: Option, call_index: HashMap, @@ -197,7 +212,8 @@ impl<'a> Builder<'a> { turns: Vec::new(), events: Vec::new(), pending_reasoning_plaintext: Vec::new(), - pending_token_usage: None, + current_round_id: None, + pending_attributed: None, working_dir: None, current_model: None, call_index: HashMap::new(), @@ -220,6 +236,7 @@ impl<'a> Builder<'a> { )); } RolloutItem::TurnContext(tc) => { + self.start_round(&tc.turn_id); if let Some(m) = &tc.model { self.current_model = Some(m.clone()); } @@ -252,6 +269,9 @@ impl<'a> Builder<'a> { } } + // Compute message-group totals from per-step attributions. + self.finalize_usage(); + // Path-level base context from session_meta (cwd + git). let meta = self.session.meta(); let base = { @@ -315,7 +335,7 @@ impl<'a> Builder<'a> { // `-`, which collides when codex emits // multiple events of the same type at the same timestamp (rare // but real). Suffix duplicates with their position so each step - // gets a unique id. + // gets a unique ID. let mut seen: std::collections::HashSet = std::collections::HashSet::new(); for t in &self.turns { seen.insert(t.id.clone()); @@ -419,10 +439,22 @@ impl<'a> Builder<'a> { match ev { EventMsg::TokenCount(tc) => { if let Some(info) = tc.info.as_ref() { + // `total_token_usage` is the SESSION-cumulative counter; + // the spend of the step that just completed is the + // increase since the previous count. Differencing the + // cumulative (not summing `last_token_usage`) is + // dedup-safe: Codex emits each token_count twice, so a + // repeated total contributes a 0 delta instead of + // double-counting. The delta accrues to the round total + // (a per-step `token_usage` sum can't exceed it) and is + // attributed to the step it follows — for Codex every + // field is per-step, since each call re-sends context. 
+ let prev_total = self.total_usage.clone(); apply_token_count(&mut self.total_usage, info); self.total_usage_set = true; - if let Some(total) = info.total_token_usage.as_ref() { - self.pending_token_usage = Some(codex_usage_to_convo(total)); + let delta = usage_delta(&self.total_usage, &prev_total); + if !is_usage_zero(&delta) { + self.attribute_delta(delta); } } self.events @@ -438,10 +470,22 @@ impl<'a> Builder<'a> { self.events .push(event_from_raw(timestamp, "patch_apply_end", raw_payload)); } - EventMsg::AgentMessage(_) - | EventMsg::UserMessage(_) - | EventMsg::TaskStarted(_) - | EventMsg::TaskComplete(_) => { + EventMsg::TaskStarted(payload) => { + if let Some(tid) = payload.get("turn_id").and_then(|v| v.as_str()) { + self.start_round(tid); + } + self.events + .push(event_from_raw(timestamp, "task_started", raw_payload)); + } + EventMsg::TaskComplete(_) => { + // Round over: anything after the boundary is outside the + // round, so the grouping key resets. Totals are computed + // once in `finalize_usage`. 
+ self.current_round_id = None; + self.events + .push(event_from_raw(timestamp, "task_complete", raw_payload)); + } + EventMsg::AgentMessage(_) | EventMsg::UserMessage(_) => { self.events .push(event_from_raw(timestamp, ev.kind(), raw_payload)); } @@ -464,13 +508,12 @@ impl<'a> Builder<'a> { let turn_idx = match self.last_assistant_turn_index() { Some(idx) => idx, None => { - let mut t = synthetic_assistant_turn( + let t = synthetic_assistant_turn( timestamp, self.working_dir.as_deref(), self.current_model.as_deref(), ); - self.drain_pending_onto(&mut t); - self.turns.push(t); + self.push_turn(t); self.turns.len() - 1 } }; @@ -561,6 +604,9 @@ impl<'a> Builder<'a> { fn push_turn(&mut self, mut turn: Turn) { self.drain_pending_onto(&mut turn); + if turn.role == Role::Assistant && turn.group_id.is_none() { + turn.group_id = self.current_round_id.clone(); + } self.turns.push(turn); } @@ -573,8 +619,93 @@ impl<'a> Builder<'a> { turn.thinking = Some(self.pending_reasoning_plaintext.join("\n\n")); self.pending_reasoning_plaintext.clear(); } - if let Some(tu) = self.pending_token_usage.take() { - turn.token_usage = Some(tu); + // A step's spend that arrived before any assistant turn existed + // attaches to this, the first one. + if let Some(pending) = self.pending_attributed.take() { + add_usage(turn.attributed_token_usage.get_or_insert_with(TokenUsage::default), &pending); + } + } + + /// Attribute one step's spend to the most recent assistant turn **of the + /// current round** (the step the `token_count` followed). If this round + /// has no assistant turn yet, buffer it for the round's first one — + /// never leak a round's spend onto a prior round's turn. 
+ fn attribute_delta(&mut self, delta: TokenUsage) { + let target = self + .turns + .iter() + .enumerate() + .rev() + .find(|(_, t)| t.role == Role::Assistant) + .filter(|(_, t)| t.group_id == self.current_round_id) + .map(|(i, _)| i); + match target { + Some(idx) => add_usage( + self.turns[idx] + .attributed_token_usage + .get_or_insert_with(TokenUsage::default), + &delta, + ), + None => match &mut self.pending_attributed { + Some(acc) => add_usage(acc, &delta), + None => self.pending_attributed = Some(delta), + }, + } + } + + /// Begin a new API round; later assistant turns share `round_id` as + /// their `group_id`. Totals are computed once in [`Self::finalize_usage`]. + fn start_round(&mut self, round_id: &str) { + if round_id.is_empty() || self.current_round_id.as_deref() == Some(round_id) { + return; + } + self.current_round_id = Some(round_id.to_string()); + } + + /// Set each message group's total `token_usage` to the sum of its + /// turns' per-step attributions, on the group's final turn (the kind's + /// once-per-group rule). One source of truth — the group total and its + /// per-step shares can't drift, and `Σ token_usage == Σ attributed ==` + /// session total. A run of assistant turns sharing a `group_id` is one + /// round; an assistant turn without one is its own group. + fn finalize_usage(&mut self) { + // A step's spend that arrived after the last assistant turn (no + // later turn to drain onto) still belongs to that turn. 
+ if let Some(pending) = self.pending_attributed.take() + && let Some(idx) = self.turns.iter().rposition(|t| t.role == Role::Assistant) + { + add_usage( + self.turns[idx] + .attributed_token_usage + .get_or_insert_with(TokenUsage::default), + &pending, + ); + } + + let assistants: Vec<usize> = (0..self.turns.len()) + .filter(|&i| self.turns[i].role == Role::Assistant) + .collect(); + let mut k = 0; + while k < assistants.len() { + let start = k; + let mid = self.turns[assistants[k]].group_id.clone(); + if mid.is_some() { + while k + 1 < assistants.len() + && self.turns[assistants[k + 1]].group_id == mid + { + k += 1; + } + } + let mut total: Option<TokenUsage> = None; + for &gi in &assistants[start..=k] { + if let Some(a) = &self.turns[gi].attributed_token_usage { + add_usage(total.get_or_insert_with(TokenUsage::default), a); + } + } + if let Some(total) = total { + self.turns[assistants[k]].token_usage = Some(total); + } + k += 1; } } @@ -678,6 +809,7 @@ fn message_to_turn( Turn { id: msg.id.clone().unwrap_or_default(), parent_id: None, + group_id: None, role: role.clone(), timestamp: timestamp.to_string(), text, @@ -690,6 +822,7 @@ fn message_to_turn( }, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment, delegations: Vec::new(), file_mutations: Vec::new(), @@ -704,6 +837,7 @@ fn synthetic_assistant_turn( Turn { id: format!("synth-{}", timestamp), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: timestamp.to_string(), text: String::new(), @@ -712,6 +846,7 @@ fn synthetic_assistant_turn( model: model.map(str::to_string), stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: working_dir.map(|wd| EnvironmentSnapshot { working_dir: Some(wd.to_string()), vcs_branch: None, @@ -722,13 +857,69 @@ fn synthetic_assistant_turn( } } -fn codex_usage_to_convo(u: &CodexTokenUsage) -> TokenUsage { - TokenUsage { - input_tokens: u.input_tokens, - output_tokens: u.output_tokens, - cache_read_tokens:
u.cached_input_tokens, - cache_write_tokens: None, +/// Component-wise `acc += delta`, treating `None` as 0 on the addend. +/// Breakdowns merge key-wise (inner values add) rather than overwrite, so a +/// round's per-call reasoning slices accumulate into the group total. +fn add_usage(acc: &mut TokenUsage, delta: &TokenUsage) { + let add = |a: &mut Option<u32>, b: Option<u32>| { + if let Some(b) = b { + *a = Some(a.unwrap_or(0) + b); + } + }; + add(&mut acc.input_tokens, delta.input_tokens); + add(&mut acc.output_tokens, delta.output_tokens); + add(&mut acc.cache_read_tokens, delta.cache_read_tokens); + add(&mut acc.cache_write_tokens, delta.cache_write_tokens); + for (class, inner) in &delta.breakdowns { + let target = acc.breakdowns.entry(class.clone()).or_default(); + for (sub, n) in inner { + *target.entry(sub.clone()).or_insert(0) += *n; + } + } +} + +/// True when every counter is absent or zero (no real spend to record). +fn is_usage_zero(u: &TokenUsage) -> bool { + [ + u.input_tokens, + u.output_tokens, + u.cache_read_tokens, + u.cache_write_tokens, + ] + .iter() + .all(|f| f.unwrap_or(0) == 0) +} + +/// Component-wise `current - prev`, for recovering a round's spend from +/// successive cumulative totals. Saturating: a counter reset (e.g. after +/// compaction) yields 0 rather than wrapping. +fn usage_delta(current: &TokenUsage, prev: &TokenUsage) -> TokenUsage { + let sub = |c: Option<u32>, p: Option<u32>| c.map(|c| c.saturating_sub(p.unwrap_or(0))); + let mut delta = TokenUsage { + input_tokens: sub(current.input_tokens, prev.input_tokens), + output_tokens: sub(current.output_tokens, prev.output_tokens), + cache_read_tokens: sub(current.cache_read_tokens, prev.cache_read_tokens), + cache_write_tokens: sub(current.cache_write_tokens, prev.cache_write_tokens), + ..Default::default() + }; + // Breakdowns (e.g. output→reasoning) are cumulative subsets of their + // parent class, so difference them the same saturating way.
Only retain + // sub-classes whose delta is > 0 so a flat round stays breakdown-free. + for (class, inner) in &current.breakdowns { + let prev_inner = prev.breakdowns.get(class); + let mut diffed: std::collections::BTreeMap<String, u32> = Default::default(); + for (sub, cur) in inner { + let p = prev_inner.and_then(|m| m.get(sub)).copied().unwrap_or(0); + let d = cur.saturating_sub(p); + if d > 0 { + diffed.insert(sub.clone(), d); + } + } + if !diffed.is_empty() { + delta.breakdowns.insert(class.clone(), diffed); + } } + delta } fn apply_token_count(total: &mut TokenUsage, info: &TokenCountInfo) { @@ -736,6 +927,17 @@ fn apply_token_count(total: &mut TokenUsage, info: &TokenCountInfo) { total.input_tokens = t.input_tokens.or(total.input_tokens); total.output_tokens = t.output_tokens.or(total.output_tokens); total.cache_read_tokens = t.cached_input_tokens.or(total.cache_read_tokens); + // `reasoning_output_tokens` ⊆ `output_tokens` (informational); carry the + // cumulative reasoning counter under breakdowns["output"]["reasoning"] + // so `usage_delta` differences it per call just like the others. Only + // record it when present and > 0 to keep zero-reasoning rounds clean. + if let Some(r) = t.reasoning_output_tokens.filter(|&r| r > 0) { + total + .breakdowns + .entry("output".to_string()) + .or_default() + .insert("reasoning".to_string(), r); + } } } @@ -883,6 +1085,206 @@ mod tests { assert_eq!(view.turns[1].model.as_deref(), Some("gpt-5.4")); } + /// Two API rounds. Codex's `token_count` events carry cumulative + /// session totals in `total_token_usage` and, optionally, the round's own + /// spend in `last_token_usage`; per-turn accounting differences the former.
+ fn two_round_session(with_last: bool) -> String { + let last1 = r#","last_token_usage":{"input_tokens":100,"output_tokens":20,"cached_input_tokens":10,"total_tokens":130}"#; + let last2 = r#","last_token_usage":{"input_tokens":200,"output_tokens":30,"cached_input_tokens":30,"total_tokens":260}"#; + [ + r#"{"timestamp":"2026-04-20T16:44:37.772Z","type":"session_meta","payload":{"id":"019dabc6-8fef-7681-a054-b5bb75fcb97d","timestamp":"2026-04-20T16:43:30.171Z","cwd":"/tmp/proj","originator":"codex-tui","cli_version":"0.118.0","source":"cli"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.773Z","type":"turn_context","payload":{"turn_id":"t1","cwd":"/tmp/proj","model":"gpt-5.4"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.800Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"round one"}]}}"#.to_string(), + format!( + r#"{{"timestamp":"2026-04-20T16:44:38.800Z","type":"event_msg","payload":{{"type":"token_count","info":{{"total_token_usage":{{"input_tokens":100,"output_tokens":20,"cached_input_tokens":10,"total_tokens":130}}{}}}}}}}"#, + if with_last { last1 } else { "" } + ), + r#"{"timestamp":"2026-04-20T16:44:38.900Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"first"}],"phase":"final","end_turn":true}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:39.700Z","type":"turn_context","payload":{"turn_id":"t2","cwd":"/tmp/proj","model":"gpt-5.4"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:39.800Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"round two"}]}}"#.to_string(), + format!( + r#"{{"timestamp":"2026-04-20T16:44:40.800Z","type":"event_msg","payload":{{"type":"token_count","info":{{"total_token_usage":{{"input_tokens":300,"output_tokens":50,"cached_input_tokens":40,"total_tokens":390}}{}}}}}}}"#, + if with_last { last2 } else { "" } + ), + 
r#"{"timestamp":"2026-04-20T16:44:40.900Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"second"}],"phase":"final","end_turn":true}}"#.to_string(), + ] + .join("\n") + } + + #[test] + fn turn_usage_is_per_round_delta_from_last_token_usage() { + let (_t, mgr, id) = setup_session_fixture(&two_round_session(true)); + let view = to_view(&mgr.read_session(&id).unwrap()); + + let first = view.turns[1].token_usage.as_ref().unwrap(); + assert_eq!(first.input_tokens, Some(100)); + assert_eq!(first.output_tokens, Some(20)); + assert_eq!(first.cache_read_tokens, Some(10)); + + let second = view.turns[3].token_usage.as_ref().unwrap(); + assert_eq!(second.input_tokens, Some(200)); + assert_eq!(second.output_tokens, Some(30)); + assert_eq!(second.cache_read_tokens, Some(30)); + + // Session total stays the final cumulative counter. + let total = view.total_usage.as_ref().unwrap(); + assert_eq!(total.input_tokens, Some(300)); + assert_eq!(total.output_tokens, Some(50)); + } + + #[test] + fn per_step_attribution_from_deduped_cumulative_deltas() { + // Real Codex emits each token_count TWICE (identical values). Per-step + // spend must come from differencing the cumulative total — a repeated + // total yields a 0 delta — never from summing, which would double. + // Two tool calls in one round: cumulative output 0->40->100, so the + // steps cost 40 and 60; the round total is 100. 
+ let dup = |total_out: u32, total_in: u32| { + format!( + r#"{{"timestamp":"2026-04-20T16:44:38.800Z","type":"event_msg","payload":{{"type":"token_count","info":{{"total_token_usage":{{"input_tokens":{total_in},"output_tokens":{total_out},"cached_input_tokens":0,"total_tokens":{}}}}}}}}}"#, + total_in + total_out + ) + }; + let body = [ + r#"{"timestamp":"2026-04-20T16:44:37.772Z","type":"session_meta","payload":{"id":"019dabc6-8fef-7681-a054-b5bb75fcb97d","timestamp":"2026-04-20T16:43:30.171Z","cwd":"/tmp/proj","originator":"codex-tui","cli_version":"0.118.0","source":"cli"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.773Z","type":"turn_context","payload":{"turn_id":"r1","cwd":"/tmp/proj","model":"gpt-5.4"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.800Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"go"}]}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:38.100Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"step one"}],"phase":"commentary"}}"#.to_string(), + dup(40, 10), dup(40, 10), // step 1: out 40 (emitted twice) + r#"{"timestamp":"2026-04-20T16:44:38.900Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"step two"}],"phase":"final","end_turn":true}}"#.to_string(), + dup(100, 20), dup(100, 20), // step 2: out 100-40=60 (emitted twice) + r#"{"timestamp":"2026-04-20T16:44:39.000Z","type":"event_msg","payload":{"type":"task_complete","turn_id":"r1"}}"#.to_string(), + ].join("\n"); + let (_t, mgr, id) = setup_session_fixture(&body); + let view = to_view(&mgr.read_session(&id).unwrap()); + + let assistants: Vec<&Turn> = view.turns.iter().filter(|t| t.role == Role::Assistant).collect(); + assert_eq!(assistants.len(), 2); + // Per-step attribution: 40 then 60 — NOT 80/120 (which doubling gives). 
+ assert_eq!(assistants[0].attributed_token_usage.as_ref().unwrap().output_tokens, Some(40)); + assert_eq!(assistants[1].attributed_token_usage.as_ref().unwrap().output_tokens, Some(60)); + // Σ attributed == round total on the final turn. + assert_eq!(assistants[1].token_usage.as_ref().unwrap().output_tokens, Some(100)); + let sum: u32 = assistants.iter().filter_map(|t| t.attributed_token_usage.as_ref()?.output_tokens).sum(); + assert_eq!(sum, 100); + } + + /// Read the `breakdowns["output"]["reasoning"]` slice off a usage, or None. + fn reasoning_of(u: Option<&TokenUsage>) -> Option<u32> { + u?.breakdowns.get("output")?.get("reasoning").copied() + } + + #[test] + fn reasoning_breakdown_is_per_step_delta_and_round_sum() { + // `reasoning_output_tokens` is a SUBSET of output and rides on the + // cumulative `total_token_usage`. It must be differenced exactly like + // output: cumulative reasoning 0->100->260 ⇒ step deltas 100 then 160, + // and the round total carries their sum (260) under + // breakdowns["output"]["reasoning"]. Each token_count is emitted twice + // (dedup-safe: a repeated total yields a 0 delta).
+ let dup = |total_out: u32, total_reason: u32| { + format!( + r#"{{"timestamp":"2026-04-20T16:44:38.800Z","type":"event_msg","payload":{{"type":"token_count","info":{{"total_token_usage":{{"input_tokens":10,"output_tokens":{total_out},"reasoning_output_tokens":{total_reason},"cached_input_tokens":0,"total_tokens":{}}}}}}}}}"#, + 10 + total_out + ) + }; + let body = [ + r#"{"timestamp":"2026-04-20T16:44:37.772Z","type":"session_meta","payload":{"id":"019dabc6-8fef-7681-a054-b5bb75fcb97d","timestamp":"2026-04-20T16:43:30.171Z","cwd":"/tmp/proj","originator":"codex-tui","cli_version":"0.118.0","source":"cli"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.773Z","type":"turn_context","payload":{"turn_id":"r1","cwd":"/tmp/proj","model":"gpt-5.4"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.800Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"go"}]}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:38.100Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"step one"}],"phase":"commentary"}}"#.to_string(), + dup(200, 100), dup(200, 100), // step 1: output 200, reasoning 100 + r#"{"timestamp":"2026-04-20T16:44:38.900Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"step two"}],"phase":"final","end_turn":true}}"#.to_string(), + dup(500, 260), dup(500, 260), // step 2: output 300, reasoning 160 + r#"{"timestamp":"2026-04-20T16:44:39.000Z","type":"event_msg","payload":{"type":"task_complete","turn_id":"r1"}}"#.to_string(), + ].join("\n"); + let (_t, mgr, id) = setup_session_fixture(&body); + let view = to_view(&mgr.read_session(&id).unwrap()); + + let assistants: Vec<&Turn> = view.turns.iter().filter(|t| t.role == Role::Assistant).collect(); + assert_eq!(assistants.len(), 2); + // Per-step reasoning deltas, NOT cumulative (100/260) and NOT doubled. 
+ assert_eq!(reasoning_of(assistants[0].attributed_token_usage.as_ref()), Some(100)); + assert_eq!(reasoning_of(assistants[1].attributed_token_usage.as_ref()), Some(160)); + // Round total breakdown is the sum of attributions. + let round = assistants[1].token_usage.as_ref().unwrap(); + assert_eq!(reasoning_of(Some(round)), Some(260)); + // Invariant: Σ(reasoning) ≤ output. + assert!(260 <= round.output_tokens.unwrap()); + } + + #[test] + fn zero_reasoning_produces_no_breakdown_entry() { + // A round whose cumulative reasoning never rises (absent or 0) must + // leave breakdowns empty so the field is omitted on the wire. + let dup = |total_out: u32| { + format!( + r#"{{"timestamp":"2026-04-20T16:44:38.800Z","type":"event_msg","payload":{{"type":"token_count","info":{{"total_token_usage":{{"input_tokens":10,"output_tokens":{total_out},"reasoning_output_tokens":0,"cached_input_tokens":0,"total_tokens":{}}}}}}}}}"#, + 10 + total_out + ) + }; + let body = [ + r#"{"timestamp":"2026-04-20T16:44:37.772Z","type":"session_meta","payload":{"id":"019dabc6-8fef-7681-a054-b5bb75fcb97d","timestamp":"2026-04-20T16:43:30.171Z","cwd":"/tmp/proj","originator":"codex-tui","cli_version":"0.118.0","source":"cli"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.773Z","type":"turn_context","payload":{"turn_id":"r1","cwd":"/tmp/proj","model":"gpt-5.4"}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:37.800Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"go"}]}}"#.to_string(), + r#"{"timestamp":"2026-04-20T16:44:38.100Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"answer"}],"phase":"final","end_turn":true}}"#.to_string(), + dup(40), dup(40), + r#"{"timestamp":"2026-04-20T16:44:39.000Z","type":"event_msg","payload":{"type":"task_complete","turn_id":"r1"}}"#.to_string(), + ].join("\n"); + let (_t, mgr, id) = setup_session_fixture(&body); + let view = 
to_view(&mgr.read_session(&id).unwrap()); + let a = view.turns.iter().find(|t| t.role == Role::Assistant).unwrap(); + assert!(a.attributed_token_usage.as_ref().unwrap().breakdowns.is_empty()); + assert!(a.token_usage.as_ref().unwrap().breakdowns.is_empty()); + } + + #[test] + fn round_turns_share_group_id_and_usage_lands_on_round_final_turn() { + // One round emitting two assistant messages (commentary + final). + // Both belong to one API round, so they share a group_id (the + // round's turn_id) and the round total sits on the round's final + // assistant turn only — never on an interior turn, and never as a + // singleton claim on a turn whose siblings shared the spend. + let body = [ + r#"{"timestamp":"2026-04-20T16:44:37.772Z","type":"session_meta","payload":{"id":"019dabc6-8fef-7681-a054-b5bb75fcb97d","timestamp":"2026-04-20T16:43:30.171Z","cwd":"/tmp/proj","originator":"codex-tui","cli_version":"0.118.0","source":"cli"}}"#, + r#"{"timestamp":"2026-04-20T16:44:37.773Z","type":"turn_context","payload":{"turn_id":"round-1","cwd":"/tmp/proj","model":"gpt-5.4"}}"#, + r#"{"timestamp":"2026-04-20T16:44:37.775Z","type":"event_msg","payload":{"type":"task_started","turn_id":"round-1"}}"#, + r#"{"timestamp":"2026-04-20T16:44:37.800Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"go"}]}}"#, + r#"{"timestamp":"2026-04-20T16:44:38.100Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"working on it"}],"phase":"commentary"}}"#, + r#"{"timestamp":"2026-04-20T16:44:38.800Z","type":"event_msg","payload":{"type":"token_count","info":{"total_token_usage":{"input_tokens":100,"output_tokens":20,"cached_input_tokens":10,"total_tokens":130},"last_token_usage":{"input_tokens":100,"output_tokens":20,"cached_input_tokens":10,"total_tokens":130}}}}"#, + 
r#"{"timestamp":"2026-04-20T16:44:38.900Z","type":"response_item","payload":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"done"}],"phase":"final","end_turn":true}}"#, + r#"{"timestamp":"2026-04-20T16:44:39.000Z","type":"event_msg","payload":{"type":"task_complete","turn_id":"round-1","last_agent_message":"done"}}"#, + ] + .join("\n"); + let (_t, mgr, id) = setup_session_fixture(&body); + let view = to_view(&mgr.read_session(&id).unwrap()); + + assert_eq!(view.turns.len(), 3); + assert!(view.turns[0].group_id.is_none(), "user turn ungrouped"); + assert_eq!(view.turns[1].group_id.as_deref(), Some("round-1")); + assert_eq!(view.turns[2].group_id.as_deref(), Some("round-1")); + assert!( + view.turns[1].token_usage.is_none(), + "interior turn of the round must not carry usage" + ); + let total = view.turns[2].token_usage.as_ref().unwrap(); + assert_eq!(total.output_tokens, Some(20)); + assert_eq!(total.input_tokens, Some(100)); + } + + #[test] + fn turn_usage_delta_is_computed_when_last_token_usage_missing() { + // Older rollouts carry only cumulative totals; the per-turn value + // must be the difference between successive totals, not the total. + let (_t, mgr, id) = setup_session_fixture(&two_round_session(false)); + let view = to_view(&mgr.read_session(&id).unwrap()); + + let second = view.turns[3].token_usage.as_ref().unwrap(); + assert_eq!(second.input_tokens, Some(200)); + assert_eq!(second.output_tokens, Some(30)); + assert_eq!(second.cache_read_tokens, Some(30)); + } + #[test] fn encrypted_reasoning_does_not_land_on_thinking() { // The fixture only has encrypted_content. 
That must NOT be rendered diff --git a/crates/toolpath-codex/tests/fixture_roundtrip.rs b/crates/toolpath-codex/tests/fixture_roundtrip.rs index f756e78..d1bc593 100644 --- a/crates/toolpath-codex/tests/fixture_roundtrip.rs +++ b/crates/toolpath-codex/tests/fixture_roundtrip.rs @@ -136,6 +136,72 @@ fn token_usage_captured() { let u = view.total_usage.expect("total_usage missing"); assert!(u.input_tokens.unwrap_or(0) > 0); assert!(u.output_tokens.unwrap_or(0) > 0); + // Reasoning is surfaced under breakdowns["output"]["reasoning"], derived + // by differencing the cumulative `reasoning_output_tokens` counter — never + // raw-summed. The fixture's final cumulative reasoning is 979 ≤ output + // 11929, so the session total breakdown must match and respect the + // reasoning ⊆ output invariant. + let session_reasoning = u + .breakdowns + .get("output") + .and_then(|m| m.get("reasoning")) + .copied() + .expect("session reasoning breakdown present"); + assert_eq!(session_reasoning, 979); + assert!(session_reasoning <= u.output_tokens.unwrap()); +} + +#[test] +fn reasoning_breakdown_differenced_dedup_safe_against_real_fixture() { + use toolpath_convo::TokenUsage; + let s = session(); + let view = to_view(&s); + + let reasoning_of = |u: Option<&TokenUsage>| -> u32 { + u.and_then(|u| u.breakdowns.get("output")) + .and_then(|m| m.get("reasoning")) + .copied() + .unwrap_or(0) + }; + + // Sum of per-step attributed reasoning across the whole session must equal + // the final cumulative reasoning (979). If differencing were unsafe — + // summing the twice-emitted counts, or stamping the cumulative — this would + // overshoot. This is the dedup-safe / no-double-count proof on real data. + let attributed_reasoning: u32 = view + .turns + .iter() + .map(|t| reasoning_of(t.attributed_token_usage.as_ref())) + .sum(); + assert_eq!(attributed_reasoning, 979, "Σ attributed reasoning != cumulative"); + + // Per step, reasoning ⊆ output. 
+ for t in &view.turns { + if let Some(a) = t.attributed_token_usage.as_ref() { + let r = reasoning_of(Some(a)); + assert!( + r <= a.output_tokens.unwrap_or(0), + "step reasoning {} exceeds output {:?}", + r, + a.output_tokens + ); + } + } + + // Round (group) totals: Σ over group token_usage reasoning == 979 too, and + // each round's reasoning ⊆ its output. + let round_reasoning: u32 = view + .turns + .iter() + .filter(|t| t.token_usage.is_some()) + .map(|t| { + let u = t.token_usage.as_ref().unwrap(); + let r = reasoning_of(Some(u)); + assert!(r <= u.output_tokens.unwrap_or(0)); + r + }) + .sum(); + assert_eq!(round_reasoning, 979, "Σ round-total reasoning != cumulative"); } #[test] diff --git a/crates/toolpath-convo/Cargo.toml b/crates/toolpath-convo/Cargo.toml index 79126e5..e1009ed 100644 --- a/crates/toolpath-convo/Cargo.toml +++ b/crates/toolpath-convo/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-convo" -version = "0.10.0" +version = "0.11.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-convo/src/derive.rs b/crates/toolpath-convo/src/derive.rs index 95acdda..ad66857 100644 --- a/crates/toolpath-convo/src/derive.rs +++ b/crates/toolpath-convo/src/derive.rs @@ -178,12 +178,42 @@ pub fn derive_path(view: &ConversationView, config: &DeriveConfig) -> Path { extra.insert("tool_uses".to_string(), serde_json::Value::Array(arr)); } - if let Some(usage) = &turn.token_usage + // Message-level accounting lands exactly once per message: when a + // provider splits one message across several turns (group_id + // set on each), only the run's last turn carries token_usage, so + // summing over steps yields session totals. A turn without a + // group_id is its own accounting unit. 
+ let last_of_message = match &turn.group_id { + None => true, + Some(mid) => view + .turns + .get(idx + 1) + .is_none_or(|next| next.group_id.as_ref() != Some(mid)), + }; + if last_of_message + && let Some(usage) = &turn.token_usage && let Ok(v) = serde_json::to_value(usage) { extra.insert("token_usage".to_string(), v); } + // Per-step attributed spend rides its own key on every step that + // has it (independent of the once-per-message `token_usage`), so + // summing `token_usage` is unaffected while per-step cost stays + // readable structurally. + if let Some(attr) = &turn.attributed_token_usage + && let Ok(v) = serde_json::to_value(attr) + { + extra.insert("attributed_token_usage".to_string(), v); + } + + if let Some(mid) = &turn.group_id { + extra.insert( + "group_id".to_string(), + serde_json::Value::String(mid.clone()), + ); + } + if !turn.delegations.is_empty() && let Ok(v) = serde_json::to_value(&turn.delegations) { @@ -648,6 +678,7 @@ mod tests { Turn { id: id.to_string(), parent_id: None, + group_id: None, role, timestamp: "2026-01-01T00:00:00Z".to_string(), text: String::new(), @@ -656,6 +687,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -699,10 +731,68 @@ mod tests { // ...and survives a JSON round-trip. let json = serde_json::to_string(&path).unwrap(); assert!( - json.contains(r#""kind":"https://toolpath.net/kinds/agent-coding-session/v1.0.0""#) + json.contains(r#""kind":"https://toolpath.net/kinds/agent-coding-session/v1.1.0""#) ); } + #[test] + fn test_token_usage_breakdowns_round_trip() { + use std::collections::BTreeMap; + // A Turn whose token_usage carries breakdowns should derive into a + // Path and extract back out with the breakdowns intact. 
+ let mut breakdowns = BTreeMap::new(); + breakdowns.insert( + "output".to_string(), + BTreeMap::from([("reasoning".to_string(), 450u32)]), + ); + let mut turn = base_turn("t1", Role::Assistant); + turn.model = Some("claude-opus-4-7".into()); + turn.token_usage = Some(TokenUsage { + input_tokens: Some(100), + output_tokens: Some(900), + breakdowns: breakdowns.clone(), + ..Default::default() + }); + let view = view_with(vec![turn]); + + let path = derive_path(&view, &DeriveConfig::default()); + let extracted = crate::extract::extract_conversation(&path); + + let usage = extracted.turns[0] + .token_usage + .as_ref() + .expect("token_usage survives round-trip"); + assert_eq!(usage.input_tokens, Some(100)); + assert_eq!(usage.output_tokens, Some(900)); + assert_eq!(usage.breakdowns, breakdowns); + assert_eq!(usage.breakdowns["output"]["reasoning"], 450); + } + + #[test] + fn test_token_usage_empty_breakdowns_omitted_in_json() { + // skip_serializing_if guarantees no "breakdowns" key for the empty map, + // keeping the wire format byte-compatible with pre-breakdowns producers. + let usage = TokenUsage { + input_tokens: Some(10), + output_tokens: Some(20), + ..Default::default() + }; + let json = serde_json::to_string(&usage).unwrap(); + assert!( + !json.contains("breakdowns"), + "empty breakdowns must be omitted, got: {json}" + ); + } + + #[test] + fn test_token_usage_absent_breakdowns_defaults_empty() { + // Deserializing an old-style token_usage object with no breakdowns key + // yields an empty map (serde default). 
+ let usage: TokenUsage = + serde_json::from_str(r#"{"input_tokens":10,"output_tokens":20}"#).unwrap(); + assert!(usage.breakdowns.is_empty()); + } + #[test] fn test_single_user_turn() { let mut turn = base_turn("t1", Role::User); @@ -806,6 +896,7 @@ mod tests { let mut assistant = base_turn("t2", Role::Assistant); assistant.parent_id = Some("t1".into()); + assistant.group_id = Some("msg_t2".into()); assistant.model = Some("gpt-5.5".into()); assistant.text = "on it".into(); assistant.thinking = Some("plan the edit".into()); @@ -815,6 +906,11 @@ mod tests { output_tokens: Some(20), cache_read_tokens: Some(50), cache_write_tokens: None, + ..Default::default() + }); + assistant.attributed_token_usage = Some(TokenUsage { + output_tokens: Some(20), + ..Default::default() }); assistant.environment = Some(EnvironmentSnapshot { working_dir: Some("/repo".into()), @@ -873,7 +969,7 @@ mod tests { let schema_src = std::fs::read_to_string(concat!( env!("CARGO_MANIFEST_DIR"), - "/../path-cli/kinds/agent-coding-session/v1.0.0/schema.json" + "/../path-cli/kinds/agent-coding-session/v1.1.0/schema.json" )) .expect("read kind schema"); let schema: serde_json::Value = serde_json::from_str(&schema_src).unwrap(); @@ -1294,6 +1390,7 @@ mod tests { output_tokens: Some(50), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }); let view = view_with(vec![turn]); let path = derive_path(&view, &DeriveConfig::default()); @@ -1305,6 +1402,109 @@ mod tests { ); } + fn usage(output: u32) -> TokenUsage { + TokenUsage { + input_tokens: Some(6), + output_tokens: Some(output), + cache_read_tokens: Some(14_842), + cache_write_tokens: Some(429_831), + ..Default::default() + } + } + + #[test] + fn test_message_group_carries_usage_once_on_last_step() { + // Three turns split from one provider message (Claude Code repeats + // message.usage on every content-block line), then one singleton + // message. 
Usage must land exactly once per group_id group — on + // the group's last step — and group_id on every grouped step. + let mut turns: Vec<Turn> = (1..=3) + .map(|i| { + let mut t = base_turn(&format!("t{i}"), Role::Assistant); + t.group_id = Some("msg_01".into()); + t.token_usage = Some(usage(997)); + t + }) + .collect(); + let mut t4 = base_turn("t4", Role::Assistant); + t4.group_id = Some("msg_02".into()); + t4.token_usage = Some(usage(11)); + turns.push(t4); + + let view = view_with(turns); + let path = derive_path(&view, &DeriveConfig::default()); + let changes: Vec<&StructuralChange> = path.steps.iter().map(conv_change).collect(); + + assert!(!changes[0].extra.contains_key("token_usage")); + assert!(!changes[1].extra.contains_key("token_usage")); + assert_eq!( + changes[2].extra["token_usage"]["output_tokens"], + serde_json::json!(997) + ); + assert_eq!( + changes[3].extra["token_usage"]["output_tokens"], + serde_json::json!(11) + ); + for c in &changes[..3] { + assert_eq!(c.extra["group_id"], serde_json::json!("msg_01")); + } + assert_eq!(changes[3].extra["group_id"], serde_json::json!("msg_02")); + } + + #[test] + fn test_turn_without_group_id_is_its_own_accounting_unit() { + // Providers that never split a message (gemini, pi, opencode) + // leave group_id unset; every turn keeps its own usage. 
+ let mut turns = Vec::new(); + for i in 1..=2 { + let mut t = base_turn(&format!("t{i}"), Role::Assistant); + t.token_usage = Some(usage(50 + i)); + turns.push(t); + } + let view = view_with(turns); + let path = derive_path(&view, &DeriveConfig::default()); + for (i, step) in path.steps.iter().enumerate() { + let sc = conv_change(step); + assert_eq!( + sc.extra["token_usage"]["output_tokens"], + serde_json::json!(51 + i as u64) + ); + assert!(!sc.extra.contains_key("group_id")); + } + } + + #[test] + fn test_message_grouping_is_consecutive_only() { + // A group_id reappearing after an intervening message starts a + // new group (defensive: source formats never interleave, but the + // rule is defined over consecutive runs in document order). + let mk = |id: &str, msg: &str, out: u32| { + let mut t = base_turn(id, Role::Assistant); + t.group_id = Some(msg.into()); + t.token_usage = Some(usage(out)); + t + }; + let view = view_with(vec![ + mk("t1", "msg_01", 100), + mk("t2", "msg_02", 200), + mk("t3", "msg_01", 300), + ]); + let path = derive_path(&view, &DeriveConfig::default()); + let changes: Vec<&StructuralChange> = path.steps.iter().map(conv_change).collect(); + assert_eq!( + changes[0].extra["token_usage"]["output_tokens"], + serde_json::json!(100) + ); + assert_eq!( + changes[1].extra["token_usage"]["output_tokens"], + serde_json::json!(200) + ); + assert_eq!( + changes[2].extra["token_usage"]["output_tokens"], + serde_json::json!(300) + ); + } + #[test] fn test_delegations_in_extras() { let mut turn = base_turn("t1", Role::Assistant); diff --git a/crates/toolpath-convo/src/extract.rs b/crates/toolpath-convo/src/extract.rs index 4753785..c1d8380 100644 --- a/crates/toolpath-convo/src/extract.rs +++ b/crates/toolpath-convo/src/extract.rs @@ -306,9 +306,19 @@ fn build_turn(step: &Step, extra: &HashMap) -> Turn { let parent_id = step.step.parents.first().cloned(); + let group_id = extra + .get("group_id") + .and_then(|v| v.as_str()) + .map(|s| s.to_string()); + 
+ let attributed_token_usage = extra + .get("attributed_token_usage") + .and_then(|v| serde_json::from_value::<TokenUsage>(v.clone()).ok()); + Turn { id: step.step.id.clone(), parent_id, + group_id, role, timestamp: step.step.timestamp.clone(), text, @@ -317,6 +327,7 @@ fn build_turn(step: &Step, extra: &HashMap) -> Turn { model, stop_reason, token_usage, + attributed_token_usage, environment, delegations, file_mutations: Vec::new(), @@ -419,6 +430,7 @@ fn build_token_usage(extra: &HashMap) -> Option, + /// Optional decomposition of a top-level class into named sub-classes (keyed by the class + /// being broken down, e.g. "output"; inner map is sub-class → tokens, e.g. + /// {"reasoning": 450} or {"text": 300, "image": 500}). INFORMATIONAL ONLY: + /// breakdowns are never summed into the total — the parent class already + /// counts these tokens. Invariant: Σ(inner) ≤ the parent class's value. + #[serde(default, skip_serializing_if = "BTreeMap::is_empty")] + pub breakdowns: BTreeMap<String, BTreeMap<String, u32>>, } /// Identity of the software that produced a session: e.g. @@ -246,6 +253,16 @@ pub struct Turn { /// Parent turn ID (for branching conversations). pub parent_id: Option<String>, + /// Identifier of the source accounting unit this turn belongs to — + /// a message for Claude (`message.id`), a round for Codex (`turn_id`). + /// A grouping key, not a turn identifier: when a provider derives + /// several turns from one unit (Claude writes one JSONL line per + /// content block; a Codex round emits several turns), every sibling + /// turn carries the same value, and group-level accounting + /// (`token_usage`) belongs to the group once (on its final turn). + #[serde(default, skip_serializing_if = "Option::is_none")] + pub group_id: Option<String>, + /// Who produced this turn. pub role: Role, @@ -267,9 +284,26 @@ pub struct Turn { /// Why the turn ended (e.g. "end_turn", "tool_use", "max_tokens"). 
pub stop_reason: Option<String>, - /// Token usage for this turn. + /// Token usage for this turn. 
When this turn belongs to a `group_id` + /// group, this is the **whole message's total**, carried on the + /// group's final turn only (it always means "the total for a + /// message"; summing over turns yields session totals). pub token_usage: Option<TokenUsage>, + /// This turn's own attributed spend, when the source provides + /// step-aligned data — the output tokens generated *for this turn*, + /// distinct from [`Turn::token_usage`] (the whole message's total). + /// Populated where a provider streams per-step counts (Claude's + /// per-content-block cumulative `usage`, Codex's per-step + /// `token_count` deltas); absent where it can't be attributed. + /// Within a `group_id` group, `Σ attributed_token_usage` is the + /// group's attributed output; the unattributed remainder + /// (prompt-side input/cache, inherently per-message) stays in + /// `token_usage` on the group's final turn. A separate field from + /// `token_usage` precisely so the session-total sum is unaffected. + #[serde(default, skip_serializing_if = "Option::is_none")] + pub attributed_token_usage: Option<TokenUsage>, + /// Environment at time of this turn. 
#[serde(default, skip_serializing_if = "Option::is_none")] pub environment: Option<EnvironmentSnapshot>, @@ -543,6 +577,7 @@ mod tests { Turn { id: "t1".into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-01-01T00:00:00Z".into(), text: "Fix the authentication bug in login.rs".into(), @@ -551,6 +586,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -558,6 +594,7 @@ mod tests { Turn { id: "t2".into(), parent_id: Some("t1".into()), + group_id: None, role: Role::Assistant, timestamp: "2026-01-01T00:00:01Z".into(), text: "I'll fix that for you.".into(), @@ -579,7 +616,9 @@ mod tests { output_tokens: Some(50), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -587,6 +626,7 @@ mod tests { Turn { id: "t3".into(), parent_id: Some("t2".into()), + group_id: None, role: Role::User, timestamp: "2026-01-01T00:00:02Z".into(), text: "Thanks!".into(), @@ -595,6 +635,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -799,6 +840,7 @@ mod tests { output_tokens: Some(50), cache_read_tokens: Some(500), cache_write_tokens: Some(200), + ..Default::default() }; let json = serde_json::to_string(&usage).unwrap(); let back: TokenUsage = serde_json::from_str(&json).unwrap(); @@ -924,6 +966,7 @@ mod tests { let turn = Turn { id: "t1".into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-01-01T00:00:00Z".into(), text: "Delegating...".into(), @@ -932,6 +975,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: Some(EnvironmentSnapshot { working_dir: Some("/project".into()), vcs_branch: Some("feat/auth".into()), @@ -976,6 +1020,7 @@ mod 
tests { output_tokens: Some(500), cache_read_tokens: Some(800), cache_write_tokens: None, + ..Default::default() }), provider_id: Some("claude-code".into()), files_changed: vec!["src/main.rs".into(), "src/lib.rs".into()], diff --git a/crates/toolpath-convo/src/project.rs b/crates/toolpath-convo/src/project.rs index f991d5a..3a0a051 100644 --- a/crates/toolpath-convo/src/project.rs +++ b/crates/toolpath-convo/src/project.rs @@ -157,6 +157,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role, timestamp: "2026-01-01T00:00:00Z".into(), text: text.into(), @@ -165,6 +166,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -337,6 +339,7 @@ mod tests { turns: vec![Turn { id: "t1".into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-01-01T00:00:00Z".into(), text: "reading file".into(), @@ -363,6 +366,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -411,6 +415,7 @@ mod tests { Turn { id: "t1".into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-01-01T00:00:00Z".into(), text: "turn 1".into(), @@ -423,7 +428,9 @@ mod tests { output_tokens: Some(50), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -431,6 +438,7 @@ mod tests { Turn { id: "t2".into(), parent_id: Some("t1".into()), + group_id: None, role: Role::Assistant, timestamp: "2026-01-01T00:00:01Z".into(), text: "turn 2".into(), @@ -443,7 +451,9 @@ mod tests { output_tokens: Some(75), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), diff --git 
a/crates/toolpath-cursor/Cargo.toml b/crates/toolpath-cursor/Cargo.toml index 2697997..ca09aa9 100644 --- a/crates/toolpath-cursor/Cargo.toml +++ b/crates/toolpath-cursor/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-cursor" -version = "0.1.0" +version = "0.2.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-cursor/examples/dump_fixture.rs b/crates/toolpath-cursor/examples/dump_fixture.rs index 8be5b2b..e6fba08 100644 --- a/crates/toolpath-cursor/examples/dump_fixture.rs +++ b/crates/toolpath-cursor/examples/dump_fixture.rs @@ -267,6 +267,7 @@ fn view_from_jsonl( turns.push(Turn { id: turn_id.clone(), parent_id: prev_id.clone(), + group_id: None, role, // Synthesize plausible monotonic timestamps; the // transcript carries no real ones. @@ -281,6 +282,7 @@ fn view_from_jsonl( model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: env_for(), delegations: Vec::new(), file_mutations: Vec::new(), diff --git a/crates/toolpath-cursor/src/project.rs b/crates/toolpath-cursor/src/project.rs index 914d3e6..34bd0c7 100644 --- a/crates/toolpath-cursor/src/project.rs +++ b/crates/toolpath-cursor/src/project.rs @@ -924,6 +924,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-06-01T00:00:00.000Z".into(), text: text.into(), @@ -932,6 +933,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: Some(EnvironmentSnapshot { working_dir: Some("/proj".into()), vcs_branch: None, @@ -946,6 +948,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-06-01T00:00:01.000Z".into(), text: text.into(), @@ -958,7 +961,9 @@ mod tests { output_tokens: Some(5), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: 
Some(EnvironmentSnapshot { working_dir: Some("/proj".into()), vcs_branch: None, diff --git a/crates/toolpath-cursor/src/provider.rs b/crates/toolpath-cursor/src/provider.rs index 1ce7f99..02746ab 100644 --- a/crates/toolpath-cursor/src/provider.rs +++ b/crates/toolpath-cursor/src/provider.rs @@ -389,6 +389,7 @@ impl<'a> Builder<'a> { Turn { id: bubble.bubble_id.clone(), parent_id: parent.map(str::to_string), + group_id: None, role: Role::User, timestamp: bubble.created_at.clone().unwrap_or_default(), text: bubble.text.clone(), @@ -397,6 +398,7 @@ impl<'a> Builder<'a> { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment, delegations: Vec::new(), file_mutations: Vec::new(), @@ -437,14 +439,24 @@ impl<'a> Builder<'a> { tool_uses.push(invocation); } - // Token usage (per-bubble snapshot — only populated when the - // server returned non-zero counts). + // Token usage. POSSIBLE GAP, UNVERIFIED: we read only the bubble's + // `tokenCount`. When that's `{0,0}` (as in the one real session we + // have) we report `None`. Community exporters read fallback fields + // (a snake_case `usage` object, `contextWindowStatusAtCreation`, + // `promptDryRunInfo`), which hints `tokenCount` isn't always + // sufficient — but we have too little real Cursor data to know how + // often it's empty or to verify those field shapes. Wiring fallbacks + // in needs a live session with non-zero counts. We report `None` + // rather than fabricate — never derive usage from the + // `promptTokenBreakdown`/`contextUsagePercent` estimates, which are + // context-size gauges, not billed spend. 
let token_usage = bubble.token_count.as_ref().and_then(|t| { let u = TokenUsage { input_tokens: t.input_tokens.map(|n| n as u32).filter(|n| *n > 0), output_tokens: t.output_tokens.map(|n| n as u32).filter(|n| *n > 0), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }; if u.input_tokens.is_none() && u.output_tokens.is_none() { None @@ -468,6 +480,7 @@ impl<'a> Builder<'a> { Turn { id: bubble.bubble_id.clone(), parent_id: parent.map(str::to_string), + group_id: None, role: Role::Assistant, timestamp: bubble.created_at.clone().unwrap_or_default(), text: bubble.text.clone(), @@ -476,6 +489,7 @@ impl<'a> Builder<'a> { model, stop_reason: None, token_usage, + attributed_token_usage: None, environment, delegations: Vec::new(), file_mutations, diff --git a/crates/toolpath-cursor/tests/projection_roundtrip.rs b/crates/toolpath-cursor/tests/projection_roundtrip.rs index 9566d17..16c8877 100644 --- a/crates/toolpath-cursor/tests/projection_roundtrip.rs +++ b/crates/toolpath-cursor/tests/projection_roundtrip.rs @@ -229,6 +229,7 @@ fn projector_accepts_foreign_view_shape() { Turn { id: "uA".into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-06-01T00:00:00Z".into(), text: "rename main".into(), @@ -237,6 +238,7 @@ fn projector_accepts_foreign_view_shape() { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: Some(EnvironmentSnapshot { working_dir: Some("/foreign".into()), ..Default::default() @@ -247,6 +249,7 @@ fn projector_accepts_foreign_view_shape() { Turn { id: "aA".into(), parent_id: Some("uA".into()), + group_id: None, role: Role::Assistant, timestamp: "2026-06-01T00:00:01Z".into(), text: "done".into(), @@ -268,7 +271,9 @@ fn projector_accepts_foreign_view_shape() { output_tokens: Some(5), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: Some(EnvironmentSnapshot { working_dir: Some("/foreign".into()), 
..Default::default() diff --git a/crates/toolpath-dot/Cargo.toml b/crates/toolpath-dot/Cargo.toml index 6fe5921..22d6fff 100644 --- a/crates/toolpath-dot/Cargo.toml +++ b/crates/toolpath-dot/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-dot" -version = "0.4.0" +version = "0.5.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-gemini/Cargo.toml b/crates/toolpath-gemini/Cargo.toml index 45f620e..76fdeed 100644 --- a/crates/toolpath-gemini/Cargo.toml +++ b/crates/toolpath-gemini/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-gemini" -version = "0.5.0" +version = "0.6.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-gemini/src/project.rs b/crates/toolpath-gemini/src/project.rs index 59db84a..39bde55 100644 --- a/crates/toolpath-gemini/src/project.rs +++ b/crates/toolpath-gemini/src/project.rs @@ -271,11 +271,22 @@ fn build_tokens(turn: &Turn, gemini_extras: &Map) -> Option Tokens { + // Reasoning is folded into `output_tokens` on the forward path and the + // slice is recorded in `breakdowns["output"]["reasoning"]`. Un-fold it + // back out here so `output`/`thoughts` round-trip losslessly. 
+ let thoughts = u + .breakdowns + .get("output") + .and_then(|m| m.get("reasoning")) + .copied(); Tokens { input: u.input_tokens, - output: u.output_tokens, + output: match (u.output_tokens, thoughts) { + (Some(o), Some(r)) => Some(o.saturating_sub(r)), + (o, _) => o, + }, cached: u.cache_read_tokens, - thoughts: None, + thoughts, tool: None, total: None, } @@ -551,12 +562,41 @@ fn delegation_to_chat_file(d: &DelegatedWork, project_hash: &str) -> ChatFile { #[cfg(test)] mod tests { use super::*; + use std::collections::BTreeMap; use toolpath_convo::{EnvironmentSnapshot, ToolCategory, ToolResult}; + #[test] + fn tokens_from_common_unfolds_reasoning_out_of_output() { + let mut breakdowns: BTreeMap<String, BTreeMap<String, u32>> = BTreeMap::new(); + breakdowns.insert("output".into(), BTreeMap::from([("reasoning".into(), 243u32)])); + let usage = TokenUsage { + output_tokens: Some(337), + breakdowns, + ..Default::default() + }; + + let tokens = tokens_from_common(&usage); + assert_eq!(tokens.output, Some(94)); + assert_eq!(tokens.thoughts, Some(243)); + } + + #[test] + fn tokens_from_common_without_breakdown_leaves_output_unchanged() { + let usage = TokenUsage { + output_tokens: Some(337), + ..Default::default() + }; + + let tokens = tokens_from_common(&usage); + assert_eq!(tokens.output, Some(337)); + assert_eq!(tokens.thoughts, None); + } + fn user_turn(id: &str, text: &str) -> Turn { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-04-17T15:00:00Z".into(), text: text.into(), @@ -565,6 +605,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -575,6 +616,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-04-17T15:00:01Z".into(), text: text.into(), @@ -583,6 +625,7 @@ mod tests { model: Some("gemini-3-flash-preview".into()), stop_reason: None, token_usage: None, + 
attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -673,6 +716,7 @@ mod tests { output_tokens: Some(50), cache_read_tokens: Some(20), cache_write_tokens: None, + ..Default::default() }); let convo = GeminiProjector::default() .project(&view_with(vec![t])) @@ -798,6 +842,7 @@ mod tests { output_tokens: Some(5), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }); t.tool_uses = vec![ToolInvocation { id: "tc1".into(), diff --git a/crates/toolpath-gemini/src/provider.rs b/crates/toolpath-gemini/src/provider.rs index 37292aa..f39da51 100644 --- a/crates/toolpath-gemini/src/provider.rs +++ b/crates/toolpath-gemini/src/provider.rs @@ -10,7 +10,7 @@ //! [`DelegatedWork`]. use crate::GeminiConvo; -use crate::types::{ChatFile, Conversation, GeminiMessage, GeminiRole, Thought, ToolCall}; +use crate::types::{ChatFile, Conversation, GeminiMessage, GeminiRole, Thought, Tokens, ToolCall}; use serde_json::Value; use toolpath_convo::{ ConversationMeta, ConversationProvider, ConversationView, ConvoError, DelegatedWork, @@ -103,12 +103,7 @@ fn message_to_turn(msg: &GeminiMessage, working_dir: Option<&str>) -> Turn { .collect(); let file_mutations = compute_file_mutations(msg.tool_calls()); - let token_usage = msg.tokens.as_ref().map(|t| TokenUsage { - input_tokens: t.input, - output_tokens: t.output, - cache_read_tokens: t.cached, - cache_write_tokens: None, - }); + let token_usage = msg.tokens.as_ref().map(tokens_to_usage); let environment = working_dir.map(|wd| EnvironmentSnapshot { working_dir: Some(wd.to_string()), @@ -119,6 +114,7 @@ fn message_to_turn(msg: &GeminiMessage, working_dir: Option<&str>) -> Turn { Turn { id: msg.id.clone(), parent_id: None, + group_id: None, role: gemini_role_to_role(&msg.role), timestamp: msg.timestamp.clone(), text, @@ -127,12 +123,71 @@ fn message_to_turn(msg: &GeminiMessage, working_dir: Option<&str>) -> Turn { model: msg.model.clone(), stop_reason: None, token_usage, + 
attributed_token_usage: None, environment, delegations: vec![], file_mutations, } } +/// Map Gemini's on-disk [`Tokens`] struct onto the provider-agnostic +/// [`TokenUsage`]. +/// +/// Gemini records `thoughts` (reasoning) as a **separate additive** +/// counter, sibling to `output` — across real sessions +/// `total == input + output + thoughts` exactly, and the format doc +/// describes `output` as "generated tokens *excluding reasoning*." +/// Google bills reasoning as output, so we fold `thoughts` into +/// `output_tokens` (same convention as opencode) — that way the IR's +/// `output` means "all generated tokens" and the session total isn't +/// under-counted. +/// +/// The folded reasoning slice is *also* recorded under +/// `breakdowns["output"]["reasoning"]`. It's INFORMATIONAL: breakdowns +/// are never summed into the total (output already counts it), and the +/// invariant `Σ(inner) = reasoning ≤ output` holds because we fold the +/// same number in. It's also what lets the projector un-fold reasoning +/// back out of `output_tokens` (see `tokens_from_common`), so the +/// `output`/`thoughts` split round-trips losslessly. The entry is +/// recorded whenever `thoughts` is present (including a genuine `Some(0)`), +/// preserving the `Some(0)` vs `None` distinction; when `thoughts` is +/// absent the map stays empty and is omitted from serialization. +/// +/// `tool` is prompt-side (tool-result tokens billed separately) and +/// `total` is a Gemini-side sum; neither is folded here — both remain +/// available raw via `Turn.extra["gemini"]["tokens"]`. +fn tokens_to_usage(t: &Tokens) -> TokenUsage { + let output = t.output.unwrap_or(0); + let thoughts = t.thoughts.unwrap_or(0); + let generated = output.saturating_add(thoughts); + + let mut usage = TokenUsage { + input_tokens: t.input, + // Fold reasoning into output (additive in Gemini — billed as + // output). 
None only when both output and thoughts are + // absent/zero, mirroring the per-field Option semantics. + output_tokens: if generated == 0 { None } else { Some(generated) }, + cache_read_tokens: t.cached, + cache_write_tokens: None, + ..Default::default() + }; + + // Memoize the reasoning slice folded into output so the projector can + // un-fold it back out losslessly. Recorded whenever `thoughts` is + // present — including a genuine `Some(0)` — so the projector + // reconstructs `Some(0)` rather than `None`; absent only when the + // source had no reasoning counter at all. + if let Some(thoughts) = t.thoughts { + usage + .breakdowns + .entry("output".to_string()) + .or_default() + .insert("reasoning".to_string(), thoughts); + } + + usage +} + /// For each file-write tool call in this message, build a /// `FileMutation` with a pre-resolved unified diff. Preference order: /// 1. Gemini's own `resultDisplay.fileDiff` when present (real diff @@ -712,12 +767,107 @@ mod tests { let view = ConversationProvider::load_conversation(&p, "/abs/myrepo", "session-uuid").unwrap(); let total = view.total_usage.as_ref().unwrap(); - // Main turns: (100,50), (200,80). Sub-agent turn: (20,5). + // Main turns: input/(output+thoughts) = (100, 50+10), (200, 80+0). + // Sub-agent turn: (20, 5+0). thoughts is additive reasoning, folded + // into output (billed as output by Google). assert_eq!(total.input_tokens, Some(320)); - assert_eq!(total.output_tokens, Some(135)); + assert_eq!(total.output_tokens, Some(145)); assert_eq!(total.cache_read_tokens, Some(50)); } + #[test] + fn test_thoughts_folded_into_output_with_breakdown() { + // Real-fixture-shaped numbers: total == input + output + thoughts + // (8665 + 94 + 243 == 9002). output EXCLUDES reasoning, so folding + // gives output_tokens = 94 + 243 = 337, and the reasoning slice is + // recorded under breakdowns["output"]["reasoning"] = 243 (≤ output). 
+ let t = Tokens { + input: Some(8665), + output: Some(94), + cached: Some(0), + thoughts: Some(243), + tool: Some(0), + total: Some(9002), + }; + let u = tokens_to_usage(&t); + assert_eq!(u.input_tokens, Some(8665)); + assert_eq!(u.output_tokens, Some(337)); + let reasoning = u + .breakdowns + .get("output") + .and_then(|m| m.get("reasoning")) + .copied(); + assert_eq!(reasoning, Some(243)); + // reasoning ≤ output invariant holds. + assert!(reasoning.unwrap() <= u.output_tokens.unwrap()); + } + + #[test] + fn test_present_zero_thoughts_records_zero_breakdown() { + // thoughts == Some(0) → breakdown records reasoning: 0 so the + // projector reconstructs Some(0) (not None) on the reverse path. + // output_tokens is unchanged (folding 0 is a no-op). + let t = Tokens { + input: Some(200), + output: Some(80), + cached: Some(50), + thoughts: Some(0), + tool: Some(0), + total: Some(330), + }; + let u = tokens_to_usage(&t); + assert_eq!(u.output_tokens, Some(80)); + assert_eq!( + u.breakdowns.get("output").and_then(|m| m.get("reasoning")), + Some(&0) + ); + } + + #[test] + fn test_absent_thoughts_yields_no_breakdown() { + // thoughts absent (Gemini 2.5) → treated as 0: no breakdown, + // output_tokens == output. + let t = Tokens { + input: Some(20), + output: Some(5), + cached: Some(0), + thoughts: None, + tool: None, + total: None, + }; + let u = tokens_to_usage(&t); + assert_eq!(u.output_tokens, Some(5)); + assert!(u.breakdowns.is_empty()); + } + + #[test] + fn test_zero_output_and_thoughts_yields_none_output() { + // Both output and thoughts zero → output_tokens None (mirrors the + // per-field Option semantics; no fabricated zero). thoughts is + // present (Some(0)), so the breakdown still records reasoning: 0 + // for lossless round-trip of the Some(0) distinction. 
+ let t = Tokens { + input: Some(100), + output: Some(0), + cached: Some(0), + thoughts: Some(0), + tool: Some(0), + total: Some(100), + }; + let u = tokens_to_usage(&t); + assert_eq!(u.output_tokens, None); + assert_eq!( + u.breakdowns.get("output").and_then(|m| m.get("reasoning")), + Some(&0) + ); + + // And the fully-absent case: thoughts None → no breakdown. + let empty = Tokens::default(); + let u2 = tokens_to_usage(&empty); + assert_eq!(u2.output_tokens, None); + assert!(u2.breakdowns.is_empty()); + } + #[test] fn test_files_changed() { let (_t, p) = setup_provider(); diff --git a/crates/toolpath-gemini/tests/projection_roundtrip.rs b/crates/toolpath-gemini/tests/projection_roundtrip.rs index 87deb91..2327a7b 100644 --- a/crates/toolpath-gemini/tests/projection_roundtrip.rs +++ b/crates/toolpath-gemini/tests/projection_roundtrip.rs @@ -229,9 +229,14 @@ fn roundtrip_preserves_tool_calls_with_results() { #[test] fn roundtrip_preserves_input_output_tokens() { - // Input/output/cached tokens survive via Turn.token_usage. - // Thoughts/tool/total tokens were Gemini-extra only and don't - // round-trip now that Turn.extra is gone. + // Input/cached tokens survive via Turn.token_usage. Output is folded + // on the forward path: Gemini's `thoughts` (reasoning) is an additive + // sibling of `output` (billed as output), so the derived + // `output_tokens` is `output + thoughts`. The reasoning slice is + // recorded in `breakdowns["output"]["reasoning"]`, so the projector + // un-folds it back out on projection — `output` and `thoughts` both + // round-trip losslessly. Only the `tool`/`total` counters were + // Gemini-extra only (no IR home) and don't survive. 
let source = load_source_conversation(); let (_, rebuilt, _) = roundtrip(&source); @@ -249,6 +254,7 @@ fn roundtrip_preserves_input_output_tokens() { .unwrap_or_else(|| panic!("tokens lost at message {}", i)); assert_eq!(bt.input, at.input, "input tokens at msg {}", i); assert_eq!(bt.output, at.output, "output tokens at msg {}", i); + assert_eq!(bt.thoughts, at.thoughts, "thoughts tokens at msg {}", i); assert_eq!(bt.cached, at.cached, "cached tokens at msg {}", i); } } diff --git a/crates/toolpath-git/Cargo.toml b/crates/toolpath-git/Cargo.toml index 56e7a8f..89d089a 100644 --- a/crates/toolpath-git/Cargo.toml +++ b/crates/toolpath-git/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-git" -version = "0.5.0" +version = "0.6.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-github/Cargo.toml b/crates/toolpath-github/Cargo.toml index e0750a4..08dd7f5 100644 --- a/crates/toolpath-github/Cargo.toml +++ b/crates/toolpath-github/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-github" -version = "0.5.0" +version = "0.6.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-md/Cargo.toml b/crates/toolpath-md/Cargo.toml index 735c2d2..07efdae 100644 --- a/crates/toolpath-md/Cargo.toml +++ b/crates/toolpath-md/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-md" -version = "0.6.0" +version = "0.7.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-opencode/Cargo.toml b/crates/toolpath-opencode/Cargo.toml index 8e8c030..82e90bd 100644 --- a/crates/toolpath-opencode/Cargo.toml +++ b/crates/toolpath-opencode/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-opencode" -version = "0.4.0" +version = "0.5.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git 
a/crates/toolpath-opencode/src/project.rs b/crates/toolpath-opencode/src/project.rs index 2372e66..0f34ae4 100644 --- a/crates/toolpath-opencode/src/project.rs +++ b/crates/toolpath-opencode/src/project.rs @@ -754,6 +754,7 @@ mod tests { Turn { id: "u1".into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-04-21T12:00:00.000Z".into(), text: text.into(), @@ -762,6 +763,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -772,6 +774,7 @@ mod tests { Turn { id: "a1".into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-04-21T12:00:01.000Z".into(), text: text.into(), @@ -780,6 +783,7 @@ mod tests { model: Some("claude-sonnet-4-6".into()), stop_reason: Some("stop".into()), token_usage: None, + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), diff --git a/crates/toolpath-opencode/src/provider.rs b/crates/toolpath-opencode/src/provider.rs index a1fc227..3215868 100644 --- a/crates/toolpath-opencode/src/provider.rs +++ b/crates/toolpath-opencode/src/provider.rs @@ -286,6 +286,7 @@ impl<'a> Builder<'a> { self.turns.push(Turn { id: msg.id.clone(), parent_id: None, + group_id: None, role: Role::User, timestamp: millis_to_iso(msg.time_created), text, @@ -294,6 +295,7 @@ impl<'a> Builder<'a> { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment, delegations: Vec::new(), file_mutations: Vec::new(), @@ -415,8 +417,11 @@ impl<'a> Builder<'a> { } // Prefer step-summed tokens over the message-level snapshot — - // the step deltas capture the real per-step work. - let token_usage = if step_usage_set { + // the step deltas capture the real per-step work. 
Absent or + // all-zero counters mean "spend unknown", not "a zero-cost API + // call": decode to None, never Some(zeros) (foreign-source + // projections write zero placeholders into required fields). + let token_usage = if step_usage_set && !is_usage_zero(&step_usage) { Some(step_usage.clone()) } else { let u = tokens_to_convo(&a.tokens); @@ -450,6 +455,7 @@ impl<'a> Builder<'a> { } else { Some(a.parent_id.clone()) }, + group_id: None, role: Role::Assistant, timestamp: millis_to_iso(msg.time_created), text: text_chunks.join("\n\n"), @@ -466,6 +472,7 @@ impl<'a> Builder<'a> { }, stop_reason: stop_reason.or_else(|| a.finish.clone()), token_usage, + attributed_token_usage: None, environment, delegations, file_mutations, @@ -592,9 +599,35 @@ fn to_invocation( fn accumulate_tokens(total: &mut TokenUsage, step: &Tokens) { add_u32(&mut total.input_tokens, step.input as u32); - add_u32(&mut total.output_tokens, step.output as u32); + // opencode reports `reasoning` as a SEPARATE additive category + // (`total == input + output + reasoning + cache`), unlike Claude/OpenAI + // where reasoning is already inside `output`. Fold it into output_tokens + // so the IR's `output` means "all generated tokens" consistently and the + // session total isn't under-counted. + add_u32(&mut total.output_tokens, (step.output + step.reasoning) as u32); add_u32(&mut total.cache_read_tokens, step.cache.read as u32); add_u32(&mut total.cache_write_tokens, step.cache.write as u32); + // Memoize the reasoning slice we just folded into output. It's + // INFORMATIONAL (never summed into the total — output already counts + // it); the invariant is Σ(inner) = reasoning ≤ output. Accumulates + // across step-finish parts exactly like output does. + add_reasoning_breakdown(total, step.reasoning as u32); +} + +/// Add `reasoning` to `breakdowns["output"]["reasoning"]`, creating the +/// nested maps as needed. 
No-op when `reasoning` is 0 so a zero-reasoning +/// turn keeps an empty `breakdowns` map (omitted from serialization). +fn add_reasoning_breakdown(usage: &mut TokenUsage, reasoning: u32) { + if reasoning == 0 { + return; + } + let slot = usage + .breakdowns + .entry("output".to_string()) + .or_default() + .entry("reasoning".to_string()) + .or_insert(0); + *slot = slot.saturating_add(reasoning); } fn add_u32(slot: &mut Option<u32>, delta: u32) { @@ -605,16 +638,18 @@ fn add_u32(slot: &mut Option<u32>, delta: u32) { } fn tokens_to_convo(t: &Tokens) -> TokenUsage { - TokenUsage { + let mut usage = TokenUsage { input_tokens: if t.input == 0 { None } else { Some(t.input as u32) }, - output_tokens: if t.output == 0 { + // Fold reasoning into output (additive in opencode — see + // `accumulate_tokens`). + output_tokens: if t.output + t.reasoning == 0 { None } else { - Some(t.output as u32) + Some((t.output + t.reasoning) as u32) }, cache_read_tokens: if t.cache.read == 0 { None @@ -626,7 +661,11 @@ fn tokens_to_convo(t: &Tokens) -> TokenUsage { } else { Some(t.cache.write as u32) }, - } + ..Default::default() + }; + // Memoize the reasoning slice folded into output (no-op when 0). + add_reasoning_breakdown(&mut usage, t.reasoning as u32); + usage } fn is_usage_zero(u: &TokenUsage) -> bool { @@ -960,12 +999,80 @@ mod tests { let view = to_view(&mgr.read_session("ses_x").unwrap()); let u = view.turns[1].token_usage.as_ref().unwrap(); assert_eq!(u.input_tokens, Some(100)); - assert_eq!(u.output_tokens, Some(20)); + // output (20) + reasoning (5): opencode reports reasoning as a + // separate additive category, folded into output here. + assert_eq!(u.output_tokens, Some(25)); assert_eq!(u.cache_read_tokens, Some(10)); + // The reasoning slice (5) is also memoized under + // breakdowns["output"]["reasoning"] — it's the SAME number folded + // into output, so Σ(inner) = 5 ≤ output (25).
+ assert_eq!(u.breakdowns.get("output").and_then(|m| m.get("reasoning")), Some(&5u32)); + let total = view.total_usage.as_ref().unwrap(); assert_eq!(total.input_tokens, Some(100)); - assert_eq!(total.output_tokens, Some(20)); + assert_eq!(total.output_tokens, Some(25)); + } + + #[test] + fn zero_reasoning_yields_no_breakdowns() { + // output present but reasoning 0 → token_usage exists, but no + // breakdowns entry (empty map, omitted from serialization). + let body = r#" + INSERT INTO project (id, worktree, time_created, time_updated, sandboxes) + VALUES ('p','/p',1,2,'[]'); + INSERT INTO session (id, project_id, slug, directory, title, version, time_created, time_updated) + VALUES ('s','p','slug','/p','T','1.0.0',1,2); + INSERT INTO message (id, session_id, time_created, time_updated, data) VALUES + ('m','s',1,1,'{"parentID":"","role":"assistant","mode":"b","agent":"b","path":{"cwd":"/p","root":"/p"},"cost":0,"tokens":{"input":10,"output":20,"reasoning":0,"cache":{"read":0,"write":0}},"modelID":"m","providerID":"p","time":{"created":1}}'); + INSERT INTO part (id, message_id, session_id, time_created, time_updated, data) VALUES + ('p1','m','s',1,1,'{"type":"step-finish","reason":"stop","tokens":{"input":10,"output":20,"reasoning":0,"cache":{"read":0,"write":0}}}'); + "#; + let (_t, mgr) = setup(body); + let view = to_view(&mgr.read_session("s").unwrap()); + let u = view.turns[0].token_usage.as_ref().unwrap(); + assert_eq!(u.output_tokens, Some(20)); + assert!(u.breakdowns.is_empty()); + } + + #[test] + fn reasoning_accumulates_across_step_finishes() { + // Two step-finish parts in one turn: reasoning 5 then 7 → output + // total folds 12, and breakdowns["output"]["reasoning"] == 12. 
+ let body = r#" + INSERT INTO project (id, worktree, time_created, time_updated, sandboxes) + VALUES ('p','/p',1,2,'[]'); + INSERT INTO session (id, project_id, slug, directory, title, version, time_created, time_updated) + VALUES ('s','p','slug','/p','T','1.0.0',1,2); + INSERT INTO message (id, session_id, time_created, time_updated, data) VALUES + ('m','s',1,1,'{"parentID":"","role":"assistant","mode":"b","agent":"b","path":{"cwd":"/p","root":"/p"},"cost":0,"tokens":{"input":0,"output":0,"reasoning":0,"cache":{"read":0,"write":0}},"modelID":"m","providerID":"p","time":{"created":1}}'); + INSERT INTO part (id, message_id, session_id, time_created, time_updated, data) VALUES + ('p1','m','s',1,1,'{"type":"step-finish","reason":"tool-calls","tokens":{"input":10,"output":20,"reasoning":5,"cache":{"read":0,"write":0}}}'), + ('p2','m','s',2,2,'{"type":"step-finish","reason":"stop","tokens":{"input":3,"output":4,"reasoning":7,"cache":{"read":0,"write":0}}}'); + "#; + let (_t, mgr) = setup(body); + let view = to_view(&mgr.read_session("s").unwrap()); + let u = view.turns[0].token_usage.as_ref().unwrap(); + // output total: (20+5) + (4+7) = 36; reasoning slice: 5+7 = 12. 
+ assert_eq!(u.output_tokens, Some(36)); + assert_eq!(u.breakdowns.get("output").and_then(|m| m.get("reasoning")), Some(&12u32)); + } + + #[test] + fn all_zero_usage_yields_none_and_no_breakdowns() { + let body = r#" + INSERT INTO project (id, worktree, time_created, time_updated, sandboxes) + VALUES ('p','/p',1,2,'[]'); + INSERT INTO session (id, project_id, slug, directory, title, version, time_created, time_updated) + VALUES ('s','p','slug','/p','T','1.0.0',1,2); + INSERT INTO message (id, session_id, time_created, time_updated, data) VALUES + ('m','s',1,1,'{"parentID":"","role":"assistant","mode":"b","agent":"b","path":{"cwd":"/p","root":"/p"},"cost":0,"tokens":{"input":0,"output":0,"reasoning":0,"cache":{"read":0,"write":0}},"modelID":"m","providerID":"p","time":{"created":1}}'); + INSERT INTO part (id, message_id, session_id, time_created, time_updated, data) VALUES + ('p1','m','s',1,1,'{"type":"step-finish","reason":"stop","tokens":{"input":0,"output":0,"reasoning":0,"cache":{"read":0,"write":0}}}'); + "#; + let (_t, mgr) = setup(body); + let view = to_view(&mgr.read_session("s").unwrap()); + assert!(view.turns[0].token_usage.is_none()); } #[test] diff --git a/crates/toolpath-pi/Cargo.toml b/crates/toolpath-pi/Cargo.toml index 27ac341..8ce2234 100644 --- a/crates/toolpath-pi/Cargo.toml +++ b/crates/toolpath-pi/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath-pi" -version = "0.5.0" +version = "0.6.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath-pi/src/project.rs b/crates/toolpath-pi/src/project.rs index 57bc2a0..a82b15d 100644 --- a/crates/toolpath-pi/src/project.rs +++ b/crates/toolpath-pi/src/project.rs @@ -760,6 +760,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::User, timestamp: "2026-04-16T10:00:00Z".into(), text: text.into(), @@ -768,6 +769,7 @@ mod tests { model: None, stop_reason: None, token_usage: None, + 
attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), @@ -778,6 +780,7 @@ mod tests { Turn { id: id.into(), parent_id: None, + group_id: None, role: Role::Assistant, timestamp: "2026-04-16T10:00:01Z".into(), text: text.into(), @@ -790,7 +793,9 @@ mod tests { output_tokens: Some(50), cache_read_tokens: None, cache_write_tokens: None, + ..Default::default() }), + attributed_token_usage: None, environment: None, delegations: vec![], file_mutations: Vec::new(), diff --git a/crates/toolpath-pi/src/provider.rs b/crates/toolpath-pi/src/provider.rs index 1990ba0..6f091c2 100644 --- a/crates/toolpath-pi/src/provider.rs +++ b/crates/toolpath-pi/src/provider.rs @@ -171,8 +171,15 @@ fn extract_tool_result_text(content: &[ToolResultContent]) -> String { texts.join("\n") } -fn usage_to_token_usage(usage: &Usage) -> TokenUsage { - TokenUsage { +/// Pi's wire requires a `usage` object on every assistant message, so +/// foreign-source projections fill it with zeros when the spend is +/// unknown. A real API message can never cost zero tokens, so an +/// all-zero `usage` decodes as "no usage recorded", not `Some(zeros)`. 
+fn usage_to_token_usage(usage: &Usage) -> Option<TokenUsage> { + if usage.input == 0 && usage.output == 0 && usage.cache_read == 0 && usage.cache_write == 0 { + return None; + } + Some(TokenUsage { input_tokens: Some(usage.input as u32), output_tokens: Some(usage.output as u32), cache_read_tokens: if usage.cache_read > 0 { @@ -185,7 +192,8 @@ fn usage_to_token_usage(usage: &Usage) -> TokenUsage { } else { None }, - } + ..Default::default() + }) } fn environment_for(session: &PiSession) -> EnvironmentSnapshot { @@ -238,6 +246,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { turns.push(Turn { id: base.id.clone(), parent_id: base.parent_id.clone(), + group_id: None, role: Role::System, timestamp: base.timestamp.clone(), text: format!("Compacted (summary): {}", summary), @@ -246,6 +255,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: Some(env.clone()), delegations: vec![], file_mutations: Vec::new(), @@ -256,6 +266,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { turns.push(Turn { id: base.id.clone(), parent_id: base.parent_id.clone(), + group_id: None, role: Role::System, timestamp: base.timestamp.clone(), text: format!("Branch summary: {}", summary), @@ -264,6 +275,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: Some(env.clone()), delegations: vec![], file_mutations: Vec::new(), @@ -274,6 +286,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { turns.push(Turn { id: base.id.clone(), parent_id: base.parent_id.clone(), + group_id: None, role: Role::Other("custom".to_string()), timestamp: base.timestamp.clone(), text: String::new(), @@ -282,6 +295,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { model: None, stop_reason: None, token_usage: None, +
attributed_token_usage: None, environment: Some(env.clone()), delegations: vec![], file_mutations: Vec::new(), @@ -297,6 +311,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { turns.push(Turn { id: base.id.clone(), parent_id: base.parent_id.clone(), + group_id: None, role: Role::Other(format!("custom:{}", custom_type)), timestamp: base.timestamp.clone(), text: extract_user_text(content), @@ -305,6 +320,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { model: None, stop_reason: None, token_usage: None, + attributed_token_usage: None, environment: Some(env.clone()), delegations: vec![], file_mutations: Vec::new(), @@ -339,7 +355,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { thinking = extract_assistant_thinking(content); model = Some(m.clone()); stop_reason_s = Some(stop_reason_to_string(stop_reason)); - token_usage = Some(usage_to_token_usage(usage)); + token_usage = usage_to_token_usage(usage); let turn_idx = turns.len(); for block in content { @@ -435,6 +451,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { turns.push(Turn { id: base.id.clone(), parent_id: base.parent_id.clone(), + group_id: None, role, timestamp: base.timestamp.clone(), text, @@ -443,6 +460,7 @@ pub fn session_to_view(session: &PiSession) -> ConversationView { model, stop_reason: stop_reason_s, token_usage, + attributed_token_usage: None, environment: Some(env.clone()), delegations, file_mutations: Vec::new(), @@ -615,6 +633,34 @@ mod tests { use std::collections::HashMap; use std::path::PathBuf; + #[test] + fn test_all_zero_usage_decodes_as_none() { + // Pi's wire requires `usage`; foreign projections zero-fill it + // when spend is unknown. Zero is not a real spend. 
+ let zero = Usage { + input: 0, + output: 0, + cache_read: 0, + cache_write: 0, + total_tokens: 0, + cost: CostBreakdown::default(), + }; + assert!(usage_to_token_usage(&zero).is_none()); + + let real = Usage { + input: 10, + output: 5, + cache_read: 0, + cache_write: 0, + total_tokens: 15, + cost: CostBreakdown::default(), + }; + let u = usage_to_token_usage(&real).unwrap(); + assert_eq!(u.input_tokens, Some(10)); + assert_eq!(u.output_tokens, Some(5)); + assert_eq!(u.cache_read_tokens, None); + } + fn header(id: &str, cwd: &str) -> SessionHeader { SessionHeader { version: 3, diff --git a/crates/toolpath/Cargo.toml b/crates/toolpath/Cargo.toml index 54deb8c..77f3849 100644 --- a/crates/toolpath/Cargo.toml +++ b/crates/toolpath/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "toolpath" -version = "0.6.0" +version = "0.7.0" edition.workspace = true license.workspace = true repository = "https://github.com/empathic/toolpath" diff --git a/crates/toolpath/src/jsonl.rs b/crates/toolpath/src/jsonl.rs index b386d57..17139da 100644 --- a/crates/toolpath/src/jsonl.rs +++ b/crates/toolpath/src/jsonl.rs @@ -1250,7 +1250,7 @@ mod tests { }; let jsonl = p.to_jsonl_string().unwrap(); assert!( - jsonl.contains(r#""kind":"https://toolpath.net/kinds/agent-coding-session/v1.0.0""#) + jsonl.contains(r#""kind":"https://toolpath.net/kinds/agent-coding-session/v1.1.0""#) ); let back = Path::from_jsonl_str(&jsonl).unwrap(); assert_eq!(canonical_json(&p), canonical_json(&back)); @@ -1259,14 +1259,14 @@ mod tests { #[test] fn path_meta_line_can_set_kind() { let patch = PathMetaPatch { - kind: Some("https://toolpath.net/kinds/agent-coding-session/v1.0.0".into()), + kind: Some("https://toolpath.net/kinds/agent-coding-session/v1.1.0".into()), ..Default::default() }; let mut meta = PathMeta::default(); apply_meta_patch(&mut meta, patch); assert_eq!( meta.kind.as_deref(), - Some("https://toolpath.net/kinds/agent-coding-session/v1.0.0") + 
Some("https://toolpath.net/kinds/agent-coding-session/v1.1.0") ); } diff --git a/crates/toolpath/src/lib.rs b/crates/toolpath/src/lib.rs index 2cc8a1b..423ab3f 100644 --- a/crates/toolpath/src/lib.rs +++ b/crates/toolpath/src/lib.rs @@ -147,7 +147,8 @@ pub mod v1 { pub use crate::types::{ ActorDefinition, ArtifactChange, Base, Graph, GraphIdentity, GraphMeta, Identity, Key, - PATH_KIND_AGENT_CODING_SESSION, Path, PathIdentity, PathMeta, PathOrRef, PathRef, Ref, - Signature, Step, StepIdentity, StepMeta, StructuralChange, VcsSource, + PATH_KIND_AGENT_CODING_SESSION, PATH_KIND_AGENT_CODING_SESSION_V1_0_0, Path, PathIdentity, + PathMeta, PathOrRef, PathRef, Ref, Signature, Step, StepIdentity, StepMeta, + StructuralChange, VcsSource, }; } diff --git a/crates/toolpath/src/types.rs b/crates/toolpath/src/types.rs index 4f99b22..d1be613 100644 --- a/crates/toolpath/src/types.rs +++ b/crates/toolpath/src/types.rs @@ -141,8 +141,19 @@ pub struct Base { } /// [`PathMeta::kind`] URI for a path derived from an AI coding conversation. -/// Spec at . +/// Spec at . +/// +/// v1.1.0 specifies message-level token accounting: steps derived from one +/// provider message share a `message_id`, and the message's `token_usage` +/// appears on exactly one of them (the group's last step in document +/// order), so summing usage over a path's steps yields session totals. pub const PATH_KIND_AGENT_CODING_SESSION: &str = + "https://toolpath.net/kinds/agent-coding-session/v1.1.0"; + +/// The previous version URI. Documents produced before the v1.1.0 +/// accounting rule carry this kind; consumers summing their `token_usage` +/// per step must deduplicate repeated message-level usage themselves. 
+pub const PATH_KIND_AGENT_CODING_SESSION_V1_0_0: &str = "https://toolpath.net/kinds/agent-coding-session/v1.0.0"; /// Path metadata @@ -825,12 +836,12 @@ mod tests { }; let json = serde_json::to_string(&meta).unwrap(); assert!( - json.contains(r#""kind":"https://toolpath.net/kinds/agent-coding-session/v1.0.0""#) + json.contains(r#""kind":"https://toolpath.net/kinds/agent-coding-session/v1.1.0""#) ); let parsed: PathMeta = serde_json::from_str(&json).unwrap(); assert_eq!( parsed.kind.as_deref(), - Some("https://toolpath.net/kinds/agent-coding-session/v1.0.0") + Some("https://toolpath.net/kinds/agent-coding-session/v1.1.0") ); } diff --git a/docs/agents/formats/claude-code/usage.md b/docs/agents/formats/claude-code/usage.md index a471376..c799ff3 100644 --- a/docs/agents/formats/claude-code/usage.md +++ b/docs/agents/formats/claude-code/usage.md @@ -5,6 +5,54 @@ records the token counts Anthropic billed for that turn, including prompt-cache statistics. The shape has grown over time and now mixes flat fields with nested breakdowns that duplicate the flat totals. +## One message, many lines: don't sum per entry + +Claude Code writes one JSONL line **per content block** of an assistant +API message (see [entry-types](entry-types.md)), each stamped with a +`usage` object. A message with thinking + text + two `tool_use` blocks +lands as four entries. Summing `message.usage` across entries +over-counts (~3× on typical sessions) — the values are **cumulative +snapshots of one message, not per-line bills.** + +The grouping key is **`message.id`** (`msg_…`), identical on every line +of the split. Empirically, across every session sampled: + +- `input_tokens` and the cache counters are **constant** across a + message's lines (prompt-side cost, paid once for the message). +- `output_tokens` is **cumulative and non-decreasing**: it streams + upward as the model generates, and the **last line carries the + message total**. 
~73% of split messages repeat one value on every + line (stamped after generation); ~27% genuinely stream (distinct + values). Either way the max — which is the last line — is the total. + +Correct accounting: take the **maximum** `usage` per distinct +`message.id` (don't trust line order; the format is undocumented). This +is what `toolpath-claude` does — derived paths put the message total on +the last step of each `message.id` group, per the +[`agent-coding-session` v1.1.0 kind](https://toolpath.net/kinds/agent-coding-session/v1.1.0/). + +**Why this is a snapshot, not a per-block bill.** The Anthropic +[streaming API](https://platform.claude.com/docs/en/build-with-claude/streaming.md) +reports usage incrementally: the `message_start` event seeds +`output_tokens` near zero, and each `message_delta` carries the running +**cumulative** total, the final delta being the message total. Claude +Code stamps each content-block line with whatever snapshot was current +when it flushed the line — so the early lines hold near-`message_start` +values and the full total lands on the last line. A real prose `text` +block routinely shows `output_tokens: 1`. The per-line values therefore +track *flush timing*, not the tokens a given block cost. + +**No per-block attribution.** Because of the above, differencing +consecutive lines does **not** yield per-block token costs — the +intermediate values are streaming snapshots, not block bills. We take +the max as the message total and do not derive `attributed_token_usage` +for Claude. (Codex, by contrast, reports a genuine per-call delta — see +[`codex.md`](../codex.md).) + +One caution: the `iterations` array (below) is a breakdown *inside* one +message's `usage` — subordinate detail, not an accounting unit; never sum +it alongside the enclosing totals. 
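The max-per-`message.id` rule described above can be sketched roughly as follows. This is a minimal illustration, not the `toolpath-claude` implementation: the `Usage` struct, its field names, and `session_total` are hypothetical stand-ins, and the real JSONL entries carry more fields than are modelled here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one JSONL line's `message.usage`.
/// Field names are illustrative only.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,
    pub cache_write_tokens: u64,
}

/// Field-wise maximum of two snapshots of the same message.
fn max_usage(a: Usage, b: Usage) -> Usage {
    Usage {
        input_tokens: a.input_tokens.max(b.input_tokens),
        output_tokens: a.output_tokens.max(b.output_tokens),
        cache_read_tokens: a.cache_read_tokens.max(b.cache_read_tokens),
        cache_write_tokens: a.cache_write_tokens.max(b.cache_write_tokens),
    }
}

/// Collapse per-line snapshots to one total per `message.id`, then sum
/// those totals. Line order is deliberately ignored.
pub fn session_total<'a>(lines: impl IntoIterator<Item = (&'a str, Usage)>) -> Usage {
    let mut per_message: HashMap<&str, Usage> = HashMap::new();
    for (message_id, usage) in lines {
        let slot = per_message.entry(message_id).or_default();
        *slot = max_usage(*slot, usage);
    }
    per_message.values().fold(Usage::default(), |acc, u| Usage {
        input_tokens: acc.input_tokens + u.input_tokens,
        output_tokens: acc.output_tokens + u.output_tokens,
        cache_read_tokens: acc.cache_read_tokens + u.cache_read_tokens,
        cache_write_tokens: acc.cache_write_tokens + u.cache_write_tokens,
    })
}
```

Under these assumptions, two lines of one message reporting `output_tokens` of 1 and 40 contribute 40 to the session total, not 41, and their shared `input_tokens` is counted once.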
+ ## Full observed shape ```jsonc diff --git a/docs/agents/formats/codex.md b/docs/agents/formats/codex.md index a5b3458..9734c35 100644 --- a/docs/agents/formats/codex.md +++ b/docs/agents/formats/codex.md @@ -502,6 +502,49 @@ Populated once the turn has real usage data: Absent/null `info` on the first `token_count` of a turn (delivered before the model responds); populated thereafter. +**Cumulative vs. per-step — and the doubling trap:** per OpenAI's own +field definitions, `total_token_usage` is "cumulative tokens consumed +across the entire session" and `last_token_usage` is "the incremental +token delta for that specific event" (a single API call's tokens). Never +attribute the cumulative counter to a single turn (summing it per turn +grows quadratically). A step's own spend is the **increase** in +`total_token_usage` since the previous count. Crucially, derive that by +**differencing the cumulative**, not by summing `last_token_usage`: Codex +re-emits `token_count` events with a stale, repeated `last_token_usage` +(observed as duplicate events with identical values; OpenAI documents it +for rate-limit-only updates), so summing `last_token_usage` double-counts +— while a repeated cumulative total is a 0 delta. This is a known trap: +downstream tools that trust `last_token_usage` directly over-count +(openai/codex [#14489](https://github.com/openai/codex/issues/14489), +[#17539](https://github.com/openai/codex/issues/17539)). Each +`token_count` follows the step it measures (a `function_call` or a +`message`), so the delta attributes to that step. + +**Round scoping + attribution:** a Codex round (one user task) can emit +several assistant messages (commentary + final) and many `token_count` +events. 
`toolpath-codex` groups a round's assistant turns under +`Turn.group_id` (the `turn_id` from `turn_context`/`task_started`), +records each per-step delta as that step's `attributed_token_usage`, and +sets the round's total `Turn.token_usage` (on its final turn) to the sum +of those attributions — one source of truth, so the total and the +per-step shares cannot drift, and `Σ token_usage == Σ attributed ==` +session total. Every field is per-step here (each step is a separate API +call re-sending context), so Codex attribution is full, not output-only. + +**Reasoning slice of output:** `total_token_usage.reasoning_output_tokens` +is a **subset** of `output_tokens` (reasoning ⊆ output) and is itself a +cumulative session counter. `toolpath-codex` differences it per call the +*same* way as the other counters (never raw-summed — that would +double-count for the same reason `last_token_usage` does) and surfaces the +per-step reasoning delta under `attributed_token_usage.breakdowns["output"]["reasoning"]`, +with the round total carrying the summed reasoning under +`token_usage.breakdowns["output"]["reasoning"]`. Breakdowns are +**informational only**: they are never added into any total (the parent +`output_tokens` already counts those tokens), and the invariant +`Σ(reasoning) ≤ output` holds by construction. A breakdown entry is +written only when reasoning is `> 0`; zero-reasoning rounds leave the map +empty so the field is omitted. + ### `exec_command_end` detail ```json @@ -816,7 +859,8 @@ The mapping below is what the provider actually emits. 
Source: | `custom_tool_call` / `_output` paired by `call_id` | same (raw `input` string preserved) | | `event_msg.exec_command_end` | back-fills `Turn.tool_uses[].result` with exit code / stdout / stderr | | `event_msg.patch_apply_end.changes[]` | sibling `ArtifactChange` on the tool-call's turn with the unified diff as `raw` and `codex.{add,update,delete}` as `structural` | -| `event_msg.token_count.info.total_token_usage` | `Turn.token_usage` (last-write-wins on the next assistant turn) + `ConversationView.total_usage` | +| `event_msg.token_count.info.total_token_usage` | cumulative; differenced per step → `Turn.attributed_token_usage`, summed per round → `Turn.token_usage` (round's final turn) + `ConversationView.total_usage` | +| `event_msg.token_count.info.total_token_usage.reasoning_output_tokens` (⊆ output, cumulative) | differenced per step → `breakdowns["output"]["reasoning"]` on `attributed_token_usage`; summed per round onto `token_usage` (informational, never summed into the total) | | `event_msg` non-turn types (`task_started`, `task_complete`, `user_message`, `agent_message`, etc.) | `ConversationView.events` as typed `ConversationEvent`s | | unknown `response_item` / `event_msg` kinds | preserved verbatim in `events` and round-trip via `RolloutItem::Unknown` / `ResponseItem::Other` / `EventMsg::Other` | diff --git a/docs/agents/formats/cursor.md b/docs/agents/formats/cursor.md index a7e12d5..20d1212 100644 --- a/docs/agents/formats/cursor.md +++ b/docs/agents/formats/cursor.md @@ -359,7 +359,7 @@ but the load-bearing ones are: | `isAgentic` | bool | Whether the bubble was produced under agent mode | | `requestId` | string | Joins to a server-side request log (often `""`) | | `checkpointId` | UUID | Joins to a `cursor-commits/checkpoints//` directory | -| `tokenCount` | object | `{ inputTokens, outputTokens }` | +| `tokenCount` | object | `{ inputTokens, outputTokens }` — the per-bubble spend. 
**Reliability unverified**: community Cursor exporters read usage with fallbacks across `tokenCount`, a snake_case `usage` object, `contextWindowStatusAtCreation`, and `promptDryRunInfo`, which suggests `tokenCount` alone is not always sufficient — but we have too little real Cursor data to say how often it's populated. Confirm against live sessions before relying on it. | | `modelInfo` | object | `{ modelName }` (only on bubbles that produced model output) | | `toolFormerData` | object \| absent | **The tool call.** See below. | | `toolResults` | array | Always empty in observed `_v: 3` rows — superseded by `toolFormerData.result` | diff --git a/docs/agents/formats/gemini.md b/docs/agents/formats/gemini.md index 673d901..3fa8cfc 100644 --- a/docs/agents/formats/gemini.md +++ b/docs/agents/formats/gemini.md @@ -249,10 +249,58 @@ not concatenated into the visible text. | `tool` | Tool-result tokens billed separately. | | `total` | Sum of the above (not always exactly — Gemini's total occasionally includes overhead). | -All fields are optional. `input_tokens` / `output_tokens` / `cached` -map cleanly to the common `TokenUsage` schema; the other three -(`thoughts`, `tool`, `total`) are Gemini-specific and should be -preserved in a provider-namespaced extras bucket. +All fields are optional. `input` → `input_tokens` and `cached` → +`cache_read_tokens` map cleanly to the common `TokenUsage` schema. The +standalone `tool` and `total` counters are Gemini-specific and are +preserved raw in a provider-namespaced extras bucket +(`Turn.extra["gemini"]["tokens"]`). + +#### `thoughts` is additive reasoning — folded into `output_tokens` + +`thoughts` is **not** a subset of `output`: the doc above states +`output` is "generated tokens *excluding reasoning*," and the recorded +numbers confirm it exactly. Across real sessions +`total == input + output + thoughts` to the token (e.g. 
+`8665 + 94 + 243 = 9002`; `9562 + 157 + 24 = 9743`), and `thoughts` +routinely *exceeds* `output` (243 vs 94 in the first example). + +Google bills reasoning as output, so `thoughts` is a sibling category +of `output`, not a breakdown of it. To avoid **under-counting** +generated tokens, the derived `output_tokens` folds reasoning in: +`output_tokens = output + thoughts` (same convention as opencode, whose +`reasoning` is likewise additive and billed as output). That way the +IR's `output` consistently means "all generated tokens" and a Σ over a +path is the real generated total. `output_tokens` is left `None` only +when both `output` and `thoughts` are absent/zero. + +The folded reasoning slice is **also** recorded under +`breakdowns["output"]["reasoning"] = thoughts`. This is informational: +`TokenUsage.breakdowns` is never summed into the total (output already +counts it), and the invariant `Σ(inner) = reasoning ≤ output` holds +because the same number is folded in. The entry is recorded whenever +`thoughts` is **present** (including a genuine `Some(0)`), preserving the +`Some(0)`-vs-absent distinction; only when `thoughts` is absent entirely +does the map stay empty and get omitted from serialization. (For the +worked example, `output_tokens = 94 + 243 = 337` with +`breakdowns["output"]["reasoning"] = 243`.) + +Crucially, this record is what makes the **reverse path lossless**: on +projection (`Path → Tokens`) the projector reads +`breakdowns["output"]["reasoning"]` and un-folds reasoning back out of +the folded `output_tokens` (`output = output_tokens − reasoning`, +`thoughts = reasoning`). So `output` and `thoughts` round-trip +losslessly through the IR. Only the Gemini-extra-only `tool`/`total` +counters remain lossy on round-trip — they have no IR home. + +The stored `Tokens` struct otherwise carries **no** nested modality +detail (no `candidatesTokensDetails` / `promptTokensDetails`, no +image/text/audio split). 
Should a future Gemini CLI version persist +genuine modality details +(e.g. `candidatesTokensDetails: [{modality: "IMAGE", tokenCount: …}]`, +which the API exposes but the CLI does not currently write to disk), +that would be a real per-modality split of `output` and could populate +`breakdowns["output"]["image"]` / `["text"]` — but only from those +recorded fields, never fabricated. ### Tool calls diff --git a/docs/agents/formats/opencode.md b/docs/agents/formats/opencode.md index b78a4ce..f4c7f84 100644 --- a/docs/agents/formats/opencode.md +++ b/docs/agents/formats/opencode.md @@ -417,6 +417,21 @@ finish reason (`stop`, `tool-calls`, `length`, `content-filter`, …). `tokens` is a per-step delta; sum over all `step-finish` parts in a session for a total. `cost` is USD for that step. +`reasoning` is an **additive** category, separate from `output` — +`total == input + output + reasoning + cache.read + cache.write` (verified +against real sessions; the Vercel AI SDK opencode uses reports +`reasoningTokens` separately from `outputTokens`). This differs from +Claude/OpenAI, where reasoning is already inside `output`. `toolpath-opencode` +therefore folds `reasoning` into the derived `output_tokens` so the IR's +`output` consistently means "all generated tokens" and the session total +isn't under-counted. So we don't discard the slice, the same folded +reasoning count is additionally recorded under +`token_usage.breakdowns["output"]["reasoning"]` — purely informational, +never summed into the total (output already counts it), preserving the +invariant `Σ(inner) = reasoning ≤ output`. It accumulates across all +`step-finish` parts in a turn exactly like the output total does, and is +omitted entirely when reasoning is 0. 
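A minimal Python sketch of the two folding conventions described in the Gemini and opencode notes above. The function names and the plain-dict layout are hypothetical and are not the `toolpath-gemini` or `toolpath-opencode` API; only the arithmetic and the field names (`output`, `thoughts`, `reasoning`, `output_tokens`, `breakdowns`) come from the text.

```python
from typing import Optional


def fold_opencode(parts: list) -> dict:
    """Accumulate opencode step-finish `tokens` dicts for one turn.

    `reasoning` is additive, so it is folded into the derived output total
    and also recorded as an informational breakdown.
    """
    output = 0
    reasoning = 0
    for tokens in parts:
        output += tokens.get("output", 0) + tokens.get("reasoning", 0)
        reasoning += tokens.get("reasoning", 0)
    usage = {"output_tokens": output}
    if reasoning > 0:  # the breakdown is omitted entirely when reasoning is 0
        usage["breakdowns"] = {"output": {"reasoning": reasoning}}
    return usage


def fold_gemini(output: Optional[int], thoughts: Optional[int]) -> dict:
    """Fold Gemini `thoughts` into the derived output total."""
    total = (output or 0) + (thoughts or 0)
    # Left as None only when both counters are absent or zero.
    usage = {"output_tokens": total if total > 0 else None}
    if thoughts is not None:  # a genuine 0 is recorded, unlike opencode
        usage["breakdowns"] = {"output": {"reasoning": thoughts}}
    return usage


def unfold_gemini(usage: dict) -> tuple:
    """Reverse path: recover (output, thoughts) from the folded usage."""
    reasoning = usage.get("breakdowns", {}).get("output", {}).get("reasoning")
    folded = usage.get("output_tokens")
    if reasoning is None or folded is None:
        return folded, reasoning
    return folded - reasoning, reasoning
```

With the worked Gemini example, `fold_gemini(94, 243)` yields an output total of 337 with a recorded reasoning slice of 243, and `unfold_gemini` recovers `(94, 243)`. The sketch does not try to cover every absent-versus-zero edge case of the real projector.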
+ ### `snapshot`, `patch` ```json diff --git a/docs/agents/formats/pi.md b/docs/agents/formats/pi.md index 0dadca4..ca1d77e 100644 --- a/docs/agents/formats/pi.md +++ b/docs/agents/formats/pi.md @@ -145,9 +145,15 @@ splits them across two entries by design. } ``` -`totalTokens` is canonically `input + output`. The `cost` breakdown is -Pi-specific; not present in real sessions where cost can't be -computed. +`usage` is **per API call** (per assistant message), not cumulative. +`totalTokens`'s formula is **version-dependent and not load-bearing for us**: +older Pi reported `input + output`, but Pi 0.2.0+ redefined its headline +token metric to `input + output + cacheWrite` (cacheRead deliberately +excluded so repeated cache hits don't dominate). `toolpath-pi` does **not** +read `totalTokens` — it reads the raw `input`/`output`/`cacheRead`/`cacheWrite` +fields and sums each independently, so it's correct regardless of which +`totalTokens` convention a session used. The `cost` breakdown is +Pi-specific; not present in real sessions where cost can't be computed. 
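The "sum each raw field independently" approach described above can be sketched in Python as follows. This is an illustration under assumptions, not `toolpath-pi`'s code: `sum_pi_usage` is a hypothetical helper, and the sample numbers in the usage note are invented.

```python
# Raw per-call counters named in the Pi `usage` object.
RAW_FIELDS = ("input", "output", "cacheRead", "cacheWrite")


def sum_pi_usage(usages: list) -> dict:
    """Sum each raw counter across per-message usage objects.

    `totalTokens` is deliberately never read, so the result does not depend
    on which formula a given Pi version used for it.
    """
    totals = {field: 0 for field in RAW_FIELDS}
    for usage in usages:
        for field in RAW_FIELDS:
            totals[field] += usage.get(field, 0)
    return totals
```

For example, two messages whose `totalTokens` follow different conventions (`input + output` for one, `input + output + cacheWrite` for the other) still produce the same per-field totals, because the headline metric is ignored.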
### Stop reasons diff --git a/site/_data/crates.json b/site/_data/crates.json index 0a7bdcd..abf8029 100644 --- a/site/_data/crates.json +++ b/site/_data/crates.json @@ -1,7 +1,7 @@ [ { "name": "toolpath", - "version": "0.6.0", + "version": "0.7.0", "description": "Core types, builders, and query API", "docs": "https://docs.rs/toolpath", "crate": "https://crates.io/crates/toolpath", @@ -9,7 +9,7 @@ }, { "name": "toolpath-convo", - "version": "0.10.0", + "version": "0.11.0", "description": "Provider-agnostic conversation types, traits, and Toolpath-Path derivation", "docs": "https://docs.rs/toolpath-convo", "crate": "https://crates.io/crates/toolpath-convo", @@ -17,7 +17,7 @@ }, { "name": "toolpath-git", - "version": "0.5.0", + "version": "0.6.0", "description": "Derive from git repository history", "docs": "https://docs.rs/toolpath-git", "crate": "https://crates.io/crates/toolpath-git", @@ -25,15 +25,15 @@ }, { "name": "toolpath-github", - "version": "0.5.0", + "version": "0.6.0", "description": "Derive from GitHub pull requests", "docs": "https://docs.rs/toolpath-github", "crate": "https://crates.io/crates/toolpath-github", - "role": "Reads GitHub PRs via the REST API and maps commits, reviews, comments, and CI checks to Steps. Everything is a Step in the DAG — code changes, review threads, approvals, and CI results." + "role": "Reads GitHub PRs via the REST API and maps commits, reviews, comments, and CI checks to Steps. Everything is a Step in the DAG \u2014 code changes, review threads, approvals, and CI results." 
}, { "name": "toolpath-claude", - "version": "0.11.1", + "version": "0.12.0", "description": "Derive from Claude conversation logs", "docs": "https://docs.rs/toolpath-claude", "crate": "https://crates.io/crates/toolpath-claude", @@ -41,7 +41,7 @@ }, { "name": "toolpath-gemini", - "version": "0.5.0", + "version": "0.6.0", "description": "Derive from Gemini CLI conversation logs", "docs": "https://docs.rs/toolpath-gemini", "crate": "https://crates.io/crates/toolpath-gemini", @@ -49,7 +49,7 @@ }, { "name": "toolpath-codex", - "version": "0.5.0", + "version": "0.6.0", "description": "Derive from Codex CLI rollout files", "docs": "https://docs.rs/toolpath-codex", "crate": "https://crates.io/crates/toolpath-codex", @@ -57,7 +57,7 @@ }, { "name": "toolpath-opencode", - "version": "0.4.0", + "version": "0.5.0", "description": "Derive from opencode SQLite databases", "docs": "https://docs.rs/toolpath-opencode", "crate": "https://crates.io/crates/toolpath-opencode", @@ -65,7 +65,7 @@ }, { "name": "toolpath-pi", - "version": "0.5.0", + "version": "0.6.0", "description": "Derive Toolpath provenance documents from Pi (pi.dev) agent session logs", "docs": "https://docs.rs/toolpath-pi", "crate": "https://crates.io/crates/toolpath-pi", @@ -73,15 +73,15 @@ }, { "name": "toolpath-cursor", - "version": "0.1.0", + "version": "0.2.0", "description": "Derive Toolpath provenance documents from Cursor (IDE) composers", "docs": "https://docs.rs/toolpath-cursor", "crate": "https://crates.io/crates/toolpath-cursor", - "role": "Reads Cursor.app's bubble store (`state.vscdb` SQLite, `cursorDiskKV` table) — composers, bubbles, content-addressed blobs — and derives Toolpath `Path` documents via `toolpath-convo`'s shared `derive_path`. Round-trips back to a loadable composer via `CursorProjector` with full TOOL_TABLE coverage." 
+ "role": "Reads Cursor.app's bubble store (`state.vscdb` SQLite, `cursorDiskKV` table) \u2014 composers, bubbles, content-addressed blobs \u2014 and derives Toolpath `Path` documents via `toolpath-convo`'s shared `derive_path`. Round-trips back to a loadable composer via `CursorProjector` with full TOOL_TABLE coverage." }, { "name": "toolpath-dot", - "version": "0.4.0", + "version": "0.5.0", "description": "Graphviz DOT visualization", "docs": "https://docs.rs/toolpath-dot", "crate": "https://crates.io/crates/toolpath-dot", @@ -89,11 +89,11 @@ }, { "name": "toolpath-md", - "version": "0.6.0", + "version": "0.7.0", "description": "Markdown rendering for LLM consumption", "docs": "https://docs.rs/toolpath-md", "crate": "https://crates.io/crates/toolpath-md", - "role": "Renders a Toolpath Graph as readable Markdown — a narrative an LLM can reason about. Dead ends are called out explicitly, diffs are included at configurable detail levels, and the output preserves enough anchoring info for an LLM to reference back into the original document." + "role": "Renders a Toolpath Graph as readable Markdown \u2014 a narrative an LLM can reason about. Dead ends are called out explicitly, diffs are included at configurable detail levels, and the output preserves enough anchoring info for an LLM to reference back into the original document." }, { "name": "pathbase-client", @@ -105,15 +105,15 @@ }, { "name": "path-cli", - "version": "0.13.1", + "version": "0.14.0", "description": "Unified CLI (binary: path)", "docs": "https://docs.rs/path-cli", "crate": "https://crates.io/crates/path-cli", - "role": "One binary called `path` that ties everything together. Porcelain at the top level (share, resume, query, show, track, auth); plumbing under `path p …` (import, export, cache, list, render, merge, validate). Pathbase round-trip via `p import pathbase` / `p export pathbase` (authed default → secret pathstash; anon fallback when not logged in)." 
+ "role": "One binary called `path` that ties everything together. Porcelain at the top level (share, resume, query, show, track, auth); plumbing under `path p \u2026` (import, export, cache, list, render, merge, validate). Pathbase round-trip via `p import pathbase` / `p export pathbase` (authed default \u2192 secret pathstash; anon fallback when not logged in)." }, { "name": "toolpath-cli", - "version": "0.13.1", + "version": "0.14.0", "description": "Deprecated alias for path-cli", "docs": "https://docs.rs/toolpath-cli", "crate": "https://crates.io/crates/toolpath-cli", diff --git a/site/kinds/agent-coding-session/index.md b/site/kinds/agent-coding-session/index.md index c904072..5bf48e6 100644 --- a/site/kinds/agent-coding-session/index.md +++ b/site/kinds/agent-coding-session/index.md @@ -12,4 +12,5 @@ Documents reference a specific version URI. They do not depend on this landing p ## Versions -- [**v1.0.0**](/kinds/agent-coding-session/v1.0.0/): `https://toolpath.net/kinds/agent-coding-session/v1.0.0` _(current)_ +- [**v1.1.0**](/kinds/agent-coding-session/v1.1.0/): `https://toolpath.net/kinds/agent-coding-session/v1.1.0` _(current)_ — adds `group_id` and specifies message-level token accounting (a message's usage appears on exactly one step, so per-step sums equal session totals) +- [**v1.0.0**](/kinds/agent-coding-session/v1.0.0/): `https://toolpath.net/kinds/agent-coding-session/v1.0.0` — superseded; see its erratum on token accounting diff --git a/site/kinds/agent-coding-session/v1.0.0/index.md b/site/kinds/agent-coding-session/v1.0.0/index.md index 2172894..4f3c070 100644 --- a/site/kinds/agent-coding-session/v1.0.0/index.md +++ b/site/kinds/agent-coding-session/v1.0.0/index.md @@ -15,6 +15,8 @@ permalink: /kinds/agent-coding-session/v1.0.0/ A Toolpath path whose `meta.kind` is this URI records an AI coding conversation. It is an ordinary path with the extra structure described here. 
`head`-ancestry, dead ends, signatures, and `base` all behave as in the [base format](/format/). +> **Erratum — token accounting.** This version leaves the relationship between per-step `token_usage` and API-message accounting unspecified, and producers of v1.0.0 documents duplicated it: when a provider message was split across several steps (Claude Code writes one JSONL line per content block), **every** step of the split carried the full message-level `token_usage` — and Codex-derived documents carried *cumulative session counters* rather than per-message increments. Summing `token_usage` over a v1.0.0 path's steps therefore over-counts (≈3× on real Claude sessions; unboundedly on Codex). Consumers of v1.0.0 documents must deduplicate (e.g. zero a step whose nonzero usage tuple is byte-identical to the previous step's; this heuristic does **not** repair Codex documents). [v1.1.0](/kinds/agent-coding-session/v1.1.0/) specifies the rule — usage once per message, on the last step of its `group_id` group — and producers enforce it from `toolpath-convo`'s shared derivation onward. This URI keeps meaning what it always meant; this note documents that meaning. + Every such path comes from one place: the shared `ConversationView → Path` derivation in `toolpath-convo` (`derive_path`), which the provider crates (`toolpath-claude`, `toolpath-gemini`, `toolpath-codex`, `toolpath-opencode`, `toolpath-pi`) all call. The field shapes below are therefore exact. The only producer-specific parts are the contents of a tool's `input` and the diff text in a change's `raw`. Constraints apply by structural `type`, not by artifact key: a `change` entry is checked only when its `structural.type` is one named here, and extra properties never make a path invalid. [`schema.json`](./schema.json) encodes the rules; apply it alongside the base schema. The URI is immutable. Later revisions ship under a new version URI. 
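The deduplication heuristic mentioned in the erratum above might look like the following Python sketch. It assumes the caller has already collected each step's `token_usage` object (or `None`) in document order; the helper names are hypothetical, and "zeroing" a step is represented here by replacing its usage with `None`. As the erratum notes, this does not repair Codex-derived documents, whose counters are cumulative rather than repeated.

```python
def usage_tuple(usage):
    """Comparable tuple of the four counters, or None when usage is absent."""
    if usage is None:
        return None
    return (
        usage.get("input_tokens"),
        usage.get("output_tokens"),
        usage.get("cache_read_tokens"),
        usage.get("cache_write_tokens"),
    )


def dedup_v1_0_0(usages: list) -> list:
    """Drop a usage that is nonzero and identical to the previous step's."""
    result = []
    previous = None
    for usage in usages:
        current = usage_tuple(usage)
        is_nonzero = current is not None and any(current)
        if is_nonzero and current == previous:
            result.append(None)  # repeated copy of the same message usage
        else:
            result.append(usage)
        previous = current  # compare against the original, not the dropped value
    return result
```

A run of three identical usages keeps the first and drops the other two, so a sum over the result counts that message once.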
diff --git a/site/kinds/agent-coding-session/v1.1.0/index.md b/site/kinds/agent-coding-session/v1.1.0/index.md new file mode 100644 index 0000000..f0e4a14 --- /dev/null +++ b/site/kinds/agent-coding-session/v1.1.0/index.md @@ -0,0 +1,157 @@ +--- +layout: base.njk +title: "Kind: agent-coding-session v1.1.0" +permalink: /kinds/agent-coding-session/v1.1.0/ +--- + +# Kind: `agent-coding-session` v1.1.0 + +
+- URI: `https://toolpath.net/kinds/agent-coding-session/v1.1.0`
+- Schema: [schema.json](./schema.json)
+ +A Toolpath path whose `meta.kind` is this URI records an AI coding conversation. It is an ordinary path with the extra structure described here. `head`-ancestry, dead ends, signatures, and `base` all behave as in the [base format](/format/). + +Every such path comes from one place: the shared `ConversationView → Path` derivation in `toolpath-convo` (`derive_path`), which the provider crates (`toolpath-claude`, `toolpath-gemini`, `toolpath-codex`, `toolpath-opencode`, `toolpath-cursor`, `toolpath-pi`) all call. The field shapes below are therefore exact. The only producer-specific parts are the contents of a tool's `input`, the diff text in a change's `raw`, and the value (not the meaning) of `group_id`. + +Constraints apply by structural `type`, not by artifact key: a `change` entry is checked only when its `structural.type` is one named here, and extra properties never make a path invalid. [`schema.json`](./schema.json) encodes the rules; apply it alongside the base schema. The URI is immutable. Later revisions ship under a new version URI. + +**Changed from [v1.0.0](/kinds/agent-coding-session/v1.0.0/):** the turn payload gains an optional `group_id`, and group-level token accounting is now specified — see [Group accounting](#group-accounting). v1.1.0 documents are structurally valid v1.0.0 documents; the new version exists so consumers can rely on the accounting rule. + +## The turn payload + +One entry in a turn's `change` map has `structural.type` of `"conversation.append"`. Find it by that type: the artifact key is producer-specific, formed as `://` from the harness in `meta.source` (e.g. `claude-code://…`, `gemini-cli://…`, `codex://…`, `opencode://…`, `cursor://…`, `pi://…`). 
+ +Its `structural` object always carries: + +| Field | Type | Meaning | +| ------ | ------ | ------------------------------------------------------------------ | +| `type` | string | the literal `"conversation.append"` | +| `role` | string | `"user"`, `"assistant"`, `"system"`, or a producer-specific string | +| `text` | string | the visible prose; present even when empty (`""`) | + +It may also carry any of the following, present only when the turn has them: + +| Field | Type | Meaning | +| ------------- | ------ | ------------------------------------------------------------- | +| `thinking` | string | the model's reasoning text | +| `group_id` | string | groups the steps derived from one source accounting unit (see below) | +| `tool_uses` | array | tools the agent invoked (shape below) | +| `token_usage` | object | the group's token counts (shape and rule below) | +| `attributed_token_usage` | object | this step's own attributed spend, when known (see below) | +| `stop_reason` | string | why the model stopped (`end_turn`, `tool_use`, …) | +| `delegations` | array | sub-agent work spawned from this turn (shape below) | +| `environment` | object | working environment at this turn (shape below) | + +The model identifier is not on the change. It lives in `step.actor` (`agent:`) and `meta.actors`. There is no provider-specific blob: every field the derivation captures is one of those listed above. + +### `group_id` + +The provider's identifier for the **source accounting unit** these steps were derived from — Claude Code's `message.id` (`msg_…`) for one split message, Codex's round `turn_id` for one round (which may itself contain several messages). It is a **grouping key, not a step identifier**: when a producer derives several steps from one accounting unit (Claude Code writes one JSONL line per content block; a Codex round emits a commentary turn plus a final turn), every sibling step carries the same `group_id`. 
A step without a `group_id` is its own group of one. The stored value is the provider's verbatim id; only its *meaning* (which unit it names) is provider-specific. + +### Group accounting + +How `token_usage` on steps relates to the source's accounting units: + +1. `token_usage` records a group's spend — a **per-group amount, never a cumulative session counter**. +2. Within a run of consecutive steps sharing a `group_id` (document order), the run's **last step carries the group's total `token_usage`, verbatim from the source**. In this version, the run's other steps carry none. +3. A step without a `group_id` is its own group and carries its own `token_usage` (when the source records one). + +Consequence: **summing `token_usage` over a v1.1.0 path's steps yields the session totals.** Consumers need no dedup heuristics. (JSON Schema cannot express the once-per-run rule, so it is normative prose, enforced by producer test suites.) + +`token_usage` has **one meaning everywhere it appears: the total for a group**. A step without a `group_id` is a one-step group, so its `token_usage` is that group's total (which is also its own spend — the two coincide for a group of one). Within a multi-step group, the total sits on the final step. Interpreting a value never requires reading the rest of its group: the key tells you it is a total, and `group_id` on the same payload tells you which group it totals. Per-step spend, when the source has it, rides a separate [`attributed_token_usage`](#per-step-attribution-attributed_token_usage) key — never `token_usage`. When a source format offers both a group total and a finer breakdown (Claude's `usage.iterations`, opencode's per-part `step-finish` tokens), `token_usage` carries the total; the breakdown is subordinate detail and does not ride `token_usage`. + +### Per-step attribution: `attributed_token_usage` + +Some sources expose, per step, the spend attributable to that step alone — distinct from the group total. 
Where a producer has it, the step carries an **`attributed_token_usage`** object (same shape as [`token_usage`](#token_usage)) holding *this step's own share*. It is **optional and orthogonal to `token_usage`**: whether a number is a group total or a step share is structural — the key it sits under — never positional. This is the rule that lets per-step accounting be added by any producer at any time without a new kind version. + +How it relates to the group total: + +- Within a `group_id` group, `Σ attributed_token_usage` over the group's steps is the group's attributed spend. The **unattributed remainder** — anything the source could not pin to a step — is *computed* by a consumer as `group's token_usage − Σ group's attributed_token_usage`; it is never recorded, so stored values stay verbatim source observations and source inconsistencies stay visible. +- For a group where the source attributes everything (e.g. Codex, where each step is a separate API call and the per-call delta is reported directly), the remainder is zero and `Σ attributed_token_usage == token_usage`. +- A group with no per-step data carries no `attributed_token_usage` at all — only the group total. Producers must not fabricate a split. + +A producer populates `attributed_token_usage` only when the source genuinely reports per-step spend. Among current producers, **Codex does** (its `token_count` events carry a per-call delta). **Claude does not**: its per-content-block `usage` values are cumulative streaming snapshots stamped at flush time, not per-block costs, so deriving a split from them would be fabrication — Claude-derived steps carry the group total only. + +`Σ token_usage` over a path's steps is unaffected by `attributed_token_usage` (they are separate keys), so the session-total guarantee above always holds. A consumer wanting per-step cost reads `attributed_token_usage` where present and falls back to the group total otherwise. 
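The accounting rules above can be sketched from the consumer side in Python. This assumes each element is the `structural` object of a step's `conversation.append` entry, already extracted into a dict; `session_total` and `unattributed_remainder` are hypothetical helper names, not part of any Toolpath crate.

```python
COUNTERS = (
    "input_tokens",
    "output_tokens",
    "cache_read_tokens",
    "cache_write_tokens",
)


def _add(total: dict, usage) -> None:
    """Add one usage object into a running total; absent or null counts as 0."""
    for key in COUNTERS:
        total[key] += (usage or {}).get(key) or 0


def session_total(payloads: list) -> dict:
    """Sum `token_usage` over every step. `breakdowns` is never summed."""
    total = dict.fromkeys(COUNTERS, 0)
    for payload in payloads:
        _add(total, payload.get("token_usage"))
    return total


def unattributed_remainder(group_payloads: list) -> dict:
    """Group total minus the sum of attributed usage, for one `group_id` run."""
    total = dict.fromkeys(COUNTERS, 0)
    attributed = dict.fromkeys(COUNTERS, 0)
    for payload in group_payloads:
        _add(total, payload.get("token_usage"))
        _add(attributed, payload.get("attributed_token_usage"))
    return {key: total[key] - attributed[key] for key in COUNTERS}
```

For a Codex-style group in which every step carries its own attributed spend, the remainder comes out as zero. For a Claude-style group with no attribution, the remainder equals the group total, which is a signal to treat the total as indivisible rather than to invent a split.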
+ +### `tool_uses` + +Each element is an object: + +| Field | Type | Notes | +| ---------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| `id` | string | provider-assigned invocation ID | +| `name` | string | provider tool name (`Read`, `Bash`, `edit`, …) | +| `input` | any | tool arguments; shape is producer-specific | +| `category` | string \| null | Toolpath's classification: `file_read`, `file_write`, `file_search`, `shell`, `network`, `delegation`, or `null` when unrecognized | +| `result` | object | `{ "content": string, "is_error": boolean }`, when the result landed in the same turn | + +`id`, `name`, `input`, and `category` are always present (`category` may be `null`); `result` is optional. + +### `token_usage` + +| Field | Type | Notes | +| -------------------- | --------------- | ------------------------------- | +| `input_tokens` | integer \| null | always present | +| `output_tokens` | integer \| null | always present | +| `cache_read_tokens` | integer | only when the source records it | +| `cache_write_tokens` | integer | only when the source records it | +| `breakdowns` | object | only when the source itemizes a class (see below) | + +Values follow the [group accounting](#group-accounting) rule above. + +`breakdowns` is an **optional, informational** decomposition of a top-level class into named sub-classes. It is keyed by the class being broken down (e.g. `"output"`); each value is a map of sub-class → tokens (e.g. `{ "output": { "reasoning": 450 } }`). Breakdowns are **never summed into any total** — the parent class already counts these tokens; a breakdown only says *how* that class divides. Invariant: **`Σ(inner) ≤` the parent class's value**. The field is omitted entirely when empty. The same shape and rule apply on `attributed_token_usage`. 
Among current producers, Gemini, OpenCode, and Codex record `output → { reasoning }` (their reasoning/thoughts tokens are part of `output_tokens`); Claude records none (its JSONL `usage` does not itemize thinking tokens). + +### `environment` + +`{ "working_dir"?: string, "vcs_branch"?: string, "vcs_revision"?: string }`; every field optional. + +### `delegations` + +Each element is `{ "agent_id": string, "prompt": string, "turns"?: array, "result"?: string }`. `turns` holds the sub-agent's own turns when the producer inlines them. + +## File changes + +When a turn writes files, its step carries sibling `change` entries keyed by file path, each with `structural.type` of `"file.write"`. The unified diff, when available, is on the change's `raw`, not inside `structural`. The `structural` object holds, all optional: + +| Field | Meaning | +| ------------------ | ------------------------------------------------------------------ | +| `tool_id` | the `tool_uses[].id` that produced the mutation, when attributable | +| `tool` | that tool's `name` | +| `operation` | `"add"`, `"update"`, `"delete"`, or a producer-specific tag | +| `before` / `after` | file contents before / after, when known | +| `rename_to` | the new path, for a rename | + +## Non-turn entries + +Entries that aren't turns (attachments, preamble lines, snapshots, hook results) become steps with `structural.type` of `"conversation.event"`, carrying `entry_type` and sometimes `event_source_id` plus the producer's event data. They exist so a document round-trips back to the source format. They are not part of the transcript. 
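Since constraints apply by `structural.type` rather than by artifact key, a consumer can sort a step's `change` entries by that type alone. The Python sketch below assumes a step is a dict with a `change` map whose values carry `structural` and optionally `raw`; `classify_step` is a hypothetical name, and any artifact key passed to it is only an opaque string.

```python
def classify_step(step: dict) -> tuple:
    """Split a step's change entries into (turn payload, file writes, events)."""
    turn = None
    file_writes = {}
    events = []
    for key, change in step.get("change", {}).items():
        structural = change.get("structural") or {}
        kind = structural.get("type")
        if kind == "conversation.append":
            turn = structural  # the turn payload, found by type, not by key
        elif kind == "file.write":
            file_writes[key] = change  # keyed by file path; the diff is on `raw`
        elif kind == "conversation.event":
            events.append(structural)  # round-trip data, not transcript
        # Any other type is left alone: extra entries never invalidate a path.
    return turn, file_writes, events
```

A transcript view would keep only steps where the first element is not `None`, and would read diffs from the `raw` field of the returned file-write entries.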
+ +## Actors + +`step.actor` follows the `type:name` convention, assigned by role: + +| Actor | Turn | +| ----------------- | ---------------------------------------------------------------------------------------------- | +| `human:user` | a user message | +| `agent:` | a model reply, named by the recorded model, or `agent:unknown` when none was recorded | +| `tool:` | a system turn (session init, system prompt), any other producer role, or a non-turn event step | + +`meta.actors` defines each actor the steps reference; `agent:` entries carry `provider` and `model`. A turn's original role is always in its `role` field, so collapsing system and other roles onto `tool:` loses nothing. Walk steps in `head`-ancestry order for the linear transcript. + +## Path metadata + +| Field | Meaning | +| -------------------- | -------------------------------------------------------------------------------- | +| `meta.kind` | this URI | +| `meta.source` | the producing harness: `claude-code`, `gemini-cli`, `codex`, `opencode`, `cursor`, or `pi` | +| `meta.title` | session title | +| `meta.actors` | the actor definitions the steps reference | +| `meta.files_changed` | file paths touched across the session | +| `meta.vcs_remote` | repository URL, when known | +| `meta.producer` | `{ "name": string, "version"?: string }`, the software that produced the session | + +`files_changed`, `vcs_remote`, and `producer` sit directly under `meta` (they ride `PathMeta`'s flattened `extra`), not under a nested `meta.extra`. 
diff --git a/site/kinds/agent-coding-session/v1.1.0/schema.json b/site/kinds/agent-coding-session/v1.1.0/schema.json new file mode 120000 index 0000000..1ba883f --- /dev/null +++ b/site/kinds/agent-coding-session/v1.1.0/schema.json @@ -0,0 +1 @@ +../../../../crates/path-cli/kinds/agent-coding-session/v1.1.0/schema.json \ No newline at end of file diff --git a/site/kinds/index.md b/site/kinds/index.md index 084bfd8..91164e5 100644 --- a/site/kinds/index.md +++ b/site/kinds/index.md @@ -14,4 +14,4 @@ Kind URIs are immutable: revisions ship at a new version URI, and old URIs keep | Kind | Current URI | Spec | | ------------------------------------------------------ | -------------------------------------------------------- | --------------------------------------------- | -| [`agent-coding-session`](/kinds/agent-coding-session/) | `https://toolpath.net/kinds/agent-coding-session/v1.0.0` | [v1.0.0](/kinds/agent-coding-session/v1.0.0/) | +| [`agent-coding-session`](/kinds/agent-coding-session/) | `https://toolpath.net/kinds/agent-coding-session/v1.1.0` | [v1.1.0](/kinds/agent-coding-session/v1.1.0/) |