empathic · akesling · Jun 22, 2026 · Jun 10, 2026 · Jun 16, 2026 · Jun 16, 2026
diff --git a/.gitignore b/.gitignore
@@ -10,3 +10,6 @@ site/wasm/
 .claude/worktrees/
 # Synthetic benchmark fixtures — generate locally via gen_synthetic_path.
 /bench/fixtures/
+
+# macOS
+.DS_Store
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,114 @@
 
 All notable changes to the Toolpath workspace are documented here.
 
+## Token usage: once per message, with per-step attribution + kind v1.1.0 — 2026-06-17
+
+Fixes token over-counting in derived documents (~3× output-token
+inflation on real Claude sessions, unbounded on Codex) and adds per-step
+token attribution where the source genuinely reports it (Codex). Two
+over-counting bugs, one spec gap, plus a capability the corrected reads
+make possible. Verified against every Claude session and all Codex
+sessions on disk, and cross-checked against the Anthropic streaming API
+reference and OpenAI's codex issue tracker.
+
+- **Claude**: Claude Code writes one JSONL line per content block of an
+  assistant API message, repeating the message-level `usage` on every
+  line. `toolpath-claude` emitted one step per line, each carrying the
+  full usage — so summing `token_usage` per step over-counted by the
+  block count, and the disambiguating `message.id` was dropped.
+- **Codex**: `toolpath-codex` stamped the *cumulative* session counter
+  (`total_token_usage`) onto each assistant turn instead of per-step
+  spend, so per-step sums grew quadratically.
+
+Core model (kind `agent-coding-session` **v1.1.0**; both usage keys are
+optional, so any producer can populate per-step attribution later
+without a further kind version bump):
+
+- `token_usage` always means **the total for a message**, on the
+  group's final step (`Σ token_usage` over a path = session total).
+- `attributed_token_usage` (new) is **this step's own attributed
+  spend**, on its own key so the sum above is unaffected. Whether a
+  number is a total or a share is structural (the key), never
+  positional. The unattributed remainder
+  (`group token_usage − Σ attributed`) is computed by consumers, never
+  recorded — stored values stay verbatim source observations.
+- `breakdowns` (new, optional) is a **decomposition of a top-level
+  class into named sub-classes** — keyed by the class being broken down (e.g.
+  `"output"`), inner map sub-class → tokens (e.g. `{"output":
+  {"reasoning": 243}}`). It is **informational and never summed into
+  any total** — the parent class already counts those tokens — so the
+  session-total guarantee is untouched. Invariant: `Σ(inner) ≤` the
+  parent class's value; the field is omitted when empty. It rides both
+  `token_usage` and `attributed_token_usage`.
+
+Changes:
+
+- `toolpath_convo::TokenUsage` gains `breakdowns`
+  (`BTreeMap<class, BTreeMap<sub-class, tokens>>`); the kind
+  `tokenUsage` `$def` gains a matching optional `breakdowns` property.
+- **Gemini under-count FIX**: Gemini reports `thoughts` (reasoning) as
+  an additive sibling of `output_tokens` that the derivation was
+  **dropping** — so Gemini output totals were under-counted by the
+  reasoning spend. `thoughts` is now **folded into `output_tokens`**
+  (correcting the total) *and* recorded under
+  `breakdowns["output"]["reasoning"]`; the projector **un-folds** it on
+  the reverse path for a lossless round-trip (`Some(0)` is preserved as
+  a real Gemini-3 zero-reasoning signal, not collapsed to absent).
+- **OpenCode**: continues folding `reasoning` into `output_tokens`, and
+  now also records it under `breakdowns["output"]["reasoning"]`.
+- **Codex**: `reasoning_output_tokens` (a subset of `output_tokens`,
+  cumulative → differenced like the other counters) is surfaced under
+  `breakdowns["output"]["reasoning"]` on both the per-step
+  `attributed_token_usage` and the per-round `token_usage`.
+- **Claude**: records no breakdown — its JSONL `usage` does not itemize
+  thinking tokens.
+- `toolpath_convo::Turn` gains `group_id` (grouping key) and
+  `attributed_token_usage`. `derive_path` writes `token_usage` once per
+  `group_id` group and `attributed_token_usage` on each step that has
+  it; `extract_conversation` reads both back.
+- `toolpath-claude`: a split message's lines carry `message.usage` as a
+  **cumulative streaming snapshot**, not a per-line bill — per the
+  Anthropic streaming API, `message_start` seeds `output_tokens` near
+  zero and each `message_delta` reports the running cumulative total
+  (confirmed across every session sampled: input/cache constant, output
+  climbing to the final-line total; snapshots differ in ~27% of multi-line messages).
+  Each `group_id` run is reduced to the **field-wise maximum** total
+  (never under-counts whatever the line order) on its final turn. The
+  intermediate snapshots are flush-time artifacts, *not* per-block costs
+  (a real prose block routinely shows `output_tokens: 1`), so Claude
+  emits **no** `attributed_token_usage`. `total_usage` is deduped by
+  group; the projector re-expands the total onto every line of a split.
+- `toolpath-codex`: per-step spend is the increase in the cumulative
+  `total_token_usage` since the previous count — **differencing the
+  cumulative is dedup-safe**, where summing `last_token_usage` would
+  double-count because Codex re-emits a stale `last_token_usage` on
+  repeated `token_count` events (a documented trap: openai/codex #14489,
+  #17539). Each per-call delta is attributed to the step it follows as
+  `attributed_token_usage`; a round's `token_usage` total is the sum of
+  its steps' attributions (one source of truth — total and shares cannot
+  drift). The projector emits a `turn_context` per group and a cumulative
+  `token_count` after each step, so grouping and attribution survive the
+  round-trip.
+- `toolpath-pi` and `toolpath-opencode` decode absent/all-zero wire
+  usage counters as `token_usage: None` ("spend unknown") instead of
+  `Some(zeros)` — their wires require usage fields, which
+  foreign-source projections zero-fill.
+- `PATH_KIND_AGENT_CODING_SESSION` now points at v1.1.0;
+  `PATH_KIND_AGENT_CODING_SESSION_V1_0_0` names the old URI. `path p
+  validate` bundles both schemas. The v1.0.0 spec page gains an erratum
+  documenting the historical duplication (consumers of v1.0.0 documents
+  still need dedup heuristics; the byte-identical-tuple heuristic does
+  not repair Codex documents).
+
+Crates bumped (every crate that depends on `toolpath`, matching the
+domain-rename precedent since the emitted kind URI changes): `toolpath`
+0.7.0, `toolpath-convo` 0.11.0, `toolpath-git` 0.6.0, `toolpath-github`
+0.6.0, `toolpath-claude` 0.12.0, `toolpath-gemini` 0.6.0,
+`toolpath-codex` 0.6.0, `toolpath-opencode` 0.5.0, `toolpath-cursor`
+0.2.0, `toolpath-pi` 0.6.0, `toolpath-dot` 0.5.0, `toolpath-md` 0.7.0,
+`path-cli` 0.14.0, `toolpath-cli` 0.14.0. `pathbase-client` is
+unaffected.
+
 ## toolpath-claude 0.11.1 + path-cli 0.13.1 + toolpath-cli 0.13.1: derive `project_path` from the file's parent directory — 2026-06-09
 
 `ConversationReader::read_conversation_metadata` used to set
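
The Gemini fold/un-fold and the `breakdowns` invariant described in the changelog entry above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `WireTokens`, `Usage`, `fold`, `unfold`, and `breakdown_fits` are hypothetical names, and the structs are trimmed-down stand-ins for the real Gemini and `toolpath_convo::TokenUsage` types.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down wire record (not the real Gemini type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WireTokens {
    pub output: u64,
    /// Reasoning tokens, reported as an additive sibling of `output`.
    pub thoughts: Option<u64>,
}

/// Hypothetical, trimmed-down usage record (not the real `TokenUsage`).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Usage {
    pub output_tokens: u64,
    /// class -> (sub-class -> tokens); informational, never summed.
    pub breakdowns: BTreeMap<String, BTreeMap<String, u64>>,
}

/// Forward path: fold `thoughts` into the output total and itemize it.
pub fn fold(wire: WireTokens) -> Usage {
    let mut usage = Usage {
        output_tokens: wire.output + wire.thoughts.unwrap_or(0),
        ..Usage::default()
    };
    // `Some(0)` is kept as an explicit entry; `None` leaves the map empty.
    if let Some(reasoning) = wire.thoughts {
        usage
            .breakdowns
            .entry("output".to_string())
            .or_default()
            .insert("reasoning".to_string(), reasoning);
    }
    usage
}

/// Reverse path: subtract the itemized reasoning back out of the total.
pub fn unfold(usage: &Usage) -> WireTokens {
    let thoughts = usage
        .breakdowns
        .get("output")
        .and_then(|inner| inner.get("reasoning"))
        .copied();
    WireTokens {
        output: usage.output_tokens.saturating_sub(thoughts.unwrap_or(0)),
        thoughts,
    }
}

/// Invariant from the entry: sub-classes never exceed their parent class.
pub fn breakdown_fits(usage: &Usage) -> bool {
    usage
        .breakdowns
        .get("output")
        .map_or(true, |inner| inner.values().sum::<u64>() <= usage.output_tokens)
}
```

Under these assumptions, `unfold(&fold(wire)) == wire` holds for `thoughts` of `Some(243)`, `Some(0)`, and `None`, which is the lossless round-trip the entry describes. Keeping the reasoning count in its own map rather than as a second top-level counter is what lets consumers keep summing `output_tokens` without double-counting.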

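For reviewers, the Claude-side reduction in the changelog entry above (one total per `group_id` run, taken as the field-wise maximum and placed on the run's final line) might look like the sketch below. All names here (`Usage`, `Line`, `line`, `totals_on_final_line`, `session_total`) are hypothetical, and the two-field `Usage` is a simplification of the real record.

```rust
/// Hypothetical, trimmed-down usage record (not the real `TokenUsage`).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

impl Usage {
    /// Field-wise maximum, so line order cannot cause an under-count.
    pub fn field_max(self, other: Usage) -> Usage {
        Usage {
            input_tokens: self.input_tokens.max(other.input_tokens),
            output_tokens: self.output_tokens.max(other.output_tokens),
        }
    }
}

/// One transcript line: the message it belongs to plus its usage snapshot.
#[derive(Debug, Clone)]
pub struct Line {
    pub group_id: String,
    pub usage: Option<Usage>,
}

/// Convenience constructor for the examples.
pub fn line(group_id: &str, input_tokens: u64, output_tokens: u64) -> Line {
    Line {
        group_id: group_id.to_string(),
        usage: Some(Usage { input_tokens, output_tokens }),
    }
}

/// For each consecutive run of lines sharing a `group_id`, return the
/// field-wise-max total on the run's final line and `None` elsewhere.
pub fn totals_on_final_line(lines: &[Line]) -> Vec<Option<Usage>> {
    let mut out = vec![None; lines.len()];
    let mut start = 0;
    while start < lines.len() {
        let mut end = start;
        let mut total: Option<Usage> = None;
        while end < lines.len() && lines[end].group_id == lines[start].group_id {
            total = match (total, lines[end].usage) {
                (Some(a), Some(b)) => Some(a.field_max(b)),
                (a, b) => a.or(b),
            };
            end += 1;
        }
        out[end - 1] = total;
        start = end;
    }
    out
}

/// Session total: a plain sum, valid because each message is counted once.
pub fn session_total(per_line: &[Option<Usage>]) -> Usage {
    per_line.iter().flatten().fold(Usage::default(), |acc, u| Usage {
        input_tokens: acc.input_tokens + u.input_tokens,
        output_tokens: acc.output_tokens + u.output_tokens,
    })
}
```

With three snapshots of one message reporting output 1, 40, and 120, followed by a single-line message reporting 7, this yields an output total of 127, where summing every line would give 168. Taking the maximum rather than the last line's value means a reordered run still produces the same total.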
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -175,18 +175,18 @@ the server publishes that operation.
 
 Tests live alongside the code (`#[cfg(test)] mod tests`), plus `path-cli` has integration tests in `tests/`. Per-crate counts:
 
-- `toolpath`: 32 unit + 9 doc tests (serde roundtrip, builders, query)
-- `toolpath-convo`: 58 unit + 1 doc test (types, enrichment, display, ConversationView -> Path derivation)
+- `toolpath`: 69 unit + 11 doc tests (serde roundtrip, builders, query)
+- `toolpath-convo`: 118 unit + 4 doc tests (types, enrichment, display, ConversationView -> Path derivation, message-group usage accounting, breakdowns)
 - `toolpath-git`: 33 unit + 3 doc tests (derive, branch detection, diffstat)
-- `toolpath-github`: 28 unit + 2 doc tests (mapping, DAG construction, fixtures)
-- `toolpath-claude`: 278 unit + 6 doc tests (path resolution, conversation reading, query, chaining, watcher, derive, metadata first-user-message)
-- `toolpath-gemini`: 163 unit + 12 integration + 4 doc tests (path resolution, chat-file parsing, query, watcher, derive, provider, round-trip fidelity)
-- `toolpath-codex`: 69 unit + 33 integration + 1 doc test (rollout parsing, provider assembly, patch-fidelity derive, real-session fixture, source→path fidelity invariants, JSON wire-level round-trip)
-- `toolpath-opencode`: 43 unit + 1 doc test (SQLite reader, JSON payload serde, provider assembly, snapshot-based derive, tool-input fallback for gitignored paths)
-- `toolpath-cursor`: 70 unit + 8 integration round-trip + 1 real-DB sanity + 1 doc test (state.vscdb SQLite reader, bubble store + composer header parsing, content-addressed blob lookup, projector with full TOOL_TABLE coverage, JSONL transcript ingest in `examples/dump_fixture.rs`)
-- `toolpath-pi`: 123 unit + 4 doc tests (types, paths, error, reader, io, provider)
+- `toolpath-github`: 32 unit + 3 doc tests (mapping, DAG construction, fixtures)
+- `toolpath-claude`: 229 unit + 18 integration + 6 doc tests (path resolution, conversation reading, query, chaining, watcher, derive, metadata first-user-message, group_id grouping + once-per-message usage totals)
+- `toolpath-gemini`: 161 unit + 29 integration + 5 doc tests (path resolution, chat-file parsing, query, watcher, derive, provider, round-trip fidelity, thoughts-folded-into-output + reasoning breakdown round-trip)
+- `toolpath-codex`: 80 unit + 51 integration + 2 doc tests (rollout parsing, provider assembly, patch-fidelity derive, real-session fixture, source→path fidelity invariants, JSON wire-level round-trip, per-turn token deltas from cumulative counters, reasoning breakdown)
+- `toolpath-opencode`: 52 unit + 19 integration + 1 doc test (SQLite reader, JSON payload serde, provider assembly, snapshot-based derive, tool-input fallback for gitignored paths, reasoning breakdown)
+- `toolpath-cursor`: 78 unit + 8 integration round-trip + 1 real-DB sanity + 1 doc test (state.vscdb SQLite reader, bubble store + composer header parsing, content-addressed blob lookup, projector with full TOOL_TABLE coverage, JSONL transcript ingest in `examples/dump_fixture.rs`)
+- `toolpath-pi`: 133 unit + 26 integration + 5 doc tests (types, paths, error, reader, io, provider)
 - `toolpath-dot`: 30 unit + 2 doc tests (render, visual conventions, escaping)
-- `path-cli`: 260 unit + 63 integration tests (import/export/cache, track sessions, merge, validate, roundtrip, render-md snapshots, deprecation aliases, pathbase HTTP mock-server tests, fzf-friendly TSV output, `path resume` orchestration with injectable `ExecStrategy`). For an end-to-end check against a real Pathbase deployment, run `scripts/test-pathbase-live.sh <url>` — it does an anon round-trip in a sandboxed config dir and, if you're logged into that URL, an authed pathstash round-trip too.
+- `path-cli`: 294 unit + 65 integration tests (import/export/cache, track sessions, merge, validate, roundtrip, render-md snapshots, deprecation aliases, pathbase HTTP mock-server tests, fzf-friendly TSV output, `path resume` orchestration with injectable `ExecStrategy`). For an end-to-end check against a real Pathbase deployment, run `scripts/test-pathbase-live.sh <url>` — it does an anon round-trip in a sandboxed config dir and, if you're logged into that URL, an authed pathstash round-trip too.
 - `toolpath-cli`: 0 tests (it's a one-line `path_cli::run()` shim crate that exists only so `cargo install toolpath-cli` keeps installing the `path` binary)
 
 Validate example documents: `for f in examples/*.json; do cargo run -p path-cli -- p validate --input "$f"; done`
@@ -229,7 +229,7 @@ When changing a crate's public API (new types, new trait impls, new public metho
 
 The `toolpath-cli` shim lives **outside** the workspace (`exclude = ["crates/toolpath-cli"]` in the root `Cargo.toml`). Both `toolpath-cli` and `path-cli` produce a binary literally named `path`, and cargo can't write two bin targets to the same workspace `target/debug/path` — so the shim opts out and gets its own `crates/toolpath-cli/target/` (covered by the `crates/*/target` line in `.gitignore`). Practical consequences: `cargo build --workspace`, `cargo test --workspace`, and `cargo run -p toolpath-cli` from the repo root **do not** include the shim. To touch it, use `--manifest-path crates/toolpath-cli/Cargo.toml`. The release script special-cases the shim in `get_version` and `publish` so the workflow is otherwise unchanged.
 
-Build the site after changes: `cd site && pnpm run build` (should produce 7 pages).
+Build the site after changes: `cd site && pnpm run build` (should produce 11 pages).
 
 ## Things to know
 
@@ -242,7 +242,9 @@ Build the site after changes: `cd site && pnpm run build` (should produce 7 page
 - `toolpath-gemini` treats main file + sibling sub-agent UUID dir as one conversation. Sub-agent files are folded into `DelegatedWork` with populated `turns` (unlike `toolpath-claude`, whose sub-agent turns live in separate session files and stay empty). See `docs/agents/formats/gemini.md` for the full format reference.
 - Provider-specific extras convention: `Turn.extra` and `WatcherEvent::Progress.data` use provider-namespaced keys (e.g. `extra["claude"]`, `extra["gemini"]`). `toolpath-claude` populates `Turn.extra["claude"]` from `ConversationEntry.extra`; `toolpath-gemini` populates `Turn.extra["gemini"]` with the full `tokens` struct, per-thought metadata, and tool-call status. This lets trait-only consumers access provider metadata without importing provider types.
 - Shared derivation: `toolpath-convo` provides a provider-agnostic `ConversationView → Path` mapping via `toolpath_convo::derive_path`. New conversation providers should build on it rather than re-implementing the mapping.
-- Path kinds: `toolpath::v1::PathMeta.kind` is an optional URI naming a hosted kind spec; URIs are immutable and semver-versioned. The only one defined so far is `https://toolpath.net/kinds/agent-coding-session/v1.0.0` (constant `toolpath::v1::PATH_KIND_AGENT_CODING_SESSION`); every conversation → `Path` derivation sets it via the shared `toolpath_convo::derive_path` or each provider crate's own. Carried through the JSONL form via `PathOpen.meta` and `PathMeta` patch lines. Spec sources live in `site/kinds/<name>/<version>/{index.md,schema.json}` and publish under `https://toolpath.net/kinds/`; the registry index is `site/kinds/index.md`. RFC: "Document Kind". JSON Schema: `$defs/pathMeta`.
+- Path kinds: `toolpath::v1::PathMeta.kind` is an optional URI naming a hosted kind spec; URIs are immutable and semver-versioned. The only one defined so far is `https://toolpath.net/kinds/agent-coding-session/v1.1.0` (constant `toolpath::v1::PATH_KIND_AGENT_CODING_SESSION`; `…_V1_0_0` names the superseded URI); every conversation → `Path` derivation sets it via the shared `toolpath_convo::derive_path` or each provider crate's own. Carried through the JSONL form via `PathOpen.meta` and `PathMeta` patch lines. Spec sources live in `site/kinds/<name>/<version>/{index.md,schema.json}` (schema.json is a symlink into `crates/path-cli/kinds/`, which `path p validate` bundles — both versions) and publish under `https://toolpath.net/kinds/`; the registry index is `site/kinds/index.md`. RFC: "Document Kind". JSON Schema: `$defs/pathMeta`.
+- Token accounting (kind v1.1.0): two keys on `conversation.append`/`Turn`, both optional. `token_usage` = "the total for a message" (on the group's final step; `Σ` over a path = session total). `attributed_token_usage` = "this step's own attributed spend", populated only where the source genuinely reports per-step spend (its own key, so the sum is unaffected; remainder = group total − Σ attributed, computed not stored). One provider message can span several steps (Claude writes one JSONL line per content block); `Turn.group_id` groups them. `toolpath-claude` fills `group_id` from `message.id` and takes the **field-wise-max** group total (line order not trusted). Claude's per-line `usage` is a cumulative *streaming snapshot* (Anthropic streaming API: `message_start` seeds output near 0, `message_delta` is cumulative), NOT a per-block cost — so Claude emits no `attributed_token_usage`; the projector re-expands the total onto every line. `toolpath-codex` differences the cumulative `total_token_usage` (dedup-safe: never sum `last_token_usage` — Codex re-emits it stale; openai/codex #14489), attributes each per-call delta to the step it follows, and derives the round total from those attributions. pi/opencode decode all-zero wire counters as `None`. Never stamp a cumulative counter, a repeated message total, or zero-filled placeholders onto a step; never derive attribution from Claude's streaming snapshots.
+- Token usage `breakdowns` (kind v1.1.0, additive): an optional third key on `TokenUsage` — a decomposition of a top-level class into named sub-classes, keyed by class (e.g. `"output"`), inner map sub-class → tokens (e.g. `breakdowns["output"]["reasoning"] = 243`). INFORMATIONAL ONLY: **never summed into any total** (the parent class already counts those tokens, so the session-total guarantee is untouched); invariant `Σ(inner) ≤ parent`; omitted when empty; rides both `token_usage` and `attributed_token_usage`. Per-provider reality: **Gemini** reports `thoughts` (reasoning) as an additive sibling that the derivation used to **drop** (under-counting output) — it's now folded into `output_tokens` *and* recorded as `breakdowns["output"]["reasoning"]`, with the projector un-folding it on the reverse path for a lossless round-trip (`Some(0)` preserved as a real Gemini-3 zero-reasoning signal). **OpenCode** folds `reasoning` into output and records the same breakdown. **Codex** differences `reasoning_output_tokens` (⊆ output, cumulative) into `breakdowns["output"]["reasoning"]` on both per-step `attributed_token_usage` and per-round `token_usage`. **Claude** records no breakdown (its JSONL `usage` doesn't itemize thinking tokens).
 - Pi provider: `toolpath-pi` reads Pi session JSONL from `~/.pi/agent/sessions/`. Sessions use a tree (id/parentId) in a single file, and may link to a parent file via `parentSession` in the header. The tree is preserved as a DAG in the derived `Path`.
 - Codex provider: `toolpath-codex` reads Codex CLI rollout files from `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. Sessions are date-bucketed (not project-keyed). File-change fidelity is excellent — Codex's `patch_apply_end` events carry either the unified diff (for updates) or the full file content (for adds), so the derived `Path` gets a real `raw` perspective on every file artifact. See `docs/agents/formats/codex.md` for the full format reference.
 - opencode provider: `toolpath-opencode` reads a SQLite database at `~/.local/share/opencode/opencode.db` (opened read-only). Each session's messages and 12 typed part variants (text, reasoning, tool, step-start/-finish, snapshot, patch, file, agent, subtask, retry, compaction) land as one step per message with tool invocations attached. File diffs come from a sibling bare git repo at `snapshot/<project-id>/[<sha1(worktree)>]/` via `git2` tree↔tree diffs — opencode respects the user's `.gitignore`, so changes under gitignored paths fall back to tool-input-derived structural changes with no `raw` perspective. Project id is the SHA of the repo's first root commit. See `docs/agents/formats/opencode.md` for the full format reference.
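
The Codex rule in the token-accounting note above (difference the cumulative `total_token_usage`, attribute each delta to the step it follows, derive the round total from those attributions) can be illustrated with a small sketch. The `Counters` and `Event` types and the `attribute` and `round_total` functions are hypothetical simplifications, not the `toolpath-codex` API, and the event stream is reduced to two variants.

```rust
/// Hypothetical, trimmed-down cumulative counters.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Counters {
    pub input: u64,
    pub output: u64,
    /// Subset of `output`; cumulative like the others.
    pub reasoning_output: u64,
}

impl Counters {
    fn delta_since(self, prev: Counters) -> Counters {
        Counters {
            input: self.input.saturating_sub(prev.input),
            output: self.output.saturating_sub(prev.output),
            reasoning_output: self.reasoning_output.saturating_sub(prev.reasoning_output),
        }
    }

    fn plus(self, other: Counters) -> Counters {
        Counters {
            input: self.input + other.input,
            output: self.output + other.output,
            reasoning_output: self.reasoning_output + other.reasoning_output,
        }
    }
}

/// Hypothetical two-variant view of a rollout, in file order.
pub enum Event {
    /// An assistant step (message or tool call).
    Step,
    /// A `token_count` event carrying the cumulative session total.
    TokenCount(Counters),
}

/// Per-step attributed spend: each non-zero delta goes to the step it follows.
pub fn attribute(events: &[Event]) -> Vec<Option<Counters>> {
    let mut per_step: Vec<Option<Counters>> = Vec::new();
    let mut prev = Counters::default();
    for event in events {
        match event {
            Event::Step => per_step.push(None),
            Event::TokenCount(total) => {
                let delta = total.delta_since(prev);
                prev = *total;
                // A repeated event leaves the cumulative unchanged: no new spend.
                if delta == Counters::default() {
                    continue;
                }
                // A count with no preceding step has nothing to attach to;
                // this sketch drops it, which real code would need to handle.
                if let Some(slot) = per_step.last_mut() {
                    *slot = Some(match *slot {
                        Some(already) => already.plus(delta),
                        None => delta,
                    });
                }
            }
        }
    }
    per_step
}

/// Round total derived from the attributions, so the two cannot drift.
pub fn round_total(per_step: &[Option<Counters>]) -> Counters {
    per_step.iter().flatten().fold(Counters::default(), |acc, c| acc.plus(*c))
}
```

In this sketch a duplicated `token_count` contributes nothing, because the cumulative counter has not moved since the previous event. That is the property the note calls dedup-safe, and it is why the sketch never reads a per-event "last" figure.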