Skip to content

feat(query): context-budget controls for hyp query sql + pipe-truncation fix#85

Merged
philcunliffe merged 5 commits into
masterfrom
worktree-query-context-controls
Jun 4, 2026
Merged

feat(query): context-budget controls for hyp query sql + pipe-truncation fix#85
philcunliffe merged 5 commits into
masterfrom
worktree-query-context-controls

Conversation

@philcunliffe
Copy link
Copy Markdown
Contributor

Summary

Removes the (mythical) "100-row limit" on hyp query sql and replaces it with controls that actually bound what reaches an LLM/terminal context, plus fixes a silent output-truncation bug. Investigation started from a real task — reconstructing a Claude conversation DAG from ai_gateway_messages — that was paying a ~14× tax (46s across 12 paginated queries) for a constraint that didn't exist in the engine.

The improvements

1. Context-budget controls (replaces the nonexistent row cap)

The engine never had a 100-row cap (default limit is 1M); the cap lived only in the skill doc, and trusting it forced needless pagination. The real risk a cap should address is flooding the caller's context — so this decouples result size from context footprint with two independent axes:

  • Per-cell truncation (--max-cell, default 200 code points): recursively clips string leaves with a greppable …(+N) marker. Only strings shrink — numbers/booleans/JSON shape are preserved, so --format json stays valid. This is the highest-leverage knob for ai_gateway_messages, where byte mass is concentrated in a few fat columns (raw_frame, content_text, tools, attributes); clipping them lets many more rows fit a budget at full cardinality.
  • Row byte-budget (--max-bytes, default 32 KiB): drops trailing rows once cumulative serialized size is exceeded (always keeps ≥1), emitting a notice: showing X of Y rows … to stderr so stdout stays valid in every format.

applyContextControls() is a pure pre-render pass; renderResult() is unchanged byte-for-byte so existing smoke/JSON assertions hold.

2. --output <file> spill — the lossless escape hatch

Write the full, un-capped result to a file in one scan; print only a compact receipt (rows/cols/bytes/schema + clipped 3-row preview) to stdout. The data never enters context — the agent gets a handle and post-processes the file. This is the right path for large dumps (e.g. a full conversation): one ~3.3s query instead of 12 paginated ones.

3. Pipe-truncation fix (bin/hypaware.js)

process.exit() ran synchronously after async stdout.write, so piped output past the ~64 KiB pipe buffer was silently dropped (file redirection masked it — file writes are synchronous). Now drains stdout/stderr before exit. This was the actual reason large piped queries needed a temp-file workaround.

Verification

  • npm test767 pass / 0 fail (+15 new tests across context controls and argv parsing)
  • tsc --noEmit clean, lint clean
  • End-to-end on real data (1.1M-row cache): default caps trim 2336→234 rows with stderr notice; --output writes the full 2336-row file with zero markers + receipt; piped bytes now equal file bytes (2,839,485 — previously truncated at 64 KiB)
  • Hermetic smokes exercising query sql --format json (cache_spool_batching, backfill_claude_fixture) pass. (walkthrough_picker_to_first_query fails identically on master — pre-existing, unrelated.)

Considered and dropped

Projection-under-WHERE (read only projected + filter columns instead of all): implemented, then reverted after benchmarking showed it inert — no latency and no memory change even on a 239k-row filtered aggregate. hyparquet decodes row groups to evaluate the filter regardless; the columns arg only trims output objects. The wide-column cost that looked like a decode problem (SELECT * = 15.5s vs 2 cols = 1.8s) is actually output serialization width, which the context controls above already address. It belongs with a future change that adds real file/row-group pruning, where column I/O actually becomes the bottleneck.

Behavior change to note

Inline query sql output is now context-budgeted by default (caps + a stderr notice when rows are withheld). Programmatic callers wanting completeness should pass --output <file> or --max-bytes 0. The companion hypaware-query skill doc has been updated to describe the budget + --output instead of the old "100-row cap" claim.

🤖 Generated with Claude Code

philcunliffe and others added 3 commits June 4, 2026 15:14
`process.exit()` terminates synchronously and drops whatever is still
buffered in stdout/stderr. For a pipe that means output past the ~64KiB
pipe buffer was silently truncated (large `query sql` results, JSON
dumps) — while the same command redirected to a file completed fine,
because file writes are synchronous.

Drain both streams (resolving on `error`/EPIPE so exit never blocks)
after observability shutdown, then exit. Removes the need for callers to
redirect to a file just to receive complete output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add applyContextControls(result, {maxCell, maxBytes}): a pure pre-render
pass that bounds a result's context footprint on two independent axes:

- per-cell truncation: recursively clip every string leaf to maxCell code
  points (never splitting a multibyte char), appending a greppable
  `…(+N)` marker. Only strings shrink — numbers/booleans/null and JSON
  shape are preserved, so `--format json` output stays valid.
- row budget: drop trailing rows once cumulative serialized size exceeds
  maxBytes (always keeping >=1 row), returning a one-line `notice` naming
  what was withheld — meant for stderr so stdout output is never corrupted.

renderResult is left byte-for-byte unchanged so existing smoke/json
assertions hold. Replaces the notion of a blunt fixed row cap with a
control that actually bounds context (100 rows of raw_frame JSON was
already a context bomb a row cap did nothing to stop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the context controls into `hyp query sql` and add an escape hatch:

- --output <file> (-o): write the full, un-capped result to a file and
  print only a compact receipt (rows/cols/bytes/schema + clipped 3-row
  preview) to stdout, so a large result never lands in the caller's
  context. The file is always lossless.
- --max-cell <n> / --max-bytes <n>: override the inline caps (0 disables
  either). Defaults: 200 code points / 32KiB.

Inline (non-spill) queries now apply the caps and route the "rows
withheld" notice to stderr alongside freshness messages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe
Copy link
Copy Markdown
Contributor Author

Dual-agent review — request_changes

  • Verdict: request_changes
  • Risk class: high
  • Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Advisory only: no merge was attempted.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source Finding (severity, evidence) Intersects
Codex byte budget ignores render overhead (major, format.js:42) Risks #3
Codex packaged skill docs claim 100-row cap (major, plugins-workspace/*/SKILL.md:59) Risks #4
Codex spill + drain untested (minor, core_commands.js:768/bin:59) Risks #1, #5
Claude --output spill untested (major, core_commands.js:768) Risks #1, Direct callers
Claude runQuerySql wiring untested (major, core_commands.js:776) Risks #2
Claude inline import() type (minor, bin/hypaware.js:67) Targets (flushStream)
Claude flushStream untested (minor, bin/hypaware.js:60) Risks #5, Concurrency surface
Codex review

Fix Validations

Pipe truncation before CLI exit

  • Status: correct
  • Evidence: bin/hypaware.js:59, bin/hypaware.js:70
  • Assessment: The main CLI path now awaits stdout/stderr drain before process.exit, which addresses buffered pipe output being dropped.

--output spill bypasses inline caps

  • Status: correct
  • Evidence: src/core/cli/core_commands.js:768, src/core/cli/core_commands.js:777
  • Assessment: The file write renders full before applyContextControls, so spill mode is lossless while stdout receives only the receipt/preview.

Inline context controls replace the row-cap myth

  • Status: incomplete
  • Evidence: src/core/cli/core_commands.js:777, src/core/query/format.js:42
  • Assessment: The controls are wired into hyp query sql, but the byte budget is not applied to the final rendered stdout size. See finding below.

Findings

1) Behavioral Correctness

  • Severity: major
  • Confidence: high
  • Evidence: src/core/query/format.js:42, src/core/query/format.js:116, src/core/query/format.js:139
  • Why it matters: --max-bytes can exceed the advertised context budget because it counts only JSON.stringify(row) bytes, while json, jsonl, table, and markdown add renderer overhead afterward.
  • Suggested fix: Budget against the selected rendered format, or pass format into the context-control pass and include newlines, JSON array syntax, headers, separators, and table/markdown escaping in the size calculation.

2) Contract & Interface Fidelity

  • Severity: major
  • Confidence: high
  • Evidence: hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md:59, hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md:59, src/core/cli/core_commands.js:99
  • Why it matters: Packaged skill guidance still says hyp query sql never returns more than 100 rows and does not mention the new context-budget/output controls, so agent callers will keep following the old false contract.
  • Suggested fix: Update both packaged hypaware-query skill docs to describe default inline truncation, stderr notices, --output, --max-cell, --max-bytes, and the absence of a 100-row engine cap.

9) Test Evidence Quality

  • Severity: minor
  • Confidence: high
  • Evidence: bin/hypaware.js:59, src/core/cli/core_commands.js:768, test/core/query-context-controls.test.js:11, test/core/query-sql-argv.test.js:18
  • Why it matters: The highest-risk user-facing paths, pipe drain and --output spill/receipt, are not covered by the added tests, so the fixed truncation bug could regress silently.
  • Suggested fix: Add a CLI-level regression that pipes >64KiB and verifies byte equality, plus a command-level --output test that verifies full file contents and capped receipt preview.

No Finding

    1. Change Impact / Blast Radius
    1. Concurrency, Ordering & State Safety
    1. Error Handling & Resilience
    1. Security Surface
    1. Resource Lifecycle & Cleanup
    1. Release Safety
    1. Architectural Consistency
    1. Debuggability & Operability

Evidence Bundle

  • Changed hot paths: bin/hypaware.js process exit flushing; src/core/cli/core_commands.js:745 runQuerySql; src/core/cli/core_commands.js:861 argv parsing; src/core/query/format.js:29 context controls; src/core/query/format.js:137 result rendering.
  • Impacted callers: src/core/cli/core_commands.js:746 calls parseQuerySqlArgv; src/core/cli/core_commands.js:777 calls applyContextControls; src/core/cli/core_commands.js:769 and src/core/cli/core_commands.js:782 call renderResult; src/core/cli/core_commands.js:936 uses capped preview controls.
  • Impacted tests: test/core/query-context-controls.test.js:11; test/core/query-context-controls.test.js:47; test/core/query-context-controls.test.js:56; test/core/query-sql-argv.test.js:18; test/core/query-sql-argv.test.js:27; existing smokes asserting clean query sql stderr include hypaware-core/smoke/flows/cache_spool_batching.js:71 and hypaware-core/smoke/flows/gateway_codex_capture.js:195.
  • Unresolved uncertainty: I did not run npm test; per constraint, source inspection was limited to five files and reference mapping used rg.
Claude review

Claude review

--output spill mode has zero test coverage

  • Severity: major
  • Confidence: 92
  • Evidence: src/core/cli/core_commands.js:768-772 (spill branch) and renderSpillReceipt
  • Why it matters: The spill path is the riskiest new code (real fs.writeFile, byte sizing, 3-row preview, jsonl re-render) and embodies the feature's central "lossless file" promise, yet nothing exercises it — query-sql-argv.test.js only checks that --output parses, never that a file is written or is complete.
  • Suggested fix: Add an in-process dispatch test (reuse the harness in test/core/command-dispatch.test.js): run query sql … --output <tmp> against a small fixture and assert the file parses back to the full, untruncated rows and stdout shows the wrote N rows … receipt.

Inline-cap / notice wiring in runQuerySql is untested

  • Severity: major
  • Confidence: 85
  • Evidence: src/core/cli/core_commands.js:776-783
  • Why it matters: The contract that the budget notice goes to stderr (so stdout stays valid in every format) and that caps apply by default is only verified at the pure-helper level; a wiring bug (notice to stdout, caps skipped) passes every current test.
  • Suggested fix: Same dispatch harness — run a query exceeding the 32KiB/200cp defaults and assert JSON.parse(stdout) succeeds, the row count is reduced, and notice: showing N of M rows appears on stderr only.

Inline import('...') type violates repo style rule

  • Severity: minor
  • Confidence: 90
  • Evidence: bin/hypaware.js:67 (@param {import('node:stream').Writable} stream)
  • Why it matters: AGENTS.md/CLAUDE.md explicitly bans inline import('...') types ("Declare type imports at the top of the file with @import JSDoc comments, then reference the bare names").
  • Suggested fix: Add a top-of-file @import { Writable } from 'node:stream' block and change the param to @param {Writable} stream.

flushStream drain fix is untested

  • Severity: minor
  • Confidence: 80
  • Evidence: bin/hypaware.js:60-87 (flushStream)
  • Why it matters: This fixes a real, previously-silent >64KiB pipe truncation; a regression would re-introduce silent data loss on exactly the large results this PR targets, and nothing pins the behavior.
  • Suggested fix: Extract flushStream into a small importable module and unit-test it against a mock writable with non-zero writableLength (resolves on drain) and one that emits error (resolves, no hang); or add a child-process smoke piping a >64KiB result and asserting byte equality.

Reports: /Users/phil/workspace/hypaware/.git/worktrees/query-context-controls/dual-review/pr-85

Review feedback on the context-controls change:

- docs: the repo-tracked packaged skill docs
  (plugins-workspace/{claude,codex}/skills/hypaware-query/SKILL.md) still
  claimed `query sql` is "hard-capped at 100 rows". Replace with the real
  context-budget + `--output` guidance so packaged agents stop following
  a false contract.

- testability: extract the post-query spill-vs-inline decision into a
  pure, exported `buildQuerySqlOutput(full, opts)` (runQuerySql now just
  performs the IO) and cover it directly — lossless spill file + receipt,
  inline cap application, and notice routed to stderr with stdout still
  valid JSON. Closes the untested-spill / untested-wiring gaps.

- exit drain: move `flushStream` out of bin/hypaware.js into
  src/core/cli/flush-streams.js (fixes the inline `import('...')` type
  the style guide bans, and makes it unit-testable) with tests for the
  drain, EPIPE, and no-double-resolve paths.

- honesty: the row budget measures serialized row-data bytes (the jsonl
  payload), not final rendered size; reword the notice + docstring + skill
  docs to say so rather than implying an exact rendered-byte ceiling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe
Copy link
Copy Markdown
Contributor Author

Dual-agent review — request_changes (revision 2)

  • Verdict: request_changes
  • Risk class: medium
  • Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Re-review after the fix commit. The four prior majors (packaged skill docs, byte-budget wording, untested spill, untested wiring) are confirmed resolved; two new efficiency findings surfaced.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source Finding (severity, evidence) Intersects
Codex eager truncation before budget (major, format.js:33) Risks #1, Direct callers
Codex spill double-render (major, core_commands.js:919/945) Risks #2, Direct callers
Claude IO plumbing + bin drain untested (minor, core_commands.js:763) Risks #3, Concurrency surface
Codex review

Fix Validations

Mythical 100-row cap replaced with context controls

  • Status: correct
  • Evidence: src/core/cli/core_commands.js:823, src/core/cli/core_commands.js:830, src/core/cli/core_commands.js:924, src/core/query/format.js:34
  • Assessment: Inline query sql now routes through applyContextControls() with default per-cell and row-byte caps, while --output bypasses those caps.

Pipe output truncation on process exit

  • Status: correct
  • Evidence: bin/hypaware.js:54, bin/hypaware.js:56, src/core/cli/flush-streams.js:22
  • Assessment: The top-level CLI now waits for stdout/stderr buffered writes before synchronous process.exit(), which addresses the described pipe-buffer truncation path.

Findings

3) Change Impact / Blast Radius

  • Severity: major
  • Confidence: high
  • Evidence: src/core/query/format.js:38, src/core/query/format.js:47, src/core/query/format.js:50, src/core/cli/core_commands.js:924
  • Why it matters: Default inline output still recursively copies/truncates every returned row before the byte budget can drop trailing rows, so broad queries can pay CPU and memory proportional to the full result even when only ~32KB will be emitted.
  • Suggested fix: Apply truncation lazily inside the byte-budget loop: truncate one source row, measure it, keep/break, and use result.rows.length for the notice count.

7) Resource Lifecycle & Cleanup

  • Severity: major
  • Confidence: high
  • Evidence: src/core/cli/core_commands.js:919, src/core/cli/core_commands.js:921, src/core/cli/core_commands.js:945
  • Why it matters: Spill mode renders the complete result once to compute receipt bytes and again for file content, doubling serialization work and peak string memory on the large dumps --output is meant to handle.
  • Suggested fix: Render the full file content once in buildQuerySqlOutput(), compute Buffer.byteLength(content) from that string, and pass the byte count/content into the receipt builder.

No Finding

  1. Behavioral Correctness
  2. Contract & Interface Fidelity
  3. Concurrency, Ordering & State Safety
  4. Error Handling & Resilience
  5. Security Surface
  6. Release Safety
  7. Test Evidence Quality
  8. Architectural Consistency
  9. Debuggability & Operability

Evidence Bundle

  • Changed hot paths: bin/hypaware.js process exit flushing; runQuerySql() output path; parseQuerySqlArgv(); buildQuerySqlOutput(); applyContextControls().
  • Impacted callers: runQuerySql() calls buildQuerySqlOutput() at src/core/cli/core_commands.js:763; inline mode calls applyContextControls() at src/core/cli/core_commands.js:924; spill receipt preview calls applyContextControls() at src/core/cli/core_commands.js:953; top-level CLI calls flushStream() at bin/hypaware.js:56.
  • Impacted tests: test/core/query-context-controls.test.js; test/core/query-sql-argv.test.js; test/core/query-sql-output.test.js:13; test/core/flush-streams.test.js.
  • Unresolved uncertainty: I did not run npm test in this read-only review environment; I also did not see an automated child-process regression that proves bin/hypaware.js preserves >64KB piped output end to end.
Claude review

Claude review

runQuerySql IO plumbing and bin drain remain integration-untested (residual)

  • Severity: minor
  • Confidence: 85
  • Evidence: src/core/cli/core_commands.js:763-766 (fs.writeFile + ctx.stdout/stderr.write); bin/hypaware.js:54-56 (flushStream drain before process.exit)
  • Why it matters: The spill/cap/notice logic is now pure and well-tested via buildQuerySqlOutput, but the trivial three-line transcription that writes the file and the two streams — and the bin drain that is the actual >64KiB pipe-truncation fix — are only covered against fakes, not end-to-end. A regression in the real process.stdout drain would silently re-truncate large piped output.
  • Suggested fix: Add one assertion to the existing walkthrough_to_first_query smoke (or a child-process smoke) that runs query sql … --output <tmp>, asserts the file lands with full content and stdout is just the receipt; optionally pipe a >64KiB result and assert byte equality.

Reports: /Users/phil/workspace/hypaware/.git/worktrees/query-context-controls/dual-review/pr-85

Address two efficiency findings from re-review (both hit the large-result
paths these features target):

- applyContextControls truncated *every* returned row before the byte
  budget dropped the trailing ones, paying CPU/allocation proportional to
  the full result even though only ~32KB is emitted. Truncate lazily inside
  the budget loop instead — one source row at a time, stop at the cutoff.
  Output is byte-identical; a landmine-row test pins that rows past the
  budget are never touched.

- buildQuerySqlOutput rendered the full result twice in spill mode (once to
  size the receipt, once for file content), doubling serialization and peak
  string memory on exactly the dumps --output exists for. Render once and
  reuse the string; a test asserts the receipt byte count equals the file
  content length.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe philcunliffe merged commit 7bce555 into master Jun 4, 2026
6 checks passed
@philcunliffe philcunliffe deleted the worktree-query-context-controls branch June 4, 2026 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant