feat(query): context-budget controls for hyp query sql + pipe-truncation fix by philcunliffe · Pull Request #85 · hyparam/hypaware

philcunliffe · 2026-06-04T22:15:06Z

Summary

Removes the (mythical) "100-row limit" on hyp query sql and replaces it with controls that actually bound what reaches an LLM/terminal context, plus fixes a silent output-truncation bug. Investigation started from a real task — reconstructing a Claude conversation DAG from ai_gateway_messages — that was paying a ~14× tax (46s across 12 paginated queries) for a constraint that didn't exist in the engine.

The improvements

1. Context-budget controls (replaces the nonexistent row cap)

The engine never had a 100-row cap (default limit is 1M); the cap lived only in the skill doc, and trusting it forced needless pagination. The real risk a cap should address is flooding the caller's context — so this decouples result size from context footprint with two independent axes:

Per-cell truncation (--max-cell, default 200 code points): recursively clips string leaves with a greppable …(+N) marker. Only strings shrink — numbers/booleans/JSON shape are preserved, so --format json stays valid. This is the highest-leverage knob for ai_gateway_messages, where byte mass is concentrated in a few fat columns (raw_frame, content_text, tools, attributes); clipping them lets many more rows fit a budget at full cardinality.
Row byte-budget (--max-bytes, default 32 KiB): drops trailing rows once cumulative serialized size is exceeded (always keeps ≥1), emitting a notice: showing X of Y rows … to stderr so stdout stays valid in every format.

applyContextControls() is a pure pre-render pass; renderResult() is unchanged byte-for-byte so existing smoke/JSON assertions hold.

2. `--output <file>` spill — the lossless escape hatch

Write the full, un-capped result to a file in one scan; print only a compact receipt (rows/cols/bytes/schema + clipped 3-row preview) to stdout. The data never enters context — the agent gets a handle and post-processes the file. This is the right path for large dumps (e.g. a full conversation): one ~3.3s query instead of 12 paginated ones.

3. Pipe-truncation fix (`bin/hypaware.js`)

process.exit() ran synchronously after async stdout.write, so piped output past the ~64 KiB pipe buffer was silently dropped (file redirection masked it — file writes are synchronous). Now drains stdout/stderr before exit. This was the actual reason large piped queries needed a temp-file workaround.

Verification

npm test → 767 pass / 0 fail (+15 new tests across context controls and argv parsing)
tsc --noEmit clean, lint clean
End-to-end on real data (1.1M-row cache): default caps trim 2336→234 rows with stderr notice; --output writes the full 2336-row file with zero markers + receipt; piped bytes now equal file bytes (2,839,485 — previously truncated at 64 KiB)
Hermetic smokes exercising query sql --format json (cache_spool_batching, backfill_claude_fixture) pass. (walkthrough_picker_to_first_query fails identically on master — pre-existing, unrelated.)

Considered and dropped

Projection-under-WHERE (read only projected + filter columns instead of all): implemented, then reverted after benchmarking showed it inert — no latency and no memory change even on a 239k-row filtered aggregate. hyparquet decodes row groups to evaluate the filter regardless; the columns arg only trims output objects. The wide-column cost that looked like a decode problem (SELECT * = 15.5s vs 2 cols = 1.8s) is actually output serialization width, which the context controls above already address. It belongs with a future change that adds real file/row-group pruning, where column I/O actually becomes the bottleneck.

Behavior change to note

Inline query sql output is now context-budgeted by default (caps + a stderr notice when rows are withheld). Programmatic callers wanting completeness should pass --output <file> or --max-bytes 0. The companion hypaware-query skill doc has been updated to describe the budget + --output instead of the old "100-row cap" claim.

🤖 Generated with Claude Code

`process.exit()` terminates synchronously and drops whatever is still buffered in stdout/stderr. For a pipe that means output past the ~64KiB pipe buffer was silently truncated (large `query sql` results, JSON dumps) — while the same command redirected to a file completed fine, because file writes are synchronous. Drain both streams (resolving on `error`/EPIPE so exit never blocks) after observability shutdown, then exit. Removes the need for callers to redirect to a file just to receive complete output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add applyContextControls(result, {maxCell, maxBytes}): a pure pre-render pass that bounds a result's context footprint on two independent axes: - per-cell truncation: recursively clip every string leaf to maxCell code points (never splitting a multibyte char), appending a greppable `…(+N)` marker. Only strings shrink — numbers/booleans/null and JSON shape are preserved, so `--format json` output stays valid. - row budget: drop trailing rows once cumulative serialized size exceeds maxBytes (always keeping >=1 row), returning a one-line `notice` naming what was withheld — meant for stderr so stdout output is never corrupted. renderResult is left byte-for-byte unchanged so existing smoke/json assertions hold. Replaces the notion of a blunt fixed row cap with a control that actually bounds context (100 rows of raw_frame JSON was already a context bomb a row cap did nothing to stop). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wire the context controls into `hyp query sql` and add an escape hatch: - --output <file> (-o): write the full, un-capped result to a file and print only a compact receipt (rows/cols/bytes/schema + clipped 3-row preview) to stdout, so a large result never lands in the caller's context. The file is always lossless. - --max-cell <n> / --max-bytes <n>: override the inline caps (0 disables either). Defaults: 200 code points / 32KiB. Inline (non-spill) queries now apply the caps and route the "rows withheld" notice to stderr alongside freshness messages. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

philcunliffe · 2026-06-04T22:27:15Z

Dual-agent review — `request_changes`

Verdict: request_changes
Risk class: high
Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Advisory only: no merge was attempted.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source	Finding (severity, evidence)	Intersects
Codex	byte budget ignores render overhead (major, format.js:42)	Risks #3
Codex	packaged skill docs claim 100-row cap (major, plugins-workspace/*/SKILL.md:59)	Risks #4
Codex	spill + drain untested (minor, core_commands.js:768/bin:59)	Risks #1, #5
Claude	`--output` spill untested (major, core_commands.js:768)	Risks #1, Direct callers
Claude	runQuerySql wiring untested (major, core_commands.js:776)	Risks #2
Claude	inline `import()` type (minor, bin/hypaware.js:67)	Targets (flushStream)
Claude	flushStream untested (minor, bin/hypaware.js:60)	Risks #5, Concurrency surface

Codex review

Fix Validations

Pipe truncation before CLI exit

Status: correct
Evidence: bin/hypaware.js:59, bin/hypaware.js:70
Assessment: The main CLI path now awaits stdout/stderr drain before process.exit, which addresses buffered pipe output being dropped.

`--output` spill bypasses inline caps

Status: correct
Evidence: src/core/cli/core_commands.js:768, src/core/cli/core_commands.js:777
Assessment: The file write renders full before applyContextControls, so spill mode is lossless while stdout receives only the receipt/preview.

Inline context controls replace the row-cap myth

Status: incomplete
Evidence: src/core/cli/core_commands.js:777, src/core/query/format.js:42
Assessment: The controls are wired into hyp query sql, but the byte budget is not applied to the final rendered stdout size. See finding below.

Findings

1) Behavioral Correctness

Severity: major
Confidence: high
Evidence: src/core/query/format.js:42, src/core/query/format.js:116, src/core/query/format.js:139
Why it matters: --max-bytes can exceed the advertised context budget because it counts only JSON.stringify(row) bytes, while json, jsonl, table, and markdown add renderer overhead afterward.
Suggested fix: Budget against the selected rendered format, or pass format into the context-control pass and include newlines, JSON array syntax, headers, separators, and table/markdown escaping in the size calculation.

2) Contract & Interface Fidelity

Severity: major
Confidence: high
Evidence: hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md:59, hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md:59, src/core/cli/core_commands.js:99
Why it matters: Packaged skill guidance still says hyp query sql never returns more than 100 rows and does not mention the new context-budget/output controls, so agent callers will keep following the old false contract.
Suggested fix: Update both packaged hypaware-query skill docs to describe default inline truncation, stderr notices, --output, --max-cell, --max-bytes, and the absence of a 100-row engine cap.

9) Test Evidence Quality

Severity: minor
Confidence: high
Evidence: bin/hypaware.js:59, src/core/cli/core_commands.js:768, test/core/query-context-controls.test.js:11, test/core/query-sql-argv.test.js:18
Why it matters: The highest-risk user-facing paths, pipe drain and --output spill/receipt, are not covered by the added tests, so the fixed truncation bug could regress silently.
Suggested fix: Add a CLI-level regression that pipes >64KiB and verifies byte equality, plus a command-level --output test that verifies full file contents and capped receipt preview.

No Finding

1. Change Impact / Blast Radius
1. Concurrency, Ordering & State Safety
1. Error Handling & Resilience
1. Security Surface
1. Resource Lifecycle & Cleanup
1. Release Safety
1. Architectural Consistency
1. Debuggability & Operability

Evidence Bundle

Changed hot paths: bin/hypaware.js process exit flushing; src/core/cli/core_commands.js:745 runQuerySql; src/core/cli/core_commands.js:861 argv parsing; src/core/query/format.js:29 context controls; src/core/query/format.js:137 result rendering.
Impacted callers: src/core/cli/core_commands.js:746 calls parseQuerySqlArgv; src/core/cli/core_commands.js:777 calls applyContextControls; src/core/cli/core_commands.js:769 and src/core/cli/core_commands.js:782 call renderResult; src/core/cli/core_commands.js:936 uses capped preview controls.
Impacted tests: test/core/query-context-controls.test.js:11; test/core/query-context-controls.test.js:47; test/core/query-context-controls.test.js:56; test/core/query-sql-argv.test.js:18; test/core/query-sql-argv.test.js:27; existing smokes asserting clean query sql stderr include hypaware-core/smoke/flows/cache_spool_batching.js:71 and hypaware-core/smoke/flows/gateway_codex_capture.js:195.
Unresolved uncertainty: I did not run npm test; per constraint, source inspection was limited to five files and reference mapping used rg.

Claude review

Claude review

`--output` spill mode has zero test coverage

Severity: major
Confidence: 92
Evidence: src/core/cli/core_commands.js:768-772 (spill branch) and renderSpillReceipt
Why it matters: The spill path is the riskiest new code (real fs.writeFile, byte sizing, 3-row preview, jsonl re-render) and embodies the feature's central "lossless file" promise, yet nothing exercises it — query-sql-argv.test.js only checks that --output parses, never that a file is written or is complete.
Suggested fix: Add an in-process dispatch test (reuse the harness in test/core/command-dispatch.test.js): run query sql … --output <tmp> against a small fixture and assert the file parses back to the full, untruncated rows and stdout shows the wrote N rows … receipt.

Inline-cap / notice wiring in `runQuerySql` is untested

Severity: major
Confidence: 85
Evidence: src/core/cli/core_commands.js:776-783
Why it matters: The contract that the budget notice goes to stderr (so stdout stays valid in every format) and that caps apply by default is only verified at the pure-helper level; a wiring bug (notice to stdout, caps skipped) passes every current test.
Suggested fix: Same dispatch harness — run a query exceeding the 32KiB/200cp defaults and assert JSON.parse(stdout) succeeds, the row count is reduced, and notice: showing N of M rows appears on stderr only.

Inline `import('...')` type violates repo style rule

Severity: minor
Confidence: 90
Evidence: bin/hypaware.js:67 (@param {import('node:stream').Writable} stream)
Why it matters: AGENTS.md/CLAUDE.md explicitly bans inline import('...') types ("Declare type imports at the top of the file with @import JSDoc comments, then reference the bare names").
Suggested fix: Add a top-of-file @import { Writable } from 'node:stream' block and change the param to @param {Writable} stream.

`flushStream` drain fix is untested

Severity: minor
Confidence: 80
Evidence: bin/hypaware.js:60-87 (flushStream)
Why it matters: This fixes a real, previously-silent >64KiB pipe truncation; a regression would re-introduce silent data loss on exactly the large results this PR targets, and nothing pins the behavior.
Suggested fix: Extract flushStream into a small importable module and unit-test it against a mock writable with non-zero writableLength (resolves on drain) and one that emits error (resolves, no hang); or add a child-process smoke piping a >64KiB result and asserting byte equality.

Reports: /Users/phil/workspace/hypaware/.git/worktrees/query-context-controls/dual-review/pr-85

Review feedback on the context-controls change: - docs: the repo-tracked packaged skill docs (plugins-workspace/{claude,codex}/skills/hypaware-query/SKILL.md) still claimed `query sql` is "hard-capped at 100 rows". Replace with the real context-budget + `--output` guidance so packaged agents stop following a false contract. - testability: extract the post-query spill-vs-inline decision into a pure, exported `buildQuerySqlOutput(full, opts)` (runQuerySql now just performs the IO) and cover it directly — lossless spill file + receipt, inline cap application, and notice routed to stderr with stdout still valid JSON. Closes the untested-spill / untested-wiring gaps. - exit drain: move `flushStream` out of bin/hypaware.js into src/core/cli/flush-streams.js (fixes the inline `import('...')` type the style guide bans, and makes it unit-testable) with tests for the drain, EPIPE, and no-double-resolve paths. - honesty: the row budget measures serialized row-data bytes (the jsonl payload), not final rendered size; reword the notice + docstring + skill docs to say so rather than implying an exact rendered-byte ceiling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

philcunliffe · 2026-06-04T22:47:34Z

Dual-agent review — `request_changes` (revision 2)

Verdict: request_changes
Risk class: medium
Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Re-review after the fix commit. The four prior majors (packaged skill docs, byte-budget wording, untested spill, untested wiring) are confirmed resolved; two new efficiency findings surfaced.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source	Finding (severity, evidence)	Intersects
Codex	eager truncation before budget (major, format.js:33)	Risks #1, Direct callers
Codex	spill double-render (major, core_commands.js:919/945)	Risks #2, Direct callers
Claude	IO plumbing + bin drain untested (minor, core_commands.js:763)	Risks #3, Concurrency surface

Codex review

Fix Validations

Mythical 100-row cap replaced with context controls

Status: correct
Evidence: src/core/cli/core_commands.js:823, src/core/cli/core_commands.js:830, src/core/cli/core_commands.js:924, src/core/query/format.js:34
Assessment: Inline query sql now routes through applyContextControls() with default per-cell and row-byte caps, while --output bypasses those caps.

Pipe output truncation on process exit

Status: correct
Evidence: bin/hypaware.js:54, bin/hypaware.js:56, src/core/cli/flush-streams.js:22
Assessment: The top-level CLI now waits for stdout/stderr buffered writes before synchronous process.exit(), which addresses the described pipe-buffer truncation path.

Findings

3) Change Impact / Blast Radius

Severity: major
Confidence: high
Evidence: src/core/query/format.js:38, src/core/query/format.js:47, src/core/query/format.js:50, src/core/cli/core_commands.js:924
Why it matters: Default inline output still recursively copies/truncates every returned row before the byte budget can drop trailing rows, so broad queries can pay CPU and memory proportional to the full result even when only ~32KB will be emitted.
Suggested fix: Apply truncation lazily inside the byte-budget loop: truncate one source row, measure it, keep/break, and use result.rows.length for the notice count.

7) Resource Lifecycle & Cleanup

Severity: major
Confidence: high
Evidence: src/core/cli/core_commands.js:919, src/core/cli/core_commands.js:921, src/core/cli/core_commands.js:945
Why it matters: Spill mode renders the complete result once to compute receipt bytes and again for file content, doubling serialization work and peak string memory on the large dumps --output is meant to handle.
Suggested fix: Render the full file content once in buildQuerySqlOutput(), compute Buffer.byteLength(content) from that string, and pass the byte count/content into the receipt builder.

No Finding

Behavioral Correctness
Contract & Interface Fidelity
Concurrency, Ordering & State Safety
Error Handling & Resilience
Security Surface
Release Safety
Test Evidence Quality
Architectural Consistency
Debuggability & Operability

Evidence Bundle

Changed hot paths: bin/hypaware.js process exit flushing; runQuerySql() output path; parseQuerySqlArgv(); buildQuerySqlOutput(); applyContextControls().
Impacted callers: runQuerySql() calls buildQuerySqlOutput() at src/core/cli/core_commands.js:763; inline mode calls applyContextControls() at src/core/cli/core_commands.js:924; spill receipt preview calls applyContextControls() at src/core/cli/core_commands.js:953; top-level CLI calls flushStream() at bin/hypaware.js:56.
Impacted tests: test/core/query-context-controls.test.js; test/core/query-sql-argv.test.js; test/core/query-sql-output.test.js:13; test/core/flush-streams.test.js.
Unresolved uncertainty: I did not run npm test in this read-only review environment; I also did not see an automated child-process regression that proves bin/hypaware.js preserves >64KB piped output end to end.

Claude review

Claude review

`runQuerySql` IO plumbing and bin drain remain integration-untested (residual)

Severity: minor
Confidence: 85
Evidence: src/core/cli/core_commands.js:763-766 (fs.writeFile + ctx.stdout/stderr.write); bin/hypaware.js:54-56 (flushStream drain before process.exit)
Why it matters: The spill/cap/notice logic is now pure and well-tested via buildQuerySqlOutput, but the trivial three-line transcription that writes the file and the two streams — and the bin drain that is the actual >64KiB pipe-truncation fix — are only covered against fakes, not end-to-end. A regression in the real process.stdout drain would silently re-truncate large piped output.
Suggested fix: Add one assertion to the existing walkthrough_to_first_query smoke (or a child-process smoke) that runs query sql … --output <tmp>, asserts the file lands with full content and stdout is just the receipt; optionally pipe a >64KiB result and assert byte equality.

Reports: /Users/phil/workspace/hypaware/.git/worktrees/query-context-controls/dual-review/pr-85

Address two efficiency findings from re-review (both hit the large-result paths these features target): - applyContextControls truncated *every* returned row before the byte budget dropped the trailing ones, paying CPU/allocation proportional to the full result even though only ~32KB is emitted. Truncate lazily inside the budget loop instead — one source row at a time, stop at the cutoff. Output is byte-identical; a landmine-row test pins that rows past the budget are never touched. - buildQuerySqlOutput rendered the full result twice in spill mode (once to size the receipt, once for file content), doubling serialization and peak string memory on exactly the dumps --output exists for. Render once and reuse the string; a test asserts the receipt byte count equals the file content length. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

philcunliffe and others added 3 commits June 4, 2026 15:14

philcunliffe merged commit 7bce555 into master Jun 4, 2026
6 checks passed

philcunliffe deleted the worktree-query-context-controls branch June 4, 2026 23:20

Conversation

philcunliffe commented Jun 4, 2026

Summary

The improvements

1. Context-budget controls (replaces the nonexistent row cap)

2. --output <file> spill — the lossless escape hatch

3. Pipe-truncation fix (bin/hypaware.js)

Verification

Considered and dropped

Behavior change to note

Uh oh!

philcunliffe commented Jun 4, 2026

Dual-agent review — request_changes

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Fix Validations

Pipe truncation before CLI exit

--output spill bypasses inline caps

Inline context controls replace the row-cap myth

Findings

1) Behavioral Correctness

2) Contract & Interface Fidelity

9) Test Evidence Quality

No Finding

Evidence Bundle

Claude review

--output spill mode has zero test coverage

Inline-cap / notice wiring in runQuerySql is untested

Inline import('...') type violates repo style rule

flushStream drain fix is untested

Uh oh!

philcunliffe commented Jun 4, 2026

Dual-agent review — request_changes (revision 2)

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Fix Validations

Mythical 100-row cap replaced with context controls

Pipe output truncation on process exit

Findings

3) Change Impact / Blast Radius

7) Resource Lifecycle & Cleanup

No Finding

Evidence Bundle

Claude review

runQuerySql IO plumbing and bin drain remain integration-untested (residual)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

2. `--output <file>` spill — the lossless escape hatch

3. Pipe-truncation fix (`bin/hypaware.js`)

Dual-agent review — `request_changes`

`--output` spill bypasses inline caps

`--output` spill mode has zero test coverage

Inline-cap / notice wiring in `runQuerySql` is untested

Inline `import('...')` type violates repo style rule

`flushStream` drain fix is untested

Dual-agent review — `request_changes` (revision 2)

`runQuerySql` IO plumbing and bin drain remain integration-untested (residual)