
perf(engine,producer): batch N drawElement frames per CDP round-trip (HF_DE_BATCH)#1928

Open
vanceingalls wants to merge 1 commit into de2-06-lint-player from de2-07-batch-capture

vanceingalls commented Jul 4, 2026


Batch N drawElement frames per CDP round-trip (HF_DE_BATCH) — stack 7/7

Third speedup lever for the drawElement fast-capture path. Median 1.20× on top of worker-encode (up to 1.56× on DE-friendly comps in the earlier 3-comp sweep), lossless, validated on a 19-comp sweep.

The problem

With encode offloaded to the worker (#1919), the DE path is capture-bound: every frame pays two CDP page.evaluate round-trips (seek; then paint-wait → drawElementImage → createImageBitmap → worker handoff). Ablation showed the named in-page stages don't account for the measured 9–15 ms/frame; the missing ~3.5–9 ms/frame is per-call CDP protocol overhead.

The fix

  • produceDrawElementFrameBatch (drawElementService): ONE evaluate loops N frames in-page — seek → paint-wait (tick toggle + canvas paint event) → drawElementImage composite → createImageBitmap → per-frame postMessage to the encode worker. Bitmaps still reach the worker per frame (encode starts immediately, no pipeline bubble); only the CDP round-trips amortize N-fold. Micro-pipeline: frame i+1's seek/paint-wait overlaps frame i's createImageBitmap (canvas is only redrawn after i's bitmap resolves).
  • captureFramesBatchPipelined (frameCapture): perf accounting, static-dedup lastEncodeResult retention, and mid-batch-failure recovery — frames before the failure are already at the worker; the remainder re-captures via captureFrameToBufferPipelined, which owns the screenshot-fallback semantics. Failure behavior identical to unbatched, just discovered at batch granularity.
  • Producer loop branch (runWorkerEncodePipelineLoop): batches = maximal runs of consecutive frames that aren't static-dedup frames (their reuse is order-dependent) or opt-in boundary-screenshot frames; those keep the per-frame path. onBeforeCapture (video frame injection, a node-side per-frame hook) disables batching.
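
The batching rule in the last bullet can be sketched as a pure planning function. This is a minimal illustration, not the code in runWorkerEncodePipelineLoop: the `FramePlan` and `CaptureStep` types, the `planCaptureSteps` name, the cap of each run at `batchSize`, and the choice to send a run of one frame down the per-frame path are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the PR's real types are not shown here.
type FramePlan = {
  index: number;
  isStaticDedup: boolean; // reuses the previous encode result, so order matters
  isBoundaryScreenshot: boolean; // opt-in boundary-screenshot frame
};

type CaptureStep =
  | { kind: "batch"; frames: number[] }
  | { kind: "single"; frame: number };

// Groups consecutive batchable frames into runs of at most `batchSize`.
// Static-dedup and boundary-screenshot frames always stay on the per-frame path,
// and a per-frame node-side hook (onBeforeCapture) disables batching entirely.
function planCaptureSteps(
  frames: FramePlan[],
  batchSize: number,
  hasOnBeforeCapture: boolean,
): CaptureStep[] {
  const steps: CaptureStep[] = [];
  const batchingEnabled = batchSize >= 2 && !hasOnBeforeCapture;
  let run: number[] = [];

  // Emit the pending run: one frame goes per-frame, two or more become a batch.
  const flush = (): void => {
    if (run.length === 1) steps.push({ kind: "single", frame: run[0]! });
    else if (run.length > 1) steps.push({ kind: "batch", frames: run });
    run = [];
  };

  for (const f of frames) {
    const batchable =
      batchingEnabled && !f.isStaticDedup && !f.isBoundaryScreenshot;
    if (!batchable) {
      flush(); // a non-batchable frame ends the current run
      steps.push({ kind: "single", frame: f.index });
      continue;
    }
    run.push(f.index);
    if (run.length === batchSize) flush(); // assumed cap: N frames per evaluate
  }
  flush();
  return steps;
}
```

For seven frames where only frame 2 is a static-dedup frame and the batch size is 4, this plan yields a batch of frames 0 and 1, a per-frame capture of frame 2, then a batch of frames 3 to 6.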

Off by default; opt in with HF_DE_BATCH=4. N=2 captures most of the win; N=4 ≈ N=6 ≈ N=8 (N=8 regresses slightly on long comps due to encode backpressure), so 4 is the recommended setting.
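
One way to see why the gains flatten past small N is a toy amortization model. It assumes, purely for illustration, that the whole per-frame CDP overhead is paid once per batch and split evenly across its frames; it ignores encode backpressure and the micro-pipeline overlap, so it is not a substitute for the measured results. Both function names are hypothetical.

```typescript
// Toy model (assumption): per-call CDP overhead is paid once per batch
// and shared equally by the N frames in that batch.
function amortizedOverheadMs(perFrameOverheadMs: number, batchSize: number): number {
  const n = Math.max(1, Math.floor(batchSize)); // N < 1 behaves like unbatched
  return perFrameOverheadMs / n;
}

// Fraction of the unbatched overhead that batching removes under this model.
function fractionOfOverheadRemoved(batchSize: number): number {
  const n = Math.max(1, Math.floor(batchSize));
  return 1 - 1 / n;
}

for (const n of [1, 2, 4, 8]) {
  const low = amortizedOverheadMs(3.5, n); // lower end of the ~3.5–9 ms range
  const high = amortizedOverheadMs(9, n); // upper end of the ~3.5–9 ms range
  console.log(
    `N=${n}: ${low}–${high} ms/frame overhead left, ` +
      `${fractionOfOverheadRemoved(n) * 100}% removed`,
  );
}
```

Under this model N=2 already removes half of the overhead and N=4 removes three quarters, so each further doubling has less left to recover.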

Validation (19 stratified DE comps, isolated A/B N=0 vs N=4)

  • Median 1.20× (0.97–1.45×; DE-friendly comps to 1.56× in the earlier 3-comp sweep) — whole-render wall clock.
  • Zero damaged frames (<30 dB) anywhere. 17/19 bit-identical (∞), including the edge comps: static-dedup-heavy (4b038555, d95f20b6, 2811c9cb — 1.18–1.31× with ∞) and clip-cut (0531c45f, ∞).
  • The two non-∞ explained: one is within the comp's own nondeterminism noise floor (57.0 vs 59.3 dB baseline-vs-baseline); the other hits an inherent per-comp "No cached paint record" frame that fires on the unbatched path too, and mid-batch recovery worked as designed (min 45.7 dB = the normal screenshot-vs-DE delta on recovered frames).
  • The ~1.0× cluster = long/dedup-heavy/onBeforeCapture comps where batching correctly no-ops or amortization dilutes; worst 0.97 = noise.
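
The dB figures above read like per-frame PSNR, where bit-identical frames score ∞. The PR does not show its comparison code, so the following is only a sketch of the standard PSNR formula for 8-bit samples; the `psnrDb` and `isDamaged` names are hypothetical, and the 30 dB threshold is taken from the "damaged" definition above.

```typescript
// Standard PSNR for 8-bit samples: 10 * log10(255^2 / MSE).
// Returns Infinity when the two buffers are bit-identical (MSE = 0).
function psnrDb(a: Uint8Array, b: Uint8Array): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("buffers must be non-empty and the same length");
  }
  let sumSquaredError = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i]! - b[i]!;
    sumSquaredError += d * d;
  }
  if (sumSquaredError === 0) return Infinity;
  const mse = sumSquaredError / a.length;
  return 10 * Math.log10((255 * 255) / mse);
}

// Assumed threshold from the validation notes: below 30 dB counts as damaged.
function isDamaged(a: Uint8Array, b: Uint8Array, thresholdDb = 30): boolean {
  return psnrDb(a, b) < thresholdDb;
}
```

With this definition a frame pair at 45.7 dB, like the recovered screenshot-fallback frames, is well clear of the damaged threshold while still not bit-identical.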

Why this is DE-only

drawElementImage is a page-context API: the page captures itself, so N frames need zero CDP traffic in between. Screenshot and BeginFrame capture go through browser-process CDP commands that the page cannot invoke, with the JPEG riding inside each command's response, so per-frame alternation is forced by the protocol. (BeginFrame, the Linux/Docker production path, is unaffected either way.)

Verification

  • engine + producer tsc --noEmit: 0 errors
  • engine drawElementService/frameCapture/config/static-dedup: 145 pass, 3 skip
  • producer captureStreamingStage.test.ts: 4 pre-existing failures identical on the parent branch (unbuilt-dist env issue in the worktree, HDR-stage imports — unrelated to this change)

Docs: fast-capture-architecture.md §"Batch capture" + de-speedup-exploration.md §8f (P6).


Stack: #1916 → #1917 → #1918 → #1919 → #1920 → #1921 → #1928 (here, new tip).

🤖 Generated with Claude Code

…(HF_DE_BATCH)

Amortizes per-frame CDP protocol overhead (~3.5-9ms/frame) by looping
seek -> paint-wait -> drawElementImage -> createImageBitmap in ONE
page.evaluate for runs of consecutive frames; bitmaps still post to the
encode worker per frame. Validated on 19 stratified DE comps: median
1.20x on top of worker-encode (to 1.56x), zero damaged frames, edge
comps (static-dedup-heavy, clip-cut) bit-identical; mid-batch failure
re-captures via the per-frame path (screenshot-fallback semantics
preserved). Off by default; opt in with HF_DE_BATCH=4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>