perf(engine,producer): batch N drawElement frames per CDP round-trip (HF_DE_BATCH) #1928
Open
vanceingalls wants to merge 1 commit into
…(HF_DE_BATCH)

Amortizes per-frame CDP protocol overhead (~3.5-9ms/frame) by looping seek -> paint-wait -> drawElementImage -> createImageBitmap in ONE page.evaluate for runs of consecutive frames; bitmaps still post to the encode worker per frame.

Validated on 19 stratified DE comps: median 1.20x on top of worker-encode (to 1.56x), zero damaged frames, edge comps (static-dedup-heavy, clip-cut) bit-identical; mid-batch failure re-captures via the per-frame path (screenshot-fallback semantics preserved).

Off by default; opt in with HF_DE_BATCH=4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This was referenced Jul 4, 2026
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.

Batch N drawElement frames per CDP round-trip (`HF_DE_BATCH`), stack 7/7

Third speedup lever for the drawElement fast-capture path. Median +1.20× on top of worker-encode (to 1.56×), lossless, validated on a 19-comp sweep.

The problem
With encode offloaded to the worker (#1919), the DE path is capture-bound: every frame pays two CDP `page.evaluate` round-trips (seek; then paint-wait → `drawElementImage` → `createImageBitmap` → worker handoff). Ablation showed the named in-page stages don't account for the measured 9–15 ms/frame; the missing ~3.5–9 ms/frame is per-call CDP protocol overhead.

The fix
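The amortization argument can be made concrete with a rough cost model. This is only an illustrative sketch, not code from the PR: it assumes the CDP overhead is a fixed cost per round-trip that divides evenly across the N frames of a batch, and the function names `perFrameMs` and `overheadRemoved` are hypothetical.

```typescript
// Rough per-frame cost model (illustrative assumption, not the PR's code).
// stageMs:    time spent in the in-page stages for one frame.
// overheadMs: fixed CDP protocol overhead paid per frame when unbatched.
// batchSize:  frames captured per round-trip (1 = unbatched).
function perFrameMs(stageMs: number, overheadMs: number, batchSize: number): number {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  // The in-page work is unchanged; only the fixed overhead is shared.
  return stageMs + overheadMs / batchSize;
}

// Fraction of the fixed overhead removed by batching N frames: 1 - 1/N.
function overheadRemoved(batchSize: number): number {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return 1 - 1 / batchSize;
}
```

With hypothetical values of 6 ms of in-page work and 6 ms of overhead (both picked from inside the ranges reported above), `perFrameMs(6, 6, 1)` is 12 and `perFrameMs(6, 6, 4)` is 7.5. Under this model `overheadRemoved` gives 0.5 at N=2, 0.75 at N=4, and 0.875 at N=8, which is consistent with most of the gain arriving by N=2 and little changing beyond N=4.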
- `produceDrawElementFrameBatch` (drawElementService): ONE evaluate loops N frames in-page: seek → paint-wait (tick toggle + canvas `paint` event) → `drawElementImage` composite → `createImageBitmap` → per-frame `postMessage` to the encode worker. Bitmaps still reach the worker per frame (encode starts immediately, no pipeline bubble); only the CDP round-trips amortize N-fold. Micro-pipeline: frame i+1's seek/paint-wait overlaps frame i's `createImageBitmap` (canvas is only redrawn after i's bitmap resolves).
- `captureFramesBatchPipelined` (frameCapture): perf accounting, static-dedup `lastEncodeResult` retention, and mid-batch-failure recovery. Frames before the failure are already at the worker; the remainder re-captures via `captureFrameToBufferPipelined`, which owns the screenshot-fallback semantics. Failure behavior is identical to unbatched, just discovered at batch granularity.
- (`runWorkerEncodePipelineLoop`): batches = maximal runs of consecutive frames that aren't static-dedup frames (their reuse is order-dependent) or opt-in boundary-screenshot frames; those keep the per-frame path. `onBeforeCapture` (video frame injection, a node-side per-frame hook) disables batching.

Off by default; opt in with `HF_DE_BATCH=4`. N=2 captures most of the win; N=4 ≈ N=6 ≈ N=8 (N=8 slightly regresses on long comps because of encode backpressure), so the recommended default is 4.

Validation (19 stratified DE comps, isolated A/B N=0 vs N=4)
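The quality figures in this section (dB values, with ∞ for bit-identical output) read like PSNR. That is an assumption; the PR text does not name the metric or the comparison tool. Under that assumption, a minimal sketch of the computation over two same-sized 8-bit pixel buffers could look like this (the function name `psnrDb` is hypothetical):

```typescript
// PSNR in dB between two equal-length 8-bit sample buffers (e.g. RGBA pixels).
// Assumption: the PR's dB figures are PSNR; this is not the PR's own tooling.
function psnrDb(a: Uint8Array, b: Uint8Array): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new RangeError("buffers must be non-empty and the same length");
  }
  let sumSquaredError = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumSquaredError += diff * diff;
  }
  // Identical buffers have zero error, so PSNR is infinite (bit-identical).
  if (sumSquaredError === 0) return Infinity;
  const mse = sumSquaredError / a.length;
  const peak = 255; // maximum value of an 8-bit sample
  return 10 * Math.log10((peak * peak) / mse);
}
```

Two identical buffers return `Infinity`, which matches how the ∞ entries are read here; any finite value means at least one sample differs.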
- 4b038555, d95f20b6, 2811c9cb: 1.18–1.31× with ∞) and clip-cut (0531c45f, ∞).
- `No cached paint record` frame that fires on the unbatched path too; mid-batch recovery worked as designed (min 45.7 dB = the normal screenshot-vs-DE delta on recovered frames).
- `onBeforeCapture` comps where batching correctly no-ops or amortization dilutes; worst 0.97 = noise.

Why this is DE-only
`drawElementImage` is a page-context API: the page captures itself, so N frames need zero CDP traffic in between. Screenshot and BeginFrame capture via browser-process CDP commands the page cannot invoke, with the JPEG riding inside each command's response, so per-frame alternation is forced by the protocol. (BeginFrame, i.e. Linux/Docker production, is unaffected either way.)

Verification
- `tsc --noEmit`: 0 errors
- `captureStreamingStage.test.ts`: 4 pre-existing failures identical on the parent branch (unbuilt-dist env issue in the worktree, HDR-stage imports; unrelated to this change)

Docs: `fast-capture-architecture.md` §"Batch capture" + `de-speedup-exploration.md` §8f (P6).

Stack: #1916 → #1917 → #1918 → #1919 → #1920 → #1921 → #1928 (here, new tip).
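For reviewers who want the scheduling rule from "The fix" in concrete form, here is a minimal sketch of how frames might be split into batches and per-frame captures. It is not the code in `runWorkerEncodePipelineLoop`: the names `CaptureUnit` and `planCaptureUnits` are hypothetical, the predicate stands in for "static-dedup or boundary-screenshot frame", and chunking each run at the `HF_DE_BATCH` value is an assumption about how N is applied.

```typescript
// One unit of capture work: a batched run, or a single per-frame capture.
type CaptureUnit =
  | { kind: "batch"; frames: number[] }
  | { kind: "perFrame"; frame: number };

// Split frames 0..frameCount-1 into capture units.
// needsPerFramePath: hypothetical predicate for frames that must stay on the
//   per-frame path (e.g. static-dedup or boundary-screenshot frames).
// maxBatch: assumed cap on frames per batch (the HF_DE_BATCH value).
function planCaptureUnits(
  frameCount: number,
  needsPerFramePath: (frame: number) => boolean,
  maxBatch: number,
): CaptureUnit[] {
  if (!Number.isInteger(maxBatch) || maxBatch < 1) {
    throw new RangeError("maxBatch must be a positive integer");
  }
  const units: CaptureUnit[] = [];
  let run: number[] = [];

  // Emit the current run of consecutive batchable frames, if any.
  const flush = (): void => {
    if (run.length > 0) {
      units.push({ kind: "batch", frames: run });
      run = [];
    }
  };

  for (let frame = 0; frame < frameCount; frame++) {
    if (needsPerFramePath(frame)) {
      flush(); // a per-frame frame ends the current run
      units.push({ kind: "perFrame", frame });
    } else {
      run.push(frame);
      if (run.length === maxBatch) flush(); // cap the batch at maxBatch
    }
  }
  flush();
  return units;
}
```

For example, `planCaptureUnits(7, (f) => f === 2, 3)` yields a batch of frames 0–1, a per-frame capture of frame 2, a batch of frames 3–5, and a final batch containing only frame 6. A real scheduler might route a one-frame batch through the per-frame path instead; this sketch leaves it as a batch for simplicity.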
🤖 Generated with Claude Code