Skip to content

feat(studio): add resolver-shadow attempt counter for soak-gate denominator#1926

Merged
vanceingalls merged 2 commits into
mainfrom
fix/resolver-shadow-attempt-counter
Jul 4, 2026
Merged

feat(studio): add resolver-shadow attempt counter for soak-gate denominator#1926
vanceingalls merged 2 commits into
mainfrom
fix/resolver-shadow-attempt-counter

Conversation

@vanceingalls

@vanceingalls vanceingalls commented Jul 4, 2026

Copy link
Copy Markdown
Collaborator

What

Adds a low-volume attempt counter to the resolver-shadow tripwire (packages/studio/src/utils/sdkResolverShadow.ts), rolled up into a new PostHog event sdk_resolver_shadow_attempt every 5 minutes (plus a visibilitychange flush for short sessions).

Why

sdk_resolver_shadow (the existing tripwire event) only fires on divergence — parity is silent by design, to avoid spamming an event on every editor edit. That makes it a pure numerator: we can count SDK-resolver failures, but never attempts. Without a denominator there's no way to compute an actual divergence rate or decide when the SDK-cutover soak sample is big enough — "0 divergences" means very different things at 5 attempts vs 5,000.

How

  • Module-private Record<string, number> counter (attemptCounts) inside sdkResolverShadow.ts, incremented by all three existing emit functions (runResolverShadow, recordResolverParity, recordAnimationResolverParity) right after their existing STUDIO_SDK_RESOLVER_SHADOW_ENABLED guard — so every real invocation is counted regardless of outcome, with no new flag check needed.
  • Accumulates in memory and flushes as one rollup event per interval (never per-attempt) — same chattiness constraint that shaped the original divergence-only design.
  • New event name (sdk_resolver_shadow_attempt) kept distinct from sdk_resolver_shadow, so the two streams (activity vs failures) stay independently queryable.
  • Flush timer + visibilitychange listener start lazily on the first recordAttempt call, mirroring studioTelemetry.ts's existing lazy-timer pattern — a session that never touches the tripwire never runs a background timer.
  • No changes to evaluateSoakGate or any existing function's public signature — this only ships the raw attempt data; the divergences / attempts rate view is a follow-up once real data exists.

Design doc: docs/hyperframes/plans/sdk/2026-07-04-resolver-shadow-attempt-counter-design.md
Plan: docs/hyperframes/plans/sdk/2026-07-04-resolver-shadow-attempt-counter-plan.md

Test plan

  • Unit tests added — 15 new tests (39→54 in sdkResolverShadow.test.ts): pure counter/rollup behavior, attempt-counting on both the parity and divergence paths of all three emit functions, and production timer/rollup wiring with fake timers.
  • Full studio suite passing (54/54 in this file; no regressions elsewhere).
  • Reviewed twice: task-scoped review (spec compliance + quality, approved) and final whole-branch review (mergeable as-is). Three Minor follow-up notes left for whoever builds the eventual rate view (dom-edit label parity with the dashboard's (dom-edit) convention, opLabel being JSON-nested vs top-level, and a documented-but-fragile DOM event-order dependency for the tab-hide flush) — none block this change.

…inator

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Copy link
Copy Markdown
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@miga-heygen miga-heygen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — feat(studio): add resolver-shadow attempt counter for soak-gate denominator

Verdict: LGTM with nits

Clean, well-scoped addition. The counter + periodic rollup is the right shape for this — you get the denominator without recreating the per-event chattiness the divergence-only design explicitly avoided. Tests are thorough (15 new, covering both paths of all three emit functions, flag-off, and timer wiring). Design doc and plan linked. Ship it.


Findings

1. recordAttempt is outside the try/catch in all three emit functions (Minor)

runResolverShadow, recordResolverParity, and recordAnimationResolverParity all place recordAttempt(opLabel) before their try block. The contract on these functions is "never throws; never propagates from the shadow path," but recordAttemptensureAttemptFlushScheduled touches setInterval and document.addEventListener, which can throw in non-standard environments (SSR, web workers, restrictive iframe sandboxes).

If recordAttempt throws, the exception propagates past the catch and breaks the caller — the exact failure mode these functions are designed to prevent.

Suggestion: move recordAttempt inside the existing try block (right at the top). It's a one-line move in each function and preserves the "count regardless of outcome" guarantee since the catch at the bottom swallows everything:

// e.g. in runResolverShadow:
try {
  recordAttempt("dom-edit");
  const mismatches = sdkResolverShadowCheck(session, hfId, ops);
  // ...
} catch {
  // never propagate from the shadow path
}

Alternatively, wrap the body of recordAttempt itself in a try/catch so the counter is self-hardened regardless of call site.

2. __resetAttemptSchedulingForTests resets the flag but doesn't remove the listener (Nit)

__resetAttemptSchedulingForTests sets attemptUnloadListenerRegistered = false and clears the interval, but does not call document.removeEventListener. This means after a reset + re-init cycle in tests, a second visibilitychange listener is registered. In the current test suite this is harmless (the timer tests don't fire visibility events), but it could cause a double-flush surprise if a future test exercises the tab-hide path after a reset. Not blocking — just something to be aware of if the visibility flush ever gets its own test.

3. Listener-order dependency for the visibilitychange flush (Acknowledged)

The PR body already calls this out ("documented-but-fragile DOM event-order dependency for the tab-hide flush"). The new visibilitychange handler calls flushAndEmitAttempts()trackStudioEvent(), which queues an event into studioTelemetry.ts's batch queue. If studioTelemetry's own visibilitychange handler runs first (listener registration order is not guaranteed by spec), the attempt rollup event arrives in the queue after the sendBeacon flush and sits there unflushed until the tab returns (or forever, if the tab is closed). This is a low-probability data loss for the very last rollup in a session, and the 5-minute periodic flush covers the vast majority of data. Fine for now — the follow-up to add sendBeacon to this handler (mirroring studioTelemetry.ts lines 121-123) would close the gap when someone gets to it.


Ponytail (over-engineering audit)

Lean already. The counter + rollup is the minimum viable denominator — no unnecessary abstractions, no premature generalization, no config surface. The __resetAttemptSchedulingForTests export is justified overhead for the fake-timer tests. Ship.

Silent failure patterns

One: finding #1 above (recordAttempt outside try/catch). The rest of the error handling follows the established fail-open pattern correctly.


Diff stats: 2 files changed, +230 / -1

— Miga

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at 8562e0d.

Verdict

🟠 Solid design and shape; one real-behavior finding on the tab-hide flush order that partly defeats the "short session" claim in the PR body, plus a few smaller notes. Nothing that blocks merge if the tab-hide-flush trade-off is acknowledged; happy to stamp on either a fix or a decision to accept-and-document.

What this PR adds (observability shape)

Introduces a denominator for the existing sdk_resolver_shadow divergence signal:

  • Module-private attemptCounts: Record<string, number> in sdkResolverShadow.ts.
  • Incremented at the top of all three emit functions (runResolverShadow, recordResolverParity, recordAnimationResolverParity), after the STUDIO_SDK_RESOLVER_SHADOW_ENABLED guard and after the null-input early returns — so attempts count on both the parity (silent) path AND the divergence (emits) path, and don't count when input is missing.
  • Rolled up into ONE sdk_resolver_shadow_attempt PostHog event every 5 minutes (setInterval) plus a visibilitychange → hidden flush, both installed lazily on the FIRST attempt via ensureAttemptFlushScheduled.
  • Event shape: { counts: JSON.stringify({ [opLabel]: count, ... }) } — one field, JSON-string-nested.

Same-file locality, no new flag, no changes to evaluateSoakGate. Tests: 15 new (39 → 54), covering pure counter behavior, per-function attempt counting on both parity + divergence paths, and production timer/rollup wiring with fake timers.

Divergence-vs-parity coherence

  • Dim alignment: the divergence event sdk_resolver_shadow from runResolverShadow (sdkResolverShadow.ts:358) emits no opLabel — it only has one shape (DOM-edit). The attempt counter labels DOM-edit attempts as "dom-edit". So divergences_dom_edit = count(sdk_resolver_shadow WHERE opLabel IS NULL), attempts_dom_edit = counts["dom-edit"]. Works, but the "NULL means dom-edit" convention is implicit. PR body Minor #1 already acknowledges this ((dom-edit) label convention on the dashboard).
  • opLabel field-shape drift: parity attempts serialize opLabel as a nested JSON string field (counts: JSON.stringify({ setTiming: N })) while divergences emit opLabel as a top-level property on sdk_resolver_shadow. Analyst has to JSON_EXTRACT on the numerator side but not the denominator. PR body Minor #2 flags this too. Fine to ship; makes the eventual rate view a bit more finicky.
  • Time window: attempts flush every 5 minutes; divergences fire per-event through the shared studioTelemetry 30-second batch. Both hit the same PostHog project, no clock drift concern.

Failure-path coverage

Covered. recordAttempt(...) is called BEFORE each function's try {}:

  • sdkResolverShadow.ts:346runResolverShadow, before the try that catches sdkResolverShadowCheck throws.
  • sdkResolverShadow.ts:399recordResolverParity, before the try that catches resolveSnapshot / source-read errors.
  • sdkResolverShadow.ts:462recordAnimationResolverParity, before the try that catches session.getElements() failures.

Every non-guarded invocation gets counted, so the denominator can't silently under-count the failure cohort — the exact rubric miss recurring in observability PRs. Well done.

Null-input paths (!hfId, !session) return before recordAttempt — those aren't attempts on the tripwire, they're skipped calls. Correct.

Findings (severity-ordered)

  • 🟠 visibilitychange flush is dropped for short sessions due to listener-ordersdkResolverShadow.ts:284-294 (ensureAttemptFlushScheduled) vs. studioTelemetry.ts:93-115 (module-load-time visibility handler). PR body claims the visibilitychange path exists to catch "short sessions" that never see a 5-min interval fire. But:

    1. studioTelemetry.ts:93 registers its visibility handler at module load time.
    2. ensureAttemptFlushScheduled registers its visibility handler lazily on the first recordAttempt, which by construction runs AFTER studioTelemetry has been imported and initialized.
    3. DOM addEventListener fires handlers in registration order.
    4. So on visibilitychange → hidden: studioTelemetry's handler runs FIRST, drains the queue via sendBeacon (queue is EMPTY of the rollup event, since the PR's handler hasn't run yet). Then PR's handler runs, calls flushAndEmitAttemptstrackStudioEventenqueues the rollup — but studioTelemetry's sendBeacon path has already fired, and the queue-consuming flushTimer was just cleared+nulled by studioTelemetry's handler (studioTelemetry.ts:96-99).
    5. The rollup event now sits in queue — the NEXT trackStudioEvent call restarts the interval, so it goes out ~30s later IF the tab stays open. If the user closes the tab / navigates away, the rollup is lost.

    Impact: short sessions (< 5 min) that would have relied on the tab-hide flush lose their attempt count with high probability. The very cohort the visibilitychange path targets. Options: (a) fire the rollup in a sendBeacon-flavored path from flushAndEmitAttempts directly (bypass the studioTelemetry queue for this one event); (b) capture: true on the addEventListener to run before studioTelemetry's bubble-phase handler; (c) accept the trade-off and drop the visibilitychange path entirely + doc "5-min-min-session for denominator accuracy" (my recommendation — simplest, and short sessions are a small share of the fleet). Whatever you pick, worth calling out in the PR body / commit message so operators know the true guarantee.

  • 🟡 counts as JSON-string fieldsdkResolverShadow.ts:277. counts: JSON.stringify(counts) forces every downstream query to JSON_EXTRACT per opLabel. Flattening to count_setTiming: N, count_dom_edit: M, count_reorderElements: K, ... (per-key top-level fields, bounded set of ~13 known labels enumerated at sdkCutover.ts:311/329/348/377/395/475 + useDomEditSession.ts:304 + sdkResolverShadow.ts:346/399/462) would make the rate view a direct division in the PostHog UI with no JSON parsing. PR body Minor #2 acknowledges this — flagging as a 🟡 vs a 🟠 because the set is bounded so no cardinality risk and analysts can work around it, but the rate view will be more painful than it needs to be. Nit as scoped, or a follow-up.

  • 🟡 window vs document visibility listener divergencesdkResolverShadow.ts:290 binds document.addEventListener("visibilitychange", ...), while sibling studioTelemetry.ts:94 binds window.addEventListener("visibilitychange", ...). Both work (visibilitychange bubbles), but visually diverges from the referenced sibling pattern. Not a bug; consistency nit.

  • 🟡 Listener leaks for module lifetimesdkResolverShadow.ts:288-293. No removeEventListener and attemptUnloadListenerRegistered never resets in production. The sibling pattern in studioTelemetry.ts also doesn't remove, so this is consistent with house style — but the sibling registers at module-load (guaranteed once), while this PR registers on first-attempt behind a mutable-in-tests bool. __resetAttemptSchedulingForTests clears the bool but does NOT actually removeEventListener — so on the next recordAttempt, a NEW listener is added while the old one stays alive. Not observable in tests (jsdom cleanup + vi.useRealTimers teardown), but if anyone else ever calls the reset helper in a non-test long-lived environment, the leak is real. Cheapest fix: capture the listener as a named function and remove it in the reset helper too. Or accept + document "test-only, jsdom cleans up between suites."

  • 🟢 PII / redaction: rollup event carries only { opLabel: int } counts. No user content, no hfIds, no error strings. Consistent with redactValue treatment in the divergence-emit path. No new PII surface.

  • 🟢 Cardinality: opLabel set is bounded to ~13 hard-coded strings (grepped from all callers). No user-controlled input reaches the counter's keyspace. No cardinality bomb.

  • 🟢 Idempotency: flushAttemptCounts correctly returns null on empty state, and flushAndEmitAttempts no-ops on null. Double-flushes are safe.

  • 🟢 Failure-path emission: attempts count on the throw path of all three emit functions. Passes the observability-PR-must-cover-failure-path rubric.

Cross-repo consumer awareness

New event name studio:sdk_resolver_shadow_attempt (via trackStudioEvent's studio: prefix in studioTelemetry.ts:55). No existing PostHog dashboards / monitors reference it yet. Anyone consuming the divergence event studio:sdk_resolver_shadow for the soak-gate readout will want to know the denominator is now available — a heads-up in the SDK cutover soak thread would land the new signal in the analyst's mental model.

Test coverage

  • Pure counter behavior: ✅ (5 tests).
  • Attempt counting on parity AND divergence paths of all three emit functions: ✅ (6 tests).
  • Multi-chokepoint accumulation into one rollup: ✅.
  • Flag-off no-op: ✅.
  • Production timer wiring, both "no attempts → no emit" and "attempts → emit at 5 min": ✅.

Missing:

  • No test covers the visibilitychange → hidden code path. Given the 🟠 finding above, this is the exact path where the order-of-listener bug lurks. A test that fires a visibilitychange event in jsdom and asserts the rollup arrives in trackedEvents would have caught the interaction (or made you notice it during test-writing).
  • No assertion on the rollup event's distinct_id / session properties — but those are trackStudioEvent's responsibility, not this module's, so fair to leave.

AI-trailer sweep

Commit trailer Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com> present. No leftover trailer or scaffolding in the source. Diff is disciplined — module-private state, no config sprawl, no new flags, no hallucinated APIs (all DOM APIs used are real). Docstrings are honest about the trade-offs (test-only export explanation, ordering fragility acknowledged in PR body). Reads like well-vetted code, not scaffold.

Peer awareness

No prior reviews on this PR at HEAD 8562e0d0 (gh api /repos/heygen-com/hyperframes/pulls/1926/reviews returned [] before I started). Fresh R1.

Slack-ready summary

hyperframes#1926 @ 8562e0d0 — 🟠 Solid parity-denominator signal; one real listener-order bug on the tab-hide flush that undoes the "short session" claim, plus known-and-shipped JSON-nested counts field. Not blocking merge if the tab-hide trade-off is acknowledged.

Top findings:

  • 🟠 visibilitychange flush drops the rollup for short sessions — studioTelemetry's handler registers at module load (fires first), PR's handler registers lazily on first attempt (fires second, after sendBeacon already went out). Fix options: capture-phase handler, direct sendBeacon in flushAndEmitAttempts, or accept + doc.
  • 🟡 counts as JSON-string forces JSON_EXTRACT in every rate query; bounded opLabel set (~13) would flatten cleanly to top-level fields. PR body Minor #2 acknowledges.
  • 🟡 document vs window visibility listener style diverges from sibling studioTelemetry.ts. Consistency nit.

Failure-path coverage: ✅ — recordAttempt fires before the try in all three emit functions.
Divergence-vs-parity dim coherence: ✅ modulo the opLabel field-shape drift the PR body already flags.
PII: clean. Cardinality: bounded. CI: all required green at HEAD.

Review by Rames D Jusso

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolver-shadow denominator shape is sound: all three shadow emit paths are still gated on STUDIO_SDK_RESOLVER_SHADOW_ENABLED, op labels are static/low-cardinality, and the new rollup event stays separate from the divergence event. The tests cover parity and divergence paths for DOM edit, element resolver, and animation resolver attempts.

Important follow-up, already raised by RDJ: packages/studio/src/utils/sdkResolverShadow.ts:290 enqueues the tab-hide rollup through trackStudioEvent, but packages/studio/src/utils/studioTelemetry.ts:94 has its own module-load visibilitychange beacon flush. On a short session, studioTelemetry can flush first, then this handler queues the attempt rollup after the beacon is gone. That weakens the PR body's "visibilitychange flush for short sessions" claim, though the 5-minute periodic denominator remains intact. Direct sendBeacon in flushAndEmitAttempts or capture-phase ordering would close it.

Nits from prior reviews still stand: recordAttempt is outside the existing try/catch, test reset doesn't remove the listener, and counts as a JSON-string property will make rate queries a little clunkier than top-level bounded fields.

CI is green at 8562e0d0.

Verdict: APPROVE
Reasoning: Core counter/rollup implementation is correct and CI is clean; the tab-hide ordering issue is real but limited to short-session telemetry completeness, so I’m treating it as follow-up rather than a merge blocker.

…sh ordering

PR review feedback (4 reviewers): recordAttempt() sat outside the try/catch
in all three emit functions, so a throw inside it (e.g. setInterval/
addEventListener failing in a non-standard environment) would break the
"never throws" contract. Also, the new visibilitychange listener races
studioTelemetry.ts's own tab-hide handler — whichever fires first can beacon
the queue before or after this module's rollup lands in it, silently
dropping the attempt count for short sessions closed before the 5-minute
timer fires.

Fixes: move recordAttempt() inside each function's try block; export
flushViaBeacon() from studioTelemetry.ts and call it explicitly after
queuing the rollup, so delivery no longer depends on listener registration
order; capture the visibilitychange handler by reference so
__resetAttemptSchedulingForTests() actually removes it instead of leaking a
duplicate on re-arm.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@vanceingalls

Copy link
Copy Markdown
Collaborator Author

Addressed the two real findings (thanks all four for the thorough passes):

  • recordAttempt outside try/catch (miga) — moved inside each function's existing try block in all three emit functions, so it can never break the "never throws" contract.
  • visibilitychange listener-order dropping the tab-hide flush (RDJ, miguel) — added flushViaBeacon() export to studioTelemetry.ts and call it explicitly right after queuing the rollup, so delivery no longer depends on which listener (this module's, on document, vs. studioTelemetry's, on window) fires first.
  • Test-reset listener leak (RDJ) — __resetAttemptSchedulingForTests now captures the handler by reference and calls removeEventListener, so a reset+re-arm cycle doesn't double-register.

Also added test coverage for the tab-hide path itself (previously untested, which is exactly where the ordering bug was hiding) — switched the file to happy-dom so visibilitychange can actually be exercised.

Left as documented follow-ups (all three reviewers flagged these as non-blocking):

  • "dom-edit" label vs. the dashboard's (dom-edit) convention
  • counts as a JSON-string field vs. top-level per-op fields

Full studio suite green (1328 passed), typecheck clean, lint/format clean.

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the 8562e0d0..2c60ae98 delta. The follow-up fixes address the prior findings cleanly:

  • packages/studio/src/utils/sdkResolverShadow.ts now calls recordAttempt inside each existing fail-open try, so scheduling/counting cannot break the shadow emitters' never-throw contract.
  • The visibility handler is now held by reference and removed in __resetAttemptSchedulingForTests, with a regression test proving reset+re-arm does not double-register.
  • packages/studio/src/utils/studioTelemetry.ts exposes flushViaBeacon(), and the resolver-shadow hidden-tab handler calls it after queuing the rollup. That closes the listener-order bug RDJ flagged: whether studioTelemetry's own visibility handler runs before or after this module's handler, the attempt rollup is beacon-flushed instead of sitting behind a cleared timer.
  • The new happy-dom tests cover visibilitychange -> hidden and duplicate-listener reset behavior; the repo Test job is green at this head.

Remaining notes are the already-accepted query-shape nits: counts is still a JSON string, and dom-edit remains the special denominator label. Those don't block merge.

CI is green at 2c60ae98.

Verdict: APPROVE
Reasoning: The new delta directly fixes the exception-safety, tab-hide delivery, and listener-reset issues without changing the core low-volume denominator design; all checks are green on the exact head.

@vanceingalls vanceingalls merged commit fe821ff into main Jul 4, 2026
40 checks passed
@vanceingalls vanceingalls deleted the fix/resolver-shadow-attempt-counter branch July 4, 2026 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants