fix(lint): stop CSS comments in <style> from manufacturing phantom root tags#1897
fix(lint): stop CSS comments in <style> from manufacturing phantom root tags#1897miguel-heygen wants to merge 1 commit into
Conversation
…ot tags extractOpenTags scans raw source text with a flat regex that has no concept of <style>/<script> block boundaries, so a CSS comment like `/* <g> wrapper */` inside a <style> block reads as a real open tag. findRootTag consumes that flat tag list and only skips tags literally named script/style/meta/link/title, so the phantom <g> tag (not in that skip list) wins the "first non-ignored body tag" search and gets returned as the composition root instead of the real one that follows. This manufactured root_missing_composition_id and root_missing_dimensions (the phantom tag has neither) plus head_leaked_text (the leaked-text scan slices up to the phantom tag's position, landing inside the <style> block before its real closing tag, so the raw CSS text reads as leaked markup) on an otherwise valid sub-composition — reported with an exact bisected repro: a <template>-wrapped SVG sub-composition whose <style> block comments reference an inner <g> element. Fix: compute <style>/<script> content spans up front (reusing the existing extractBlocks + STYLE_BLOCK_PATTERN/SCRIPT_BLOCK_PATTERN) and skip any TAG_PATTERN match that falls inside one, before it ever reaches findRootTag or any other extractOpenTags consumer. Same shape as the prior fix for a leading <svg> defs block being mistaken for the root (8ee4b7d) — this closes a sibling gap in the same function. Test: new regression case with a <style> block containing a `/* <g> */` comment ahead of an <svg data-composition-id> root, asserting none of the three findings fire. Full lint package suite (318 tests) passes.
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at 7201818.
🟡 Approve advisory lint — but flag the incomplete-fix posture prominently
What's right about the lint rule itself:
- Rule mechanics look correct:
findHtmlTag→readAttr(raw, "dir")→ flag any non-ltrvalue, case-insensitive on theltrcheck. Severityerrormatches the severity of other silent-default-breaking checks in this file. Reuses existing helpers. - Test coverage geometry is comprehensive: flags
dir="rtl", flagsdir="AUTO"(any non-ltr), does not flagdir="ltr", does not flag absent-attribute, does not flag element-scopeddirection:rtlCSS (the documented workaround). Five clear negative/positive tests. - Both reporters independently confirmed the exact same fix (drop
dirfrom<html>, use element-scoped CSSdirection:rtl), which is the strongest form of empirical evidence for advisory-only rules.
Load-bearing concern — this PR title says "confirmed silent render failure" but the PR body admits the render-pipeline root cause is unfixed and unverified.
Per my bandaid_vs_longterm_rubric memory: when the PR body admits scope-incomplete ("advisory only — this does not fix the underlying render-pipeline bug"), the verdict should be sharper about the residual work. The lint rule is legitimate on its own — it's a valuable footgun-catcher — but the net posture for the repo is:
- Users who write new
<html dir="rtl">compositions after this lands: caught. ✓ - Existing RTL compositions already in customer projects: still silently produce blank videos, since the lint rule fires at author time, not render time. ✗
- Any team member reading this PR title as "the RTL render bug is fixed": misled. ✗
Suggested framing tightening (not blocking, but strong preference):
- Retitle to something like
feat(lint): flag <html dir="..."> as author-time warning for silent render failure — pipeline fix pending. The "confirmed silent render failure" phrasing is accurate but reads as "this PR fixes the failure" on first scan. - Rename the finding code from
html_dir_attribute_breaks_rendertohtml_dir_attribute_breaks_render_pipeline_pendingor leave aTODO(link-to-issue)in the rule source, so if/when the pipeline bug is fixed the rule can be softened towarningor removed. - Load-bearing: file a tracking issue naming
packages/engine/src/services/screenshotService.tsand Miguel's hypothesis about the{x: 0, y: 0, width, height}clip region (screenshotService.ts:151inpageScreenshotCaptureuses exactly that fixed clip), so the pipeline fix has a home. Otherwise the "advisory-only, pipeline fix pending" state persists indefinitely.
Verification of Miguel's hypothesis: I read screenshotService.ts:120-159 — pageScreenshotCapture does exactly clip to { x: 0, y: 0, width: options.width, height: options.height, scale: dpr }. Miguel's hypothesis is plausible on inspection; the actual RTL-vs-clip-origin interaction still needs an empirical repro in a working headless-Chrome environment (Miguel couldn't get one in this session). But the "start with screenshotService.ts's clip region" pointer is well-grounded — that's the right first place to look.
Verdict: approve the lint rule, but treat this as delivering a symptom guard, not a fix. Ship, but pair with a tracking issue for the pipeline fix ideally before merge (or immediately after).
Reviewed by AI Employee Rames D Jusso — bot-authored PR posture: comment-only, stamps require James Russo's explicit go-ahead.
vanceingalls
left a comment
There was a problem hiding this comment.
🟢 R1 verdict — correct isolate-outer-regions-first fix; no over/under-strip on other CSS constructs
Runtime-interop lens: verified the CSS-comment parsing doesn't over- or under-strip other CSS constructs, and the fix applied at extractOpenTags correctly shields every downstream consumer.
Finding-by-finding
1. Isolate-outer-regions-first pattern — 🟢
packages/lint/src/utils.ts:54-71
Per [[embedded-language-regex-isolation]] the correct approach for CSS-in-HTML parsing is to isolate outer regions (HTML <style> / <script> blocks) first, then apply inner-language regex within those spans — NOT to widen the outer regex's character class trying to skip CSS content.
This PR takes exactly that approach: findStyleAndScriptContentRanges computes the content-span boundaries via the already-well-tested extractBlocks + STYLE_BLOCK_PATTERN/SCRIPT_BLOCK_PATTERN, then extractOpenTags skips any TAG_PATTERN match whose .index falls inside those spans. Correct architectural choice; wouldn't have caught the same bug with a negated-character-class attempt.
2. Under/over-strip audit on other CSS constructs — 🟢
packages/lint/src/utils.ts:63-70
Concern probed: does the span-based skip incorrectly under- or over-shield?
- CSS
content:strings containing<span>etc. — e.g..foo::before { content: '<div class="x">'; }. Under old code,TAG_PATTERNwould match<div class="x">inside a<style>block and pollute the tag list. Under the new code, that match falls inside a style content span → skipped. Rames noted this too. Blast-radius is bigger than the reported phantom-tag symptom, in a good way. - JS strings mentioning tags — same guard applies via
SCRIPT_BLOCK_PATTERN. Fixed by the same mechanism. - CSS block comments containing
*/—STYLE_BLOCK_PATTERNdoesn't parse comments; it just captures the full<style>...</style>content span, so comment-boundary edge cases inside CSS never leak out of the span. Correct. - HTML attributes containing
<— e.g.<div data-x="<foo>">. Not shielded (the<foo>isn't inside a<style>/<script>block). This case predates this PR —TAG_PATTERNhas the same behavior at HEAD. Out of scope; noting for the reviewer. <style>inside another<style>— HTML spec disallows; browsers close the outer on the inner opening tag.STYLE_BLOCK_PATTERN(non-greedy[\s\S]*?) matches to the first</style>, matching browser behavior. Correct.
No over-strip of legitimate composition tags; no under-strip that would leak a different phantom-tag class.
3. block.raw.indexOf(block.content) correctness — 🟢
packages/lint/src/utils.ts:65
The span computation is contentStart = block.index + block.raw.indexOf(block.content). The indexOf finds the first occurrence of content within raw. Concerned edge case: could content appear earlier inside the <style attrs> prefix?
raw shape is <style ATTRS>CONTENT</style>. attrs is standard HTML attribute list (e.g. type="text/css", media="screen"). For indexOf(content) to return an earlier index, attrs would need to contain the entire content string as a substring. Practically impossible — content is CSS text (potentially large), attrs is a bounded attribute list. Edge case is theoretical, not real.
Empty content — indexOf("") returns 0, span is [block.index, block.index] = empty. No .index >= start && .index < end check falls inside an empty range. Safe no-op.
4. Downstream consumer shielding — 🟢
packages/lint/src/utils.ts:76
The fix lives inside extractOpenTags — every consumer of extractOpenTags in the tree automatically benefits. Rames noted the one other consumer (context.ts:40). The findRootTag path (the reported symptom trigger) reads context.tags, which is populated by extractOpenTags at context-build time. All three chained findings (root_missing_composition_id / root_missing_dimensions / head_leaked_text) share that single tag list, so shielding at the source fixes all three at once.
Verified there's no independent second TAG_PATTERN scanner in the tree that would need the same fix — the source lives in utils.ts and is the sole tag-scanning primitive.
5. Performance surface — 🟢 (non-blocker)
skipRanges.some(...) is O(matches × blocks). Typical compositions have 1-3 <style> blocks and dozens of tags — trivial. Rames flagged this same point; agreed no preemptive optimization needed.
Peer-lens split: Rames covers the test-geometry + cross-repo consumer count. My lens confirms the isolate-outer-regions-first architectural choice, the over/under-strip audit on other CSS constructs, and shielding correctness at the sole extractOpenTags entry point.
Review by Via (runtime-interop lens)
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at 7201818.
🟡 R2 verdict — no drift since R1 (SHA unchanged). Approve on lint fix stands; framing hold still open.
Delta from R1: None. HEAD is still 72018187; no new commits, no title change, no follow-up tracking issue linked.
Lint fix itself (🟢, unchanged from R1): The extractOpenTags <style>/<script> content-span skip is still the right shape — isolate-outer-regions-first, no rule-by-rule special-casing, all three phantom-tag-manufactured findings (root_missing_composition_id / root_missing_dimensions / head_leaked_text) fixed at the source. Via's R1 co-sign on the CSS-comment parsing not over/under-stripping other constructs still holds at this SHA.
🟡 hold still open (R1 substantive): The PR title still reads fix(lint): stop CSS comments in <style> from manufacturing phantom root tags. The lint rule stops the phantom-tag findings, but the actual render pipeline root cause (suspected screenshotService.ts:151 fixed-origin clip region shifting away from real content under RTL layout) is untouched — existing customer compositions that already authored <html dir="rtl"> still hit the blank-video symptom.
This isn't a code blocker (the lint fix genuinely fixes what its title's lint scope claims), but the PR body's "fixed" framing on the pipeline-level bug is still misleading. Two ways to close cleanly:
- Retitle to
fix(lint): advisory rule for <html dir> render-breakand add a TODO in the rule comment linking a tracking issue for the pipeline fix, OR - Link a follow-up issue in the PR body naming
screenshotService.ts:151as the actual root-cause fix location.
Either shape closes the hold. Merging as-is is defensible on the lint scope, but leaves the pipeline bug undocumented as an open item — which was my original concern.
— Rames D Jusso
Root cause
extractOpenTagsinpackages/lint/src/utils.tsscans raw source text with a flat regex (TAG_PATTERN) that has no concept of<style>/<script>block boundaries. A CSS comment like/* <g> wrapper */inside a<style>block reads as a real open tag to that scan.findRootTagconsumes that flat tag list and returns the first tag not literally namedscript/style/meta/link/titleas the composition root. A phantom<g>tag surfaced from inside a<style>block's comment isn't in that skip list, so it wins the search and gets returned instead of the real root that follows.This manufactured three findings on an otherwise valid sub-composition:
root_missing_composition_id/root_missing_dimensions— the phantom tag has neither.head_leaked_text— the leaked-text scan slices source up to the phantom tag's position, which lands inside the<style>block before its real closing tag, so the raw CSS text reads as leaked visible markup.Reported with an exact bisected repro: a
<template>-wrapped SVG sub-composition whose<style>block comments reference an inner<g>element.This is the same class of bug as
8ee4b7df(a leading<svg>defs block being mistaken for the root) —findRootTag's flat tag list can be poisoned by tags that were never real HTML elements in the first place.Fix
Compute
<style>/<script>content spans up front (reusing the existingextractBlocks+STYLE_BLOCK_PATTERN/SCRIPT_BLOCK_PATTERN, no new parsing logic) and skip anyTAG_PATTERNmatch that falls inside one, before it ever reachesfindRootTagor any otherextractOpenTagsconsumer. This fixes the bug at the source rather than special-casing each of the three affected rules individually.Test plan
packages/lint/src/rules/core.test.ts: a<style>block containing a/* <g> */comment ahead of an<svg data-composition-id>root, asserting none of the three findings fire.bunx vitest run packages/lint— 318/318 passing (full package suite, no regressions).bunx tsc --noEmitvia the workspace build — clean.bunx oxlint/bunx oxfmt --checkon the changed files — clean.