docs: Firecrawl-v2 consolidation blueprint#6

Open
Bharath-code wants to merge 3 commits into main from feat/firecrawl-consolidation

@Bharath-code

Owner

What

Adds docs/FIRECRAWL_CONSOLIDATION.md — a blueprint for collapsing RivalEye's scraping engine onto Firecrawl v2.

Blueprint only (no code changes yet). Execution is sequenced separately, starting with P1 (extractor swap behind a flag, shadow-tested for parity).

Why

  • The 3-tier scraper cascade (Firecrawl → Cheerio → Playwright), the hand-rolled geo layer, and the Gemini-vision pricing extractor together total ~1,760 lines and require a browser container.
  • The "geo proxy" is emulated — Playwright locale spoofing, no real IP rotation (geoContext.ts:10).
  • The Firecrawl branch in checkPricingContext.ts:121-143 is dead — it fetches then discards Firecrawl output and runs Playwright anyway.

Firecrawl v2 ships hosted equivalents: real location.country proxies, screenshot, structured json/branding extraction, and native changeTracking.
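A minimal sketch of how those four capabilities might be combined into one scrape request. The builder `buildScrapeRequest` and its types are hypothetical, and the field names reflect one reading of the Firecrawl v2 scrape options (object-style formats, `location.country`, `location.languages`); check them against the current Firecrawl docs before relying on them.

```typescript
// Hypothetical request builder; not the code from this PR.
type JsonSchema = Record<string, unknown>;

type ScrapeFormat =
  | "markdown"
  | { type: "json"; schema: JsonSchema }
  | { type: "screenshot"; fullPage: boolean }
  | { type: "changeTracking"; modes: Array<"git-diff" | "json">; schema?: JsonSchema };

interface ScrapeRequest {
  url: string;
  formats: ScrapeFormat[];
  location: { country: string; languages: string[] };
}

function buildScrapeRequest(
  url: string,
  country: string,
  languages: string[],
  pricingSchema: JsonSchema,
  captureScreenshot = false,
): ScrapeRequest {
  const formats: ScrapeFormat[] = [
    // Assumption: change tracking compares markdown, so markdown is requested alongside it.
    "markdown",
    { type: "json", schema: pricingSchema },
    { type: "changeTracking", modes: ["git-diff", "json"], schema: pricingSchema },
  ];
  if (captureScreenshot) {
    formats.push({ type: "screenshot", fullPage: true });
  }
  // Country codes are normalized to upper case (ISO 3166-1 alpha-2 style).
  return { url, formats, location: { country: country.toUpperCase(), languages } };
}
```

Keeping the builder pure (no network call) makes it easy to assert in a unit test that the geo settings actually end up in the outgoing request.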

Scope

  • Keep the moat: deterministic diff engine (src/lib/diff/**) + AEO 5-model engine.
  • Replace the crawler cascade + fake geo proxy + Gemini parsing (extraction).
  • Net change of about -1,400 LOC; drops the playwright/cheerio deps and the Trigger.dev browser build extension.
  • changeStatus === 'same' short-circuits unchanged pages → skips downstream LLM calls.
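The short-circuit in the last bullet could look roughly like the sketch below. `needsAnalysis` and `ChangeTrackingResult` are hypothetical names; the status values (`new`, `same`, `changed`, `removed`) are assumed from Firecrawl's change-tracking output, and the fail-open behaviour for missing data is an illustrative choice, not something the PR states.

```typescript
// Hypothetical pre-filter; not the code from this PR.
type ChangeStatus = "new" | "same" | "changed" | "removed";

interface ChangeTrackingResult {
  changeStatus: ChangeStatus;
  previousScrapeAt?: string | null;
}

// Returns true when the downstream (LLM) steps should run for this page.
function needsAnalysis(ct: ChangeTrackingResult | undefined): boolean {
  // Assumption: if change tracking data is missing, fail open and analyze anyway.
  if (!ct) return true;
  // Only an explicit "same" skips the downstream work.
  return ct.changeStatus !== "same";
}
```

Failing open trades a few extra LLM calls for never silently missing a change when the tracking payload is absent.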

Wow features funded by the savings

  1. Instant paste-a-URL competitor teardown (pre-signup activation moment).
  2. Real before/after diff feed via changeTracking git-diff.
  3. Credible real-geo pricing (real proxy exits, not locale spoofing).

See the doc for the phased sequence (P1–P6) and verification plan.

🤖 Generated with Claude Code

Replace 3-tier scraper cascade + fake geo emulation + Gemini-vision
pricing with a single Firecrawl v2 scrape (real location proxies,
json/changeTracking/screenshot). Keeps the deterministic diff engine
and AEO 5-model engine untouched. Net ~-1,400 LOC, drops Playwright.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jul 1, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| rivaleye | Ready | Preview, Comment | Jul 1, 2026 5:04pm |

@kilo-code-bot

kilo-code-bot Bot commented Jul 1, 2026


Kilo Code Review could not run — your account is out of credits.

Add credits or switch to a free model to enable reviews on this change.

…shadow parity

Add scrapePage.ts: single Firecrawl `scrape` with `json` structured extraction
mapped to the canonical PricingSchema, so the deterministic diff engine consumes
it unchanged. Real geo via location.country, changeTracking(json) as an
unchanged-page pre-filter, gated behind FIRECRAWL_EXTRACTOR. Additive only — no
cutover (that's P3).
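As an illustration of the flag gate, here is a minimal sketch in which the existing Playwright path stays the default and Firecrawl is used only when the flag is explicitly enabled. `pickExtractor` and the accepted flag values ("1", "true", "on") are assumptions for the example; the PR does not show how FIRECRAWL_EXTRACTOR is parsed.

```typescript
// Hypothetical flag parser; not the code from this PR.
type ExtractorName = "firecrawl" | "playwright";

function pickExtractor(flag: string | undefined): ExtractorName {
  const value = (flag ?? "").trim().toLowerCase();
  // Anything other than an explicit opt-in keeps the existing extractor,
  // which matches the "additive only, no cutover" intent.
  const enabled = value === "1" || value === "true" || value === "on";
  return enabled ? "firecrawl" : "playwright";
}
```

A caller would pass the environment value in, for example `pickExtractor(process.env.FIRECRAWL_EXTRACTOR)`, so the function itself stays easy to test.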

Fix: firecrawl-js@4 silently drops Zod v4 schemas (detects Zod via v3 internals),
so `json` came back empty. Send z.toJSONSchema() instead. Verified live against
vercel.com/pricing and linear.app/pricing.

Add scripts/shadow-parity.test.ts + `npm run shadow`: runs the Firecrawl and
Playwright extractors on the same live URLs and reports schema/diff parity, using
diffPricing itself as the oracle. Excluded from the normal suite unless SHADOW_URLS is set.
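A rough sketch of the parity idea: run both extractors on the same URL, then treat an empty diff between their outputs as parity. The `Pricing` shape, the diff function signature, and the comma-separated SHADOW_URLS format are all assumptions for illustration; the real PricingSchema and diffPricing are not shown in this PR.

```typescript
// Hypothetical shapes and helpers; not the code from this PR.
interface Plan { name: string; price: number | null }
interface Pricing { plans: Plan[] }
type Diff = string[]; // list of detected changes; empty means no difference

// Assumption: SHADOW_URLS is a comma-separated list; unset or empty disables the run.
function parseShadowUrls(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

interface ParityReport { url: string; inParity: boolean; differences: Diff }

// The diff function is injected so the production diff can act as the oracle.
function checkParity(
  url: string,
  playwright: Pricing,
  firecrawl: Pricing,
  diff: (before: Pricing, after: Pricing) => Diff,
): ParityReport {
  const differences = diff(playwright, firecrawl);
  return { url, inParity: differences.length === 0, differences };
}
```

Using the production diff as the oracle means parity is judged by the same rules that would later raise alerts, rather than by a stricter deep-equality check.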

Tests: 7 new (mapper output flows through the real diffPricing); 589 green with the shadow suite skipped;
typecheck:ci clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

scrapePage.ts gains opt-in captureScreenshot → Firecrawl screenshot format,
returns screenshotUrl; fetchScreenshotBuffer() bridges Firecrawl's hosted URL
to R2's Buffer API. Adds tests asserting geo (location.country + languages)
actually reaches Firecrawl. Still flag-gated; live pipeline cutover is P3.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>