docs: Firecrawl-v2 consolidation blueprint #6
Open
Bharath-code wants to merge 3 commits into
Replace the 3-tier scraper cascade, fake geo emulation, and Gemini-vision pricing with a single Firecrawl v2 scrape (real location proxies; json/changeTracking/screenshot formats). Keeps the deterministic diff engine and the AEO 5-model engine untouched. Net ~1,400 fewer LOC; drops Playwright.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
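A single scrape that covers real geo, structured extraction, change tracking, and an optional screenshot could be assembled roughly as below. This is a minimal sketch that only builds the JSON body for a `POST` to Firecrawl's v2 scrape endpoint. The field shapes (`formats` entries as `{ type, ... }` objects, `changeTracking` `modes`, `location.country`/`languages`) reflect our reading of the Firecrawl v2 docs and should be checked against them; `buildScrapeBody` and `ScrapeOptions` are hypothetical names, not the code in `scrapePage.ts`.

```typescript
// Hypothetical helper: builds the body for ONE Firecrawl v2 scrape call.
// Field shapes are assumptions based on the v2 docs; verify before use.
type JsonSchema = Record<string, unknown>;

interface ScrapeOptions {
  country: string;            // ISO 3166-1 alpha-2, e.g. "DE"
  languages?: string[];       // e.g. ["de-DE"]
  pricingSchema: JsonSchema;  // plain JSON Schema (not a Zod object)
  captureScreenshot?: boolean;
}

type Format =
  | "markdown"
  | { type: "json"; schema: JsonSchema }
  | { type: "changeTracking"; modes: string[]; schema: JsonSchema }
  | { type: "screenshot"; fullPage: boolean };

interface ScrapeBody {
  url: string;
  formats: Format[];
  location: { country: string; languages?: string[] };
}

function buildScrapeBody(url: string, opts: ScrapeOptions): ScrapeBody {
  const formats: Format[] = [
    // Assumption: change tracking compares against the previous scrape's
    // markdown, so markdown is requested alongside it.
    "markdown",
    // Structured pricing extraction against the canonical schema.
    { type: "json", schema: opts.pricingSchema },
    // Track changes on the extracted JSON, not just on raw markdown.
    { type: "changeTracking", modes: ["json"], schema: opts.pricingSchema },
  ];
  if (opts.captureScreenshot) {
    // Opt-in: screenshots are only requested when asked for.
    formats.push({ type: "screenshot", fullPage: true });
  }
  return {
    url,
    formats,
    // Real geo: the request is routed through a proxy in this country.
    location: { country: opts.country, languages: opts.languages },
  };
}
```

Because the whole request is a plain object, asserting on `body.location` is a cheap way to check that geo settings actually reach Firecrawl without making a network call.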
…shadow parity

Add scrapePage.ts: a single Firecrawl `scrape` with `json` structured extraction mapped to the canonical PricingSchema, so the deterministic diff engine consumes it unchanged. Real geo via location.country; changeTracking(json) as an unchanged-page pre-filter; all gated behind FIRECRAWL_EXTRACTOR. Additive only: no cutover (that's P3).

Fix: firecrawl-js@4 silently drops Zod v4 schemas (it detects Zod via v3 internals), so `json` came back empty. Send z.toJSONSchema() instead. Verified live against vercel.com/pricing and linear.app/pricing.

Add scripts/shadow-parity.test.ts + `npm run shadow`: runs the Firecrawl and Playwright extractors on the same live URLs and reports schema/diff parity, using diffPricing itself as the oracle. Gated off the normal suite unless SHADOW_URLS is set.

Tests: 7 new (the mapper output flows through the real diffPricing); 589 green with shadow skipped; typecheck:ci clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
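The shadow-parity idea (run both extractors on the same URL and let the diff engine decide whether their outputs differ) can be sketched as follows. `Pricing`, `Plan`, `Extractor`, the `Diff` signature (returning a list of detected changes), `shadowParity`, and `shadowUrlsFromEnv` are hypothetical stand-ins for illustration; the real `PricingSchema` and `diffPricing` live in the repo. Treating `SHADOW_URLS` as a comma-separated list is also an assumption.

```typescript
// Hypothetical shapes; the real PricingSchema is richer.
interface Plan { name: string; monthlyUsd: number | null }
interface Pricing { plans: Plan[] }
type Extractor = (url: string) => Promise<Pricing>;
// Stand-in for diffPricing: returns the list of detected changes.
type Diff = (before: Pricing, after: Pricing) => string[];

interface ParityResult { url: string; ok: boolean; changes: string[] }

// Runs both extractors on the same URL and uses the diff engine as the
// oracle: the extractors agree iff diffing one output against the other
// reports no changes.
async function shadowParity(
  urls: string[],
  firecrawl: Extractor,
  playwright: Extractor,
  diff: Diff,
): Promise<ParityResult[]> {
  const results: ParityResult[] = [];
  for (const url of urls) {
    const [candidate, baseline] = await Promise.all([firecrawl(url), playwright(url)]);
    // Baseline = Playwright (current path), candidate = Firecrawl.
    const changes = diff(baseline, candidate);
    results.push({ url, ok: changes.length === 0, changes });
  }
  return results;
}

// Gate: an empty list means "skip the shadow run" (assumed comma-separated).
function shadowUrlsFromEnv(env: Record<string, string | undefined>): string[] {
  return (env.SHADOW_URLS ?? "")
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
}
```

Using the diff engine as the oracle, rather than comparing raw JSON, means any differences the engine already treats as noise do not count as parity failures; only differences that would surface as a pricing change do.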
scrapePage.ts gains an opt-in captureScreenshot option that requests Firecrawl's screenshot format and returns screenshotUrl; fetchScreenshotBuffer() bridges Firecrawl's hosted URL to R2's Buffer-based API. Adds tests asserting that geo (location.country + languages) actually reaches Firecrawl. Still flag-gated; the live pipeline cutover is P3.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
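Bridging a hosted screenshot URL to a `Buffer` amounts to one download. A minimal sketch, assuming Node 18+ (global `fetch`) and an R2 upload helper that accepts a `Buffer`; the injectable `fetchImpl` parameter and the `FetchLike`/`BinaryResponse` types are hypothetical additions so the function can be tested offline, and the signature is not necessarily the one in `scrapePage.ts`.

```typescript
// Minimal response shape relied on; the global fetch Response satisfies it.
interface BinaryResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (url: string) => Promise<BinaryResponse>;

// Downloads Firecrawl's hosted screenshot and returns it as a Node Buffer,
// ready for a Buffer-based upload (e.g. to R2).
async function fetchScreenshotBuffer(
  screenshotUrl: string,
  fetchImpl: FetchLike = fetch,
): Promise<Buffer> {
  const res = await fetchImpl(screenshotUrl);
  if (!res.ok) {
    // Fail loudly rather than uploading an error page as an image.
    throw new Error(`screenshot download failed: HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Buffering the whole image keeps the existing Buffer-based R2 call unchanged; switching to a stream would only be worth it for much larger payloads.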
What
Adds `docs/FIRECRAWL_CONSOLIDATION.md`, a blueprint for collapsing RivalEye's scraping engine onto Firecrawl v2. Blueprint only (no code changes yet); execution is sequenced separately, starting with P1 (extractor swap behind a flag, shadow-tested for parity).
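An "extractor swap behind a flag" can be as small as choosing between two functions with the same signature. The flag name `FIRECRAWL_EXTRACTOR` is taken from the implementation commit; treating `"1"`/`"true"` as enabled and the `selectExtractor` helper itself are assumptions for illustration.

```typescript
type Extractor<T> = (url: string) => Promise<T>;

// Hypothetical: returns the Firecrawl extractor only when the flag is on,
// so the existing Playwright path stays the default until cutover.
function selectExtractor<T>(
  env: Record<string, string | undefined>,
  firecrawl: Extractor<T>,
  playwright: Extractor<T>,
): Extractor<T> {
  const flag = (env.FIRECRAWL_EXTRACTOR ?? "").trim().toLowerCase();
  const enabled = flag === "1" || flag === "true";
  return enabled ? firecrawl : playwright;
}
```

In production the caller would pass `process.env`; passing the env in explicitly keeps the selection testable and makes rollback a config change rather than a deploy.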
Why
`geoContext.ts:10`).
`checkPricingContext.ts:121-143` is dead: it fetches and then discards Firecrawl output and runs Playwright anyway.
Firecrawl v2 ships hosted equivalents: real `location.country` proxies, `screenshot`, structured `json`/`branding` extraction, and native `changeTracking`.
Scope
`src/lib/diff/**`) + AEO 5-model engine.
`playwright`/`cheerio` deps and the Trigger.dev browser build extension.
`changeStatus === 'same'` short-circuits unchanged pages → skips downstream LLM calls.
Wow features funded by the savings
`changeTracking` `git-diff`.
See the doc for the phased sequence (P1–P6) and verification plan.
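The `changeStatus === 'same'` short-circuit from the Scope list amounts to a guard in front of the expensive steps. The `changeStatus` values (`new`, `same`, `changed`, `removed`) follow Firecrawl's change-tracking docs as we understand them; `ScrapeResult`, `Outcome`, `processPage`, and the `analyze` callback (standing in for the downstream LLM calls) are hypothetical.

```typescript
type ChangeStatus = "new" | "same" | "changed" | "removed";

interface ScrapeResult<T> {
  data: T;
  changeTracking?: { changeStatus: ChangeStatus };
}

type Outcome<R> = { skipped: true } | { skipped: false; result: R };

// Skips downstream work (LLM calls etc.) when Firecrawl reports the page
// unchanged since the previous scrape. A missing changeTracking block is
// treated as "unknown" and falls through to full processing.
async function processPage<T, R>(
  scrape: ScrapeResult<T>,
  analyze: (data: T) => Promise<R>,
): Promise<Outcome<R>> {
  if (scrape.changeTracking?.changeStatus === "same") {
    return { skipped: true };
  }
  return { skipped: false, result: await analyze(scrape.data) };
}
```

Falling through on a missing `changeTracking` block errs toward an extra LLM call rather than silently missing a pricing update.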
🤖 Generated with Claude Code