Skip to content

fix(cli): purge stale/partial browser installs instead of wedging retries#1913

Draft
miguel-heygen wants to merge 1 commit into
mainfrom
fix/browser-ensure-stale-install-purge
Draft

fix(cli): purge stale/partial browser installs instead of wedging retries#1913
miguel-heygen wants to merge 1 commit into
mainfrom
fix/browser-ensure-stale-install-purge

Conversation

@miguel-heygen

Copy link
Copy Markdown
Collaborator

Root cause

Two independent reports of the same failure: a chrome-headless-shell zip extraction gets interrupted (Windows AV lock, sleep/wake, ctrl-C) and leaves only the alphabetically-early files (ABOUT/LICENSE) in the target directory, no executable. Every subsequent browser ensure (or implicit re-download from findBrowser/ensureBrowser) sees the directory already exists and hands it straight to @puppeteer/browsers' install(), which throws "folder exists but the executable is missing" without re-extracting — permanently wedging the machine until someone manually deletes the directory. --force didn't help because it was a phantom flag: browser.ts never declared it as a real CLI option, so it silently did nothing (mentioned only in an error-message string).

findFromCache() already detects this exact case (dir exists, exe missing) and returns it as staleHyperframesCachePath, but findBrowser()/ensureBrowser() fed that straight into a re-download without ever deleting the stale directory first, so install() hit the same "exists" branch every time.

Fix

  • findFromCache() also returns staleInstallPath (InstalledBrowser's .path — the actual install-folder root, not the missing executablePath) for the stale case.
  • Both findBrowser() and ensureBrowser() now purge that directory (rmSync, inside the existing withInstallLock mutex from fix(cli): lock chrome-headless-shell install against concurrent extraction races #1866 so a purge can't race a concurrent installer) before retrying, so install() actually re-extracts instead of erroring.
  • Wired up a real --force flag on hyperframes browser ensure: it purges the whole HF-managed cache (reusing the already-tested clearBrowser()) and skips every cache/system shortcut, so it always gets a fresh download regardless of what's currently on disk — matching what the existing (previously false) help text already claimed it did.

Not fixed here

Neither report's machine had a usable auto-detected system Chrome fallback on Windows — SYSTEM_CHROME_PATHS only lists macOS/Linux paths, so findFromSystem() can never succeed on win32. Both reporters worked around this manually via HYPERFRAMES_BROWSER_PATH, which still works fine; adding real Windows system-Chrome detection is a distinct, larger change.

Test plan

  • Extended manager.test.ts's existing stale-cache-redownload test to include a populated stale install directory and assert it's gone before the mocked install() is called (previously only asserted the redownload happened, not that the fix's purge step ran).
  • Added a new test for ensureBrowser({force: true}) purging the cache and bypassing a healthy cache/system-Chrome shortcut.
  • Fixed the shared fs mock's rmSync to actually simulate recursive deletion (drop nested tracked paths too), which the new tests need and the old ones never exercised.
  • Full CLI suite: bun run --cwd packages/cli test — 97 files / 1222 tests passing.
  • Full workspace bun run build — clean.
  • bunx oxlint / bunx oxfmt --check on changed files — clean.

…ries

Two independent reports of the same failure: a `chrome-headless-shell`
zip extraction gets interrupted (Windows AV lock, sleep/wake, ctrl-C)
and leaves only the alphabetically-early files (ABOUT/LICENSE) in the
target directory, no executable. Every subsequent `browser ensure` (or
implicit re-download from `findBrowser`/`ensureBrowser`) sees the
directory already exists and hands it straight to @puppeteer/browsers'
install(), which throws "folder exists but the executable is missing"
without re-extracting -- permanently wedging the machine until someone
manually deletes the directory. `--force` didn't help because it was a
phantom flag: `browser.ts` never declared it, so it silently did
nothing (mentioned only in an error-message string).

Root cause: `findFromCache()` already detects this exact case (dir
exists, exe missing) and returns it as `staleHyperframesCachePath`, but
`findBrowser()`/`ensureBrowser()` fed that straight into a re-download
without ever deleting the stale directory first, so install() hit the
same "exists" branch every time.

Fix:
- `findFromCache()` also returns `staleInstallPath` (InstalledBrowser's
  `.path` -- the actual install-folder root, not the missing
  executablePath) for the stale case.
- Both `findBrowser()` and `ensureBrowser()` now purge that directory
  (`rmSync`, inside the existing `withInstallLock` mutex from #1866 so
  a purge can't race a concurrent installer) before retrying, so
  install() actually re-extracts instead of erroring.
- Wired up a real `--force` flag on `hyperframes browser ensure`: it
  purges the whole HF-managed cache (reusing the already-tested
  `clearBrowser()`) and skips every cache/system shortcut, so it always
  gets a fresh download regardless of what's currently on disk --
  matching what the existing (previously false) help text already
  claimed it did.

Not fixed here (separate root cause, flagged for later): neither
report's machine had a usable auto-detected system Chrome fallback on
Windows -- `SYSTEM_CHROME_PATHS` only lists macOS/Linux paths, so
`findFromSystem()` can never succeed on win32. Both reporters worked
around this manually via HYPERFRAMES_BROWSER_PATH, which still works
fine; adding real Windows system-Chrome detection is a distinct,
larger change.

Test: extended manager.test.ts's existing stale-cache-redownload test
to include a populated stale install directory and assert it's gone
before the mocked install() is called (was previously only asserting
the redownload happened, not that the fix's purge step ran). Added a
new test for `ensureBrowser({force: true})` purging the cache and
bypassing a healthy cache/system-Chrome shortcut. Also fixed the shared
fs mock's `rmSync` to actually simulate recursive deletion (drop
nested tracked paths too), which the new tests need and the old ones
never exercised. Full CLI suite (1222 tests) passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant