feat: add self contained test-running ability to boxel-cli #4914
jurgenwerk wants to merge 14 commits into
`boxel test` now ships its own copy of the host's compiled test bundle in `bundled-test-harness/`, populated at release time by `scripts/build-test-harness.ts` from `packages/host/dist/`. The resolver prefers the bundled dir and falls back to the sibling `packages/host/dist/` for in-monorepo dev. `@playwright/test` moves from devDependencies to dependencies so a published install has the driver available.

Tradeoff: the bundle is ~60MB. Stripping Monaco editor chunks looked tempting (drops it to ~40MB), but the host's Ember service container imports them during boot, so the resulting 404s break test-page init silently, surfacing only as a 5-minute `waitForFunction` timeout. Don't re-strip without auditing the host's eager imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host Test Results: 1 files, 1 suites, 1h 34m 3s ⏱️. Results for commit 361f82f.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 9m 39s ⏱️ +44s. Results for commit 361f82f. ± Comparison against earlier commit 94656aa.
Drops the realm-server proxy round-trip out of the agent loop. The test command now defaults to reading cards from a local workspace dir (cwd or [path] arg) and starts a single in-process server that both hosts the existing test-page bundle and transpiles `.gts`/`.ts` modules on demand via runtime-common's `transpileJS`. Base realm source is vendored into `bundled-realms/` (1.7MB) so a published install resolves `https://cardstack.com/base/...` imports without anything else on disk. `--realm <url>` is now opt-in for the older remote-realm flow.

Post-build copies `content_tag_bg.wasm` next to `dist/index.js`: esbuild inlines content-tag's wasm loader, whose `readFileSync` then resolves against the CLI's dist/.

The 60MB `bundled-test-harness/` still ships as-is; the slim extraction work follows in this same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
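The mount routing described in this commit can be sketched roughly as follows. This is an illustration only, not the code in `test.ts`: `resolveMount`, `needsTranspile`, and the `Mounts` type are hypothetical names, and the real server presumably hands matching files to `transpileJS` rather than just classifying them.

```typescript
import { isAbsolute, join, normalize, sep } from "node:path";

// Hypothetical mount table: URL prefix -> directory on disk.
type Mounts = Record<string, string>;

// Map a request path to a file under one of the mounted directories.
// Returns null when no mount matches, so the caller can fall back to
// serving the test-page bundle instead.
export function resolveMount(urlPath: string, mounts: Mounts): string | null {
  for (const [prefix, dir] of Object.entries(mounts)) {
    if (!urlPath.startsWith(prefix)) continue;
    const rel = normalize(urlPath.slice(prefix.length));
    // Refuse paths that would climb out of the mounted directory.
    if (rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel)) {
      return null;
    }
    return join(dir, rel);
  }
  return null;
}

// Only .gts / .ts sources need the on-demand transpile step.
export function needsTranspile(filePath: string): boolean {
  return /\.(gts|ts)$/.test(filePath);
}
```

With a table such as `{ "/workspace/": cwd, "/base/": bundledBaseDir }`, a request for `/workspace/sticky-note.gts` resolves to a file under the workspace, while `/assets/...` falls through to the host bundle.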
`boxel test` now ships only the chunks chromium actually loads during a card test, captured into `scripts/test-harness-manifest.json` by running the test runner with `BOXEL_TEST_HARNESS_MANIFEST=<path>`. `build-test-harness.ts` copies just the manifest-listed files from `host/dist/` into `bundled-test-harness/`, dropping the AI assistant, code-mode, monaco workers, cytoscape, katex, and all the commands that card tests don't exercise. Cuts the harness from ~60MB (sourcemaps-stripped host dist) to ~27MB without touching the host's Embroider config.

Smoke-tested against a multi-test sticky-note workspace from /tmp: 4 passed, manifest stable across renderCard variants.

Regenerate the manifest when host adds runtime deps card tests load: the build script header documents the one-line capture command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The slim manifest filter still reads from `packages/host/dist/`, so `pnpm run build` in `packages/boxel-cli` needs the host's dev bundle on disk. Add the build step explicitly with a comment pointing at why production mode is wrong (strips test entry). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR makes boxel test self-contained by default by bundling a slimmed host QUnit harness and vendoring base/skills realm sources, then running tests via a single in-process HTTP server that serves both the test page assets and locally transpiled realm modules.
Changes:
- Add a local-mode default for `boxel test` that serves workspace + bundled realms via an in-process server and runs QUnit in Playwright Chromium.
- Introduce build scripts to vendor `bundled-realms/` and copy a manifest-pruned `bundled-test-harness/` from `packages/host/dist/`.
- Update packaging/publish workflow so Playwright is a runtime dependency and the host build runs before publishing the CLI.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Moves @playwright/test to the boxel-cli runtime dependency set in the lockfile. |
| packages/software-factory/docs/runbook.md | Updates runbook to reflect local-only default boxel test behavior and --realm opt-in. |
| packages/software-factory/.agents/skills/software-factory-operations/SKILL.md | Updates skill docs/examples for new local-mode boxel test [path] flow. |
| packages/boxel-cli/tsconfig.json | Excludes bundled artifacts from TS compilation. |
| packages/boxel-cli/src/commands/test.ts | Implements local-mode test execution, unified test-page + module server, and manifest capture. |
| packages/boxel-cli/scripts/test-harness-manifest.json | Adds the committed allowlist of host dist assets to ship in the slim harness. |
| packages/boxel-cli/scripts/build.ts | Copies content_tag_bg.wasm into dist/ so transpilation works in the bundled CLI. |
| packages/boxel-cli/scripts/build-test-harness.ts | Adds manifest-driven harness bundling from packages/host/dist/. |
| packages/boxel-cli/scripts/build-realms.ts | Adds realm vendoring script for base/ and skills-realm/contents/. |
| packages/boxel-cli/package.json | Ships bundled realms/harness in published package; makes Playwright a dependency; wires new build steps. |
| packages/boxel-cli/.gitignore | Ignores new bundled artifact directories. |
| packages/boxel-cli/.eslintignore | Ignores new bundled artifact directories for linting. |
| .github/workflows/boxel-cli-publish.yml | Builds @cardstack/host before building/publishing the CLI so harness inputs exist. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
packages/boxel-cli/src/commands/test.ts:26
- The comment above `loadChromium()` still says `@playwright/test` is a devDependency and won't be present in a published install, but this PR moves `@playwright/test` to `dependencies`, so it will be present at runtime. Please update or remove the comment to reflect the new packaging model (it's still external to esbuild, but no longer a devDependency).
// `@playwright/test` is a devDependency and external in our esbuild
// config, so it's not present in a published-from-npm install. Anything
// loaded at the top of this module would crash `boxel --help` for end
// users who never run `boxel test`. Resolved lazily inside the runner
// instead.
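For readers unfamiliar with the lazy-resolution pattern this comment refers to, here is a minimal sketch. `loadOptional` is a hypothetical helper, not the CLI's `loadChromium()`; it only illustrates deferring a heavy import until the command that needs it actually runs, so that unrelated commands such as `boxel --help` never touch it.

```typescript
// Load a heavy dependency only when a command needs it, and turn a
// failed import into an actionable error message.
export async function loadOptional<T>(specifier: string, hint: string): Promise<T> {
  try {
    // Dynamic import: nothing is resolved until this function is called.
    return (await import(specifier)) as T;
  } catch {
    throw new Error(`Could not load "${specifier}". ${hint}`);
  }
}
```

Under this assumption the test command would call something like `loadOptional("@playwright/test", "...")` inside its runner rather than importing the package at module top level.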
| .command('test') | ||
| .description( | ||
| "Run the realm's QUnit test suite (every `*.test.gts` file) in a headless Chromium driven against the host app. Monorepo-only: relies on the host app's compiled `dist/` being reachable from this CLI's location (or via TEST_HARNESS_HOST_DIST_PACKAGE_DIR).", | ||
| 'Run every `*.test.gts` file in a workspace directory in a headless Chromium driven against the host app. Defaults to type-checking and serving cards from the local workspace (cwd or [path]); pass `--realm <url>` to test cards already on a remote realm instead.', |
| * `src/lib/local-module-server.ts`) mounts these directories at `/base/` | ||
| * and `/skills/` and transpiles `.gts` / `.ts` on demand. Cards that |
| for (let entry of entries) { | ||
| // Manifest entries are URL paths captured from the test runner. | ||
| // The CLI's local-mode realm mounts (`/workspace/`, `/base/`, | ||
| // `/skills/`) and the root request (`/`) are served at runtime, | ||
| // not from host/dist — skip those. | ||
| if ( | ||
| entry === '/' || | ||
| entry.startsWith('/workspace/') || | ||
| entry.startsWith('/base/') || | ||
| entry.startsWith('/skills/') | ||
| ) { | ||
| continue; | ||
| } | ||
| let rel = entry.replace(/^\//, ''); | ||
| let src = join(HOST_DIST, rel); | ||
| let dst = join(OUT_DIR, rel); | ||
| try { | ||
| let st = statSync(src); | ||
| if (!st.isFile()) { | ||
| skipped.push(entry); | ||
| continue; | ||
| } | ||
| } catch { | ||
| skipped.push(entry); | ||
| continue; | ||
| } | ||
| mkdirSync(dirname(dst), { recursive: true }); | ||
| copyFileSync(src, dst); | ||
| copied++; |
- Fix `boxel test` --help wording: local mode doesn't typecheck, it serves cards via an in-process transpiling server.
- Drop the stale `src/lib/local-module-server.ts` reference from `build-realms.ts` header; that file was merged into `test.ts`.
- Harden `build-test-harness.ts`: fail loudly when files in `ALWAYS_INCLUDE` (`tests/index.html`) are missing. That's the early signal for a production-mode host build that strips test entries. Also refuse to copy `*.map` even if a stale manifest captured one.
- Update the lazy-load-playwright comment to match the new packaging model: Playwright moved from devDependency to runtime dependency in the previous commit on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `BOXEL_TEST_HARNESS_MANIFEST` capture was recording every same-origin request — including `/workspace/sticky-note.test` and similar paths served from the runtime realm mounts. Those are useless in the committed manifest (the build script already skips them) and made it noisy/confusing. Filter at capture time using the runner's `realmMounts` config so only host/dist paths land in the manifest. Regenerated against the factory-test-cs11165-1xx workspace's 11-test sticky-note suite (was: 268 entries with 5 realm-mount cruft; now: 232 host-dist entries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
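A capture-time filter of the kind this commit describes might look like the sketch below. `filterManifestEntries` is a hypothetical name, and the shape of the runner's `realmMounts` config is assumed here to be a plain list of URL prefixes.

```typescript
// Keep only request paths that are served from host/dist, dropping the
// root request and anything under a runtime realm mount.
export function filterManifestEntries(
  requestPaths: string[],
  realmMounts: string[],
): string[] {
  const kept = new Set<string>();
  for (const path of requestPaths) {
    if (path === "/") continue;
    if (realmMounts.some((mount) => path.startsWith(mount))) continue;
    kept.add(path); // the Set de-duplicates repeated requests
  }
  // Sort so the committed manifest diffs cleanly between captures.
  return Array.from(kept).sort();
}
```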
`BOXEL_TEST_RUNNER_TIMEOUT_MS` overrides the `page.waitForFunction` ceiling (default 5 min). Used by the manifest-minimization script to fail fast on certain-failure removals without burning the full CI-friendly default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The captured manifest needed manual regeneration whenever the host's chunks re-hashed (which happens on basically any host content change). That made correctness depend on a maintainer remembering to refresh the JSON before publishing: silent staleness, easy to forget, real maintenance burden.

Going with the "no manifest" approach: copy `packages/host/dist/` into `bundled-test-harness/` wholesale, dropping only sourcemaps. 60 MB unpacked, ~15 MB on the wire. Next to the 150 MB Playwright chromium download every `boxel test` user already needs, the size delta is noise. Zero possibility of staleness because there is no snapshot to drift.

Removes:
- `scripts/test-harness-manifest.json` (committed allowlist)
- `BOXEL_TEST_HARNESS_MANIFEST` env capture path in test.ts
- All the manifest filtering / required-file guards in `build-test-harness.ts`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
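The wholesale copy minus sourcemaps can be sketched with Node's `fs.cpSync` and its `filter` option. This is an assumption about one way to write it, not the contents of `build-test-harness.ts`; `copyDistWithoutSourcemaps` and the demo paths are made up for illustration.

```typescript
import { cpSync, existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Copy a build directory wholesale, skipping only `*.map` files.
export function copyDistWithoutSourcemaps(srcDir: string, outDir: string): void {
  // Start from a clean output dir so stale chunks never linger.
  rmSync(outDir, { recursive: true, force: true });
  cpSync(srcDir, outDir, {
    recursive: true,
    // The filter runs for directories too; they never end in ".map" here.
    filter: (src) => !src.endsWith(".map"),
  });
}

// Demo against a throwaway directory tree.
const demoRoot = mkdtempSync(join(tmpdir(), "harness-demo-"));
const demoSrc = join(demoRoot, "dist");
const demoOut = join(demoRoot, "bundled-test-harness");
mkdirSync(join(demoSrc, "assets"), { recursive: true });
writeFileSync(join(demoSrc, "assets", "app.js"), "// chunk");
writeFileSync(join(demoSrc, "assets", "app.js.map"), "{}");
copyDistWithoutSourcemaps(demoSrc, demoOut);
console.log(existsSync(join(demoOut, "assets", "app.js"))); // true
console.log(existsSync(join(demoOut, "assets", "app.js.map"))); // false
```

Because nothing is enumerated ahead of time, re-hashed chunk filenames are picked up automatically on the next build.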
Stop bundling `packages/host/dist/` inside the boxel-cli npm tarball.
Instead, host test harness ships as a separate GH Release tarball
that boxel-cli downloads on first `boxel test` per CLI version and
caches under `~/.cache/boxel-cli/host-test-harness/<version>/`.
Why: bundling at CLI build time tied every boxel-cli publish to a
host build, and shipping the full ~60 MB inside every CLI install
was overkill for a one-time-download. The release-then-fetch shape
decouples the two: the host test harness gets cut on demand
(workflow_dispatch by a maintainer), boxel-cli embeds a pinned
version + sha256, and `boxel test` resolves through the fetcher.
Resolution order in the fetcher:
1. --host-dist-dir flag → use as-is
2. BOXEL_TEST_HARNESS_DIR env → use as-is (for CI)
3. Monorepo sibling packages/host/dist/ → use directly
(so in-repo devs never download)
4. Local cache hit at the pinned version → use
5. Download from GH release, sha256-verify, extract, use
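The resolution order above can be sketched as a pure function. This is a hedged illustration, not the code in `host-test-harness-fetcher.ts`: `resolveHarness`, `HarnessLookup`, and the injectable `exists` predicate are hypothetical, and the download step is represented only by its `"download"` result.

```typescript
import { existsSync } from "node:fs";

// Hypothetical inputs, one per step of the resolution order.
interface HarnessLookup {
  hostDistDirFlag?: string; // --host-dist-dir
  envDir?: string; // BOXEL_TEST_HARNESS_DIR
  monorepoSiblingDist: string; // sibling packages/host/dist/
  cacheDir: string; // cache dir for the pinned version
}

type HarnessSource = "flag" | "env" | "monorepo" | "cache" | "download";

export function resolveHarness(
  lookup: HarnessLookup,
  exists: (path: string) => boolean = existsSync,
): { source: HarnessSource; dir: string } {
  // 1 and 2: explicit overrides are used as-is, without checks.
  if (lookup.hostDistDirFlag) return { source: "flag", dir: lookup.hostDistDirFlag };
  if (lookup.envDir) return { source: "env", dir: lookup.envDir };
  // 3: in-repo developers never download.
  if (exists(lookup.monorepoSiblingDist)) {
    return { source: "monorepo", dir: lookup.monorepoSiblingDist };
  }
  // 4: cache hit at the pinned version.
  if (exists(lookup.cacheDir)) return { source: "cache", dir: lookup.cacheDir };
  // 5: caller must download, verify, and extract into the cache dir.
  return { source: "download", dir: lookup.cacheDir };
}
```

Injecting `exists` keeps the ordering testable without touching the filesystem.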
New files:
- .github/workflows/host-test-harness-publish.yml
manual workflow_dispatch; builds host, tars dist minus
sourcemaps, creates a host-test-harness-v<ver> release.
- packages/boxel-cli/host-test-harness.json
version + sha256 pin (currently the 0.0.0-placeholder until a
maintainer cuts the first release).
- packages/boxel-cli/src/lib/host-test-harness-fetcher.ts
the resolver / downloader / sha256-verifier / cache manager.
Removed:
- packages/boxel-cli/scripts/build-test-harness.ts
- bundled-test-harness/ tracking in .gitignore, .eslintignore,
tsconfig.json, package.json files array, build chain, clean
target.
- The pnpm --filter @cardstack/host build step in the boxel-cli
publish workflow.
New CLI flags: --refresh-harness, --offline-tarball <path>.
Smoke-tested in-monorepo (resolves to sibling, 11/11 passing).
Placeholder-pin guard verified by stashing the sibling and
re-running — surfaces a clear "no release cut yet" error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
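The pin check this commit message mentions (pinned version + sha256, with a guard for the `0.0.0-placeholder` pin) could look roughly like this. The function names, the `HarnessPin` shape, and the error wording are assumptions for illustration; only `createHash` from `node:crypto` is a real API.

```typescript
import { createHash } from "node:crypto";

// Assumed shape of the version + sha256 pin file.
interface HarnessPin {
  version: string;
  sha256: string;
}

export function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Throw unless the downloaded bytes match the pinned digest.
export function verifyTarball(data: Buffer, pin: HarnessPin): void {
  if (pin.version === "0.0.0-placeholder") {
    throw new Error("no host test harness release cut yet");
  }
  const actual = sha256Hex(data);
  if (actual !== pin.sha256.toLowerCase()) {
    throw new Error(`sha256 mismatch: expected ${pin.sha256}, got ${actual}`);
  }
}
```

Verifying before extraction means a corrupted or tampered download never reaches the cache directory.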
This reverts commit 1f69ddf.
Two unrelated CI failures revealed by the latest PR run:

1. `boxel-cli-build` job had no `host/dist/` because it didn't depend on `test-web-assets`. Add the dependency and the same download+restore steps the other CLI jobs use, so the host bundle is available when `build:test-harness` reads from it.
2. `build:realms` exited 1 on missing `packages/skills-realm/contents/`. That directory is gitignored (cloned separately via `pnpm --filter @cardstack/skills-realm skills:setup`) and isn't present in CI workspaces. Card tests don't actually load anything from `/skills/` at runtime (that mount is for AI-assistant scaffolding), so mark `skills` as a non-required realm and skip with a log when missing.

Verified locally by stashing `packages/skills-realm/contents/` and running `pnpm build:realms`, which produces the slightly smaller bundled-realms output (1.41 MB without skills vs 1.70 MB with).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
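The required-versus-optional realm handling in item 2 can be sketched as below. `selectRealms` and `RealmSource` are hypothetical names rather than what `build-realms.ts` actually exports; the sketch only shows the fail-for-required, skip-with-a-log-for-optional behavior.

```typescript
import { existsSync } from "node:fs";

interface RealmSource {
  name: string;
  dir: string;
  required: boolean;
}

// Return the realms whose source dir exists. A missing required realm
// aborts the build; a missing optional one is skipped with a log line.
export function selectRealms(
  sources: RealmSource[],
  exists: (path: string) => boolean = existsSync,
  log: (message: string) => void = console.log,
): RealmSource[] {
  const present: RealmSource[] = [];
  for (const source of sources) {
    if (exists(source.dir)) {
      present.push(source);
    } else if (source.required) {
      throw new Error(`required realm "${source.name}" missing at ${source.dir}`);
    } else {
      log(`skipping optional realm "${source.name}": ${source.dir} not found`);
    }
  }
  return present;
}
```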
…f-contained-qunit-test-harness-into-boxel-cli
Added earlier on this branch so the manifest-minimization script could fail-fast at 30 s per iteration. We abandoned the manifest approach, so the knob no longer has a consumer. Restore the simple 5-minute waitForFunction default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'manifest-listed chunks' / '~27MB despite the host producing 60MB' wording was left over from when we shipped a manifest-pruned slim cut. We reverted to the full host dev dist; only sourcemaps are dropped now. Update the comment to match.
My idea was to try to slim down the build needed for the harness, but that looks like a significant endeavor in Embroider territory, so I opted to simply bundle the development host build, which has the test runner and related support, into the CLI (the description has info on the CLI size impact). Slimming is hard because a card test imports Application, which imports compat-modules, which has 138 top-level static import statements for every file under app/. Removing any one of those chunks causes the ES module graph to abort at app.ts evaluation: before Ember boots, before QUnit loads, before any test code runs. Claude recommends we could do this:
I could try this, but perhaps bundling the host development dist into boxel-cli for the purpose of running the tests isn't such a big deal? On the other hand, it's a lot of bloat. One other option I can think of is to publish the build as a package and let boxel-cli download (and cache) it only when running
| * Copies the dev-mode host build wholesale, dropping only sourcemaps. | ||
| * Tried a manifest-driven slim cut (~27 MB instead of ~60 MB), but | ||
| * the manifest needed manual regen whenever the host's chunks | ||
| * re-hashed, and we don't want correctness to depend on human | ||
| * upkeep — see `feedback_no_manual_maintenance` in memory. The full | ||
| * dist is ~60 MB unpacked and ~15 MB on the wire; next to the | ||
| * 150 MB Playwright chromium install every `boxel test` user | ||
| * already needs, it's noise. |
That's an interesting approach. Like we chatted about the other day, it would be interesting if in the future we could separate this into its own npm package (boxel-cli-test-support) that the CLI would prompt the user to install when they try to run the test validations.
Summary
`boxel test` is now self-contained and local-only by default: no realm-server, no Vite dev server, no push step between writing code and running tests. The factory's agent loop tightens from `write → realm push → boxel test` to just `write → boxel test`, saving a tool-call turn per validation cycle.
What ships
- `boxel test` with no args: the CLI starts an in-process HTTP server that hosts both the test page (from `bundled-test-harness/`) and the workspace's realm modules. `.gts`/`.ts` files are transpiled on demand via `runtime-common`'s `transpileJS`. Workspace cards are read from cwd; base + skills realm sources are vendored under `bundled-realms/`.
- `pnpm build:test-harness` copies the host's dev-mode `dist/` into `bundled-test-harness/`, dropping only sourcemaps. ~60 MB unpacked / ~15 MB compressed. See "Why we ship the whole host dist" below for the rationale.
- `http.createServer` listening on a random localhost port. Same origin for everything chromium fetches; module mounts at `/workspace/`, `/base/`, `/skills/`, the host bundle assets at `/assets/...`, and the test-page HTML at `/`.
- `boxel test --realm <url>` opts back into the older remote-realm flow for testing cards already on a published realm.
Before vs after
- `packages/host/dist/` (monorepo layout required)
- `bundled-test-harness/` shipped in the npm tarball
- `--realm` flag on `boxel test`
- `boxel realm push` before each test iteration
Installation footprint
- `npm install -g @cardstack/boxel-cli`
- `boxel test` for the first time
- `npx playwright install chromium` step (the CLI's error message prompts for it; not auto-installed)
- `boxel test` user
Why we ship the whole host dist (and not a pruned slim subset)
Spent meaningful time on this during the branch and rejected three different slim approaches. Recording the reasoning so future-us doesn't repeat it:
Attempt 1: captured manifest of "what chromium actually fetches". Ran a card test with a network-request recorder, committed the resulting list of paths, and had `build-test-harness.ts` copy only those files. Got the harness from 60 MB → 27 MB. Rejected because host chunk filenames are content-hashed (e.g. `runtime-common-Czdr1nwr.js`): any host content change re-hashes filenames, the committed paths 404, and the slim bundle silently misses chunks.
Attempt 2: bisecting minimizer over the captured set. Wrote a script that removed each captured entry, re-ran the test, and kept the removal only if tests stayed green. ~232 entries. After 15 minutes of running and ~30 KEEP/REMOVED decisions, observed a ~7% success rate: almost everything tried as "obviously UI" (submode-switcher, modal-container, AI-assistant chunks, code-mode commands, etc.) was actually needed at runtime. Investigated why: it's Embroider's `@embroider/virtual/compat-modules` auto-glob. The host's `app/app.ts` does `Resolver.withModules(compatModules)`, which materializes every `.gts` under `app/services/`, `app/components/`, `app/commands/`, `app/routes/`. The Ember resolver eagerly imports them at app boot, not because the test code uses them, but because the resolver walks the keyspace and pre-touches each entry. Removing any auto-globbed chunk fails the boot chain even though the test never executes that code. Rejected because the only way past the floor is to refactor the host (make `monaco-service`, `ai-assistant-panel-service`, etc. truly lazy / dynamic-import): multi-day Embroider work that the ticket explicitly flagged as the bulk of the effort, and out of scope for what this PR is doing.
Attempt 3: runtime fetch from GitHub Release. Split the harness into its own `host-test-harness-v*` release, downloaded on first `boxel test` per CLI version, cached under `~/.cache/boxel-cli/`. Got the CLI tarball down to ~3 MB. Rejected because: […] `~/.cache/` anyway.
So we ship the full ~60 MB. The size context, honestly: a slim cut would save ~30 MB on disk for `boxel test` users. They're already paying 150 MB for chromium one-time, and the CLI install dominates only briefly until chromium downloads, after which it's ~25% of the total. Not nothing, but not worth the staleness/maintenance/infra cost the slim approaches require, and the only real way to a 5–10 MB harness is the host-side lazy-loading refactor anyway.
Files & flow
- `packages/boxel-cli/scripts/build-test-harness.ts`: copies `packages/host/dist/` minus sourcemaps into `bundled-test-harness/`. Refuses to proceed if `tests/index.html` is missing (catches accidental production-mode host builds).
- `packages/boxel-cli/scripts/build-realms.ts`: vendors `packages/base/` and `packages/skills-realm/contents/` into `bundled-realms/`. Skills realm is optional (its contents are gitignored; CI doesn't have it; runtime never reaches `/skills/` from a card test).
- `packages/boxel-cli/src/commands/test.ts`: adds `runTestsLocally`; `--realm` becomes opt-in; the test-page server gains realm-mount routing and on-demand transpile.
- `packages/boxel-cli/scripts/build.ts`: copies `content_tag_bg.wasm` next to `dist/index.js` post-build (esbuild inlines content-tag's wasm loader; the loader's `readFileSync` resolves against the CLI's dist dir).
- `.github/workflows/boxel-cli-publish.yml`: adds `pnpm --filter @cardstack/host build` before the CLI build (the bundled-test-harness step reads host's dev dist).
- `.github/workflows/ci.yaml`: `boxel-cli-build` job now depends on `test-web-assets` so the cached host dist is restored before the CLI build runs.
- `packages/boxel-cli/package.json`: `@playwright/test` moves to `dependencies` (was `devDependencies`) so the headless-browser driver ships with a published install. `bundled-test-harness/`, `bundled-realms/` added to `files`. Chromium itself is not auto-downloaded on install; users run `npx playwright install chromium` once.
- `software-factory-operations` skill: rewritten around the local-mode default.
Other architecture decisions
- `boxel realm pull`: the user syncs once from a realm-server (local, staging, or prod), then tests entirely locally. Validates the agent-token-efficiency goal: one command per validation cycle after the initial sync.
CS-11164 in Linear
Test plan
- `pnpm build` produces ~60 MB `bundled-test-harness/`, ~1.7 MB `bundled-realms/`, ~2.4 MB CLI bundle.
- `pnpm lint` (js + types) clean.
- `cd /tmp/cs-11164-check && boxel test` from a workspace pulled via `boxel realm pull` (no monorepo on PATH, no realm-server running, no Vite) → 11 passed, 0 failed (~3 s).
- `pnpm pack` + `npm install -g <tarball>`, `which boxel` resolves to the npm-installed binary, `boxel test` from the same workspace → same result. Tarball ~18 MB.
- `--debug` shows the merged-server origin handling correctly and only emits chromium console + page errors.
- `BOXEL_TEST_HARNESS_DIR=<path>` env override and `--host-dist-dir <path>` flag still resolve to the right dist.
🤖 Generated with Claude Code