e2e: abstract container behind backend interface (docker + hypeman) #273

Draft
rgarcia wants to merge 6 commits into main from e2e-backend-interface

@rgarcia rgarcia commented Jun 4, 2026

Summary

Abstracts the e2e browser instance behind a Backend interface in server/e2e with two interchangeable implementations selected by KI_E2E_BACKEND (default docker, so existing CI is unchanged):

  • Docker backend — the original testcontainers-go logic, moved behind the interface.
  • Hypeman backend — starts the image as a remote VM via github.com/kernel/hypeman-go, reaching it through the host's wildcard ingress (hostname {instance}.<domain>, routed by listen port, TLS-terminated). api reuses the host's existing 444→10001 browser ingress; cdp 9222 / cd 9224 are find-or-created once per host (matched by rule shape across all ingresses, never per-instance). Domain is derived from the base URL (KI_E2E_HYPEMAN_INGRESS_DOMAIN overrides); KI_E2E_HYPEMAN_RAW_IP=1 falls back to the instance's private IP.

The ~24 e2e_*_test.go files keep using *TestContainer unchanged; it's now a thin facade over the selected Backend.
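The facade arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the interface is trimmed to three methods, `newBackend` and `APIBaseURL` are hypothetical names, and the URLs and image name are placeholders.

```go
package main

import (
	"fmt"
	"os"
)

// Backend is a hypothetical, trimmed-down stand-in for the interface in the
// PR; the real one also covers API clients, Wait* helpers, Exec, and more.
type Backend interface {
	Start() error
	Stop() error
	APIBaseURL() string
}

type dockerBackend struct{ image string }

func (b *dockerBackend) Start() error       { return nil } // would start a local container
func (b *dockerBackend) Stop() error        { return nil }
func (b *dockerBackend) APIBaseURL() string { return "http://127.0.0.1:10001" } // placeholder

type hypemanBackend struct{ image string }

func (b *hypemanBackend) Start() error       { return nil } // would create a remote VM
func (b *hypemanBackend) Stop() error        { return nil }
func (b *hypemanBackend) APIBaseURL() string { return "https://instance.example.test:444" } // placeholder

// newBackend picks an implementation by name; an empty name means docker,
// which keeps existing callers unchanged.
func newBackend(name, image string) (Backend, error) {
	switch name {
	case "", "docker":
		return &dockerBackend{image: image}, nil
	case "hypeman":
		return &hypemanBackend{image: image}, nil
	default:
		return nil, fmt.Errorf("unknown KI_E2E_BACKEND %q (want docker or hypeman)", name)
	}
}

// TestContainer is the facade the tests keep using; it only delegates.
type TestContainer struct{ backend Backend }

func (c *TestContainer) Start() error       { return c.backend.Start() }
func (c *TestContainer) Stop() error        { return c.backend.Stop() }
func (c *TestContainer) APIBaseURL() string { return c.backend.APIBaseURL() }

func main() {
	b, err := newBackend(os.Getenv("KI_E2E_BACKEND"), "example/image:tag")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	tc := &TestContainer{backend: b}
	if err := tc.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	defer tc.Stop()
	fmt.Println("API at", tc.APIBaseURL())
}
```

Because the tests only ever hold a `*TestContainer`, swapping the backend needs no test changes beyond removing any Docker-specific assumptions such as hardcoded loopback addresses.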

Config / env

  • KI_E2E_BACKEND=docker|hypeman
  • Hypeman: HYPEMAN_BASE_URL + HYPEMAN_API_KEY (or KI_E2E_HYPEMAN_BASE_URL / HYPEMAN_AUTH_TOKEN); optional KI_E2E_HYPEMAN_INGRESS_DOMAIN, KI_E2E_HYPEMAN_INGRESS_TLS (default on), KI_E2E_HYPEMAN_RAW_IP, KI_E2E_HYPEMAN_GPU_PROFILE (vGPU images), KI_E2E_HYPEMAN_SIZE. Secrets are read from env only, never hardcoded.

Review feedback addressed

  • Dropped Container() testcontainers.Container from the interface + facade — it leaked Docker specifics and was dead (no test used it). The Docker backend keeps *testcontainers.Container internally.
  • De-leaked HostAccess — reframed from "Docker host.docker.internal (Docker backend only)" to a backend-agnostic capability ("reach a service on the test host"); the Docker backend maps host.docker.internal, the hypeman backend rejects it explicitly (no silent no-op) since a remote VM has no host-loopback bridge. Used by the private capmonster / persisted-login tests, which therefore stay on the Docker backend.

Verification (both backends exercised end-to-end)

Build/vet/unit: go build ./..., go vet ./e2e/ clean; table tests cover backend selection, raw-IP vs ingress vs TLS endpoint derivation, the per-role ingress params, domain derivation, and the HostAccess rejection.

Docker backend — PASS (onkernel/chromium-headful-private + chromium-headless-private):

--- PASS: TestScreenshotHeadless (41.78s)
--- PASS: TestDisplayResolutionChange (46.45s)

Hypeman backend — PASS against the live staging dev server (https://hypeman.dev-yul-hypeman-1.kernel.sh):

KI_E2E_BACKEND=hypeman \
HYPEMAN_BASE_URL=… HYPEMAN_API_KEY=… \
E2E_CHROMIUM_HEADLESS_IMAGE=onkernel/chromium-headless-private:be2ae22 \
go test -run TestDisplayResolutionChange ./e2e/
--- PASS: TestDisplayResolutionChange (30.13s)

This created a real instance, reused the :444→10001 ingress + created ki-e2e-cdp/ki-e2e-cd once, then drove PATCH /display (1024→1920×1080→1280×720) and verified Xvfb resolution via the API server + Exec, all over the TLS ingress. Instance + behavior confirmed; created ingresses persist for reuse, instances are cleaned up on Stop.
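The find-or-create behavior for shared ingresses can be illustrated with an in-memory sketch. The `ingress` and `ingressStore` types are hypothetical, the existing ingress name `browser` is invented, and the listen ports for cdp and cd are assumed to equal the guest ports; the real backend talks to the Hypeman API instead of a slice.

```go
package main

import "fmt"

// ingress is a hypothetical, simplified rule: traffic arriving on ListenPort
// is forwarded to TargetPort inside the instance.
type ingress struct {
	Name       string
	ListenPort int
	TargetPort int
}

// ingressStore stands in for the host's ingress list.
type ingressStore struct {
	items   []ingress
	created int // how many ingresses this process had to create
}

// findOrCreate matches by rule shape (ports), not by name, so an ingress
// created by someone else with the same shape is reused.
func (s *ingressStore) findOrCreate(name string, listen, target int) ingress {
	for _, in := range s.items {
		if in.ListenPort == listen && in.TargetPort == target {
			return in
		}
	}
	in := ingress{Name: name, ListenPort: listen, TargetPort: target}
	s.items = append(s.items, in)
	s.created++
	return in
}

func main() {
	// The host already has the 444 -> 10001 browser ingress.
	store := &ingressStore{items: []ingress{{Name: "browser", ListenPort: 444, TargetPort: 10001}}}

	// Simulate three test runs; each asks for the same three rules.
	for run := 0; run < 3; run++ {
		store.findOrCreate("ki-e2e-api", 444, 10001)
		store.findOrCreate("ki-e2e-cdp", 9222, 9222)
		store.findOrCreate("ki-e2e-cd", 9224, 9224)
	}
	fmt.Println("created:", store.created, "total:", len(store.items))
}
```

Matching on shape rather than name is what lets the api role reuse the pre-existing ingress while cdp and cd are created only on the first run.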

GPU (vGPU image): KI_E2E_HYPEMAN_GPU_PROFILE lets the backend boot chromium-headful-vgpu; the GPU-specific tests live in the private fork. They currently boot the vGPU instance to Running but its in-guest API needs the production GPU/Neko/NVIDIA-licensing env to become ready — tracked there.

Unblocks running the public e2e suite against the GPU image from kernel-images-private via the hypeman backend.


CI: running e2e against the Hypeman backend

Added a test-hypeman job to server-test.yaml that runs the same suite with KI_E2E_BACKEND=hypeman. It reuses the public onkernel/chromium-{headful,headless}:<sha> images that build-headful/build-headless push to Docker Hub — Hypeman pulls them itself on instance-create (any registry works via the host's docker creds; validated: the e2e suite already runs on Hypeman against a private onkernel/chromium-headless-private tag). Uses org var/secret HYPEMAN_API_URL / HYPEMAN_API_KEY. The runner needs no docker login.

This is the first full-suite run on the Hypeman backend; individual tests may still need backend-specific fixes, so it's reasonable to keep this check non-required in branch protection until it's consistently green.

Conceded: building images inside Hypeman (local-dev iteration) is blocked

The original goal also included a "build a local Dockerfile in Hypeman" path for local dev (edit Dockerfile → build in Hypeman → run e2e against it, without pushing to a registry). This is currently blocked and intentionally not implemented, because:

  • Hypeman can build from a local Dockerfile (POST /builds, async, usable as InstanceNewParams.Image — verified end-to-end with a trivial image), but
  • the builder VM's writable layer is RAM-backed and hard-capped at memory_mb=16384 (the API rejects larger values with "memory_mb exceeds maximum of 16384 MB"), and 16 GB is not enough for the chromium image build. Measured scaling on the headless Dockerfile: at 2 GB apt fails at ~18 s; at 8 GB apt fails at ~28 s; at 16 GB apt passes, but the concurrent Go-module/node stages then fail with "no space left on device".

So building chromium in Hypeman needs a server-side change (a builder disk-size param decoupled from memory, a higher cap, or a pre-populated global_cache_key for the heavy base layers — the param exists with "ubuntu"/"browser" as documented example keys, which looks like the intended scaling path). Until then:

  • CI uses the registry-pull approach above (unblocked — CI already builds + pushes the images).
  • Local dev for the Hypeman backend should either use the Docker backend (devs already have Docker) or push their build to a registry Hypeman can pull. The local Dockerfile → Hypeman build path is deferred to a follow-up pending the builder-capacity fix.

Introduce a Backend interface in server/e2e that captures the public surface
the ~24 e2e_*_test.go files consume via *TestContainer (Start/Stop, the
API/CDP/ChromeDriver endpoint accessors, API clients, Wait* helpers, Exec,
ExitCh, Container). TestContainer is now a thin facade that delegates to a
Backend selected at construction time.

Two backends are provided:

- dockerBackend: the historical testcontainers-go logic, moved verbatim behind
  the interface. Default, so existing CI is unchanged.
- hypemanBackend: starts the image as a remote VM on a running Hypeman dev
  server via the github.com/kernel/hypeman-go client. Endpoints target the
  instance's network IP on the fixed guest ports (10001/9222/9224); Exec runs
  against the instance API server's /process/exec endpoint to preserve the
  (exitCode, combinedOutput, error) contract.

Backend selection is via the KI_E2E_BACKEND env var (docker|hypeman, default
docker). Hypeman connection details are read from env only and never hardcoded:
KI_E2E_HYPEMAN_BASE_URL (or HYPEMAN_BASE_URL) and HYPEMAN_AUTH_TOKEN (or the
SDK-native HYPEMAN_API_KEY). Optional GPU passthrough via
KI_E2E_HYPEMAN_GPU_DEVICES and VM sizing via KI_E2E_HYPEMAN_SIZE.

Test changes are minimal: six direct port-field accesses in two test files now
use backend-agnostic accessors (CDPAddr, ChromeDriverURL, plus new
ChromeDriverAddr/ChromeDriverWSURL helpers) instead of hardcoding
127.0.0.1:<port>, which only ever worked for the Docker backend.

Added infra-free unit tests for backend selection and hypeman config
validation. This unblocks running the e2e suite against the GPU image
(chromium-headful-vgpu) from kernel-images-private via the hypeman backend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

socket-security Bot commented Jun 4, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Updated | golang/github.com/docker/docker@v28.5.1+incompatible ⏵ v28.5.2+incompatible | 72 | 70 | 100 | 100 | 100 |
| Added | golang/github.com/kernel/hypeman-go@v0.20.0 | 71 | 100 | 100 | 100 | 100 |
| Updated | golang/golang.org/x/sync@v0.17.0 ⏵ v0.18.0 | 99 | 100 | 100 | 100 | 100 |

socket-security Bot commented Jun 4, 2026

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

| Action | Severity | Alert |
| --- | --- | --- |
| Warn | Critical | Critical CVE: gRPC-Go has an authorization bypass via missing leading slash in :path in golang google.golang.org/grpc |

CVE: GHSA-p77j-4mvh-x3m3 gRPC-Go has an authorization bypass via missing leading slash in :path (CRITICAL)

Affected versions: < 1.79.3

Patched version: 1.79.3

From: ?golang/google.golang.org/grpc@v1.75.1

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Remove or replace dependencies that include known critical CVEs. Consumers can use dependency overrides or npm audit fix --force to remove vulnerable dependencies.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore golang/google.golang.org/grpc@v1.75.1. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

rgarcia and others added 5 commits June 4, 2026 11:07
Addresses review feedback on the backend interface:

- Remove Container() testcontainers.Container from the Backend interface (and
  the TestContainer facade). It leaked Docker-specifics into the otherwise
  backend-agnostic surface and was dead: no e2e test consumed it. The Docker
  backend keeps its *testcontainers.Container internally for Start/Exec.

- Hypeman backend: reach instances via a single host-level wildcard ingress
  (find-or-create, keyed by tag managed-by=ki-e2e) instead of the instance's
  private network IP. Set KI_E2E_HYPEMAN_INGRESS_DOMAIN to route
  "<instance>-<role>.<domain>" through the host's reverse proxy to guest ports
  10001/9222/9224; ingress is created at most once per host and never per
  instance. Unset = previous raw-IP behavior (needs L3 reachability to the
  instance subnet). KI_E2E_HYPEMAN_INGRESS_TLS toggles https/wss on :443.
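The three endpoint modes described in this commit (raw IP, ingress, ingress with TLS) could be derived along the following lines. This is a sketch under assumptions: the role names `api`, `cdp`, and `cd`, the `endpointConfig` type, and the `endpoint` function are hypothetical, and the non-TLS ingress case simply omits the port because the commit message does not state it.

```go
package main

import "fmt"

// endpointConfig is a hypothetical subset of the backend options.
type endpointConfig struct {
	IngressDomain string // empty means raw-IP mode
	IngressTLS    bool   // https/wss on :443 when true
}

// guestPorts are the fixed in-guest ports named in the commit message; the
// role keys are an assumption made for this sketch.
var guestPorts = map[string]int{"api": 10001, "cdp": 9222, "cd": 9224}

// endpoint builds the URL a test would dial for one role of one instance.
func endpoint(cfg endpointConfig, instance, ip, role string, websocket bool) (string, error) {
	port, ok := guestPorts[role]
	if !ok {
		return "", fmt.Errorf("unknown role %q", role)
	}
	scheme := "http"
	if websocket {
		scheme = "ws"
	}
	if cfg.IngressDomain == "" {
		// Raw-IP mode: needs L3 reachability to the instance subnet.
		return fmt.Sprintf("%s://%s:%d", scheme, ip, port), nil
	}
	host := fmt.Sprintf("%s-%s.%s", instance, role, cfg.IngressDomain)
	if cfg.IngressTLS {
		// "https" or "wss", terminated by the host's reverse proxy.
		return fmt.Sprintf("%ss://%s:443", scheme, host), nil
	}
	// Assumption: plain HTTP on the proxy's default port.
	return fmt.Sprintf("%s://%s", scheme, host), nil
}

func main() {
	raw, _ := endpoint(endpointConfig{}, "inst1", "10.0.0.5", "cdp", true)
	tls, _ := endpoint(endpointConfig{IngressDomain: "e2e.example.test", IngressTLS: true}, "inst1", "", "api", false)
	fmt.Println(raw)
	fmt.Println(tls)
}
```

Keeping the derivation in one pure function is also what makes it easy to cover with table tests, as the verification note below mentions.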

Verification: go build ./... and go vet ./e2e/ pass; new table tests cover
raw-IP, ingress, and TLS endpoint derivation plus the shared-ingress params.
Docker-backend e2e (TestDisplayResolutionChange + TestScreenshotHeadless)
passes against onkernel/chromium-headful-private + chromium-headless-private.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Start)

Per review: Start() reading env vars is surprising and couples the backend to the
process environment. Introduce hypemanConfig holding every option (BaseURL, Token,
IngressDomain, IngressTLS, RawIP, Size, DiskIOBps, GPUDevices, GPUProfile).
newHypemanBackend(image, cfg) and Start now consume only the struct — env parsing
collapses to a single hypemanConfigFromEnv() called by the e2e factory, so other
callers can populate options explicitly and never touch the environment.

Also defaults DiskIOBps to 62MB/s (KI_E2E_HYPEMAN_DISK_IO_BPS overrides): ad-hoc
hypeman instances otherwise get ~15MB/s, which starves the in-guest playwright
daemon's cold first-read (~43MB of node_modules) past its 5s start budget. With
62MB/s the daemon starts in time — validated: persist_login TestCookiePersistence
Headless now PASSES on hypeman (was failing on "playwright daemon failed to start
within 5s").
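A rough sketch of the single env-parsing entry point might look like this. Several details are assumptions rather than the PR's actual code: the field types, the precedence between alternative variable names, treating "0" as the off value for TLS, reading 62MB/s as decimal megabytes, and the injected `getenv` parameter (the commit describes a zero-argument `hypemanConfigFromEnv()`; the parameter is added here only to keep the example testable).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// hypemanConfig mirrors the option names listed in the commit message; the
// field types are guesses.
type hypemanConfig struct {
	BaseURL, Token        string
	IngressDomain         string
	IngressTLS, RawIP     bool
	Size                  string
	DiskIOBps             int64
	GPUDevices, GPUProfile string
}

// Assumption: "62MB/s" means 62 decimal megabytes per second.
const defaultDiskIOBps int64 = 62 * 1000 * 1000

// firstNonEmpty returns the value of the first key that is set.
func firstNonEmpty(getenv func(string) string, keys ...string) string {
	for _, k := range keys {
		if v := getenv(k); v != "" {
			return v
		}
	}
	return ""
}

func hypemanConfigFromEnv(getenv func(string) string) (hypemanConfig, error) {
	cfg := hypemanConfig{
		BaseURL:       firstNonEmpty(getenv, "KI_E2E_HYPEMAN_BASE_URL", "HYPEMAN_BASE_URL"),
		Token:         firstNonEmpty(getenv, "HYPEMAN_AUTH_TOKEN", "HYPEMAN_API_KEY"),
		IngressDomain: getenv("KI_E2E_HYPEMAN_INGRESS_DOMAIN"),
		IngressTLS:    getenv("KI_E2E_HYPEMAN_INGRESS_TLS") != "0", // default on
		RawIP:         getenv("KI_E2E_HYPEMAN_RAW_IP") == "1",
		Size:          getenv("KI_E2E_HYPEMAN_SIZE"),
		DiskIOBps:     defaultDiskIOBps,
		GPUDevices:    getenv("KI_E2E_HYPEMAN_GPU_DEVICES"),
		GPUProfile:    getenv("KI_E2E_HYPEMAN_GPU_PROFILE"),
	}
	if v := getenv("KI_E2E_HYPEMAN_DISK_IO_BPS"); v != "" {
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil || n <= 0 {
			return cfg, fmt.Errorf("invalid KI_E2E_HYPEMAN_DISK_IO_BPS %q", v)
		}
		cfg.DiskIOBps = n
	}
	if cfg.BaseURL == "" || cfg.Token == "" {
		return cfg, errors.New("hypeman backend needs a base URL and a token")
	}
	return cfg, nil
}

func main() {
	cfg, err := hypemanConfigFromEnv(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// The token is deliberately not printed.
	fmt.Println("base URL:", cfg.BaseURL, "disk IO B/s:", cfg.DiskIOBps)
}
```

With the environment read in exactly one place, a caller that wants different options can build a `hypemanConfig` literal directly and never touch process state.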

go build/vet/unit pass (incl. new TestHypemanConfigFromEnv); live hypeman
TestDisplayResolutionChange passes via the new construction path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ckend

Mirrors the `test` job but with KI_E2E_BACKEND=hypeman, pointing
E2E_CHROMIUM_*_IMAGE at the public onkernel/chromium-{headful,headless}:<sha>
tags that build-headful/build-headless just pushed. Hypeman pulls those images
itself on instance create, so the runner needs no docker login. Uses org
var/secret HYPEMAN_API_URL / HYPEMAN_API_KEY.

Note: we deliberately do NOT build the images inside Hypeman — its builder VM's
writable layer is RAM-backed and hard-capped at memory_mb=16384, which is too
small for the chromium image build (fails with "no space left on device"). The
registry-pull approach sidesteps that entirely. See PR description.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>