e2e: abstract container behind backend interface (docker + hypeman) #273
rgarcia wants to merge 6 commits into
Introduce a Backend interface in server/e2e that captures the public surface
the ~24 e2e_*_test.go files consume via *TestContainer (Start/Stop, the
API/CDP/ChromeDriver endpoint accessors, API clients, Wait* helpers, Exec,
ExitCh, Container). TestContainer is now a thin facade that delegates to a
Backend selected at construction time.

Two backends are provided:

- dockerBackend: the historical testcontainers-go logic, moved verbatim behind
  the interface. Default, so existing CI is unchanged.
- hypemanBackend: starts the image as a remote VM on a running Hypeman dev
  server via the github.com/kernel/hypeman-go client. Endpoints target the
  instance's network IP on the fixed guest ports (10001/9222/9224); Exec runs
  against the instance API server's /process/exec endpoint to preserve the
  (exitCode, combinedOutput, error) contract.

Backend selection is via the KI_E2E_BACKEND env var (docker|hypeman, default
docker). Hypeman connection details are read from env only and never
hardcoded: KI_E2E_HYPEMAN_BASE_URL (or HYPEMAN_BASE_URL) and
HYPEMAN_AUTH_TOKEN (or the SDK-native HYPEMAN_API_KEY). Optional GPU
passthrough via KI_E2E_HYPEMAN_GPU_DEVICES and VM sizing via
KI_E2E_HYPEMAN_SIZE.

Test changes are minimal: six direct port-field accesses in two test files now
use backend-agnostic accessors (CDPAddr, ChromeDriverURL, plus new
ChromeDriverAddr/ChromeDriverWSURL helpers) instead of hardcoding
127.0.0.1:<port>, which only ever worked for the Docker backend.

Added infra-free unit tests for backend selection and hypeman config
validation.

This unblocks running the e2e suite against the GPU image
(chromium-headful-vgpu) from kernel-images-private via the hypeman backend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
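For readers skimming the commit, the facade-plus-selection shape described above could look roughly like the sketch below. It is not the code in this PR: the interface is trimmed to four methods, `stubBackend` stands in for both real backends, and `selectBackend`, `NewTestContainer`, and the `getenv` parameter are hypothetical names chosen so the example is testable without touching the process environment.

```go
package main

import (
	"context"
	"fmt"
)

// Backend is a trimmed-down, hypothetical version of the interface described
// in the commit message; the real one also exposes API clients, Wait* helpers
// and ExitCh.
type Backend interface {
	Start(ctx context.Context) error
	Stop(ctx context.Context) error
	CDPAddr() string
	// Exec keeps the (exitCode, combinedOutput, error) contract.
	Exec(ctx context.Context, cmd []string) (int, string, error)
}

// TestContainer is the facade the tests hold; every method only delegates.
type TestContainer struct{ backend Backend }

func (c *TestContainer) Start(ctx context.Context) error { return c.backend.Start(ctx) }
func (c *TestContainer) Stop(ctx context.Context) error  { return c.backend.Stop(ctx) }
func (c *TestContainer) CDPAddr() string                 { return c.backend.CDPAddr() }
func (c *TestContainer) Exec(ctx context.Context, cmd []string) (int, string, error) {
	return c.backend.Exec(ctx, cmd)
}

// stubBackend is a placeholder for dockerBackend / hypemanBackend.
type stubBackend struct{ name, image string }

func (s *stubBackend) Start(context.Context) error { return nil }
func (s *stubBackend) Stop(context.Context) error  { return nil }
func (s *stubBackend) CDPAddr() string             { return "127.0.0.1:9222" }
func (s *stubBackend) Exec(context.Context, []string) (int, string, error) {
	return 0, "", nil
}

// selectBackend maps a KI_E2E_BACKEND value to an implementation.
// An empty value falls back to docker, so existing CI is unchanged.
func selectBackend(kind, image string) (Backend, error) {
	switch kind {
	case "", "docker":
		return &stubBackend{name: "docker", image: image}, nil
	case "hypeman":
		return &stubBackend{name: "hypeman", image: image}, nil
	default:
		return nil, fmt.Errorf("unknown KI_E2E_BACKEND %q (want docker or hypeman)", kind)
	}
}

// NewTestContainer picks the backend once, at construction time.
// getenv is injected (pass os.Getenv in real use) to keep the sketch testable.
func NewTestContainer(image string, getenv func(string) string) (*TestContainer, error) {
	b, err := selectBackend(getenv("KI_E2E_BACKEND"), image)
	if err != nil {
		return nil, err
	}
	return &TestContainer{backend: b}, nil
}
```

Rejecting unknown values instead of silently falling back to docker is a choice made for this sketch; it keeps a typo in `KI_E2E_BACKEND` from quietly running the wrong suite.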
Addresses review feedback on the backend interface:

- Remove Container() testcontainers.Container from the Backend interface (and
  the TestContainer facade). It leaked Docker-specifics into the otherwise
  backend-agnostic surface and was dead: no e2e test consumed it. The Docker
  backend keeps its *testcontainers.Container internally for Start/Exec.
- Hypeman backend: reach instances via a single host-level wildcard ingress
  (find-or-create, keyed by tag managed-by=ki-e2e) instead of the instance's
  private network IP. Set KI_E2E_HYPEMAN_INGRESS_DOMAIN to route
  "<instance>-<role>.<domain>" through the host's reverse proxy to guest ports
  10001/9222/9224; ingress is created at most once per host and never per
  instance. Unset = previous raw-IP behavior (needs L3 reachability to the
  instance subnet). KI_E2E_HYPEMAN_INGRESS_TLS toggles https/wss on :443.

Verification: go build ./... and go vet ./e2e/ pass; new table tests cover
raw-IP, ingress, and TLS endpoint derivation plus the shared-ingress params.
Docker-backend e2e (TestDisplayResolutionChange + TestScreenshotHeadless)
passes against onkernel/chromium-headful-private + chromium-headless-private.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
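The raw-IP vs ingress vs TLS derivation in this commit can be illustrated with a small sketch. It follows the "<instance>-<role>.<domain>" shape and the guest ports quoted in the message; the `target` and `role` types, the function names, the role labels, and port 80 for the non-TLS ingress case are all assumptions made for the example, not taken from the PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// role pairs a hostname label with the fixed guest port it maps to.
// The labels are hypothetical; the ports come from the commit message.
type role struct {
	name      string
	guestPort int
}

var (
	roleAPI = role{"api", 10001}
	roleCDP = role{"cdp", 9222}
	roleCD  = role{"cd", 9224}
)

// target describes how to reach one instance.
type target struct {
	instance string // instance name, used in the ingress hostname
	ip       string // private network IP, used when domain is empty
	domain   string // ingress domain; empty means raw-IP mode
	tls      bool   // only meaningful when domain is set
}

// endpointHost returns host:port for a role.
func endpointHost(t target, r role) string {
	if t.domain == "" {
		// Raw-IP mode: talk to the guest port directly.
		return net.JoinHostPort(t.ip, strconv.Itoa(r.guestPort))
	}
	port := "80" // assumption: plain-HTTP ingress listens on 80
	if t.tls {
		port = "443"
	}
	return net.JoinHostPort(fmt.Sprintf("%s-%s.%s", t.instance, r.name, t.domain), port)
}

// endpointURL builds a URL for one role. plain/secure are the scheme pair,
// e.g. "http"/"https" for the API server or "ws"/"wss" for CDP.
func endpointURL(plain, secure string, t target, r role) string {
	scheme := plain
	if t.domain != "" && t.tls {
		scheme = secure
	}
	return scheme + "://" + endpointHost(t, r)
}
```

With `target{instance: "inst1", ip: "10.0.0.5"}` the API endpoint is the raw guest port; setting `domain` switches every role to a per-role hostname behind the shared ingress, and `tls` only changes the scheme and listen port.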
… Start)

Per review: Start() reading env vars is surprising and couples the backend to
the process environment. Introduce hypemanConfig holding every option
(BaseURL, Token, IngressDomain, IngressTLS, RawIP, Size, DiskIOBps,
GPUDevices, GPUProfile). newHypemanBackend(image, cfg) and Start now consume
only the struct; env parsing collapses to a single hypemanConfigFromEnv()
called by the e2e factory, so other callers can populate options explicitly
and never touch the environment.

Also defaults DiskIOBps to 62MB/s (KI_E2E_HYPEMAN_DISK_IO_BPS overrides):
ad-hoc hypeman instances otherwise get ~15MB/s, which starves the in-guest
playwright daemon's cold first-read (~43MB of node_modules) past its 5s start
budget. With 62MB/s the daemon starts in time. Validated: persist_login
TestCookiePersistence Headless now PASSES on hypeman (was failing on
"playwright daemon failed to start within 5s").

go build/vet/unit pass (incl. new TestHypemanConfigFromEnv); live hypeman
TestDisplayResolutionChange passes via the new construction path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
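A minimal sketch of the "env parsing in one place, struct everywhere else" split described above. Field and env-var names follow the commit messages, but the field types, the lookup order between the aliased variables, the string form of `DiskIOBps`, the `validate` method, and the injected `getenv` parameter (the PR's `hypemanConfigFromEnv()` takes no arguments) are assumptions for illustration.

```go
package main

import (
	"errors"
	"strconv"
)

// hypemanConfig holds the options the backend consumes; a subset of the
// fields listed in the commit message, with assumed types.
type hypemanConfig struct {
	BaseURL       string
	Token         string
	IngressDomain string
	IngressTLS    bool
	RawIP         bool
	Size          string
	DiskIOBps     string // assumed to be a rate string such as "62MB/s"
	GPUProfile    string
}

// firstNonEmpty returns the first non-empty value, or "".
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// boolOr parses raw as a bool and falls back to def when unset or malformed.
func boolOr(raw string, def bool) bool {
	if v, err := strconv.ParseBool(raw); err == nil {
		return v
	}
	return def
}

// hypemanConfigFrom is the only place that reads the environment.
// Pass os.Getenv in real use; tests can pass a map-backed lookup.
func hypemanConfigFrom(getenv func(string) string) hypemanConfig {
	return hypemanConfig{
		BaseURL:       firstNonEmpty(getenv("KI_E2E_HYPEMAN_BASE_URL"), getenv("HYPEMAN_BASE_URL")),
		Token:         firstNonEmpty(getenv("HYPEMAN_AUTH_TOKEN"), getenv("HYPEMAN_API_KEY")),
		IngressDomain: getenv("KI_E2E_HYPEMAN_INGRESS_DOMAIN"),
		IngressTLS:    boolOr(getenv("KI_E2E_HYPEMAN_INGRESS_TLS"), true),
		RawIP:         boolOr(getenv("KI_E2E_HYPEMAN_RAW_IP"), false),
		Size:          getenv("KI_E2E_HYPEMAN_SIZE"),
		DiskIOBps:     firstNonEmpty(getenv("KI_E2E_HYPEMAN_DISK_IO_BPS"), "62MB/s"),
		GPUProfile:    getenv("KI_E2E_HYPEMAN_GPU_PROFILE"),
	}
}

// validate rejects configs that cannot reach a Hypeman server.
func (c hypemanConfig) validate() error {
	if c.BaseURL == "" {
		return errors.New("hypeman: base URL is required")
	}
	if c.Token == "" {
		return errors.New("hypeman: auth token is required")
	}
	return nil
}
```

Callers that do not want to depend on the environment can build a `hypemanConfig` literal directly and skip `hypemanConfigFrom` entirely, which is the decoupling the review asked for.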
…ckend
Mirrors the `test` job but with KI_E2E_BACKEND=hypeman, pointing
E2E_CHROMIUM_*_IMAGE at the public onkernel/chromium-{headful,headless}:<sha>
tags that build-headful/build-headless just pushed. Hypeman pulls those images
itself on instance create, so the runner needs no docker login. Uses org
var/secret HYPEMAN_API_URL / HYPEMAN_API_KEY.
Note: we deliberately do NOT build the images inside Hypeman — its builder VM's
writable layer is RAM-backed and hard-capped at memory_mb=16384, which is too
small for the chromium image build (fails with "no space left on device"). The
registry-pull approach sidesteps that entirely. See PR description.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Abstracts the e2e browser instance behind a `Backend` interface in `server/e2e` with two interchangeable implementations selected by `KI_E2E_BACKEND` (default `docker`, so existing CI is unchanged):

- docker: the `testcontainers-go` logic, moved behind the interface.
- hypeman: runs the image as a remote VM via `github.com/kernel/hypeman-go`, reaching it through the host's wildcard ingress (hostname `{instance}.<domain>`, routed by listen port, TLS-terminated). `api` reuses the host's existing `444→10001` browser ingress; `cdp 9222` / `cd 9224` are find-or-created once per host (matched by rule shape across all ingresses, never per-instance). Domain is derived from the base URL (`KI_E2E_HYPEMAN_INGRESS_DOMAIN` overrides); `KI_E2E_HYPEMAN_RAW_IP=1` falls back to the instance's private IP.

The ~24 `e2e_*_test.go` files keep using `*TestContainer` unchanged; it's now a thin facade over the selected `Backend`.

Config / env
- `KI_E2E_BACKEND=docker|hypeman`
- `HYPEMAN_BASE_URL` + `HYPEMAN_API_KEY` (or `KI_E2E_HYPEMAN_BASE_URL` / `HYPEMAN_AUTH_TOKEN`); optional `KI_E2E_HYPEMAN_INGRESS_DOMAIN`, `KI_E2E_HYPEMAN_INGRESS_TLS` (default on), `KI_E2E_HYPEMAN_RAW_IP`, `KI_E2E_HYPEMAN_GPU_PROFILE` (vGPU images), `KI_E2E_HYPEMAN_SIZE`.

Secrets are read from env only, never hardcoded.

Review feedback addressed
- Removed `Container() testcontainers.Container` from the interface + facade: it leaked Docker specifics and was dead (no test used it). The Docker backend keeps `*testcontainers.Container` internally.
- `HostAccess`: reframed from "Docker host.docker.internal (Docker backend only)" to a backend-agnostic capability ("reach a service on the test host"); the Docker backend maps `host.docker.internal`, the hypeman backend rejects it explicitly (no silent no-op) since a remote VM has no host-loopback bridge. Used by the private capmonster / persisted-login tests, which therefore stay on the Docker backend.

Verification (both backends exercised end-to-end)
Build/vet/unit: `go build ./...`, `go vet ./e2e/` clean; table tests cover backend selection, raw-IP vs ingress vs TLS endpoint derivation, the per-role ingress params, domain derivation, and the HostAccess rejection.

Docker backend: PASS (`onkernel/chromium-headful-private` + `chromium-headless-private`):

Hypeman backend: PASS against the live staging dev server (`https://hypeman.dev-yul-hypeman-1.kernel.sh`):

This created a real instance, reused the `:444→10001` ingress + created `ki-e2e-cdp` / `ki-e2e-cd` once, then drove `PATCH /display` (1024→1920×1080→1280×720) and verified Xvfb resolution via the API server + `Exec`, all over the TLS ingress. Instance + behavior confirmed; created ingresses persist for reuse, instances are cleaned up on `Stop`.

GPU (vGPU image): `KI_E2E_HYPEMAN_GPU_PROFILE` lets the backend boot `chromium-headful-vgpu`; the GPU-specific tests live in the private fork. They currently boot the vGPU instance to Running, but its in-guest API needs the production GPU/Neko/NVIDIA-licensing env to become ready (tracked there).

Unblocks running the public e2e suite against the GPU image from kernel-images-private via the hypeman backend.
CI: running e2e against the Hypeman backend
Added a `test-hypeman` job to `server-test.yaml` that runs the same suite with `KI_E2E_BACKEND=hypeman`. It reuses the public `onkernel/chromium-{headful,headless}:<sha>` images that `build-headful` / `build-headless` push to Docker Hub; Hypeman pulls them itself on instance-create (any registry works via the host's docker creds; validated: the e2e suite already runs on Hypeman against a private `onkernel/chromium-headless-private` tag). Uses org var/secret `HYPEMAN_API_URL` / `HYPEMAN_API_KEY`. The runner needs no docker login.

This is the first full-suite run on the Hypeman backend; individual tests may still need backend-specific fixes, so it's reasonable to keep this check non-required in branch protection until it's consistently green.
Conceded: building images inside Hypeman (local-dev iteration) is blocked
The original goal also included a "build a local Dockerfile in Hypeman" path for local dev (edit Dockerfile → build in Hypeman → run e2e against it, without pushing to a registry). This is currently blocked and intentionally not implemented, because:
`POST /builds`, async, usable as `InstanceNewParams.Image`, verified end-to-end with a trivial image), but `memory_mb=16384` (the API rejects more with `memory_mb exceeds maximum of 16384 MB`), and 16 GB is not enough for the chromium image build. Measured scaling on the headless Dockerfile: 2 GB → apt fails at ~18 s, 8 GB → ~28 s, 16 GB → apt passes but the concurrent Go-module/node stages then fail with `no space left on device`.

So building chromium in Hypeman needs a server-side change (a builder disk-size param decoupled from memory, a higher cap, or a pre-populated `global_cache_key` for the heavy base layers; the param exists with `"ubuntu"` / `"browser"` as documented example keys, which looks like the intended scaling path). Until then: