This document provides a comprehensive analysis of the test fixtures, runners, matchers, and infrastructure used by the gh-aw-firewall integration test suite.
- Overview
- Test Runner Architecture
- Abstractions Provided
- Batch Runner Pattern
- Cleanup Strategy
- CI Workflow Post-Processing
- Limitations
- Improvement Opportunities
The test infrastructure lives in two primary locations:
| Path | Purpose |
|---|---|
| `tests/fixtures/` | Reusable test helpers: runners, matchers, cleanup, log parsing |
| `tests/setup/` | Jest configuration and setup files |
| `scripts/ci/` | CI-specific cleanup and workflow post-processing scripts |
The suite contains 26 integration test files across tests/integration/, all executed serially via Jest with a 120-second per-test timeout.
| File | Lines | Role |
|---|---|---|
| `tests/fixtures/awf-runner.ts` | 331 | Core test runner; wraps AWF CLI invocations |
| `tests/fixtures/batch-runner.ts` | 118 | Batches multiple commands into a single container |
| `tests/fixtures/assertions.ts` | 179 | Custom Jest matchers (`toSucceed`, `toFail`, etc.) |
| `tests/fixtures/docker-helper.ts` | 297 | Low-level Docker operations helper |
| `tests/fixtures/cleanup.ts` | 209 | TypeScript port of `cleanup.sh` |
| `tests/fixtures/log-parser.ts` | 224 | Squid and iptables log parsing |
| `tests/setup/jest.integration.config.js` | 24 | Jest config for integration tests |
| `tests/setup/jest.setup.ts` | 9 | Registers custom matchers globally |
| `scripts/ci/cleanup.sh` | 56 | Bash cleanup script for CI |
| `scripts/ci/postprocess-smoke-workflows.ts` | 150 | Post-processes compiled workflow YAML for CI |
The AwfRunner class is the central abstraction for integration tests. It wraps the AWF CLI binary and provides two execution modes:
- Direct mode (no sudo)
  - Runs `node dist/cli.js <args> -- <command>` directly
  - Suitable for tests that don't require iptables
  - Rarely used in practice (most tests need sudo for iptables)
- Sudo mode (`runWithSudo()`)
  - Runs `sudo -E --preserve-env=PATH,HOME,... node dist/cli.js <args> -- <command>`
  - Preserves critical environment variables (`PATH`, `HOME`, `GOROOT`, `CARGO_HOME`, `JAVA_HOME`, `DOTNET_ROOT`)
  - Required for real firewall operation (iptables NAT rules)
  - Used by ~95% of integration tests
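The difference between the two modes comes down to how the command line is assembled. The following is a minimal sketch of that assembly, not the runner's actual code: `buildAwfArgv` is a hypothetical helper, and the preserved-variable list contains only the six names mentioned above (the real list may be longer).

```typescript
// Environment variables named in this document as preserved under sudo.
// Assumption: the real runner may preserve more than these six.
const PRESERVED_ENV = ["PATH", "HOME", "GOROOT", "CARGO_HOME", "JAVA_HOME", "DOTNET_ROOT"];

// Hypothetical helper: builds the argv for either execution mode.
function buildAwfArgv(cliArgs: string[], command: string, useSudo: boolean): string[] {
  // Direct mode: node dist/cli.js <args> -- <command>
  const base = ["node", "dist/cli.js", ...cliArgs, "--", command];
  if (!useSudo) {
    return base;
  }
  // Sudo mode: prefix with sudo and the environment-preservation flags.
  return ["sudo", "-E", `--preserve-env=${PRESERVED_ENV.join(",")}`, ...base];
}

const argv = buildAwfArgv(["--build-local"], "curl -f https://api.github.com/zen", true);
console.log(argv.join(" "));
```

Keeping the argv as an array until the last moment avoids shell-quoting problems when the user command itself contains spaces or quotes.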
All CLI flags are exposed as typed options:
```typescript
interface AwfOptions {
  allowDomains?: string[];
  keepContainers?: boolean;
  logLevel?: 'debug' | 'info' | 'warn' | 'error';
  buildLocal?: boolean;
  imageRegistry?: string;
  imageTag?: string;
  timeout?: number; // Default: 120000ms
  env?: Record<string, string>;
  volumeMounts?: string[];
  containerWorkDir?: string;
  tty?: boolean;
  dnsServers?: string[];
  allowHostPorts?: string;
  enableApiProxy?: boolean;
}
```

Every invocation returns a structured result:
```typescript
interface AwfResult {
  exitCode: number;
  stdout: string;
  stderr: string;
  success: boolean; // exitCode === 0
  timedOut: boolean;
  workDir?: string; // Extracted from stderr: /tmp/awf-<timestamp>
}
```

The `workDir` is particularly important: it is extracted from AWF's stderr logs and used by the log-based assertions (`toAllowDomain`, `toBlockDomain`) to locate Squid access logs.
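As a rough illustration of the extraction step, the sketch below pulls a `/tmp/awf-<timestamp>` path out of stderr with a regular expression. The function name `extractWorkDir`, the assumption that the timestamp is purely numeric, and the sample log line are all illustrative; the real runner may match a more specific log message.

```typescript
// Hypothetical helper: find the first /tmp/awf-<digits> path in AWF's stderr.
// Assumption: the timestamp suffix consists only of digits.
function extractWorkDir(stderr: string): string | undefined {
  const match = stderr.match(/\/tmp\/awf-\d+/);
  return match ? match[0] : undefined;
}

// Made-up stderr content, for illustration only.
const sampleStderr = "[debug] Using work directory /tmp/awf-1700000000000\n[info] done";
console.log(extractWorkDir(sampleStderr));
```

Returning `undefined` rather than throwing matches the optional `workDir?` field: a run that fails before the work directory is logged still yields a usable result.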
A typical test file follows this pattern:

```typescript
describe('Feature X', () => {
  let runner: AwfRunner;

  beforeAll(async () => {
    await cleanup(false); // Pre-test cleanup
    runner = createRunner();
  });

  afterAll(async () => {
    await cleanup(false); // Post-test cleanup
  });

  test('should do something', async () => {
    const result = await runner.runWithSudo(
      'curl -f https://api.github.com/zen',
      {
        allowDomains: ['github.com'],
        logLevel: 'debug',
        timeout: 60000,
      }
    );
    expect(result).toSucceed();
  }, 120000); // Jest timeout (must be >= AWF timeout)
});
```

Key Jest settings (`tests/setup/jest.integration.config.js`):
| Setting | Value | Rationale |
|---|---|---|
| `testTimeout` | 120000 (2 min) | Firewall tests involve Docker container lifecycle |
| `maxWorkers` | 1 | Serial execution; avoids Docker network/container conflicts |
| `verbose` | true | Full test output for CI debugging |
| `preset` | `ts-jest` | TypeScript compilation |
| `setupFilesAfterEnv` | `jest.setup.ts` | Registers custom matchers before tests run |
Tests are discovered from tests/integration/**/*.test.ts.
Six custom matchers extend Jest's expect(), registered globally via jest.setup.ts:
| Matcher | Asserts | Input |
|---|---|---|
| `toSucceed()` | `result.success === true` (exit code 0) | `AwfResult` |
| `toFail()` | `result.success === false` (non-zero exit) | `AwfResult` |
| `toExitWithCode(code)` | `result.exitCode === code` | `AwfResult` |
| `toTimeout()` | `result.timedOut === true` | `AwfResult` |
| `toAllowDomain(domain)` | Domain appears as `TCP_TUNNEL` in Squid logs | `AwfResult` |
| `toBlockDomain(domain)` | Domain appears as `TCP_DENIED` in Squid logs | `AwfResult` |
Type declarations are in `tests/jest-custom-matchers.d.ts`, included in test files via:

```typescript
/// <reference path="../jest-custom-matchers.d.ts" />
```

`toAllowDomain` and `toBlockDomain` are more sophisticated. They:
- Extract `workDir` from the `AwfResult`
- Read `${workDir}/squid-logs/access.log` synchronously, so the matchers can be used without `await`
- Parse the Squid log using `LogParser`
- Check for `TCP_TUNNEL` (allowed) or `TCP_DENIED` (blocked) entries for the domain

These require `keepContainers: true` to preserve the work directory.
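The decision logic behind the two log-based matchers can be sketched as a pure function that returns the `{ pass, message }` shape Jest expects from a custom matcher. This is an illustration under simplifying assumptions: `checkDomainDecision` is a hypothetical name, it takes the log text directly instead of reading it from `workDir`, it uses plain substring matching rather than `LogParser`, and the sample log lines are made up.

```typescript
// Shape that Jest custom matchers return.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Hypothetical core of a toAllowDomain / toBlockDomain style check.
// Simplification: substring matching, so "github.com" also matches "notgithub.com".
function checkDomainDecision(
  accessLog: string,
  domain: string,
  decision: "TCP_TUNNEL" | "TCP_DENIED"
): MatcherResult {
  const hit = accessLog
    .split("\n")
    .some((line) => line.includes(decision) && line.includes(domain));
  return {
    pass: hit,
    message: () =>
      hit
        ? `expected no ${decision} entry for ${domain}, but found one`
        : `expected a ${decision} entry for ${domain}, but found none`,
  };
}

// Made-up log content, for illustration only.
const sampleLog = [
  '1700000000.123 172.30.0.20:45678 api.github.com:443 140.82.112.5:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "curl/8.5.0"',
  '1700000000.456 172.30.0.20:45680 example.com:443 -:- 1.1 CONNECT 403 TCP_DENIED:HIER_NONE example.com:443 "curl/8.5.0"',
].join("\n");

console.log(checkDomainDecision(sampleLog, "github.com", "TCP_TUNNEL").pass);
```

A real matcher would wrap this in `expect.extend()` and load the log file first; separating the check from the file access keeps the logic testable without Docker.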
`tests/fixtures/docker-helper.ts` provides a general-purpose Docker operations wrapper:
| Method | Purpose |
|---|---|
| `pullImage(image)` | Pull a Docker image |
| `run(options)` | Run a container with full option support |
| `stop(name)` | Stop a container |
| `rm(name, force?)` | Remove a container |
| `inspect(name)` | Get container state and network info |
| `logs(name, options?)` | Retrieve container logs |
| `exec(name, command)` | Execute command in running container |
| `networkExists(name)` | Check if a Docker network exists |
| `createNetwork(name, subnet?)` | Create a Docker network |
| `removeNetwork(name)` | Remove a Docker network |
| `listContainers(options?)` | List containers by filter |
| `wait(name)` | Wait for container exit and return exit code |
| `isRunning(name)` | Check if a container is currently running |
All methods use `execa` with `reject: false`, so a failing Docker command resolves with its non-zero exit code instead of throwing.
Note: This helper is available but less commonly used in practice — most tests go through AwfRunner.runWithSudo() which handles the full container lifecycle automatically.
The log parser (`tests/fixtures/log-parser.ts`) handles two log formats.

Squid access logs use the `firewall_detailed` log format:

```
%ts.%03tu %>a:%>p %{Host}>h %<a:%<p %rv %rm %>Hs %Ss:%Sh %ru "%{User-Agent}>h"
```

Each line is parsed into a typed `SquidLogEntry` object with fields: `timestamp`, `clientIp`, `clientPort`, `host`, `destIp`, `destPort`, `protocol`, `method`, `statusCode`, `decision`, `hierarchy`, `url`, `userAgent`.

Filtering methods:
- `filterByDecision(entries, 'allowed' | 'blocked')`: filter by `TCP_TUNNEL` / `TCP_DENIED`
- `filterByDomain(entries, domain)`: filter by exact or subdomain match
- `getUniqueDomains(entries)`: deduplicated domain list
- `wasAllowed(entries, domain)` / `wasBlocked(entries, domain)`: boolean checks

iptables logs are kernel log entries prefixed with `[FW_BLOCKED_UDP]` or `[FW_BLOCKED_OTHER]`, parsed from `dmesg` output.
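To make the Squid format concrete, here is a minimal sketch of parsing one `firewall_detailed` line with a regular expression and answering a `wasAllowed`-style question. It is not the actual `LogParser`: the `SketchLogEntry` type, the pattern, the IPv4-only assumption for addresses, and the sample line are all illustrative.

```typescript
// Illustrative entry type mirroring the documented field names.
interface SketchLogEntry {
  timestamp: number;
  clientIp: string;
  clientPort: string;
  host: string;
  destIp: string;
  destPort: string;
  protocol: string;
  method: string;
  statusCode: number;
  decision: string;
  hierarchy: string;
  url: string;
  userAgent: string;
}

// One capture group per format token. Assumes IPv4 addresses (no colons inside an IP).
const LINE_PATTERN =
  /^(\d+\.\d+) ([^\s:]+):(\d+) (\S+) ([^\s:]+):(\S+) (\S+) (\S+) (\d+) ([^\s:]+):(\S+) (\S+) "(.*)"$/;

function parseSquidLine(line: string): SketchLogEntry | undefined {
  const m = LINE_PATTERN.exec(line.trim());
  if (!m) {
    return undefined; // Not a firewall_detailed line
  }
  return {
    timestamp: Number(m[1]),
    clientIp: m[2],
    clientPort: m[3],
    host: m[4],
    destIp: m[5],
    destPort: m[6],
    protocol: m[7],
    method: m[8],
    statusCode: Number(m[9]),
    decision: m[10],
    hierarchy: m[11],
    url: m[12],
    userAgent: m[13],
  };
}

// Exact or subdomain match on the Host value, ignoring a trailing :port.
function matchesDomain(entry: SketchLogEntry, domain: string): boolean {
  const hostname = entry.host.replace(/:\d+$/, "");
  return hostname === domain || hostname.endsWith(`.${domain}`);
}

function wasAllowed(entries: SketchLogEntry[], domain: string): boolean {
  return entries.some((e) => e.decision === "TCP_TUNNEL" && matchesDomain(e, domain));
}

// Made-up log line, for illustration only.
const sampleLine =
  '1700000000.123 172.30.0.20:45678 api.github.com:443 140.82.112.5:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "curl/8.5.0"';
const parsed = parseSquidLine(sampleLine);
console.log(parsed ? wasAllowed([parsed], "github.com") : "unparseable");
```

Matching on a leading dot (`.github.com`) rather than a bare suffix prevents `hub.com` from matching `api.github.com`.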
Each runner.runWithSudo() call spawns a full Docker container lifecycle: config generation, Docker Compose up (Squid + Agent), iptables setup, command execution, teardown. This takes 15-25 seconds per invocation.
The chroot language test suite originally had ~73 individual test invocations, each with this overhead.
The batch runner groups commands that share the same AwfOptions (particularly allowDomains) into a single AWF container invocation. This reduced the chroot suite from ~73 to ~27 container startups.
1. Script Generation: each command is wrapped in delimiters:

   ```bash
   echo "===BATCH_START:python_version==="
   (python3 --version) 2>&1
   _EC=$?
   echo ""
   echo "===BATCH_EXIT:python_version:$_EC==="
   ```

2. Single Invocation: the concatenated script runs as one `runWithSudo()` call.

3. Result Parsing: the combined stdout is parsed back into per-command results:

   ```typescript
   const batch = await runBatch(runner, [
     { name: 'python_version', command: 'python3 --version' },
     { name: 'node_version', command: 'node --version' },
   ], { allowDomains: ['github.com'] });

   expect(batch.get('python_version').exitCode).toBe(0);
   ```
```typescript
interface BatchCommand {
  name: string;    // Unique identifier for this command
  command: string; // Shell command to execute
}

interface BatchCommandResult {
  stdout: string;   // Captured output (stdout + stderr merged)
  exitCode: number; // Per-command exit code
}

interface BatchResults {
  get(name: string): BatchCommandResult; // Throws if name not found
  overall: AwfResult;                    // Raw AWF result for the whole batch
}
```

- Each command runs in a subshell, `(cmd) 2>&1`, so failures don't abort the batch
- stdout and stderr are merged via `2>&1`; individual stderr is not preserved
- Exit code is captured immediately into `_EC=$?` before `echo` resets `$?`
- Delimiter tokens (`===BATCH_START:`, `===BATCH_EXIT:`) are chosen to be unlikely in real output
- If the batch is killed early, missing commands get `exitCode: -1`
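The script-generation and result-parsing steps can be sketched as two pure functions. This is an approximation of the technique rather than the contents of `batch-runner.ts`: the function names are hypothetical, and command names are assumed to contain only letters, digits, and underscores so they can be embedded in a regular expression without escaping.

```typescript
interface BatchCommand {
  name: string;
  command: string;
}

interface BatchCommandResult {
  stdout: string;
  exitCode: number;
}

// Wrap each command in START/EXIT delimiters and join into one script.
function generateBatchScript(commands: BatchCommand[]): string {
  return commands
    .map(({ name, command }) =>
      [
        `echo "===BATCH_START:${name}==="`,
        `(${command}) 2>&1`, // Subshell: a failure does not abort the batch
        `_EC=$?`,            // Capture before echo resets $?
        `echo ""`,
        `echo "===BATCH_EXIT:${name}:$_EC==="`,
      ].join("\n")
    )
    .join("\n");
}

// Split combined output back into per-command results.
// Commands without both delimiters (batch killed early) get exitCode -1.
function parseBatchOutput(
  stdout: string,
  commands: BatchCommand[]
): Map<string, BatchCommandResult> {
  const results = new Map<string, BatchCommandResult>();
  for (const { name } of commands) {
    const startToken = `===BATCH_START:${name}===`;
    const startIdx = stdout.indexOf(startToken);
    const exitMatch = new RegExp(`===BATCH_EXIT:${name}:(-?\\d+)===`).exec(stdout);
    if (startIdx === -1 || !exitMatch) {
      results.set(name, { stdout: "", exitCode: -1 });
      continue;
    }
    const body = stdout.slice(startIdx + startToken.length, exitMatch.index).trim();
    results.set(name, { stdout: body, exitCode: Number(exitMatch[1]) });
  }
  return results;
}

const batchCommands: BatchCommand[] = [
  { name: "python_version", command: "python3 --version" },
  { name: "go_version", command: "go version" },
];

// Made-up output in which the batch was killed before go_version finished.
const truncatedOutput = [
  "===BATCH_START:python_version===",
  "Python 3.12.0",
  "",
  "===BATCH_EXIT:python_version:0===",
  "===BATCH_START:go_version===",
].join("\n");

const parsedBatch = parseBatchOutput(truncatedOutput, batchCommands);
console.log(parsedBatch.get("python_version"), parsedBatch.get("go_version"));
```

Parsing by named delimiters rather than by position means one command's missing or extra output cannot shift the results of the commands after it.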
The batch runner is used in beforeAll to run all commands once, then individual test() blocks assert against named results:
```typescript
describe('Quick checks (batched)', () => {
  let batch: BatchResults;

  beforeAll(async () => {
    batch = await runBatch(runner, [
      { name: 'python_version', command: 'python3 --version' },
      { name: 'go_version', command: 'go version' },
    ], { allowDomains: ['github.com'], timeout: 120000 });
  }, 180000);

  test('Python available', () => {
    expect(batch.get('python_version').exitCode).toBe(0);
  });

  test('Go available', () => {
    expect(batch.get('go_version').exitCode).toBe(0);
  });
});
```

This pattern is used extensively in `chroot-languages.test.ts` (17 batched commands) and `chroot-package-managers.test.ts`.
The cleanup system uses a defense-in-depth approach across four stages, accounting for the fact that Docker container and network resources can leak when processes are killed mid-lifecycle.
Stage 1: test fixture cleanup

File: `tests/fixtures/cleanup.ts`

The `Cleanup` class is a TypeScript port of the shell script `scripts/ci/cleanup.sh`, providing the same operations as programmable methods:

```typescript
class Cleanup {
  removeContainers()          // docker rm -f awf-squid awf-agent
  stopDockerComposeServices() // docker compose down -v for all /tmp/awf-*/
  cleanupIptables()           // Remove FW_WRAPPER chain from DOCKER-USER
  removeNetwork()             // docker network rm awf-net
  pruneContainers()           // docker container prune -f
  pruneNetworks()             // docker network prune -f (fixes "Pool overlaps")
  removeWorkDirectories()     // rm -rf /tmp/awf-*
  cleanAll()                  // All of the above in sequence
}
```

It is called in `beforeAll` and `afterAll` of every test `describe` block:

```typescript
beforeAll(async () => { await cleanup(false); });
afterAll(async () => { await cleanup(false); });
```

Stage 2: AWF's own cleanup on normal exit

AWF's own cleanup in `src/cli.ts`:
- `docker compose down -v` stops containers
- Deletes the work directory `/tmp/awf-<timestamp>`

Stage 3: signal handlers

SIGINT/SIGTERM handlers in `src/cli.ts` trigger the same cleanup as a normal exit. SIGKILL cannot be caught.
Stage 4: CI cleanup script

File: `scripts/ci/cleanup.sh`

A bash script that performs the same operations as the TypeScript `Cleanup` class. It runs as a safety net in CI workflows with `if: always()`.

Operations:
1. `docker rm -f awf-squid awf-agent`
2. `docker compose -f /tmp/awf-*/docker-compose.yml down -v`
3. Remove the `FW_WRAPPER` iptables chain from `DOCKER-USER`
4. `docker network rm awf-net`
5. `docker container prune -f`
6. `docker network prune -f` (critical for subnet pool management)
7. `rm -rf /tmp/awf-*`
The timeout command used in CI can SIGKILL the AWF process after a grace period, bypassing stages 2-3. Without stages 1 and 4, orphaned Docker networks accumulate and eventually exhaust the subnet pool ("Pool overlaps" errors).
After `gh-aw compile` generates `.lock.yml` workflow files, `scripts/ci/postprocess-smoke-workflows.ts` transforms them for CI use:
| Transformation | Why |
|---|---|
| Replace "Install awf binary" step with `npm ci && npm run build` | Use locally-built code instead of pre-built GHCR binary |
| Remove `sparse-checkout` blocks | Full repo checkout needed for npm build |
| Remove `depth: 1` shallow clone | Full checkout needed |
| Replace `--image-tag X --skip-pull` with `--build-local` | Use locally-built container images |
The script processes workflow files across the suite (5 smoke, 1 build-test, 13 agentic, 3 secret-digger), so CI tests run against the current source code rather than stale published images.
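The last transformation in the table is essentially a text substitution. A minimal sketch, assuming the two flags always appear adjacent and in that order (the function name `useLocalBuild`, the tag value, and the sample command are illustrative, not taken from the script):

```typescript
// Hypothetical sketch: swap "--image-tag <tag> --skip-pull" for "--build-local".
// Assumption: the flags are adjacent and separated only by whitespace.
function useLocalBuild(workflowYaml: string): string {
  return workflowYaml.replace(/--image-tag\s+\S+\s+--skip-pull/g, "--build-local");
}

// Made-up workflow line, for illustration only.
const compiledLine = "run: awf --image-tag 0.1.0 --skip-pull -- echo ok";
console.log(useLocalBuild(compiledLine));
```

Because the pattern no longer matches after the first pass, running the transformation twice leaves the file unchanged, which is useful when the post-processing step may be re-run on already processed workflows.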
maxWorkers: 1 means all 26 test files run sequentially. A full integration suite run takes 30-60+ minutes depending on the number of container startups.
Root cause: All tests share the same Docker network (172.30.0.0/24), container names (awf-squid, awf-agent), and iptables chains. Parallel execution would cause conflicts.
Tests within a describe block share the same AwfRunner instance. While each runWithSudo() call creates a fresh container, there's no mechanism to isolate host-level side effects (iptables rules, Docker networks) between individual tests.
The batch runner merges stdout and stderr (2>&1), so per-command stderr is mixed into stdout. Tests can't distinguish between a command's stdout and stderr output.
toAllowDomain and toBlockDomain read Squid logs from the work directory, which is deleted during normal cleanup. Tests using these matchers must pass keepContainers: true and manually call cleanup() after assertions.
Every test has two timeout values: the AWF timeout in `AwfOptions` (default 120s) and the Jest test timeout (the third argument to `test()`). These must be kept in sync manually, with the Jest timeout always exceeding the AWF timeout.
Flaky tests (network issues, Docker daemon slowness) have no built-in retry mechanism. The test infrastructure treats every failure as final.
All integration tests require:
- Docker daemon running
- sudo access for iptables
- Port 3128 available for Squid
- The `172.30.0.0/24` subnet unoccupied
This makes local development testing impossible without a Linux environment with Docker.
docker container prune -f and docker network prune -f in the cleanup routine can affect non-AWF containers and networks on the same host. This is safe in CI but could be problematic in shared development environments.
Instead of hardcoding 172.30.0.0/24, assign a unique subnet per test run. This would enable limited parallelism (2-3 workers) and eliminate "Pool overlaps" errors at the source.
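One possible shape for this is to derive the third octet from a worker index. The sketch below is hypothetical: `subnetForWorker` is not an existing helper, and it presumes AWF could be told which subnet to use, which this document does not describe. In a Jest run, the index could plausibly come from the `JEST_WORKER_ID` environment variable.

```typescript
// Hypothetical: map a worker index to its own /24 inside 172.30.0.0/16.
function subnetForWorker(workerId: number): string {
  if (!Number.isInteger(workerId) || workerId < 0 || workerId > 255) {
    throw new RangeError(`workerId must be an integer in 0..255, got ${workerId}`);
  }
  return `172.30.${workerId}.0/24`;
}

console.log(subnetForWorker(1), subnetForWorker(2));
```

A deterministic mapping keeps subnets stable across reruns of the same worker, which makes leaked networks easier to attribute than randomly chosen ranges.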
The batch runner already groups commands by AwfOptions. This pattern could be extended: tests that share allowDomains and other config could be automatically grouped into fewer container invocations, even across test files.
For tests that share the same allowDomains config, the Squid proxy container could be kept running between tests, avoiding the ~10s container startup per invocation. Only the agent container would need to restart.
Using unique prefixes (e.g., awf-<random>-squid instead of awf-squid) would allow multiple test runs or workers simultaneously.
A Jest retryTimes configuration or custom retry wrapper for network-dependent tests would improve CI reliability without masking real failures.
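A custom wrapper could look roughly like the following sketch. The `withRetry` name and the `shouldRetry` predicate are assumptions for illustration; the predicate is what keeps retries from masking real failures, because only errors it recognises as transient are retried.

```typescript
// Hypothetical retry wrapper: re-run fn up to `attempts` times,
// but only while shouldRetry classifies the error as transient.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number,
  shouldRetry: (err: unknown) => boolean
): Promise<T> {
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!shouldRetry(err)) {
        break; // Non-transient failure: surface it immediately
      }
    }
  }
  throw lastError;
}

// Example classifier: treat only connection resets as transient (illustrative).
function isTransient(err: unknown): boolean {
  return err instanceof Error && err.message.includes("ECONNRESET");
}
```

Inside a test this would wrap only the network-dependent call, for example `await withRetry(() => runner.runWithSudo(...), 3, isTransient)`, leaving the assertions outside the retry loop.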
The batch runner could be enhanced to capture stderr separately by writing it to a temp file:

```bash
(cmd) 2>/tmp/batch_stderr_name
```

This would preserve per-command stderr for better failure diagnostics.
A helper that sets both the AWF timeout and the Jest timeout from a single value would eliminate the timeout duplication issue:
```typescript
function testWithTimeout(
  name: string,
  fn: (awfTimeout: number) => Promise<void>,
  timeoutMs: number
): void {
  const awfTimeout = timeoutMs - 30000; // Buffer for container lifecycle
  test(name, () => fn(awfTimeout), timeoutMs);
}
```

Replace `docker container prune -f` and `docker network prune -f` with targeted removal of AWF-specific resources only (e.g., by label or name pattern). This would make the cleanup safe for shared environments.
Many tests repeat the same pattern: run a curl command, check toSucceed() or toFail(). A higher-level fixture could encapsulate this:
```typescript
await expectDomainAllowed(runner, 'github.com');
await expectDomainBlocked(runner, 'example.com', { allowDomains: ['github.com'] });
```

Integration tests that use `--build-local` rebuild containers from scratch every time. A CI-level image cache (built once, reused across tests) would save significant time.