
Commit c923b80

Authored by Copilot, lpcox, and github-advanced-security[bot]
feat: add AWF JSON/YAML config ingestion with schema validation and CLI precedence (#2018)
* Initial plan

* feat: add JSON/YAML config file loading and schema docs

  Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/dcd77d8b-19a4-4eab-9b64-5772d37fda34

* refactor: tighten config validation helpers and precedence docs

  Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/dcd77d8b-19a4-4eab-9b64-5772d37fda34

* docs: clarify config parsing and RFC wording

  Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/dcd77d8b-19a4-4eab-9b64-5772d37fda34

* refactor: clarify in-place config option merge behavior

  Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/dcd77d8b-19a4-4eab-9b64-5772d37fda34

* Potential fix for pull request finding 'CodeQL / Useless assignment to local variable'

  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix: retry apt-get update on transient mirror failures in Dockerfiles

  The initial apt-get update can fail with hash mismatches when Ubuntu mirrors are mid-sync. The existing retry logic only covered apt-get install failures, not apt-get update failures. This adds a retry with cache clear for the initial apt-get update in both agent and squid Dockerfiles.

  Fixes: squid-proxy build failure (exit code 100) in --build-local CI

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: set COPILOT_MODEL fallback to claude-sonnet-4.5 for BYOK mode

  The byok-copilot feature flag generates an empty COPILOT_MODEL fallback, but BYOK providers require an explicit model. This patches the lock file with claude-sonnet-4.5 as the default.

  Workaround for: github/gh-aw#26565

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use retry loop with backoff for apt-get update in Dockerfiles

  Replace single-retry apt-get update with a 3-attempt retry loop using linearly increasing backoff (10s, 20s, 30s). The single retry was insufficient when Ubuntu mirrors are in prolonged sync states (observed in CI where mirror hash mismatches persisted across multiple minutes).

  The apt_update_retry function clears the apt cache before each attempt, ensuring a clean state. Applied to all apt-get update calls in both agent and squid Dockerfiles, including the install-retry fallback paths.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use Azure apt mirrors in Dockerfiles for CI reliability

  GitHub Actions runners are Azure-hosted, so azure.archive.ubuntu.com is geographically closer and more reliable than archive.ubuntu.com. This reduces Hash Sum mismatch failures during Ubuntu mirror syncs.

  Handles both traditional sources.list (jammy/22.04) and DEB822 format (noble/24.04+) used by ubuntu/squid:latest.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Landon Cox <landon.cox@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 4681e04 commit c923b80

9 files changed, +879 -14 lines changed

.github/workflows/smoke-copilot.lock.yml

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

README.md

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ The `--` separator divides firewall options from the command to run.
 
 - [Quick start](docs/quickstart.md) — install, verify, and run your first command
 - [Usage guide](docs/usage.md) — CLI flags, domain allowlists, examples
+- [AWF config schema](docs/awf-config.schema.json) — machine-readable JSON Schema for JSON/YAML configs
+- [AWF config spec](docs/awf-config-spec.md) — normative processing and precedence rules for tooling/compiler integration
 - [Enterprise configuration](docs/enterprise-configuration.md) — GitHub Enterprise Cloud and Server setup
 - [Chroot mode](docs/chroot-mode.md) — use host binaries with network isolation
 - [API proxy sidecar](docs/api-proxy-sidecar.md) — secure credential management for LLM APIs

containers/agent/Dockerfile

Lines changed: 40 additions & 10 deletions
@@ -9,16 +9,35 @@ ARG BASE_IMAGE=ubuntu:22.04
 
 FROM ${BASE_IMAGE}
 
+# Switch to Azure apt mirror for faster, more reliable package fetches in CI
+# GitHub Actions runners are Azure-hosted; azure.archive.ubuntu.com is geographically closer
+# Handles both traditional sources.list (jammy) and DEB822 format (noble+)
+RUN if [ -f /etc/apt/sources.list ]; then \
+        sed -i 's|http://archive.ubuntu.com|http://azure.archive.ubuntu.com|g' /etc/apt/sources.list; \
+        sed -i 's|http://security.ubuntu.com|http://azure.archive.ubuntu.com|g' /etc/apt/sources.list; \
+    fi && \
+    if [ -d /etc/apt/sources.list.d ]; then \
+        find /etc/apt/sources.list.d -name '*.sources' -exec \
+            sed -i 's|http://archive.ubuntu.com|http://azure.archive.ubuntu.com|g' {} + 2>/dev/null || true; \
+        find /etc/apt/sources.list.d -name '*.sources' -exec \
+            sed -i 's|http://security.ubuntu.com|http://azure.archive.ubuntu.com|g' {} + 2>/dev/null || true; \
+    fi
+
 # Install required packages and Node.js 22
 # Note: Some packages may already exist in runner-like base images, apt handles this gracefully
-# Retry logic handles transient 404s when Ubuntu archive supersedes package versions mid-build
+# apt_update_retry: retries up to 3 times with backoff to survive prolonged mirror syncs
 RUN set -eux; \
+    apt_update_retry() { \
+        local i; for i in 1 2 3; do \
+            rm -rf /var/lib/apt/lists/* && apt-get update && return 0; \
+            echo "apt-get update attempt $i/3 failed, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
+        done; return 1; \
+    }; \
     PKGS="iptables curl ca-certificates git gh gnupg dnsutils net-tools netcat-openbsd gosu libcap2-bin"; \
-    apt-get update && \
+    apt_update_retry && \
     ( apt-get install -y --no-install-recommends $PKGS || \
       (echo "apt-get install failed, retrying with fresh package index..." && \
-       rm -rf /var/lib/apt/lists/* && \
-       apt-get update && \
+       apt_update_retry && \
        apt-get install -y --no-install-recommends $PKGS) ) && \
     # Prefer system binaries over runner toolcache (e.g., act images) for Node checks.
     export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH" && \
@@ -39,22 +58,33 @@ RUN set -eux; \
 # These packages are commonly needed by workflows and avoid agents spending time installing them manually
 # See: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2204-Readme.md
 RUN set -eux; \
+    apt_update_retry() { \
+        local i; for i in 1 2 3; do \
+            rm -rf /var/lib/apt/lists/* && apt-get update && return 0; \
+            echo "apt-get update attempt $i/3 failed, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
+        done; return 1; \
+    }; \
     PARITY_PKGS="libgdiplus libev-dev libssl-dev php-intl php-gd"; \
-    apt-get update && \
+    apt_update_retry && \
     ( apt-get install -y --no-install-recommends $PARITY_PKGS || \
       (echo "apt-get install failed, retrying with fresh package index..." && \
-       rm -rf /var/lib/apt/lists/* && \
-       apt-get update && \
+       apt_update_retry && \
        apt-get install -y --no-install-recommends $PARITY_PKGS) ) && \
     rm -rf /var/lib/apt/lists/*
 
 # Upgrade all packages to pick up security patches
 # Addresses CVE-2023-44487 (HTTP/2 Rapid Reset) and other known vulnerabilities
 # Retry logic handles transient mirror sync failures during apt-get update
-RUN apt-get update && apt-get upgrade -y && rm -rf /var/lib/apt/lists/* || \
+RUN apt_update_retry() { \
+        local i; for i in 1 2 3; do \
+            rm -rf /var/lib/apt/lists/* && apt-get update && return 0; \
+            echo "apt-get update attempt $i/3 failed, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
+        done; return 1; \
+    }; \
+    apt_update_retry && \
+    apt-get upgrade -y && rm -rf /var/lib/apt/lists/* || \
     (echo "apt-get upgrade failed, retrying with fresh package index..." && \
-     rm -rf /var/lib/apt/lists/* && \
-     apt-get update && apt-get upgrade -y && rm -rf /var/lib/apt/lists/*)
+     apt_update_retry && apt-get upgrade -y && rm -rf /var/lib/apt/lists/*)
 
 # Create non-root user with UID/GID matching host user
 # This allows the user command to run with appropriate permissions
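
Review note: the `apt_update_retry` helper repeated in each `RUN` step above is a small retry loop whose delay grows linearly (10s, 20s, 30s). A minimal standalone sketch of the same pattern follows. The name `retry_with_backoff` and its arguments are hypothetical and not part of this commit, and unlike the committed helper this version skips the sleep after the final failed attempt and does not clear the apt cache.

```shell
# Hypothetical helper, not part of the commit.
# Usage: retry_with_backoff <max_attempts> <base_delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached, sleeping
# attempt * base_delay seconds between failures (linear backoff).
retry_with_backoff() {
  max_attempts=$1
  base_delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    "$@" && return 0
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "attempt $attempt/$max_attempts failed, retrying in $((attempt * base_delay))s..." >&2
      sleep "$((attempt * base_delay))"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Under these assumptions, `retry_with_backoff 3 10 apt-get update` would approximate the committed behaviour, minus the `rm -rf /var/lib/apt/lists/*` that the Dockerfile runs before every attempt.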

containers/squid/Dockerfile

Lines changed: 23 additions & 3 deletions
@@ -1,13 +1,33 @@
 FROM ubuntu/squid:latest
 
+# Switch to Azure apt mirror for faster, more reliable package fetches in CI
+# GitHub Actions runners are Azure-hosted; azure.archive.ubuntu.com is geographically closer
+# Handles both traditional sources.list (jammy) and DEB822 format (noble+)
+RUN if [ -f /etc/apt/sources.list ]; then \
+        sed -i 's|http://archive.ubuntu.com|http://azure.archive.ubuntu.com|g' /etc/apt/sources.list; \
+        sed -i 's|http://security.ubuntu.com|http://azure.archive.ubuntu.com|g' /etc/apt/sources.list; \
+    fi && \
+    if [ -d /etc/apt/sources.list.d ]; then \
+        find /etc/apt/sources.list.d -name '*.sources' -exec \
+            sed -i 's|http://archive.ubuntu.com|http://azure.archive.ubuntu.com|g' {} + 2>/dev/null || true; \
+        find /etc/apt/sources.list.d -name '*.sources' -exec \
+            sed -i 's|http://security.ubuntu.com|http://azure.archive.ubuntu.com|g' {} + 2>/dev/null || true; \
+    fi
+
 # Install additional tools for debugging, healthcheck, and SSL Bump
-# Retry logic handles transient 404s when Ubuntu archive supersedes package versions mid-build
+# apt_update_retry: retries up to 3 times with backoff to survive prolonged mirror syncs
 RUN set -eux; \
+    apt_update_retry() { \
+        local i; for i in 1 2 3; do \
+            rm -rf /var/lib/apt/lists/* && apt-get update && return 0; \
+            echo "apt-get update attempt $i/3 failed, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
+        done; return 1; \
+    }; \
     PKGS="curl dnsutils net-tools netcat-openbsd openssl squid-openssl"; \
-    apt-get update && \
+    apt_update_retry && \
     apt-get install -y --only-upgrade gpgv && \
     ( apt-get install -y --no-install-recommends $PKGS || \
-      (rm -rf /var/lib/apt/lists/* && apt-get update && \
+      (apt_update_retry && \
        apt-get install -y --no-install-recommends $PKGS) ) && \
     rm -rf /var/lib/apt/lists/*
 
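
Review note: the mirror switch above is two `sed` substitutions applied to whichever apt source files exist. The sketch below replays the same substitutions on a throwaway DEB822-style file so the effect is easy to inspect outside a container build. `rewrite_apt_mirror` is a hypothetical helper and the sample file content is illustrative, not copied from the `ubuntu/squid` image. It writes through a temporary file instead of using `sed -i`, which is a deliberate difference from the Dockerfile.

```shell
# Hypothetical helper, not part of the commit.
# $1 = apt source file to edit, $2 = replacement mirror host.
rewrite_apt_mirror() {
  src_file=$1
  mirror_host=$2
  tmp_file=$(mktemp) || return 1
  # Write to a temp file first so a failed sed never truncates the source.
  if sed -e "s|http://archive.ubuntu.com|http://${mirror_host}|g" \
         -e "s|http://security.ubuntu.com|http://${mirror_host}|g" \
         "$src_file" > "$tmp_file"; then
    cat "$tmp_file" > "$src_file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp_file"
  return "$status"
}

# Demo on a sample DEB822-style file (illustrative content only).
demo_file=$(mktemp)
printf '%s\n' 'Types: deb' 'URIs: http://archive.ubuntu.com/ubuntu/' \
  'Suites: noble noble-updates' 'Components: main universe' > "$demo_file"
rewrite_apt_mirror "$demo_file" azure.archive.ubuntu.com
cat "$demo_file"
```

Because the rewritten URL no longer matches either pattern, running the helper a second time leaves the file unchanged.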

docs/awf-config-spec.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+# AWF Configuration Specification (W3C-style)
+
+## Status of This Document
+
+This document defines the canonical configuration model for AWF (`awf`) and is intended for:
+
+- `awf` CLI runtime loading (`--config`)
+- tooling that compiles workflows to AWF invocations (including `gh-aw`)
+- IDE/static validation via JSON Schema
+
+The machine-readable schema is published at:
+
+- `docs/awf-config.schema.json`
+
+## 1. Conformance
+
+The normative keywords in this document are to be interpreted as described in RFC 2119.
+
+An AWF config document is conforming when:
+
+1. It is valid JSON or YAML.
+2. Its data model satisfies `docs/awf-config.schema.json`.
+3. Unknown properties are not present (closed-world schema).
+
+## 2. Processing Model
+
+1. The user invokes `awf --config <path|-> -- <command>`.
+2. If `<path>` is `-`, AWF reads configuration bytes from stdin.
+3. If `<path>` ends with `.json`, AWF parses as JSON.
+4. If `<path>` ends with `.yaml` or `.yml`, AWF parses as YAML.
+5. Otherwise, AWF attempts JSON parse first, then YAML parse.
+6. AWF validates the parsed document and fails fast on validation errors.
+7. AWF maps config fields to CLI option semantics.
+8. **CLI options MUST take precedence over config file values**.
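
Review note: steps 2 to 5 of the processing model amount to a dispatch on the `--config` argument. A minimal sketch of that dispatch is shown below. The function names, the `json-then-yaml` label, and the sample paths are all hypothetical; treating `-` as falling under step 5 and matching suffixes case-sensitively are assumptions, since the spec text does not say either way.

```shell
# Hypothetical helpers illustrating steps 2 to 5; not the awf implementation.

# Where the config bytes come from: "-" means stdin, anything else is a file.
config_source() {
  case "$1" in
    -) echo "stdin" ;;
    *) echo "file" ;;
  esac
}

# Which parser to try, based only on the path suffix.
# Assumption: "-" has no suffix, so it takes the JSON-then-YAML fallback.
config_parser() {
  case "$1" in
    *.json)         echo "json" ;;
    *.yaml | *.yml) echo "yaml" ;;
    *)              echo "json-then-yaml" ;;
  esac
}

for path in awf.json ci/awf.yml settings.conf -; do
  printf '%s: source=%s parser=%s\n' \
    "$path" "$(config_source "$path")" "$(config_parser "$path")"
done
```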
+
+## 3. Precedence Rules
+
+The effective configuration order is:
+
+1. AWF internal defaults
+2. Config file (`--config`)
+3. Explicit CLI flags
+
+This precedence model allows reusable checked-in configs with environment-specific CLI overrides.
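
Review note: the three layers above are listed from lowest to highest priority, so resolving a single option is a "last layer that set it wins" lookup. The sketch below shows that lookup for one scalar option. `resolve_option` is a hypothetical name, and treating an empty string as "not set at this layer" is an assumption made for the sketch; the actual merge code in this commit is not shown on this page.

```shell
# Hypothetical helper, not the awf implementation.
# Usage: resolve_option <default> <config_value> <cli_value>
# Assumption: an empty string means "not set at this layer".
resolve_option() {
  default_value=$1
  config_value=$2
  cli_value=$3
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"       # explicit CLI flag wins
  elif [ -n "$config_value" ]; then
    printf '%s\n' "$config_value"    # then the config file
  else
    printf '%s\n' "$default_value"   # otherwise the internal default
  fi
}

resolve_option info debug ""      # config overrides the default
resolve_option info debug warn    # CLI overrides the config
resolve_option info "" ""         # nothing set, default applies
```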
+
+## 4. Data Model
+
+The root object MAY contain:
+
+- `$schema`
+- `network`
+- `apiProxy`
+- `security`
+- `container`
+- `environment`
+- `logging`
+- `rateLimiting`
+
+Section semantics and constraints are defined by `docs/awf-config.schema.json`.
+
+## 5. CLI Mapping (Normative)
+
+Tools generating AWF invocations (such as `gh-aw`) SHOULD use this mapping:
+
+- `network.allowDomains[]` → `--allow-domains <csv>`
+- `network.blockDomains[]` → `--block-domains <csv>`
+- `network.dnsServers[]` → `--dns-servers <csv>`
+- `network.upstreamProxy` → `--upstream-proxy`
+- `apiProxy.enabled` → `--enable-api-proxy`
+- `apiProxy.targets.<provider>.host` → `--<provider>-api-target`
+- `apiProxy.targets.openai.basePath` → `--openai-api-base-path`
+- `apiProxy.targets.anthropic.basePath` → `--anthropic-api-base-path`
+- `apiProxy.targets.gemini.basePath` → `--gemini-api-base-path`
+- `security.sslBump` → `--ssl-bump`
+- `security.enableDlp` → `--enable-dlp`
+- `security.enableHostAccess` → `--enable-host-access`
+- `security.allowHostPorts` → `--allow-host-ports`
+- `security.allowHostServicePorts` → `--allow-host-service-ports`
+- `security.difcProxy.host` → `--difc-proxy-host`
+- `security.difcProxy.caCert` → `--difc-proxy-ca-cert`
+- `container.memoryLimit` → `--memory-limit`
+- `container.agentTimeout` → `--agent-timeout`
+- `container.enableDind` → `--enable-dind`
+- `container.workDir` → `--work-dir`
+- `container.containerWorkDir` → `--container-workdir`
+- `container.imageRegistry` → `--image-registry`
+- `container.imageTag` → `--image-tag`
+- `container.skipPull` → `--skip-pull`
+- `container.buildLocal` → `--build-local`
+- `container.agentImage` → `--agent-image`
+- `container.tty` → `--tty`
+- `container.dockerHost` → `--docker-host`
+- `environment.envFile` → `--env-file`
+- `environment.envAll` → `--env-all`
+- `environment.excludeEnv[]` → repeated `--exclude-env`
+- `logging.logLevel` → `--log-level`
+- `logging.diagnosticLogs` → `--diagnostic-logs`
+- `logging.auditDir` → `--audit-dir`
+- `logging.proxyLogsDir` → `--proxy-logs-dir`
+- `logging.sessionStateDir` → `--session-state-dir`
+- `rateLimiting.enabled: false` → `--no-rate-limit`
+- `rateLimiting.requestsPerMinute` → `--rate-limit-rpm`
+- `rateLimiting.requestsPerHour` → `--rate-limit-rph`
+- `rateLimiting.bytesPerMinute` → `--rate-limit-bytes-pm`
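
Review note: the mapping uses three shapes of flag: array fields joined into one comma-separated value, array fields expanded into a repeated flag, and booleans that become a bare flag. The sketch below shows one way a generator could produce each shape. The helper names and the sample values (`SECRET_TOKEN`, `DEBUG`, the domain names) are hypothetical, and reading the config document itself is out of scope here.

```shell
# Hypothetical helpers for a tool that generates awf invocations.

# Join arguments with commas, e.g. for --allow-domains <csv>.
# The function body is a subshell so the IFS change does not leak.
join_csv() (
  IFS=,
  printf '%s\n' "$*"
)

# Emit one "<flag> <value>" line per value, e.g. repeated --exclude-env.
repeat_flag() {
  flag=$1
  shift
  for value in "$@"; do
    printf '%s %s\n' "$flag" "$value"
  done
}

# Emit a bare flag only when the boolean config value is "true".
bool_flag() {
  if [ "$2" = "true" ]; then printf '%s\n' "$1"; fi
}

printf '%s %s\n' "--allow-domains" "$(join_csv github.com api.github.com)"
repeat_flag --exclude-env SECRET_TOKEN DEBUG
bool_flag --ssl-bump true
```

`rateLimiting.enabled` is the one inverted case in the list: the flag `--no-rate-limit` corresponds to the value `false`, so a generator would test for `false` there instead of `true`.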
+
+## 6. Stdin Mode
+
+AWF MUST support `--config -` for programmatic/pipeline scenarios.
+
+## 7. Error Reporting
+
+On parse or validation failure, AWF MUST:
+
+1. exit non-zero
+2. print an error describing location and reason
+3. avoid partial execution
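
Review note: the three error-reporting requirements describe a validate-before-run gate. The sketch below shows that control flow only. `validate_config` is a stand-in that merely checks for a non-empty file, and `run_validated` is a hypothetical wrapper; neither reflects how `awf` actually parses or validates a document.

```shell
# Stand-in validator (hypothetical): only checks that the file is non-empty.
# A real one would parse the document and check it against the schema.
validate_config() {
  if [ ! -s "$1" ]; then
    echo "$1: document is empty or missing"
    return 1
  fi
}

# Run the command only if validation passes; otherwise report and stop.
# Usage: run_validated <config_path> <command> [args...]
run_validated() {
  config_path=$1
  shift
  if ! reason=$(validate_config "$config_path"); then
    echo "error: invalid config: $reason" >&2   # location and reason
    return 1                                    # non-zero status
  fi
  "$@"                                          # reached only when valid
}

empty_config=$(mktemp)
run_validated "$empty_config" echo "this command never runs" || echo "exit status: $?"
```

The point of the structure is that the command is invoked on a single path that is unreachable after a validation failure, which is what "avoid partial execution" asks for.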
