Publish versioned Docker images to GHCR with current/frontier channel tags #1552
hua7450 wants to merge 3 commits into
Adds a publish workflow that observes completed Release to Modal runs, builds multi-arch images per exact policyengine-us version, and repoints current/frontier/latest channel tags from the live gateway manifest. A POLICYENGINE_US_VERSION build arg lets anyone build arbitrary model versions locally; a workflow_dispatch path publishes them on demand. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…symmetry Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
anth-volk
left a comment
One CI coverage gap from review.
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    - name: Build image
      uses: docker/build-push-action@v5
This PR build check does not cover the publish-only POLICYENGINE_US_VERSION build-arg path. The publish workflow always passes POLICYENGINE_US_VERSION=${{ matrix.us_version }}, so the Dockerfile override step can fail even while this check stays green. Please add a representative build-args entry here, ideally using the current pinned policyengine_us version, so PR CI exercises the same Dockerfile path used by publishing.
Good catch — fixed in 486805d. The check now reads the pinned policyengine-us version from pyproject.toml and passes it as POLICYENGINE_US_VERSION, so PR CI builds through the same Dockerfile override branch the publish workflow always uses (the release build also passes the pin, so this is the representative case). Reading the pin dynamically keeps the weekly bot bumps covered without maintenance. Verified the override path end-to-end locally as well: built images with overrides 1.715.2 and 1.725.0, both served correct version-specific calculations (e.g. the KS CSFP county restriction flips between 1.715.2 and 1.725.0 exactly as the model changelog says).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fixes #1551
Summary
Restores automated Docker image publishing (removed in 36c9327 when the release pipeline went Modal-only) as a separate distribution workflow that mirrors the Modal release channels, so API partners can run any model version locally. The deploy pipeline itself stays Modal-only; deploy-staged.yml is untouched.
Tag scheme
- us-<version>
- current (also latest)
- frontier
- <api-version>, sha-<commit>
How it works
1. Triggered by workflow_run on successful push-triggered Release to Modal runs. (on: push: tags: cannot work, because the release tag is pushed with GITHUB_TOKEN, which GitHub suppresses as a workflow trigger.) Manual Modal dispatch releases do not publish, mirroring the PyPI publish gate; a workflow_dispatch path with sync_channel_tags: true covers catch-up.
2. The tag-planning script (.github/scripts/plan_docker_tags.py, unit-tested) reads the release commit's pyproject pin, the live gateway /versions/us (source of truth for channel state), and existing GHCR tags, then emits a build matrix plus a retag list. It backfills any channel version with no published image (on first run: current = 1.715.2).
3. Images are built from Dockerfile.production and pushed with GITHUB_TOKEN (packages: write only, no new secrets).
4. docker buildx imagetools create repoints channel tags registry-side: promotion without rebuild, exactly like Modal's frontier→current promotion.
Dockerfile changes
- python:3.12-slim → python:3.13-slim (parity with Modal workers; uv.lock already resolves for 3.13)
- Removes --platform=linux/amd64 to enable arm64 (Apple Silicon partners currently run under emulation)
- POLICYENGINE_US_VERSION build arg: uv sync --frozen from the lock, then override-install the requested version, the same two-step the Modal worker image uses
- --no-editable, single-layer apt cleanup, and COPY --chown instead of post-copy chown -R (which duplicated the ~500 MB venv into a second layer per published tag)
Docs
- docs/engineering/skills/docker-images.md; modal-release-prs.md amended to carve out image publishing as a distribution artifact (deployment stays Modal-only); AGENTS.md/CLAUDE.md adapter pointers; skills README index
- config/README.md examples moved off the stale :latest
- pr-docker-build.yml: build-only PR check, path-filtered to image-affecting files (nothing in PR CI exercised the Dockerfile before)
Rollout notes (post-merge)
1. latest: preserve today's stale digest under an explicit tag so existing pulls stay addressable: docker buildx imagetools create -t ghcr.io/policyengine/policyengine-household-api:us-1.691.1-api0.18.0 ghcr.io/policyengine/policyengine-household-api@sha256:<current latest digest> (today's :latest is a 2026-05-13 build of policyengine_us 1.691.1, API 0.18.0)
2. Publish Docker image dispatch (sync_channel_tags: true), needed because workflow_run triggers cannot fire until this file is on the default branch
3. us-<new> and repoints channels
Test plan
- make format-check passes
- make test via .github/scripts
- pr-docker-build check, which exercises the Dockerfile changes
🤖 Generated with Claude Code
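The planning step described under "How it works" can be sketched roughly as follows. This is not the contents of .github/scripts/plan_docker_tags.py, which this PR page does not show; the names `TagPlan` and `plan_tags`, the input shapes, and the rule that the pinned version is always considered for building are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TagPlan:
    # policyengine-us versions that still need an image built
    build: list[str] = field(default_factory=list)
    # channel tag -> version tag it should point at
    retag: dict[str, str] = field(default_factory=dict)


def plan_tags(pinned: str, channels: dict[str, str], existing_tags: set[str]) -> TagPlan:
    """Hypothetical planner: decide what to build and which channel tags to move.

    pinned        -- version pinned in the release commit's pyproject.toml
    channels      -- channel state, e.g. as reported by the gateway /versions/us
    existing_tags -- tags already published to the registry
    """
    plan = TagPlan()
    # Build the pinned version and backfill any channel version without an image.
    for version in [pinned, *channels.values()]:
        if f"us-{version}" not in existing_tags and version not in plan.build:
            plan.build.append(version)
    # Channel tags are repointed registry-side, never rebuilt.
    for channel, version in channels.items():
        plan.retag[channel] = f"us-{version}"
    # latest mirrors current, per the tag scheme.
    if "current" in channels:
        plan.retag["latest"] = plan.retag["current"]
    return plan


if __name__ == "__main__":
    # Illustrative channel state; only current = 1.715.2 comes from the PR text.
    print(plan_tags("1.725.0", {"current": "1.715.2", "frontier": "1.725.0"}, set()))
```

Keeping the plan a pure function of three inputs is what makes it easy to unit-test without touching the registry or the gateway; each `retag` entry would then map onto one docker buildx imagetools create call.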