Publish versioned Docker images to GHCR with current/frontier channel tags #1552

Open
hua7450 wants to merge 3 commits into main from publish-versioned-docker-images

hua7450 commented Jun 11, 2026


Fixes #1551

Summary

Restores automated Docker image publishing (removed in 36c9327 when the release pipeline went Modal-only) as a separate distribution workflow that mirrors the Modal release channels, so API partners can run any model version locally. The deploy pipeline itself stays Modal-only; deploy-staged.yml is untouched.

Tag scheme

| Tag | Meaning | Moves when |
| --- | --- | --- |
| us-<version> | Exact policyengine-us version baked in | Rebuilt on code-only redeploys (same semantics as Modal workers) |
| current (also latest) | Same model version as the hosted gateway's default channel | Repointed (not rebuilt) on weekly promotion |
| frontier | Next week's model version | Repointed each release |
| <api-version>, sha-<commit> | Build provenance | Each release |

How it works

  • Trigger: workflow_run on successful push-triggered Release to Modal runs. (on: push: tags: cannot work — the release tag is pushed with GITHUB_TOKEN, which GitHub suppresses as a workflow trigger.) Manual Modal dispatch releases do not publish, mirroring the PyPI publish gate; a workflow_dispatch path with sync_channel_tags: true covers catch-up.
  • Plan (.github/scripts/plan_docker_tags.py, unit-tested): reads the release commit's pyproject pin, the live gateway /versions/us (source of truth for channel state), and existing GHCR tags; emits a build matrix plus retag list. Backfills any channel version with no published image (on first run: current = 1.715.2).
  • Build: multi-arch (amd64 + arm64) via buildx/QEMU from Dockerfile.production, pushed with GITHUB_TOKEN (packages: write only, no new secrets).
  • Retag: docker buildx imagetools create repoints channel tags registry-side — promotion without rebuild, exactly like Modal's frontier→current promotion.
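
The plan step described above can be sketched roughly as follows. This is an illustration only, not the contents of plan_docker_tags.py: the names `DockerPlan` and `plan_docker_tags`, the shape of the `channels` mapping, and the rule that the release version is always rebuilt are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DockerPlan:
    # policyengine-us versions that need an image build
    build: list[str] = field(default_factory=list)
    # channel tag -> version tag it should point at (registry-side retag)
    retag: dict[str, str] = field(default_factory=dict)


def plan_docker_tags(release_version, channels, existing_tags):
    """Hypothetical planner.

    release_version: pinned policyengine-us version of the release commit,
        or None for a sync-only run.
    channels: channel name -> model version, e.g. as reported by the gateway.
    existing_tags: set of tags already present in the registry.
    """
    plan = DockerPlan()

    # Assumption: the release version is always rebuilt, so code-only
    # redeploys refresh us-<version>.
    if release_version is not None:
        plan.build.append(release_version)

    # Backfill any channel version that has no published image yet.
    # Checking plan.build avoids a duplicate when current == frontier.
    for version in channels.values():
        if f"us-{version}" not in existing_tags and version not in plan.build:
            plan.build.append(version)

    # Channel tags are repointed, never rebuilt; latest follows current.
    for channel, version in channels.items():
        plan.retag[channel] = f"us-{version}"
        if channel == "current":
            plan.retag["latest"] = f"us-{version}"

    return plan
```

With an empty registry, a release of 1.726.0 while the gateway reports current = 1.715.2 and frontier = 1.726.0 would yield two builds (the release and the 1.715.2 backfill) plus three retags.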

Dockerfile changes

  • python:3.12-slim → python:3.13-slim (parity with Modal workers; uv.lock already resolves for 3.13)
  • Dropped hardcoded --platform=linux/amd64 to enable arm64 (Apple Silicon partners currently run under emulation)
  • New POLICYENGINE_US_VERSION build arg: uv sync --frozen from the lock, then override-install the requested version — the same two-step the Modal worker image uses
  • --no-editable, single-layer apt cleanup, and COPY --chown instead of post-copy chown -R (which duplicated the ~500 MB venv into a second layer per published tag)
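
For anyone who wants to script a local build of a specific model version, the invocation implied by these changes might be assembled like this. The helper name `buildx_build_argv` is hypothetical; the flags are standard `docker buildx build` options, and the exact arguments the publish workflow passes are not shown in this description.

```python
def buildx_build_argv(image, us_version, tags, push=False):
    """Return an argv list for a multi-arch build of one model version.

    Hypothetical helper: it only assembles the command and runs nothing.
    """
    argv = [
        "docker", "buildx", "build",
        "--file", "Dockerfile.production",
        # Both architectures, now that the Dockerfile no longer pins amd64.
        "--platform", "linux/amd64,linux/arm64",
        # Selects the policyengine-us version installed over the locked one.
        "--build-arg", f"POLICYENGINE_US_VERSION={us_version}",
    ]
    for tag in tags:
        argv += ["--tag", f"{image}:{tag}"]
    if push:
        argv.append("--push")
    argv.append(".")  # build context: repository root (assumed)
    return argv
```

Passing the result to `subprocess.run(argv, check=True)` would execute the build. Note that a multi-platform result generally has to be pushed rather than loaded into the local image store, so a purely local test may prefer a single `--platform` value.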

Docs

  • New canonical skill docs/engineering/skills/docker-images.md; modal-release-prs.md amended to carve out image publishing as a distribution artifact (deployment stays Modal-only); AGENTS.md/CLAUDE.md adapter pointers; skills README index
  • README quick-run section documents the tag scheme; config/README.md examples moved off the stale :latest
  • New pr-docker-build.yml: build-only PR check, path-filtered to image-affecting files (nothing in PR CI exercised the Dockerfile before)

Rollout notes (post-merge)

  • Before the first publish run repoints latest: preserve today's stale digest under an explicit tag so existing pulls stay addressable:
    docker buildx imagetools create -t ghcr.io/policyengine/policyengine-household-api:us-1.691.1-api0.18.0 ghcr.io/policyengine/policyengine-household-api@sha256:<current latest digest> (today's :latest is a 2026-05-13 build of policyengine_us 1.691.1, API 0.18.0)
  • Validate end-to-end with a manual Publish Docker image dispatch (sync_channel_tags: true) — workflow_run triggers cannot fire until this file is on the default branch
  • Confirm next weekly release publishes us-<new> and repoints channels
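
The registry-side retag used for both the digest preservation and channel promotion can be wrapped in a small helper. This is a sketch: `retag_argv` is a hypothetical name, and the digest in the usage note is a placeholder rather than the real :latest digest.

```python
IMAGE = "ghcr.io/policyengine/policyengine-household-api"


def retag_argv(new_tag, source_ref):
    """Return argv for `docker buildx imagetools create`.

    source_ref is either an existing tag (e.g. "us-1.715.2") or a digest
    ("sha256:..."); a digest is joined with "@", a tag with ":".
    """
    separator = "@" if source_ref.startswith("sha256:") else ":"
    return [
        "docker", "buildx", "imagetools", "create",
        "-t", f"{IMAGE}:{new_tag}",
        f"{IMAGE}{separator}{source_ref}",
    ]
```

For example, `retag_argv("current", "us-1.715.2")` builds the promotion command, while `retag_argv("us-1.691.1-api0.18.0", "sha256:<digest>")` builds the preservation command once the real digest is substituted for the placeholder.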

Test plan

  • make format-check passes
  • Planner smoke-tested against the live gateway and GHCR (release, dispatch-sync, and error modes): plans the 1.726.0 release build, the 1.715.2 backfill, and correct channel repoints
  • Unit tests for plan logic (rotation, backfill, current==frontier collapse, malformed versions, both pin spellings) — collected by make test via .github/scripts
  • PR CI passes (including the new pr-docker-build check, which exercises the Dockerfile changes)
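
As an illustration of the "both pin spellings" and "malformed versions" cases, a pin extractor could look like the sketch below. It assumes the pin is an exact `==` requirement with a three-part version somewhere in pyproject.toml; the function name and regex are hypothetical and may differ from what the planner actually does.

```python
import re

# Accepts policyengine-us or policyengine_us, pinned exactly to X.Y.Z.
# The lookarounds reject longer package names and four-part versions.
_PIN_RE = re.compile(
    r"(?<![\w-])policyengine[-_]us\s*==\s*(\d+\.\d+\.\d+)(?![\w.])"
)


def pinned_us_version(pyproject_text):
    """Return the pinned policyengine-us version, or raise ValueError."""
    match = _PIN_RE.search(pyproject_text)
    if match is None:
        raise ValueError("no exact policyengine-us pin found")
    return match.group(1)
```

Raising on a missing or malformed pin, rather than returning a default, keeps a bad pyproject.toml from silently producing a mis-tagged image.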

🤖 Generated with Claude Code

hua7450 and others added 2 commits June 11, 2026 13:10
Adds a publish workflow that observes completed Release to Modal runs,
builds multi-arch images per exact policyengine-us version, and repoints
current/frontier/latest channel tags from the live gateway manifest.
A POLICYENGINE_US_VERSION build arg lets anyone build arbitrary model
versions locally; a workflow_dispatch path publishes them on demand.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…symmetry

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hua7450 marked this pull request as ready for review June 11, 2026 17:28
hua7450 requested a review from anth-volk June 11, 2026 17:28

anth-volk left a comment

One CI coverage gap from review.

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build image
        uses: docker/build-push-action@v5

This PR build check does not cover the publish-only POLICYENGINE_US_VERSION build-arg path. The publish workflow always passes POLICYENGINE_US_VERSION=${{ matrix.us_version }}, so the Dockerfile override step can fail even while this check stays green. Please add a representative build-args entry here, ideally using the current pinned policyengine_us version, so PR CI exercises the same Dockerfile path used by publishing.

Collaborator Author

Good catch — fixed in 486805d. The check now reads the pinned policyengine-us version from pyproject.toml and passes it as POLICYENGINE_US_VERSION, so PR CI builds through the same Dockerfile override branch the publish workflow always uses (the release build also passes the pin, so this is the representative case). Reading the pin dynamically keeps the weekly bot bumps covered without maintenance. Verified the override path end-to-end locally as well: built images with overrides 1.715.2 and 1.725.0, both served correct version-specific calculations (e.g. the KS CSFP county restriction flips between 1.715.2 and 1.725.0 exactly as the model changelog says).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>