fix(docker-build): route BuildKit pulls through the NVIDIA Docker Hub mirror #48

Merged
nvjaxzin merged 1 commit into main from fix/buildkit-registry-mirror
May 26, 2026

nvjaxzin commented May 26, 2026

Summary

Configures docker/setup-buildx-action to read /etc/buildkit/buildkitd.toml, the file the nv-gha-runners already pre-populate. That config routes BuildKit's docker.io pulls through dockerhub.nvidia.com (NVIDIA's Artifactory pull-through cache) instead of going straight to Docker Hub.

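Without this, BuildKit on the self-hosted runners ignores the daemon-level mirror, because the builder that setup-buildx-action creates runs its own BuildKit daemon and does not read the Docker daemon's registry-mirror setting. It then pulls anonymously from Docker Hub, which hits the unauthenticated rate limit and breaks Docker builds across DSX repos.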
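
For orientation, the routing that the runner-provided file performs can be sketched with the action's `buildkitd-config-inline` input. The TOML below is an assumption about what /etc/buildkit/buildkitd.toml contains, based on BuildKit's standard registry-mirror syntax. The real file on the runners is not reproduced in this PR and may differ.

```yaml
# Sketch only: the inline TOML is an assumed equivalent of the
# runner-provided /etc/buildkit/buildkitd.toml, not a copy of it.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    buildkitd-config-inline: |
      [registry."docker.io"]
        mirrors = ["dockerhub.nvidia.com"]
```

The PR itself points `buildkitd-config` at the file path rather than inlining the TOML, which keeps the mirror address under the platform team's control instead of duplicating it in each workflow.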

Reference: nvbug 6225636.

Changes

  • .github/actions/docker-build/action.yml — the composite action consumed by downstream DSX repos. This is the load-bearing change: bumping the consumed tag is the only thing downstream repos need to do.
  • .github/workflows/build-cds-containers.yml — this repo's own container build workflow gets the same fix.
  • .github/actions/security-container-scan/README.md — example snippets now show the BuildKit config step so adopters copy the correct pattern.

The reusable workflow .github/workflows/docker-build.yml is covered transitively because it delegates to the composite action above.

Not modified (BuildKit not involved): docker pull / docker build call sites that go through the Docker daemon directly. The nv-gha-runners Docker daemon is already configured with dockerhub.nvidia.com as a registry mirror at the host level.
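
As a rough illustration of where the change sits in a composite action, a trimmed-down manifest might look like the sketch below. The input names (`context`, `tags`) and the build step are hypothetical and are not taken from the actual action.yml. Only the `buildkitd-config` line reflects this PR.

```yaml
# Hypothetical, trimmed-down composite action; input names are illustrative.
name: docker-build
description: Build a container image with BuildKit
inputs:
  context:
    description: Docker build context
    default: .
  tags:
    description: Image tags to apply
    required: true
runs:
  using: composite
  steps:
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      with:
        # Pre-populated on nv-gha-runners; routes docker.io pulls via the mirror.
        buildkitd-config: /etc/buildkit/buildkitd.toml
    - name: Build image
      uses: docker/build-push-action@v6
      with:
        context: ${{ inputs.context }}
        tags: ${{ inputs.tags }}
        push: false
```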

Why this is the right fix

This matches the NVIDIA GHA platform team's documented best practice: https://docs.gha-runners.nvidia.com/platform/best-practices/#use-docker-cache-for-buildkit

The toml file is already on every nv-gha-runners runner image, so there's nothing to deploy on the infra side — this is purely a workflow-level opt-in.
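
If the same workflow ever had to run on a runner without the file (for example a GitHub-hosted runner), one possible guard is sketched below. This is not part of the PR. The step id and output name are hypothetical, and the sketch assumes the action treats an empty `buildkitd-config` as unset, which should be checked against the pinned action version.

```yaml
# Sketch of an optional guard; not part of this PR.
- name: Locate BuildKit mirror config
  id: buildkit-config   # hypothetical step id
  shell: bash
  run: |
    if [ -f /etc/buildkit/buildkitd.toml ]; then
      echo "path=/etc/buildkit/buildkitd.toml" >> "$GITHUB_OUTPUT"
    else
      # Empty value: assumed to make the action fall back to its defaults.
      echo "path=" >> "$GITHUB_OUTPUT"
    fi
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    buildkitd-config: ${{ steps.buildkit-config.outputs.path }}
```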

Test plan

Static validation performed locally:

  • python -m yaml parses all three edited files cleanly.
  • actionlint v1.7.7 exits clean (-no-color -oneline) on build-cds-containers.yml and docker-build.yml. (actionlint does not lint composite-action manifests; action.yml schema is validated by GHA at action invocation.)

Live validation (requires runners; needs a vetter to allow copy-pr-bot):

  • This PR's own CI run. build-cds-containers.yml matches the path filter on .github/workflows/build-cds-containers.yml and cds-containers/**. Once a vetter approves the copy-pr-bot mirror, the workflow should run on pull-request/** and exercise the new buildkitd-config: end-to-end.
  • Downstream consumer smoke test tracked separately on an internal PR (linked from nvbug 6225636).
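
The trigger described in the first bullet could look roughly like the sketch below. It assumes the workflow runs on pushes to the pull-request/** branches that copy-pr-bot creates. The actual `on:` block in build-cds-containers.yml is not shown in this PR and may differ.

```yaml
# Assumed shape of the trigger; the real workflow may differ.
on:
  push:
    branches:
      - "pull-request/**"
    paths:
      - ".github/workflows/build-cds-containers.yml"
      - "cds-containers/**"
```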

Rollout

After merge, cut a new tag of this action (next semver, likely v1.17.0). Downstream consumers pick it up via a one-line bump of the uses: SHA/tag — no per-repo workflow surgery required.
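
A downstream bump might then look like the following sketch. `OWNER/REPO` and the runner label are placeholders, and v1.17.0 is only the tag this description calls likely, not a confirmed release.

```yaml
jobs:
  build:
    runs-on: self-hosted   # placeholder; use the repo's existing nv-gha-runners label
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        # OWNER/REPO is a placeholder for this repository; only the ref changes.
        uses: OWNER/REPO/.github/actions/docker-build@v1.17.0
        # Existing `with:` inputs stay exactly as they are.
```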

cc @huaweic-nv @mmou-nv @abegnoche @lachen-nv

Configure docker/setup-buildx-action with a buildkitd config that the
nv-gha-runners pre-populate at /etc/buildkit/buildkitd.toml. That file
routes BuildKit's docker.io pulls through dockerhub.nvidia.com, NVIDIA's
Artifactory pull-through cache, instead of going straight to Docker Hub.

Without this, BuildKit on nv-gha-runners ignores the daemon-level mirror
and pulls anonymously from Docker Hub, which hits the unauthenticated
rate limit and breaks Docker builds across DSX repos (nvbug 6225636).

Three call sites are updated:

- .github/actions/docker-build/action.yml — the composite action used by
  every consumer (e.g. dsx-exchange). Bumping the consuming repos to the
  next tag of this action is all they need to do.
- .github/workflows/build-cds-containers.yml — this repo's own image
  build workflow.
- .github/actions/security-container-scan/README.md — example snippets
  now show the BuildKit config step so adopters copy the right pattern.

Follows the documented NVIDIA GHA platform best practice:
https://docs.gha-runners.nvidia.com/platform/best-practices/#use-docker-cache-for-buildkit

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Brian R. Jackson <brijackson@nvidia.com>

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot commented May 26, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@nvjaxzin

I have read the DCO Document and I hereby sign the DCO

github-actions Bot added a commit that referenced this pull request May 26, 2026

Comment on lines +42 to +46

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      with:
        buildkitd-config: /etc/buildkit/buildkitd.toml

Example code, but this is intentional to make sure the example is complete and correct.

Comment on lines +76 to +80

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      with:
        buildkitd-config: /etc/buildkit/buildkitd.toml

Example code, but this is intentional to make sure the example is complete and correct.

@nvjaxzin

/ok to test

nvjaxzin merged commit 07b465c into main May 26, 2026
2 of 3 checks passed
github-actions Bot locked and limited conversation to collaborators May 26, 2026
nvjaxzin deleted the fix/buildkit-registry-mirror branch May 26, 2026 22:19