Detect container registry rate limits uniformly by agners · Pull Request #6732 · home-assistant/supervisor

agners · 2026-04-13T18:12:36Z

Proposed change

Container registry rate limits reach Supervisor in three distinct shapes:

1. HTTP 429 from the daemon - currently recognised, but the resulting exception and resolution issue are both hardcoded to "Docker Hub". Since Supervisor/Core/plugin images all live on ghcr.io now, virtually every 429 we see in the field is actually a GHCR throttle that we mislabel. Issue SUPERVISOR-16BK (>115k events, >93k users) is exactly this: the image in the event context is ghcr.io/home-assistant/amd64-hassio-supervisor:latest, yet the user sees a "log into Docker Hub" suggestion.
2. HTTP 500 with toomanyrequests in the body - not currently recognised. Docker daemons before 28.3.0 wrap an upstream 429 into a 500 to the client. This was fixed upstream by moby/moby commit 23fa0ae74a ("Cleanup http status error checks").
3. JSON error event during a streaming pull - not currently recognised. POST /images/create returns 200 OK and streams progress/error events, so rate limits that land during layer download arrive as plain text inside the stream and have no HTTP status to key off of. Happens on all recent daemon versions. Issues SUPERVISOR-13FQ (>16k events) and SUPERVISOR-13E0 (>8k events) are examples.
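The three shapes can be told apart roughly as sketched below. This is a minimal illustration, not the PR's implementation: `is_rate_limit_response` and `is_rate_limit_stream_line` are hypothetical helper names, and the sketch assumes stream errors arrive as JSON lines carrying an `error` string and/or an `errorDetail.message`, as in the Docker Engine pull progress stream.

```python
import json

# Substring the registry puts in its rate-limit error (per the PR text).
RATE_LIMIT_MARKER = "toomanyrequests"


def is_rate_limit_response(status: int, body: str) -> bool:
    """Shapes 1 and 2: a non-streaming HTTP error returned by the daemon."""
    if status == 429:
        # Shape 1: the daemon passes the registry's 429 straight through.
        return True
    # Shape 2: pre-28.3.0 daemons wrap the upstream 429 as a 500.
    return status == 500 and RATE_LIMIT_MARKER in body.lower()


def is_rate_limit_stream_line(line: str) -> bool:
    """Shape 3: an error event inside a 200 OK pull progress stream."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(event, dict):
        return False
    detail = event.get("errorDetail")
    # Prefer the top-level "error" string, fall back to errorDetail.message.
    message = event.get("error") or (
        detail.get("message") if isinstance(detail, dict) else None
    )
    return RATE_LIMIT_MARKER in str(message or "").lower()
```

Only inspecting the body when the status is 500 keeps ordinary daemon 500s out of the rate-limit path, while the stream check has to rely on the message text alone because the HTTP status is already 200.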

Cases 2 and 3 propagate as plain DockerError, bypass the 429 detection in docker/interface.py:install() entirely, never produce a DOCKER_RATELIMIT resolution issue, and generate large amounts of Sentry noise. Case 1 is handled but routes every GHCR 429 through Docker-Hub-specific messaging and suggestions.

This PR addresses all three shapes and splits the registry-specific handling so ghcr.io failures produce a new GITHUB_RATELIMIT issue with appropriate guidance (no misleading Docker Hub login suggestion), while Docker Hub failures keep their existing behaviour.

Summary of the changes:

- New DockerRegistryRateLimitExceeded base exception with DockerHubRateLimitExceeded and a new GithubContainerRegistryRateLimitExceeded as subclasses. All extend APITooManyRequests so callers and future retry logic can key off a single type.
- New GITHUB_RATELIMIT IssueType (no REGISTRY_LOGIN suggestion, since GHCR authentication is different from Docker Hub).
- PullLogEntry.exception now maps stream errors containing toomanyrequests to DockerRegistryRateLimitExceeded (case 3).
- docker/interface.py:install() routes all three cases through a single _registry_rate_limit_exception() helper that picks the right resolution issue, suggestion and exception subclass based on the image's registry.
- utils/sentry.py filters APITooManyRequests (and anything wrapping it via __cause__) in both capture_exception and async_capture_exception. Single policy point, every caller benefits, no per-site changes needed.

Callers (supervisor.update(), plugin manager, HA core update) are intentionally unchanged — UPDATE_FAILED issues still get created alongside the registry-specific rate limit issue, giving users both the cause (rate limit) and the effect (update failed) in the resolution center.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New feature (which adds functionality to the supervisor)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue: Can't run the update #6634, Blocked updates. Many reported on FB #6570
Link to documentation pull request:
Link to cli pull request:
Link to client library pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Ruff (ruff format supervisor tests)
Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Documentation added/updated for developers.home-assistant.io
CLI updated (if necessary)
Client library updated (if necessary)

Container registry rate limits reach Supervisor in three distinct shapes:

1. HTTP 429 from the daemon - recognised today, but the exception and resolution issue are hardcoded to Docker Hub. Since Core/Supervisor/plugin images all live on ghcr.io now, virtually every 429 we see in the field is actually a GHCR throttle that we mislabel. The biggest Sentry issue (SUPERVISOR-16BK) has >115k events / >93k users, all pulling a ghcr.io image, yet each user is told to "log into Docker Hub".
2. HTTP 500 with 'toomanyrequests' in the body - not recognised. Docker daemons before 28.3.0 wrap upstream 429s as 500 (fixed upstream by moby/moby 23fa0ae74a, "Cleanup http status error checks"). The large fleet on older daemons still produces this shape.
3. JSON error event during a streaming pull - not recognised. Once the daemon starts writing the 200 OK response body, the status is locked in, so rate limits that land during layer download arrive as plain text in the pull stream. Happens on all recent daemon versions - SUPERVISOR-13FQ (>16k events) and SUPERVISOR-13E0 (>8k events) are two large examples.

Cases 2 and 3 propagate as plain DockerError, bypass the 429 detection in install() entirely, never produce a DOCKER_RATELIMIT resolution issue, and generate large amounts of Sentry noise. Case 1 is detected but routes every GHCR 429 through Docker-Hub-specific messaging and suggestions.

Changes:
- Add DockerRegistryRateLimitExceeded as the common base class and GithubContainerRegistryRateLimitExceeded alongside the existing DockerHubRateLimitExceeded. All extend APITooManyRequests so callers and retry logic can key off a single type.
- Add GITHUB_RATELIMIT IssueType so GHCR failures don't show the "log in to Docker Hub" suggestion that DOCKER_RATELIMIT carries.
- PullLogEntry.exception now maps stream errors containing 'toomanyrequests' to DockerRegistryRateLimitExceeded (case 3).
- docker/interface.py:install() routes all three cases through a single _registry_rate_limit_exception() helper that picks the right issue type, suggestion and exception subclass based on the image's registry.
- utils/sentry.py filters APITooManyRequests (and anything wrapping it via __cause__) in capture_exception / async_capture_exception. One point of policy, every caller benefits.

Callers (supervisor.update(), plugin manager, homeassistant core) are unchanged - UPDATE_FAILED issues still get created alongside the registry-specific rate limit issue, giving users the full picture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

agners · 2026-04-13T18:14:11Z

Maybe the GITHUB_RATELIMIT issue type isn't all that useful, since it's not actionable 🤔. Maybe we should simply log it (for the task case) or raise an error to the caller if it happens during a Supervisor API request?

mdegat01

Small things. Going to approve so you don't have to wait for another review as I assume the fixes are small (or possibly rejected if the merge I suggested in the second one is impossible). If significant changes result I'll take another look 👍

mdegat01 · 2026-04-16T18:31:14Z

-                    suggestions=[SuggestionType.REGISTRY_LOGIN],
-                )
-                raise DockerHubRateLimitExceeded(_LOGGER.error) from err
+            # Pre-28.3.0 daemons wrap registry rate limits as HTTP 500


So I just checked: we updated HAOS to Docker 28.3.0 in 16.0. That makes 15.2, released April 14th, 2025, the last HAOS version that shipped a pre-28.3.0 Docker. Is that even in our support window anymore? If so, we should note in the comment here the Supervisor version in which this can be dropped.
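If the HTTP 500 fallback ever needs to be tied explicitly to the affected daemons (or flagged for removal), a version gate could look like the sketch below. The PR itself does not do this; `wraps_rate_limit_as_500` and `FIXED_IN` are hypothetical names, and the sketch assumes the daemon reports a version string starting with `MAJOR.MINOR.PATCH` (e.g. `28.3.0`).

```python
import re

# Hypothetical: first daemon release that no longer wraps upstream 429s as 500
# (moby/moby 23fa0ae74a, per the PR description).
FIXED_IN = (28, 3, 0)
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def wraps_rate_limit_as_500(daemon_version: str) -> bool:
    """Return True if the daemon predates the 429 pass-through fix."""
    match = _VERSION_RE.match(daemon_version)
    if match is None:
        # Unparseable version: keep the fallback on rather than miss rate limits.
        return True
    # Compare numerically so that e.g. 28.10.0 sorts after 28.3.0.
    return tuple(int(part) for part in match.groups()) < FIXED_IN
```

Keeping the body check unconditional, as the PR does, is simpler; a gate like this mainly documents when the branch becomes dead code.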

mdegat01 · 2026-04-16T18:37:31Z

+    if _is_rate_limit(err):
+        return


Can we move this to our existing filter method here:

supervisor/supervisor/misc/filter.py

Lines 44 to 50 in a504d85

def filter_data(coresys: CoreSys, event: Event, hint: Hint) -> Event | None:
    """Filter event data before sending to sentry."""
    # Ignore some exceptions
    if "exc_info" in hint:
        _, exc_value, _ = hint["exc_info"]
        if isinstance(exc_value, (AppConfigurationError)):
            return None

Then we can do one isinstance check, like isinstance(exc_value, (AppConfigurationError, APITooManyRequests)), instead of several, since they're kind of expensive. Even if it can't be merged due to the err = err.__cause__ bit, it still feels like our filtering of exception noise should all be in one place. Or is the err = err.__cause__ bit why it can't be merged into there?

home-assistant bot added the cla-signed label Apr 13, 2026

agners requested a review from mdegat01 April 13, 2026 18:14

agners added the new-feature A new feature label Apr 13, 2026

mdegat01 approved these changes Apr 16, 2026

agners wants to merge 1 commit into main from improve-docker-container-registry-toomanyrequests-detection
