Add Unix socket support for Core communication with feature flag#6742

Open
agners wants to merge 2 commits into main from add-unix-socket-support-with-feature-flag

@agners (Member) commented Apr 16, 2026

Proposed change

Reintroduce Unix socket support for Supervisor-to-Core communication, originally merged in #6590 and reverted in #6735. The key addition over the original PR is that the feature is now gated behind the unix_socket_core_api feature flag (introduced in #6719) and disabled by default.

When the flag is enabled and the Core version supports it, Supervisor communicates with Core via a Unix socket at /run/os/core.sock instead of TCP. This removes the need for access token authentication on that path, because Core authenticates the peer by the socket connection itself.
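The gating described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Transport` and `select_transport` are hypothetical names, and the minimum Core version is passed in as a parameter because this description does not state it.

```python
from enum import Enum


class Transport(Enum):
    """Hypothetical enum naming the two transports discussed in this PR."""

    TCP = "tcp"
    UNIX = "unix"


def select_transport(
    flag_enabled: bool,
    core_version: tuple[int, int, int],
    min_unix_version: tuple[int, int, int],
) -> Transport:
    """Pick the Unix socket only if the flag is on AND Core is new enough."""
    if flag_enabled and core_version >= min_unix_version:
        return Transport.UNIX
    # Default path: TCP with access token authentication.
    return Transport.TCP
```

With the flag disabled (the default), the function always returns `Transport.TCP`, which matches the "disabled by default" behavior stated above.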

TCP path improvements (active by default, no feature flag needed)

The original PR also refactored the TCP communication path. Hence, the following improvements apply regardless of the feature flag:

  • Centralized WebSocket connection logic: APIProxy and HomeAssistantWebSocket no longer implement their own auth handshakes. Both delegate to api.connect_websocket(), eliminating duplicate auth/retry logic.
  • WSClient.connect_with_auth() proper error handling: The old version didn't close the websocket on failure or handle unexpected message types. It now closes the connection on any error and wraps unexpected exceptions in HomeAssistantAPIError.
  • make_request() early bail on stopped container: Checks is_running() before attempting a request, giving a clear error instead of a cryptic connection failure.
  • get_core_state() response validation: Now raises HomeAssistantAPIError if the response is None or not a dict (matching the validation get_config() already had).
  • "Connected to Core via TCP" log: Logs transport info on first successful connection and re-logs after container restarts, improving observability.
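Two of the items above (closing the websocket on any handshake failure, and validating the state response) can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `FakeWebSocket` is a hypothetical in-memory stand-in, `HomeAssistantAPIError` is redefined locally so the block runs on its own, and the functions take simplified arguments. The message types follow the authentication phase of the Home Assistant WebSocket API.

```python
import asyncio
from typing import Any


class HomeAssistantAPIError(Exception):
    """Local stand-in for the error type named in this PR."""


class FakeWebSocket:
    """Hypothetical in-memory websocket, for illustration only."""

    def __init__(self, incoming: list[Any]) -> None:
        self._incoming = list(incoming)
        self.sent: list[Any] = []
        self.closed = False

    async def receive_json(self) -> Any:
        return self._incoming.pop(0)  # IndexError if the peer sent nothing

    async def send_json(self, data: Any) -> None:
        self.sent.append(data)

    async def close(self) -> None:
        self.closed = True


async def connect_with_auth(ws: FakeWebSocket, token: str) -> FakeWebSocket:
    """Run the token handshake; close the socket on any failure."""
    try:
        hello = await ws.receive_json()
        if not isinstance(hello, dict) or hello.get("type") != "auth_required":
            raise HomeAssistantAPIError(f"Unexpected message: {hello!r}")
        await ws.send_json({"type": "auth", "access_token": token})
        reply = await ws.receive_json()
        if not isinstance(reply, dict) or reply.get("type") != "auth_ok":
            raise HomeAssistantAPIError(f"Authentication failed: {reply!r}")
    except HomeAssistantAPIError:
        await ws.close()
        raise
    except Exception as err:
        # Anything unexpected is wrapped so callers handle one error type.
        await ws.close()
        raise HomeAssistantAPIError(f"Handshake error: {err!r}") from err
    return ws


def validate_core_state(response: Any) -> dict:
    """Reject None or any other non-dict response."""
    if not isinstance(response, dict):
        raise HomeAssistantAPIError(f"Invalid state response: {response!r}")
    return response
```

The point of the two `except` branches is that the caller never receives a half-open socket: every failure path closes the connection before the error propagates.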

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:
  • Link to client library pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Reintroduce Unix socket support for Supervisor-to-Core communication
(reverted in #6735) with the addition of a feature flag gate. The
feature is now controlled by the `core_unix_socket` feature flag and
disabled by default.

When enabled and Core version supports it, Supervisor communicates with
Core via a Unix socket at /run/os/core.sock instead of TCP. This
eliminates the need for access token authentication on the socket path,
as Core authenticates the peer by the socket connection itself.

Key changes:
- Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature
- HomeAssistantAPI: transport-aware session/url/websocket management
- WSClient: separate connect() (Unix, no auth) and connect_with_auth()
  (TCP) class methods with proper error handling
- APIProxy delegates websocket setup to api.connect_websocket()
- Container state tracking for Unix session lifecycle
- CI builder mounts /run/supervisor for integration tests
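
One way to picture the "container state tracking for Unix session lifecycle" item is a session cache that is tied to a single container run. The sketch below is hypothetical and not taken from this commit: `ContainerBoundSession`, the `factory`/`closer` callables, and the `start_marker` value (anything that changes when the container restarts) are all assumptions made for illustration.

```python
from typing import Callable, Generic, Optional, TypeVar

S = TypeVar("S")


class ContainerBoundSession(Generic[S]):
    """Hypothetical cache that keeps one session per container run."""

    def __init__(self, factory: Callable[[], S], closer: Callable[[S], None]) -> None:
        self._factory = factory
        self._closer = closer
        self._session: Optional[S] = None
        self._start_marker: Optional[str] = None

    def get(self, start_marker: str) -> S:
        """Return the cached session, replacing it after a container restart."""
        if self._session is not None and start_marker != self._start_marker:
            # The container restarted: drop the session bound to the old run.
            self._closer(self._session)
            self._session = None
        if self._session is None:
            self._session = self._factory()
            self._start_marker = start_marker
        return self._session
```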

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mdegat01 (Contributor) left a comment

Looks good. It's largely the same code, so we can probably expedite it, especially since the bulk is now behind a feature flag. I had one small comment, but it seems ready to me.

If we start to build a bunch of these feature flags, we may have to think of a documentation plan for them. For now, though, I would just recommend a comment somewhere to remind us that once this is turned on, you must specifically call `ha core rebuild` to see it in action. A Supervisor restart or even a host reboot won't turn it on by itself, since the container needs to be rebuilt with the new config.
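
As a rough illustration of that reminder, a check along these lines could surface the rebuild requirement. This is a hypothetical sketch, not something in the PR: `rebuild_hint` and its `created_with_flag` parameter (whether the existing Core container was created while the flag was enabled) are invented names.

```python
from typing import Optional


def rebuild_hint(flag_enabled: bool, created_with_flag: bool) -> Optional[str]:
    """Return a reminder when the flag is on but the container predates it."""
    if flag_enabled and not created_with_flag:
        # A Supervisor restart or host reboot keeps the old container config.
        return "Feature flag enabled; run `ha core rebuild` to apply it."
    return None
```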

url: str,
token: str,
*,
max_msg_size: int = 4 * 1024 * 1024,

Does this need to be an argument? I know it already was one, but do we currently change the default anywhere? This feels like an anti-pattern, similar to how ruff now flags arguments named something like timeout: such values should be set at the object level rather than by clients on a per-call basis.
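
A minimal sketch of the alternative, assuming nothing overrides the default: the limit becomes a class-level constant instead of a per-call keyword argument. `SketchWSClient` and `connect_options` are hypothetical names used only for illustration; they are not the `WSClient` API from this PR.

```python
class SketchWSClient:
    """Hypothetical client; not the WSClient from this PR."""

    # Fixed once for every connection instead of being a per-call argument.
    MAX_MSG_SIZE: int = 4 * 1024 * 1024

    def __init__(self, url: str, token: str) -> None:
        self.url = url
        self.token = token

    def connect_options(self) -> dict[str, object]:
        """Options a real implementation might hand to its websocket library."""
        return {"url": self.url, "max_msg_size": self.MAX_MSG_SIZE}
```

If a caller ever did need a different limit, a subclass or a constructor parameter would make that choice explicit at the object level.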
