Fix elastic manager initialization bug when elastic is disabled #4119

Open: lukebaumann wants to merge 1 commit into main from lukebaumann/refactor-elastic-init-logic-fix


@lukebaumann lukebaumann commented Jun 9, 2026

Collaborator

Description

This PR fixes a bug where the workload fails with AssertionError: assertion elastic_manager is not None when elastic training is disabled but the Pathways backend is used.

Root Cause

In elastic_utils.py, live_devices() checked if the Pathways backend was used, but did not verify if elastic_enabled was True before calling ensure_elastic_manager_initialized() and asserting that elastic_manager is not None. Since elastic_enabled was False, elastic_manager was never initialized, leading to the assertion failure.

Solution

Introduced a helper function should_use_elastic(config) which safely checks if the configuration is present and if elastic training is enabled (using elastic_enabled(config)).
Refactored ensure_elastic_manager_initialized and live_devices to use should_use_elastic(config) instead of partial checks.

Wrap elastic check condition into should_use_elastic and use it in ensure_elastic_manager_initialized and live_devices to avoid crash when elastic_enabled is False but pathways is used.
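
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: only the names `should_use_elastic`, `elastic_enabled`, `ensure_elastic_manager_initialized`, `live_devices`, and `elastic_manager` come from this description. The `Config` class, its fields, the `all_devices` parameter, the function bodies, and the placeholder manager object are all assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    """Hypothetical stand-in for the real config; field names are assumed."""
    elastic_enabled: bool = False
    use_pathways: bool = False


# Module-level singleton, as the description implies. Stays None until
# elastic training is actually enabled.
elastic_manager = None


def elastic_enabled(config: Config) -> bool:
    # Assumed body: the real helper may read the flag differently.
    return bool(config.elastic_enabled)


def should_use_elastic(config: Optional[Config]) -> bool:
    # Single source of truth: a config must exist AND elastic must be on.
    return config is not None and elastic_enabled(config)


def ensure_elastic_manager_initialized(config: Optional[Config]) -> None:
    global elastic_manager
    if should_use_elastic(config) and elastic_manager is None:
        elastic_manager = object()  # placeholder for the real manager


def live_devices(config: Optional[Config], all_devices: List[str]) -> List[str]:
    # Before the fix, only the Pathways backend was checked here, so the
    # assertion below fired when elastic was disabled.
    if not should_use_elastic(config):
        return list(all_devices)
    ensure_elastic_manager_initialized(config)
    assert elastic_manager is not None
    # A real manager would presumably narrow this to live devices.
    return list(all_devices)
```

With this shape, both call sites share one predicate, so the "Pathways on, elastic off" combination takes the early-return path instead of reaching the assertion.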

Tests

Added a unit test test_live_devices_disabled to cover the case where pathways is used but elastic is disabled.
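
For readers without the diff open, a regression test of this kind might look like the sketch below. Only the test name `test_live_devices_disabled` is taken from this PR. `FakeElasticUtils`, the dict-based config, and the `use_pathways` key are hypothetical stand-ins for the real module and config, included so the test is self-contained.

```python
import unittest


class FakeElasticUtils:
    """Hypothetical stand-in for elastic_utils with just enough state to test."""

    def __init__(self):
        self.elastic_manager = None

    def should_use_elastic(self, config):
        return config is not None and bool(config.get("elastic_enabled", False))

    def live_devices(self, config, all_devices):
        # Early return keeps the assertion unreachable when elastic is off.
        if not self.should_use_elastic(config):
            return list(all_devices)
        assert self.elastic_manager is not None
        return list(all_devices)


class LiveDevicesTest(unittest.TestCase):

    def test_live_devices_disabled(self):
        utils = FakeElasticUtils()
        # Pathways in use, elastic disabled: the combination that used to crash.
        config = {"elastic_enabled": False, "use_pathways": True}
        devices = ["dev0", "dev1"]

        self.assertEqual(utils.live_devices(config, devices), devices)
        # The manager must remain uninitialized on this path.
        self.assertIsNone(utils.elastic_manager)
```

The second assertion is the part that pins the fix: it checks that the disabled path never touches the manager, rather than only checking that no exception is raised.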

FIXES: b/521670882
Verified: b/521670882#comment4

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov Bot commented Jun 9, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


Wrap elastic check condition into should_use_elastic and use it in
ensure_elastic_manager_initialized and live_devices to avoid crash when
elastic_enabled is False but pathways is used.

Bug: http://b/521670882
@lukebaumann lukebaumann force-pushed the lukebaumann/refactor-elastic-init-logic-fix branch from 72bed8b to a5233cf Compare June 9, 2026 20:45

3 participants