Validate scenario dependency scheduling by fallintoplace · Pull Request #923 · NVIDIA/cloudai

fallintoplace · 2026-06-12T17:24:25Z

Summary

- Validate start-blocking scenario dependencies before parser output is converted into TestRun objects.
- Report the blocking cycle path and flag scenarios with no runnable root tests.
- Add a runner guard for programmatically constructed scenarios that bypass model validation.

Root cause

The scenario model only rejected direct self-dependencies and unknown dependency IDs. Longer start-blocking cycles could parse successfully, leaving the runner with no dependency-free tests to submit and no jobs to monitor.
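
To illustrate the gap, here is a minimal sketch, not the cloudai code: the dictionary and the three helper functions are hypothetical stand-ins for the old checks. A two-test cycle passes both the self-dependency check and the unknown-ID check, yet leaves no dependency-free test to start from.

```python
# Hypothetical, simplified view of a parsed scenario:
# test id -> ids of the tests it must wait for before starting.
deps = {
    "Tests.sleep_a": ["Tests.sleep_b"],
    "Tests.sleep_b": ["Tests.sleep_a"],
}


def self_dependencies(deps):
    """Tests that list themselves as a dependency."""
    return [test for test, waits_on in deps.items() if test in waits_on]


def unknown_dependencies(deps):
    """Dependency ids that do not name any test in the scenario."""
    return sorted({d for waits_on in deps.values() for d in waits_on if d not in deps})


def dependency_free(deps):
    """Tests that could be submitted immediately."""
    return [test for test, waits_on in deps.items() if not waits_on]


print(self_dependencies(deps))     # [] -> the self-dependency check passes
print(unknown_dependencies(deps))  # [] -> the unknown-ID check passes
print(dependency_free(deps))       # [] -> but nothing can ever start
```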

Testing

uv run --extra dev pytest tests/test_test_scenario.py tests/test_base_runner.py
uv run --extra dev pytest tests/test_toml_files.py
uv run --extra dev ruff check src/cloudai/models/scenario.py src/cloudai/_core/base_runner.py tests/test_test_scenario.py tests/test_base_runner.py
uv run --extra dev ruff format --check src/cloudai/models/scenario.py src/cloudai/_core/base_runner.py tests/test_test_scenario.py tests/test_base_runner.py

coderabbitai · 2026-06-12T17:24:37Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 51d36c3c-3ac1-49fd-a237-ec068c34b3ca

📥 Commits

Reviewing files that changed from the base of the PR and between 18ce72e and 109352e.

📒 Files selected for processing (4)

src/cloudai/_core/base_runner.py
src/cloudai/models/scenario.py
tests/test_base_runner.py
tests/test_test_scenario.py

📝 Walkthrough

This PR adds validation to prevent test scenarios with unexecutable start-blocking dependencies. The scenario model detects cycles and unrunnable root tests during validation; the runner enforces at execution time that at least one test can start immediately.
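
A rough sketch of what such an execution-time guard might look like follows. `FakeTestRun`, `runnable_roots`, and `check_has_runnable_root` are hypothetical names for illustration, not the actual `BaseRunner` API; the two dependency type names and the error text are taken from the error message quoted in this PR.

```python
from dataclasses import dataclass, field

# Dependency types that keep a test from starting (names from the PR's error message).
START_BLOCKING = frozenset({"start_post_init", "start_post_comp"})


@dataclass
class FakeTestRun:
    """Hypothetical stand-in for a TestRun; holds only what the guard needs."""

    name: str
    dependencies: dict = field(default_factory=dict)  # dependency type -> test name


def runnable_roots(test_runs):
    """Test runs with no start-blocking dependency; these can start immediately."""
    return [tr for tr in test_runs if not START_BLOCKING.intersection(tr.dependencies)]


def check_has_runnable_root(test_runs):
    """Raise ValueError if tests exist but none of them can start."""
    if test_runs and not runnable_roots(test_runs):
        raise ValueError(
            "No runnable root tests found; at least one test must have no "
            "'start_post_init' or 'start_post_comp' dependencies."
        )


cyclic = [
    FakeTestRun("a", {"start_post_comp": "b"}),
    FakeTestRun("b", {"start_post_init": "a"}),
]
try:
    check_has_runnable_root(cyclic)
except ValueError as exc:
    print(f"rejected: {exc}")

# A scenario with one dependency-free test passes the guard.
check_has_runnable_root(
    [FakeTestRun("root"), FakeTestRun("child", {"start_post_comp": "root"})]
)
```

An empty scenario is deliberately let through here, matching the description that the guard applies only when tests exist.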

Changes

Start-blocking dependency validation

Layer / File(s)	Summary
Scenario model start-blocking dependency validation `src/cloudai/models/scenario.py`, `tests/test_test_scenario.py`	New `_find_dependency_cycle()` helper performs DFS to detect cycles in directed graphs. Self-dependency validator allows start-blocking types but rejects others. New `check_start_blocking_dependencies_are_schedulable()` filters dependencies to start-blocking edges, detects cycles, and rejects scenarios with no runnable root tests. Tests verify cycle detection with and without runnable roots.
Runner execution guard `src/cloudai/_core/base_runner.py`, `tests/test_base_runner.py`	`BaseRunner.run()` raises `ValueError` when the scenario contains tests but every one of them has a start-blocking dependency, so nothing could ever be submitted. A test verifies that the guard prevents execution when all tests depend on each other cyclically.
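
For readers unfamiliar with DFS-based cycle detection, here is a generic sketch of the technique the summary attributes to `_find_dependency_cycle()`. It is not the PR's implementation; the function name, signature, and return shape below are assumptions chosen for illustration.

```python
def find_dependency_cycle(graph):
    """Return one cycle as a list of nodes (start node repeated at the end), or None.

    `graph` maps each node to the nodes it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {node: WHITE for node in graph}
    path = []  # nodes on the current DFS path, in visit order

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_dependency_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))  # ['a', 'b', 'c', 'a']
print(find_dependency_cycle({"a": ["b"], "b": []}))                 # None
```

The recursive form is easy to read but is bounded by Python's recursion limit; an explicit stack would be the safer choice for very long dependency chains.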

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit checked graphs for their flows,
Found cycles where no test could go,
With guards at the gate,
And roots that must wait,
No blocked starts shall make schedules slow! 🌱

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Validate scenario dependency scheduling' clearly and specifically describes the main change: adding validation for scenario dependencies before test scheduling.
Description check	✅ Passed	The description is directly related to the changeset, explaining the validation logic, root cause, and testing performed for scenario dependency scheduling.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


podkidyshev · 2026-06-25T14:03:45Z

@fallintoplace could you please provide a reproducer config (scenario + test(s) tomls)?

fallintoplace · 2026-06-25T14:50:05Z

Thanks, good point. Here is a minimal reproducer using the existing Sleep workload config.

Test TOML (the existing conf/common/test/sleep.toml can be used as-is):

name = "sleep"
description = "sleep test"
test_template_name = "Sleep"

[cmd_args]
seconds = 1

Scenario TOML, for example /tmp/start_blocking_cycle.toml:

name = "start-blocking-cycle-repro"

[[Tests]]
id = "Tests.sleep_a"
test_name = "sleep"
num_nodes = 1
time_limit = "00:01:00"

  [[Tests.dependencies]]
  type = "start_post_comp"
  id = "Tests.sleep_b"

[[Tests]]
id = "Tests.sleep_b"
test_name = "sleep"
num_nodes = 1
time_limit = "00:01:00"

  [[Tests.dependencies]]
  type = "start_post_init"
  id = "Tests.sleep_a"

Run with:

uv run cloudai dry-run \
  --system-config conf/common/system/example_slurm_cluster.toml \
  --tests-dir conf/common/test \
  --test-scenario /tmp/start_blocking_cycle.toml \
  --output-dir /tmp/cloudai-cycle-repro

Before this change, a start-blocking cycle spanning more than one test, like this one, could parse successfully, leaving the runner with no dependency-free tests to submit. With this PR, validation should fail earlier with an error containing:

Start-blocking dependency cycle detected: Tests.sleep_a -> Tests.sleep_b -> Tests.sleep_a.
No runnable root tests found; at least one test must have no 'start_post_init' or 'start_post_comp' dependencies.

Validate scenario dependency scheduling

109352e

fallintoplace requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners June 12, 2026 17:24

Validate scenario dependency scheduling #923
fallintoplace wants to merge 1 commit into NVIDIA:main from fallintoplace:fix/scenario-dependency-validation
