Validate scenario dependency scheduling #923
Walkthrough
This PR adds validation to prevent test scenarios with unexecutable start-blocking dependencies. The scenario model detects cycles and unrunnable root tests during validation; the runner enforces at execution time that at least one test can start immediately.
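The runner-side check described in the walkthrough could look roughly like the sketch below. It is not the code from this PR: the function names and the `deps` mapping are hypothetical, and the set of start-blocking types is an assumption taken from the two dependency types that appear in the reproducer in this thread.

```python
from typing import Dict, List, Tuple

# Assumption: these two dependency types (seen in the reproducer) delay a test's start.
START_BLOCKING = {"start_post_comp", "start_post_init"}

# Hypothetical shape: test id -> list of (dependency type, dependency id).
Deps = Dict[str, List[Tuple[str, str]]]


def immediately_startable(deps: Deps) -> List[str]:
    """Return the IDs of tests that have no start-blocking dependency."""
    return [
        test_id
        for test_id, test_deps in deps.items()
        if not any(dep_type in START_BLOCKING for dep_type, _ in test_deps)
    ]


def check_can_start(deps: Deps) -> List[str]:
    """Raise if no test can be submitted right away, else return the ready tests."""
    ready = immediately_startable(deps)
    if not ready:
        raise ValueError("no test can start immediately")
    return ready
```

Failing fast here means a scenario that slipped past validation still produces an explicit error instead of a run with nothing to submit.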
@fallintoplace could you please provide a reproducer config (scenario + test(s) tomls)?
Thanks, good point. Here is a minimal reproducer using the existing Sleep workload config.

Test TOML (this can use the existing Sleep test config):

```toml
name = "sleep"
description = "sleep test"
test_template_name = "Sleep"

[cmd_args]
seconds = 1
```

Scenario TOML, for example:

```toml
name = "start-blocking-cycle-repro"

[[Tests]]
id = "Tests.sleep_a"
test_name = "sleep"
num_nodes = 1
time_limit = "00:01:00"

[[Tests.dependencies]]
type = "start_post_comp"
id = "Tests.sleep_b"

[[Tests]]
id = "Tests.sleep_b"
test_name = "sleep"
num_nodes = 1
time_limit = "00:01:00"

[[Tests.dependencies]]
type = "start_post_init"
id = "Tests.sleep_a"
```

Run with:

```shell
uv run cloudai dry-run \
  --system-config conf/common/system/example_slurm_cluster.toml \
  --tests-dir conf/common/test \
  --test-scenario /tmp/start_blocking_cycle.toml \
  --output-dir /tmp/cloudai-cycle-repro
```

Before this change, this kind of longer start-blocking cycle could parse successfully, leaving the runner with no dependency-free tests to submit. With this PR, validation should fail earlier with an error containing:
Summary
Root cause
The scenario model only rejected direct self-dependencies and unknown dependency IDs. Longer start-blocking cycles could parse successfully, leaving the runner with no dependency-free tests to submit and no jobs to monitor.
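To illustrate the kind of check that catches longer cycles, here is a minimal depth-first search sketch. It is not the implementation from this PR: `find_start_blocking_cycle` and the `deps` mapping are hypothetical names, and treating only `start_post_comp` and `start_post_init` as start-blocking is an assumption based on the reproducer.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: these two dependency types (seen in the reproducer) delay a test's start.
START_BLOCKING = {"start_post_comp", "start_post_init"}


def find_start_blocking_cycle(
    deps: Dict[str, List[Tuple[str, str]]],
) -> Optional[List[str]]:
    """Return one cycle as a list of test IDs, or None if no cycle exists.

    deps maps a test id to its (dependency type, dependency id) pairs.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {test_id: unvisited for test_id in deps}
    path: List[str] = []

    def visit(test_id: str) -> Optional[List[str]]:
        state[test_id] = in_progress
        path.append(test_id)
        for dep_type, dep_id in deps[test_id]:
            # Skip non-blocking edges and unknown IDs (those are reported separately).
            if dep_type not in START_BLOCKING or dep_id not in deps:
                continue
            if state[dep_id] == in_progress:
                # Back edge: the cycle is the path from dep_id to here, closed.
                return path[path.index(dep_id):] + [dep_id]
            if state[dep_id] == unvisited:
                cycle = visit(dep_id)
                if cycle:
                    return cycle
        path.pop()
        state[test_id] = done
        return None

    for test_id in deps:
        if state[test_id] == unvisited:
            cycle = visit(test_id)
            if cycle:
                return cycle
    return None


# The two-test scenario from the reproducer in this thread.
repro = {
    "Tests.sleep_a": [("start_post_comp", "Tests.sleep_b")],
    "Tests.sleep_b": [("start_post_init", "Tests.sleep_a")],
}
cycle = find_start_blocking_cycle(repro)
```

For the reproducer scenario, `cycle` lists `Tests.sleep_a`, `Tests.sleep_b`, and `Tests.sleep_a` again, which is enough to build a readable validation error.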
Testing