Skip to content

feat(cli): route live_rl_mode jobs through the DSE handler#934

Draft
rutayan-nv wants to merge 30 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/gym-live-rl-routing
Draft

feat(cli): route live_rl_mode jobs through the DSE handler#934
rutayan-nv wants to merge 30 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/gym-live-rl-routing

Conversation

@rutayan-nv

Copy link
Copy Markdown
Contributor

Issue

An online live-RL run carries no TOML param sweep, so is_dse_job is False and handle_dry_run_and_run sent it down the grid-unrolled non-DSE path — never invoking the agent's run() loop.

Fix

Treat a run as agent-driven when it is a DSE sweep or cmd_args.live_rl_mode is set, and route those to handle_dse_job (which delegates to agent.run()). Plain jobs are unchanged; mixing agent-driven and plain jobs still errors.

Testing

pytest tests/test_handlers.py — new unit test for _is_dse_or_live_rl + 3 routing tests (live-RL→DSE handler, plain→non-DSE, mixed→error). ruff/pyright/vulture/import-linter clean.

Stacked on #932. Extracted from the online-gym piece of #884.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 30941949-3f92-44e9-a75a-22c0bfef6d55

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds live-RL / online execution mode backed by a GymServer protocol and a GymnasiumAdapter bridging BaseGym to the gymnasium.Env interface. Introduces domain-randomized environment parameters (EnvParamSpec, deterministic sampler, CSV sink, step observer). Adds BaseAgent.run() as the canonical trial loop. Rewrites CLI DSE handler to use agent.run() with structured failure reporting and live-RL job routing.

Changes

Live-RL and Env Params for CloudAI DSE

Layer / File(s) Summary
Core data model: ContinuousSpace, TestRun extensions, TestDefinition env_params
src/cloudai/_core/action_space.py, src/cloudai/_core/test_scenario.py, src/cloudai/models/workload.py
ContinuousSpace Pydantic model enforces low < high; TestRun gains current_env_params dict and increment_step(); TestDefinition gains an env_params field mapping names to EnvParamSpec.
Env params domain model: spec, sampler, sink, observer
src/cloudai/configurator/env_params.py
New module: EnvParamSpec and ObsLeafDescriptor with validation; StructuredObservation, EnvParamsSink, and StepObserver protocols; deterministic per-trial EnvParamsSampler seeded by (seed, name, trial); CsvSink writing step-aligned env.csv; EnvParamsObserver calling sampler and sink on before_step.
CloudAIGymEnv: GymServer protocol, online mode, env_params cache key
src/cloudai/configurator/cloudai_gym.py
GymServer protocol and _create_gym_server() factory with dynamic import and kwarg filtering; CloudAIGymEnv wires online/offline mode selection; routes action space, observation space, reset, and step to the server in online mode; adds _online_step(); extends TrajectoryEntry with env_params; requires matching env_params in the trajectory cache key; routes trajectory.csv to test_run.output_path for online runs.
GymnasiumAdapter: BaseGym → gymnasium.Env interface
src/cloudai/configurator/gymnasium_adapter.py
New class wrapping BaseGym with lazy gymnasium/numpy import; builds action_space as spaces.Dict splitting params into Discrete, continuous Box, and fixed; builds observation_space as structured spaces.Dict or flat float32 Box; decode_action handles discrete index lookup and continuous clamping/rounding; implements reset, step, step_raw, and observation conversion.
BaseAgent.run() orchestration loop
src/cloudai/configurator/base_agent.py
New BaseAgent.run() method: resets env, loops up to max_steps calling select_action, calls env.step, forwards feedback to update_policy, logs per-step and summary metrics.
CLI handlers: DSE/live-RL routing, agent.run() delegation, failure reporting
src/cloudai/cli/handlers.py
handle_dse_job rewritten to delegate to agent.run() with recoverable-error accumulation and hard-fail re-raise after reports; _record_run_failure() writes dse_failure.txt; generate_reports() gains optional error parameter; _is_dse_or_live_rl() classifies runs; validate_dse_env_params() rejects env_params on non-DSE tests; handle_dry_run_and_run dispatch routes live-RL vs plain jobs; validation wired into scenario verification.
Public API exports and gymnasium dependency
pyproject.toml, src/cloudai/configurator/__init__.py, src/cloudai/core.py, src/cloudai/util/lazy_imports.py
gymnasium~=1.2 added to dev and new rl extras; GymServer, GymnasiumAdapter, ContinuousSpace, ObsLeafDescriptor, and StructuredObservation added to __all__ across the package; lazy gymnasium import property added to LazyImports.
Tests
tests/test_action_space.py, tests/test_test_scenario.py, tests/test_env_params.py, tests/test_cloudaigym.py, tests/test_gymnasium_adapter_contract.py, tests/test_handlers.py
Full test coverage: ContinuousSpace validation; TestRun.increment_step monotonicity; EnvParamSpec, sampler, CsvSink, EnvParamsObserver, ObsLeafDescriptor; env_params cache-key and env.csv/trajectory.csv alignment; FakeGymServer online mode; GymnasiumAdapter contract stubs for monotonic step, contextual obs, continuous/discrete dispatch, and structured obs gating; handle_dse_job delegation, error accumulation, abort, and failure-report generation; live-RL routing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 Hoppity-hop through the action space wide,
A GymServer online, with rewards inside!
Env params sampled — deterministic seed,
The adapter wraps gymnasium with speed.
agent.run() loops, the trajectory grows,
And dse_failure.txt documents all woes.
🌿 This bunny approves — the RL garden grows!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: routing live_rl_mode jobs through the DSE handler instead of the non-DSE path, which is the core fix addressed in this PR.
Description check ✅ Passed The description clearly explains the issue, fix, and testing approach. It outlines how live-RL runs were incorrectly routed, the solution involving treat live_rl_mode as agent-driven, and references the new unit tests added for validation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

This was referenced Jun 16, 2026
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (2)

386-390: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist env_params with trajectory entries.

TrajectoryEntry.env_params is populated on cache hits and misses, but the CSV artifact drops it. Persisting the field keeps the on-disk trajectory contract consistent with the in-memory cache identity.

Proposed fix
             writer = csv.writer(file)
             if not file_exists:
-                writer.writerow(["step", "action", "reward", "observation"])
-            writer.writerow([entry.step, entry.action, entry.reward, entry.observation])
+                writer.writerow(["step", "action", "reward", "observation", "env_params"])
+            writer.writerow([entry.step, entry.action, entry.reward, entry.observation, entry.env_params])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 386 - 390, The CSV
writer is currently persisting only the step, action, reward, and observation
fields from the TrajectoryEntry object, but the env_params field is being
populated in memory and should also be written to disk. Add "env_params" to the
CSV header row in the writerow call when file_exists is False, and add
entry.env_params to the data row writerow call to ensure the env_params field is
persisted alongside the other trajectory fields in the CSV artifact.

260-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep env.csv and trajectory.csv aligned on constraint failures.

before_step() has already written env.csv by Line 260, but this early return skips write_trajectory() and after_step(). That leaves an env.csv row with no matching trajectory row whenever a trial is rejected by constraints.

Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 260 - 262, The
constraint check failure at line 260 in the step method causes an early return
that skips both write_trajectory() and after_step() calls, leaving env.csv with
an unmatched row. To fix this, before the early return statement in the
constraint_check condition, ensure that write_trajectory() is called to write a
matching trajectory row, and verify that after_step() is also invoked (or that
its necessary cleanup logic executes) so both CSV files remain synchronized when
constraint failures occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The step method has an incorrect type hint for the action parameter.
The type hint `dict[str, int]` is too restrictive because the gymnasium Dict
action space (created in __init__) yields actions containing both Discrete
spaces with numpy int scalars and Box spaces with numpy float32 arrays. Change
the action parameter type hint in the step method from `dict[str, int]` to
`dict[str, Any]` to properly accommodate all possible action values from the
combined action space.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 386-390: The CSV writer is currently persisting only the step,
action, reward, and observation fields from the TrajectoryEntry object, but the
env_params field is being populated in memory and should also be written to
disk. Add "env_params" to the CSV header row in the writerow call when
file_exists is False, and add entry.env_params to the data row writerow call to
ensure the env_params field is persisted alongside the other trajectory fields
in the CSV artifact.
- Around line 260-262: The constraint check failure at line 260 in the step
method causes an early return that skips both write_trajectory() and
after_step() calls, leaving env.csv with an unmatched row. To fix this, before
the early return statement in the constraint_check condition, ensure that
write_trajectory() is called to write a matching trajectory row, and verify that
after_step() is also invoked (or that its necessary cleanup logic executes) so
both CSV files remain synchronized when constraint failures occur.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 54ac0a89-e79c-4c2e-9208-d0149316c2ff

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and 9e8fa29.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from 9e8fa29 to 58acf09 Compare June 16, 2026 15:06
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

237-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Complete constraint-failed trials through the same trajectory/observer path.

before_step() can sample/persist env params before this branch, but a constraint failure returns without a matching trajectory row or after_step() callback. It also returns a one-element observation even though offline observation shape now follows agent_metrics. Write the failure trajectory with current_env_params, call after_step(), and return an observation with the declared length.

Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0] * max(len(self.test_run.test.agent_metrics), 1)
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 262, The
constraint check failure path is incomplete compared to the cached result path
above it. When the constraint_check fails in the conditional block, you need to
add three things to match the trajectory/observer pattern: (1) write a
TrajectoryEntry to the trajectory using write_trajectory() with the current
step, action, the constraint_failure reward, an observation with the correct
length matching agent_metrics shape, and current_env_params; (2) call
after_step() on each observer in self.observers with the appropriate observation
and reward; (3) fix the return statement to return an observation with the
declared/correct length instead of the hardcoded one-element list [-1.0]. This
ensures constraint failures follow the same trajectory logging and observer
callback path as successful steps and cached results.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 350-352: The conditional at line 351 allows --single-sbatch to
bypass the agent-driven (live-RL) routing check. Currently, any --single-sbatch
scenario goes directly to handle_non_dse_job even if agent_driven test runs are
present. Modify the condition to exclude live-RL scenarios from the
--single-sbatch path by ensuring that if any(agent_driven) is true (indicating
live-RL), the code does not take the handle_non_dse_job branch regardless of the
--single-sbatch flag. This ensures live-RL jobs always route through the correct
path to execute agent.run() as required by the PR objective.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 207-210: The issue is that online trajectory steps use
_online_step_count which resets to 0 on every reset() call, causing trajectory
rows to collapse back to step=1 while TestRun.step remains stale, unlike the
offline path which properly uses TestRun.step. In the _online_step() method,
increment TestRun in the same way the offline path does, and write the TestRun
step value to trajectory rows instead of using _online_step_count, ensuring
consistency between online and offline trajectory tracking. Also remove or stop
resetting _online_step_count in the reset() method since it should no longer be
used.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 63-84: The `GymnasiumAdapter` class currently relies on duck
typing to satisfy the gymnasium interface, but ecosystem tools like Stable
Baselines3 require proper inheritance from `gymnasium.Env` to work correctly.
Add a guarded import at the top of the file that attempts to import
`gymnasium.Env` and falls back to `object` if the import fails (since gymnasium
is an optional dependency), then change the `GymnasiumAdapter` class definition
to inherit from that imported base class instead of just `object`. This
maintains compatibility when gymnasium is not installed while enabling proper
isinstance checks for ecosystem integration.

In `@tests/test_env_params.py`:
- Around line 87-88: The pytest.raises(ValueError) context manager in the
sink.write call is too broad and would pass for any ValueError, not just the
intended validation error for the drop_rate parameter. Add a match= pattern to
the pytest.raises(ValueError) call that specifies the expected error message
text from the drop_rate validation failure. This ensures the test specifically
validates the intended error path rather than catching unrelated ValueErrors.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-262: The constraint check failure path is incomplete compared
to the cached result path above it. When the constraint_check fails in the
conditional block, you need to add three things to match the trajectory/observer
pattern: (1) write a TrajectoryEntry to the trajectory using write_trajectory()
with the current step, action, the constraint_failure reward, an observation
with the correct length matching agent_metrics shape, and current_env_params;
(2) call after_step() on each observer in self.observers with the appropriate
observation and reward; (3) fix the return statement to return an observation
with the declared/correct length instead of the hardcoded one-element list
[-1.0]. This ensures constraint failures follow the same trajectory logging and
observer callback path as successful steps and cached results.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 9e582d9e-80ee-44c5-b97b-bdca22e4a058

📥 Commits

Reviewing files that changed from the base of the PR and between 9e8fa29 and 58acf09.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py

Comment thread src/cloudai/configurator/cloudai_gym.py
Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
Comment thread tests/test_env_params.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch 2 times, most recently from 5ed62b4 to d5c55ef Compare June 16, 2026 21:17

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 374-376: The mixed-input guard that validates agent-driven and
non-agent-driven test runs are not mixed together must be evaluated before the
args.single_sbatch short-circuit in the handler logic. Currently, the guard is
being bypassed when single_sbatch is true, allowing invalid mixed scenarios to
proceed through handle_non_dse_job. Add a guard at the beginning of the routing
logic (before the args.single_sbatch check) that validates the test_scenario
does not contain mixed agent-driven and plain jobs, and raise an appropriate
error if it does. Apply this same guard pattern to the other affected location
mentioned in the comment to ensure consistent enforcement of the mixed-job
contract across all routing paths.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 332-334: In the CloudAIGymEnv class logging statement, replace the
f-string formatting with %-style formatting in the logging.info call. Change the
f-string `f"CloudAIGymEnv [online] step {self._online_step_count}"` to use
%-style formatting by passing the message template as the first argument and the
variable as a subsequent argument to logging.info(). This ensures the string is
only evaluated when the logging level actually permits the message to be logged,
improving performance.

In `@tests/test_handlers.py`:
- Around line 513-529: The _run_routing helper function only tests the
single_sbatch=False path, leaving the single_sbatch=True code path untested and
at risk of regression. Modify the _run_routing function to accept a
single_sbatch parameter with a default value, then pass this parameter to the
argparse.Namespace creation instead of the hardcoded single_sbatch=False value.
Additionally, create or extend test cases in the range of lines 531-551 to call
_run_routing with both single_sbatch=True and single_sbatch=False to ensure both
routing paths are properly tested.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ef8acc19-cbf0-48fb-8ffc-24c4e46da0a0

📥 Commits

Reviewing files that changed from the base of the PR and between 5ed62b4 and d5c55ef.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py

Comment thread src/cloudai/cli/handlers.py Outdated
Comment thread src/cloudai/configurator/cloudai_gym.py
Comment thread tests/test_handlers.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from d5c55ef to 0b28eb9 Compare June 16, 2026 21:48

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/util/lazy_imports.py (1)

2-2: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Copyright header year must be updated to pass CI checks.

The copyright year is "2025" but tests expect "2025-2026". Per repository conventions, files touched in consecutive years should use a hyphen range.

📅 Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/util/lazy_imports.py` at line 2, The copyright header in the file
has the year listed as "2025" but the CI tests expect "2025-2026" per repository
conventions for files modified across consecutive years. Update the copyright
year in the header comment from "2025" to "2025-2026" to match the expected
format and pass the CI checks.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/util/lazy_imports.py`:
- Line 2: The copyright header in the file has the year listed as "2025" but the
CI tests expect "2025-2026" per repository conventions for files modified across
consecutive years. Update the copyright year in the header comment from "2025"
to "2025-2026" to match the expected format and pass the CI checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ea876b83-809d-4c4f-86ec-fa3a9f8c427e

📥 Commits

Reviewing files that changed from the base of the PR and between d5c55ef and 0b28eb9.

📒 Files selected for processing (8)
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/util/lazy_imports.py
  • tests/test_cloudaigym.py
  • tests/test_handlers.py

@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from 0b28eb9 to b21a3fe Compare June 16, 2026 22:15
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch 2 times, most recently from f8d6141 to 7a1c2ca Compare June 17, 2026 18:33
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

Design note (carried from a CodeRabbit finding on base_agent.py, #933): BaseAgent.run() ignores the done flag from env.step() and terminates only via select_action() -> None + max_steps.

In offline DSE this is intentional — a constraint failure returns done=True with a penalty and the sweep should continue. In online/live-RL mode, however, GymServer.step() (via CloudAIGymEnv._online_step, cloudai_gym.py:314) can return a genuine terminal done=True, which run() currently steps past without ending or resetting the episode.

Flagging for a decision on this PR: should live-RL honor server-emitted done (end/reset the episode), or is stepping to max_steps the intended behavior? No code change made.

@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch 7 times, most recently from 48871be to e657665 Compare June 21, 2026 15:23
Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a
single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The
sink is now stateless: write() takes the record path per call and skips empty
samples, so non-DR runs write nothing and write_trajectory needs no branch.

Derive both records from a new iteration_dir property and expose the env
record via the env_params_record_path property (was _env_csv_path), keeping
env.csv and trajectory.csv step-aligned without coupling the name to CSV.
…ty flag

Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params
capability flag (opt-in, defaults False). Only agents whose search consumes per-trial
env_params sampling set it True; enumerating/surrogate agents leave it False, so a config
that declares env_params for an agent that would ignore them is rejected up front instead
of silently no-op'ing. New agents answer for themselves with no string to maintain.

Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py
next to the logic it guards, looking the agent up via the Registry. Unknown agents are
deferred to the dedicated agent-resolution error rather than masked here.

Keep all public-facing comments, docstrings, and the error message generic (no internal
agent names). Cover the full validator matrix, including the unknown-agent deferral.
Compress multi-line inline comments down to the single non-obvious rationale (or drop
them where the code already speaks), per the self-documenting-code principle. Public API
docstrings and test intent comments are left intact.
apply_params_set overlays sampled scalar draws onto cmd_args, then
reconstructs the TestDefinition to validate the applied action values.
That pass re-ran validate_env_params, which rejects a weighted env_param
whose cmd_args target is no longer a candidate list - exactly what the
overlay produces. env_params is already validated at parse time, so drop
it from the validation-only dump. Adds a regression test covering a
weighted env_param's scalar draw.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from 3ac3d1a to eca78d4 Compare June 25, 2026 21:15
@rutayan-nv rutayan-nv marked this pull request as draft June 25, 2026 21:27
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

Parking this PR for now. Per a change in release plan, this work isn't needed for the upcoming aiconfigurator release, so I'm moving it to draft to keep it out of the review queue. @podkidyshev — no need to review this one yet; I'll mark it ready again when it's scheduled for release. Thanks!

@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from eca78d4 to fba8a72 Compare June 25, 2026 22:17
An env_params entry only reclassifies a list-valued cmd_args sweep as
env-sampled; a scalar is already fixed, so annotating it is a meaningless
label. Previously such an annotation was tolerated as a silent no-op, which
let it slip through parse-time validation and inconsistently trip (or not)
the downstream "no agent will sample them" check depending on run mode.

Reject it where the contract lives - TestDefinition.validate_env_params -
so the failure is immediate and mode-independent. EnvParams.from_test's
non-list guard becomes defensive (parse-time now guarantees lists); the
post-overlay path already drops env_params before re-validating, so concrete
scalar draws are unaffected.

Extract the per-field checks into a helper to keep the validator under the
complexity limit, and update tests: scalar annotations now assert rejection
instead of no-op tolerance.
…otocol

Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width
dim, or "discrete" of size n) and a StructuredObservation Protocol that
documents the optional env hooks structured_observation_descriptors() and
encode_observation(). These let an env expose a named, per-leaf observation
so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium
spaces.Dict; the hooks are duck-typed, so envs need not subclass.

Both exported via cloudai.core.
…ejection tests

Negative tests pass an extra kwarg and an out-of-Literal kind to assert
ValidationError; mark the deliberate type violations with type: ignore.
Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of
Discrete (list params) and Box (ContinuousSpace) actions over the tunable
params with fixed (single-value) params injected each step; observations as
either a flat float32 Box or, when the env opts in via the structured-obs
hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous
dtype="int" params are quantized (rounded/clamped) at decode_action so the
trajectory cache key collapses float jitter. The adapter is a pure
pass-through over test_run.step (never mutates it), so contextual-bandit
rollouts that reset() per trial keep a monotonic trial index.

gymnasium is an optional dependency lazy-imported behind the new [rl] extra
(also added to dev); CloudAIGymEnv.define_observation_space() now returns one
slot per agent metric so adapters get the right Box shape. Exported via
cloudai.core. Caller-contract tests pin the step-monotonicity, observation
pass-through, continuous-quantization, and structured-obs invariants.
…ature

step() delegates to decode_action(dict[str, Any]) and exists precisely to
round float/continuous policy actions to ints; widen its parameter type
from dict[str, int] to dict[str, Any] to match.
…reserve traceback on DSE re-raise

- _as_obs_array(): assert encoded keys match descriptors before coercion
  (reuses _assert_keys, same guard as decode_action/step_raw) and
  materialize output by descriptor keys to avoid KeyError on extra keys
  and silent partial observations on missing keys.
- handlers.py: re-raise the captured hard-fail with its original traceback.

Addresses CodeRabbit findings on NVIDIA#930.
…ngleton

Replace the bespoke _import_gymnasium() in-method seam with the canonical
lazy.gymnasium / lazy.np properties; addresses the in-method-import review
concern. No behavior change — gymnasium stays an optional [rl] extra.
The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types
instead of Any, which surfaced two latent issues the scoped pre-commit run
missed:

- pyright now sees Space[Any]/Dict in the adapter contract test, so concrete
  attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local
  bindings + isinstance before access.
- lazy_imports.py now has a 2026 commit in its history, so the ci_only
  copyright check requires the year range 2025-2026.
Inherit from gymnasium.Env (guarded import, falls back to object when the
optional [rl] extra is absent) so ecosystem tooling that performs isinstance
checks (e.g. Stable-Baselines3) accepts the adapter.

- Use the TYPE_CHECKING import form so pyright sees a concrete base class
  while runtime keeps the optional-dependency fallback.
- Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012).
- Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's
  Env.unwrapped (returns self) is the correct base-env semantics, and the
  old override returned a non-Env (BaseGym), which would mislead ecosystem
  code calling .unwrapped.
Inheriting gymnasium.Env widened the static type of action_space from the
concrete spaces.Dict the adapter builds to the base spaces.Space, which has
no __getitem__. Cast at the test call site to restore subspace indexing for
pyright (runtime is unchanged; action_space is always a Dict).
…rse of decode_action

decode_action had no public inverse, so consumers needing value->index
encoding (e.g. RLlib warm-start / behavioral cloning) reached into the
private _tunable_params dict. When ContinuousSpace support split that
internal, those consumers broke with AttributeError.

encode_action closes the contract: discrete values map to their candidate
index, continuous values wrap into the clamped float32 Box array, so
decode_action(encode_action(v)) == v for any native v. Adds round-trip
contract tests pinning the invariant and rejection of non-candidate
values / key mismatches.
…tinuousSpace

The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which
ships separately. Until then nothing constructs a ContinuousSpace, so the
continuous branches here are unreachable and the only effect of the import is an
ImportError at module load. Drop the continuous import, _continuous_params, the
Box action mapping, and decode/encode continuous handling so the adapter builds
and ships standalone over discrete + structured-observation support. The
continuous support rejoins when ContinuousSpace lands.
Adds an optional online (live-RL) execution mode: instead of launching a
workload through the runner, the env delegates each trial to an in-process
GymServer. Mode is selected via an injected server or cmd_args.live_rl_mode
(server built from cmd_args.env_class); the agent-facing interface is
unchanged. Default runner-backed behavior is untouched.
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
Online (live-RL) steps wrote trajectory rows from a per-instance counter
that reset to 0 on every reset(). Under reset-per-episode Gymnasium/RLlib
rollouts that collapsed online trajectory.csv rows back to step=1 each
episode. Drive online steps from the monotonic test_run.step (incremented
per step, never reset) so rows keep increasing across resets.

Removes the now-redundant _online_step_count; render() and the trajectory
writer both read test_run.step. Adds a regression test asserting steps stay
1,2,3 across a mid-rollout reset.
env_params_record_path was pinned to iteration_dir (runner.scenario_root),
but online (live-RL) runs write trajectory.csv under test_run.output_path
and have no runner-backed scenario_root. write_trajectory evaluates the
record path unconditionally, so an online step hit scenario_root on a
runner that does not expose it. Derive the path from trajectory_file_path
so env.csv always sits next to trajectory.csv in both modes, with no extra
branch in the write path.
An online live-RL run carries no TOML sweep (is_dse_job is False) but must still
drive the agent's own run() loop. handle_dry_run_and_run now treats a run as
agent-driven when it is a DSE sweep OR cmd_args.live_rl_mode is set, routing it to
handle_dse_job instead of the grid-unrolled non-DSE path.
…s a valid config

Locks in the contract that an online live-RL run sources its action space from the
GymServer, not a TOML param space. A config with only env_params (no list-valued
cmd_args/extra_env_vars/num_nodes) is therefore complete and valid: is_dse_job is
False and param_space is empty, yet CloudAIGymEnv still builds the action space from
the server and honors env_params (EnvParamsObserver is built).
single_sbatch grid-unrolls DSE param spaces in one allocation, but
SingleSbatchRunner has no live-RL path, so a live_rl_mode job under
--single-sbatch would silently run once as a static job instead of
driving the in-process GymServer loop. Hard-error on that combination.

This is a narrow stopgap; the broader single_sbatch-vs-agent-driven
routing rework (mixed-job guard ordering, per-agent-type semantics) is
tracked in NVIDIA#937 and referenced
from a TODO at the guard. Extract routing into _dispatch_agent_driven_run
to keep handle_dry_run_and_run under the complexity limit.
The live-RL-only test still asserted the removed observer pattern and
declared env_params with no candidate list, which the first-class
EnvParams model (values live in cmd_args) no longer builds. Give
ball_speed a candidate list and assert env.params is built; it stays
excluded from the action space, so is_dse_job/param_space are unchanged.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from fba8a72 to 523a431 Compare June 26, 2026 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant