feat(cli): route live_rl_mode jobs through the DSE handler by rutayan-nv · Pull Request #934 · NVIDIA/cloudai

rutayan-nv · 2026-06-16T14:02:30Z

Issue

An online live-RL run carries no TOML param sweep, so is_dse_job is False and handle_dry_run_and_run sent it down the grid-unrolled non-DSE path — never invoking the agent's run() loop.

Fix

Treat a run as agent-driven when it is a DSE sweep or cmd_args.live_rl_mode is set, and route those to handle_dse_job (which delegates to agent.run()). Plain jobs are unchanged; mixing agent-driven and plain jobs still errors.

Testing

pytest tests/test_handlers.py — new unit test for _is_dse_or_live_rl + 3 routing tests (live-RL→DSE handler, plain→non-DSE, mixed→error). ruff/pyright/vulture/import-linter clean.

Stacked on #932. Extracted from the online-gym piece of #884.

coderabbitai · 2026-06-16T14:02:45Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 30941949-3f92-44e9-a75a-22c0bfef6d55

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds live-RL / online execution mode backed by a GymServer protocol and a GymnasiumAdapter bridging BaseGym to the gymnasium.Env interface. Introduces domain-randomized environment parameters (EnvParamSpec, deterministic sampler, CSV sink, step observer). Adds BaseAgent.run() as the canonical trial loop. Rewrites CLI DSE handler to use agent.run() with structured failure reporting and live-RL job routing.

Changes

Live-RL and Env Params for CloudAI DSE

Layer / File(s)	Summary
Core data model: ContinuousSpace, TestRun extensions, TestDefinition env_params `src/cloudai/_core/action_space.py`, `src/cloudai/_core/test_scenario.py`, `src/cloudai/models/workload.py`	`ContinuousSpace` Pydantic model enforces `low < high`; `TestRun` gains `current_env_params` dict and `increment_step()`; `TestDefinition` gains an `env_params` field mapping names to `EnvParamSpec`.
Env params domain model: spec, sampler, sink, observer `src/cloudai/configurator/env_params.py`	New module: `EnvParamSpec` and `ObsLeafDescriptor` with validation; `StructuredObservation`, `EnvParamsSink`, and `StepObserver` protocols; deterministic per-trial `EnvParamsSampler` seeded by `(seed, name, trial)`; `CsvSink` writing step-aligned `env.csv`; `EnvParamsObserver` calling sampler and sink on `before_step`.
CloudAIGymEnv: GymServer protocol, online mode, env_params cache key `src/cloudai/configurator/cloudai_gym.py`	`GymServer` protocol and `_create_gym_server()` factory with dynamic import and kwarg filtering; `CloudAIGymEnv` wires online/offline mode selection; routes action space, observation space, reset, and step to the server in online mode; adds `_online_step()`; extends `TrajectoryEntry` with `env_params`; requires matching `env_params` in the trajectory cache key; routes `trajectory.csv` to `test_run.output_path` for online runs.
GymnasiumAdapter: BaseGym → gymnasium.Env interface `src/cloudai/configurator/gymnasium_adapter.py`	New class wrapping `BaseGym` with lazy `gymnasium`/`numpy` import; builds `action_space` as `spaces.Dict` splitting params into `Discrete`, continuous `Box`, and fixed; builds `observation_space` as structured `spaces.Dict` or flat `float32 Box`; `decode_action` handles discrete index lookup and continuous clamping/rounding; implements `reset`, `step`, `step_raw`, and observation conversion.
BaseAgent.run() orchestration loop `src/cloudai/configurator/base_agent.py`	New `BaseAgent.run()` method: resets env, loops up to `max_steps` calling `select_action`, calls `env.step`, forwards feedback to `update_policy`, logs per-step and summary metrics.
CLI handlers: DSE/live-RL routing, agent.run() delegation, failure reporting `src/cloudai/cli/handlers.py`	`handle_dse_job` rewritten to delegate to `agent.run()` with recoverable-error accumulation and hard-fail re-raise after reports; `_record_run_failure()` writes `dse_failure.txt`; `generate_reports()` gains optional `error` parameter; `_is_dse_or_live_rl()` classifies runs; `validate_dse_env_params()` rejects `env_params` on non-DSE tests; `handle_dry_run_and_run` dispatch routes live-RL vs plain jobs; validation wired into scenario verification.
Public API exports and gymnasium dependency `pyproject.toml`, `src/cloudai/configurator/__init__.py`, `src/cloudai/core.py`, `src/cloudai/util/lazy_imports.py`	`gymnasium~=1.2` added to `dev` and new `rl` extras; `GymServer`, `GymnasiumAdapter`, `ContinuousSpace`, `ObsLeafDescriptor`, and `StructuredObservation` added to `__all__` across the package; lazy `gymnasium` import property added to `LazyImports`.
Tests `tests/test_action_space.py`, `tests/test_test_scenario.py`, `tests/test_env_params.py`, `tests/test_cloudaigym.py`, `tests/test_gymnasium_adapter_contract.py`, `tests/test_handlers.py`	Full test coverage: `ContinuousSpace` validation; `TestRun.increment_step` monotonicity; `EnvParamSpec`, sampler, `CsvSink`, `EnvParamsObserver`, `ObsLeafDescriptor`; env_params cache-key and `env.csv`/`trajectory.csv` alignment; `FakeGymServer` online mode; `GymnasiumAdapter` contract stubs for monotonic step, contextual obs, continuous/discrete dispatch, and structured obs gating; `handle_dse_job` delegation, error accumulation, abort, and failure-report generation; live-RL routing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 Hoppity-hop through the action space wide,
A GymServer online, with rewards inside!
Env params sampled — deterministic seed,
The adapter wraps gymnasium with speed.
agent.run() loops, the trajectory grows,
And dse_failure.txt documents all woes.
🌿 This bunny approves — the RL garden grows!

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: routing live_rl_mode jobs through the DSE handler instead of the non-DSE path, which is the core fix addressed in this PR.
Description check	✅ Passed	The description clearly explains the issue, fix, and testing approach. It outlines how live-RL runs were incorrectly routed, the solution involving treat live_rl_mode as agent-driven, and references the new unit tests added for validation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

rutayan-nv · 2026-06-16T14:09:10Z

@coderabbitai review

coderabbitai · 2026-06-16T14:09:28Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cloudai/configurator/cloudai_gym.py (2)

386-390: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist env_params with trajectory entries.

TrajectoryEntry.env_params is populated on cache hits and misses, but the CSV artifact drops it. Persisting the field keeps the on-disk trajectory contract consistent with the in-memory cache identity.

Proposed fix

             writer = csv.writer(file)
             if not file_exists:
-                writer.writerow(["step", "action", "reward", "observation"])
-            writer.writerow([entry.step, entry.action, entry.reward, entry.observation])
+                writer.writerow(["step", "action", "reward", "observation", "env_params"])
+            writer.writerow([entry.step, entry.action, entry.reward, entry.observation, entry.env_params])

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 386 - 390, The CSV
writer is currently persisting only the step, action, reward, and observation
fields from the TrajectoryEntry object, but the env_params field is being
populated in memory and should also be written to disk. Add "env_params" to the
CSV header row in the writerow call when file_exists is False, and add
entry.env_params to the data row writerow call to ensure the env_params field is
persisted alongside the other trajectory fields in the CSV artifact.

260-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep env.csv and trajectory.csv aligned on constraint failures.

before_step() has already written env.csv by Line 260, but this early return skips write_trajectory() and after_step(). That leaves an env.csv row with no matching trajectory row whenever a trial is rejected by constraints.

Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 260 - 262, The
constraint check failure at line 260 in the step method causes an early return
that skips both write_trajectory() and after_step() calls, leaving env.csv with
an unmatched row. To fix this, before the early return statement in the
constraint_check condition, ensure that write_trajectory() is called to write a
matching trajectory row, and verify that after_step() is also invoked (or that
its necessary cleanup logic executes) so both CSV files remain synchronized when
constraint failures occur.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The step method has an incorrect type hint for the action parameter.
The type hint `dict[str, int]` is too restrictive because the gymnasium Dict
action space (created in __init__) yields actions containing both Discrete
spaces with numpy int scalars and Box spaces with numpy float32 arrays. Change
the action parameter type hint in the step method from `dict[str, int]` to
`dict[str, Any]` to properly accommodate all possible action values from the
combined action space.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 386-390: The CSV writer is currently persisting only the step,
action, reward, and observation fields from the TrajectoryEntry object, but the
env_params field is being populated in memory and should also be written to
disk. Add "env_params" to the CSV header row in the writerow call when
file_exists is False, and add entry.env_params to the data row writerow call to
ensure the env_params field is persisted alongside the other trajectory fields
in the CSV artifact.
- Around line 260-262: The constraint check failure at line 260 in the step
method causes an early return that skips both write_trajectory() and
after_step() calls, leaving env.csv with an unmatched row. To fix this, before
the early return statement in the constraint_check condition, ensure that
write_trajectory() is called to write a matching trajectory row, and verify that
after_step() is also invoked (or that its necessary cleanup logic executes) so
both CSV files remain synchronized when constraint failures occur.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 54ac0a89-e79c-4c2e-9208-d0149316c2ff

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and 9e8fa29.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
src/cloudai/models/workload.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py
tests/test_test_scenario.py

rutayan-nv · 2026-06-16T15:15:24Z

@coderabbitai review

coderabbitai · 2026-06-16T15:15:34Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

237-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Complete constraint-failed trials through the same trajectory/observer path.

before_step() can sample/persist env params before this branch, but a constraint failure returns without a matching trajectory row or after_step() callback. It also returns a one-element observation even though offline observation shape now follows agent_metrics. Write the failure trajectory with current_env_params, call after_step(), and return an observation with the declared length.

Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0] * max(len(self.test_run.test.agent_metrics), 1)
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 262, The
constraint check failure path is incomplete compared to the cached result path
above it. When the constraint_check fails in the conditional block, you need to
add three things to match the trajectory/observer pattern: (1) write a
TrajectoryEntry to the trajectory using write_trajectory() with the current
step, action, the constraint_failure reward, an observation with the correct
length matching agent_metrics shape, and current_env_params; (2) call
after_step() on each observer in self.observers with the appropriate observation
and reward; (3) fix the return statement to return an observation with the
declared/correct length instead of the hardcoded one-element list [-1.0]. This
ensures constraint failures follow the same trajectory logging and observer
callback path as successful steps and cached results.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 350-352: The conditional at line 351 allows --single-sbatch to
bypass the agent-driven (live-RL) routing check. Currently, any --single-sbatch
scenario goes directly to handle_non_dse_job even if agent_driven test runs are
present. Modify the condition to exclude live-RL scenarios from the
--single-sbatch path by ensuring that if any(agent_driven) is true (indicating
live-RL), the code does not take the handle_non_dse_job branch regardless of the
--single-sbatch flag. This ensures live-RL jobs always route through the correct
path to execute agent.run() as required by the PR objective.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 207-210: The issue is that online trajectory steps use
_online_step_count which resets to 0 on every reset() call, causing trajectory
rows to collapse back to step=1 while TestRun.step remains stale, unlike the
offline path which properly uses TestRun.step. In the _online_step() method,
increment TestRun in the same way the offline path does, and write the TestRun
step value to trajectory rows instead of using _online_step_count, ensuring
consistency between online and offline trajectory tracking. Also remove or stop
resetting _online_step_count in the reset() method since it should no longer be
used.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 63-84: The `GymnasiumAdapter` class currently relies on duck
typing to satisfy the gymnasium interface, but ecosystem tools like Stable
Baselines3 require proper inheritance from `gymnasium.Env` to work correctly.
Add a guarded import at the top of the file that attempts to import
`gymnasium.Env` and falls back to `object` if the import fails (since gymnasium
is an optional dependency), then change the `GymnasiumAdapter` class definition
to inherit from that imported base class instead of just `object`. This
maintains compatibility when gymnasium is not installed while enabling proper
isinstance checks for ecosystem integration.

In `@tests/test_env_params.py`:
- Around line 87-88: The pytest.raises(ValueError) context manager in the
sink.write call is too broad and would pass for any ValueError, not just the
intended validation error for the drop_rate parameter. Add a match= pattern to
the pytest.raises(ValueError) call that specifies the expected error message
text from the drop_rate validation failure. This ensures the test specifically
validates the intended error path rather than catching unrelated ValueErrors.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-262: The constraint check failure path is incomplete compared
to the cached result path above it. When the constraint_check fails in the
conditional block, you need to add three things to match the trajectory/observer
pattern: (1) write a TrajectoryEntry to the trajectory using write_trajectory()
with the current step, action, the constraint_failure reward, an observation
with the correct length matching agent_metrics shape, and current_env_params;
(2) call after_step() on each observer in self.observers with the appropriate
observation and reward; (3) fix the return statement to return an observation
with the declared/correct length instead of the hardcoded one-element list
[-1.0]. This ensures constraint failures follow the same trajectory logging and
observer callback path as successful steps and cached results.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 9e582d9e-80ee-44c5-b97b-bdca22e4a058

📥 Commits

Reviewing files that changed from the base of the PR and between 9e8fa29 and 58acf09.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (15)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
src/cloudai/models/workload.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 374-376: The mixed-input guard that validates agent-driven and
non-agent-driven test runs are not mixed together must be evaluated before the
args.single_sbatch short-circuit in the handler logic. Currently, the guard is
being bypassed when single_sbatch is true, allowing invalid mixed scenarios to
proceed through handle_non_dse_job. Add a guard at the beginning of the routing
logic (before the args.single_sbatch check) that validates the test_scenario
does not contain mixed agent-driven and plain jobs, and raise an appropriate
error if it does. Apply this same guard pattern to the other affected location
mentioned in the comment to ensure consistent enforcement of the mixed-job
contract across all routing paths.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 332-334: In the CloudAIGymEnv class logging statement, replace the
f-string formatting with %-style formatting in the logging.info call. Change the
f-string `f"CloudAIGymEnv [online] step {self._online_step_count}"` to use
%-style formatting by passing the message template as the first argument and the
variable as a subsequent argument to logging.info(). This ensures the string is
only evaluated when the logging level actually permits the message to be logged,
improving performance.

In `@tests/test_handlers.py`:
- Around line 513-529: The _run_routing helper function only tests the
single_sbatch=False path, leaving the single_sbatch=True code path untested and
at risk of regression. Modify the _run_routing function to accept a
single_sbatch parameter with a default value, then pass this parameter to the
argparse.Namespace creation instead of the hardcoded single_sbatch=False value.
Additionally, create or extend test cases in the range of lines 531-551 to call
_run_routing with both single_sbatch=True and single_sbatch=False to ensure both
routing paths are properly tested.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ef8acc19-cbf0-48fb-8ffc-24c4e46da0a0

📥 Commits

Reviewing files that changed from the base of the PR and between 5ed62b4 and d5c55ef.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/util/lazy_imports.py (1)
2-2: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Copyright header year must be updated to pass CI checks.

The copyright year is "2025" but tests expect "2025-2026". Per repository conventions, files touched in consecutive years should use a hyphen range.
📅 Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/util/lazy_imports.py` at line 2, The copyright header in the file
has the year listed as "2025" but the CI tests expect "2025-2026" per repository
conventions for files modified across consecutive years. Update the copyright
year in the header comment from "2025" to "2025-2026" to match the expected
format and pass the CI checks.
Source: Learnings

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/util/lazy_imports.py`:
- Line 2: The copyright header in the file has the year listed as "2025" but the
CI tests expect "2025-2026" per repository conventions for files modified across
consecutive years. Update the copyright year in the header comment from "2025"
to "2025-2026" to match the expected format and pass the CI checks.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ea876b83-809d-4c4f-86ec-fa3a9f8c427e

📥 Commits

Reviewing files that changed from the base of the PR and between d5c55ef and 0b28eb9.

📒 Files selected for processing (8)

src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
src/cloudai/util/lazy_imports.py
tests/test_cloudaigym.py
tests/test_handlers.py

rutayan-nv · 2026-06-17T18:36:50Z

Design note (carried from a CodeRabbit finding on base_agent.py, #933): BaseAgent.run() ignores the done flag from env.step() and terminates only via select_action() -> None + max_steps.

In offline DSE this is intentional — a constraint failure returns done=True with a penalty and the sweep should continue. In online/live-RL mode, however, GymServer.step() (via CloudAIGymEnv._online_step, cloudai_gym.py:314) can return a genuine terminal done=True, which run() currently steps past without ending or resetting the episode.

Flagging for a decision on this PR: should live-RL honor server-emitted done (end/reset the episode), or is stepping to max_steps the intended behavior? No code change made.

Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The sink is now stateless: write() takes the record path per call and skips empty samples, so non-DR runs write nothing and write_trajectory needs no branch. Derive both records from a new iteration_dir property and expose the env record via the env_params_record_path property (was _env_csv_path), keeping env.csv and trajectory.csv step-aligned without coupling the name to CSV.

…ty flag Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params capability flag (opt-in, defaults False). Only agents whose search consumes per-trial env_params sampling set it True; enumerating/surrogate agents leave it False, so a config that declares env_params for an agent that would ignore them is rejected up front instead of silently no-op'ing. New agents answer for themselves with no string to maintain. Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py next to the logic it guards, looking the agent up via the Registry. Unknown agents are deferred to the dedicated agent-resolution error rather than masked here. Keep all public-facing comments, docstrings, and the error message generic (no internal agent names). Cover the full validator matrix, including the unknown-agent deferral.

Compress multi-line inline comments down to the single non-obvious rationale (or drop them where the code already speaks), per the self-documenting-code principle. Public API docstrings and test intent comments are left intact.

apply_params_set overlays sampled scalar draws onto cmd_args, then reconstructs the TestDefinition to validate the applied action values. That pass re-ran validate_env_params, which rejects a weighted env_param whose cmd_args target is no longer a candidate list - exactly what the overlay produces. env_params is already validated at parse time, so drop it from the validation-only dump. Adds a regression test covering a weighted env_param's scalar draw.

rutayan-nv · 2026-06-25T21:27:57Z

Parking this PR for now. Per a change in release plan, this work isn't needed for the upcoming aiconfigurator release, so I'm moving it to draft to keep it out of the review queue. @podkidyshev — no need to review this one yet; I'll mark it ready again when it's scheduled for release. Thanks!

An env_params entry only reclassifies a list-valued cmd_args sweep as env-sampled; a scalar is already fixed, so annotating it is a meaningless label. Previously such an annotation was tolerated as a silent no-op, which let it slip through parse-time validation and inconsistently trip (or not) the downstream "no agent will sample them" check depending on run mode. Reject it where the contract lives - TestDefinition.validate_env_params - so the failure is immediate and mode-independent. EnvParams.from_test's non-list guard becomes defensive (parse-time now guarantees lists); the post-overlay path already drops env_params before re-validating, so concrete scalar draws are unaffected. Extract the per-field checks into a helper to keep the validator under the complexity limit, and update tests: scalar annotations now assert rejection instead of no-op tolerance.

…otocol Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width dim, or "discrete" of size n) and a StructuredObservation Protocol that documents the optional env hooks structured_observation_descriptors() and encode_observation(). These let an env expose a named, per-leaf observation so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium spaces.Dict; the hooks are duck-typed, so envs need not subclass. Both exported via cloudai.core.

…ejection tests Negative tests pass an extra kwarg and an out-of-Literal kind to assert ValidationError; mark the deliberate type violations with type: ignore.

Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of Discrete (list params) and Box (ContinuousSpace) actions over the tunable params with fixed (single-value) params injected each step; observations as either a flat float32 Box or, when the env opts in via the structured-obs hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous dtype="int" params are quantized (rounded/clamped) at decode_action so the trajectory cache key collapses float jitter. The adapter is a pure pass-through over test_run.step (never mutates it), so contextual-bandit rollouts that reset() per trial keep a monotonic trial index. gymnasium is an optional dependency lazy-imported behind the new [rl] extra (also added to dev); CloudAIGymEnv.define_observation_space() now returns one slot per agent metric so adapters get the right Box shape. Exported via cloudai.core. Caller-contract tests pin the step-monotonicity, observation pass-through, continuous-quantization, and structured-obs invariants.

…ature step() delegates to decode_action(dict[str, Any]) and exists precisely to round float/continuous policy actions to ints; widen its parameter type from dict[str, int] to dict[str, Any] to match.

…reserve traceback on DSE re-raise - _as_obs_array(): assert encoded keys match descriptors before coercion (reuses _assert_keys, same guard as decode_action/step_raw) and materialize output by descriptor keys to avoid KeyError on extra keys and silent partial observations on missing keys. - handlers.py: re-raise the captured hard-fail with its original traceback. Addresses CodeRabbit findings on NVIDIA#930.

…ngleton Replace the bespoke _import_gymnasium() in-method seam with the canonical lazy.gymnasium / lazy.np properties; addresses the in-method-import review concern. No behavior change — gymnasium stays an optional [rl] extra.

The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types instead of Any, which surfaced two latent issues the scoped pre-commit run missed: - pyright now sees Space[Any]/Dict in the adapter contract test, so concrete attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local bindings + isinstance before access. - lazy_imports.py now has a 2026 commit in its history, so the ci_only copyright check requires the year range 2025-2026.

Inherit from gymnasium.Env (guarded import, falls back to object when the optional [rl] extra is absent) so ecosystem tooling that performs isinstance checks (e.g. Stable-Baselines3) accepts the adapter. - Use the TYPE_CHECKING import form so pyright sees a concrete base class while runtime keeps the optional-dependency fallback. - Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012). - Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's Env.unwrapped (returns self) is the correct base-env semantics, and the old override returned a non-Env (BaseGym), which would mislead ecosystem code calling .unwrapped.

Inheriting gymnasium.Env widened the static type of action_space from the concrete spaces.Dict the adapter builds to the base spaces.Space, which has no __getitem__. Cast at the test call site to restore subspace indexing for pyright (runtime is unchanged; action_space is always a Dict).

…rse of decode_action decode_action had no public inverse, so consumers needing value->index encoding (e.g. RLlib warm-start / behavioral cloning) reached into the private _tunable_params dict. When ContinuousSpace support split that internal, those consumers broke with AttributeError. encode_action closes the contract: discrete values map to their candidate index, continuous values wrap into the clamped float32 Box array, so decode_action(encode_action(v)) == v for any native v. Adds round-trip contract tests pinning the invariant and rejection of non-candidate values / key mismatches.

…tinuousSpace The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which ships separately. Until then nothing constructs a ContinuousSpace, so the continuous branches here are unreachable and the only effect of the import is an ImportError at module load. Drop the continuous import, _continuous_params, the Box action mapping, and decode/encode continuous handling so the adapter builds and ships standalone over discrete + structured-observation support. The continuous support rejoins when ContinuousSpace lands.

Adds an optional online (live-RL) execution mode: instead of launching a workload through the runner, the env delegates each trial to an in-process GymServer. Mode is selected via an injected server or cmd_args.live_rl_mode (server built from cmd_args.env_class); the agent-facing interface is unchanged. Default runner-backed behavior is untouched.

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

Online (live-RL) steps wrote trajectory rows from a per-instance counter that reset to 0 on every reset(). Under reset-per-episode Gymnasium/RLlib rollouts that collapsed online trajectory.csv rows back to step=1 each episode. Drive online steps from the monotonic test_run.step (incremented per step, never reset) so rows keep increasing across resets. Removes the now-redundant _online_step_count; render() and the trajectory writer both read test_run.step. Adds a regression test asserting steps stay 1,2,3 across a mid-rollout reset.

env_params_record_path was pinned to iteration_dir (runner.scenario_root), but online (live-RL) runs write trajectory.csv under test_run.output_path and have no runner-backed scenario_root. write_trajectory evaluates the record path unconditionally, so an online step hit scenario_root on a runner that does not expose it. Derive the path from trajectory_file_path so env.csv always sits next to trajectory.csv in both modes, with no extra branch in the write path.

An online live-RL run carries no TOML sweep (is_dse_job is False) but must still drive the agent's own run() loop. handle_dry_run_and_run now treats a run as agent-driven when it is a DSE sweep OR cmd_args.live_rl_mode is set, routing it to handle_dse_job instead of the grid-unrolled non-DSE path.

…s a valid config Locks in the contract that an online live-RL run sources its action space from the GymServer, not a TOML param space. A config with only env_params (no list-valued cmd_args/extra_env_vars/num_nodes) is therefore complete and valid: is_dse_job is False and param_space is empty, yet CloudAIGymEnv still builds the action space from the server and honors env_params (EnvParamsObserver is built).

single_sbatch grid-unrolls DSE param spaces in one allocation, but SingleSbatchRunner has no live-RL path, so a live_rl_mode job under --single-sbatch would silently run once as a static job instead of driving the in-process GymServer loop. Hard-error on that combination. This is a narrow stopgap; the broader single_sbatch-vs-agent-driven routing rework (mixed-job guard ordering, per-agent-type semantics) is tracked in NVIDIA#937 and referenced from a TODO at the guard. Extract routing into _dispatch_agent_driven_run to keep handle_dry_run_and_run under the complexity limit.

The live-RL-only test still asserted the removed observer pattern and declared env_params with no candidate list, which the first-class EnvParams model (values live in cmd_args) no longer builds. Give ball_speed a candidate list and assert env.params is built; it stays excluded from the action space, so is_dse_job/param_space are unchanged.

rutayan-nv requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners June 16, 2026 14:02

This was referenced Jun 16, 2026

More Gym enhancements #884

Draft

Gym enhancements #863

Draft

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from 9e8fa29 to 58acf09 Compare June 16, 2026 15:06

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/cloudai/configurator/cloudai_gym.py

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated

Comment thread tests/test_env_params.py Outdated

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch 2 times, most recently from 5ed62b4 to d5c55ef Compare June 16, 2026 21:17

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/cloudai/cli/handlers.py Outdated

Comment thread src/cloudai/configurator/cloudai_gym.py

Comment thread tests/test_handlers.py Outdated

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from d5c55ef to 0b28eb9 Compare June 16, 2026 21:48

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from 0b28eb9 to b21a3fe Compare June 16, 2026 22:15

rutayan-nv mentioned this pull request Jun 16, 2026

[CLI/Core] Polymorphic agent dispatch + TestRun owns trial counter #933

Merged

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch 2 times, most recently from f8d6141 to 7a1c2ca Compare June 17, 2026 18:33

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch 7 times, most recently from 48871be to e657665 Compare June 21, 2026 15:23

rutayan-nv added 4 commits June 25, 2026 15:50

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from 3ac3d1a to eca78d4 Compare June 25, 2026 21:15

rutayan-nv marked this pull request as draft June 25, 2026 21:27

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from eca78d4 to fba8a72 Compare June 25, 2026 22:17

rutayan-nv added 21 commits June 26, 2026 11:12

test(env-params): suppress pyright on intentional ObsLeafDescriptor r…

584f145

…ejection tests Negative tests pass an extra kwarg and an out-of-Literal kind to assert ValidationError; mark the deliberate type violations with type: ignore.

fix(gymnasium-adapter): accept continuous actions in step() type sign…

6c2452f

…ature step() delegates to decode_action(dict[str, Any]) and exists precisely to round float/continuous policy actions to ints; widen its parameter type from dict[str, int] to dict[str, Any] to match.

style(configurator): ruff format gymnasium_adapter.py

2e2d431

fix(cloudai-gym): validate env_class is a dotted import path before r…

24de162

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

rutayan-nv force-pushed the rpatro/gym-live-rl-routing branch from fba8a72 to 523a431 Compare June 26, 2026 15:41

Uh oh!

Conversation

rutayan-nv commented Jun 16, 2026

Issue

Fix

Testing

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

rutayan-nv commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rutayan-nv commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

rutayan-nv commented Jun 17, 2026

Uh oh!

rutayan-nv commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading