feat(configurator): add online GymServer mode to CloudAIGymEnv by rutayan-nv · Pull Request #932 · NVIDIA/cloudai

rutayan-nv · 2026-06-16T13:28:57Z

Issue

CloudAIGymEnv can only run trials by launching workloads through the runner — there is no path for fast, in-process online / live-RL training (e.g. kvpilot).

Fix

Add an optional online mode: the env delegates reset/step/action+observation spaces to an in-process GymServer instead of the runner. Mode is selected by an injected server or cmd_args.live_rl_mode=true (server built from cmd_args.env_class); the agent-facing interface and default runner-backed behavior are unchanged.

Testing

pytest tests/test_cloudaigym.py — existing offline tests stay green; 6 new tests cover delegation, trajectory output, env_class import/kwarg-filtering, missing-env_class error, and live_rl_mode auto-detection. ruff/pyright/vulture/import-linter clean.

Stacked on #930 (review after it merges; the diff narrows to this commit once parents land). Extracted from the genuinely-unique online-gym piece of #884.

coderabbitai · 2026-06-16T13:30:45Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 68808524-0a9e-4cc3-b4da-5a95fc8e1ab0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a live-RL execution path to CloudAI DSE: a ContinuousSpace Pydantic model for continuous action parameters, an env-params domain-randomization system (EnvParamSpec, EnvParamsSampler, CsvSink, EnvParamsObserver), an online GymServer protocol integrated into CloudAIGymEnv with (action, env_params)-keyed trajectory caching, a GymnasiumAdapter wrapping BaseGym to the gymnasium.Env interface, a concrete BaseAgent.run() loop, and a refactored DSE CLI handler that delegates to agent.run() with structured failure artifact reporting.

Changes

Live-RL and Gymnasium Integration

Layer / File(s)	Summary
Action-space contracts, TestRun extensions, and workload env_params `pyproject.toml`, `src/cloudai/_core/action_space.py`, `src/cloudai/_core/test_scenario.py`, `src/cloudai/models/workload.py`, `tests/test_action_space.py`, `tests/test_test_scenario.py`	`ContinuousSpace` Pydantic model with `low`/`high`/`dtype` and bound validation is introduced; `TestRun` gains `current_env_params` field and `increment_step()`; `TestDefinition` gains `env_params: dict[str, EnvParamSpec]`; `gymnasium~=1.2` is added as a dev and `rl` optional dependency.
Env-params sampling, CSV persistence, and step observers `src/cloudai/configurator/env_params.py`, `tests/test_env_params.py`	New `env_params.py` introduces `EnvParamSpec` and `ObsLeafDescriptor` schemas, `StructuredObservation` protocol, deterministic `EnvParamsSampler` keyed by `(seed, name, trial)`, `CsvSink` for step-aligned `env.csv` writes, and `EnvParamsObserver` that samples and stores env params before each trial step.
CloudAIGymEnv: online GymServer mode and env_params trajectory caching `src/cloudai/configurator/cloudai_gym.py`, `tests/test_cloudaigym.py`	Adds `GymServer` Protocol and `_create_gym_server` factory; extends `CloudAIGymEnv` to accept an optional server and auto-detect `live_rl_mode`; wires `EnvParamsObserver` via `_build_observers()`; routes `reset()`/`step()` through online (`_online_step`) vs offline paths; extends `TrajectoryEntry` with `env_params`; changes cache hits to require exact `(action, env_params)` match; adds integration tests for env-params cache semantics and online Gym-server delegation.
GymnasiumAdapter: BaseGym to gymnasium.Env interface `src/cloudai/configurator/gymnasium_adapter.py`, `tests/test_gymnasium_adapter_contract.py`	New `GymnasiumAdapter` wraps `BaseGym` with lazy `gymnasium` import; builds `action_space` as `spaces.Dict` (discrete→`Discrete`, `ContinuousSpace`→`Box`, single-element fixed injection); builds `observation_space` from `StructuredObservation` descriptors or flat `float32 Box`; implements `decode_action()`, `reset()`, `step()`, `step_raw()` returning gymnasium 5-tuples with dtype-aware observation coercion.
BaseAgent.run() loop and DSE handler refactor with failure reporting `src/cloudai/configurator/base_agent.py`, `src/cloudai/cli/handlers.py`, `tests/test_handlers.py`	`BaseAgent.run()` drives the `reset→select_action→env.step→update_policy` trial loop; `handle_dse_job` is refactored to call `agent.run()` per `TestRun`, accumulate non-zero return codes, and wrap in try/except that writes `dse_failure.txt` via `_record_run_failure` before re-raising hard failures.
Public API exports wiring `src/cloudai/configurator/__init__.py`, `src/cloudai/core.py`	`configurator/__init__.py` exports `GymServer` and `GymnasiumAdapter`; `core.py` adds `ContinuousSpace`, `GymServer`, `GymnasiumAdapter`, `ObsLeafDescriptor`, and `StructuredObservation` to `__all__`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

srivatsankrishnan
jeffnvidia
podkidyshev

Poem

🐇 Hopscotch through the gym, each trial a leap,
Rewards cascade in trajectory.csv deep,
env_params sampled with (seed, name, trial) care,
The adapter bridges gymnasium's air.
On hard-fail? A dse_failure.txt I write—
Then reraise, for the truth must stay in sight! 🌟

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding online GymServer mode to CloudAIGymEnv for live-RL training, directly aligned with the PR's primary objective.
Description check	✅ Passed	The description comprehensively explains the issue (offline-only execution), the solution (online mode with optional GymServer), and testing approach. It directly relates to the changeset which implements this online mode.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 86-87: The code at the rsplit call for env_class_path assumes the
input contains at least one dot, but if env_class_path is a bare class name
without a module path (e.g., "FakeGymServer"), rsplit returns a single-element
list causing a ValueError on unpacking. Add validation before the rsplit call to
check that env_class_path contains a dot separator; if it does not, either raise
a descriptive ValueError explaining the required format or handle it
appropriately. The validation should occur before attempting to unpack the
rsplit result into module_path and class_name.

In `@src/cloudai/configurator/env_params.py`:
- Line 2: The copyright year in the header at the top of env_params.py uses a
range format "2024-2026" which violates the repository's copyright convention
enforced by the copyright header test. Since this is a new file only touched in
2026, update the copyright line from "# Copyright (c) 2024-2026 NVIDIA
CORPORATION & AFFILIATES. All rights reserved." to use a single year format: "#
Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved." This
aligns with the repository's requirement that new files use only the current
year, not a year range.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The step() method's action parameter type annotation is too
restrictive and does not match the actual types it receives. In
src/cloudai/configurator/gymnasium_adapter.py at line 206, change the step()
method signature from action: dict[str, int] to action: dict[str, Any] to
accommodate both integer and numpy array values for continuous actions, aligning
with what decode_action already handles. In
tests/test_gymnasium_adapter_contract.py at line 345, no direct change is
needed; once the step() signature is corrected to accept dict[str, Any], the
pyright type error in the test will automatically resolve since the test is
passing dict[str, NDArray | int] which matches the widened contract.

In `@tests/test_env_params.py`:
- Line 2: The copyright header in tests/test_env_params.py uses a year range
"2025-2026" which violates the repository's copyright check convention enforced
by tests/test_check_copyright_headers.py. For new files touched only in 2026,
update the copyright line from "# Copyright (c) 2025-2026 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." to "# Copyright (c) 2026 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." (single year only, not a range).
- Around line 87-88: The pytest.raises(ValueError) context manager on line 87
lacks a match parameter, making the test less specific. Add a match parameter to
pytest.raises(ValueError) with a regex pattern that matches the expected error
message when sink.write is called with step=0 (the invalid parameter). This
clarifies the test intent and ensures the correct error message is being raised.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 04f44c70-d208-4bf7-a8c8-d023090d11a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and aa38d4e.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
src/cloudai/models/workload.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py
tests/test_test_scenario.py

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

237-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep env.csv and trajectory.csv aligned on constraint failures.

observer.before_step() runs at Lines 237-238 and can append an env.csv row, but the constraint-failure early return at Lines 260-263 skips trajectory write/observer post-hook. That leaves a step in env.csv with no corresponding trajectory.csv row.

Suggested fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(getattr(self.test_run, "current_env_params", {}) or {}),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 263, The
constraint failure early return at lines 260-263 skips both the trajectory write
and observer post-hook, but observer.before_step() was already called and may
have written to env.csv. Add a call to self.write_trajectory() with a
TrajectoryEntry and then call observer.after_step() for each observer before the
early return in the constraint check failure path, using appropriate values
(like the constraint_failure reward) to keep env.csv and trajectory.csv aligned,
similar to how the cached result path handles it at lines 250-257.

♻️ Duplicate comments (2)

src/cloudai/configurator/gymnasium_adapter.py (1)

206-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Widen step() action type to match continuous-action payloads.

At Line 206, dict[str, int] is too narrow for Box actions (np.ndarray leaves). This is the source of the current pyright failure and mismatches decode_action() behavior.

#!/bin/bash
# Verify root-cause annotation and failing call shape.
rg -nP 'def step\(self, action: dict\[str, int\]\)' src/cloudai/configurator/gymnasium_adapter.py
rg -nP 'adapter\.step\(\{"threshold": np\.array\(' tests/test_gymnasium_adapter_contract.py -C2

Suggested fix

-    def step(self, action: dict[str, int]) -> tuple[Any, float, bool, bool, dict[str, Any]]:
+    def step(self, action: dict[str, Any]) -> tuple[Any, float, bool, bool, dict[str, Any]]:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/gymnasium_adapter.py` at line 206, The `step()`
method in gymnasium_adapter.py has an overly restrictive type annotation for the
action parameter. Change the action parameter type from `dict[str, int]` to a
wider union type that accommodates both integer actions and numpy array values
(used by continuous Box actions). This will align the type annotation with what
`decode_action()` actually accepts and resolve the pyright type mismatch
failure.

Source: Pipeline failures

src/cloudai/configurator/cloudai_gym.py (1)

86-87: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate env_class format before unpacking.

At Line 86, rsplit(".", 1) still assumes a dotted import path and can raise an opaque unpacking ValueError for malformed values (e.g., "FakeGymServer"). Please fail with a descriptive error first.

Suggested fix

+    if "." not in str(env_class_path):
+        raise ValueError(
+            f"env_class must be a dotted import path like 'pkg.module.Class'; got {env_class_path!r}"
+        )
     module_path, class_name = str(env_class_path).rsplit(".", 1)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 86 - 87, The code at
the rsplit(".", 1) operation in the module imports section assumes a valid
dotted import path format and will raise an opaque unpacking ValueError if the
env_class_path doesn't contain a dot (e.g., "FakeGymServer"). Add validation
before the rsplit to check that the env_class_path string contains at least one
dot separator, and raise a descriptive ValueError with a clear error message if
the format is invalid, then proceed with the rsplit and unpacking only after
validation passes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 149-161: The loop in the base_agent.py file continues iterating
through all self.max_steps regardless of the terminal state flag returned from
env.step(). Add a break statement in the loop after the update_policy() call to
exit when done=True, preventing the execution of unintended post-terminal steps
that violate proper episode semantics. The done flag is already being captured
from env.step() and passed to update_policy(), so you only need to check its
value and break the loop when it becomes True.

In `@tests/test_action_space.py`:
- Around line 43-50: Replace direct constructor calls with model_validate()
method calls to defer validation to runtime. In tests/test_action_space.py at
lines 43-50, update test_continuous_space_rejects_unknown_dtype() to use
ContinuousSpace.model_validate({"low": 0.0, "high": 1.0, "dtype": "double"})
instead of the constructor call, and update
test_continuous_space_forbids_extra_fields() to use
ContinuousSpace.model_validate({"low": 0.0, "high": 1.0, "step": 0.1}) instead
of the constructor call. In tests/test_env_params.py at lines 145-149, similarly
replace ObsLeafDescriptor constructor calls with
ObsLeafDescriptor.model_validate({...}) for both test cases that pass invalid
arguments (the case with extra field unexpected=1 and the case with invalid
kind="categorical"). This defers type validation to runtime so
pytest.raises(ValidationError) can catch the errors instead of pyright rejecting
them during static type checking.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-263: The constraint failure early return at lines 260-263
skips both the trajectory write and observer post-hook, but
observer.before_step() was already called and may have written to env.csv. Add a
call to self.write_trajectory() with a TrajectoryEntry and then call
observer.after_step() for each observer before the early return in the
constraint check failure path, using appropriate values (like the
constraint_failure reward) to keep env.csv and trajectory.csv aligned, similar
to how the cached result path handles it at lines 250-257.

---

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 86-87: The code at the rsplit(".", 1) operation in the module
imports section assumes a valid dotted import path format and will raise an
opaque unpacking ValueError if the env_class_path doesn't contain a dot (e.g.,
"FakeGymServer"). Add validation before the rsplit to check that the
env_class_path string contains at least one dot separator, and raise a
descriptive ValueError with a clear error message if the format is invalid, then
proceed with the rsplit and unpacking only after validation passes.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The `step()` method in gymnasium_adapter.py has an overly
restrictive type annotation for the action parameter. Change the action
parameter type from `dict[str, int]` to a wider union type that accommodates
both integer actions and numpy array values (used by continuous Box actions).
This will align the type annotation with what `decode_action()` actually accepts
and resolve the pyright type mismatch failure.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8bc0b625-bfb7-44c2-acc4-7100e48ae965

📥 Commits

Reviewing files that changed from the base of the PR and between aa38d4e and a219cbd.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
src/cloudai/models/workload.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py
tests/test_test_scenario.py

rutayan-nv · 2026-06-16T14:09:08Z

@coderabbitai review

coderabbitai · 2026-06-16T14:09:18Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

237-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure path can desynchronize env.csv and trajectory.csv.

observer.before_step(...) runs before constraint validation, so env params are persisted for the step, but the early return on constraint failure skips trajectory write. That breaks the step-alignment contract for downstream joins.

💡 Minimal ordering fix

-        for observer in self.observers:
-            observer.before_step(self.test_run)
-
-        cached_result = self.get_cached_trajectory_result(action)
-        if cached_result is not None:
+        if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
+            logging.info("Constraint check failed. Skipping step.")
+            return [-1.0], self.rewards.constraint_failure, True, {}
+
+        for observer in self.observers:
+            observer.before_step(self.test_run)
+
+        cached_result = self.get_cached_trajectory_result(action)
+        if cached_result is not None:
             ...
-
-        if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
-            logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 263, The
constraint check must be moved to occur before observer.before_step() is called.
Currently, observer.before_step() is invoked first (which persists env params),
then if the constraint check fails, the code returns early without writing a
trajectory entry, causing misalignment between env.csv and trajectory.csv.
Reorder the code so the constraint check using
self.test_run.test.constraint_check() happens before any observer calls,
ensuring that env params are only persisted for steps that actually produce
trajectory entries.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 47-55: The ContinuousSpace class with dtype="int" can produce
out-of-range values during rounding if the bounds are non-integers. Add
validation in the _validate_bounds method to enforce that when dtype="int", both
self.low and self.high must be integers (check that they equal their integer
conversions). This invariant ensures that rounding operations in downstream code
cannot exceed the intended bounds.

In `@tests/test_cloudaigym.py`:
- Around line 568-680: Add a new regression test function in
tests/test_cloudaigym.py that covers the constraint_check failure path when
env_params is declared. The test should create a CloudAIGymEnv with env_params
specified on the TestDefinition, mock the constraint_check method to return
False or raise an exception, call env.step() with an action, and then assert
that env.csv and trajectory.csv remain step-aligned by verifying that both files
either skip the step or both record it consistently. This ensures the failure
path maintains the corpus-friendly merge contract mentioned in the existing
tests.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-263: The constraint check must be moved to occur before
observer.before_step() is called. Currently, observer.before_step() is invoked
first (which persists env params), then if the constraint check fails, the code
returns early without writing a trajectory entry, causing misalignment between
env.csv and trajectory.csv. Reorder the code so the constraint check using
self.test_run.test.constraint_check() happens before any observer calls,
ensuring that env params are only persisted for steps that actually produce
trajectory entries.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5f32c9ba-c452-4bfd-94aa-7ec4d8f6b692

📥 Commits

Reviewing files that changed from the base of the PR and between a219cbd and 0e80db2.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/_core/test_scenario.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
src/cloudai/models/workload.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py

rutayan-nv · 2026-06-16T15:15:23Z

@coderabbitai review

coderabbitai · 2026-06-16T15:15:38Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

coderabbitai

♻️ Duplicate comments (1)

src/cloudai/_core/action_space.py (1)

49-55: ⚠️ Potential issue | 🟠 Major

Enforce integer-compatible bounds when dtype="int".

Line 49 allows dtype="int", but Line 52-55 only validates ordering. This still permits non-integer bounds, and downstream decode (clamp then round) can emit values outside the declared interval.

🔧 Proposed fix

     `@model_validator`(mode="after")
     def _validate_bounds(self) -> Self:
         if self.low >= self.high:
             raise ValueError(f"ContinuousSpace requires low < high; got low={self.low}, high={self.high}")
+        if self.dtype == "int" and (not self.low.is_integer() or not self.high.is_integer()):
+            raise ValueError(
+                f"ContinuousSpace(dtype='int') requires integer bounds; got low={self.low}, high={self.high}"
+            )
         return self

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/_core/action_space.py` around lines 49 - 55, The _validate_bounds
method in ContinuousSpace only validates that low < high, but does not enforce
that low and high are integers when dtype="int". Add an additional validation
check in the _validate_bounds method that raises a ValueError if dtype is set to
"int" but either self.low or self.high are not integers or are not
integer-compatible values. This validation should occur alongside the existing
bounds ordering check to prevent non-integer bounds from being allowed when
integer dtype is specified.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 49-55: The _validate_bounds method in ContinuousSpace only
validates that low < high, but does not enforce that low and high are integers
when dtype="int". Add an additional validation check in the _validate_bounds
method that raises a ValueError if dtype is set to "int" but either self.low or
self.high are not integers or are not integer-compatible values. This validation
should occur alongside the existing bounds ordering check to prevent non-integer
bounds from being allowed when integer dtype is specified.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 32b71de6-445d-4439-89d6-b07b150e0185

📥 Commits

Reviewing files that changed from the base of the PR and between 84ae7f1 and d654dfb.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py

coderabbitai

♻️ Duplicate comments (1)

src/cloudai/_core/action_space.py (1)

49-55: ⚠️ Potential issue | 🟠 Major

Enforce integer-compatible bounds when dtype="int".

Line 49 allows dtype="int", but Line 52-55 only validates ordering. This still permits non-integer bounds, and downstream decode (clamp then round) can emit values outside the declared interval.

🔧 Proposed fix

     `@model_validator`(mode="after")
     def _validate_bounds(self) -> Self:
         if self.low >= self.high:
             raise ValueError(f"ContinuousSpace requires low < high; got low={self.low}, high={self.high}")
+        if self.dtype == "int" and (not self.low.is_integer() or not self.high.is_integer()):
+            raise ValueError(
+                f"ContinuousSpace(dtype='int') requires integer bounds; got low={self.low}, high={self.high}"
+            )
         return self

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/_core/action_space.py` around lines 49 - 55, The _validate_bounds
method in ContinuousSpace only validates that low < high, but does not enforce
that low and high are integers when dtype="int". Add an additional validation
check in the _validate_bounds method that raises a ValueError if dtype is set to
"int" but either self.low or self.high are not integers or are not
integer-compatible values. This validation should occur alongside the existing
bounds ordering check to prevent non-integer bounds from being allowed when
integer dtype is specified.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 49-55: The _validate_bounds method in ContinuousSpace only
validates that low < high, but does not enforce that low and high are integers
when dtype="int". Add an additional validation check in the _validate_bounds
method that raises a ValueError if dtype is set to "int" but either self.low or
self.high are not integers or are not integer-compatible values. This validation
should occur alongside the existing bounds ordering check to prevent non-integer
bounds from being allowed when integer dtype is specified.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 32b71de6-445d-4439-89d6-b07b150e0185

📥 Commits

Reviewing files that changed from the base of the PR and between 84ae7f1 and d654dfb.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

pyproject.toml
src/cloudai/_core/action_space.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/core.py
tests/test_action_space.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter_contract.py
tests/test_handlers.py

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The sink is now stateless: write() takes the record path per call and skips empty samples, so non-DR runs write nothing and write_trajectory needs no branch. Derive both records from a new iteration_dir property and expose the env record via the env_params_record_path property (was _env_csv_path), keeping env.csv and trajectory.csv step-aligned without coupling the name to CSV.

…ty flag Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params capability flag (opt-in, defaults False). Only agents whose search consumes per-trial env_params sampling set it True; enumerating/surrogate agents leave it False, so a config that declares env_params for an agent that would ignore them is rejected up front instead of silently no-op'ing. New agents answer for themselves with no string to maintain. Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py next to the logic it guards, looking the agent up via the Registry. Unknown agents are deferred to the dedicated agent-resolution error rather than masked here. Keep all public-facing comments, docstrings, and the error message generic (no internal agent names). Cover the full validator matrix, including the unknown-agent deferral.

Compress multi-line inline comments down to the single non-obvious rationale (or drop them where the code already speaks), per the self-documenting-code principle. Public API docstrings and test intent comments are left intact.

apply_params_set overlays sampled scalar draws onto cmd_args, then reconstructs the TestDefinition to validate the applied action values. That pass re-ran validate_env_params, which rejects a weighted env_param whose cmd_args target is no longer a candidate list - exactly what the overlay produces. env_params is already validated at parse time, so drop it from the validation-only dump. Adds a regression test covering a weighted env_param's scalar draw.

GymnasiumAdapter._decode_continuous clamps to [low, high] then rounds, so a dtype="int" space with non-integer bounds (e.g. [0.2, 0.8]) can decode to a value outside the declared range (1). Validate that int-quantized spaces declare integer bounds (CodeRabbit, NVIDIA#932/NVIDIA#927).

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

rutayan-nv · 2026-06-25T21:27:56Z

Parking this PR for now. Per a change in release plan, this work isn't needed for the upcoming aiconfigurator release, so I'm moving it to draft to keep it out of the review queue. @podkidyshev — no need to review this one yet; I'll mark it ready again when it's scheduled for release. Thanks!

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

An env_params entry only reclassifies a list-valued cmd_args sweep as env-sampled; a scalar is already fixed, so annotating it is a meaningless label. Previously such an annotation was tolerated as a silent no-op, which let it slip through parse-time validation and inconsistently trip (or not) the downstream "no agent will sample them" check depending on run mode. Reject it where the contract lives - TestDefinition.validate_env_params - so the failure is immediate and mode-independent. EnvParams.from_test's non-list guard becomes defensive (parse-time now guarantees lists); the post-overlay path already drops env_params before re-validating, so concrete scalar draws are unaffected. Extract the per-field checks into a helper to keep the validator under the complexity limit, and update tests: scalar annotations now assert rejection instead of no-op tolerance.

…otocol Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width dim, or "discrete" of size n) and a StructuredObservation Protocol that documents the optional env hooks structured_observation_descriptors() and encode_observation(). These let an env expose a named, per-leaf observation so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium spaces.Dict; the hooks are duck-typed, so envs need not subclass. Both exported via cloudai.core.

…ejection tests Negative tests pass an extra kwarg and an out-of-Literal kind to assert ValidationError; mark the deliberate type violations with type: ignore.

GymnasiumAdapter._decode_continuous clamps to [low, high] then rounds, so a dtype="int" space with non-integer bounds (e.g. [0.2, 0.8]) can decode to a value outside the declared range (1). Validate that int-quantized spaces declare integer bounds (CodeRabbit, NVIDIA#932/NVIDIA#927).

Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of Discrete (list params) and Box (ContinuousSpace) actions over the tunable params with fixed (single-value) params injected each step; observations as either a flat float32 Box or, when the env opts in via the structured-obs hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous dtype="int" params are quantized (rounded/clamped) at decode_action so the trajectory cache key collapses float jitter. The adapter is a pure pass-through over test_run.step (never mutates it), so contextual-bandit rollouts that reset() per trial keep a monotonic trial index. gymnasium is an optional dependency lazy-imported behind the new [rl] extra (also added to dev); CloudAIGymEnv.define_observation_space() now returns one slot per agent metric so adapters get the right Box shape. Exported via cloudai.core. Caller-contract tests pin the step-monotonicity, observation pass-through, continuous-quantization, and structured-obs invariants.

…ature step() delegates to decode_action(dict[str, Any]) and exists precisely to round float/continuous policy actions to ints; widen its parameter type from dict[str, int] to dict[str, Any] to match.

…reserve traceback on DSE re-raise - _as_obs_array(): assert encoded keys match descriptors before coercion (reuses _assert_keys, same guard as decode_action/step_raw) and materialize output by descriptor keys to avoid KeyError on extra keys and silent partial observations on missing keys. - handlers.py: re-raise the captured hard-fail with its original traceback. Addresses CodeRabbit findings on NVIDIA#930.

…ngleton Replace the bespoke _import_gymnasium() in-method seam with the canonical lazy.gymnasium / lazy.np properties; addresses the in-method-import review concern. No behavior change — gymnasium stays an optional [rl] extra.

The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types instead of Any, which surfaced two latent issues the scoped pre-commit run missed: - pyright now sees Space[Any]/Dict in the adapter contract test, so concrete attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local bindings + isinstance before access. - lazy_imports.py now has a 2026 commit in its history, so the ci_only copyright check requires the year range 2025-2026.

Inherit from gymnasium.Env (guarded import, falls back to object when the optional [rl] extra is absent) so ecosystem tooling that performs isinstance checks (e.g. Stable-Baselines3) accepts the adapter. - Use the TYPE_CHECKING import form so pyright sees a concrete base class while runtime keeps the optional-dependency fallback. - Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012). - Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's Env.unwrapped (returns self) is the correct base-env semantics, and the old override returned a non-Env (BaseGym), which would mislead ecosystem code calling .unwrapped.

Inheriting gymnasium.Env widened the static type of action_space from the concrete spaces.Dict the adapter builds to the base spaces.Space, which has no __getitem__. Cast at the test call site to restore subspace indexing for pyright (runtime is unchanged; action_space is always a Dict).

…rse of decode_action decode_action had no public inverse, so consumers needing value->index encoding (e.g. RLlib warm-start / behavioral cloning) reached into the private _tunable_params dict. When ContinuousSpace support split that internal, those consumers broke with AttributeError. encode_action closes the contract: discrete values map to their candidate index, continuous values wrap into the clamped float32 Box array, so decode_action(encode_action(v)) == v for any native v. Adds round-trip contract tests pinning the invariant and rejection of non-candidate values / key mismatches.

…tinuousSpace The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which ships separately. Until then nothing constructs a ContinuousSpace, so the continuous branches here are unreachable and the only effect of the import is an ImportError at module load. Drop the continuous import, _continuous_params, the Box action mapping, and decode/encode continuous handling so the adapter builds and ships standalone over discrete + structured-observation support. The continuous support rejoins when ContinuousSpace lands.

Adds an optional online (live-RL) execution mode: instead of launching a workload through the runner, the env delegates each trial to an in-process GymServer. Mode is selected via an injected server or cmd_args.live_rl_mode (server built from cmd_args.env_class); the agent-facing interface is unchanged. Default runner-backed behavior is untouched.

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

Online (live-RL) steps wrote trajectory rows from a per-instance counter that reset to 0 on every reset(). Under reset-per-episode Gymnasium/RLlib rollouts that collapsed online trajectory.csv rows back to step=1 each episode. Drive online steps from the monotonic test_run.step (incremented per step, never reset) so rows keep increasing across resets. Removes the now-redundant _online_step_count; render() and the trajectory writer both read test_run.step. Adds a regression test asserting steps stay 1,2,3 across a mid-rollout reset.

env_params_record_path was pinned to iteration_dir (runner.scenario_root), but online (live-RL) runs write trajectory.csv under test_run.output_path and have no runner-backed scenario_root. write_trajectory evaluates the record path unconditionally, so an online step hit scenario_root on a runner that does not expose it. Derive the path from trajectory_file_path so env.csv always sits next to trajectory.csv in both modes, with no extra branch in the write path.

rutayan-nv requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners June 16, 2026 13:28

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated

Comment thread src/cloudai/configurator/env_params.py Outdated

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated

Comment thread tests/test_env_params.py Outdated

Comment thread tests/test_env_params.py Outdated

rutayan-nv force-pushed the rpatro/gym-online-mode branch from aa38d4e to a219cbd Compare June 16, 2026 13:50

This was referenced Jun 16, 2026

[CLI/Core] Polymorphic agent dispatch + TestRun owns trial counter #933

Merged

feat(cli): route live_rl_mode jobs through the DSE handler #934

Draft

More Gym enhancements #884

Draft

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/cloudai/configurator/base_agent.py

Comment thread tests/test_action_space.py Outdated

rutayan-nv mentioned this pull request Jun 16, 2026

Gym enhancements #863

Draft

rutayan-nv force-pushed the rpatro/gym-online-mode branch from a219cbd to 0e80db2 Compare June 16, 2026 15:06

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/cloudai/_core/action_space.py Outdated

Comment thread tests/test_cloudaigym.py Outdated

rutayan-nv force-pushed the rpatro/gym-online-mode branch 2 times, most recently from 84ae7f1 to d654dfb Compare June 16, 2026 21:17

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

rutayan-nv force-pushed the rpatro/gym-online-mode branch 2 times, most recently from d9e8566 to e27b362 Compare June 16, 2026 22:15

rutayan-nv force-pushed the rpatro/gym-online-mode branch from e27b362 to 9f0652c Compare June 16, 2026 22:40

rutayan-nv force-pushed the rpatro/gym-online-mode branch from 9f0652c to fb1480c Compare June 17, 2026 18:33

rutayan-nv force-pushed the rpatro/gym-online-mode branch from fb1480c to 2794501 Compare June 17, 2026 20:28

rutayan-nv added 4 commits June 25, 2026 15:50

rutayan-nv force-pushed the rpatro/gym-online-mode branch from b79ff99 to f0aa596 Compare June 25, 2026 21:15

rutayan-nv marked this pull request as draft June 25, 2026 21:27

rutayan-nv force-pushed the rpatro/gym-online-mode branch from f0aa596 to be4b254 Compare June 25, 2026 22:17

rutayan-nv added 3 commits June 26, 2026 11:12

test(env-params): suppress pyright on intentional ObsLeafDescriptor r…

584f145

…ejection tests Negative tests pass an extra kwarg and an out-of-Literal kind to assert ValidationError; mark the deliberate type violations with type: ignore.

rutayan-nv added 14 commits June 26, 2026 11:41

fix(gymnasium-adapter): accept continuous actions in step() type sign…

6c2452f

…ature step() delegates to decode_action(dict[str, Any]) and exists precisely to round float/continuous policy actions to ints; widen its parameter type from dict[str, int] to dict[str, Any] to match.

style(configurator): ruff format gymnasium_adapter.py

2e2d431

fix(cloudai-gym): validate env_class is a dotted import path before r…

24de162

…split A bare class name (no module) made rsplit('.', 1) return one element and raise an opaque unpacking ValueError; fail fast with a clear message. Addresses CodeRabbit finding on NVIDIA#932.

rutayan-nv force-pushed the rpatro/gym-online-mode branch from be4b254 to ed3211c Compare June 26, 2026 15:41

Uh oh!

Conversation

rutayan-nv commented Jun 16, 2026

Issue

Fix

Testing

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Reviews paused

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rutayan-nv commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rutayan-nv commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

rutayan-nv commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading