Skip to content

feat(configurator): add online GymServer mode to CloudAIGymEnv#932

Draft
rutayan-nv wants to merge 26 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/gym-online-mode
Draft

feat(configurator): add online GymServer mode to CloudAIGymEnv#932
rutayan-nv wants to merge 26 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/gym-online-mode

Conversation

@rutayan-nv

Copy link
Copy Markdown
Contributor

Issue

CloudAIGymEnv can only run trials by launching workloads through the runner — there is no path for fast, in-process online / live-RL training (e.g. kvpilot).

Fix

Add an optional online mode: the env delegates reset/step/action+observation spaces to an in-process GymServer instead of the runner. Mode is selected by an injected server or cmd_args.live_rl_mode=true (server built from cmd_args.env_class); the agent-facing interface and default runner-backed behavior are unchanged.

Testing

pytest tests/test_cloudaigym.py — existing offline tests stay green; 6 new tests cover delegation, trajectory output, env_class import/kwarg-filtering, missing-env_class error, and live_rl_mode auto-detection. ruff/pyright/vulture/import-linter clean.

Stacked on #930 (review after it merges; the diff narrows to this commit once parents land). Extracted from the genuinely-unique online-gym piece of #884.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 68808524-0a9e-4cc3-b4da-5a95fc8e1ab0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a live-RL execution path to CloudAI DSE: a ContinuousSpace Pydantic model for continuous action parameters, an env-params domain-randomization system (EnvParamSpec, EnvParamsSampler, CsvSink, EnvParamsObserver), an online GymServer protocol integrated into CloudAIGymEnv with (action, env_params)-keyed trajectory caching, a GymnasiumAdapter wrapping BaseGym to the gymnasium.Env interface, a concrete BaseAgent.run() loop, and a refactored DSE CLI handler that delegates to agent.run() with structured failure artifact reporting.

Changes

Live-RL and Gymnasium Integration

Layer / File(s) Summary
Action-space contracts, TestRun extensions, and workload env_params
pyproject.toml, src/cloudai/_core/action_space.py, src/cloudai/_core/test_scenario.py, src/cloudai/models/workload.py, tests/test_action_space.py, tests/test_test_scenario.py
ContinuousSpace Pydantic model with low/high/dtype and bound validation is introduced; TestRun gains current_env_params field and increment_step(); TestDefinition gains env_params: dict[str, EnvParamSpec]; gymnasium~=1.2 is added as a dev and rl optional dependency.
Env-params sampling, CSV persistence, and step observers
src/cloudai/configurator/env_params.py, tests/test_env_params.py
New env_params.py introduces EnvParamSpec and ObsLeafDescriptor schemas, StructuredObservation protocol, deterministic EnvParamsSampler keyed by (seed, name, trial), CsvSink for step-aligned env.csv writes, and EnvParamsObserver that samples and stores env params before each trial step.
CloudAIGymEnv: online GymServer mode and env_params trajectory caching
src/cloudai/configurator/cloudai_gym.py, tests/test_cloudaigym.py
Adds GymServer Protocol and _create_gym_server factory; extends CloudAIGymEnv to accept an optional server and auto-detect live_rl_mode; wires EnvParamsObserver via _build_observers(); routes reset()/step() through online (_online_step) vs offline paths; extends TrajectoryEntry with env_params; changes cache hits to require exact (action, env_params) match; adds integration tests for env-params cache semantics and online Gym-server delegation.
GymnasiumAdapter: BaseGym to gymnasium.Env interface
src/cloudai/configurator/gymnasium_adapter.py, tests/test_gymnasium_adapter_contract.py
New GymnasiumAdapter wraps BaseGym with lazy gymnasium import; builds action_space as spaces.Dict (discrete→Discrete, ContinuousSpaceBox, single-element fixed injection); builds observation_space from StructuredObservation descriptors or flat float32 Box; implements decode_action(), reset(), step(), step_raw() returning gymnasium 5-tuples with dtype-aware observation coercion.
BaseAgent.run() loop and DSE handler refactor with failure reporting
src/cloudai/configurator/base_agent.py, src/cloudai/cli/handlers.py, tests/test_handlers.py
BaseAgent.run() drives the reset→select_action→env.step→update_policy trial loop; handle_dse_job is refactored to call agent.run() per TestRun, accumulate non-zero return codes, and wrap in try/except that writes dse_failure.txt via _record_run_failure before re-raising hard failures.
Public API exports wiring
src/cloudai/configurator/__init__.py, src/cloudai/core.py
configurator/__init__.py exports GymServer and GymnasiumAdapter; core.py adds ContinuousSpace, GymServer, GymnasiumAdapter, ObsLeafDescriptor, and StructuredObservation to __all__.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • srivatsankrishnan
  • jeffnvidia
  • podkidyshev

Poem

🐇 Hopscotch through the gym, each trial a leap,
Rewards cascade in trajectory.csv deep,
env_params sampled with (seed, name, trial) care,
The adapter bridges gymnasium's air.
On hard-fail? A dse_failure.txt I write—
Then reraise, for the truth must stay in sight! 🌟

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding online GymServer mode to CloudAIGymEnv for live-RL training, directly aligned with the PR's primary objective.
Description check ✅ Passed The description comprehensively explains the issue (offline-only execution), the solution (online mode with optional GymServer), and testing approach. It directly relates to the changeset which implements this online mode.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 86-87: The code at the rsplit call for env_class_path assumes the
input contains at least one dot, but if env_class_path is a bare class name
without a module path (e.g., "FakeGymServer"), rsplit returns a single-element
list causing a ValueError on unpacking. Add validation before the rsplit call to
check that env_class_path contains a dot separator; if it does not, either raise
a descriptive ValueError explaining the required format or handle it
appropriately. The validation should occur before attempting to unpack the
rsplit result into module_path and class_name.

In `@src/cloudai/configurator/env_params.py`:
- Line 2: The copyright year in the header at the top of env_params.py uses a
range format "2024-2026" which violates the repository's copyright convention
enforced by the copyright header test. Since this is a new file only touched in
2026, update the copyright line from "# Copyright (c) 2024-2026 NVIDIA
CORPORATION & AFFILIATES. All rights reserved." to use a single year format: "#
Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved." This
aligns with the repository's requirement that new files use only the current
year, not a year range.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The step() method's action parameter type annotation is too
restrictive and does not match the actual types it receives. In
src/cloudai/configurator/gymnasium_adapter.py at line 206, change the step()
method signature from action: dict[str, int] to action: dict[str, Any] to
accommodate both integer and numpy array values for continuous actions, aligning
with what decode_action already handles. In
tests/test_gymnasium_adapter_contract.py at line 345, no direct change is
needed; once the step() signature is corrected to accept dict[str, Any], the
pyright type error in the test will automatically resolve since the test is
passing dict[str, NDArray | int] which matches the widened contract.

In `@tests/test_env_params.py`:
- Line 2: The copyright header in tests/test_env_params.py uses a year range
"2025-2026" which violates the repository's copyright check convention enforced
by tests/test_check_copyright_headers.py. For new files touched only in 2026,
update the copyright line from "# Copyright (c) 2025-2026 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." to "# Copyright (c) 2026 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." (single year only, not a range).
- Around line 87-88: The pytest.raises(ValueError) context manager on line 87
lacks a match parameter, making the test less specific. Add a match parameter to
pytest.raises(ValueError) with a regex pattern that matches the expected error
message when sink.write is called with step=0 (the invalid parameter). This
clarifies the test intent and ensures the correct error message is being raised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 04f44c70-d208-4bf7-a8c8-d023090d11a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and aa38d4e.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated
Comment thread src/cloudai/configurator/env_params.py Outdated
Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
Comment thread tests/test_env_params.py Outdated
Comment thread tests/test_env_params.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

237-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep env.csv and trajectory.csv aligned on constraint failures.

observer.before_step() runs at Lines 237-238 and can append an env.csv row, but the constraint-failure early return at Lines 260-263 skips trajectory write/observer post-hook. That leaves a step in env.csv with no corresponding trajectory.csv row.

Suggested fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(getattr(self.test_run, "current_env_params", {}) or {}),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 263, The
constraint failure early return at lines 260-263 skips both the trajectory write
and observer post-hook, but observer.before_step() was already called and may
have written to env.csv. Add a call to self.write_trajectory() with a
TrajectoryEntry and then call observer.after_step() for each observer before the
early return in the constraint check failure path, using appropriate values
(like the constraint_failure reward) to keep env.csv and trajectory.csv aligned,
similar to how the cached result path handles it at lines 250-257.
♻️ Duplicate comments (2)
src/cloudai/configurator/gymnasium_adapter.py (1)

206-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Widen step() action type to match continuous-action payloads.

At Line 206, dict[str, int] is too narrow for Box actions (np.ndarray leaves). This is the source of the current pyright failure and mismatches decode_action() behavior.

#!/bin/bash
# Verify root-cause annotation and failing call shape.
rg -nP 'def step\(self, action: dict\[str, int\]\)' src/cloudai/configurator/gymnasium_adapter.py
rg -nP 'adapter\.step\(\{"threshold": np\.array\(' tests/test_gymnasium_adapter_contract.py -C2
Suggested fix
-    def step(self, action: dict[str, int]) -> tuple[Any, float, bool, bool, dict[str, Any]]:
+    def step(self, action: dict[str, Any]) -> tuple[Any, float, bool, bool, dict[str, Any]]:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/gymnasium_adapter.py` at line 206, The `step()`
method in gymnasium_adapter.py has an overly restrictive type annotation for the
action parameter. Change the action parameter type from `dict[str, int]` to a
wider union type that accommodates both integer actions and numpy array values
(used by continuous Box actions). This will align the type annotation with what
`decode_action()` actually accepts and resolve the pyright type mismatch
failure.

Source: Pipeline failures

src/cloudai/configurator/cloudai_gym.py (1)

86-87: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate env_class format before unpacking.

At Line 86, rsplit(".", 1) still assumes a dotted import path and can raise an opaque unpacking ValueError for malformed values (e.g., "FakeGymServer"). Please fail with a descriptive error first.

Suggested fix
+    if "." not in str(env_class_path):
+        raise ValueError(
+            f"env_class must be a dotted import path like 'pkg.module.Class'; got {env_class_path!r}"
+        )
     module_path, class_name = str(env_class_path).rsplit(".", 1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 86 - 87, The code at
the rsplit(".", 1) operation in the module imports section assumes a valid
dotted import path format and will raise an opaque unpacking ValueError if the
env_class_path doesn't contain a dot (e.g., "FakeGymServer"). Add validation
before the rsplit to check that the env_class_path string contains at least one
dot separator, and raise a descriptive ValueError with a clear error message if
the format is invalid, then proceed with the rsplit and unpacking only after
validation passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 149-161: The loop in the base_agent.py file continues iterating
through all self.max_steps regardless of the terminal state flag returned from
env.step(). Add a break statement in the loop after the update_policy() call to
exit when done=True, preventing the execution of unintended post-terminal steps
that violate proper episode semantics. The done flag is already being captured
from env.step() and passed to update_policy(), so you only need to check its
value and break the loop when it becomes True.

In `@tests/test_action_space.py`:
- Around line 43-50: Replace direct constructor calls with model_validate()
method calls to defer validation to runtime. In tests/test_action_space.py at
lines 43-50, update test_continuous_space_rejects_unknown_dtype() to use
ContinuousSpace.model_validate({"low": 0.0, "high": 1.0, "dtype": "double"})
instead of the constructor call, and update
test_continuous_space_forbids_extra_fields() to use
ContinuousSpace.model_validate({"low": 0.0, "high": 1.0, "step": 0.1}) instead
of the constructor call. In tests/test_env_params.py at lines 145-149, similarly
replace ObsLeafDescriptor constructor calls with
ObsLeafDescriptor.model_validate({...}) for both test cases that pass invalid
arguments (the case with extra field unexpected=1 and the case with invalid
kind="categorical"). This defers type validation to runtime so
pytest.raises(ValidationError) can catch the errors instead of pyright rejecting
them during static type checking.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-263: The constraint failure early return at lines 260-263
skips both the trajectory write and observer post-hook, but
observer.before_step() was already called and may have written to env.csv. Add a
call to self.write_trajectory() with a TrajectoryEntry and then call
observer.after_step() for each observer before the early return in the
constraint check failure path, using appropriate values (like the
constraint_failure reward) to keep env.csv and trajectory.csv aligned, similar
to how the cached result path handles it at lines 250-257.

---

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 86-87: The code at the rsplit(".", 1) operation in the module
imports section assumes a valid dotted import path format and will raise an
opaque unpacking ValueError if the env_class_path doesn't contain a dot (e.g.,
"FakeGymServer"). Add validation before the rsplit to check that the
env_class_path string contains at least one dot separator, and raise a
descriptive ValueError with a clear error message if the format is invalid, then
proceed with the rsplit and unpacking only after validation passes.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The `step()` method in gymnasium_adapter.py has an overly
restrictive type annotation for the action parameter. Change the action
parameter type from `dict[str, int]` to a wider union type that accommodates
both integer actions and numpy array values (used by continuous Box actions).
This will align the type annotation with what `decode_action()` actually accepts
and resolve the pyright type mismatch failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8bc0b625-bfb7-44c2-acc4-7100e48ae965

📥 Commits

Reviewing files that changed from the base of the PR and between aa38d4e and a219cbd.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/configurator/base_agent.py
Comment thread tests/test_action_space.py Outdated
@rutayan-nv rutayan-nv mentioned this pull request Jun 16, 2026
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from a219cbd to 0e80db2 Compare June 16, 2026 15:06

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

237-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure path can desynchronize env.csv and trajectory.csv.

observer.before_step(...) runs before constraint validation, so env params are persisted for the step, but the early return on constraint failure skips trajectory write. That breaks the step-alignment contract for downstream joins.

💡 Minimal ordering fix
-        for observer in self.observers:
-            observer.before_step(self.test_run)
-
-        cached_result = self.get_cached_trajectory_result(action)
-        if cached_result is not None:
+        if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
+            logging.info("Constraint check failed. Skipping step.")
+            return [-1.0], self.rewards.constraint_failure, True, {}
+
+        for observer in self.observers:
+            observer.before_step(self.test_run)
+
+        cached_result = self.get_cached_trajectory_result(action)
+        if cached_result is not None:
             ...
-
-        if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
-            logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 263, The
constraint check must be moved to occur before observer.before_step() is called.
Currently, observer.before_step() is invoked first (which persists env params),
then if the constraint check fails, the code returns early without writing a
trajectory entry, causing misalignment between env.csv and trajectory.csv.
Reorder the code so the constraint check using
self.test_run.test.constraint_check() happens before any observer calls,
ensuring that env params are only persisted for steps that actually produce
trajectory entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 47-55: The ContinuousSpace class with dtype="int" can produce
out-of-range values during rounding if the bounds are non-integers. Add
validation in the _validate_bounds method to enforce that when dtype="int", both
self.low and self.high must be integers (check that they equal their integer
conversions). This invariant ensures that rounding operations in downstream code
cannot exceed the intended bounds.

In `@tests/test_cloudaigym.py`:
- Around line 568-680: Add a new regression test function in
tests/test_cloudaigym.py that covers the constraint_check failure path when
env_params is declared. The test should create a CloudAIGymEnv with env_params
specified on the TestDefinition, mock the constraint_check method to return
False or raise an exception, call env.step() with an action, and then assert
that env.csv and trajectory.csv remain step-aligned by verifying that both files
either skip the step or both record it consistently. This ensures the failure
path maintains the corpus-friendly merge contract mentioned in the existing
tests.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-263: The constraint check must be moved to occur before
observer.before_step() is called. Currently, observer.before_step() is invoked
first (which persists env params), then if the constraint check fails, the code
returns early without writing a trajectory entry, causing misalignment between
env.csv and trajectory.csv. Reorder the code so the constraint check using
self.test_run.test.constraint_check() happens before any observer calls,
ensuring that env params are only persisted for steps that actually produce
trajectory entries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5f32c9ba-c452-4bfd-94aa-7ec4d8f6b692

📥 Commits

Reviewing files that changed from the base of the PR and between a219cbd and 0e80db2.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py

Comment thread src/cloudai/_core/action_space.py Outdated
Comment thread tests/test_cloudaigym.py Outdated
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 16, 2026
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch 2 times, most recently from 84ae7f1 to d654dfb Compare June 16, 2026 21:17

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/cloudai/_core/action_space.py (1)

49-55: ⚠️ Potential issue | 🟠 Major

Enforce integer-compatible bounds when dtype="int".

Line 49 allows dtype="int", but Line 52-55 only validates ordering. This still permits non-integer bounds, and downstream decode (clamp then round) can emit values outside the declared interval.

🔧 Proposed fix
     `@model_validator`(mode="after")
     def _validate_bounds(self) -> Self:
         if self.low >= self.high:
             raise ValueError(f"ContinuousSpace requires low < high; got low={self.low}, high={self.high}")
+        if self.dtype == "int" and (not self.low.is_integer() or not self.high.is_integer()):
+            raise ValueError(
+                f"ContinuousSpace(dtype='int') requires integer bounds; got low={self.low}, high={self.high}"
+            )
         return self
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/_core/action_space.py` around lines 49 - 55, The _validate_bounds
method in ContinuousSpace only validates that low < high, but does not enforce
that low and high are integers when dtype="int". Add an additional validation
check in the _validate_bounds method that raises a ValueError if dtype is set to
"int" but either self.low or self.high are not integers or are not
integer-compatible values. This validation should occur alongside the existing
bounds ordering check to prevent non-integer bounds from being allowed when
integer dtype is specified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 49-55: The _validate_bounds method in ContinuousSpace only
validates that low < high, but does not enforce that low and high are integers
when dtype="int". Add an additional validation check in the _validate_bounds
method that raises a ValueError if dtype is set to "int" but either self.low or
self.high are not integers or are not integer-compatible values. This validation
should occur alongside the existing bounds ordering check to prevent non-integer
bounds from being allowed when integer dtype is specified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 32b71de6-445d-4439-89d6-b07b150e0185

📥 Commits

Reviewing files that changed from the base of the PR and between 84ae7f1 and d654dfb.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/cloudai/_core/action_space.py (1)

49-55: ⚠️ Potential issue | 🟠 Major

Enforce integer-compatible bounds when dtype="int".

Line 49 allows dtype="int", but Line 52-55 only validates ordering. This still permits non-integer bounds, and downstream decode (clamp then round) can emit values outside the declared interval.

🔧 Proposed fix
     `@model_validator`(mode="after")
     def _validate_bounds(self) -> Self:
         if self.low >= self.high:
             raise ValueError(f"ContinuousSpace requires low < high; got low={self.low}, high={self.high}")
+        if self.dtype == "int" and (not self.low.is_integer() or not self.high.is_integer()):
+            raise ValueError(
+                f"ContinuousSpace(dtype='int') requires integer bounds; got low={self.low}, high={self.high}"
+            )
         return self
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/_core/action_space.py` around lines 49 - 55, The _validate_bounds
method in ContinuousSpace only validates that low < high, but does not enforce
that low and high are integers when dtype="int". Add an additional validation
check in the _validate_bounds method that raises a ValueError if dtype is set to
"int" but either self.low or self.high are not integers or are not
integer-compatible values. This validation should occur alongside the existing
bounds ordering check to prevent non-integer bounds from being allowed when
integer dtype is specified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 49-55: The _validate_bounds method in ContinuousSpace only
validates that low < high, but does not enforce that low and high are integers
when dtype="int". Add an additional validation check in the _validate_bounds
method that raises a ValueError if dtype is set to "int" but either self.low or
self.high are not integers or are not integer-compatible values. This validation
should occur alongside the existing bounds ordering check to prevent non-integer
bounds from being allowed when integer dtype is specified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 32b71de6-445d-4439-89d6-b07b150e0185

📥 Commits

Reviewing files that changed from the base of the PR and between 84ae7f1 and d654dfb.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py

@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch 2 times, most recently from d9e8566 to e27b362 Compare June 16, 2026 22:15
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 16, 2026
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from e27b362 to 9f0652c Compare June 16, 2026 22:40
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 17, 2026
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from 9f0652c to fb1480c Compare June 17, 2026 18:33
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 17, 2026
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from fb1480c to 2794501 Compare June 17, 2026 20:28
Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a
single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The
sink is now stateless: write() takes the record path per call and skips empty
samples, so non-DR runs write nothing and write_trajectory needs no branch.

Derive both records from a new iteration_dir property and expose the env
record via the env_params_record_path property (was _env_csv_path), keeping
env.csv and trajectory.csv step-aligned without coupling the name to CSV.
…ty flag

Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params
capability flag (opt-in, defaults False). Only agents whose search consumes per-trial
env_params sampling set it True; enumerating/surrogate agents leave it False, so a config
that declares env_params for an agent that would ignore them is rejected up front instead
of silently no-op'ing. New agents answer for themselves with no string to maintain.

Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py
next to the logic it guards, looking the agent up via the Registry. Unknown agents are
deferred to the dedicated agent-resolution error rather than masked here.

Keep all public-facing comments, docstrings, and the error message generic (no internal
agent names). Cover the full validator matrix, including the unknown-agent deferral.
Compress multi-line inline comments down to the single non-obvious rationale (or drop
them where the code already speaks), per the self-documenting-code principle. Public API
docstrings and test intent comments are left intact.
apply_params_set overlays sampled scalar draws onto cmd_args, then
reconstructs the TestDefinition to validate the applied action values.
That pass re-ran validate_env_params, which rejects a weighted env_param
whose cmd_args target is no longer a candidate list - exactly what the
overlay produces. env_params is already validated at parse time, so drop
it from the validation-only dump. Adds a regression test covering a
weighted env_param's scalar draw.
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 25, 2026
GymnasiumAdapter._decode_continuous clamps to [low, high] then rounds, so
a dtype="int" space with non-integer bounds (e.g. [0.2, 0.8]) can decode
to a value outside the declared range (1). Validate that int-quantized
spaces declare integer bounds (CodeRabbit, NVIDIA#932/NVIDIA#927).
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 25, 2026
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from b79ff99 to f0aa596 Compare June 25, 2026 21:15
@rutayan-nv rutayan-nv marked this pull request as draft June 25, 2026 21:27
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

Parking this PR for now. Per a change in release plan, this work isn't needed for the upcoming aiconfigurator release, so I'm moving it to draft to keep it out of the review queue. @podkidyshev — no need to review this one yet; I'll mark it ready again when it's scheduled for release. Thanks!

rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 25, 2026
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from f0aa596 to be4b254 Compare June 25, 2026 22:17
An env_params entry only reclassifies a list-valued cmd_args sweep as
env-sampled; a scalar is already fixed, so annotating it is a meaningless
label. Previously such an annotation was tolerated as a silent no-op, which
let it slip through parse-time validation and inconsistently trip (or not)
the downstream "no agent will sample them" check depending on run mode.

Reject it where the contract lives - TestDefinition.validate_env_params -
so the failure is immediate and mode-independent. EnvParams.from_test's
non-list guard becomes defensive (parse-time now guarantees lists); the
post-overlay path already drops env_params before re-validating, so concrete
scalar draws are unaffected.

Extract the per-field checks into a helper to keep the validator under the
complexity limit, and update tests: scalar annotations now assert rejection
instead of no-op tolerance.
…otocol

Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width
dim, or "discrete" of size n) and a StructuredObservation Protocol that
documents the optional env hooks structured_observation_descriptors() and
encode_observation(). These let an env expose a named, per-leaf observation
so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium
spaces.Dict; the hooks are duck-typed, so envs need not subclass.

Both exported via cloudai.core.
…ejection tests

Negative tests pass an extra kwarg and an out-of-Literal kind to assert
ValidationError; mark the deliberate type violations with type: ignore.
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 26, 2026
GymnasiumAdapter._decode_continuous clamps to [low, high] then rounds, so
a dtype="int" space with non-integer bounds (e.g. [0.2, 0.8]) can decode
to a value outside the declared range (1). Validate that int-quantized
spaces declare integer bounds (CodeRabbit, NVIDIA#932/NVIDIA#927).
Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of
Discrete (list params) and Box (ContinuousSpace) actions over the tunable
params with fixed (single-value) params injected each step; observations as
either a flat float32 Box or, when the env opts in via the structured-obs
hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous
dtype="int" params are quantized (rounded/clamped) at decode_action so the
trajectory cache key collapses float jitter. The adapter is a pure
pass-through over test_run.step (never mutates it), so contextual-bandit
rollouts that reset() per trial keep a monotonic trial index.

gymnasium is an optional dependency lazy-imported behind the new [rl] extra
(also added to dev); CloudAIGymEnv.define_observation_space() now returns one
slot per agent metric so adapters get the right Box shape. Exported via
cloudai.core. Caller-contract tests pin the step-monotonicity, observation
pass-through, continuous-quantization, and structured-obs invariants.
…ature

step() delegates to decode_action(dict[str, Any]) and exists precisely to
round float/continuous policy actions to ints; widen its parameter type
from dict[str, int] to dict[str, Any] to match.
…reserve traceback on DSE re-raise

- _as_obs_array(): assert encoded keys match descriptors before coercion
  (reuses _assert_keys, same guard as decode_action/step_raw) and
  materialize output by descriptor keys to avoid KeyError on extra keys
  and silent partial observations on missing keys.
- handlers.py: re-raise the captured hard-fail with its original traceback.

Addresses CodeRabbit findings on NVIDIA#930.
…ngleton

Replace the bespoke _import_gymnasium() in-method seam with the canonical
lazy.gymnasium / lazy.np properties; addresses the in-method-import review
concern. No behavior change — gymnasium stays an optional [rl] extra.
The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types
instead of Any, which surfaced two latent issues the scoped pre-commit run
missed:

- pyright now sees Space[Any]/Dict in the adapter contract test, so concrete
  attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local
  bindings + isinstance before access.
- lazy_imports.py now has a 2026 commit in its history, so the ci_only
  copyright check requires the year range 2025-2026.
Inherit from gymnasium.Env (guarded import, falls back to object when the
optional [rl] extra is absent) so ecosystem tooling that performs isinstance
checks (e.g. Stable-Baselines3) accepts the adapter.

- Use the TYPE_CHECKING import form so pyright sees a concrete base class
  while runtime keeps the optional-dependency fallback.
- Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012).
- Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's
  Env.unwrapped (returns self) is the correct base-env semantics, and the
  old override returned a non-Env (BaseGym), which would mislead ecosystem
  code calling .unwrapped.
Inheriting gymnasium.Env widened the static type of action_space from the
concrete spaces.Dict the adapter builds to the base spaces.Space, which has
no __getitem__. Cast at the test call site to restore subspace indexing for
pyright (runtime is unchanged; action_space is always a Dict).
…rse of decode_action

decode_action had no public inverse, so consumers needing value->index
encoding (e.g. RLlib warm-start / behavioral cloning) reached into the
private _tunable_params dict. When ContinuousSpace support split that
internal, those consumers broke with AttributeError.

encode_action closes the contract: discrete values map to their candidate
index, continuous values wrap into the clamped float32 Box array, so
decode_action(encode_action(v)) == v for any native v. Adds round-trip
contract tests pinning the invariant and rejection of non-candidate
values / key mismatches.
…tinuousSpace

The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which
ships separately. Until then nothing constructs a ContinuousSpace, so the
continuous branches here are unreachable and the only effect of the import is an
ImportError at module load. Drop the continuous import, _continuous_params, the
Box action mapping, and decode/encode continuous handling so the adapter builds
and ships standalone over discrete + structured-observation support. The
continuous support rejoins when ContinuousSpace lands.
Adds an optional online (live-RL) execution mode: instead of launching a
workload through the runner, the env delegates each trial to an in-process
GymServer. Mode is selected via an injected server or cmd_args.live_rl_mode
(server built from cmd_args.env_class); the agent-facing interface is
unchanged. Default runner-backed behavior is untouched.
…split

A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.

Addresses CodeRabbit finding on NVIDIA#932.
Online (live-RL) steps wrote trajectory rows from a per-instance counter
that reset to 0 on every reset(). Under reset-per-episode Gymnasium/RLlib
rollouts that collapsed online trajectory.csv rows back to step=1 each
episode. Drive online steps from the monotonic test_run.step (incremented
per step, never reset) so rows keep increasing across resets.

Removes the now-redundant _online_step_count; render() and the trajectory
writer both read test_run.step. Adds a regression test asserting steps stay
1,2,3 across a mid-rollout reset.
env_params_record_path was pinned to iteration_dir (runner.scenario_root),
but online (live-RL) runs write trajectory.csv under test_run.output_path
and have no runner-backed scenario_root. write_trajectory evaluates the
record path unconditionally, so an online step hit scenario_root on a
runner that does not expose it. Derive the path from trajectory_file_path
so env.csv always sits next to trajectory.csv in both modes, with no extra
branch in the write path.
@rutayan-nv rutayan-nv force-pushed the rpatro/gym-online-mode branch from be4b254 to ed3211c Compare June 26, 2026 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant