feat(configurator): add online GymServer mode to CloudAIGymEnv#932
feat(configurator): add online GymServer mode to CloudAIGymEnv#932rutayan-nv wants to merge 26 commits into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a live-RL execution path to CloudAI DSE: a ChangesLive-RL and Gymnasium Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 86-87: The code at the rsplit call for env_class_path assumes the
input contains at least one dot, but if env_class_path is a bare class name
without a module path (e.g., "FakeGymServer"), rsplit returns a single-element
list causing a ValueError on unpacking. Add validation before the rsplit call to
check that env_class_path contains a dot separator; if it does not, either raise
a descriptive ValueError explaining the required format or handle it
appropriately. The validation should occur before attempting to unpack the
rsplit result into module_path and class_name.
In `@src/cloudai/configurator/env_params.py`:
- Line 2: The copyright year in the header at the top of env_params.py uses a
range format "2024-2026" which violates the repository's copyright convention
enforced by the copyright header test. Since this is a new file only touched in
2026, update the copyright line from "# Copyright (c) 2024-2026 NVIDIA
CORPORATION & AFFILIATES. All rights reserved." to use a single year format: "#
Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved." This
aligns with the repository's requirement that new files use only the current
year, not a year range.
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The step() method's action parameter type annotation is too
restrictive and does not match the actual types it receives. In
src/cloudai/configurator/gymnasium_adapter.py at line 206, change the step()
method signature from action: dict[str, int] to action: dict[str, Any] to
accommodate both integer and numpy array values for continuous actions, aligning
with what decode_action already handles. In
tests/test_gymnasium_adapter_contract.py at line 345, no direct change is
needed; once the step() signature is corrected to accept dict[str, Any], the
pyright type error in the test will automatically resolve since the test is
passing dict[str, NDArray | int] which matches the widened contract.
In `@tests/test_env_params.py`:
- Line 2: The copyright header in tests/test_env_params.py uses a year range
"2025-2026" which violates the repository's copyright check convention enforced
by tests/test_check_copyright_headers.py. For new files touched only in 2026,
update the copyright line from "# Copyright (c) 2025-2026 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." to "# Copyright (c) 2026 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." (single year only, not a range).
- Around line 87-88: The pytest.raises(ValueError) context manager on line 87
lacks a match parameter, making the test less specific. Add a match parameter to
pytest.raises(ValueError) with a regex pattern that matches the expected error
message when sink.write is called with step=0 (the invalid parameter). This
clarifies the test intent and ensures the correct error message is being raised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 04f44c70-d208-4bf7-a8c8-d023090d11a1
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (17)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/_core/test_scenario.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pysrc/cloudai/models/workload.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.pytests/test_test_scenario.py
aa38d4e to
a219cbd
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)
237-263:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep
env.csvandtrajectory.csvaligned on constraint failures.
observer.before_step()runs at Lines 237-238 and can append anenv.csvrow, but the constraint-failure early return at Lines 260-263 skips trajectory write/observer post-hook. That leaves a step inenv.csvwith no correspondingtrajectory.csvrow.Suggested fix
if not self.test_run.test.constraint_check(self.test_run, self.runner.system): logging.info("Constraint check failed. Skipping step.") - return [-1.0], self.rewards.constraint_failure, True, {} + observation = [-1.0] + reward = self.rewards.constraint_failure + self.write_trajectory( + TrajectoryEntry( + step=self.test_run.step, + action=action, + reward=reward, + observation=observation, + env_params=dict(getattr(self.test_run, "current_env_params", {}) or {}), + ) + ) + for observer in self.observers: + observer.after_step(self.test_run, observation, reward) + return observation, reward, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 263, The constraint failure early return at lines 260-263 skips both the trajectory write and observer post-hook, but observer.before_step() was already called and may have written to env.csv. Add a call to self.write_trajectory() with a TrajectoryEntry and then call observer.after_step() for each observer before the early return in the constraint check failure path, using appropriate values (like the constraint_failure reward) to keep env.csv and trajectory.csv aligned, similar to how the cached result path handles it at lines 250-257.
♻️ Duplicate comments (2)
src/cloudai/configurator/gymnasium_adapter.py (1)
206-206:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWiden
step()action type to match continuous-action payloads.At Line 206,
dict[str, int]is too narrow for Box actions (np.ndarrayleaves). This is the source of the current pyright failure and mismatchesdecode_action()behavior.#!/bin/bash # Verify root-cause annotation and failing call shape. rg -nP 'def step\(self, action: dict\[str, int\]\)' src/cloudai/configurator/gymnasium_adapter.py rg -nP 'adapter\.step\(\{"threshold": np\.array\(' tests/test_gymnasium_adapter_contract.py -C2Suggested fix
- def step(self, action: dict[str, int]) -> tuple[Any, float, bool, bool, dict[str, Any]]: + def step(self, action: dict[str, Any]) -> tuple[Any, float, bool, bool, dict[str, Any]]:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/gymnasium_adapter.py` at line 206, The `step()` method in gymnasium_adapter.py has an overly restrictive type annotation for the action parameter. Change the action parameter type from `dict[str, int]` to a wider union type that accommodates both integer actions and numpy array values (used by continuous Box actions). This will align the type annotation with what `decode_action()` actually accepts and resolve the pyright type mismatch failure.Source: Pipeline failures
src/cloudai/configurator/cloudai_gym.py (1)
86-87:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate
env_classformat before unpacking.At Line 86,
rsplit(".", 1)still assumes a dotted import path and can raise an opaque unpackingValueErrorfor malformed values (e.g.,"FakeGymServer"). Please fail with a descriptive error first.Suggested fix
+ if "." not in str(env_class_path): + raise ValueError( + f"env_class must be a dotted import path like 'pkg.module.Class'; got {env_class_path!r}" + ) module_path, class_name = str(env_class_path).rsplit(".", 1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 86 - 87, The code at the rsplit(".", 1) operation in the module imports section assumes a valid dotted import path format and will raise an opaque unpacking ValueError if the env_class_path doesn't contain a dot (e.g., "FakeGymServer"). Add validation before the rsplit to check that the env_class_path string contains at least one dot separator, and raise a descriptive ValueError with a clear error message if the format is invalid, then proceed with the rsplit and unpacking only after validation passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 149-161: The loop in the base_agent.py file continues iterating
through all self.max_steps regardless of the terminal state flag returned from
env.step(). Add a break statement in the loop after the update_policy() call to
exit when done=True, preventing the execution of unintended post-terminal steps
that violate proper episode semantics. The done flag is already being captured
from env.step() and passed to update_policy(), so you only need to check its
value and break the loop when it becomes True.
In `@tests/test_action_space.py`:
- Around line 43-50: Replace direct constructor calls with model_validate()
method calls to defer validation to runtime. In tests/test_action_space.py at
lines 43-50, update test_continuous_space_rejects_unknown_dtype() to use
ContinuousSpace.model_validate({"low": 0.0, "high": 1.0, "dtype": "double"})
instead of the constructor call, and update
test_continuous_space_forbids_extra_fields() to use
ContinuousSpace.model_validate({"low": 0.0, "high": 1.0, "step": 0.1}) instead
of the constructor call. In tests/test_env_params.py at lines 145-149, similarly
replace ObsLeafDescriptor constructor calls with
ObsLeafDescriptor.model_validate({...}) for both test cases that pass invalid
arguments (the case with extra field unexpected=1 and the case with invalid
kind="categorical"). This defers type validation to runtime so
pytest.raises(ValidationError) can catch the errors instead of pyright rejecting
them during static type checking.
---
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-263: The constraint failure early return at lines 260-263
skips both the trajectory write and observer post-hook, but
observer.before_step() was already called and may have written to env.csv. Add a
call to self.write_trajectory() with a TrajectoryEntry and then call
observer.after_step() for each observer before the early return in the
constraint check failure path, using appropriate values (like the
constraint_failure reward) to keep env.csv and trajectory.csv aligned, similar
to how the cached result path handles it at lines 250-257.
---
Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 86-87: The code at the rsplit(".", 1) operation in the module
imports section assumes a valid dotted import path format and will raise an
opaque unpacking ValueError if the env_class_path doesn't contain a dot (e.g.,
"FakeGymServer"). Add validation before the rsplit to check that the
env_class_path string contains at least one dot separator, and raise a
descriptive ValueError with a clear error message if the format is invalid, then
proceed with the rsplit and unpacking only after validation passes.
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The `step()` method in gymnasium_adapter.py has an overly
restrictive type annotation for the action parameter. Change the action
parameter type from `dict[str, int]` to a wider union type that accommodates
both integer actions and numpy array values (used by continuous Box actions).
This will align the type annotation with what `decode_action()` actually accepts
and resolve the pyright type mismatch failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 8bc0b625-bfb7-44c2-acc4-7100e48ae965
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (17)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/_core/test_scenario.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pysrc/cloudai/models/workload.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.pytests/test_test_scenario.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
a219cbd to
0e80db2
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)
237-263:⚠️ Potential issue | 🟠 Major | ⚡ Quick winConstraint-failure path can desynchronize
env.csvandtrajectory.csv.
observer.before_step(...)runs before constraint validation, so env params are persisted for the step, but the early return on constraint failure skips trajectory write. That breaks the step-alignment contract for downstream joins.💡 Minimal ordering fix
- for observer in self.observers: - observer.before_step(self.test_run) - - cached_result = self.get_cached_trajectory_result(action) - if cached_result is not None: + if not self.test_run.test.constraint_check(self.test_run, self.runner.system): + logging.info("Constraint check failed. Skipping step.") + return [-1.0], self.rewards.constraint_failure, True, {} + + for observer in self.observers: + observer.before_step(self.test_run) + + cached_result = self.get_cached_trajectory_result(action) + if cached_result is not None: ... - - if not self.test_run.test.constraint_check(self.test_run, self.runner.system): - logging.info("Constraint check failed. Skipping step.") - return [-1.0], self.rewards.constraint_failure, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 263, The constraint check must be moved to occur before observer.before_step() is called. Currently, observer.before_step() is invoked first (which persists env params), then if the constraint check fails, the code returns early without writing a trajectory entry, causing misalignment between env.csv and trajectory.csv. Reorder the code so the constraint check using self.test_run.test.constraint_check() happens before any observer calls, ensuring that env params are only persisted for steps that actually produce trajectory entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 47-55: The ContinuousSpace class with dtype="int" can produce
out-of-range values during rounding if the bounds are non-integers. Add
validation in the _validate_bounds method to enforce that when dtype="int", both
self.low and self.high must be integers (check that they equal their integer
conversions). This invariant ensures that rounding operations in downstream code
cannot exceed the intended bounds.
In `@tests/test_cloudaigym.py`:
- Around line 568-680: Add a new regression test function in
tests/test_cloudaigym.py that covers the constraint_check failure path when
env_params is declared. The test should create a CloudAIGymEnv with env_params
specified on the TestDefinition, mock the constraint_check method to return
False or raise an exception, call env.step() with an action, and then assert
that env.csv and trajectory.csv remain step-aligned by verifying that both files
either skip the step or both record it consistently. This ensures the failure
path maintains the corpus-friendly merge contract mentioned in the existing
tests.
---
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-263: The constraint check must be moved to occur before
observer.before_step() is called. Currently, observer.before_step() is invoked
first (which persists env params), then if the constraint check fails, the code
returns early without writing a trajectory entry, causing misalignment between
env.csv and trajectory.csv. Reorder the code so the constraint check using
self.test_run.test.constraint_check() happens before any observer calls,
ensuring that env params are only persisted for steps that actually produce
trajectory entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 5f32c9ba-c452-4bfd-94aa-7ec4d8f6b692
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/_core/test_scenario.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pysrc/cloudai/models/workload.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
84ae7f1 to
d654dfb
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/cloudai/_core/action_space.py (1)
49-55:⚠️ Potential issue | 🟠 MajorEnforce integer-compatible bounds when
dtype="int".Line 49 allows
dtype="int", but Line 52-55 only validates ordering. This still permits non-integer bounds, and downstream decode (clampthenround) can emit values outside the declared interval.🔧 Proposed fix
`@model_validator`(mode="after") def _validate_bounds(self) -> Self: if self.low >= self.high: raise ValueError(f"ContinuousSpace requires low < high; got low={self.low}, high={self.high}") + if self.dtype == "int" and (not self.low.is_integer() or not self.high.is_integer()): + raise ValueError( + f"ContinuousSpace(dtype='int') requires integer bounds; got low={self.low}, high={self.high}" + ) return self🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/_core/action_space.py` around lines 49 - 55, The _validate_bounds method in ContinuousSpace only validates that low < high, but does not enforce that low and high are integers when dtype="int". Add an additional validation check in the _validate_bounds method that raises a ValueError if dtype is set to "int" but either self.low or self.high are not integers or are not integer-compatible values. This validation should occur alongside the existing bounds ordering check to prevent non-integer bounds from being allowed when integer dtype is specified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 49-55: The _validate_bounds method in ContinuousSpace only
validates that low < high, but does not enforce that low and high are integers
when dtype="int". Add an additional validation check in the _validate_bounds
method that raises a ValueError if dtype is set to "int" but either self.low or
self.high are not integers or are not integer-compatible values. This validation
should occur alongside the existing bounds ordering check to prevent non-integer
bounds from being allowed when integer dtype is specified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 32b71de6-445d-4439-89d6-b07b150e0185
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/cloudai/_core/action_space.py (1)
49-55:⚠️ Potential issue | 🟠 MajorEnforce integer-compatible bounds when
dtype="int".Line 49 allows
dtype="int", but Line 52-55 only validates ordering. This still permits non-integer bounds, and downstream decode (clampthenround) can emit values outside the declared interval.🔧 Proposed fix
`@model_validator`(mode="after") def _validate_bounds(self) -> Self: if self.low >= self.high: raise ValueError(f"ContinuousSpace requires low < high; got low={self.low}, high={self.high}") + if self.dtype == "int" and (not self.low.is_integer() or not self.high.is_integer()): + raise ValueError( + f"ContinuousSpace(dtype='int') requires integer bounds; got low={self.low}, high={self.high}" + ) return self🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/_core/action_space.py` around lines 49 - 55, The _validate_bounds method in ContinuousSpace only validates that low < high, but does not enforce that low and high are integers when dtype="int". Add an additional validation check in the _validate_bounds method that raises a ValueError if dtype is set to "int" but either self.low or self.high are not integers or are not integer-compatible values. This validation should occur alongside the existing bounds ordering check to prevent non-integer bounds from being allowed when integer dtype is specified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/cloudai/_core/action_space.py`:
- Around line 49-55: The _validate_bounds method in ContinuousSpace only
validates that low < high, but does not enforce that low and high are integers
when dtype="int". Add an additional validation check in the _validate_bounds
method that raises a ValueError if dtype is set to "int" but either self.low or
self.high are not integers or are not integer-compatible values. This validation
should occur alongside the existing bounds ordering check to prevent non-integer
bounds from being allowed when integer dtype is specified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 32b71de6-445d-4439-89d6-b07b150e0185
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.py
d9e8566 to
e27b362
Compare
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
e27b362 to
9f0652c
Compare
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
9f0652c to
fb1480c
Compare
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
fb1480c to
2794501
Compare
Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The sink is now stateless: write() takes the record path per call and skips empty samples, so non-DR runs write nothing and write_trajectory needs no branch. Derive both records from a new iteration_dir property and expose the env record via the env_params_record_path property (was _env_csv_path), keeping env.csv and trajectory.csv step-aligned without coupling the name to CSV.
…ty flag Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params capability flag (opt-in, defaults False). Only agents whose search consumes per-trial env_params sampling set it True; enumerating/surrogate agents leave it False, so a config that declares env_params for an agent that would ignore them is rejected up front instead of silently no-op'ing. New agents answer for themselves with no string to maintain. Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py next to the logic it guards, looking the agent up via the Registry. Unknown agents are deferred to the dedicated agent-resolution error rather than masked here. Keep all public-facing comments, docstrings, and the error message generic (no internal agent names). Cover the full validator matrix, including the unknown-agent deferral.
Compress multi-line inline comments down to the single non-obvious rationale (or drop them where the code already speaks), per the self-documenting-code principle. Public API docstrings and test intent comments are left intact.
apply_params_set overlays sampled scalar draws onto cmd_args, then reconstructs the TestDefinition to validate the applied action values. That pass re-ran validate_env_params, which rejects a weighted env_param whose cmd_args target is no longer a candidate list - exactly what the overlay produces. env_params is already validated at parse time, so drop it from the validation-only dump. Adds a regression test covering a weighted env_param's scalar draw.
GymnasiumAdapter._decode_continuous clamps to [low, high] then rounds, so a dtype="int" space with non-integer bounds (e.g. [0.2, 0.8]) can decode to a value outside the declared range (1). Validate that int-quantized spaces declare integer bounds (CodeRabbit, NVIDIA#932/NVIDIA#927).
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
b79ff99 to
f0aa596
Compare
|
Parking this PR for now. Per a change in release plan, this work isn't needed for the upcoming aiconfigurator release, so I'm moving it to draft to keep it out of the review queue. @podkidyshev — no need to review this one yet; I'll mark it ready again when it's scheduled for release. Thanks! |
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
f0aa596 to
be4b254
Compare
An env_params entry only reclassifies a list-valued cmd_args sweep as env-sampled; a scalar is already fixed, so annotating it is a meaningless label. Previously such an annotation was tolerated as a silent no-op, which let it slip through parse-time validation and inconsistently trip (or not) the downstream "no agent will sample them" check depending on run mode. Reject it where the contract lives - TestDefinition.validate_env_params - so the failure is immediate and mode-independent. EnvParams.from_test's non-list guard becomes defensive (parse-time now guarantees lists); the post-overlay path already drops env_params before re-validating, so concrete scalar draws are unaffected. Extract the per-field checks into a helper to keep the validator under the complexity limit, and update tests: scalar annotations now assert rejection instead of no-op tolerance.
…otocol Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width dim, or "discrete" of size n) and a StructuredObservation Protocol that documents the optional env hooks structured_observation_descriptors() and encode_observation(). These let an env expose a named, per-leaf observation so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium spaces.Dict; the hooks are duck-typed, so envs need not subclass. Both exported via cloudai.core.
…ejection tests Negative tests pass an extra kwarg and an out-of-Literal kind to assert ValidationError; mark the deliberate type violations with type: ignore.
GymnasiumAdapter._decode_continuous clamps to [low, high] then rounds, so a dtype="int" space with non-integer bounds (e.g. [0.2, 0.8]) can decode to a value outside the declared range (1). Validate that int-quantized spaces declare integer bounds (CodeRabbit, NVIDIA#932/NVIDIA#927).
Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of Discrete (list params) and Box (ContinuousSpace) actions over the tunable params with fixed (single-value) params injected each step; observations as either a flat float32 Box or, when the env opts in via the structured-obs hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous dtype="int" params are quantized (rounded/clamped) at decode_action so the trajectory cache key collapses float jitter. The adapter is a pure pass-through over test_run.step (never mutates it), so contextual-bandit rollouts that reset() per trial keep a monotonic trial index. gymnasium is an optional dependency lazy-imported behind the new [rl] extra (also added to dev); CloudAIGymEnv.define_observation_space() now returns one slot per agent metric so adapters get the right Box shape. Exported via cloudai.core. Caller-contract tests pin the step-monotonicity, observation pass-through, continuous-quantization, and structured-obs invariants.
…ature step() delegates to decode_action(dict[str, Any]) and exists precisely to round float/continuous policy actions to ints; widen its parameter type from dict[str, int] to dict[str, Any] to match.
…reserve traceback on DSE re-raise - _as_obs_array(): assert encoded keys match descriptors before coercion (reuses _assert_keys, same guard as decode_action/step_raw) and materialize output by descriptor keys to avoid KeyError on extra keys and silent partial observations on missing keys. - handlers.py: re-raise the captured hard-fail with its original traceback. Addresses CodeRabbit findings on NVIDIA#930.
…ngleton Replace the bespoke _import_gymnasium() in-method seam with the canonical lazy.gymnasium / lazy.np properties; addresses the in-method-import review concern. No behavior change — gymnasium stays an optional [rl] extra.
The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types instead of Any, which surfaced two latent issues the scoped pre-commit run missed: - pyright now sees Space[Any]/Dict in the adapter contract test, so concrete attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local bindings + isinstance before access. - lazy_imports.py now has a 2026 commit in its history, so the ci_only copyright check requires the year range 2025-2026.
Inherit from gymnasium.Env (guarded import, falls back to object when the optional [rl] extra is absent) so ecosystem tooling that performs isinstance checks (e.g. Stable-Baselines3) accepts the adapter. - Use the TYPE_CHECKING import form so pyright sees a concrete base class while runtime keeps the optional-dependency fallback. - Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012). - Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's Env.unwrapped (returns self) is the correct base-env semantics, and the old override returned a non-Env (BaseGym), which would mislead ecosystem code calling .unwrapped.
Inheriting gymnasium.Env widened the static type of action_space from the concrete spaces.Dict the adapter builds to the base spaces.Space, which has no __getitem__. Cast at the test call site to restore subspace indexing for pyright (runtime is unchanged; action_space is always a Dict).
…rse of decode_action decode_action had no public inverse, so consumers needing value->index encoding (e.g. RLlib warm-start / behavioral cloning) reached into the private _tunable_params dict. When ContinuousSpace support split that internal, those consumers broke with AttributeError. encode_action closes the contract: discrete values map to their candidate index, continuous values wrap into the clamped float32 Box array, so decode_action(encode_action(v)) == v for any native v. Adds round-trip contract tests pinning the invariant and rejection of non-candidate values / key mismatches.
…tinuousSpace The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which ships separately. Until then nothing constructs a ContinuousSpace, so the continuous branches here are unreachable and the only effect of the import is an ImportError at module load. Drop the continuous import, _continuous_params, the Box action mapping, and decode/encode continuous handling so the adapter builds and ships standalone over discrete + structured-observation support. The continuous support rejoins when ContinuousSpace lands.
Adds an optional online (live-RL) execution mode: instead of launching a workload through the runner, the env delegates each trial to an in-process GymServer. Mode is selected via an injected server or cmd_args.live_rl_mode (server built from cmd_args.env_class); the agent-facing interface is unchanged. Default runner-backed behavior is untouched.
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
Online (live-RL) steps wrote trajectory rows from a per-instance counter that reset to 0 on every reset(). Under reset-per-episode Gymnasium/RLlib rollouts that collapsed online trajectory.csv rows back to step=1 each episode. Drive online steps from the monotonic test_run.step (incremented per step, never reset) so rows keep increasing across resets. Removes the now-redundant _online_step_count; render() and the trajectory writer both read test_run.step. Adds a regression test asserting steps stay 1,2,3 across a mid-rollout reset.
env_params_record_path was pinned to iteration_dir (runner.scenario_root), but online (live-RL) runs write trajectory.csv under test_run.output_path and have no runner-backed scenario_root. write_trajectory evaluates the record path unconditionally, so an online step hit scenario_root on a runner that does not expose it. Derive the path from trajectory_file_path so env.csv always sits next to trajectory.csv in both modes, with no extra branch in the write path.
be4b254 to
ed3211c
Compare
Issue
CloudAIGymEnvcan only run trials by launching workloads through the runner — there is no path for fast, in-process online / live-RL training (e.g. kvpilot).Fix
Add an optional online mode: the env delegates
reset/step/action+observation spaces to an in-processGymServerinstead of the runner. Mode is selected by an injected server orcmd_args.live_rl_mode=true(server built fromcmd_args.env_class); the agent-facing interface and default runner-backed behavior are unchanged.Testing
pytest tests/test_cloudaigym.py— existing offline tests stay green; 6 new tests cover delegation, trajectory output,env_classimport/kwarg-filtering, missing-env_classerror, andlive_rl_modeauto-detection. ruff/pyright/vulture/import-linter clean.