feat(configurator): add GymnasiumAdapter for CloudAI envs#930

Open
rutayan-nv wants to merge 22 commits into NVIDIA:main from rutayan-nv:rpatro/gymnasium-adapter

Conversation

@rutayan-nv

Contributor

Issue

  • RL agents (PPO/DQN) and external training loops need a gymnasium.Env-shaped view of a CloudAI BaseGym; there is no upstream adapter, and a flat [0.0] observation gives adapters the wrong Box shape.

Fix

  • Add GymnasiumAdapter (configurator): spaces.Dict of Discrete (list) + Box (ContinuousSpace) actions with fixed params injected per step; flat-Box or structured spaces.Dict (per-leaf ObsLeafDescriptor) observations; dtype="int" continuous actions quantized at decode_action. Pure pass-through over test_run.step (never mutated) so contextual-bandit reset()-per-trial keeps a monotonic trial index. gymnasium is lazy-imported behind a new [rl] extra; define_observation_space() now sizes by agent metrics. Exported via cloudai.core.
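The quantize-at-decode behaviour for dtype="int" continuous actions can be pictured with a minimal sketch. Everything here is an assumption for illustration: `ToyContinuousSpace` and `decode_continuous` are hypothetical stand-ins, not the adapter's actual classes or signatures, and the sketch accepts a list, tuple, or bare scalar rather than a numpy array.

```python
from dataclasses import dataclass
from typing import Sequence, Union

Number = Union[int, float]


@dataclass(frozen=True)
class ToyContinuousSpace:
    """Hypothetical stand-in for ContinuousSpace: bounds plus a dtype tag."""

    low: float
    high: float
    dtype: str = "float"  # "float" or "int"


def decode_continuous(
    space: ToyContinuousSpace, raw: Union[Number, Sequence[Number]]
) -> Number:
    """Clamp a raw Box-style value into [low, high]; round when dtype is "int"."""
    # A Box sample usually arrives as a length-1 sequence; accept a bare scalar too.
    value = float(raw[0]) if isinstance(raw, (list, tuple)) else float(raw)
    clamped = min(max(value, space.low), space.high)
    if space.dtype == "int":
        # Quantize only at decode time, so the agent still sees a continuous Box.
        return int(round(clamped))
    return clamped
```

Clamping before rounding keeps an out-of-range sample inside the declared bounds as long as the bounds themselves are integers; non-integer bounds on an "int" space would need an extra policy that this sketch does not attempt.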

Testing

  • tests/test_gymnasium_adapter_contract.py: caller-contract tests for step-monotonicity (within/across episodes), observation pass-through, continuous quantization/clamping, and the structured-obs gate. ruff + pyright + vulture + import-linter clean; 108 related tests pass.
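The step-monotonicity contract can be illustrated with a toy pass-through wrapper. This is a sketch under stated assumptions: `ToyTestRun`, `ToyEnv`, `ToyAdapter`, and `run_bandit_trials` are hypothetical names, and the only behaviour borrowed from the PR is that the env (not the adapter) advances the step and that reset() leaves it alone.

```python
class ToyTestRun:
    """Hypothetical stand-in for TestRun: owns the trial index."""

    def __init__(self) -> None:
        self.step = 0

    def increment_step(self) -> int:
        self.step += 1
        return self.step


class ToyEnv:
    """Hypothetical BaseGym-like env: the env advances the step, not the adapter."""

    def __init__(self) -> None:
        self.test_run = ToyTestRun()

    def reset(self):
        return [0.0]  # returns an observation and does not touch test_run.step

    def step(self, action):
        self.test_run.increment_step()
        return [float(self.test_run.step)], 0.0, True, {}


class ToyAdapter:
    """Pass-through wrapper: it never writes to env.test_run.step."""

    def __init__(self, env: ToyEnv) -> None:
        self._env = env

    def reset(self):
        return self._env.reset(), {}

    def step(self, action):
        obs, reward, done, info = self._env.step(action)
        return obs, reward, done, False, info  # Gymnasium-style 5-tuple


def run_bandit_trials(adapter: ToyAdapter, n_trials: int) -> list:
    """Contextual-bandit usage: reset() before every single-step trial."""
    seen = []
    for _ in range(n_trials):
        adapter.reset()
        adapter.step({"x": 0})
        seen.append(adapter._env.test_run.step)
    return seen
```

With this shape, three reset-then-step trials observe step indices 1, 2, 3 rather than 1, 1, 1, which is the property the contract tests describe.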

Stack: #901#927 (ContinuousSpace) ← #928 (ObsLeafDescriptor) ← this. Final cloudai-side PR of the gymnasium-adapter upstreaming; consumes both primitives.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Introduces domain-randomized per-trial env_params to CloudAI DSE: new EnvParamSpec and ContinuousSpace schemas, deterministic EnvParamsSampler, CSV persistence via CsvSink, StepObserver hooks wired into CloudAIGymEnv, and cache keys extended to (action, env_params). Adds GymnasiumAdapter wrapping BaseGym for Gymnasium-compatible RL agents, a concrete BaseAgent.run() loop, and refactored DSE job handling with structured failure reporting.
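One way to get deterministic per-(seed, param, trial) draws from independent streams is to derive a fresh RNG from a stable hash of the triple. The sketch below is an assumption about how such a sampler could work, not the EnvParamsSampler implementation; `sample_categorical` is a hypothetical name.

```python
import hashlib
import random
from typing import Sequence


def sample_categorical(seed: int, param: str, trial: int, choices: Sequence[str]) -> str:
    """Draw one categorical value that depends only on (seed, param, trial)."""
    # hashlib is used instead of the built-in hash(), which is salted per process
    # for strings and would break reproducibility across runs.
    key = f"{seed}:{param}:{trial}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Each (seed, param, trial) triple seeds its own RNG, so adding a new
    # parameter or replaying a single trial does not shift any other draw.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.choice(list(choices))
```

Because no RNG state is shared between calls, the draw for trial 7 is the same whether or not trials 1 to 6 were ever sampled.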

Changes

DSE env-randomized parameters and Gymnasium integration

Layer, File(s), and Summary:

  • Core data contracts
    File(s): src/cloudai/_core/action_space.py, src/cloudai/_core/test_scenario.py, src/cloudai/configurator/env_params.py, src/cloudai/models/workload.py, tests/test_action_space.py, tests/test_env_params.py, tests/test_test_scenario.py
    Summary: ContinuousSpace Pydantic model with low < high validation; TestRun gains current_env_params field and increment_step() method; EnvParamSpec, ObsLeafDescriptor, and StructuredObservation protocol defined with weight/dimension validation; TestDefinition.env_params field added; tests cover all validation constraints, construction semantics, and monotonic step behavior.

  • Env params sampling, persistence, and step observers
    File(s): src/cloudai/configurator/env_params.py, tests/test_env_params.py
    Summary: EnvParamsSampler produces deterministic per-(seed, param, trial) categorical draws using independent RNG streams; CsvSink appends step-aligned rows to env.csv with header/parent-dir creation; StepObserver protocol and EnvParamsObserver implementation sample params in before_step, store on test_run.current_env_params, persist to sink, and no-op in after_step; unit tests cover determinism, CSV edge cases (empty samples, step validation), and observer side effects.

  • CloudAIGym env_params trajectory and cache integration
    File(s): src/cloudai/configurator/cloudai_gym.py, tests/test_cloudaigym.py
    Summary: TrajectoryEntry gains env_params field; CloudAIGymEnv builds observer list at init, calls before_step/after_step around workload execution, increments test_run.step via increment_step() in step(), persists current_env_params in every trajectory row, and changes cache identity from action-only to (action, env_params) requiring exact recursive match; observation space sized to agent metric count; tests cover cache miss/hit on env_params mismatch, env.csv 1:1 alignment with trajectory, and cache-hit-still-writes-env.csv behavior.

  • BaseAgent.run() loop and DSE handler refactor
    File(s): src/cloudai/configurator/base_agent.py, src/cloudai/cli/handlers.py, tests/test_handlers.py
    Summary: BaseAgent.run() drives reset→loop(select_action→env.step→update_policy)→return 0 execution up to max_steps, logging step/action/reward/observation at each iteration, exiting early if select_action returns None; handle_dse_job delegates to agent.run(), accumulates recoverable non-zero returns, captures unexpected exceptions, writes dse_failure.txt via _record_run_failure() with type/message/traceback, calls generate_reports(error=...), then re-raises; tests cover delegation polymorphism, rc accumulation across test runs, hard-fail abort of remaining runs, and artifact presence.

  • GymnasiumAdapter
    File(s): src/cloudai/configurator/gymnasium_adapter.py, tests/test_gymnasium_adapter_contract.py, pyproject.toml
    Summary: New GymnasiumAdapter wraps BaseGym with lazy gymnasium/numpy import, builds action_space as spaces.Dict with Discrete subspaces for multi-value list params, Box subspaces for ContinuousSpace params (fixed single-value params injected internally), builds observation_space as structured spaces.Dict (preferred) or flat float32 Box fallback, decodes actions with strict key validation and continuous clamping/rounding based on dtype, implements reset/step/step_raw returning Gymnasium 5-tuple; gymnasium~=1.2 added to dev and new rl optional dependencies; contract tests cover monotonic step index across episodes, contextual observation propagation, continuous space dispatch with rounding/quantization, structured vs. flat obs gating, and categorical leaf subspace validity.

  • Public API re-exports
    File(s): src/cloudai/core.py, src/cloudai/configurator/__init__.py
    Summary: ContinuousSpace, GymnasiumAdapter, ObsLeafDescriptor, and StructuredObservation added to cloudai.core.__all__ and corresponding imports; GymnasiumAdapter added to cloudai.configurator.__all__.
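The change of cache identity from action-only to (action, env_params) with an exact recursive match can be sketched with a canonical JSON key. This is an illustrative assumption, not the cache in cloudai_gym.py: `cache_key` and `ToyResultCache` are hypothetical, and the sketch presumes both dictionaries are JSON-serializable.

```python
import json
from typing import Any, Optional


def cache_key(action: dict, env_params: dict) -> str:
    """Canonical key: nested dicts compare equal regardless of key order."""
    # sort_keys=True sorts keys at every nesting level, so the comparison is
    # recursive; any differing leaf value yields a different key.
    return json.dumps({"action": action, "env_params": env_params}, sort_keys=True)


class ToyResultCache:
    """Hypothetical result cache keyed on (action, env_params)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def put(self, action: dict, env_params: dict, result: Any) -> None:
        self._store[cache_key(action, env_params)] = result

    def get(self, action: dict, env_params: dict) -> Optional[Any]:
        return self._store.get(cache_key(action, env_params))
```

Under this scheme the same action replayed under a different sampled environment is a cache miss, which is the behaviour the cache tests in the summary describe.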

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • srivatsankrishnan
  • jeffnvidia
  • podkidyshev

Poem

🐇 Hops through the gym with a randomized flair,
Each trial a new env_param drawn from thin air,
The cache key now knows both action and seed,
The adapter wraps Gymnasium — just what RL needs!
On failure, a txt file documents the crash,
Then BaseAgent.run() completes with a dash. 🥕

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name, status, and explanation:
  • Title check: ✅ Passed. The title clearly summarizes the main addition: introducing GymnasiumAdapter for wrapping CloudAI environments. It is specific, concise, and directly related to the primary changeset.
  • Description check: ✅ Passed. The description clearly relates to the changeset by explaining the need for a gymnasium.Env adapter, detailing the key features (action spaces, observations, continuous handling, trial monotonicity), and referencing dependent PRs and testing.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

146-172: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure branch breaks per-step artifact contract and observation shape consistency.

At Line 171, the early return skips write_trajectory(...) and observer.after_step(...) even though before_step(...) already ran at Lines 146-147. This can leave env.csv with a step that is missing in trajectory.csv. It also returns a fixed [-1.0], which mismatches the dynamic observation shape introduced at Line 103 when agent_metrics has more than one metric.

🔧 Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failed_observation = [self.rewards.metric_failure] * max(len(self.test_run.test.agent_metrics), 1)
+            failed_reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=failed_reward,
+                    observation=failed_observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, failed_observation, failed_reward)
+            return failed_observation, failed_reward, True, {}
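The shape half of this finding reduces to sizing the failure observation by the metric count instead of hard-coding one element. A minimal sketch, assuming a hypothetical helper `failure_observation` and a fill value of -1.0 chosen only for illustration:

```python
from typing import Sequence


def failure_observation(agent_metrics: Sequence[str], fill: float = -1.0) -> list:
    """Build a failure observation with one slot per agent metric."""
    # max(..., 1) keeps a single-element observation when no metrics are
    # configured, matching the length rule used in the proposed diff.
    return [fill] * max(len(agent_metrics), 1)
```

A fixed `[-1.0]` only agrees with this rule when there is at most one agent metric, which is why the mismatch appears once the observation space is sized by metric count.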
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 146 - 172, The
constraint-failure early return at the end of the diff skips both
write_trajectory and observer.after_step calls even though before_step was
already invoked, breaking the per-step artifact contract and leaving the
trajectory file inconsistent with env.csv. Additionally, the hardcoded [-1.0]
observation return value does not match the dynamic observation shape determined
by agent_metrics. To fix this, when the constraint_check fails: create a
TrajectoryEntry with the current step, action, the constraint_failure reward,
and the current observation from self.test_run, call write_trajectory with this
entry, invoke observer.after_step with the test_run, current observation, and
constraint_failure reward, then return the current observation (not the
hardcoded [-1.0]), the constraint_failure reward, and the appropriate done flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/env_params.py`:
- Line 2: The copyright headers in both files use an invalid year format that
fails the repository's copyright header validation test. In
src/cloudai/configurator/env_params.py at line 2, change the copyright header
from "Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights
reserved." to "Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights
reserved." by removing the year range and keeping only 2026. Apply the identical
change to tests/test_env_params.py at line 2, changing from the 2024-2026 format
to just 2026 to match the repository's required copyright-year formatting policy
enforced by tests/test_check_copyright_headers.py.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 251-253: The current implementation trusts the keys returned by
encode_observation() when building the output dictionary, which can cause
KeyError for extra keys or silently produce incomplete observations if keys are
missing. Fix this by first validating that the set of keys from the encoded
observation matches the set of keys in the descriptors dictionary, then
materialize the output by iterating through descriptors keys instead of
encoded.items(), ensuring all required descriptor keys are present and properly
coerced without relying on the encode_observation() output to have the correct
keys.
- Around line 206-207: The step() method's action parameter is typed as
dict[str, int] but the implementation and tests show it needs to accept
dict[str, Any] to handle both integer and continuous numpy array values that are
passed to decode_action(). Change the type annotation of the action parameter in
the step() method signature from dict[str, int] to dict[str, Any] to match what
decode_action() expects and what the tests actually pass to it.

In `@tests/test_action_space.py`:
- Around line 43-50: These negative-validation tests intentionally pass invalid
arguments to verify runtime validation rejects them, but this causes type
checker errors. Use typing.cast() to suppress these violations at the affected
sites. In tests/test_action_space.py lines 43-50, wrap the invalid dtype literal
"double" with cast(Any, "double") in the
test_continuous_space_rejects_unknown_dtype function, and wrap the entire
ContinuousSpace constructor call with cast(dict[str, Any], {...}) to suppress
the extra step parameter in test_continuous_space_forbids_extra_fields.
Similarly, in tests/test_env_params.py lines 142-149, apply cast(Any,
"categorical") for the invalid kind literal and cast(dict[str, Any], {...}) for
the constructor call containing the unexpected extra field.
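The cast() pattern for negative-validation tests can be sketched without Pydantic. The dataclass below is a hypothetical stand-in (the real ContinuousSpace is a Pydantic model and would raise its own validation error); only the use of typing.cast to silence the static checker while still exercising the runtime check is the point.

```python
from dataclasses import dataclass
from typing import Any, Literal, cast


@dataclass
class ToySpace:
    """Hypothetical stand-in with a Literal-typed field and runtime validation."""

    low: float
    high: float
    dtype: Literal["float", "int"] = "float"

    def __post_init__(self) -> None:
        if self.dtype not in ("float", "int"):
            raise ValueError(f"unknown dtype: {self.dtype!r}")
        if not self.low < self.high:
            raise ValueError("low must be < high")


def rejects_unknown_dtype() -> bool:
    try:
        # cast(Any, ...) hides the deliberately invalid literal from the type checker.
        ToySpace(low=0.0, high=1.0, dtype=cast(Any, "double"))
    except ValueError:
        return True
    return False


def rejects_extra_field() -> bool:
    # Unpacking a dict typed as dict[str, Any] avoids an unknown-argument error
    # at type-check time; the dataclass constructor still rejects it at runtime.
    kwargs = cast("dict[str, Any]", {"low": 0.0, "high": 1.0, "step": 0.1})
    try:
        ToySpace(**kwargs)
    except TypeError:
        return True
    return False
```

cast() has no runtime effect, so the tests keep checking exactly the same validation path.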

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: b6827577-13bc-423b-ab43-f10df882a769

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and 49f5b43.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/configurator/env_params.py Outdated
Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
Comment thread src/cloudai/configurator/gymnasium_adapter.py
Comment thread tests/test_action_space.py Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (1)

169-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure path breaks step-alignment and observation-shape contracts

At Line 169, the early return bypasses trajectory writing and after_step callbacks (after before_step already persisted env params), which can desynchronize env.csv and trajectory.csv. It also returns a fixed [-1.0], which mismatches the new metric-sized observation shape when agent_metrics has length > 1.

💡 Suggested localized fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0] * max(len(self.test_run.test.agent_metrics), 1)
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 169 - 171, The
constraint-failure path in the step function contains an early return that
bypasses trajectory writing and after_step callbacks, which creates
desynchronization between env.csv and trajectory.csv, and also returns a fixed
observation shape of [-1.0] that does not match the expected observation size
when agent_metrics has length greater than one. Instead of returning early when
the constraint_check fails, set the appropriate constraint_failure reward and
done flag, then allow the function to continue to the normal step completion
flow to ensure trajectory writing and after_step callbacks are executed, and
construct the observation array to match the correct shape based on the actual
agent_metrics size.
src/cloudai/configurator/base_agent.py (1)

91-92: ⚠️ Potential issue | 🟠 Major

Fix select_action return type to align with the run() loop's termination contract.

Line 144 checks if result is None: to break the loop, but the abstract signature on Line 91 declares select_action returns tuple[int, dict[str, Any]] (non-optional). This contract mismatch violates the expected termination protocol: implementations that follow the strict signature will never return None, but the loop expects them to.

Suggested fix
-    def select_action(self, observation: list[float] | None = None) -> tuple[int, dict[str, Any]]:
+    def select_action(
+        self, observation: list[float] | None = None
+    ) -> tuple[int, dict[str, Any]] | None:
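The termination contract is easier to see with the loop written out. The sketch below is an assumption-laden toy, not BaseAgent.run(): `ToyAgent`, `ToyEnv`, and `run` are hypothetical, and the `update_policy` argument and the env return shapes are guesses made only so the example runs.

```python
from typing import Any, Optional


class ToyAgent:
    """Hypothetical agent that proposes a fixed list of actions, then stops."""

    def __init__(self, actions: list, max_steps: int) -> None:
        self._pending = list(actions)
        self._step = 0
        self.max_steps = max_steps
        self.feedback: list = []

    def select_action(
        self, observation: Optional[list] = None
    ) -> Optional[tuple]:
        if not self._pending:
            return None  # termination signal; the return type must admit it
        self._step += 1
        return self._step, self._pending.pop(0)

    def update_policy(self, reward: float) -> None:
        self.feedback.append(reward)


class ToyEnv:
    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> list:
        return [0.0]

    def step(self, action: Any):
        self.steps += 1
        return [float(self.steps)], 1.0, False, {}


def run(agent: ToyAgent, env: ToyEnv) -> int:
    """reset, then select_action -> env.step -> update_policy up to max_steps."""
    observation = env.reset()
    for _ in range(agent.max_steps):
        result = agent.select_action(observation)
        if result is None:
            break  # agent has nothing more to propose
        _step, action = result
        observation, reward, _done, _info = env.step(action)
        agent.update_policy(reward)
    return 0
```

With a non-optional return annotation, a type checker would flag the `return None` branch in `select_action` even though the loop depends on it.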
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/base_agent.py` around lines 91 - 92, The abstract
method select_action on line 91 declares a return type of tuple[int, dict[str,
Any]] (non-optional), but the run() method's loop on line 144 checks if result
is None to break, creating a contract mismatch. Update the return type
annotation of the select_action method to be tuple[int, dict[str, Any]] | None
to allow implementations to return None as a termination signal, aligning the
abstract signature with the loop's termination protocol.
♻️ Duplicate comments (2)
src/cloudai/configurator/gymnasium_adapter.py (2)

251-252: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce structured-observation key parity before materialization.

The structured path currently trusts encode_observation() keys. Extra keys can throw KeyError; missing keys can silently produce partial observations. Validate key sets first and build output from descriptor keys.

Proposed fix
         env = cast(StructuredObservation, self._env)
         encoded = env.encode_observation(list(obs))
-        return {name: self._leaf_to_value(descriptors[name], leaf) for name, leaf in encoded.items()}
+        self._assert_keys(encoded.keys(), set(descriptors), "encoded observation")
+        return {name: self._leaf_to_value(descriptors[name], encoded[name]) for name in descriptors}
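The key-parity guard can be sketched independently of the adapter. Names and the exception type here are assumptions: `assert_keys` only imitates what a helper like `_assert_keys` might do, and `materialize` is a hypothetical wrapper around the comprehension in the diff.

```python
from typing import Any, Callable, Iterable, Mapping


def assert_keys(actual: Iterable, expected: set, what: str) -> None:
    """Fail loudly when the two key sets differ in either direction."""
    actual_set = set(actual)
    missing = expected - actual_set
    extra = actual_set - expected
    if missing or extra:
        raise ValueError(f"{what}: missing={sorted(missing)} extra={sorted(extra)}")


def materialize(
    descriptors: Mapping,
    encoded: Mapping,
    leaf_to_value: Callable[[Any, Any], Any],
) -> dict:
    """Validate key parity first, then build the output from descriptor keys."""
    assert_keys(encoded.keys(), set(descriptors), "encoded observation")
    # Iterating descriptors (not encoded) fixes the output key order and makes
    # it impossible to look up a descriptor that does not exist.
    return {name: leaf_to_value(descriptors[name], encoded[name]) for name in descriptors}
```

Checking both directions turns a silent partial observation (missing key) and a late KeyError (extra key) into the same early, descriptive failure.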
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/gymnasium_adapter.py` around lines 251 - 252, The
current implementation iterates over the keys returned by encode_observation()
without validating that they match the expected descriptor keys, which can cause
KeyError if extra keys are present or silently produce partial observations if
keys are missing. Validate that the keys in the encoded result match the keys in
the descriptors dictionary before materializing the output, then build the
return dictionary by iterating over descriptor keys (rather than encoded keys)
to ensure all required keys are present and handled correctly in the
_leaf_to_value call.

206-207: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Widen step() action typing to match actual accepted payloads.

step() is typed as dict[str, int], but this method forwards to decode_action() which accepts continuous Box payloads (e.g., numpy arrays). The current signature is narrower than real behavior and will keep type-checking failures on valid call sites.

Proposed fix
-    def step(self, action: dict[str, int]) -> tuple[Any, float, bool, bool, dict[str, Any]]:
+    def step(self, action: dict[str, Any]) -> tuple[Any, float, bool, bool, dict[str, Any]]:
         params = {**self._fixed_params, **self.decode_action(action)}
         return self._step_with_params(params)
#!/bin/bash
set -euo pipefail

# Verify the current step signature.
rg -nP 'def step\(self,\s*action:\s*dict\[str,\s*int\]\)' src/cloudai/configurator/gymnasium_adapter.py

# Verify continuous payload usage in tests (numpy array passed to adapter.step()).
rg -n -C2 'adapter\.step\(\{' tests/test_gymnasium_adapter_contract.py
rg -n -C2 'np\.array\(' tests/test_gymnasium_adapter_contract.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/gymnasium_adapter.py` around lines 206 - 207, The
step() method signature has an action parameter typed as dict[str, int], which
is too restrictive. The method actually forwards to decode_action() which
accepts continuous Box payloads including numpy arrays, but the current typing
prevents valid callers from passing these payloads without type-checking errors.
Widen the action parameter type annotation in the step() method to accept the
broader range of payload types that decode_action() actually handles, such as
numpy arrays and other gymnasium-compatible action formats.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 166-180: When re-raising an exception outside its except block
using `raise run_error` on line 180, Python rebinds the traceback context to the
new raise site, obscuring the original error frame. To preserve the original
traceback, restructure the code to use a bare `raise` statement inside the
except block where the exception is caught, or if the code structure requires
deferred raising, save the exception with its traceback using `sys.exc_info()`
and restore it when re-raising to maintain the original crash context during
debugging.
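When checking this finding, it helps to know that in Python 3 an exception object carries its own `__traceback__`, so a deferred `raise run_error` keeps the frames of the original failure and adds the re-raise site to them. The sketch below demonstrates that with hypothetical names (`failing_run`, `handle_job`, `frames_seen`); it is not the handlers.py code.

```python
import traceback
from typing import Optional


def failing_run() -> None:
    raise RuntimeError("boom")


def handle_job() -> None:
    """Capture a hard failure, do cleanup work, then re-raise it later."""
    run_error: Optional[BaseException] = None
    try:
        failing_run()
    except Exception as exc:
        run_error = exc  # the traceback travels with the exception object
    # ... a real handler would write a failure artifact and reports here ...
    if run_error is not None:
        raise run_error


def frames_seen() -> list:
    """Return the function names recorded in the re-raised exception's traceback."""
    try:
        handle_job()
    except RuntimeError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []
```

The original raise site in `failing_run` is still present in the traceback after the deferred re-raise; whether the extra re-raise frame is acceptable noise is a readability decision rather than a loss of information.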

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 06fc2119-7c97-4dff-8fe0-9bcea50b5f90

📥 Commits

Reviewing files that changed from the base of the PR and between 49f5b43 and 707c038.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/cli/handlers.py Outdated
@rutayan-nv

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from 707c038 to 0dfac69 on June 16, 2026 15:06
@rutayan-nv

Contributor Author

@coderabbitai review

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

146-172: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure early return breaks env.csv / trajectory.csv step alignment

observer.before_step(...) runs at Line 147 (and EnvParamsObserver writes env.csv), but on Line 169 the constraint-failure branch returns at Line 171 without writing a trajectory row or firing after_step. This creates orphan env.csv rows for failed trials and breaks the 1:1 step-merge contract.

💡 Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
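The orphan-row effect can be reproduced with a toy model of the two artifacts. This is a sketch under assumptions: `ToyStepEnv` and `orphan_steps` are hypothetical, and plain lists stand in for env.csv and trajectory.csv.

```python
class ToyStepEnv:
    """Toy env where before_step always persists a row, as the observer does."""

    def __init__(self, write_on_failure: bool) -> None:
        self.write_on_failure = write_on_failure
        self.step_index = 0
        self.env_steps: list = []         # stands in for env.csv
        self.trajectory_steps: list = []  # stands in for trajectory.csv

    def step(self, constraint_ok: bool) -> None:
        self.step_index += 1
        self.env_steps.append(self.step_index)  # before_step side effect
        if not constraint_ok and not self.write_on_failure:
            return  # early return: the env row now has no trajectory partner
        self.trajectory_steps.append(self.step_index)


def orphan_steps(env_steps: list, trajectory_steps: list) -> list:
    """Steps present in the env log but absent from the trajectory log."""
    return sorted(set(env_steps) - set(trajectory_steps))
```

Running one successful and one constraint-failing step leaves step 2 orphaned with `write_on_failure=False` and leaves nothing orphaned with `write_on_failure=True`, which is the 1:1 alignment the finding asks for.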
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 146 - 172, The
constraint-failure return path does not maintain symmetry with the successful
step path: while observer.before_step() is called at the start, the early return
when constraint_check() fails skips both writing a TrajectoryEntry and firing
observer.after_step(), creating orphan entries in the env.csv file. To fix this,
in the constraint-failure branch (after the constraint_check call), add a
write_trajectory() call with a TrajectoryEntry containing the current step,
action, reward (use self.rewards.constraint_failure), observation, and
env_params, and then call observer.after_step() with the appropriate parameters
before returning, mirroring the pattern used in the cached_result branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d5a94f42-f6db-481a-bc1b-808fd6d07a4e

📥 Commits

Reviewing files that changed from the base of the PR and between 707c038 and 0dfac69.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • src/cloudai/models/workload.py
  • tests/test_action_space.py
  • tests/test_cloudaigym.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 16, 2026
…reserve traceback on DSE re-raise

- _as_obs_array(): assert encoded keys match descriptors before coercion
  (reuses _assert_keys, same guard as decode_action/step_raw) and
  materialize output by descriptor keys to avoid KeyError on extra keys
  and silent partial observations on missing keys.
- handlers.py: re-raise the captured hard-fail with its original traceback.

Addresses CodeRabbit findings on NVIDIA#930.
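
The key guard described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's code: `assert_keys` and `as_obs_dict` are hypothetical stand-ins for `_assert_keys` and `_as_obs_array`, and plain floats stand in for arrays.

```python
from typing import Mapping, Sequence


def assert_keys(actual: Mapping[str, object], expected: Sequence[str], what: str) -> None:
    """Raise if the mapping's keys differ from the expected names (hypothetical helper)."""
    missing = [key for key in expected if key not in actual]
    extra = [key for key in actual if key not in expected]
    if missing or extra:
        raise ValueError(f"{what} key mismatch: missing={missing}, extra={extra}")


def as_obs_dict(encoded: Mapping[str, float], descriptor_keys: Sequence[str]) -> dict:
    """Check keys first, then build the output in descriptor order."""
    assert_keys(encoded, descriptor_keys, "observation")
    # Iterating over descriptor_keys (not over encoded) fixes the output order
    # and means an unexpected key can never leak into the result.
    return {key: float(encoded[key]) for key in descriptor_keys}
```

Checking before coercion means a missing leaf fails loudly instead of producing a partial observation.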
@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from 17bea13 to 29aaabe Compare June 16, 2026 21:17

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

169-172: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle constraint-failure trials through the same recording path.

Line 171 returns a hardcoded one-element observation and exits before write_trajectory / observer.after_step. With env params enabled, before_step has already written env.csv, so this path breaks the step alignment between env.csv and trajectory.csv and can return an observation shape inconsistent with define_observation_space() when multiple metrics are configured.

💡 Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [self.rewards.metric_failure] * max(len(self.test_run.test.agent_metrics), 1)
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
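
The alignment property this fix is after can be checked on a toy model. The sketch below is an assumption-laden stand-in, not `CloudAIGymEnv`: `TinyEnv` and its two row lists are hypothetical, and only the recording logic is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class TinyEnv:
    """Hypothetical stand-in for the env; models only the two record sinks."""

    n_metrics: int
    metric_failure: float = -1.0
    constraint_failure: float = -1.0
    env_rows: list = field(default_factory=list)         # stands in for env.csv
    trajectory_rows: list = field(default_factory=list)  # stands in for trajectory.csv

    def step(self, step_no: int, action: dict, constraint_ok: bool):
        # The "before step" hook has already recorded this trial's env params.
        self.env_rows.append(step_no)
        if constraint_ok:
            observation, reward = [1.0] * max(self.n_metrics, 1), 1.0
        else:
            # Failure observation is sized like a successful one.
            observation = [self.metric_failure] * max(self.n_metrics, 1)
            reward = self.constraint_failure
        # Both outcomes go through the same recording path.
        self.trajectory_rows.append((step_no, action, reward, observation))
        return observation, reward, True, {}
```

With a single recording path, the two row lists always contain the same step numbers, whether or not the constraint check passed.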
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 169 - 172, The
constraint-failure early return at line 171 bypasses the trajectory recording
path (write_trajectory and observer.after_step), causing misalignment between
env.csv and trajectory.csv when environment parameters are enabled.
Additionally, the hardcoded observation [-1.0] may not match the shape defined
by define_observation_space() when multiple metrics are configured. Instead of
returning early when test_run.test.constraint_check() fails, route this case
through the same recording and observation logic as successful steps by calling
write_trajectory and observer.after_step before returning, and ensure the
returned observation matches the shape defined by define_observation_space()
rather than using a hardcoded single-element list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 191-195: The _decode_continuous method silently truncates
multi-value inputs by flattening the array and taking only the first element at
index zero, which can cause incorrect parameter values to be processed. Add
validation after reshaping the input to check that it contains exactly one
element, and raise a ValueError with a descriptive message if the array size is
not one. This validation should occur before the clamping logic to fail fast on
malformed inputs.
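
A minimal sketch of the fail-fast check requested above, assuming plain Python sequences in place of NumPy arrays. `decode_continuous` and `_flatten` are hypothetical names, not the adapter's `_decode_continuous`.

```python
from collections.abc import Iterable


def _flatten(value):
    """Yield the scalar leaves of an arbitrarily nested sequence."""
    if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def decode_continuous(raw, low: float, high: float) -> float:
    """Accept exactly one value, then clamp it into [low, high]."""
    flat = list(_flatten(raw))
    if len(flat) != 1:
        # Reject multi-value input instead of silently keeping element zero.
        raise ValueError(f"expected exactly one value, got {len(flat)}: {raw!r}")
    return min(max(float(flat[0]), low), high)
```

The size check runs before clamping, so malformed input never reaches the value logic.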

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c94cbeeb-1753-4223-b398-49305c0a7939

📥 Commits

Reviewing files that changed from the base of the PR and between 17bea13 and 29aaabe.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • pyproject.toml
  • src/cloudai/_core/action_space.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/env_params.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • src/cloudai/core.py
  • tests/test_action_space.py
  • tests/test_env_params.py
  • tests/test_gymnasium_adapter_contract.py
  • tests/test_handlers.py

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from de02b99 to 67c7178 Compare June 16, 2026 22:15
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 17, 2026
…reserve traceback on DSE re-raise

@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from 6634765 to d9c9e14 Compare June 17, 2026 18:33
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 17, 2026
…reserve traceback on DSE re-raise

@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from d9c9e14 to 77e79a3 Compare June 17, 2026 20:26
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 17, 2026
…reserve traceback on DSE re-raise

@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from 77e79a3 to c19ae2a Compare June 17, 2026 22:03
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 17, 2026
…reserve traceback on DSE re-raise

@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from c19ae2a to 463da25 Compare June 17, 2026 22:38
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 20, 2026
…reserve traceback on DSE re-raise

for adapters that derive ``gymnasium.spaces.Box`` from this output.
"""
-        return [0.0]
+        return [0.0] * max(len(self.test_run.test.agent_metrics), 1)


is it safe for existing agents?
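
For reference, the sizing rule in the changed line can be exercised in isolation. This sketch only shows what the expression evaluates to; `observation_template` is a hypothetical name, and it does not settle whether any existing agent depends on the old single-slot shape.

```python
def observation_template(agent_metrics: list) -> list:
    """One zero slot per agent metric, never fewer than one slot."""
    return [0.0] * max(len(agent_metrics), 1)
```

With zero or one metric the result is still `[0.0]`, the same as before the change; the shape only grows when two or more metrics are configured.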

Comment thread src/cloudai/core.py Outdated

"""Core CloudAI base classes and interfaces."""

from ._core.action_space import ContinuousSpace


I was told ContinuousSpace is parked yet I see related code here, please don't forget about it 🙏


from typing import TYPE_CHECKING, Any, Optional, cast

from cloudai._core.action_space import ContinuousSpace


I assume this line is gonna be removed for now, but please import from cloudai.core...

(returns ``None`` unless an observed name is a declared env_param). Envs
without that hook keep the legacy flat-Box path.
"""
getter = getattr(env, "structured_observation_descriptors", None)


I wasn't able to see structured_observation_descriptors in the BaseGym (nor in the CloudAIGymEnv) classes...

Comment on lines +144 to +147
@property
def cloudai_env(self) -> BaseGym:
"""Return the wrapped CloudAI :class:`BaseGym` (gymnasium's ``unwrapped`` returns ``self``)."""
return self._env


what's the point of this property? I see no added value...

descriptors = self._obs_descriptors
if descriptors is None:
return self._np.asarray(obs, dtype=self._np.float32)
env = cast(StructuredObservation, self._env)


Wait, in __init__ I see self._env = env, where env comes from the arguments and is a BaseGym. How does it end up being a StructuredObservation here?

Make env_params a first-class part of CloudAIGymEnv trial identity so the
trajectory cache keys on (action, env_params) rather than action alone,
fixing the domain-randomization correctness bug where the same action under
a different env_params sample returned a stale reward.

- Cache key now includes env_params; cache-key tests pin the contract
  (formerly the TDD-red specs of NVIDIA#900, folded in here).
- Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry
  sinks both files coherently, including on constraint failure.
- Reject env_params on non-DSE jobs; reject non-finite / negative weights.
- Add cache-hit + declared-env_params integration coverage.

Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no
permanently-red standalone PR.
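
The cache-key change described above can be illustrated with a small sketch. The helpers `cache_key` and `lookup_or_run` are hypothetical and assume hashable parameter values; the real trajectory cache may be structured differently.

```python
def cache_key(action: dict, env_params: dict) -> tuple:
    """Key a trial on both the action and the sampled env_params."""
    return (tuple(sorted(action.items())), tuple(sorted(env_params.items())))


def lookup_or_run(cache: dict, action: dict, env_params: dict, run):
    """Return the cached result for this exact trial, or run it once and store it."""
    key = cache_key(action, env_params)
    if key not in cache:
        cache[key] = run(action, env_params)
    return cache[key]
```

Keying on the action alone would make the second of two trials with the same action but different env_params return the first trial's reward, which is the stale-reward bug the commit describes.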
…rom search space

Make env_params a thin annotation over cmd_args fields instead of a holder of
candidate values. Candidate values live in cmd_args (the single source of truth,
exactly like an action-space dimension); env_params.<name> only marks a field as
env-sampled and carries optional sampling weights, never the values.

- EnvParamSpec drops `values`; validates weights (finite, non-negative, sum=1.0).
- Sampler/observer resolve candidate lists from cmd_args; scalar knobs are no-ops.
- TestDefinition.validate_env_params cross-checks annotations against cmd_args
  (key must be a real field; weights require a list and must match its length).
- Exclude env_params keys from both param_space and is_dse_job: an env-sampled
  list is not a search dimension, so an env-params-only workload is not a DSE job.
- validate_dse_env_params rejects env_params on non-DSE runs and on grid_search
  (exhaustive search cannot exploit per-trial randomization).
- Scrub private-implementation references from public docstrings.
- Unit tests use generic Atari Breakout semantics (ball_speed / paddle_width).
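
A minimal sketch of the weight checks listed above (finite, non-negative, sum of 1.0, length matching the candidate list). `validate_weights` is a hypothetical helper, and the floating-point tolerance is an assumption; the commit message does not state one.

```python
import math


def validate_weights(weights: list, n_candidates: int) -> None:
    """Raise ValueError unless weights form a valid distribution over the candidates."""
    if len(weights) != n_candidates:
        raise ValueError(f"expected {n_candidates} weights, got {len(weights)}")
    if any(not math.isfinite(w) or w < 0 for w in weights):
        raise ValueError("weights must be finite and non-negative")
    # Tolerance is an assumption made for this sketch.
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {sum(weights)}")
```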
…pyright

- validate_env_params: reject structured (non-leaf) cmd_args targets. The
  observer cannot sample them, yet param_space/is_dse_job exclude the whole
  key, which would silently drop nested action dimensions.
- CloudAIGymEnv.write_trajectory: rebind the env.csv sink to the current
  iteration path before each write, so env.csv stays 1:1 aligned with
  trajectory.csv when the env instance is reused across iterations.
- test_env_params: assert the unknown-field rejection via model_validate so the
  negative test no longer trips pyright's call-arg check (CI Linting fix); add a
  structured-target rejection test.
An unweighted env_params spec skipped the candidate-list check, so an empty cmd_args list
(e.g. ball_speed = []) passed validation and only failed later in EnvParamsSampler.sample()
via rng.choice([]) (IndexError). Guard against an empty candidate list in validate_env_params
so the error surfaces at TestDefinition build time. Addresses CodeRabbit feedback.
…ms value objects

Replace the EnvParamsSampler class and the StepObserver/EnvParamsObserver
indirection with two frozen dataclasses: EnvParam (one resolved knob: candidates,
optional weights, single draw) and EnvParams (per-run knobs + seed, built via
from_test, sampled per trial). The sampling RNG lives in the env: step() draws
this trial's values and hands concrete values to TestRun.apply_params_set(action,
env_params=...), which overlays action and sample through one deterministic path.

Centralize the cmd_args -> env_params lookup in TestDefinition.is_env_sampled and
access current_env_params directly. Expand EnvParam/EnvParams unit tests to cover
draw, from_test, sample, and immutability.
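
The value-object design described above might look roughly like this. It is a sketch under assumptions: the field names and the `(name, EnvParam)` pair layout are hypothetical, `from_test` is omitted, and the RNG is passed in by the caller to mirror the statement that it lives in the env.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class EnvParam:
    """One resolved knob: candidate values plus optional sampling weights."""

    candidates: Tuple[Any, ...]
    weights: Optional[Tuple[float, ...]] = None

    def draw(self, rng: random.Random) -> Any:
        """Draw a single value, weighted if weights were given."""
        if self.weights is None:
            return rng.choice(self.candidates)
        return rng.choices(self.candidates, weights=self.weights, k=1)[0]


@dataclass(frozen=True)
class EnvParams:
    """Per-run knobs; the field layout here is an assumption of this sketch."""

    params: Tuple[Tuple[str, EnvParam], ...]
    seed: int = 0

    def sample(self, rng: random.Random) -> dict:
        """Draw one concrete value per knob for the current trial."""
        return {name: param.draw(rng) for name, param in self.params}
```

Because both classes are frozen, a trial can only obtain new values by sampling, never by mutating the declared knobs.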
Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a
single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The
sink is now stateless: write() takes the record path per call and skips empty
samples, so non-DR runs write nothing and write_trajectory needs no branch.

Derive both records from a new iteration_dir property and expose the env
record via the env_params_record_path property (was _env_csv_path), keeping
env.csv and trajectory.csv step-aligned without coupling the name to CSV.
…ty flag

Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params
capability flag (opt-in, defaults False). Only agents whose search consumes per-trial
env_params sampling set it True; enumerating/surrogate agents leave it False, so a config
that declares env_params for an agent that would ignore them is rejected up front instead
of silently no-op'ing. New agents answer for themselves with no string to maintain.

Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py
next to the logic it guards, looking the agent up via the Registry. Unknown agents are
deferred to the dedicated agent-resolution error rather than masked here.

Keep all public-facing comments, docstrings, and the error message generic (no internal
agent names). Cover the full validator matrix, including the unknown-agent deferral.
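
The capability-flag approach can be sketched as below. `BaseAgent.samples_env_params` and `validate_dse_env_params` are named in the commit message, but the signature, the dict used as a registry, and the `SamplingAgent` class are assumptions made for illustration; the non-DSE rejection is left out.

```python
from typing import Dict, Type


class BaseAgent:
    """Agents opt in to per-trial env_params sampling; the default is off."""

    samples_env_params: bool = False


class SamplingAgent(BaseAgent):
    """Hypothetical agent whose search consumes sampled env_params."""

    samples_env_params = True


def validate_dse_env_params(agent_name: str, registry: Dict[str, Type[BaseAgent]], has_env_params: bool) -> None:
    """Reject env_params for agents that would silently ignore them."""
    if not has_env_params:
        return
    agent_cls = registry.get(agent_name)
    if agent_cls is None:
        # Unknown agents are left to the dedicated agent-resolution error.
        return
    if not agent_cls.samples_env_params:
        raise ValueError(f"agent '{agent_name}' does not sample env_params")
```

A new agent only has to set the class attribute; no central list of agent names needs updating.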
Compress multi-line inline comments down to the single non-obvious rationale (or drop
them where the code already speaks), per the self-documenting-code principle. Public API
docstrings and test intent comments are left intact.
apply_params_set overlays sampled scalar draws onto cmd_args, then
reconstructs the TestDefinition to validate the applied action values.
That pass re-ran validate_env_params, which rejects a weighted env_param
whose cmd_args target is no longer a candidate list - exactly what the
overlay produces. env_params is already validated at parse time, so drop
it from the validation-only dump. Adds a regression test covering a
weighted env_param's scalar draw.
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 25, 2026
…reserve traceback on DSE re-raise

@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from 75b0f5a to d914d74 Compare June 25, 2026 21:15
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 25, 2026
…reserve traceback on DSE re-raise

@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from d914d74 to 0417e79 Compare June 25, 2026 22:17
An env_params entry only reclassifies a list-valued cmd_args sweep as
env-sampled; a scalar is already fixed, so annotating it is a meaningless
label. Previously such an annotation was tolerated as a silent no-op, which
let it slip through parse-time validation and inconsistently trip (or not)
the downstream "no agent will sample them" check depending on run mode.

Reject it where the contract lives - TestDefinition.validate_env_params -
so the failure is immediate and mode-independent. EnvParams.from_test's
non-list guard becomes defensive (parse-time now guarantees lists); the
post-overlay path already drops env_params before re-validating, so concrete
scalar draws are unaffected.

Extract the per-field checks into a helper to keep the validator under the
complexity limit, and update tests: scalar annotations now assert rejection
instead of no-op tolerance.
…otocol

Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width
dim, or "discrete" of size n) and a StructuredObservation Protocol that
documents the optional env hooks structured_observation_descriptors() and
encode_observation(). These let an env expose a named, per-leaf observation
so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium
spaces.Dict; the hooks are duck-typed, so envs need not subclass.

Both exported via cloudai.core.
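
A rough sketch of the two primitives this commit describes, using a dataclass for brevity. The class names `LeafDescriptor` and `StructuredObs`, the field validation, and the `encode_observation` signature are assumptions of this sketch; only the hook names and the "box"/"discrete" kinds come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Literal, Optional, Protocol


@dataclass(frozen=True)
class LeafDescriptor:
    """Hypothetical stand-in for ObsLeafDescriptor: a box of width dim or a discrete of size n."""

    kind: Literal["box", "discrete"]
    dim: Optional[int] = None
    n: Optional[int] = None

    def __post_init__(self) -> None:
        if self.kind == "box" and (self.dim is None or self.dim < 1):
            raise ValueError("a box leaf needs dim >= 1")
        if self.kind == "discrete" and (self.n is None or self.n < 1):
            raise ValueError("a discrete leaf needs n >= 1")


class StructuredObs(Protocol):
    """Documents the optional hooks; envs do not need to subclass it."""

    def structured_observation_descriptors(self) -> Optional[Dict[str, LeafDescriptor]]: ...

    def encode_observation(self, observation: list) -> dict: ...


def get_descriptors(env: object) -> Optional[Dict[str, LeafDescriptor]]:
    """Duck-typed lookup: envs without the hook keep the flat path (None)."""
    getter = getattr(env, "structured_observation_descriptors", None)
    return getter() if callable(getter) else None
```

The `getattr` lookup is what makes the hooks optional: an env that never defines them is treated the same as one that returns `None`.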
…ejection tests

Negative tests pass an extra kwarg and an out-of-Literal kind to assert
ValidationError; mark the deliberate type violations with type: ignore.
Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of
Discrete (list params) and Box (ContinuousSpace) actions over the tunable
params with fixed (single-value) params injected each step; observations as
either a flat float32 Box or, when the env opts in via the structured-obs
hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous
dtype="int" params are quantized (rounded/clamped) at decode_action so the
trajectory cache key collapses float jitter. The adapter is a pure
pass-through over test_run.step (never mutates it), so contextual-bandit
rollouts that reset() per trial keep a monotonic trial index.

gymnasium is an optional dependency lazy-imported behind the new [rl] extra
(also added to dev); CloudAIGymEnv.define_observation_space() now returns one
slot per agent metric so adapters get the right Box shape. Exported via
cloudai.core. Caller-contract tests pin the step-monotonicity, observation
pass-through, continuous-quantization, and structured-obs invariants.
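
The action handling described above can be sketched without gymnasium. All three helpers are hypothetical: `split_params` separates tunable from fixed parameters, `decode_action` maps Discrete-style indices back to values and injects the fixed ones, and `quantize_int` shows the round-and-clamp step for dtype="int" parameters.

```python
def split_params(param_space: dict):
    """Multi-value params are tunable; single-value params are fixed."""
    tunable = {name: values for name, values in param_space.items() if len(values) > 1}
    fixed = {name: values[0] for name, values in param_space.items() if len(values) == 1}
    return tunable, fixed


def decode_action(action: dict, tunable: dict, fixed: dict) -> dict:
    """Turn per-parameter indices into concrete values and add fixed params."""
    if set(action) != set(tunable):
        raise KeyError(f"action keys {sorted(action)} != tunable keys {sorted(tunable)}")
    decoded = {name: tunable[name][int(index)] for name, index in action.items()}
    decoded.update(fixed)
    return decoded


def quantize_int(value: float, low: int, high: int) -> int:
    """Round to the nearest integer, then clamp into [low, high]."""
    return int(min(max(round(value), low), high))
```

Quantizing at decode time means nearby float actions such as 2.6 and 2.9 decode to the same integer, so they share one cache entry.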
…ature

step() delegates to decode_action(dict[str, Any]) and exists precisely to
round float/continuous policy actions to ints; widen its parameter type
from dict[str, int] to dict[str, Any] to match.
…reserve traceback on DSE re-raise

- _as_obs_array(): assert encoded keys match descriptors before coercion
  (reuses _assert_keys, same guard as decode_action/step_raw) and
  materialize output by descriptor keys to avoid KeyError on extra keys
  and silent partial observations on missing keys.
- handlers.py: re-raise the captured hard-fail with its original traceback.

Addresses CodeRabbit findings on NVIDIA#930.
…ngleton

Replace the bespoke _import_gymnasium() in-method seam with the canonical
lazy.gymnasium / lazy.np properties; addresses the in-method-import review
concern. No behavior change — gymnasium stays an optional [rl] extra.
The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types
instead of Any, which surfaced two latent issues the scoped pre-commit run
missed:

- pyright now sees Space[Any]/Dict in the adapter contract test, so concrete
  attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local
  bindings + isinstance before access.
- lazy_imports.py now has a 2026 commit in its history, so the ci_only
  copyright check requires the year range 2025-2026.
Inherit from gymnasium.Env (guarded import, falls back to object when the
optional [rl] extra is absent) so ecosystem tooling that performs isinstance
checks (e.g. Stable-Baselines3) accepts the adapter.

- Use the TYPE_CHECKING import form so pyright sees a concrete base class
  while runtime keeps the optional-dependency fallback.
- Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012).
- Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's
  Env.unwrapped (returns self) is the correct base-env semantics, and the
  old override returned a non-Env (BaseGym), which would mislead ecosystem
  code calling .unwrapped.
Inheriting gymnasium.Env widened the static type of action_space from the
concrete spaces.Dict the adapter builds to the base spaces.Space, which has
no __getitem__. Cast at the test call site to restore subspace indexing for
pyright (runtime is unchanged; action_space is always a Dict).
…rse of decode_action

decode_action had no public inverse, so consumers needing value->index
encoding (e.g. RLlib warm-start / behavioral cloning) reached into the
private _tunable_params dict. When ContinuousSpace support split that
internal, those consumers broke with AttributeError.

encode_action closes the contract: discrete values map to their candidate
index, continuous values wrap into the clamped float32 Box array, so
decode_action(encode_action(v)) == v for any native v. Adds round-trip
contract tests pinning the invariant and rejection of non-candidate
values / key mismatches.
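
The round-trip invariant for the discrete case can be sketched like this. Both functions are hypothetical stand-ins for the adapter's `encode_action` / `decode_action` and operate on a plain dict of candidate lists.

```python
def encode_action(values: dict, tunable: dict) -> dict:
    """Map each native value to its index in the candidate list."""
    if set(values) != set(tunable):
        raise KeyError(f"value keys {sorted(values)} != tunable keys {sorted(tunable)}")
    encoded = {}
    for name, candidates in tunable.items():
        if values[name] not in candidates:
            raise ValueError(f"{values[name]!r} is not a candidate for '{name}'")
        encoded[name] = candidates.index(values[name])
    return encoded


def decode_action(action: dict, tunable: dict) -> dict:
    """Inverse direction: map each index back to its candidate value."""
    return {name: tunable[name][int(index)] for name, index in action.items()}
```

A public encoder lets consumers such as warm-start code build index-encoded actions without reaching into private attributes.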
…tinuousSpace

The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which
ships separately. Until then nothing constructs a ContinuousSpace, so the
continuous branches here are unreachable and the only effect of the import is an
ImportError at module load. Drop the continuous import, _continuous_params, the
Box action mapping, and decode/encode continuous handling so the adapter builds
and ships standalone over discrete + structured-observation support. The
continuous support rejoins when ContinuousSpace lands.
@rutayan-nv rutayan-nv force-pushed the rpatro/gymnasium-adapter branch from 0417e79 to 4f479dd Compare June 26, 2026 15:41