fix(configurator): make env_params first-class to fix the trajectory cache key by rutayan-nv · Pull Request #901 · NVIDIA/cloudai

rutayan-nv · 2026-05-26T23:22:58Z

Issue

Cache correctness: cache keyed on action only → revisiting an action under a different env_params sample serves the stale reward, corrupting the signal (the bug [TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix) #900 pins).
Artifact alignment: env.csv lived in workload code and was skipped on cache hits — sparse and misaligned with trajectory.csv.

Fix

Cache correctness: env_params becomes first-class schema (TestDefinition.env_params, TestRun.current_env_params, TrajectoryEntry.env_params); the cache key is now (action, current_env_params).
Artifact alignment: CloudAIGymEnv builds an EnvParamsObserver that samples deterministically and writes env.csv before the cache lookup, so it aligns 1:1 with trajectory.csv; workloads without env_params pay zero overhead.

Testing

[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix) #900's 4 TDD-red tests now pass; +9 unit (test_env_params.py) +2 integration (test_cloudaigym.py); full pytest tests: 1641 passed, 4 skipped; ruff + pyright clean.

Stack: main ← #893 ← #900 ← #901 (top). Depends on #900 → #893.

coderabbitai · 2026-05-26T23:23:03Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c9f41e83-f4ec-40e7-9ab6-6a43ed0109c8

📥 Commits

Reviewing files that changed from the base of the PR and between 7fec7f5 and 8f91664.

📒 Files selected for processing (3)

src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_env_params.py

📝 Walkthrough

Walkthrough

Adds per-trial env parameter sampling and validation, threads sampled env_params through TestRun and CloudAIGymEnv, persists env.csv alongside trajectories, tightens cache matching on env_params, and updates DSE CLI handling and tests for the new flow.

Changes

Env parameter sampling and DSE flow

Layer / File(s)	Summary
Env parameter primitives `src/cloudai/configurator/env_params.py`, `tests/test_env_params.py`	`EnvParamSpec`, `EnvParam`, `EnvParams`, and `EnvParamsSink` are added for weighted sampling, deterministic per-trial draws, and CSV persistence. Tests cover validation, sampling, sink output, and resolution from test definitions.
Workload and TestRun overlays `src/cloudai/models/workload.py`, `src/cloudai/_core/test_scenario.py`, `tests/test_env_params.py`	`TestDefinition` adds `env_params`, validates env-param annotations, and excludes env-sampled `cmd_args` from DSE classification. `TestRun` stores `current_env_params`, filters env-sampled keys from `param_space`, and overlays action and `env_params` through `apply_params_set`. Tests cover validation, DSE classification, and overlay behavior.
CloudAIGymEnv env_params flow `src/cloudai/configurator/cloudai_gym.py`, `tests/test_cloudaigym.py`	`CloudAIGymEnv` samples `env_params` per step, records them in trajectory entries and `env.csv`, and matches cached trajectories on both action and `env_params`. Tests cover cache keying, step alignment, overlay application, and the no-`env_params` path.
DSE validation and execution `src/cloudai/configurator/base_agent.py`, `src/cloudai/configurator/env_params.py`, `src/cloudai/cli/handlers.py`, `tests/test_handlers.py`	`BaseAgent` gains `samples_env_params`. `validate_dse_env_params` is wired into dry-run and scenario verification. `handle_dse_job` runs each DSE case through `CloudAIGymEnv` and `agent.run()`, records `dse_failure.txt` on unexpected errors, and passes run errors into report generation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/cloudai#933: Refactors handle_dse_job and generate_reports around agent.run() and optional error handling, which this PR extends for env-parameter-aware DSE execution.

Suggested labels

enhancement

Suggested reviewers

srivatsankrishnan
jeffnvidia
amaslenn

Poem

A rabbit hopped through env.csv at dawn,
With weighted carrots freshly drawn.
The cache kept pace, the trials sang,
And DSE twinkled as samples sprang.
\o/ 🐇

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly captures the main change: making env_params first-class to fix trajectory cache correctness.
Description check	✅ Passed	The description accurately summarizes the cache-key and env.csv alignment changes plus testing coverage.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cloudai/configurator/cloudai_gym.py (2)

143-171: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Keep trajectory.csv aligned on the constraint-failure path.

observer.before_step() has already persisted the sampled env row by Line 147, but the early return on Lines 169-171 skips write_trajectory(). Any trial rejected by constraint_check() now leaves an orphan env.csv row, which breaks the PR’s stated 1:1 env.csv ↔ trajectory.csv contract and makes per-trial artifacts inconsistent.

Suggested fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 143 - 171, The early
return on constraint_check leaves an env row without a matching trajectory; to
fix, when test_run.test.constraint_check(test_run, runner.system) returns False,
write a TrajectoryEntry (use self.write_trajectory with step=self.test_run.step,
action=action, reward=self.rewards.constraint_failure, observation=[-1.0],
env_params=dict(self.test_run.current_env_params)) and call each
observer.after_step(self.test_run, observation, reward) before returning the
same quadruple (observation, reward, True, {}); this mirrors the cached-result
path and preserves the 1:1 env.csv ↔ trajectory.csv contract.

186-202: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Preserve current_env_params when recovering from runner failures.

On Line 189 the fallback rebuilds self.test_run from self.original_test_run, which resets current_env_params to its default. The trajectory row written on Line 202 then records {} even though env.csv already contains the sampled params, and the cached failure is stored under (action, {}) instead of (action, sampled_env_params). Repeating the same failing trial will miss the cache and produce disagreeing artifacts.

Suggested fix

         if self.runner.test_scenario.test_runs and self.runner.test_scenario.test_runs[0].output_path.exists():
             self.test_run = self.runner.test_scenario.test_runs[0]
         else:
-            self.test_run = copy.deepcopy(self.original_test_run)
+            self.test_run = copy.deepcopy(self.original_test_run).apply_params_set(action)
             self.test_run.step = new_tr.step
             self.test_run.output_path = new_tr.output_path
+            self.test_run.current_env_params = dict(new_tr.current_env_params)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 186 - 202, When
rebuilding self.test_run in the fallback branch (inside cloudai_gym.py) you must
preserve the sampled env params so the trajectory/cache keys match the actual
env.csv; after creating the deep-copied self.test_run from
self.original_test_run and after setting step/output_path, copy the sampled
params into self.test_run.current_env_params (e.g. assign a deepcopy of
self.original_test_run.current_env_params or the sampled params present on the
runner) before calling get_observation/compute_reward and write_trajectory; this
ensures dict(self.test_run.current_env_params) in the TrajectoryEntry reflects
the real sampled parameters.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 116-149: In run(), replace the eager f-string logging calls with
%-style deferred logging so formatting happens only if the message will be
emitted; specifically update the two logging.info calls inside the run method
(the "Running step ..." and "Step {step}: Observation..." messages) to use
logging.info("Running step %s (of %s) with action %s", step, self.max_steps,
action) and logging.info("Step %s: Observation: %s, Reward: %.4f", step, ...,
reward) respectively, ensuring you format the observation list without using
f-strings so the logging framework can lazily interpolate.

In `@src/cloudai/configurator/env_params.py`:
- Around line 1-2: The file header years in
src/cloudai/configurator/env_params.py are incorrect for the touched year;
update the two top lines (the SPDX copyright line and the subsequent Copyright
line) by replacing "2024-2026" with the exact year string the repo's header test
expects for this file (e.g., "2026") so the header matches
tests/test_check_copyright_headers.py.

In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 133-165: Add a public close(self) -> None method to
GymnasiumAdapter that safely delegates to the wrapped environment's close
method: check for an attribute (e.g. getattr(self._env, "close", None)), call it
if callable, and return None; do not assume inheritance from gymnasium.Env. Add
the method near other lifecycle helpers (e.g. alongside
_sync_underlying_step_counter, _step_with_params) so callers that call
env.close() won't raise AttributeError.
- Around line 77-80: GymnasiumAdapter.__init__ currently filters action space
lists into _tunable_params (len>1) and _fixed_params (len==1) but silently drops
any keys with empty lists; modify GymnasiumAdapter.__init__ to detect any entry
in the raw_action_space where len(v) == 0 and raise a ValueError identifying the
offending key(s) so callers (e.g., TestRun.param_space) fail fast; add a unit
test that constructs a TestRun or raw_action_space containing an empty list for
a cmd_args key and asserts that GymnasiumAdapter(...) raises ValueError with a
message mentioning the empty key(s).

In `@tests/test_env_params.py`:
- Around line 86-87: The test uses a broad pytest.raises(ValueError) when
calling sink.write(0, {"drop_rate": 0.0}); narrow it by adding a match parameter
to pytest.raises that matches the specific ValueError message expected (e.g.,
the message about invalid drop_rate or index) so the test asserts the exact
error condition raised by sink.write; update the pytest.raises invocation around
sink.write to include match="expected message substring" referencing the
sink.write call and pytest.raises.
- Around line 1-2: The copyright header in the file contains the range
"2025-2026" which fails CI; update the header lines (the SPDX line and the
Copyright line that currently reads "2025-2026 NVIDIA CORPORATION & AFFILIATES")
to use a single year "2026" so they match the expected format for new 2026
files.

In `@tests/test_handlers.py`:
- Around line 287-288: The override of select_action in tests/test_handlers.py
has the wrong signature; change the function declaration for select_action in
that file to match the base class signature BaseAgent.select_action(self,
observation: list[float] | None = None) (i.e., add the observation parameter
with the same type and default) and keep the existing body (raising
AssertionError) so pyright/CI type-checking passes while preserving the test
intent.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 143-171: The early return on constraint_check leaves an env row
without a matching trajectory; to fix, when
test_run.test.constraint_check(test_run, runner.system) returns False, write a
TrajectoryEntry (use self.write_trajectory with step=self.test_run.step,
action=action, reward=self.rewards.constraint_failure, observation=[-1.0],
env_params=dict(self.test_run.current_env_params)) and call each
observer.after_step(self.test_run, observation, reward) before returning the
same quadruple (observation, reward, True, {}); this mirrors the cached-result
path and preserves the 1:1 env.csv ↔ trajectory.csv contract.
- Around line 186-202: When rebuilding self.test_run in the fallback branch
(inside cloudai_gym.py) you must preserve the sampled env params so the
trajectory/cache keys match the actual env.csv; after creating the deep-copied
self.test_run from self.original_test_run and after setting step/output_path,
copy the sampled params into self.test_run.current_env_params (e.g. assign a
deepcopy of self.original_test_run.current_env_params or the sampled params
present on the runner) before calling get_observation/compute_reward and
write_trajectory; this ensures dict(self.test_run.current_env_params) in the
TrajectoryEntry reflects the real sampled parameters.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0097c65c-5ab7-4bd2-8683-6e1ea26cc182

📥 Commits

Reviewing files that changed from the base of the PR and between 18ce72e and d076095.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (14)

pyproject.toml
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter.py
tests/test_handlers.py
tests/test_test_scenario.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

169-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve env/traj alignment and env_params identity on non-happy paths.

Line 170 currently returns on constraint failure after before_step() has already persisted env.csv, so env.csv and trajectory.csv can diverge.
Also, in Lines 189-203, replacing self.test_run with original_test_run can drop current_env_params, so the trajectory row is written with empty env params and weakens (action, env_params) cache identity.

Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}
@@
         if self.runner.test_scenario.test_runs and self.runner.test_scenario.test_runs[0].output_path.exists():
             self.test_run = self.runner.test_scenario.test_runs[0]
         else:
+            preserved_env_params = dict(new_tr.current_env_params)
             self.test_run = copy.deepcopy(self.original_test_run)
             self.test_run.step = new_tr.step
             self.test_run.output_path = new_tr.output_path
+            self.test_run.current_env_params = preserved_env_params

Also applies to: 189-203

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 169 - 171, The
constraint check at lines 169-171 returns after before_step() has already
persisted env.csv, causing env.csv and trajectory.csv to diverge. Move the
constraint check before the before_step() call so that early returns don't
create misalignment between the two files. Additionally, at lines 189-203,
ensure that self.test_run is used instead of original_test_run to preserve
current_env_params in the trajectory row, maintaining proper action-env_params
cache identity.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 151-163: The _sync_underlying_step_counter method has an
off-by-one error where it sets test_run.step to self._step_count + 1, but
CloudAIGymEnv.step() also increments test_run.step on entry, causing a double
increment. Fix this by changing the assignment to test_run.step =
self._step_count instead, since the increment will be handled by
CloudAIGymEnv.step() when it is called, ensuring the first adapter-driven step
is recorded as trial 1 rather than trial 2.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 169-171: The constraint check at lines 169-171 returns after
before_step() has already persisted env.csv, causing env.csv and trajectory.csv
to diverge. Move the constraint check before the before_step() call so that
early returns don't create misalignment between the two files. Additionally, at
lines 189-203, ensure that self.test_run is used instead of original_test_run to
preserve current_env_params in the trajectory row, maintaining proper
action-env_params cache identity.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ea1d6771-30cc-49dd-bc66-394901097b13

📥 Commits

Reviewing files that changed from the base of the PR and between d076095 and 5c85e08.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

pyproject.toml
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter.py
tests/test_handlers.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cloudai/configurator/cloudai_gym.py (1)

146-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep env.csv and trajectory.csv aligned on constraint-failure steps.

before_step() runs before the constraint check, so env.csv is already appended. The early return on Line 171 skips write_trajectory(), creating step drift between artifacts when constraints fail.

Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            observation = [-1.0]
+            reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=reward,
+                    observation=observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, observation, reward)
+            return observation, reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 146 - 171, The
constraint check failure on line 171 returns early without calling
write_trajectory(), while before_step() has already been called on line 146,
causing env.csv and trajectory.csv to become misaligned. Add a
write_trajectory() call before the early return on constraint failure, similar
to what is done in the cached result case, to write a TrajectoryEntry with the
current step, action, constraint_failure reward, an appropriate observation, and
current environment parameters to keep the CSV artifacts synchronized.

tests/test_cloudaigym.py (1)

165-194: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add an env_params-enabled constraint-failure regression test.

Current constraint-failure coverage does not exercise observer-enabled runs. Add a variant with tdef.env_params declared and assert failed steps keep env.csv and trajectory.csv step-aligned.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cloudaigym.py` around lines 165 - 194, The test_constraint_failure
function needs to be expanded to cover observer-enabled runs. Add a new test
variant by either parameterizing the existing test or creating an additional
test case where tdef.env_params is declared before creating the CloudAIGymEnv.
After the env.step() call, add assertions to verify that the env.csv and
trajectory.csv output files are kept step-aligned following the constraint
failure, ensuring that observer-enabled runs handle failed steps correctly
without misaligning the step records across output files.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 146-171: The constraint check failure on line 171 returns early
without calling write_trajectory(), while before_step() has already been called
on line 146, causing env.csv and trajectory.csv to become misaligned. Add a
write_trajectory() call before the early return on constraint failure, similar
to what is done in the cached result case, to write a TrajectoryEntry with the
current step, action, constraint_failure reward, an appropriate observation, and
current environment parameters to keep the CSV artifacts synchronized.

In `@tests/test_cloudaigym.py`:
- Around line 165-194: The test_constraint_failure function needs to be expanded
to cover observer-enabled runs. Add a new test variant by either parameterizing
the existing test or creating an additional test case where tdef.env_params is
declared before creating the CloudAIGymEnv. After the env.step() call, add
assertions to verify that the env.csv and trajectory.csv output files are kept
step-aligned following the constraint failure, ensuring that observer-enabled
runs handle failed steps correctly without misaligning the step records across
output files.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 804b42a1-6f0a-462b-b3b1-291ec8eea021

📥 Commits

Reviewing files that changed from the base of the PR and between 5c85e08 and 0efcd11.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

pyproject.toml
src/cloudai/_core/test_scenario.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter.py
tests/test_handlers.py

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 146-148: The constraint-failure early return in the code is
preventing `trajectory.csv` from being written and `after_step()` hooks from
being called, which breaks trial artifact consistency. Additionally, the early
return provides a fixed one-element observation instead of one matching the
configured observation space. Restructure the constraint-failure handling at
both affected locations to ensure that after encountering a constraint failure,
the code still executes the necessary finalization steps including writing
`trajectory.csv` and calling `after_step()` on all observers, while also
returning an observation that matches the shape of the configured observation
space rather than a hardcoded single-element observation. This applies at
src/cloudai/configurator/cloudai_gym.py lines 146-148 (anchor) and lines 169-172
(sibling).

In `@tests/test_cloudaigym.py`:
- Around line 567-615: Add a new test function to validate step alignment when
constraint_check fails during the env_params flow. Create a test similar to
test_env_csv_is_step_aligned_with_trajectory but configure the environment and
test scenario to trigger a constraint violation during env.step() calls. The
test must verify that despite the constraint failure: (1) env.csv and
trajectory.csv maintain 1:1 step row alignment (same step values in both files),
and (2) the observation returned from env.step() has length matching the number
of entries in agent_metrics. Use the same CloudAIGymEnv, TestRun, TestScenario,
and BaseRunner mocking patterns as the existing alignment test, but introduce
constraint conditions that will cause failures and ensure the alignment
verification assertions apply to the failure case.

In `@tests/test_gymnasium_adapter.py`:
- Around line 70-76: The `_GymWithTestRun` class initializes a `test_run` object
with a step counter but never increments it in the `step()` method, unlike the
real `CloudAIGymEnv.step()`. This means test assertions won't catch counter-sync
regressions in the adapter. Override or update the `step()` method in
`_GymWithTestRun` to increment `self.test_run.step` each time the method is
called, ensuring the test model accurately reflects the behavior of
`CloudAIGymEnv` and allows the adapter's pre-step sync assertions to detect
regressions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0af1c210-2370-43d6-90f0-330aa01c925f

📥 Commits

Reviewing files that changed from the base of the PR and between 0efcd11 and b3adf41.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

pyproject.toml
src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/__init__.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/configurator/gymnasium_adapter.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_gymnasium_adapter.py
tests/test_handlers.py

rutayan-nv · 2026-06-16T14:09:02Z

@coderabbitai review

coderabbitai · 2026-06-16T14:09:12Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

168-170: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure early return still breaks env.csv/trajectory.csv alignment.

When constraint_check fails, before_step() has already executed (line 146), writing to env.csv. But this early return skips both write_trajectory() and after_step() calls. This creates misalignment: env.csv has step N while trajectory.csv does not.

The past review flagged this and was marked "Addressed", but the current code still shows the bare early return. Either the fix was incomplete or reverted.

🛠️ Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_obs = [-1.0]
+            failure_reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=failure_reward,
+                    observation=failure_obs,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, failure_obs, failure_reward)
+            return failure_obs, failure_reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 168 - 170, The
constraint_check failure early return at line 168-170 skips both
write_trajectory() and after_step() calls while before_step() has already
executed and written to env.csv, causing misalignment between env.csv and
trajectory.csv. Move the write_trajectory() and after_step() method calls to
execute before this early return statement to ensure both files stay
synchronized, so that whenever a step is written to env.csv, the corresponding
trajectory is written to trajectory.csv.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 168-170: The constraint_check failure early return at line 168-170
skips both write_trajectory() and after_step() calls while before_step() has
already executed and written to env.csv, causing misalignment between env.csv
and trajectory.csv. Move the write_trajectory() and after_step() method calls to
execute before this early return statement to ensure both files stay
synchronized, so that whenever a step is written to env.csv, the corresponding
trajectory is written to trajectory.csv.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3128d272-6f62-499c-b04a-76f177dbb799

📥 Commits

Reviewing files that changed from the base of the PR and between b3adf41 and 2411f39.

📒 Files selected for processing (10)

src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_handlers.py
tests/test_test_scenario.py

rutayan-nv · 2026-06-16T15:15:17Z

@coderabbitai review

coderabbitai · 2026-06-16T15:15:24Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

♻️ Duplicate comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

168-170: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve step finalization on constraint failure to keep artifacts aligned.

At Line 168, this early return skips both trajectory persistence and observer after_step() even though before_step() already ran. That can leave env.csv and trajectory.csv out of sync by step and breaks the observer lifecycle contract on failure paths.

💡 Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_observation = [-1.0] * max(len(self.test_run.test.agent_metrics), 1)
+            failure_reward = self.rewards.constraint_failure
+            self.write_trajectory(
+                TrajectoryEntry(
+                    step=self.test_run.step,
+                    action=action,
+                    reward=failure_reward,
+                    observation=failure_observation,
+                    env_params=dict(self.test_run.current_env_params),
+                )
+            )
+            for observer in self.observers:
+                observer.after_step(self.test_run, failure_observation, failure_reward)
+            return failure_observation, failure_reward, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 168 - 170, The
constraint check failure at line 168 returns early before executing trajectory
persistence and the observer after_step() method, breaking the observer
lifecycle since before_step() already executed. Refactor this code so that the
finalization steps (trajectory persistence and observer after_step() call) run
even when constraint_check() fails. Move the early return to after all
finalization code completes, ensuring env.csv and trajectory.csv remain
synchronized and the observer contract is maintained on all code paths including
constraint failures.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 168-170: The constraint check failure at line 168 returns early
before executing trajectory persistence and the observer after_step() method,
breaking the observer lifecycle since before_step() already executed. Refactor
this code so that the finalization steps (trajectory persistence and observer
after_step() call) run even when constraint_check() fails. Move the early return
to after all finalization code completes, ensuring env.csv and trajectory.csv
remain synchronized and the observer contract is maintained on all code paths
including constraint failures.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e0a47585-a77f-469b-ad79-79b222c1b20a

📥 Commits

Reviewing files that changed from the base of the PR and between 2411f39 and c53596d.

📒 Files selected for processing (6)

src/cloudai/_core/test_scenario.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 147-162: Replace the f-string formatting in the logging.error call
(starting with "No agent available for type") with lazy logging interpolation by
converting the f-string to a regular string with placeholder values and passing
the variables as separate arguments to the logging.error function. The same
pattern should also be applied to the logging.info call at line 162 (starting
with "Using deterministic first sweep"). This defers string formatting until the
log is actually emitted, improving performance and satisfying the Ruff G004
rule.

In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 66-67: The CsvSink for env.csv is created once during
initialization but trajectory_file_path is iteration-dependent, causing the sink
to append to stale directories when the current_iteration changes. Instead of
creating the env sink once in the initialization method, move the CsvSink
creation logic for self._env_sink from the initialization code around line 80 to
be called per iteration where current_iteration is known. Initialize
self._env_sink as None in __init__ and then recreate it for each iteration using
the current iteration-dependent trajectory_file_path to ensure proper
per-iteration alignment of env.csv with the trajectory files.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 062e4bc1-2770-4594-8ee6-896015e5c6b2

📥 Commits

Reviewing files that changed from the base of the PR and between c53596d and 5c3f65e.

📒 Files selected for processing (10)

src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_handlers.py
tests/test_test_scenario.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

80-81: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Rebind env.csv sink using the current iteration path before each write.

Line 81 initializes CsvSink once, but trajectory writes use the dynamic trajectory_file_path. If current_iteration changes, trajectory.csv moves while env.csv keeps appending to the old sink path, breaking 1:1 alignment.

🛠️ Proposed fix

         if self._env_sink is not None:
-            self._env_sink.write(entry.step, entry.env_params)
+            # Keep env.csv colocated with the current iteration's trajectory.csv.
+            self._env_sink = CsvSink(self._env_csv_path())
+            self._env_sink.write(entry.step, entry.env_params)

Also applies to: 303-305

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 80 - 81, The CsvSink
for _env_sink is initialized once at the beginning but needs to be dynamically
rebinded before each write to track the current iteration path, similar to how
trajectory files use dynamic paths. Instead of setting self._env_sink once
during initialization in the constructor, you need to rebind the _env_sink to
the current iteration path before each write operation by calling
CsvSink(self._env_csv_path()) dynamically when needed. This ensures that env.csv
stays in sync with the trajectory.csv files as the current_iteration changes.
Apply this same fix to all locations where _env_sink is used for writing,
including the area referenced at lines 303-305.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/models/workload.py`:
- Around line 236-244: The validation loop that checks env_params should reject
structured cmd_args targets instead of silently skipping them. Remove or modify
the early continue statement when spec.weights is None, and instead add
validation to check if the cmd_args field referenced by name is a structured
type (not a list). If it is a structured field without weights declared, raise a
ValueError to prevent the field from being silently excluded from the action
space by param_space or is_dse_job checks. This ensures that env_params only
references candidate list fields in cmd_args, preventing action dimensions from
being unintentionally removed.

---

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 80-81: The CsvSink for _env_sink is initialized once at the
beginning but needs to be dynamically rebinded before each write to track the
current iteration path, similar to how trajectory files use dynamic paths.
Instead of setting self._env_sink once during initialization in the constructor,
you need to rebind the _env_sink to the current iteration path before each write
operation by calling CsvSink(self._env_csv_path()) dynamically when needed. This
ensures that env.csv stays in sync with the trajectory.csv files as the
current_iteration changes. Apply this same fix to all locations where _env_sink
is used for writing, including the area referenced at lines 303-305.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1960514a-f9b1-4af4-b0c4-ed40d1947f92

📥 Commits

Reviewing files that changed from the base of the PR and between 5c3f65e and d81001c.

📒 Files selected for processing (8)

src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_handlers.py

rutayan-nv · 2026-06-24T00:04:18Z

Core owns the randomization and the TestRun update; it is not agent-specific.

Real-world use. Domain randomization: vary an uncontrolled environment condition per trial so the learned policy generalizes instead of overfitting one fixed setting. Example: the agent tunes paddle_width (action) while each trial randomizes ball_speed (env_param).

Injection. env_params is an annotation, not a value store. Candidate values live in cmd_args, same as the action space (cmd_args.ball_speed = [4, 6, 8]); [env_params.ball_speed] reclassifies that list from "search" to "sample." Per trial, CloudAIGymEnv samples one value (EnvParamsObserver), writes it to TestRun.current_env_params, and overlays it onto cmd_args via model_copy(update=…) before the workload runs.

Core vs agent_config. The agent never reads or sets env_params; it only sees the reward. Sampling and the TestRun write happen in core (cloudai's Gym environment). It is environment state, not an agent hyperparameter.

vs. sweep. Inverse of a sweep over the same list: a bare list in cmd_args is the action space (searched); the same list annotated env_params is sampled, not searched (excluded from param_space/is_dse_job). Hence the hard reject under grid_search.

#900. current_env_params is part of the trajectory cache key, so revisiting an action under a different sample no longer serves a stale reward.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/models/workload.py (1)
228-230: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don’t skip validation for empty cmd_args schemas.

Line 229 treats an empty model_fields mapping the same as a missing one, so env_params={"foo": ...} is accepted when cmd_args is a Pydantic model with no fields. Only skip when model_fields is absent.
Proposed fix
-        if not cmd_args_fields:
+        if cmd_args_fields is None:
             return self
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/models/workload.py` around lines 228 - 230, The validation logic
in the workload model is treating an empty cmd_args schema the same as no schema
at all, so env_params can slip through when the Pydantic model has zero fields.
Update the check in the method that inspects cmd_args_fields to only return
early when model_fields is missing (None), not when it is an empty mapping, so
empty schemas still trigger validation and reject unexpected env_params.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/models/workload.py`:
- Around line 228-230: The validation logic in the workload model is treating an
empty cmd_args schema the same as no schema at all, so env_params can slip
through when the Pydantic model has zero fields. Update the check in the method
that inspects cmd_args_fields to only return early when model_fields is missing
(None), not when it is an empty mapping, so empty schemas still trigger
validation and reject unexpected env_params.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c111e710-1247-4ba9-b834-99fd924d1ef8

📥 Commits

Reviewing files that changed from the base of the PR and between cf7dbf6 and 3ce5507.

📒 Files selected for processing (2)

src/cloudai/models/workload.py
tests/test_env_params.py

podkidyshev · 2026-06-24T16:16:22Z

-                    if isinstance(value, list) or (isinstance(value, dict) and check_dict(value, path)):
+                    # env_params lists are sampled by the env, not searched by the agent, so they
+                    # are not action-space dimensions and must not make a run count as DSE.
+                    if skip_env_params and path.split(".", 1)[0] in self.env_params:


I keep seeing this path.split(".", 1)[0] in regards of env_params. Does it really need to be dot-separated in all of those places?

moreover, some workloads declare more than just the root cmd_args to contain CLIs. e.g. vllm has bench_cmd_args just as well. at the same time certain feature are truly constrained to support only cmd_args

What I'm saying, I suggest the format of env_params is the full dot-separated path to a given attributed inside TestDefinition, not CmdArgs. I believe it will look cleaner in the toml:

num_nodes = 2 [cmd_args] foo = 1 [cmd_args.args] bar = [1, 2, 3] [env_params] "cmd_args.bar" = [0.1, 0.1, 0.8]

done - f25ef4d (removed the redundant split('.', 1)[0]; env_params keys are flat cmd_args field names. Kept scope to cmd_args for now rather than full TestDefinition dot-paths — can generalize to bench_cmd_args etc. in a follow-up if you'd prefer)

Yes, I would do a follow-up later to support full paths cmd_args.foo.bar or anything.abc.def meaning any nested attribute starting from TestDefinition. So that all parsing feature act consistently

just to be clear - don't need to be this PR but highly appreciated not to be prolonged. I can take it myself as well

podkidyshev · 2026-06-24T16:29:22Z

Overall arch suggestion (summary of some of my comments): move parsing and applying of env_params closer to where the regular sweeping knobs are processed - TestRun, TestDefinition classes. So that, we reuse most of the code related to sweeping/knobs in regards of how it works with the TestDefinition and CmdArgs body

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

157-159: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Return a failure observation with the configured metric shape.

BaseAgent.run() intentionally does not stop on done=True; it forwards this observation to update_policy(). With multiple agent_metrics, [-1.0] has the wrong shape compared with normal get_observation() output.

Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_observation = [self.rewards.metric_failure] * max(len(self.test_run.test.agent_metrics), 1)
+            return failure_observation, self.rewards.constraint_failure, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 157 - 159, The
constraint-check early return in cloudai_gym.py uses a hardcoded [-1.0]
observation, which does not match the configured agent_metrics shape expected by
BaseAgent.run() and update_policy(). Update the failure path in the gym step
logic to return a failure observation built with the same structure as
get_observation(), using the configured metric set rather than a single scalar
placeholder, while keeping the existing reward, done flag, and info dict
behavior.

Source: Learnings

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/_core/test_scenario.py`:
- Around line 205-215: The revalidation pass in test_scenario’s constructor call
is re-running validate_env_params after action and env_params have already been
applied, so the post-overlay cmd_args can make weighted specs fail. Update the
validation-only path around type(tdef)(**tdef.model_dump()) to clear env_params
before the second construction, or otherwise avoid revalidating that field after
overlays in this flow. Keep the fix localized to the trial setup logic that
applies _apply to action and env_params.

---

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 157-159: The constraint-check early return in cloudai_gym.py uses
a hardcoded [-1.0] observation, which does not match the configured
agent_metrics shape expected by BaseAgent.run() and update_policy(). Update the
failure path in the gym step logic to return a failure observation built with
the same structure as get_observation(), using the configured metric set rather
than a single scalar placeholder, while keeping the existing reward, done flag,
and info dict behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d61df24e-8964-44da-b280-80330b0f1f78

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce5507 and 99d7bf2.

📒 Files selected for processing (9)

src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_handlers.py

rutayan-nv · 2026-06-25T19:24:15Z

Overall arch suggestion (summary of some of my comments): move parsing and applying of env_params closer to where the regular sweeping knobs are processed - TestRun, TestDefinition classes. So that, we reuse most of the code related to sweeping/knobs in regards of how it works with the TestDefinition and CmdArgs body

done - d81001c, f25ef4d.

Parsing: env_params is a TestDefinition field; candidates live in cmd_args and validate_env_params validates against them. is_dse_job / is_env_sampled traverse cmd_args in the same pass as sweep detection.
Applying: TestRun.apply_params_set writes the agent action and the sampled env_params through the same _apply onto cmd_args; param_space reuses the same cmd_args list-values, excluding env-sampled keys.
Per-trial sampling stays in the env layer (CloudAIGymEnv.step), keeping apply_params_set deterministic.

Make env_params a first-class part of CloudAIGymEnv trial identity so the trajectory cache keys on (action, env_params) rather than action alone, fixing the domain-randomization correctness bug where the same action under a different env_params sample returned a stale reward. - Cache key now includes env_params; cache-key tests pin the contract (formerly the TDD-red specs of NVIDIA#900, folded in here). - Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry sinks both files coherently, including on constraint failure. - Reject env_params on non-DSE jobs; reject non-finite / negative weights. - Add cache-hit + declared-env_params integration coverage. Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no permanently-red standalone PR.

…rom search space Make env_params a thin annotation over cmd_args fields instead of a holder of candidate values. Candidate values live in cmd_args (the single source of truth, exactly like an action-space dimension); env_params.<name> only marks a field as env-sampled and carries optional sampling weights, never the values. - EnvParamSpec drops `values`; validates weights (finite, non-negative, sum=1.0). - Sampler/observer resolve candidate lists from cmd_args; scalar knobs are no-ops. - TestDefinition.validate_env_params cross-checks annotations against cmd_args (key must be a real field; weights require a list and must match its length). - Exclude env_params keys from both param_space and is_dse_job: an env-sampled list is not a search dimension, so an env-params-only workload is not a DSE job. - validate_dse_env_params rejects env_params on non-DSE runs and on grid_search (exhaustive search cannot exploit per-trial randomization). - Scrub private-implementation references from public docstrings. - Unit tests use generic Atari Breakout semantics (ball_speed / paddle_width).

…pyright - validate_env_params: reject structured (non-leaf) cmd_args targets. The observer cannot sample them, yet param_space/is_dse_job exclude the whole key, which would silently drop nested action dimensions. - CloudAIGymEnv.write_trajectory: rebind the env.csv sink to the current iteration path before each write, so env.csv stays 1:1 aligned with trajectory.csv when the env instance is reused across iterations. - test_env_params: assert the unknown-field rejection via model_validate so the negative test no longer trips pyright's call-arg check (CI Linting fix); add a structured-target rejection test.

An unweighted env_params spec skipped the candidate-list check, so an empty cmd_args list (e.g. ball_speed = []) passed validation and only failed later in EnvParamsSampler.sample() via rng.choice([]) (IndexError). Guard against an empty candidate list in validate_env_params so the error surfaces at TestDefinition build time. Addresses CodeRabbit feedback.

…ms value objects Replace the EnvParamsSampler class and the StepObserver/EnvParamsObserver indirection with two frozen dataclasses: EnvParam (one resolved knob: candidates, optional weights, single draw) and EnvParams (per-run knobs + seed, built via from_test, sampled per trial). The sampling RNG lives in the env: step() draws this trial's values and hands concrete values to TestRun.apply_params_set(action, env_params=...), which overlays action and sample through one deterministic path. Centralize the cmd_args -> env_params lookup in TestDefinition.is_env_sampled and access current_env_params directly. Expand EnvParam/EnvParams unit tests to cover draw, from_test, sample, and immutability.

Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The sink is now stateless: write() takes the record path per call and skips empty samples, so non-DR runs write nothing and write_trajectory needs no branch. Derive both records from a new iteration_dir property and expose the env record via the env_params_record_path property (was _env_csv_path), keeping env.csv and trajectory.csv step-aligned without coupling the name to CSV.

…ty flag Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params capability flag (opt-in, defaults False). Only agents whose search consumes per-trial env_params sampling set it True; enumerating/surrogate agents leave it False, so a config that declares env_params for an agent that would ignore them is rejected up front instead of silently no-op'ing. New agents answer for themselves with no string to maintain. Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py next to the logic it guards, looking the agent up via the Registry. Unknown agents are deferred to the dedicated agent-resolution error rather than masked here. Keep all public-facing comments, docstrings, and the error message generic (no internal agent names). Cover the full validator matrix, including the unknown-agent deferral.

Compress multi-line inline comments down to the single non-obvious rationale (or drop them where the code already speaks), per the self-documenting-code principle. Public API docstrings and test intent comments are left intact.

apply_params_set overlays sampled scalar draws onto cmd_args, then reconstructs the TestDefinition to validate the applied action values. That pass re-ran validate_env_params, which rejects a weighted env_param whose cmd_args target is no longer a candidate list - exactly what the overlay produces. env_params is already validated at parse time, so drop it from the validation-only dump. Adds a regression test covering a weighted env_param's scalar draw.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

src/cloudai/configurator/cloudai_gym.py (1)

153-155: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Return a failure observation with the normal metric width.

This branch still returns [-1.0] unconditionally. If agent_metrics has more than one entry, the agent sees a different observation shape on constraint-failure trials than on every successful trial.

Proposed fix

         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_observation = [self.rewards.metric_failure] * max(len(self.test_run.test.agent_metrics), 1)
+            return failure_observation, self.rewards.constraint_failure, True, {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 153 - 155, The
constraint-failure branch in the gym step logic returns a fixed one-element
observation, which can change the observation shape when agent_metrics normally
has multiple values. Update the early return in the step method that performs
the constraint_check so it builds a failure observation with the same width as
the normal observation from agent_metrics, while still returning the failure
reward and done flag.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/_core/test_scenario.py`:
- Around line 206-222: The num-nodes sweep handling in test_scenario should
special-case NUM_NODES before the generic action overlay, since _apply()
currently treats every action key as a cmd_args field and can incorrectly write
NUM_NODES into the test definition. Update the logic in the action application
block so NUM_NODES is removed or handled separately from the loop that calls
_apply(), then continue using the existing new_tr.num_nodes assignment to
preserve the sweep value on the copied trajectory object.

In `@src/cloudai/cli/handlers.py`:
- Around line 307-312: The validation in handle_dry_run_and_run only checks
validate_dse_env_params against the parsed test_scenario, so env_params can
still slip through when args.single_sbatch forces the non-DSE path. Update the
validation flow in handle_dry_run_and_run to also consider args.single_sbatch
and reject env-aware DSE scenarios before dispatching to handle_non_dse_job. Use
the existing symbols validate_dse_env_params, handle_dry_run_and_run, and
handle_non_dse_job to locate the check and add the explicit guard for the
single-sbatch case.

In `@src/cloudai/configurator/env_params.py`:
- Around line 181-188: The validation loop in EnvParams.from_test is treating
any non-empty tr.test.env_params as requiring DSE sampling, which incorrectly
rejects scalar-only annotations that are meant to be fixed no-ops. Update the
check around the sampled calculation so it distinguishes scalar markers from
actual sampled env params, and only raises/blocks when the env_params entry
contains values that would truly be sampled. Keep the logic localized to
EnvParams.from_test and use the existing agent resolution path (agents.get,
tr.is_dse_job, agent.samples_env_params) to preserve the current unknown-agent
behavior.

---

Duplicate comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 153-155: The constraint-failure branch in the gym step logic
returns a fixed one-element observation, which can change the observation shape
when agent_metrics normally has multiple values. Update the early return in the
step method that performs the constraint_check so it builds a failure
observation with the same width as the normal observation from agent_metrics,
while still returning the failure reward and done flag.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 998335ea-88c1-4486-8001-0d885dc9d775

📥 Commits

Reviewing files that changed from the base of the PR and between 5790ab9 and 7fec7f5.

📒 Files selected for processing (9)

src/cloudai/_core/test_scenario.py
src/cloudai/cli/handlers.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
src/cloudai/configurator/env_params.py
src/cloudai/models/workload.py
tests/test_cloudaigym.py
tests/test_env_params.py
tests/test_handlers.py

An env_params entry only reclassifies a list-valued cmd_args sweep as env-sampled; a scalar is already fixed, so annotating it is a meaningless label. Previously such an annotation was tolerated as a silent no-op, which let it slip through parse-time validation and inconsistently trip (or not) the downstream "no agent will sample them" check depending on run mode. Reject it where the contract lives - TestDefinition.validate_env_params - so the failure is immediate and mode-independent. EnvParams.from_test's non-list guard becomes defensive (parse-time now guarantees lists); the post-overlay path already drops env_params before re-validating, so concrete scalar draws are unaffected. Extract the per-field checks into a helper to keep the validator under the complexity limit, and update tests: scalar annotations now assert rejection instead of no-op tolerance.

podkidyshev · 2026-06-26T16:23:55Z

-        for key, value in action.items():
+
+        def _apply(key: str, value: Any) -> None:


i'd suggest reverting the changes with the _apply function (local functions are a troublesome thing during debugging) and rather replace the previous root

for key, value in action.items():

with

full_action = action | (env_params or {}) for key, value in full_action.items():

it makes the delta of the PR quite shorter + no inner function

podkidyshev · 2026-06-26T16:25:18Z

        new_tr.test = tdef
        if "NUM_NODES" in action:
            new_tr.num_nodes = action["NUM_NODES"]
+        new_tr.current_env_params = dict(env_params or {})


nit

Suggested change

new_tr.current_env_params = dict(env_params or {})

new_tr.current_env_params = env_params or {}

podkidyshev · 2026-06-26T16:35:07Z

                    action=action,
                    reward=cached_result.reward,
                    observation=cached_result.observation,
+                    env_params=dict(self.test_run.current_env_params),


nit

and I see this (to my understanding) redundant dict(...) call across entire PR. is there really a need for it?

Suggested change

env_params=dict(self.test_run.current_env_params),

env_params=self.test_run.current_env_params,

podkidyshev · 2026-06-26T16:37:14Z

+        # The test_run rebuild above drops the sample; restore it so the entry, cache key, and env.csv match.
+        self.test_run.current_env_params = new_tr.current_env_params


I see the same line in src/cloudai/_core/test_scenario.py:223. If it's indeed a duplicate - I would vote for removing this one and keeping one in scenario

podkidyshev · 2026-06-26T16:50:10Z

+            if not self._values_match_exact(entry.action, action):
+                continue
+            if self._values_match_exact(entry.env_params, current_env_params):
                return entry


nit again, but I'd like to push a bit for readibility. we sacrifice a few ms for validation but I believe it's just mush easier to read

Suggested change

if not self._values_match_exact(entry.action, action):

continue

if self._values_match_exact(entry.env_params, current_env_params):

return entry

action_match = self._values_match_exact(entry.action, action)

env_params_match = self._values_match_exact(entry.env_params, current_env_params)

if action_match and env_params_match:

return entry

podkidyshev · 2026-06-26T17:03:40Z

+    seed: int
+
+    @classmethod
+    def from_test(cls, test: Any) -> Optional["EnvParams"]:


is it really Any?

podkidyshev · 2026-06-26T17:14:00Z

-                    if isinstance(value, list) or (isinstance(value, dict) and check_dict(value, path)):
+                    # env_params lists are sampled by the env, not searched by the agent, so they
+                    # are not action-space dimensions and must not make a run count as DSE.
+                    if skip_env_params and path.split(".", 1)[0] in self.env_params:


Yes, I would do a follow-up later to support full paths cmd_args.foo.bar or anything.abc.def meaning any nested attribute starting from TestDefinition. So that all parsing feature act consistently

just to be clear - don't need to be this PR but highly appreciated not to be prolonged. I can take it myself as well

podkidyshev · 2026-06-26T17:15:26Z

+        return {name: param.draw(random.Random(f"{self.seed}:{name}:{trial}")) for name, param in self.params.items()}
+
+
+class EnvParamsSink:


redundant class definition: no state, no polymorphism

podkidyshev · 2026-06-26T17:27:53Z

+            writer.writerow([step, sample])
+
+
+def validate_dse_env_params(test_scenario: "TestScenario") -> None:


I wonder if you could just merge this validation with TestDefinition.validate_env_params (eliminating this one at all and calling it explicitly from handlers.py)

rutayan-nv · 2026-06-26T20:28:30Z

Follow-up gap: single-candidate (`[1]`) env_params still slip through

This PR rejects a scalar env_params target ("nothing to sample"), but a single-element list target (cmd_args.osl = [1] + [env_params.osl]) currently passes: validate_env_params only checks "is a non-empty list", not "≥ 2 candidates".

Semantically [1] is "sample from {1}" = always 1 = a constant draw — exactly the same no-op this PR is meant to kill, just spelled as a list. So it's the same class of bug leaking through.

Consistent fix: require an env_params target to be a candidate list of ≥ 2 values (a single candidate is a no-op, like a scalar). This also lines up with the parse-time canonicalization proposed in #946 (collapse non-env [x] → x): once that exists, a [1] env_param naturally reduces to the already-rejected scalar case.

Filing as a follow-up rather than expanding this PR's scope.

rutayan-nv force-pushed the rpatro/env-params-first-class branch from d0375b4 to d076095 Compare June 12, 2026 21:14

This was referenced Jun 12, 2026

[CLI/Core] Polymorphic agent dispatch + TestRun owns trial counter #893

Closed

[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix) #900

Closed

rutayan-nv marked this pull request as ready for review June 12, 2026 21:27

rutayan-nv requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners June 12, 2026 21:27

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

rutayan-nv force-pushed the rpatro/env-params-first-class branch from d076095 to 5c85e08 Compare June 15, 2026 17:39

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread src/cloudai/configurator/gymnasium_adapter.py Outdated

rutayan-nv force-pushed the rpatro/env-params-first-class branch from 5c85e08 to 0efcd11 Compare June 15, 2026 22:49

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

rutayan-nv force-pushed the rpatro/env-params-first-class branch from 0efcd11 to b3adf41 Compare June 15, 2026 23:20

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated

Comment thread tests/test_cloudaigym.py Outdated

Comment thread tests/test_gymnasium_adapter.py Outdated

rutayan-nv force-pushed the rpatro/env-params-first-class branch from b3adf41 to b06d621 Compare June 16, 2026 00:01

This was referenced Jun 16, 2026

feat(core): add ContinuousSpace action-space primitive #927

Draft

feat(configurator): add ObsLeafDescriptor + structured-observation protocol #928

Open

feat(configurator): add GymnasiumAdapter for CloudAI envs #930

Open

rutayan-nv force-pushed the rpatro/env-params-first-class branch from b06d621 to 2411f39 Compare June 16, 2026 13:50

rutayan-nv mentioned this pull request Jun 16, 2026

[CLI/Core] Polymorphic agent dispatch + TestRun owns trial counter #933

Merged

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

rutayan-nv force-pushed the rpatro/env-params-first-class branch from 2411f39 to c53596d Compare June 16, 2026 15:06

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

rutayan-nv force-pushed the rpatro/env-params-first-class branch 2 times, most recently from 1298463 to 9e379d1 Compare June 17, 2026 18:33

rutayan-nv force-pushed the rpatro/env-params-first-class branch 2 times, most recently from 4fce989 to 5c3f65e Compare June 21, 2026 15:34

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread src/cloudai/cli/handlers.py

Comment thread src/cloudai/configurator/cloudai_gym.py Outdated

podkidyshev requested changes Jun 23, 2026

View reviewed changes

Comment thread src/cloudai/configurator/env_params.py Outdated

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread src/cloudai/models/workload.py Outdated

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

rutayan-nv mentioned this pull request Jun 24, 2026

RL: contextual bandit — expose env_params regime in the observation (per-regime policies) rutayan-nv/cloudai#3

Open

podkidyshev requested changes Jun 24, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/cloudai/_core/test_scenario.py Outdated

rutayan-nv added 9 commits June 25, 2026 15:50

rutayan-nv force-pushed the rpatro/env-params-first-class branch from 5790ab9 to 7fec7f5 Compare June 25, 2026 19:51

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/cloudai/_core/test_scenario.py

Comment thread src/cloudai/cli/handlers.py

Comment thread src/cloudai/configurator/env_params.py

This was referenced Jun 26, 2026

DSE num_nodes sweep writes a phantom NUM_NODES field into cmd_args #943

Open

No single source of truth for "agent-driven" run; --single-sbatch conflates scheduling with search strategy #944

Open

rutayan-nv mentioned this pull request Jun 26, 2026

Rust-style source-located diagnostics for TOML config validation errors #945

Open

podkidyshev requested changes Jun 26, 2026

View reviewed changes

rutayan-nv mentioned this pull request Jun 26, 2026

DSE: single-element list dims ([1]) aren't scalarized before command-gen (no central invariant) #946

Open

		for key, value in action.items():

		def _apply(key: str, value: Any) -> None:

	new_tr.current_env_params = dict(env_params or {})
	new_tr.current_env_params = env_params or {}

	env_params=dict(self.test_run.current_env_params),
	env_params=self.test_run.current_env_params,

		# The test_run rebuild above drops the sample; restore it so the entry, cache key, and env.csv match.
		self.test_run.current_env_params = new_tr.current_env_params

		return {name: param.draw(random.Random(f"{self.seed}:{name}:{trial}")) for name, param in self.params.items()}


		class EnvParamsSink:

		writer.writerow([step, sample])


		def validate_dse_env_params(test_scenario: "TestScenario") -> None:

Uh oh!

Conversation

rutayan-nv commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Fix

Testing

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rutayan-nv commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

rutayan-nv commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rutayan-nv commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rutayan-nv commented May 26, 2026 •

edited

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

rutayan-nv commented Jun 24, 2026 •

edited

Loading

rutayan-nv commented Jun 25, 2026 •

edited

Loading

Follow-up gap: single-candidate (`[1]`) env_params still slip through