Skip to content

[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix)#900

Closed
rutayan-nv wants to merge 6 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/env-params-cache-key-tdd
Closed

[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix)#900
rutayan-nv wants to merge 6 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/env-params-cache-key-tdd

Conversation

@rutayan-nv

@rutayan-nv rutayan-nv commented May 26, 2026

Copy link
Copy Markdown
Contributor

Issue

  • Cache correctness: CloudAIGymEnv keys the trajectory cache on action only, so the same action under a different env_params sample returns a stale reward — the domain-randomization signal is silently lost.

Fix (test-only, TDD-red)

Testing

Stack: main#893#900#901. TDD-red — do not merge alone; #901 turns the failing tests green. (The GymnasiumAdapter previously bundled here moved to its own stacked PR.)

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds TestRun.increment_step() to advance the trial counter. CloudAIGymEnv calls this at the start of step(). BaseAgent.run() implements the full env-reset → action-select → env-step → policy-update loop. handle_dse_job delegates to agent.run() and gains exception handling with dse_failure.txt reporting via _record_run_failure. Tests verify step advancement semantics, trajectory cache behavior with env_params, and agent orchestration flows.

Changes

Agent Run Loop and DSE Handler Delegation

Layer / File(s) Summary
TestRun step counter
src/cloudai/_core/test_scenario.py, tests/test_test_scenario.py
increment_step() advances step by 1 and returns the new value; unit tests cover initial increment, repeated monotonic calls, and pre-seeded starting values.
CloudAIGymEnv step counter integration
src/cloudai/configurator/cloudai_gym.py
step() calls self.test_run.increment_step() at the start before action application and cached trajectory logic.
CloudAIGymEnv step and cache behavior tests
tests/test_cloudaigym.py
Updates existing tests to seed and assert post-increment step indices; adds unit tests for env_params-aware trajectory cache (miss on differing params, hit on match, back-compat when absent); adds integration test for workload re-execution on env_params change.
BaseAgent.run() orchestration loop
src/cloudai/configurator/base_agent.py
Implements BaseAgent.run(): resets env, loops up to max_steps with select_actionenv.stepupdate_policy feedback dict, early-exits on None action, returns 0.
handle_dse_job delegation and failure reporting
src/cloudai/cli/handlers.py
Refactors handle_dse_job to call agent.run() and accumulate non-zero codes; catches unexpected exceptions, logs them, and calls generate_reports(error=...) before re-raising; adds _record_run_failure writing dse_failure.txt; extends generate_reports with optional error parameter.
handle_dse_job tests with stub agent
tests/test_handlers.py
Adds CustomRunStubAgent with class-level run controls and custom_run_agent_name fixture; tests verify delegation, return code accumulation, exception propagation, aborted remaining runs, and failure file creation before re-raise.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Hop! The counter ticks — one step, then two,
The agent loops around the gym, brand new.
On error caught, a file records the pain,
Then re-raises the exception once again.
Each trial tracked, each policy fed,
The rabbit checks the cache — no dupe ahead! 🥕

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding tests to verify that CloudAIGymEnv's cache key must include env_params to fix a domain-randomization bug.
Description check ✅ Passed The description directly explains the cache correctness issue, the test-only fix approach, testing results, and the TDD-red stack context—all closely related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Line 148: The log call assumes observation is a numeric list and will raise
TypeError for non-numeric entries; update the logging around the line with
logging.info so it formats observation safely by mapping each item to a rounded
number when isinstance(item, (int, float)) and falling back to str(item) (or by
wrapping the comprehension in try/except and formatting with str on failure),
then log the resulting list along with step and reward; modify the code that
constructs the observation string (the expression using round(obs, 4)) to use
this safe formatting approach.
- Around line 137-147: The loop handling env.step(...) in base_agent.py
currently ignores the done flag and continues stepping; modify the control flow
in the method that contains the lines where observation, reward, done, *_ =
self.env.step(action) and the subsequent self.update_policy({...}) call so that
after calling update_policy you check if done is True and break (or return) to
stop the episode; ensure the terminal transition is still recorded (i.e., keep
the update_policy call for the final step) and then exit the loop to avoid
stepping into terminal states.

In `@tests/test_cloudaigym.py`:
- Around line 505-520: The test function
test_cache_hit_when_neither_has_env_params has a multi-line signature that fails
ruff-format; reformat the function signature and body to conform to ruff (e.g.,
collapse the signature to a single line and run ruff format) so the file passes
ruff-format checks, then re-run tests; target the function named
test_cache_hit_when_neither_has_env_params in the tests/test_cloudaigym.py diff
and apply ruff format or equivalent style fixes.

In `@tests/test_handlers.py`:
- Around line 287-288: The override of select_action must match the base
signature; update the tests/test_handlers.py select_action method to accept the
observation parameter (e.g., def select_action(self, observation: Any) ->
tuple[int, dict[str, Any]]), preserving the same return type and behavior (still
raise AssertionError), so the signature is compatible with
BaseAgent.select_action and satisfies the type checker.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5198c57f-3f25-4c55-92be-50bcc9233c1a

📥 Commits

Reviewing files that changed from the base of the PR and between 18ce72e and 6b9f126.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • pyproject.toml
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • tests/test_cloudaigym.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread src/cloudai/configurator/base_agent.py
Comment thread src/cloudai/configurator/base_agent.py
Comment thread tests/test_cloudaigym.py Outdated
Comment thread tests/test_handlers.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-cache-key-tdd branch from 6b9f126 to 923ca0b Compare June 15, 2026 17:39

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

255-258: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include env_params in the cache match predicate.

At Line 255–Line 258, cache lookup keys on action only, so repeated actions under different current_env_params can return stale rewards/observations from another environment sample.

Proposed targeted fix
 def get_cached_trajectory_result(self, action: Any) -> TrajectoryEntry | None:
+    current_env_params = getattr(self.test_run, "current_env_params", None)
     for entry in self.current_trajectory:
-        if self._values_match_exact(entry.action, action):
+        if self._values_match_exact(entry.action, action) and self._values_match_exact(
+            getattr(entry, "env_params", None), current_env_params
+        ):
             return entry
 
     return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 255 - 258, The
get_cached_trajectory_result method currently matches cache entries only on the
action parameter, which causes it to return stale results when the same action
is repeated under different environment configurations. Modify the cache lookup
logic in get_cached_trajectory_result to include environment parameter
validation in addition to the action match check. Add a condition that verifies
both the entry.action matches the provided action AND that the environment
parameters associated with the cached entry match the current_env_params before
returning the cached entry.

Source: Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Line 163: Replace the f-string in the logging.exception call with lazy
interpolation using % formatting instead. Change the f-string message to use the
% operator for string formatting, passing agent_type as a parameter to the
logging method, so that the string interpolation only happens if the message is
actually logged. This eliminates the Ruff G004 violation in the exception
logging path.

In `@tests/test_cloudaigym.py`:
- Line 502: The compound assertion using the `and` operator prevents pytest from
providing clear introspection about which part of the assertion failed. Split
the compound assertion `assert result is not None and result.step == 1` into two
separate assertions: one to verify `result is not None` and a second to verify
`result.step == 1`. Apply this same change to all similar compound assertions in
the test file to ensure pytest can clearly indicate which clause fails during
test execution.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 255-258: The get_cached_trajectory_result method currently matches
cache entries only on the action parameter, which causes it to return stale
results when the same action is repeated under different environment
configurations. Modify the cache lookup logic in get_cached_trajectory_result to
include environment parameter validation in addition to the action match check.
Add a condition that verifies both the entry.action matches the provided action
AND that the environment parameters associated with the cached entry match the
current_env_params before returning the cached entry.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: cd6c14de-9eea-4691-bcee-a41f93fd8e2b

📥 Commits

Reviewing files that changed from the base of the PR and between 6b9f126 and 923ca0b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • pyproject.toml
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • tests/test_cloudaigym.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py

Comment thread src/cloudai/cli/handlers.py Outdated
Comment thread tests/test_cloudaigym.py Outdated
@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-cache-key-tdd branch from 923ca0b to 468419b Compare June 15, 2026 22:49

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)

74-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep constraint-failure observations shape-consistent with the declared observation space.

After this change, observation width is metric-count based, but step() still returns [-1.0] on constraint failure (Line 144). That can emit the wrong observation length for multi-metric runs.

Suggested fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_observation = [-1.0] * len(self.define_observation_space())
+            return failure_observation, self.rewards.constraint_failure, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 74 - 83, The
`define_observation_space()` method declares an observation space with width
equal to the number of agent metrics, but the `step()` method returns a
fixed-length observation of `[-1.0]` when a constraint failure occurs at line
144. This creates a shape mismatch: multi-metric runs will get an observation of
incorrect length during constraint failures. Fix this by modifying the
constraint failure return in `step()` to generate an observation list with the
same shape as `define_observation_space()` — use `[-1.0] *
max(len(self.test_run.test.agent_metrics), 1)` instead of the hard-coded
`[-1.0]` to ensure observation consistency across all execution paths.
♻️ Duplicate comments (1)
src/cloudai/configurator/base_agent.py (1)

149-161: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop stepping once the environment reports termination.

Line 149 captures done, but the loop never checks it, so run() can continue stepping after episode termination.

Proposed minimal fix
             observation, reward, done, *_ = self.env.step(action)
             self.update_policy(
                 {
                     "trial_index": step,
                     "value": reward,
                     "observation": observation,
                     "prev_observation": prev_observation,
                     "action": action,
                     "done": done,
                 }
             )
             logging.info(f"Step {step}: Observation: {[round(obs, 4) for obs in observation]}, Reward: {reward:.4f}")
+            if done:
+                break
         return 0

Based on learnings from prior review comments, this is the same unresolved done-handling gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/base_agent.py` around lines 149 - 161, The `run()`
method unpacks the `done` flag from the environment step at line 149 but never
checks it, causing the loop to continue even after the episode has terminated.
Add a conditional check after the `update_policy()` call to break out of the
loop when `done` is True, ensuring that stepping stops once the environment
reports episode termination.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 74-83: The `define_observation_space()` method declares an
observation space with width equal to the number of agent metrics, but the
`step()` method returns a fixed-length observation of `[-1.0]` when a constraint
failure occurs at line 144. This creates a shape mismatch: multi-metric runs
will get an observation of incorrect length during constraint failures. Fix this
by modifying the constraint failure return in `step()` to generate an
observation list with the same shape as `define_observation_space()` — use
`[-1.0] * max(len(self.test_run.test.agent_metrics), 1)` instead of the
hard-coded `[-1.0]` to ensure observation consistency across all execution
paths.

---

Duplicate comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 149-161: The `run()` method unpacks the `done` flag from the
environment step at line 149 but never checks it, causing the loop to continue
even after the episode has terminated. Add a conditional check after the
`update_policy()` call to break out of the loop when `done` is True, ensuring
that stepping stops once the environment reports episode termination.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 39e97025-4afa-46ac-aa93-b4260ba58cfa

📥 Commits

Reviewing files that changed from the base of the PR and between 923ca0b and 468419b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • pyproject.toml
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • tests/test_cloudaigym.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py

@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-cache-key-tdd branch from 468419b to f6515cf Compare June 15, 2026 23:20

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (2)

142-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constraint-failure observation shape can violate the declared observation space.

Line 144 returns [-1.0] unconditionally, but observation shape is now metric-count based (Line 82). Multi-metric configs can produce shape mismatch in downstream consumers.

Proposed fix
         if not self.test_run.test.constraint_check(self.test_run, self.runner.system):
             logging.info("Constraint check failed. Skipping step.")
-            return [-1.0], self.rewards.constraint_failure, True, {}
+            failure_obs = [-1.0] * len(self.define_observation_space())
+            return failure_obs, self.rewards.constraint_failure, True, {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 142 - 145, The
constraint failure path in the method that handles constraint_check() returns a
hardcoded observation of [-1.0], but the observation space shape is metric-count
based as defined elsewhere in the code. For multi-metric configurations, this
creates a shape mismatch. Fix this by replacing the hardcoded [-1.0] return
value with an observation array that matches the correct shape based on the
metric count, so the observation shape remains consistent regardless of the
number of metrics in the configuration.

255-258: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Include env_params in cache matching to prevent stale DR labels.

Line 255 currently keys cache hits by action only. This reuses rewards across different current_env_params samples and breaks reward/condition integrity.

Proposed fix
 def get_cached_trajectory_result(self, action: Any) -> TrajectoryEntry | None:
+    current_env_params = getattr(self.test_run, "current_env_params", None)
     for entry in self.current_trajectory:
-        if self._values_match_exact(entry.action, action):
+        entry_env_params = getattr(entry, "env_params", None)
+        if self._values_match_exact(entry.action, action) and self._values_match_exact(
+            entry_env_params, current_env_params
+        ):
             return entry

     return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/cloudai_gym.py` around lines 255 - 258, The
get_cached_trajectory_result method only matches cache entries based on action,
which causes it to reuse rewards across different environment parameter samples
and breaks reward/condition integrity. Modify the matching logic in this method
to check both the action AND the current_env_params when iterating through
self.current_trajectory. Update the condition used in _values_match_exact or add
an additional check to ensure that a cache entry is only returned if both the
action matches and the environment parameters match the current state. This
prevents stale DR labels from being reused when env_params differ.

Source: Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 151-163: The _sync_underlying_step_counter method sets
test_run.step to self._step_count + 1 before the underlying CloudAIGymEnv.step()
method is called, but that method increments the step counter again internally,
resulting in an off-by-one error. Remove the + 1 and set test_run.step to just
self._step_count so that when CloudAIGymEnv.step() increments it internally, the
step counter will have the correct final value.

---

Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 142-145: The constraint failure path in the method that handles
constraint_check() returns a hardcoded observation of [-1.0], but the
observation space shape is metric-count based as defined elsewhere in the code.
For multi-metric configurations, this creates a shape mismatch. Fix this by
replacing the hardcoded [-1.0] return value with an observation array that
matches the correct shape based on the metric count, so the observation shape
remains consistent regardless of the number of metrics in the configuration.
- Around line 255-258: The get_cached_trajectory_result method only matches
cache entries based on action, which causes it to reuse rewards across different
environment parameter samples and breaks reward/condition integrity. Modify the
matching logic in this method to check both the action AND the
current_env_params when iterating through self.current_trajectory. Update the
condition used in _values_match_exact or add an additional check to ensure that
a cache entry is only returned if both the action matches and the environment
parameters match the current state. This prevents stale DR labels from being
reused when env_params differ.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 7bff3cab-2384-4f5f-9360-3fc537c95a58

📥 Commits

Reviewing files that changed from the base of the PR and between 468419b and f6515cf.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • pyproject.toml
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/__init__.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • src/cloudai/configurator/gymnasium_adapter.py
  • tests/test_cloudaigym.py
  • tests/test_gymnasium_adapter.py
  • tests/test_handlers.py

Comment on lines +151 to +163
def _sync_underlying_step_counter(self) -> None:
"""
Mirror ``handle_dse_job``'s 1-based ``test_run.step`` so artifact paths match.

The first step is written under ``…/<iteration>/1/``, matching how
``handle_dse_job`` numbers steps; this keeps reports and trajectory
analysis consistent regardless of whether the env is driven by the
DSE loop or by an external training loop wrapping the adapter.
"""
test_run = getattr(self._env, "test_run", None)
if test_run is not None:
test_run.step = self._step_count + 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Step synchronization is off-by-one when wrapping CloudAIGymEnv.

Line 162 sets test_run.step to self._step_count + 1 before delegating, but CloudAIGymEnv.step() increments again internally. This shifts artifact/trajectory step indexing by one.

Proposed fix
     def _sync_underlying_step_counter(self) -> None:
@@
         test_run = getattr(self._env, "test_run", None)
         if test_run is not None:
-            test_run.step = self._step_count + 1
+            # Underlying CloudAIGymEnv.step() increments at step start.
+            test_run.step = self._step_count
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/configurator/gymnasium_adapter.py` around lines 151 - 163, The
_sync_underlying_step_counter method sets test_run.step to self._step_count + 1
before the underlying CloudAIGymEnv.step() method is called, but that method
increments the step counter again internally, resulting in an off-by-one error.
Remove the + 1 and set test_run.step to just self._step_count so that when
CloudAIGymEnv.step() increments it internally, the step counter will have the
correct final value.

…ole mutator

The trial index lives on TestRun and is advanced exactly once per
CloudAIGymEnv.step(), before output_path and the trajectory row are computed.
Step artifacts and trajectory rows are therefore tagged with the advanced
index rather than the pre-step value.
handle_dse_job calls agent.run() uniformly; agents (e.g. RLlib-based) implement
their own training loop in run(). Failure handling follows a hybrid contract: a
non-zero rc accumulates via `err |= rc` and the sweep continues, while an
unexpected exception hard-fails the scenario after reports are generated and the
aborting error is documented.
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/test_cloudaigym.py (2)

502-502: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Split compound assertion for clearer pytest introspection.

The compound assertion assert result is not None and result.step == 1 prevents pytest from clearly indicating which clause failed.

✂️ Proposed fix
-    assert result is not None and result.step == 1
+    assert result is not None
+    assert result.step == 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cloudaigym.py` at line 502, The compound assertion `assert result
is not None and result.step == 1` combines two independent checks which prevents
pytest from clearly identifying which specific condition failed during test
execution. Split this into two separate assertions: one to verify that result is
not None, and another to verify that result.step equals 1. This allows pytest's
introspection to pinpoint exactly which condition caused the failure.

Source: Linters/SAST tools


520-520: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Split compound assertion for clearer pytest introspection.

The compound assertion assert result is not None and result.step == 1 prevents pytest from clearly indicating which clause failed.

✂️ Proposed fix
-    assert result is not None and result.step == 1
+    assert result is not None
+    assert result.step == 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cloudaigym.py` at line 520, The compound assertion using the `and`
operator on line 520 prevents pytest from clearly indicating which clause failed
when the assertion breaks. Split the assertion `assert result is not None and
result.step == 1` into two separate assert statements, one checking that result
is not None and another checking that result.step equals 1. This allows pytest
to provide more granular failure reporting and makes debugging test failures
easier.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_cloudaigym.py`:
- Around line 523-525: The function signature for
test_step_reruns_workload_when_env_params_change violates ruff-format rules due
to improper multi-line formatting. Reformat the function signature to comply
with the project's formatting standards by either running ruff format on the
file or manually adjusting the parameter list and opening parenthesis placement
to align with ruff's style requirements.

---

Duplicate comments:
In `@tests/test_cloudaigym.py`:
- Line 502: The compound assertion `assert result is not None and result.step ==
1` combines two independent checks which prevents pytest from clearly
identifying which specific condition failed during test execution. Split this
into two separate assertions: one to verify that result is not None, and another
to verify that result.step equals 1. This allows pytest's introspection to
pinpoint exactly which condition caused the failure.
- Line 520: The compound assertion using the `and` operator on line 520 prevents
pytest from clearly indicating which clause failed when the assertion breaks.
Split the assertion `assert result is not None and result.step == 1` into two
separate assert statements, one checking that result is not None and another
checking that result.step equals 1. This allows pytest to provide more granular
failure reporting and makes debugging test failures easier.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: bd869259-1364-4c02-9fda-40317c432bf3

📥 Commits

Reviewing files that changed from the base of the PR and between f6515cf and 1f2191c.

📒 Files selected for processing (7)
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • tests/test_cloudaigym.py
  • tests/test_handlers.py
  • tests/test_test_scenario.py

Comment thread tests/test_cloudaigym.py Outdated
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

run() already treats a None return from select_action() as a stop signal, but
the abstract method declared only tuple[int, dict[str, Any]]. Widen the return
annotation to tuple[int, dict[str, Any]] | None and document the termination
semantics so the declared contract matches run()'s usage.
CloudAIGymEnv.get_cached_trajectory_result(action) keys the trajectory
cache on action alone. When a workload declares env_params (e.g.
drop_rate) and the agent re-selects the same action under a different
env_params sample, the cache returns a stale reward measured under a
different env, silently invalidating any domain-randomization workflow.

Four tests pin the contract; today two FAIL (bug-exposing), two PASS
(back-compat sanity). All 27 pre-existing tests are untouched.

  FAIL  test_cache_miss_when_env_params_differ
  FAIL  test_step_reruns_workload_when_env_params_change
  PASS  test_cache_hit_when_action_and_env_params_match
  PASS  test_cache_hit_when_neither_has_env_params

Driver for the follow-up PR that adds env_params as a first-class field
on TestDefinition + TrajectoryEntry and rewrites the cache key as
(action, env_params). Both unit and integration shapes are covered so
the fix can be validated end-to-end through env.step().
…ge signature

Satisfy ruff-format on the env_params cache-key TDD test.
@rutayan-nv rutayan-nv force-pushed the rpatro/env-params-cache-key-tdd branch from 7368d06 to 210e411 Compare June 16, 2026 22:15
CodeRabbit (Ruff PT018): `assert a and b` obscures which clause failed.
Split the two `assert result is not None and result.step == 1` checks in the
env_params cache-key tests into separate assertions for clearer pytest
introspection. Test-only change, no behavior impact.
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 20, 2026
Make env_params a first-class part of CloudAIGymEnv trial identity so the
trajectory cache keys on (action, env_params) rather than action alone,
fixing the domain-randomization correctness bug where the same action under
a different env_params sample returned a stale reward.

- Cache key now includes env_params; cache-key tests pin the contract
  (formerly the TDD-red specs of NVIDIA#900, folded in here).
- Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry
  sinks both files coherently, including on constraint failure.
- Reject env_params on non-DSE jobs; reject non-finite / negative weights.
- Add cache-hit + declared-env_params integration coverage.

Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no
permanently-red standalone PR.
@rutayan-nv

Copy link
Copy Markdown
Contributor Author

Folded into #901. As of c7f7f1f, #901 now carries the cache-key TDD specs (test_cache_miss_when_env_params_differ, test_step_reruns_workload_when_env_params_change) together with the production fix in a single green commit, eliminating this permanently-red standalone PR.

The stack now goes #933 -> #901 -> ... (this branch is dropped). Closing.

@rutayan-nv rutayan-nv closed this Jun 20, 2026
@rutayan-nv rutayan-nv deleted the rpatro/env-params-cache-key-tdd branch June 20, 2026 22:17
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 21, 2026
Make env_params a first-class part of CloudAIGymEnv trial identity so the
trajectory cache keys on (action, env_params) rather than action alone,
fixing the domain-randomization correctness bug where the same action under
a different env_params sample returned a stale reward.

- Cache key now includes env_params; cache-key tests pin the contract
  (formerly the TDD-red specs of NVIDIA#900, folded in here).
- Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry
  sinks both files coherently, including on constraint failure.
- Reject env_params on non-DSE jobs; reject non-finite / negative weights.
- Add cache-hit + declared-env_params integration coverage.

Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no
permanently-red standalone PR.
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 21, 2026
Make env_params a first-class part of CloudAIGymEnv trial identity so the
trajectory cache keys on (action, env_params) rather than action alone,
fixing the domain-randomization correctness bug where the same action under
a different env_params sample returned a stale reward.

- Cache key now includes env_params; cache-key tests pin the contract
  (formerly the TDD-red specs of NVIDIA#900, folded in here).
- Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry
  sinks both files coherently, including on constraint failure.
- Reject env_params on non-DSE jobs; reject non-finite / negative weights.
- Add cache-hit + declared-env_params integration coverage.

Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no
permanently-red standalone PR.
rutayan-nv added a commit to rutayan-nv/cloudai that referenced this pull request Jun 25, 2026
Make env_params a first-class part of CloudAIGymEnv trial identity so the
trajectory cache keys on (action, env_params) rather than action alone,
fixing the domain-randomization correctness bug where the same action under
a different env_params sample returned a stale reward.

- Cache key now includes env_params; cache-key tests pin the contract
  (formerly the TDD-red specs of NVIDIA#900, folded in here).
- Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry
  sinks both files coherently, including on constraint failure.
- Reject env_params on non-DSE jobs; reject non-finite / negative weights.
- Add cache-hit + declared-env_params integration coverage.

Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no
permanently-red standalone PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant