[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix)#900
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ChangesAgent Run Loop and DSE Handler Delegation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
7a27fa2 to
6b9f126
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/base_agent.py`:
- Line 148: The log call assumes observation is a numeric list and will raise
TypeError for non-numeric entries; update the logging around the line with
logging.info so it formats observation safely by mapping each item to a rounded
number when isinstance(item, (int, float)) and falling back to str(item) (or by
wrapping the comprehension in try/except and formatting with str on failure),
then log the resulting list along with step and reward; modify the code that
constructs the observation string (the expression using round(obs, 4)) to use
this safe formatting approach.
- Around line 137-147: The loop handling env.step(...) in base_agent.py
currently ignores the done flag and continues stepping; modify the control flow
in the method that contains the lines where observation, reward, done, *_ =
self.env.step(action) and the subsequent self.update_policy({...}) call so that
after calling update_policy you check if done is True and break (or return) to
stop the episode; ensure the terminal transition is still recorded (i.e., keep
the update_policy call for the final step) and then exit the loop to avoid
stepping into terminal states.
In `@tests/test_cloudaigym.py`:
- Around line 505-520: The test function
test_cache_hit_when_neither_has_env_params has a multi-line signature that fails
ruff-format; reformat the function signature and body to conform to ruff (e.g.,
collapse the signature to a single line and run ruff format) so the file passes
ruff-format checks, then re-run tests; target the function named
test_cache_hit_when_neither_has_env_params in the tests/test_cloudaigym.py diff
and apply ruff format or equivalent style fixes.
In `@tests/test_handlers.py`:
- Around line 287-288: The override of select_action must match the base
signature; update the tests/test_handlers.py select_action method to accept the
observation parameter (e.g., def select_action(self, observation: Any) ->
tuple[int, dict[str, Any]]), preserving the same return type and behavior (still
raise AssertionError), so the signature is compatible with
BaseAgent.select_action and satisfies the type checker.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 5198c57f-3f25-4c55-92be-50bcc9233c1a
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
pyproject.tomlsrc/cloudai/_core/test_scenario.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pytests/test_cloudaigym.pytests/test_gymnasium_adapter.pytests/test_handlers.pytests/test_test_scenario.py
6b9f126 to
923ca0b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)
255-258:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInclude
env_paramsin the cache match predicate.At Line 255–Line 258, cache lookup keys on
actiononly, so repeated actions under differentcurrent_env_paramscan return stale rewards/observations from another environment sample.Proposed targeted fix
def get_cached_trajectory_result(self, action: Any) -> TrajectoryEntry | None: + current_env_params = getattr(self.test_run, "current_env_params", None) for entry in self.current_trajectory: - if self._values_match_exact(entry.action, action): + if self._values_match_exact(entry.action, action) and self._values_match_exact( + getattr(entry, "env_params", None), current_env_params + ): return entry return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 255 - 258, The get_cached_trajectory_result method currently matches cache entries only on the action parameter, which causes it to return stale results when the same action is repeated under different environment configurations. Modify the cache lookup logic in get_cached_trajectory_result to include environment parameter validation in addition to the action match check. Add a condition that verifies both the entry.action matches the provided action AND that the environment parameters associated with the cached entry match the current_env_params before returning the cached entry.Source: Pipeline failures
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Line 163: Replace the f-string in the logging.exception call with lazy
interpolation using % formatting instead. Change the f-string message to use the
% operator for string formatting, passing agent_type as a parameter to the
logging method, so that the string interpolation only happens if the message is
actually logged. This eliminates the Ruff G004 violation in the exception
logging path.
In `@tests/test_cloudaigym.py`:
- Line 502: The compound assertion using the `and` operator prevents pytest from
providing clear introspection about which part of the assertion failed. Split
the compound assertion `assert result is not None and result.step == 1` into two
separate assertions: one to verify `result is not None` and a second to verify
`result.step == 1`. Apply this same change to all similar compound assertions in
the test file to ensure pytest can clearly indicate which clause fails during
test execution.
---
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 255-258: The get_cached_trajectory_result method currently matches
cache entries only on the action parameter, which causes it to return stale
results when the same action is repeated under different environment
configurations. Modify the cache lookup logic in get_cached_trajectory_result to
include environment parameter validation in addition to the action match check.
Add a condition that verifies both the entry.action matches the provided action
AND that the environment parameters associated with the cached entry match the
current_env_params before returning the cached entry.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: cd6c14de-9eea-4691-bcee-a41f93fd8e2b
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
pyproject.tomlsrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pytests/test_cloudaigym.pytests/test_gymnasium_adapter.pytests/test_handlers.py
923ca0b to
468419b
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)
74-83:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep constraint-failure observations shape-consistent with the declared observation space.
After this change, observation width is metric-count based, but
step()still returns[-1.0]on constraint failure (Line 144). That can emit the wrong observation length for multi-metric runs.Suggested fix
if not self.test_run.test.constraint_check(self.test_run, self.runner.system): logging.info("Constraint check failed. Skipping step.") - return [-1.0], self.rewards.constraint_failure, True, {} + failure_observation = [-1.0] * len(self.define_observation_space()) + return failure_observation, self.rewards.constraint_failure, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 74 - 83, The `define_observation_space()` method declares an observation space with width equal to the number of agent metrics, but the `step()` method returns a fixed-length observation of `[-1.0]` when a constraint failure occurs at line 144. This creates a shape mismatch: multi-metric runs will get an observation of incorrect length during constraint failures. Fix this by modifying the constraint failure return in `step()` to generate an observation list with the same shape as `define_observation_space()` — use `[-1.0] * max(len(self.test_run.test.agent_metrics), 1)` instead of the hard-coded `[-1.0]` to ensure observation consistency across all execution paths.
♻️ Duplicate comments (1)
src/cloudai/configurator/base_agent.py (1)
149-161:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStop stepping once the environment reports termination.
Line 149 captures
done, but the loop never checks it, sorun()can continue stepping after episode termination.Proposed minimal fix
observation, reward, done, *_ = self.env.step(action) self.update_policy( { "trial_index": step, "value": reward, "observation": observation, "prev_observation": prev_observation, "action": action, "done": done, } ) logging.info(f"Step {step}: Observation: {[round(obs, 4) for obs in observation]}, Reward: {reward:.4f}") + if done: + break return 0Based on learnings from prior review comments, this is the same unresolved
done-handling gap.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/base_agent.py` around lines 149 - 161, The `run()` method unpacks the `done` flag from the environment step at line 149 but never checks it, causing the loop to continue even after the episode has terminated. Add a conditional check after the `update_policy()` call to break out of the loop when `done` is True, ensuring that stepping stops once the environment reports episode termination.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 74-83: The `define_observation_space()` method declares an
observation space with width equal to the number of agent metrics, but the
`step()` method returns a fixed-length observation of `[-1.0]` when a constraint
failure occurs at line 144. This creates a shape mismatch: multi-metric runs
will get an observation of incorrect length during constraint failures. Fix this
by modifying the constraint failure return in `step()` to generate an
observation list with the same shape as `define_observation_space()` — use
`[-1.0] * max(len(self.test_run.test.agent_metrics), 1)` instead of the
hard-coded `[-1.0]` to ensure observation consistency across all execution
paths.
---
Duplicate comments:
In `@src/cloudai/configurator/base_agent.py`:
- Around line 149-161: The `run()` method unpacks the `done` flag from the
environment step at line 149 but never checks it, causing the loop to continue
even after the episode has terminated. Add a conditional check after the
`update_policy()` call to break out of the loop when `done` is True, ensuring
that stepping stops once the environment reports episode termination.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 39e97025-4afa-46ac-aa93-b4260ba58cfa
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
pyproject.tomlsrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pytests/test_cloudaigym.pytests/test_gymnasium_adapter.pytests/test_handlers.py
468419b to
f6515cf
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (2)
142-145:⚠️ Potential issue | 🟠 Major | ⚡ Quick winConstraint-failure observation shape can violate the declared observation space.
Line 144 returns
[-1.0]unconditionally, but observation shape is now metric-count based (Line 82). Multi-metric configs can produce shape mismatch in downstream consumers.Proposed fix
if not self.test_run.test.constraint_check(self.test_run, self.runner.system): logging.info("Constraint check failed. Skipping step.") - return [-1.0], self.rewards.constraint_failure, True, {} + failure_obs = [-1.0] * len(self.define_observation_space()) + return failure_obs, self.rewards.constraint_failure, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 142 - 145, The constraint failure path in the method that handles constraint_check() returns a hardcoded observation of [-1.0], but the observation space shape is metric-count based as defined elsewhere in the code. For multi-metric configurations, this creates a shape mismatch. Fix this by replacing the hardcoded [-1.0] return value with an observation array that matches the correct shape based on the metric count, so the observation shape remains consistent regardless of the number of metrics in the configuration.
255-258:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winInclude
env_paramsin cache matching to prevent stale DR labels.Line 255 currently keys cache hits by
actiononly. This reuses rewards across differentcurrent_env_paramssamples and breaks reward/condition integrity.Proposed fix
def get_cached_trajectory_result(self, action: Any) -> TrajectoryEntry | None: + current_env_params = getattr(self.test_run, "current_env_params", None) for entry in self.current_trajectory: - if self._values_match_exact(entry.action, action): + entry_env_params = getattr(entry, "env_params", None) + if self._values_match_exact(entry.action, action) and self._values_match_exact( + entry_env_params, current_env_params + ): return entry return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 255 - 258, The get_cached_trajectory_result method only matches cache entries based on action, which causes it to reuse rewards across different environment parameter samples and breaks reward/condition integrity. Modify the matching logic in this method to check both the action AND the current_env_params when iterating through self.current_trajectory. Update the condition used in _values_match_exact or add an additional check to ensure that a cache entry is only returned if both the action matches and the environment parameters match the current state. This prevents stale DR labels from being reused when env_params differ.Source: Pipeline failures
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 151-163: The _sync_underlying_step_counter method sets
test_run.step to self._step_count + 1 before the underlying CloudAIGymEnv.step()
method is called, but that method increments the step counter again internally,
resulting in an off-by-one error. Remove the + 1 and set test_run.step to just
self._step_count so that when CloudAIGymEnv.step() increments it internally, the
step counter will have the correct final value.
---
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 142-145: The constraint failure path in the method that handles
constraint_check() returns a hardcoded observation of [-1.0], but the
observation space shape is metric-count based as defined elsewhere in the code.
For multi-metric configurations, this creates a shape mismatch. Fix this by
replacing the hardcoded [-1.0] return value with an observation array that
matches the correct shape based on the metric count, so the observation shape
remains consistent regardless of the number of metrics in the configuration.
- Around line 255-258: The get_cached_trajectory_result method only matches
cache entries based on action, which causes it to reuse rewards across different
environment parameter samples and breaks reward/condition integrity. Modify the
matching logic in this method to check both the action AND the
current_env_params when iterating through self.current_trajectory. Update the
condition used in _values_match_exact or add an additional check to ensure that
a cache entry is only returned if both the action matches and the environment
parameters match the current state. This prevents stale DR labels from being
reused when env_params differ.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 7bff3cab-2384-4f5f-9360-3fc537c95a58
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (9)
pyproject.tomlsrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pytests/test_cloudaigym.pytests/test_gymnasium_adapter.pytests/test_handlers.py
| def _sync_underlying_step_counter(self) -> None: | ||
| """ | ||
| Mirror ``handle_dse_job``'s 1-based ``test_run.step`` so artifact paths match. | ||
|
|
||
| The first step is written under ``…/<iteration>/1/``, matching how | ||
| ``handle_dse_job`` numbers steps; this keeps reports and trajectory | ||
| analysis consistent regardless of whether the env is driven by the | ||
| DSE loop or by an external training loop wrapping the adapter. | ||
| """ | ||
| test_run = getattr(self._env, "test_run", None) | ||
| if test_run is not None: | ||
| test_run.step = self._step_count + 1 | ||
|
|
There was a problem hiding this comment.
Step synchronization is off-by-one when wrapping CloudAIGymEnv.
Line 162 sets test_run.step to self._step_count + 1 before delegating, but CloudAIGymEnv.step() increments again internally. This shifts artifact/trajectory step indexing by one.
Proposed fix
def _sync_underlying_step_counter(self) -> None:
@@
test_run = getattr(self._env, "test_run", None)
if test_run is not None:
- test_run.step = self._step_count + 1
+ # Underlying CloudAIGymEnv.step() increments at step start.
+ test_run.step = self._step_count🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cloudai/configurator/gymnasium_adapter.py` around lines 151 - 163, The
_sync_underlying_step_counter method sets test_run.step to self._step_count + 1
before the underlying CloudAIGymEnv.step() method is called, but that method
increments the step counter again internally, resulting in an off-by-one error.
Remove the + 1 and set test_run.step to just self._step_count so that when
CloudAIGymEnv.step() increments it internally, the step counter will have the
correct final value.
f6515cf to
3608921
Compare
…ole mutator The trial index lives on TestRun and is advanced exactly once per CloudAIGymEnv.step(), before output_path and the trajectory row are computed. Step artifacts and trajectory rows are therefore tagged with the advanced index rather than the pre-step value.
handle_dse_job calls agent.run() uniformly; agents (e.g. RLlib-based) implement their own training loop in run(). Failure handling follows a hybrid contract: a non-zero rc accumulates via `err |= rc` and the sweep continues, while an unexpected exception hard-fails the scenario after reports are generated and the aborting error is documented.
3608921 to
1f2191c
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/test_cloudaigym.py (2)
502-502:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSplit compound assertion for clearer pytest introspection.
The compound assertion
assert result is not None and result.step == 1prevents pytest from clearly indicating which clause failed.✂️ Proposed fix
- assert result is not None and result.step == 1 + assert result is not None + assert result.step == 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cloudaigym.py` at line 502, The compound assertion `assert result is not None and result.step == 1` combines two independent checks which prevents pytest from clearly identifying which specific condition failed during test execution. Split this into two separate assertions: one to verify that result is not None, and another to verify that result.step equals 1. This allows pytest's introspection to pinpoint exactly which condition caused the failure.Source: Linters/SAST tools
520-520:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSplit compound assertion for clearer pytest introspection.
The compound assertion
assert result is not None and result.step == 1prevents pytest from clearly indicating which clause failed.✂️ Proposed fix
- assert result is not None and result.step == 1 + assert result is not None + assert result.step == 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cloudaigym.py` at line 520, The compound assertion using the `and` operator on line 520 prevents pytest from clearly indicating which clause failed when the assertion breaks. Split the assertion `assert result is not None and result.step == 1` into two separate assert statements, one checking that result is not None and another checking that result.step equals 1. This allows pytest to provide more granular failure reporting and makes debugging test failures easier.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_cloudaigym.py`:
- Around line 523-525: The function signature for
test_step_reruns_workload_when_env_params_change violates ruff-format rules due
to improper multi-line formatting. Reformat the function signature to comply
with the project's formatting standards by either running ruff format on the
file or manually adjusting the parameter list and opening parenthesis placement
to align with ruff's style requirements.
---
Duplicate comments:
In `@tests/test_cloudaigym.py`:
- Line 502: The compound assertion `assert result is not None and result.step ==
1` combines two independent checks which prevents pytest from clearly
identifying which specific condition failed during test execution. Split this
into two separate assertions: one to verify that result is not None, and another
to verify that result.step equals 1. This allows pytest's introspection to
pinpoint exactly which condition caused the failure.
- Line 520: The compound assertion using the `and` operator on line 520 prevents
pytest from clearly indicating which clause failed when the assertion breaks.
Split the assertion `assert result is not None and result.step == 1` into two
separate assert statements, one checking that result is not None and another
checking that result.step equals 1. This allows pytest to provide more granular
failure reporting and makes debugging test failures easier.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: bd869259-1364-4c02-9fda-40317c432bf3
📒 Files selected for processing (7)
src/cloudai/_core/test_scenario.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pytests/test_cloudaigym.pytests/test_handlers.pytests/test_test_scenario.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
run() already treats a None return from select_action() as a stop signal, but the abstract method declared only tuple[int, dict[str, Any]]. Widen the return annotation to tuple[int, dict[str, Any]] | None and document the termination semantics so the declared contract matches run()'s usage.
CloudAIGymEnv.get_cached_trajectory_result(action) keys the trajectory cache on action alone. When a workload declares env_params (e.g. drop_rate) and the agent re-selects the same action under a different env_params sample, the cache returns a stale reward measured under a different env, silently invalidating any domain-randomization workflow. Four tests pin the contract; today two FAIL (bug-exposing), two PASS (back-compat sanity). All 27 pre-existing tests are untouched. FAIL test_cache_miss_when_env_params_differ FAIL test_step_reruns_workload_when_env_params_change PASS test_cache_hit_when_action_and_env_params_match PASS test_cache_hit_when_neither_has_env_params Driver for the follow-up PR that adds env_params as a first-class field on TestDefinition + TrajectoryEntry and rewrites the cache key as (action, env_params). Both unit and integration shapes are covered so the fix can be validated end-to-end through env.step().
…ge signature Satisfy ruff-format on the env_params cache-key TDD test.
7368d06 to
210e411
Compare
CodeRabbit (Ruff PT018): `assert a and b` obscures which clause failed. Split the two `assert result is not None and result.step == 1` checks in the env_params cache-key tests into separate assertions for clearer pytest introspection. Test-only change, no behavior impact.
Make env_params a first-class part of CloudAIGymEnv trial identity so the trajectory cache keys on (action, env_params) rather than action alone, fixing the domain-randomization correctness bug where the same action under a different env_params sample returned a stale reward. - Cache key now includes env_params; cache-key tests pin the contract (formerly the TDD-red specs of NVIDIA#900, folded in here). - Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry sinks both files coherently, including on constraint failure. - Reject env_params on non-DSE jobs; reject non-finite / negative weights. - Add cache-hit + declared-env_params integration coverage. Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no permanently-red standalone PR.
|
Folded into #901. As of c7f7f1f, #901 now carries the cache-key TDD specs ( The stack now goes #933 -> #901 -> ... (this branch is dropped). Closing. |
Make env_params a first-class part of CloudAIGymEnv trial identity so the trajectory cache keys on (action, env_params) rather than action alone, fixing the domain-randomization correctness bug where the same action under a different env_params sample returned a stale reward. - Cache key now includes env_params; cache-key tests pin the contract (formerly the TDD-red specs of NVIDIA#900, folded in here). - Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry sinks both files coherently, including on constraint failure. - Reject env_params on non-DSE jobs; reject non-finite / negative weights. - Add cache-hit + declared-env_params integration coverage. Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no permanently-red standalone PR.
Make env_params a first-class part of CloudAIGymEnv trial identity so the trajectory cache keys on (action, env_params) rather than action alone, fixing the domain-randomization correctness bug where the same action under a different env_params sample returned a stale reward. - Cache key now includes env_params; cache-key tests pin the contract (formerly the TDD-red specs of NVIDIA#900, folded in here). - Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry sinks both files coherently, including on constraint failure. - Reject env_params on non-DSE jobs; reject non-finite / negative weights. - Add cache-hit + declared-env_params integration coverage. Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no permanently-red standalone PR.
Make env_params a first-class part of CloudAIGymEnv trial identity so the trajectory cache keys on (action, env_params) rather than action alone, fixing the domain-randomization correctness bug where the same action under a different env_params sample returned a stale reward. - Cache key now includes env_params; cache-key tests pin the contract (formerly the TDD-red specs of NVIDIA#900, folded in here). - Keep env.csv and trajectory.csv 1:1 step-aligned: a single TrajectoryEntry sinks both files coherently, including on constraint failure. - Reject env_params on non-DSE jobs; reject non-finite / negative weights. - Add cache-hit + declared-env_params integration coverage. Folds the test-only PR NVIDIA#900 (cache-key TDD) into this PR so the stack has no permanently-red standalone PR.
Issue
CloudAIGymEnvkeys the trajectory cache onactiononly, so the same action under a differentenv_paramssample returns a stale reward — the domain-randomization signal is silently lost.Fix (test-only, TDD-red)
env_params: 2 fail today (exposing the bug), 2 pass as controls. The production fix lands in fix(configurator): make env_params first-class to fix the trajectory cache key #901.Testing
test_cloudaigym.pygreen (30 passed). ruff + pyright clean.Stack:
main← #893 ← #900 ← #901. TDD-red — do not merge alone; #901 turns the failing tests green. (The GymnasiumAdapter previously bundled here moved to its own stacked PR.)