feat(cli): route live_rl_mode jobs through the DSE handler#934
feat(cli): route live_rl_mode jobs through the DSE handler#934rutayan-nv wants to merge 30 commits into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds live-RL / online execution mode backed by a ChangesLive-RL and Env Params for CloudAI DSE
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/cloudai/configurator/cloudai_gym.py (2)
386-390:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersist
env_paramswith trajectory entries.
TrajectoryEntry.env_paramsis populated on cache hits and misses, but the CSV artifact drops it. Persisting the field keeps the on-disk trajectory contract consistent with the in-memory cache identity.Proposed fix
writer = csv.writer(file) if not file_exists: - writer.writerow(["step", "action", "reward", "observation"]) - writer.writerow([entry.step, entry.action, entry.reward, entry.observation]) + writer.writerow(["step", "action", "reward", "observation", "env_params"]) + writer.writerow([entry.step, entry.action, entry.reward, entry.observation, entry.env_params])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 386 - 390, The CSV writer is currently persisting only the step, action, reward, and observation fields from the TrajectoryEntry object, but the env_params field is being populated in memory and should also be written to disk. Add "env_params" to the CSV header row in the writerow call when file_exists is False, and add entry.env_params to the data row writerow call to ensure the env_params field is persisted alongside the other trajectory fields in the CSV artifact.
260-262:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep
env.csvandtrajectory.csvaligned on constraint failures.
before_step()has already writtenenv.csvby Line 260, but this early return skipswrite_trajectory()andafter_step(). That leaves anenv.csvrow with no matching trajectory row whenever a trial is rejected by constraints.Proposed fix
if not self.test_run.test.constraint_check(self.test_run, self.runner.system): logging.info("Constraint check failed. Skipping step.") - return [-1.0], self.rewards.constraint_failure, True, {} + observation = [-1.0] + reward = self.rewards.constraint_failure + self.write_trajectory( + TrajectoryEntry( + step=self.test_run.step, + action=action, + reward=reward, + observation=observation, + env_params=dict(self.test_run.current_env_params), + ) + ) + for observer in self.observers: + observer.after_step(self.test_run, observation, reward) + return observation, reward, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 260 - 262, The constraint check failure at line 260 in the step method causes an early return that skips both write_trajectory() and after_step() calls, leaving env.csv with an unmatched row. To fix this, before the early return statement in the constraint_check condition, ensure that write_trajectory() is called to write a matching trajectory row, and verify that after_step() is also invoked (or that its necessary cleanup logic executes) so both CSV files remain synchronized when constraint failures occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Line 206: The step method has an incorrect type hint for the action parameter.
The type hint `dict[str, int]` is too restrictive because the gymnasium Dict
action space (created in __init__) yields actions containing both Discrete
spaces with numpy int scalars and Box spaces with numpy float32 arrays. Change
the action parameter type hint in the step method from `dict[str, int]` to
`dict[str, Any]` to properly accommodate all possible action values from the
combined action space.
---
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 386-390: The CSV writer is currently persisting only the step,
action, reward, and observation fields from the TrajectoryEntry object, but the
env_params field is being populated in memory and should also be written to
disk. Add "env_params" to the CSV header row in the writerow call when
file_exists is False, and add entry.env_params to the data row writerow call to
ensure the env_params field is persisted alongside the other trajectory fields
in the CSV artifact.
- Around line 260-262: The constraint check failure at line 260 in the step
method causes an early return that skips both write_trajectory() and
after_step() calls, leaving env.csv with an unmatched row. To fix this, before
the early return statement in the constraint_check condition, ensure that
write_trajectory() is called to write a matching trajectory row, and verify that
after_step() is also invoked (or that its necessary cleanup logic executes) so
both CSV files remain synchronized when constraint failures occur.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 54ac0a89-e79c-4c2e-9208-d0149316c2ff
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (17)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/_core/test_scenario.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/base_agent.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pysrc/cloudai/models/workload.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.pytests/test_test_scenario.py
9e8fa29 to
58acf09
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/configurator/cloudai_gym.py (1)
237-262:⚠️ Potential issue | 🟠 Major | ⚡ Quick winComplete constraint-failed trials through the same trajectory/observer path.
before_step()can sample/persist env params before this branch, but a constraint failure returns without a matching trajectory row orafter_step()callback. It also returns a one-element observation even though offline observation shape now followsagent_metrics. Write the failure trajectory withcurrent_env_params, callafter_step(), and return an observation with the declared length.Proposed fix
if not self.test_run.test.constraint_check(self.test_run, self.runner.system): logging.info("Constraint check failed. Skipping step.") - return [-1.0], self.rewards.constraint_failure, True, {} + observation = [-1.0] * max(len(self.test_run.test.agent_metrics), 1) + reward = self.rewards.constraint_failure + self.write_trajectory( + TrajectoryEntry( + step=self.test_run.step, + action=action, + reward=reward, + observation=observation, + env_params=dict(self.test_run.current_env_params), + ) + ) + for observer in self.observers: + observer.after_step(self.test_run, observation, reward) + return observation, reward, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/configurator/cloudai_gym.py` around lines 237 - 262, The constraint check failure path is incomplete compared to the cached result path above it. When the constraint_check fails in the conditional block, you need to add three things to match the trajectory/observer pattern: (1) write a TrajectoryEntry to the trajectory using write_trajectory() with the current step, action, the constraint_failure reward, an observation with the correct length matching agent_metrics shape, and current_env_params; (2) call after_step() on each observer in self.observers with the appropriate observation and reward; (3) fix the return statement to return an observation with the declared/correct length instead of the hardcoded one-element list [-1.0]. This ensures constraint failures follow the same trajectory logging and observer callback path as successful steps and cached results.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 350-352: The conditional at line 351 allows --single-sbatch to
bypass the agent-driven (live-RL) routing check. Currently, any --single-sbatch
scenario goes directly to handle_non_dse_job even if agent_driven test runs are
present. Modify the condition to exclude live-RL scenarios from the
--single-sbatch path by ensuring that if any(agent_driven) is true (indicating
live-RL), the code does not take the handle_non_dse_job branch regardless of the
--single-sbatch flag. This ensures live-RL jobs always route through the correct
path to execute agent.run() as required by the PR objective.
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 207-210: The issue is that online trajectory steps use
_online_step_count which resets to 0 on every reset() call, causing trajectory
rows to collapse back to step=1 while TestRun.step remains stale, unlike the
offline path which properly uses TestRun.step. In the _online_step() method,
increment TestRun in the same way the offline path does, and write the TestRun
step value to trajectory rows instead of using _online_step_count, ensuring
consistency between online and offline trajectory tracking. Also remove or stop
resetting _online_step_count in the reset() method since it should no longer be
used.
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 63-84: The `GymnasiumAdapter` class currently relies on duck
typing to satisfy the gymnasium interface, but ecosystem tools like Stable
Baselines3 require proper inheritance from `gymnasium.Env` to work correctly.
Add a guarded import at the top of the file that attempts to import
`gymnasium.Env` and falls back to `object` if the import fails (since gymnasium
is an optional dependency), then change the `GymnasiumAdapter` class definition
to inherit from that imported base class instead of just `object`. This
maintains compatibility when gymnasium is not installed while enabling proper
isinstance checks for ecosystem integration.
In `@tests/test_env_params.py`:
- Around line 87-88: The pytest.raises(ValueError) context manager in the
sink.write call is too broad and would pass for any ValueError, not just the
intended validation error for the drop_rate parameter. Add a match= pattern to
the pytest.raises(ValueError) call that specifies the expected error message
text from the drop_rate validation failure. This ensures the test specifically
validates the intended error path rather than catching unrelated ValueErrors.
---
Outside diff comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 237-262: The constraint check failure path is incomplete compared
to the cached result path above it. When the constraint_check fails in the
conditional block, you need to add three things to match the trajectory/observer
pattern: (1) write a TrajectoryEntry to the trajectory using write_trajectory()
with the current step, action, the constraint_failure reward, an observation
with the correct length matching agent_metrics shape, and current_env_params;
(2) call after_step() on each observer in self.observers with the appropriate
observation and reward; (3) fix the return statement to return an observation
with the declared/correct length instead of the hardcoded one-element list
[-1.0]. This ensures constraint failures follow the same trajectory logging and
observer callback path as successful steps and cached results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 9e582d9e-80ee-44c5-b97b-bdca22e4a058
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (15)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/_core/test_scenario.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pysrc/cloudai/models/workload.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.py
5ed62b4 to
d5c55ef
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/cli/handlers.py`:
- Around line 374-376: The mixed-input guard that validates agent-driven and
non-agent-driven test runs are not mixed together must be evaluated before the
args.single_sbatch short-circuit in the handler logic. Currently, the guard is
being bypassed when single_sbatch is true, allowing invalid mixed scenarios to
proceed through handle_non_dse_job. Add a guard at the beginning of the routing
logic (before the args.single_sbatch check) that validates the test_scenario
does not contain mixed agent-driven and plain jobs, and raise an appropriate
error if it does. Apply this same guard pattern to the other affected location
mentioned in the comment to ensure consistent enforcement of the mixed-job
contract across all routing paths.
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 332-334: In the CloudAIGymEnv class logging statement, replace the
f-string formatting with %-style formatting in the logging.info call. Change the
f-string `f"CloudAIGymEnv [online] step {self._online_step_count}"` to use
%-style formatting by passing the message template as the first argument and the
variable as a subsequent argument to logging.info(). This ensures the string is
only evaluated when the logging level actually permits the message to be logged,
improving performance.
In `@tests/test_handlers.py`:
- Around line 513-529: The _run_routing helper function only tests the
single_sbatch=False path, leaving the single_sbatch=True code path untested and
at risk of regression. Modify the _run_routing function to accept a
single_sbatch parameter with a default value, then pass this parameter to the
argparse.Namespace creation instead of the hardcoded single_sbatch=False value.
Additionally, create or extend test cases in the range of lines 531-551 to call
_run_routing with both single_sbatch=True and single_sbatch=False to ensure both
routing paths are properly tested.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: ef8acc19-cbf0-48fb-8ffc-24c4e46da0a0
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
pyproject.tomlsrc/cloudai/_core/action_space.pysrc/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/env_params.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pytests/test_action_space.pytests/test_cloudaigym.pytests/test_env_params.pytests/test_gymnasium_adapter_contract.pytests/test_handlers.py
d5c55ef to
0b28eb9
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/util/lazy_imports.py (1)
2-2:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCopyright header year must be updated to pass CI checks.
The copyright year is "2025" but tests expect "2025-2026". Per repository conventions, files touched in consecutive years should use a hyphen range.
📅 Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/util/lazy_imports.py` at line 2, The copyright header in the file has the year listed as "2025" but the CI tests expect "2025-2026" per repository conventions for files modified across consecutive years. Update the copyright year in the header comment from "2025" to "2025-2026" to match the expected format and pass the CI checks.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/cloudai/util/lazy_imports.py`:
- Line 2: The copyright header in the file has the year listed as "2025" but the
CI tests expect "2025-2026" per repository conventions for files modified across
consecutive years. Update the copyright year in the header comment from "2025"
to "2025-2026" to match the expected format and pass the CI checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: ea876b83-809d-4c4f-86ec-fa3a9f8c427e
📒 Files selected for processing (8)
src/cloudai/cli/handlers.pysrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pysrc/cloudai/core.pysrc/cloudai/util/lazy_imports.pytests/test_cloudaigym.pytests/test_handlers.py
0b28eb9 to
b21a3fe
Compare
f8d6141 to
7a1c2ca
Compare
|
Design note (carried from a CodeRabbit finding on In offline DSE this is intentional — a constraint failure returns Flagging for a decision on this PR: should live-RL honor server-emitted |
48871be to
e657665
Compare
Drop the EnvParamsSink Protocol + CsvSink pair (and runtime_checkable) for a single concrete EnvParamsSink, built unconditionally in CloudAIGymEnv. The sink is now stateless: write() takes the record path per call and skips empty samples, so non-DR runs write nothing and write_trajectory needs no branch. Derive both records from a new iteration_dir property and expose the env record via the env_params_record_path property (was _env_csv_path), keeping env.csv and trajectory.csv step-aligned without coupling the name to CSV.
…ty flag Replace the hardcoded `agent == "grid_search"` check with a BaseAgent.samples_env_params capability flag (opt-in, defaults False). Only agents whose search consumes per-trial env_params sampling set it True; enumerating/surrogate agents leave it False, so a config that declares env_params for an agent that would ignore them is rejected up front instead of silently no-op'ing. New agents answer for themselves with no string to maintain. Relocate validate_dse_env_params out of the CLI handlers into configurator/env_params.py next to the logic it guards, looking the agent up via the Registry. Unknown agents are deferred to the dedicated agent-resolution error rather than masked here. Keep all public-facing comments, docstrings, and the error message generic (no internal agent names). Cover the full validator matrix, including the unknown-agent deferral.
Compress multi-line inline comments down to the single non-obvious rationale (or drop them where the code already speaks), per the self-documenting-code principle. Public API docstrings and test intent comments are left intact.
apply_params_set overlays sampled scalar draws onto cmd_args, then reconstructs the TestDefinition to validate the applied action values. That pass re-ran validate_env_params, which rejects a weighted env_param whose cmd_args target is no longer a candidate list - exactly what the overlay produces. env_params is already validated at parse time, so drop it from the validation-only dump. Adds a regression test covering a weighted env_param's scalar draw.
3ac3d1a to
eca78d4
Compare
|
Parking this PR for now. Per a change in release plan, this work isn't needed for the upcoming aiconfigurator release, so I'm moving it to draft to keep it out of the review queue. @podkidyshev — no need to review this one yet; I'll mark it ready again when it's scheduled for release. Thanks! |
eca78d4 to
fba8a72
Compare
An env_params entry only reclassifies a list-valued cmd_args sweep as env-sampled; a scalar is already fixed, so annotating it is a meaningless label. Previously such an annotation was tolerated as a silent no-op, which let it slip through parse-time validation and inconsistently trip (or not) the downstream "no agent will sample them" check depending on run mode. Reject it where the contract lives - TestDefinition.validate_env_params - so the failure is immediate and mode-independent. EnvParams.from_test's non-list guard becomes defensive (parse-time now guarantees lists); the post-overlay path already drops env_params before re-validating, so concrete scalar draws are unaffected. Extract the per-field checks into a helper to keep the validator under the complexity limit, and update tests: scalar annotations now assert rejection instead of no-op tolerance.
…otocol Add ObsLeafDescriptor (a self-describing observation leaf: "box" of width dim, or "discrete" of size n) and a StructuredObservation Protocol that documents the optional env hooks structured_observation_descriptors() and encode_observation(). These let an env expose a named, per-leaf observation so adapters (e.g. GymnasiumAdapter) can build the matching gymnasium spaces.Dict; the hooks are duck-typed, so envs need not subclass. Both exported via cloudai.core.
…ejection tests Negative tests pass an extra kwarg and an out-of-Literal kind to assert ValidationError; mark the deliberate type violations with type: ignore.
Wrap a CloudAI BaseGym as a gymnasium.Env-shaped object: a spaces.Dict of Discrete (list params) and Box (ContinuousSpace) actions over the tunable params with fixed (single-value) params injected each step; observations as either a flat float32 Box or, when the env opts in via the structured-obs hooks, a spaces.Dict of per-leaf ObsLeafDescriptor subspaces. Continuous dtype="int" params are quantized (rounded/clamped) at decode_action so the trajectory cache key collapses float jitter. The adapter is a pure pass-through over test_run.step (never mutates it), so contextual-bandit rollouts that reset() per trial keep a monotonic trial index. gymnasium is an optional dependency lazy-imported behind the new [rl] extra (also added to dev); CloudAIGymEnv.define_observation_space() now returns one slot per agent metric so adapters get the right Box shape. Exported via cloudai.core. Caller-contract tests pin the step-monotonicity, observation pass-through, continuous-quantization, and structured-obs invariants.
…ature step() delegates to decode_action(dict[str, Any]) and exists precisely to round float/continuous policy actions to ints; widen its parameter type from dict[str, int] to dict[str, Any] to match.
…reserve traceback on DSE re-raise - _as_obs_array(): assert encoded keys match descriptors before coercion (reuses _assert_keys, same guard as decode_action/step_raw) and materialize output by descriptor keys to avoid KeyError on extra keys and silent partial observations on missing keys. - handlers.py: re-raise the captured hard-fail with its original traceback. Addresses CodeRabbit findings on NVIDIA#930.
…ngleton Replace the bespoke _import_gymnasium() in-method seam with the canonical lazy.gymnasium / lazy.np properties; addresses the in-method-import review concern. No behavior change — gymnasium stays an optional [rl] extra.
The lazy.gymnasium.spaces refactor gives the adapter precise gymnasium types instead of Any, which surfaced two latent issues the scoped pre-commit run missed: - pyright now sees Space[Any]/Dict in the adapter contract test, so concrete attribute access (.low/.high/.n/.spaces) is flagged. Narrow via local bindings + isinstance before access. - lazy_imports.py now has a 2026 commit in its history, so the ci_only copyright check requires the year range 2025-2026.
Inherit from gymnasium.Env (guarded import, falls back to object when the optional [rl] extra is absent) so ecosystem tooling that performs isinstance checks (e.g. Stable-Baselines3) accepts the adapter. - Use the TYPE_CHECKING import form so pyright sees a concrete base class while runtime keeps the optional-dependency fallback. - Drop ClassVar on metadata to match Env's attribute shape (noqa RUF012). - Rename the inner-env accessor unwrapped -> cloudai_env; gymnasium's Env.unwrapped (returns self) is the correct base-env semantics, and the old override returned a non-Env (BaseGym), which would mislead ecosystem code calling .unwrapped.
Inheriting gymnasium.Env widened the static type of action_space from the concrete spaces.Dict the adapter builds to the base spaces.Space, which has no __getitem__. Cast at the test call site to restore subspace indexing for pyright (runtime is unchanged; action_space is always a Dict).
…rse of decode_action decode_action had no public inverse, so consumers needing value->index encoding (e.g. RLlib warm-start / behavioral cloning) reached into the private _tunable_params dict. When ContinuousSpace support split that internal, those consumers broke with AttributeError. encode_action closes the contract: discrete values map to their candidate index, continuous values wrap into the clamped float32 Box array, so decode_action(encode_action(v)) == v for any native v. Adds round-trip contract tests pinning the invariant and rejection of non-candidate values / key mismatches.
…tinuousSpace The GymnasiumAdapter's continuous-action path depends on ContinuousSpace, which ships separately. Until then nothing constructs a ContinuousSpace, so the continuous branches here are unreachable and the only effect of the import is an ImportError at module load. Drop the continuous import, _continuous_params, the Box action mapping, and decode/encode continuous handling so the adapter builds and ships standalone over discrete + structured-observation support. The continuous support rejoins when ContinuousSpace lands.
Adds an optional online (live-RL) execution mode: instead of launching a workload through the runner, the env delegates each trial to an in-process GymServer. Mode is selected via an injected server or cmd_args.live_rl_mode (server built from cmd_args.env_class); the agent-facing interface is unchanged. Default runner-backed behavior is untouched.
…split
A bare class name (no module) made rsplit('.', 1) return one element and
raise an opaque unpacking ValueError; fail fast with a clear message.
Addresses CodeRabbit finding on NVIDIA#932.
Online (live-RL) steps wrote trajectory rows from a per-instance counter that reset to 0 on every reset(). Under reset-per-episode Gymnasium/RLlib rollouts that collapsed online trajectory.csv rows back to step=1 each episode. Drive online steps from the monotonic test_run.step (incremented per step, never reset) so rows keep increasing across resets. Removes the now-redundant _online_step_count; render() and the trajectory writer both read test_run.step. Adds a regression test asserting steps stay 1,2,3 across a mid-rollout reset.
env_params_record_path was pinned to iteration_dir (runner.scenario_root), but online (live-RL) runs write trajectory.csv under test_run.output_path and have no runner-backed scenario_root. write_trajectory evaluates the record path unconditionally, so an online step hit scenario_root on a runner that does not expose it. Derive the path from trajectory_file_path so env.csv always sits next to trajectory.csv in both modes, with no extra branch in the write path.
An online live-RL run carries no TOML sweep (is_dse_job is False) but must still drive the agent's own run() loop. handle_dry_run_and_run now treats a run as agent-driven when it is a DSE sweep OR cmd_args.live_rl_mode is set, routing it to handle_dse_job instead of the grid-unrolled non-DSE path.
…s a valid config Locks in the contract that an online live-RL run sources its action space from the GymServer, not a TOML param space. A config with only env_params (no list-valued cmd_args/extra_env_vars/num_nodes) is therefore complete and valid: is_dse_job is False and param_space is empty, yet CloudAIGymEnv still builds the action space from the server and honors env_params (EnvParamsObserver is built).
single_sbatch grid-unrolls DSE param spaces in one allocation, but SingleSbatchRunner has no live-RL path, so a live_rl_mode job under --single-sbatch would silently run once as a static job instead of driving the in-process GymServer loop. Hard-error on that combination. This is a narrow stopgap; the broader single_sbatch-vs-agent-driven routing rework (mixed-job guard ordering, per-agent-type semantics) is tracked in NVIDIA#937 and referenced from a TODO at the guard. Extract routing into _dispatch_agent_driven_run to keep handle_dry_run_and_run under the complexity limit.
The live-RL-only test still asserted the removed observer pattern and declared env_params with no candidate list, which the first-class EnvParams model (values live in cmd_args) no longer builds. Give ball_speed a candidate list and assert env.params is built; it stays excluded from the action space, so is_dse_job/param_space are unchanged.
fba8a72 to
523a431
Compare
Issue
An online live-RL run carries no TOML param sweep, so
is_dse_jobis False andhandle_dry_run_and_runsent it down the grid-unrolled non-DSE path — never invoking the agent'srun()loop.Fix
Treat a run as agent-driven when it is a DSE sweep or
cmd_args.live_rl_modeis set, and route those tohandle_dse_job(which delegates toagent.run()). Plain jobs are unchanged; mixing agent-driven and plain jobs still errors.Testing
pytest tests/test_handlers.py— new unit test for_is_dse_or_live_rl+ 3 routing tests (live-RL→DSE handler, plain→non-DSE, mixed→error). ruff/pyright/vulture/import-linter clean.