test(aiconfig): add aiconfigurator SDK contract canary by rutayan-nv · Pull Request #939 · NVIDIA/cloudai

rutayan-nv · 2026-06-22T19:02:06Z

What

A lightweight CI canary that verifies aiconfig's runtime/predictor.py stays compatible with the pinned aiconfigurator SDK.

Why

predictor.py is our adapter onto aiconfigurator.sdk.*; the version is pinned by the aiconfig workload's PythonEnvironment (aiconfigurator~=0.5.0). If a pin bump renames or drops a SDK symbol the adapter uses, the adapter breaks at run time and DSE silently stops producing metrics — and nothing catches it today (the existing aiconfig tests mock or never touch the SDK).

How

ci_only test. Fetches only the pinned source via uv pip install --no-deps (seconds, no torch), then AST-audits it — no imports, so the torch-importing SDK modules still parse.
Asserts the exact surface predictor.py consumes still exists: ModelConfig / RuntimeConfig kwargs, the five *QuantMode enums + members, get_database / get_model / get_backend, and InferenceSession.run_static.
The pin is read from the workload (single source of truth); expected symbols are kept in lockstep with predictor.py.

Scope

Catches input-side drift (removed/renamed symbols). Return-shape changes (e.g. a renamed summary-df column) need a real prediction run and are out of scope.

Test

pytest -m ci_only → 7 passed (~0.2s). Deselected from the hermetic suite; tests/* coverage gate unaffected.

Review: @podkidyshev

runtime/predictor.py adapts aiconfigurator.sdk, whose version is pinned by the workload's PythonEnvironment (aiconfigurator~=0.5.0). A pin bump can rename or drop a SDK symbol the adapter relies on and silently break DSE, with no test to catch it -- the existing aiconfig tests never touch the SDK. This ci_only canary fetches only the pinned source (uv pip install --no-deps; no torch) and AST-audits the exact surface predictor.py consumes: the ModelConfig/RuntimeConfig kwargs, the *QuantMode enums and members, get_database/get_model/get_backend, and InferenceSession.run_static. It catches input-side drift (removed/renamed symbols); return-shape changes are out of scope.

coderabbitai · 2026-06-22T19:02:23Z

📝 Walkthrough

Walkthrough

Adds a new CI-only test module tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py (217 lines). It fetches the pinned aiconfigurator SDK source via uv pip install --no-deps, AST-parses the downloaded files without importing them, and asserts that all constructor kwargs, quantization enum members, free functions, and InferenceSession methods referenced by runtime/predictor.py are still present.

Changes

aiconfigurator SDK Compatibility Canary

Layer / File(s)	Summary
Module docstring, CI marker, and contract constants `tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`	Defines the module docstring, the CI-only pytest marker, and all contract constants: required `ModelConfig`/`RuntimeConfig` kwargs, required quantization enum members, expected free-function locations, and required `InferenceSession` methods.
Pinned SDK requirement resolution and uv-based source fetch `tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`	Implements `_pinned_requirement()` to resolve the single pinned requirement string from `AiconfiguratorTestDefinition`, `_fetch_pinned_sdk_src()` (with `lru_cache`) to run `uv pip install --no-deps` into a temp directory and validate the result, and `_sdk_src()` to skip tests when uv is unavailable or the fetch fails.
AST parsing utilities and contract-check logic `tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`	Implements `_parse()`, `_find_class()`, `_defines_function()`, `_accepted_kwargs()` (handling explicit `__init__` params, annotated fields, and `**kwargs`), and `_enum_members()` to extract assignment-based enum identifiers—all without importing the SDK.
Parametrized and direct pytest test cases `tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`	Adds four test functions asserting that config classes exist and accept required kwargs, quantization enums define required members, predictor-dependent free functions exist in their expected modules, and `InferenceSession` defines `run_static`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 A canary of ASTs hops through the code,
No torch imports needed upon this road.
It sniffs each kwarg, each enum, each fn,
And skips with a wiggle if uv's not in.
"The contract still holds!" squeaks the rabbit with cheer,
No SDK surprises shall drift in here. 🌿

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'test(aiconfig): add aiconfigurator SDK contract canary' directly and concisely describes the main change—adding a new test module that acts as a canary to verify SDK contract compatibility.
Description check	✅ Passed	The description is clearly related to the changeset, explaining the purpose, approach, and scope of the new canary test that verifies predictor.py compatibility with the pinned aiconfigurator SDK.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`:
- Around line 134-139: The _sdk_src() function is catching all RuntimeError
exceptions and converting them to pytest.skip(), which masks compatibility
issues when the pinned SDK source is broken or package layout changes. This
defeats the purpose of the canary test. Remove or refactor the broad
RuntimeError exception handler to allow actual compatibility errors to fail the
test instead of silently skipping it. Only skip the test for specific, expected
errors if necessary, not for all RuntimeError cases, so that broken pins and
package drift are properly caught rather than hidden by false-green CI results.
- Around line 120-127: The subprocess.run() call for installing the
aiconfigurator package lacks a timeout parameter, which can cause CI to hang
indefinitely if network access stalls. Add a timeout parameter to the
subprocess.run() call to prevent indefinite blocking. Set an appropriate timeout
value (in seconds) that allows sufficient time for the network-dependent
installation to complete while preventing excessive hangs in CI environments.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5cf80125-f7f3-4523-955c-84879abdc21a

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and 75a8a99.

📒 Files selected for processing (1)

tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py

coderabbitai · 2026-06-22T19:06:48Z

+    proc = subprocess.run(
+        [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
+        capture_output=True,
+        text=True,
+    )
+    if proc.returncode != 0:
+        raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}")
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout around the SDK fetch subprocess.

Line 120 starts a network-dependent install with no timeout, so CI can hang until the outer job timeout if uv/index access stalls.

Suggested patch

`@functools.lru_cache`(maxsize=1) def _fetch_pinned_sdk_src() -> Path: @@ - proc = subprocess.run( - [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()], - capture_output=True, - text=True, - ) + try: + proc = subprocess.run( + [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()], + capture_output=True, + text=True, + check=False, + timeout=120, + ) + except subprocess.TimeoutExpired as exc: + raise RuntimeError("timed out fetching pinned aiconfigurator source via uv") from exc

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

proc = subprocess.run(

[uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],

capture_output=True,

text=True,

)

if proc.returncode != 0:

raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}")

try:

proc = subprocess.run(

[uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],

capture_output=True,

text=True,

check=False,

timeout=120,

)

except subprocess.TimeoutExpired as exc:

raise RuntimeError("timed out fetching pinned aiconfigurator source via uv") from exc

if proc.returncode != 0:

raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}")

🧰 Tools

🪛 Ruff (0.15.18)

[error] 120-120: subprocess call: check for execution of untrusted input

(S603)

[warning] 120-120: subprocess.run without explicit check argument

Add explicit check=False

(PLW1510)

[warning] 126-126: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py` around lines 120 - 127, The subprocess.run() call for installing the aiconfigurator package lacks a timeout parameter, which can cause CI to hang indefinitely if network access stalls. Add a timeout parameter to the subprocess.run() call to prevent indefinite blocking. Set an appropriate timeout value (in seconds) that allows sufficient time for the network-dependent installation to complete while preventing excessive hangs in CI environments.

coderabbitai · 2026-06-22T19:06:48Z

+def _sdk_src() -> Path:
+    try:
+        return _fetch_pinned_sdk_src()
+    except RuntimeError as exc:
+        pytest.skip(str(exc))
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t skip on all fetch/runtime errors in this CI canary.

Lines 137-138 convert every RuntimeError into pytest.skip, which can silently bypass compatibility checks when the pin is broken or package layout changes. That creates false-green CI for the exact drift this test is meant to catch.

Suggested patch

def _sdk_src() -> Path: try: return _fetch_pinned_sdk_src() except RuntimeError as exc: - pytest.skip(str(exc)) + msg = str(exc) + if "uv is not available" in msg: + pytest.skip(msg) + pytest.fail(msg, pytrace=False)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _sdk_src() -> Path:

try:

return _fetch_pinned_sdk_src()

except RuntimeError as exc:

pytest.skip(str(exc))

def _sdk_src() -> Path:

try:

return _fetch_pinned_sdk_src()

except RuntimeError as exc:

msg = str(exc)

if "uv is not available" in msg:

pytest.skip(msg)

pytest.fail(msg, pytrace=False)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py` around lines 134 - 139, The _sdk_src() function is catching all RuntimeError exceptions and converting them to pytest.skip(), which masks compatibility issues when the pinned SDK source is broken or package layout changes. This defeats the purpose of the canary test. Remove or refactor the broad RuntimeError exception handler to allow actual compatibility errors to fail the test instead of silently skipping it. Only skip the test for specific, expected errors if necessary, not for all RuntimeError cases, so that broken pins and package drift are properly caught rather than hidden by false-green CI results.

podkidyshev · 2026-06-24T09:26:08Z

@rutayan-nv I'm not quite sure about this PR... It crosses the line of a unit test by accessing outside world (fetching the lib). Additionally, one of the PythonEnvironment features is to use a custom python version if needed (used for one of the private workloads) which this unit test doesn't do, it modifies current venv. Essentially it's an integration test now and we don't have any dedicated CI environment for it yet.

What I would do is make the python version + aiconfigurator version configurable in the test toml so that users can control it. What do you think?

rutayan-nv requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners June 22, 2026 19:02

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(aiconfig): add aiconfigurator SDK contract canary#939

test(aiconfig): add aiconfigurator SDK contract canary#939
rutayan-nv wants to merge 1 commit into
NVIDIA:mainfrom
rutayan-nv:rpatro/aiconfig-sdk-contract-canary

rutayan-nv commented Jun 22, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

coderabbitai Bot Jun 22, 2026

Uh oh!

podkidyshev commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

rutayan-nv commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Scope

Test

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

podkidyshev commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rutayan-nv commented Jun 22, 2026 •

edited

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading