Skip to content

test(aiconfig): add aiconfigurator SDK contract canary#939

Open
rutayan-nv wants to merge 1 commit into
NVIDIA:mainfrom
rutayan-nv:rpatro/aiconfig-sdk-contract-canary
Open

test(aiconfig): add aiconfigurator SDK contract canary#939
rutayan-nv wants to merge 1 commit into
NVIDIA:mainfrom
rutayan-nv:rpatro/aiconfig-sdk-contract-canary

Conversation

@rutayan-nv

@rutayan-nv rutayan-nv commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What

A lightweight CI canary that verifies aiconfig's runtime/predictor.py stays compatible with the pinned aiconfigurator SDK.

Why

predictor.py is our adapter onto aiconfigurator.sdk.*; the version is pinned by the aiconfig workload's PythonEnvironment (aiconfigurator~=0.5.0). If a pin bump renames or drops a SDK symbol the adapter uses, the adapter breaks at run time and DSE silently stops producing metrics — and nothing catches it today (the existing aiconfig tests mock or never touch the SDK).

How

  • ci_only test. Fetches only the pinned source via uv pip install --no-deps (seconds, no torch), then AST-audits it — no imports, so the torch-importing SDK modules still parse.
  • Asserts the exact surface predictor.py consumes still exists: ModelConfig / RuntimeConfig kwargs, the five *QuantMode enums + members, get_database / get_model / get_backend, and InferenceSession.run_static.
  • The pin is read from the workload (single source of truth); expected symbols are kept in lockstep with predictor.py.

Scope

Catches input-side drift (removed/renamed symbols). Return-shape changes (e.g. a renamed summary-df column) need a real prediction run and are out of scope.

Test

pytest -m ci_only → 7 passed (~0.2s). Deselected from the hermetic suite; tests/* coverage gate unaffected.


Review: @podkidyshev

runtime/predictor.py adapts aiconfigurator.sdk, whose version is pinned by the
workload's PythonEnvironment (aiconfigurator~=0.5.0). A pin bump can rename or
drop a SDK symbol the adapter relies on and silently break DSE, with no test to
catch it -- the existing aiconfig tests never touch the SDK.

This ci_only canary fetches only the pinned source (uv pip install --no-deps;
no torch) and AST-audits the exact surface predictor.py consumes: the
ModelConfig/RuntimeConfig kwargs, the *QuantMode enums and members,
get_database/get_model/get_backend, and InferenceSession.run_static. It catches
input-side drift (removed/renamed symbols); return-shape changes are out of scope.
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a new CI-only test module tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py (217 lines). It fetches the pinned aiconfigurator SDK source via uv pip install --no-deps, AST-parses the downloaded files without importing them, and asserts that all constructor kwargs, quantization enum members, free functions, and InferenceSession methods referenced by runtime/predictor.py are still present.

Changes

aiconfigurator SDK Compatibility Canary

Layer / File(s) Summary
Module docstring, CI marker, and contract constants
tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py
Defines the module docstring, the CI-only pytest marker, and all contract constants: required ModelConfig/RuntimeConfig kwargs, required quantization enum members, expected free-function locations, and required InferenceSession methods.
Pinned SDK requirement resolution and uv-based source fetch
tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py
Implements _pinned_requirement() to resolve the single pinned requirement string from AiconfiguratorTestDefinition, _fetch_pinned_sdk_src() (with lru_cache) to run uv pip install --no-deps into a temp directory and validate the result, and _sdk_src() to skip tests when uv is unavailable or the fetch fails.
AST parsing utilities and contract-check logic
tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py
Implements _parse(), _find_class(), _defines_function(), _accepted_kwargs() (handling explicit __init__ params, annotated fields, and **kwargs), and _enum_members() to extract assignment-based enum identifiers—all without importing the SDK.
Parametrized and direct pytest test cases
tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py
Adds four test functions asserting that config classes exist and accept required kwargs, quantization enums define required members, predictor-dependent free functions exist in their expected modules, and InferenceSession defines run_static.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 A canary of ASTs hops through the code,
No torch imports needed upon this road.
It sniffs each kwarg, each enum, each fn,
And skips with a wiggle if uv's not in.
"The contract still holds!" squeaks the rabbit with cheer,
No SDK surprises shall drift in here. 🌿

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'test(aiconfig): add aiconfigurator SDK contract canary' directly and concisely describes the main change—adding a new test module that acts as a canary to verify SDK contract compatibility.
Description check ✅ Passed The description is clearly related to the changeset, explaining the purpose, approach, and scope of the new canary test that verifies predictor.py compatibility with the pinned aiconfigurator SDK.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`:
- Around line 134-139: The _sdk_src() function is catching all RuntimeError
exceptions and converting them to pytest.skip(), which masks compatibility
issues when the pinned SDK source is broken or package layout changes. This
defeats the purpose of the canary test. Remove or refactor the broad
RuntimeError exception handler to allow actual compatibility errors to fail the
test instead of silently skipping it. Only skip the test for specific, expected
errors if necessary, not for all RuntimeError cases, so that broken pins and
package drift are properly caught rather than hidden by false-green CI results.
- Around line 120-127: The subprocess.run() call for installing the
aiconfigurator package lacks a timeout parameter, which can cause CI to hang
indefinitely if network access stalls. Add a timeout parameter to the
subprocess.run() call to prevent indefinite blocking. Set an appropriate timeout
value (in seconds) that allows sufficient time for the network-dependent
installation to complete while preventing excessive hangs in CI environments.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5cf80125-f7f3-4523-955c-84879abdc21a

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0e8cc and 75a8a99.

📒 Files selected for processing (1)
  • tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py

Comment on lines +120 to +127
proc = subprocess.run(
[uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
capture_output=True,
text=True,
)
if proc.returncode != 0:
raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout around the SDK fetch subprocess.

Line 120 starts a network-dependent install with no timeout, so CI can hang until the outer job timeout if uv/index access stalls.

Suggested patch
 `@functools.lru_cache`(maxsize=1)
 def _fetch_pinned_sdk_src() -> Path:
@@
-    proc = subprocess.run(
-        [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
-        capture_output=True,
-        text=True,
-    )
+    try:
+        proc = subprocess.run(
+            [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
+            capture_output=True,
+            text=True,
+            check=False,
+            timeout=120,
+        )
+    except subprocess.TimeoutExpired as exc:
+        raise RuntimeError("timed out fetching pinned aiconfigurator source via uv") from exc
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
proc = subprocess.run(
[uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
capture_output=True,
text=True,
)
if proc.returncode != 0:
raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}")
try:
proc = subprocess.run(
[uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
capture_output=True,
text=True,
check=False,
timeout=120,
)
except subprocess.TimeoutExpired as exc:
raise RuntimeError("timed out fetching pinned aiconfigurator source via uv") from exc
if proc.returncode != 0:
raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}")
🧰 Tools
🪛 Ruff (0.15.18)

[error] 120-120: subprocess call: check for execution of untrusted input

(S603)


[warning] 120-120: subprocess.run without explicit check argument

Add explicit check=False

(PLW1510)


[warning] 126-126: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py` around lines
120 - 127, The subprocess.run() call for installing the aiconfigurator package
lacks a timeout parameter, which can cause CI to hang indefinitely if network
access stalls. Add a timeout parameter to the subprocess.run() call to prevent
indefinite blocking. Set an appropriate timeout value (in seconds) that allows
sufficient time for the network-dependent installation to complete while
preventing excessive hangs in CI environments.

Comment on lines +134 to +139
def _sdk_src() -> Path:
try:
return _fetch_pinned_sdk_src()
except RuntimeError as exc:
pytest.skip(str(exc))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t skip on all fetch/runtime errors in this CI canary.

Lines 137-138 convert every RuntimeError into pytest.skip, which can silently bypass compatibility checks when the pin is broken or package layout changes. That creates false-green CI for the exact drift this test is meant to catch.

Suggested patch
 def _sdk_src() -> Path:
     try:
         return _fetch_pinned_sdk_src()
     except RuntimeError as exc:
-        pytest.skip(str(exc))
+        msg = str(exc)
+        if "uv is not available" in msg:
+            pytest.skip(msg)
+        pytest.fail(msg, pytrace=False)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _sdk_src() -> Path:
try:
return _fetch_pinned_sdk_src()
except RuntimeError as exc:
pytest.skip(str(exc))
def _sdk_src() -> Path:
try:
return _fetch_pinned_sdk_src()
except RuntimeError as exc:
msg = str(exc)
if "uv is not available" in msg:
pytest.skip(msg)
pytest.fail(msg, pytrace=False)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py` around lines
134 - 139, The _sdk_src() function is catching all RuntimeError exceptions and
converting them to pytest.skip(), which masks compatibility issues when the
pinned SDK source is broken or package layout changes. This defeats the purpose
of the canary test. Remove or refactor the broad RuntimeError exception handler
to allow actual compatibility errors to fail the test instead of silently
skipping it. Only skip the test for specific, expected errors if necessary, not
for all RuntimeError cases, so that broken pins and package drift are properly
caught rather than hidden by false-green CI results.

@podkidyshev

Copy link
Copy Markdown
Contributor

@rutayan-nv I'm not quite sure about this PR... It crosses the line of a unit test by accessing outside world (fetching the lib). Additionally, one of the PythonEnvironment features is to use a custom python version if needed (used for one of the private workloads) which this unit test doesn't do, it modifies current venv. Essentially it's an integration test now and we don't have any dedicated CI environment for it yet.

What I would do is make the python version + aiconfigurator version configurable in the test toml so that users can control it. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants