test(aiconfig): add aiconfigurator SDK contract canary#939
Conversation
runtime/predictor.py adapts aiconfigurator.sdk, whose version is pinned by the workload's PythonEnvironment (aiconfigurator~=0.5.0). A pin bump can rename or drop a SDK symbol the adapter relies on and silently break DSE, with no test to catch it -- the existing aiconfig tests never touch the SDK. This ci_only canary fetches only the pinned source (uv pip install --no-deps; no torch) and AST-audits the exact surface predictor.py consumes: the ModelConfig/RuntimeConfig kwargs, the *QuantMode enums and members, get_database/get_model/get_backend, and InferenceSession.run_static. It catches input-side drift (removed/renamed symbols); return-shape changes are out of scope.
📝 WalkthroughWalkthroughAdds a new CI-only test module Changesaiconfigurator SDK Compatibility Canary
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py`:
- Around line 134-139: The _sdk_src() function is catching all RuntimeError
exceptions and converting them to pytest.skip(), which masks compatibility
issues when the pinned SDK source is broken or package layout changes. This
defeats the purpose of the canary test. Remove or refactor the broad
RuntimeError exception handler to allow actual compatibility errors to fail the
test instead of silently skipping it. Only skip the test for specific, expected
errors if necessary, not for all RuntimeError cases, so that broken pins and
package drift are properly caught rather than hidden by false-green CI results.
- Around line 120-127: The subprocess.run() call for installing the
aiconfigurator package lacks a timeout parameter, which can cause CI to hang
indefinitely if network access stalls. Add a timeout parameter to the
subprocess.run() call to prevent indefinite blocking. Set an appropriate timeout
value (in seconds) that allows sufficient time for the network-dependent
installation to complete while preventing excessive hangs in CI environments.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 5cf80125-f7f3-4523-955c-84879abdc21a
📒 Files selected for processing (1)
tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py
| proc = subprocess.run( | ||
| [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()], | ||
| capture_output=True, | ||
| text=True, | ||
| ) | ||
| if proc.returncode != 0: | ||
| raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}") | ||
|
|
There was a problem hiding this comment.
Add a timeout around the SDK fetch subprocess.
Line 120 starts a network-dependent install with no timeout, so CI can hang until the outer job timeout if uv/index access stalls.
Suggested patch
`@functools.lru_cache`(maxsize=1)
def _fetch_pinned_sdk_src() -> Path:
@@
- proc = subprocess.run(
- [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
- capture_output=True,
- text=True,
- )
+ try:
+ proc = subprocess.run(
+ [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()],
+ capture_output=True,
+ text=True,
+ check=False,
+ timeout=120,
+ )
+ except subprocess.TimeoutExpired as exc:
+ raise RuntimeError("timed out fetching pinned aiconfigurator source via uv") from exc📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| proc = subprocess.run( | |
| [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()], | |
| capture_output=True, | |
| text=True, | |
| ) | |
| if proc.returncode != 0: | |
| raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}") | |
| try: | |
| proc = subprocess.run( | |
| [uv, "pip", "install", "--no-deps", "--target", str(target), _pinned_requirement()], | |
| capture_output=True, | |
| text=True, | |
| check=False, | |
| timeout=120, | |
| ) | |
| except subprocess.TimeoutExpired as exc: | |
| raise RuntimeError("timed out fetching pinned aiconfigurator source via uv") from exc | |
| if proc.returncode != 0: | |
| raise RuntimeError(f"failed to fetch aiconfigurator --no-deps: {proc.stderr.strip()[:400]}") |
🧰 Tools
🪛 Ruff (0.15.18)
[error] 120-120: subprocess call: check for execution of untrusted input
(S603)
[warning] 120-120: subprocess.run without explicit check argument
Add explicit check=False
(PLW1510)
[warning] 126-126: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py` around lines
120 - 127, The subprocess.run() call for installing the aiconfigurator package
lacks a timeout parameter, which can cause CI to hang indefinitely if network
access stalls. Add a timeout parameter to the subprocess.run() call to prevent
indefinite blocking. Set an appropriate timeout value (in seconds) that allows
sufficient time for the network-dependent installation to complete while
preventing excessive hangs in CI environments.
| def _sdk_src() -> Path: | ||
| try: | ||
| return _fetch_pinned_sdk_src() | ||
| except RuntimeError as exc: | ||
| pytest.skip(str(exc)) | ||
|
|
There was a problem hiding this comment.
Don’t skip on all fetch/runtime errors in this CI canary.
Lines 137-138 convert every RuntimeError into pytest.skip, which can silently bypass compatibility checks when the pin is broken or package layout changes. That creates false-green CI for the exact drift this test is meant to catch.
Suggested patch
def _sdk_src() -> Path:
try:
return _fetch_pinned_sdk_src()
except RuntimeError as exc:
- pytest.skip(str(exc))
+ msg = str(exc)
+ if "uv is not available" in msg:
+ pytest.skip(msg)
+ pytest.fail(msg, pytrace=False)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _sdk_src() -> Path: | |
| try: | |
| return _fetch_pinned_sdk_src() | |
| except RuntimeError as exc: | |
| pytest.skip(str(exc)) | |
| def _sdk_src() -> Path: | |
| try: | |
| return _fetch_pinned_sdk_src() | |
| except RuntimeError as exc: | |
| msg = str(exc) | |
| if "uv is not available" in msg: | |
| pytest.skip(msg) | |
| pytest.fail(msg, pytrace=False) | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/workloads/aiconfig/test_aiconfigurator_sdk_contract.py` around lines
134 - 139, The _sdk_src() function is catching all RuntimeError exceptions and
converting them to pytest.skip(), which masks compatibility issues when the
pinned SDK source is broken or package layout changes. This defeats the purpose
of the canary test. Remove or refactor the broad RuntimeError exception handler
to allow actual compatibility errors to fail the test instead of silently
skipping it. Only skip the test for specific, expected errors if necessary, not
for all RuntimeError cases, so that broken pins and package drift are properly
caught rather than hidden by false-green CI results.
|
@rutayan-nv I'm not quite sure about this PR... It crosses the line of a unit test by accessing outside world (fetching the lib). Additionally, one of the PythonEnvironment features is to use a custom python version if needed (used for one of the private workloads) which this unit test doesn't do, it modifies current venv. Essentially it's an integration test now and we don't have any dedicated CI environment for it yet. What I would do is make the python version + aiconfigurator version configurable in the test toml so that users can control it. What do you think? |
What
A lightweight CI canary that verifies
aiconfig'sruntime/predictor.pystays compatible with the pinnedaiconfiguratorSDK.Why
predictor.pyis our adapter ontoaiconfigurator.sdk.*; the version is pinned by the aiconfig workload'sPythonEnvironment(aiconfigurator~=0.5.0). If a pin bump renames or drops a SDK symbol the adapter uses, the adapter breaks at run time and DSE silently stops producing metrics — and nothing catches it today (the existing aiconfig tests mock or never touch the SDK).How
ci_onlytest. Fetches only the pinned source viauv pip install --no-deps(seconds, no torch), then AST-audits it — no imports, so the torch-importing SDK modules still parse.predictor.pyconsumes still exists:ModelConfig/RuntimeConfigkwargs, the five*QuantModeenums + members,get_database/get_model/get_backend, andInferenceSession.run_static.predictor.py.Scope
Catches input-side drift (removed/renamed symbols). Return-shape changes (e.g. a renamed summary-df column) need a real prediction run and are out of scope.
Test
pytest -m ci_only→ 7 passed (~0.2s). Deselected from the hermetic suite;tests/*coverage gate unaffected.Review: @podkidyshev