[Agents Extension] Add test scenarios using the cli-interactive-tester tool #8524
trangevi wants to merge 19 commits into
Signed-off-by: trangevi <trangevi@microsoft.com>
…ll add back later Signed-off-by: trangevi <trangevi@microsoft.com>
Pull request overview
Adds a comprehensive, goal-based suite of manual interactive test scenarios for the azure.ai.agents azd extension, designed to be driven via the cli-interactive-tester MCP server. This codifies repeatable end-to-end command flows (from offline help/version checks through Tier 2 cloud provision/deploy/invoke) along with a profile/override mechanism and supporting fixtures.
Changes:
- Introduces a tiered scenario catalog (00-, 10-, 2x-) with tagging conventions for selective runs and fleet orchestration.
- Adds shared profile defaults (profile.yaml), a local override template (profile.local.yaml.example), and gitignore rules for local profiles and run artifacts.
- Adds a minimal "from-code" Python fixture used by scaffold-only init scenarios, and documents scenario usage in both the scenarios README and the extension AGENTS.md.
Reviewed changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| cli/azd/extensions/azure.ai.agents/AGENTS.md | Documents the existence/intent of the manual cli-interactive-tester scenario suite and how contributors should use it. |
| cli/azd/extensions/azure.ai.agents/cspell.yaml | Adds a new word to prevent false-positive spellcheck failures from scenario docs. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/.gitignore | Ignores local profiles and tester output artifacts. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/README.md | Provides full orchestration guidance (tiers, tags, WSL path rules, auth prerequisites, hooks, and fleet mode). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/profile.yaml | Defines repo-shared default profile values (region/model/shared suffix). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/profile.local.yaml.example | Provides a template for per-user/per-CI identifying values (prefix/subscription/optional tenant). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/fixtures/from-code/app.py | Minimal Python source fixture for “init from existing code” scenarios. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/fixtures/from-code/requirements.txt | Minimal requirements file to ensure Python project detection during init-from-code flows. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-version.yaml | Tier 0 scenario for azd ai agent version. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-help-root.yaml | Tier 0 scenario validating root help output/command discovery. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-sample-list-text.yaml | Tier 0 scenario for sample list text rendering. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-sample-list-json-filters.yaml | Tier 0 scenario for sample list JSON output and filtering flags. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-doctor-empty-dir.yaml | Tier 0 scenario for doctor behavior in an empty directory. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-doctor-local-only.yaml | Tier 0 scenario for doctor --local-only. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-init-validate-mutually-exclusive.yaml | Tier 0 negative-path scenario validating init argument conflicts. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-init-validate-no-prompt-missing.yaml | Tier 0 negative-path scenario validating --no-prompt missing inputs behavior. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-init-picker-navigation.yaml | Tier 0 scenario focusing on init picker UX (filtering, navigation, abort behavior). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-template-python.yaml | Tier 1 scenario scaffolding from a Python template (auth required; stops before provision). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-template-dotnet.yaml | Tier 1 scenario scaffolding from a .NET template (auth required; stops before provision). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-from-manifest-url.yaml | Tier 1 scenario scaffolding from a GitHub manifest URL (auth + gh auth prerequisite). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-from-code.yaml | Tier 1 scenario for “use code in current directory” flow using seeded fixture. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-flags-agent-name-model.yaml | Tier 1 scenario validating --agent-name/--model overrides when initializing from a manifest URL. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-deploy-mode-code.yaml | Tier 1 scenario validating interactive code-deploy mode prompts (entry point/runtime). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/20-setup-deploy-shared-agent.yaml | Tier 2 setup scenario that provisions and deploys a shared agent used by subsequent Tier 2 scenarios. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/21-show.yaml | Tier 2 scenario validating show table output. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/21-show-json.yaml | Tier 2 scenario validating show --output json. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/22-invoke-remote.yaml | Tier 2 scenario validating remote invoke. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/22-invoke-new-session.yaml | Tier 2 scenario validating session vs conversation memory semantics for invoke. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/22-invoke-input-file.yaml | Tier 2 scenario validating invoke -f request-body-from-file behavior. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/23-sessions-lifecycle.yaml | Tier 2 scenario validating the sessions lifecycle command group. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/24-files-lifecycle.yaml | Tier 2 scenario validating the files lifecycle command group. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/25-monitor-console.yaml | Tier 2 scenario validating monitor console logs. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/25-monitor-system.yaml | Tier 2 scenario validating monitor system/container events. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/26-endpoint-update.yaml | Tier 2 scenario validating endpoint update behavior (patching without new version). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/27-run-local-and-invoke-local.yaml | Tier 2 scenario validating run + invoke --local with allocated ports and two sessions. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/2A-doctor-provisioned-all-pass.yaml | Tier 2 scenario validating doctor against a provisioned project (with a known-acceptable warning). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/2Z-teardown-down.yaml | Tier 2 teardown scenario to destroy resources and clean the shared working directory. |
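The file-name prefixes in the table above encode the tier of each scenario. As a rough illustration only, the convention could be expressed as follows. The mapping is inferred from the file names listed in this PR, and `tier_of` and `select_scenarios` are hypothetical helper names, not part of the extension or the tester tool.

```python
import re
from typing import Iterable, List, Optional

# Assumed mapping, inferred from the file names in the table:
# "00-" -> Tier 0, "10-" -> Tier 1, "2" + one digit or uppercase letter -> Tier 2.
_TIER_PATTERNS = [
    (re.compile(r"^00-"), 0),
    (re.compile(r"^10-"), 1),
    (re.compile(r"^2[0-9A-Z]-"), 2),
]


def tier_of(filename: str) -> Optional[int]:
    """Return the tier implied by a scenario file name, or None if it is not a scenario."""
    for pattern, tier in _TIER_PATTERNS:
        if pattern.match(filename):
            return tier
    return None


def select_scenarios(filenames: Iterable[str], max_tier: int) -> List[str]:
    """Keep scenarios at or below max_tier, sorted by file name."""
    selected = []
    for name in filenames:
        tier = tier_of(name)
        if tier is not None and tier <= max_tier:
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    names = ["2Z-teardown-down.yaml", "00-version.yaml", "profile.yaml",
             "20-setup-deploy-shared-agent.yaml", "10-init-from-code.yaml"]
    print(select_scenarios(names, max_tier=2))
```

Sorting by name places `20-setup-deploy-shared-agent.yaml` before the other Tier 2 files and `2Z-teardown-down.yaml` last, because digits sort before uppercase letters in ASCII. Whether the real orchestration relies on that ordering is an assumption here.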
📋 Prioritization Note
Thanks for the contribution! The linked issue isn't in the current milestone yet.
Adds a workflow skill under .github/skills/agent-scenario-tests/ that resolves the current branch's PR, maps changed files to impacted cli-interactive-tester scenario tags, drives the matching scenarios through the tester MCP server, and posts a results comment on the PR.
Cost-aware: Tier 2 runs only after explicit user confirmation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
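The impact-mapping step described in this commit can be pictured as a small lookup from changed paths to scenario tags. The sketch below is illustrative only: the path patterns are hypothetical, and only the `cmd:eval` and `cmd:optimize` tag names come from this PR. The actual skill is a workflow description, not Python.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical rules: (glob pattern over changed paths, scenario tag).
# The patterns are placeholders; only the tag names appear in the PR.
IMPACT_RULES = [
    ("*/cmd/eval*", "cmd:eval"),
    ("*/cmd/optimize*", "cmd:optimize"),
]


def impacted_tags(changed_files: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated tags whose patterns match any changed file."""
    tags = set()
    for path in changed_files:
        for pattern, tag in IMPACT_RULES:
            if fnmatchcase(path, pattern):
                tags.add(tag)
    return sorted(tags)


if __name__ == "__main__":
    print(impacted_tags(["internal/cmd/eval.go", "README.md"]))
```

A file that matches no rule contributes no tags, so a documentation-only change would select no scenarios under these placeholder rules.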
…coverage
Add 9 cli-interactive-tester scenarios closing the coverage gaps found in the PR #8524 review:
- eval (cmd:eval): 00-eval-context-required (offline endpoint-required) and 28-eval-lifecycle (Tier 2 init/run/list/show against the shared agent).
- optimize (cmd:optimize): 00-optimize-apply-requires-candidate (offline required-flag) and 29-optimize-submit-and-cancel (Tier 2, capped iteration).
- invoke: 00-invoke-validate-protocol (offline unsupported-protocol) and 23-invoke-protocol-invocations (Tier 2 invocations memory semantics).
- init: 00-init-validate-deploy-mode (offline value/required-flag validation) and 10-init-deploy-mode-container (Tier 1 container scaffold).
- doctor: 00-doctor-partial-failure (mixed PASS+FAIL, exit 1).
Add cmd:eval and cmd:optimize to the tag taxonomy, update the scenarios README tier tables, and update the agent-scenario-tests skill impact-mapping (eval and optimize are now covered Tier 2 commands, no longer listed as gaps).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A local scenario run revealed that 00-init-validate-deploy-mode never exercised
the --deploy-mode validation: in an empty directory with --no-prompt, init fails
earlier with 'template selection requires interactive mode' because
validateCodeDeployInput is only reached after an init method resolves.
Reclassify Tier 0 -> Tier 1, rename to 10-init-validate-deploy-mode.yaml, and
seed the from-code fixture so the from-code method resolves and the bogus
--deploy-mode value is actually rejected ('--deploy-mode must be container or
code'). Reaching the check scaffolds a starter template (network), hence Tier 1.
Note the late-validation UX (template scaffolded before the flag is validated)
as a report_finding. Update the README tier tables accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dings
A full Tier 0+1+2 cli-interactive-tester run against a freshly deployed shared agent surfaced two scenario-accuracy issues (the CLI itself behaved correctly):
- 28-eval-lifecycle: 'eval init --no-wait' is ASYNC; it submits dataset (datagen-*) and evaluator (evaluatorgen-*) generation jobs and writes eval.yaml, but does NOT create an eval 'run'. So 'eval list' legitimately shows 0 rows right after init and 'eval show' (no id) errors cleanly. Refined the header + goals to describe the async semantics and treat an empty list / eval-id-required message as expected rather than a failure.
- 29-optimize-submit-and-cancel: the optimize command group is preview-gated per subscription. On a non-enrolled subscription both 'optimize' and 'optimize list' return a clean 400 SubscriptionNotRegistered (signup: aka.ms/ao/quickstart), so the submit->status->cancel lifecycle can't run. Documented the Agent Optimizer enrollment prerequisite and added a gating check that accepts the clean SubscriptionNotRegistered error as a valid outcome when enrollment is absent.
Add cspell words: datagen, evaluatorgen, signup.
All Tier 0 (13) and Tier 1 (2) scenarios and the Tier 2 setup/invoke/teardown passed; resources were fully torn down with azd down --force --purge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
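The gating check for 29-optimize-submit-and-cancel amounts to a three-way decision. The scenarios themselves are goal-based YAML read by the tester, so the function below is only a sketch of that decision logic. The names `optimize_gate` and `enrolled`, and the three outcome labels, are hypothetical.

```python
def optimize_gate(output: str, exit_code: int, enrolled: bool) -> str:
    """Classify the result of a first 'optimize' call.

    Returns one of three hypothetical labels:
      "run-lifecycle": the command worked, continue with submit -> status -> cancel
      "skip-ok":       clean SubscriptionNotRegistered on a non-enrolled subscription
      "fail":          anything else
    """
    not_registered = "SubscriptionNotRegistered" in output
    if exit_code == 0 and not not_registered:
        return "run-lifecycle"
    if not_registered and not enrolled:
        # Per the commit message, this error is an accepted outcome
        # when Agent Optimizer enrollment is absent.
        return "skip-ok"
    return "fail"


if __name__ == "__main__":
    # The output string is a stand-in, not a captured CLI log.
    print(optimize_gate("400 SubscriptionNotRegistered", exit_code=1, enrolled=False))
```

Treating the same error as a failure when the subscription is expected to be enrolled is a design choice of this sketch; it keeps a broken enrollment from being silently accepted.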
Add 6 new cli-interactive-tester scenarios covering three recently merged commands:
Tier 0 (offline help validation):
- 00-delete-help.yaml: validates azd ai agent delete --help output
- 00-endpoint-show-help.yaml: validates azd ai agent endpoint show --help output
- 00-code-download-help.yaml: validates azd ai agent code download --help output
Tier 2 (cloud E2E, run between 2A-doctor and 2Z-teardown):
- 2B-endpoint-show.yaml: shows endpoint config (table + JSON output)
- 2C-code-download.yaml: negative-path test (container agent returns AgentNotCodeBased)
- 2D-delete.yaml: deletes agent with --force, confirms removal via show
All scenarios tested locally: 6/6 PASS.
Co-authored-by: Jian Wu <wujia@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 4 Tier 1 scenarios that copy fixtures used a hardcoded /mnt/c/Repos/... fallback path (Travis's machine). Replace with bash :? expansion so that missing AZD_AGENTS_FIXTURES fails immediately with a clear message instead of a cryptic 'No such file or directory'.
Co-authored-by: Jian Wu <wujia@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
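The scenarios use the bash `${VAR:?message}` form, which aborts when the variable is unset or empty. For helper code written in Python rather than bash, a comparable fail-fast lookup might look like the sketch below. This is an analog of the technique, not what the commit adds; `require_env` and the hint text are hypothetical.

```python
import os


def require_env(name: str, hint: str) -> str:
    """Return the variable's value, or exit with a clear message if it is unset or empty.

    Mirrors the behavior of bash's ${NAME:?hint}, which also rejects empty values.
    """
    value = os.environ.get(name)
    if not value:
        raise SystemExit(f"{name}: {hint}")
    return value


if __name__ == "__main__":
    fixtures = require_env(
        "AZD_AGENTS_FIXTURES",
        "set this to the scenarios' fixtures directory",  # hypothetical wording
    )
    print(fixtures)
```

Failing at lookup time surfaces the missing configuration directly, instead of a later copy step reporting a missing path.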
MCP-tool-driven workflow for running cli-interactive-tester scenarios. Manual dispatch only (workflow_dispatch) with tier selection (0, 0+1, 0+1+2).
Key design:
- All scenarios executed via cli-interactive-tester MCP tool (not shell parsing)
- Tool installed via git clone + pip install -e from coreai-microsoft repo
- Checkout hardcoded to trangevi/test-scenarios (until PR Azure#8524 merges)
- ubuntu-22.04 runner (consistent with existing pipelines)
- profile.local.yaml generated from GitHub secrets at runtime
- Tier 2 includes always-run teardown step for resource cleanup
- Results uploaded as artifacts
Blocking: python -m auto_test_tool.runner batch mode needs to be confirmed or implemented. Without it, scenarios cannot run headlessly in CI.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
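Generating profile.local.yaml from secrets at runtime could be done with a few lines of scripting. The sketch below assumes the secrets are exposed to the job as environment variables. The YAML key names (`prefix`, `subscription`, `tenant`) and the `SCENARIO_PREFIX` variable are hypothetical, chosen to match the "prefix/subscription/optional tenant" description of the template; the real file may use different keys.

```python
import json
from typing import Mapping


def render_local_profile(env: Mapping[str, str]) -> str:
    """Render a minimal profile.local.yaml body from environment-style values.

    Key names are placeholders. json.dumps is used for quoting because a
    JSON string is also a valid double-quoted YAML scalar.
    """
    lines = [
        f"prefix: {json.dumps(env['SCENARIO_PREFIX'])}",
        f"subscription: {json.dumps(env['AZURE_SUBSCRIPTION_ID'])}",
    ]
    tenant = env.get("AZURE_TENANT_ID")
    if tenant:  # the tenant is described as optional
        lines.append(f"tenant: {json.dumps(tenant)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import os
    print(render_local_profile(os.environ), end="")
```

Writing the file inside the job keeps identifying values out of the repository, which matches the gitignore rule for local profiles.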
Copilot-driven pipeline using cli-interactive-tester MCP tool. Same architecture as local testing:
Copilot CLI (LLM) ↔ MCP ↔ cli-interactive-tester ↔ tmux ↔ azd CLI
Design:
- workflow_dispatch only (tier selector: 0 / 0+1 / 0+1+2)
- ubuntu-22.04 runner
- cli-interactive-tester installed via git clone + pip install -e
- MCP config generated for Copilot to connect to the tool
- Copilot reads scenario goals and drives terminal autonomously
- Tier 2 has always-run teardown for Azure resource cleanup
- Results uploaded as artifacts (HTML reports + screenshots)
Checkout: hardcoded to trangevi/test-scenarios (until PR Azure#8524 merges)
Blocking: need to confirm how to invoke Copilot CLI headlessly in CI (copilot --mcp-config --prompt-file, gh copilot run, or Extensions API)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agentic Workflow (.md source) for Copilot-driven E2E testing. Uses gh-aw framework, same pattern as extension-pr-labeler.
Architecture: gh-aw framework → Copilot CLI (LLM) ↔ MCP ↔ cli-interactive-tester ↔ tmux ↔ azd
Key design:
- gh-aw .md source file (compile with 'gh aw compile' to generate .lock.yml)
- cli-interactive-tester registered as MCP tool in frontmatter
- Copilot reads scenario YAML goals and drives terminal autonomously
- workflow_dispatch with tier selector (0 / 0+1 / 0+1+2)
- Setup: Go build, Python 3.12, tmux, uv, Azure login, test profile
- Checkout: trangevi/test-scenarios (until PR Azure#8524 merges)
TODO:
- Confirm cli-interactive-tester repo visibility (public/private)
- Run 'gh aw compile' to generate .lock.yml
- Configure secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, FOUNDRY_PROJECT_ENDPOINT, GH_TOKEN, COPILOT_GITHUB_TOKEN
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agentic Workflow for Copilot-driven E2E testing. Uses gh-aw framework, same pattern as extension-pr-labeler.
Architecture: gh-aw framework → Copilot CLI (LLM) ↔ MCP ↔ cli-interactive-tester ↔ tmux ↔ azd
Key design:
- cli-interactive-tester registered as MCP tool in frontmatter
- Copilot reads scenario YAML goals and drives terminal autonomously
- workflow_dispatch with tier selector (0 / 0+1 / 0+1+2)
- Setup: Go build, Python 3.12, tmux, uv, Azure login, test profile
- Checkout: trangevi/test-scenarios (until PR Azure#8524 merges)
TODO:
- Confirm cli-interactive-tester repo visibility (public/private)
- Run 'gh aw compile' to generate .lock.yml
- Configure secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, FOUNDRY_PROJECT_ENDPOINT, GH_TOKEN, COPILOT_GITHUB_TOKEN
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot CLI-driven pipeline using cli-interactive-tester MCP tool. Same architecture as local testing: Copilot reads scenario goals and drives terminal via MCP protocol.
Implementation:
- Copilot CLI installed via npm install -g @github/copilot
- Auth via COPILOT_GITHUB_TOKEN (Fine-grained PAT, Copilot Requests perm)
- MCP config in ~/.copilot/mcp-config.json (auto-loaded by Copilot)
- Execution: copilot -p prompt --allow-tool=... --no-ask-user
- workflow_dispatch with tier selector (0 / 0+1 / 0+1+2)
- ubuntu-22.04 runner
- Checkout: trangevi/test-scenarios (until PR Azure#8524 merges)
- Tier 2 has always-run teardown for Azure resource cleanup
- Results uploaded as artifacts
TODO:
- Confirm --allow-tool syntax for MCP-registered tools
- Configure COPILOT_PAT secret (Fine-grained PAT)
- Confirm cli-interactive-tester repo visibility
- Create prompt-ci-run.md in scenarios directory
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
therealjohn left a comment
Approved, but I'm requesting changes to block the merge until someone on the AZD team can also take a look. I will ping them and re-approve once they have reviewed.