[Agents Extension] Add test scenarios using the cli-interactive-tester tool#8524

Open
trangevi wants to merge 19 commits into main from trangevi/test-scenarios

@trangevi trangevi commented Jun 2, 2026

No description provided.

Signed-off-by: trangevi <trangevi@microsoft.com>
@github-actions github-actions Bot added the ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions label Jun 2, 2026
trangevi added 11 commits June 2, 2026 14:14
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
…ll add back later

Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
@trangevi trangevi marked this pull request as ready for review June 5, 2026 21:32
Copilot AI review requested due to automatic review settings June 5, 2026 21:32

Copilot AI left a comment

Pull request overview

Adds a comprehensive, goal-based suite of manual interactive test scenarios for the azure.ai.agents azd extension, designed to be driven via the cli-interactive-tester MCP server. This codifies repeatable end-to-end command flows (from offline help/version checks through Tier 2 cloud provision/deploy/invoke) along with a profile/override mechanism and supporting fixtures.

Changes:

  • Introduces a tiered scenario catalog (00-, 10-, 2x-) with tagging conventions for selective runs and fleet orchestration.
  • Adds shared profile defaults (profile.yaml), a local override template (profile.local.yaml.example), and gitignore rules for local profiles and run artifacts.
  • Adds a minimal “from-code” Python fixture used by scaffold-only init scenarios, and documents scenario usage in both the scenarios README and the extension AGENTS.md.
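
The profile/override mechanism described above can be pictured as a simple overlay: shared defaults come first, and per-user values are layered on top. The following is only a minimal sketch of that idea, not the tester's actual loader. The function name `merge_profiles`, the key names, and the example values are all hypothetical; plain dictionaries stand in for the parsed `profile.yaml` and `profile.local.yaml` files.

```python
from __future__ import annotations

from typing import Any, Mapping


def merge_profiles(shared: Mapping[str, Any], local: Mapping[str, Any]) -> dict[str, Any]:
    """Overlay local values on shared defaults; local wins on conflicts."""
    merged = dict(shared)
    for key, value in local.items():
        # An optional value left unset (None) does not override or add a key.
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical keys and values, for illustration only.
shared_defaults = {"region": "eastus2", "model": "example-model", "suffix": "shared"}
local_overrides = {
    "prefix": "alice",
    "subscription": "00000000-0000-0000-0000-000000000000",
    "tenant": None,  # optional, left unset
}

profile = merge_profiles(shared_defaults, local_overrides)
```

Keeping identifying values (prefix, subscription, tenant) in the local layer is what allows the shared file to be committed while the local one stays gitignored.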

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cli/azd/extensions/azure.ai.agents/AGENTS.md | Documents the existence/intent of the manual cli-interactive-tester scenario suite and how contributors should use it. |
| cli/azd/extensions/azure.ai.agents/cspell.yaml | Adds a new word to prevent false-positive spellcheck failures from scenario docs. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/.gitignore | Ignores local profiles and tester output artifacts. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/README.md | Provides full orchestration guidance (tiers, tags, WSL path rules, auth prerequisites, hooks, and fleet mode). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/profile.yaml | Defines repo-shared default profile values (region/model/shared suffix). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/profile.local.yaml.example | Provides a template for per-user/per-CI identifying values (prefix/subscription/optional tenant). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/fixtures/from-code/app.py | Minimal Python source fixture for “init from existing code” scenarios. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/fixtures/from-code/requirements.txt | Minimal requirements file to ensure Python project detection during init-from-code flows. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-version.yaml | Tier 0 scenario for azd ai agent version. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-help-root.yaml | Tier 0 scenario validating root help output/command discovery. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-sample-list-text.yaml | Tier 0 scenario for sample list text rendering. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-sample-list-json-filters.yaml | Tier 0 scenario for sample list JSON output and filtering flags. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-doctor-empty-dir.yaml | Tier 0 scenario for doctor behavior in an empty directory. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-doctor-local-only.yaml | Tier 0 scenario for doctor --local-only. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-init-validate-mutually-exclusive.yaml | Tier 0 negative-path scenario validating init argument conflicts. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-init-validate-no-prompt-missing.yaml | Tier 0 negative-path scenario validating --no-prompt missing inputs behavior. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/00-init-picker-navigation.yaml | Tier 0 scenario focusing on init picker UX (filtering, navigation, abort behavior). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-template-python.yaml | Tier 1 scenario scaffolding from a Python template (auth required; stops before provision). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-template-dotnet.yaml | Tier 1 scenario scaffolding from a .NET template (auth required; stops before provision). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-from-manifest-url.yaml | Tier 1 scenario scaffolding from a GitHub manifest URL (auth + gh auth prerequisite). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-from-code.yaml | Tier 1 scenario for “use code in current directory” flow using seeded fixture. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-flags-agent-name-model.yaml | Tier 1 scenario validating --agent-name/--model overrides when initializing from a manifest URL. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/10-init-deploy-mode-code.yaml | Tier 1 scenario validating interactive code-deploy mode prompts (entry point/runtime). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/20-setup-deploy-shared-agent.yaml | Tier 2 setup scenario that provisions and deploys a shared agent used by subsequent Tier 2 scenarios. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/21-show.yaml | Tier 2 scenario validating show table output. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/21-show-json.yaml | Tier 2 scenario validating show --output json. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/22-invoke-remote.yaml | Tier 2 scenario validating remote invoke. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/22-invoke-new-session.yaml | Tier 2 scenario validating session vs conversation memory semantics for invoke. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/22-invoke-input-file.yaml | Tier 2 scenario validating invoke -f request-body-from-file behavior. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/23-sessions-lifecycle.yaml | Tier 2 scenario validating the sessions lifecycle command group. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/24-files-lifecycle.yaml | Tier 2 scenario validating the files lifecycle command group. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/25-monitor-console.yaml | Tier 2 scenario validating monitor console logs. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/25-monitor-system.yaml | Tier 2 scenario validating monitor system/container events. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/26-endpoint-update.yaml | Tier 2 scenario validating endpoint update behavior (patching without new version). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/27-run-local-and-invoke-local.yaml | Tier 2 scenario validating run + invoke --local with allocated ports and two sessions. |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/2A-doctor-provisioned-all-pass.yaml | Tier 2 scenario validating doctor against a provisioned project (with a known-acceptable warning). |
| cli/azd/extensions/azure.ai.agents/tests/cli-interactive-tester-scenarios/2Z-teardown-down.yaml | Tier 2 teardown scenario to destroy resources and clean the shared working directory. |
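
The file names above encode the tier in their two-character prefix (`00-` for Tier 0, `10-` for Tier 1, `2x-` for Tier 2). As a rough illustration of how a runner could group scenarios by that prefix, here is a small sketch. The helper names `tier_of` and `group_by_tier` are hypothetical, and ordering scenarios lexically within a tier is an assumption made for this example, not documented behavior of the cli-interactive-tester tool.

```python
from __future__ import annotations

from collections import defaultdict


def tier_of(filename: str) -> int:
    """Derive the tier (0, 1, or 2) from the first character of the prefix."""
    prefix = filename.split("-", 1)[0]
    if len(prefix) != 2 or prefix[0] not in "012":
        raise ValueError(f"unrecognized scenario prefix in {filename!r}")
    return int(prefix[0])


def group_by_tier(filenames: list[str]) -> dict[int, list[str]]:
    """Group scenario files by tier, sorted lexically within each tier."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        groups[tier_of(name)].append(name)
    return dict(groups)


scenarios = [
    "2Z-teardown-down.yaml",
    "00-version.yaml",
    "20-setup-deploy-shared-agent.yaml",
    "10-init-from-code.yaml",
    "2A-doctor-provisioned-all-pass.yaml",
    "21-show.yaml",
]
groups = group_by_tier(scenarios)
```

With lexical ordering, `20-setup-deploy-shared-agent.yaml` sorts first and `2Z-teardown-down.yaml` sorts last within Tier 2, which matches the setup and teardown roles given in the table.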

Comment thread cli/azd/extensions/azure.ai.agents/AGENTS.md Outdated
@trangevi trangevi linked an issue Jun 5, 2026 that may be closed by this pull request
Signed-off-by: trangevi <trangevi@microsoft.com>
github-actions Bot commented Jun 5, 2026

📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @rajeshkamal5050 or @kristenwomack if you'd like to discuss prioritization.

Adds a workflow skill under .github/skills/agent-scenario-tests/ that resolves the current branch's PR, maps changed files to impacted cli-interactive-tester scenario tags, drives the matching scenarios through the tester MCP server, and posts a results comment on the PR. Cost-aware: Tier 2 runs only after explicit user confirmation.
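
The impact mapping and the Tier 2 confirmation gate could look roughly like the sketch below. This is illustrative only: the skill's real mapping rules are not shown on this page, so the path fragments in `IMPACT_RULES` and both function names are hypothetical. Running Tier 0 and Tier 1 without confirmation is also an assumption; the description only states that Tier 2 requires explicit confirmation.

```python
from __future__ import annotations

# Hypothetical path-fragment -> scenario-tag rules (not taken from the skill).
IMPACT_RULES = {
    "cmd/eval": "cmd:eval",
    "cmd/optimize": "cmd:optimize",
}


def impacted_tags(changed_files: list[str]) -> set[str]:
    """Collect the scenario tags whose rule fragment appears in a changed path."""
    tags: set[str] = set()
    for path in changed_files:
        for fragment, tag in IMPACT_RULES.items():
            if fragment in path:
                tags.add(tag)
    return tags


def tiers_to_run(confirmed_tier2: bool) -> list[int]:
    """Include Tier 2 (cloud resources, real cost) only after confirmation."""
    return [0, 1, 2] if confirmed_tier2 else [0, 1]


tags = impacted_tags(["internal/cmd/eval.go", "README.md"])
```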

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
glharper and others added 5 commits June 9, 2026 13:21
…coverage

Add 9 cli-interactive-tester scenarios closing the coverage gaps found in the
PR #8524 review:

- eval (cmd:eval): 00-eval-context-required (offline endpoint-required) and
  28-eval-lifecycle (Tier 2 init/run/list/show against the shared agent).
- optimize (cmd:optimize): 00-optimize-apply-requires-candidate (offline
  required-flag) and 29-optimize-submit-and-cancel (Tier 2, capped iteration).
- invoke: 00-invoke-validate-protocol (offline unsupported-protocol) and
  23-invoke-protocol-invocations (Tier 2 invocations memory semantics).
- init: 00-init-validate-deploy-mode (offline value/required-flag validation)
  and 10-init-deploy-mode-container (Tier 1 container scaffold).
- doctor: 00-doctor-partial-failure (mixed PASS+FAIL, exit 1).

Add cmd:eval and cmd:optimize to the tag taxonomy, update the scenarios README
tier tables, and update the agent-scenario-tests skill impact-mapping (eval and
optimize are now covered Tier 2 commands, no longer listed as gaps).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A local scenario run revealed that 00-init-validate-deploy-mode never exercised
the --deploy-mode validation: in an empty directory with --no-prompt, init fails
earlier with 'template selection requires interactive mode' because
validateCodeDeployInput is only reached after an init method resolves.

Reclassify Tier 0 -> Tier 1, rename to 10-init-validate-deploy-mode.yaml, and
seed the from-code fixture so the from-code method resolves and the bogus
--deploy-mode value is actually rejected ('--deploy-mode must be container or
code'). Reaching the check scaffolds a starter template (network), hence Tier 1.
Note the late-validation UX (template scaffolded before the flag is validated)
as a report_finding. Update the README tier tables accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dings

A full Tier 0+1+2 cli-interactive-tester run against a freshly deployed shared
agent surfaced two scenario-accuracy issues (the CLI itself behaved correctly):

- 28-eval-lifecycle: 'eval init --no-wait' is ASYNC — it submits dataset
  (datagen-*) and evaluator (evaluatorgen-*) generation jobs and writes
  eval.yaml, but does NOT create an eval 'run'. So 'eval list' legitimately
  shows 0 rows right after init and 'eval show' (no id) errors cleanly. Refined
  the header + goals to describe the async semantics and treat an empty list /
  eval-id-required message as expected rather than a failure.

- 29-optimize-submit-and-cancel: the optimize command group is preview-gated per
  subscription. On a non-enrolled subscription both 'optimize' and 'optimize
  list' return a clean 400 SubscriptionNotRegistered (signup: aka.ms/ao/quickstart),
  so the submit->status->cancel lifecycle can't run. Documented the Agent
  Optimizer enrollment prerequisite and added a gating check that accepts the
  clean SubscriptionNotRegistered error as a valid outcome when enrollment is
  absent.
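
The gating check for 29-optimize-submit-and-cancel amounts to a branch on the first command's output. The scenarios are goal-based YAML driven through the tester, so the Python below is only a sketch of the decision logic; the function name `optimize_gate` and its return labels are hypothetical.

```python
def optimize_gate(output: str) -> str:
    """Decide how to proceed after the first 'optimize' call.

    A clean SubscriptionNotRegistered error means the subscription is not
    enrolled in the preview, which the scenario accepts as a valid outcome.
    """
    if "SubscriptionNotRegistered" in output:
        return "accept-not-enrolled"
    return "run-lifecycle"


# Illustrative output strings, not captured from a real run.
not_enrolled = optimize_gate("400 SubscriptionNotRegistered (signup: aka.ms/ao/quickstart)")
enrolled = optimize_gate("optimization job submitted")
```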

Add cspell words: datagen, evaluatorgen, signup. All Tier 0 (13) and Tier 1 (2)
scenarios and the Tier 2 setup/invoke/teardown passed; resources were fully torn
down with azd down --force --purge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 6 new cli-interactive-tester scenarios covering three recently merged commands:

Tier 0 (offline help validation):
- 00-delete-help.yaml: validates azd ai agent delete --help output
- 00-endpoint-show-help.yaml: validates azd ai agent endpoint show --help output
- 00-code-download-help.yaml: validates azd ai agent code download --help output

Tier 2 (cloud E2E, run between 2A-doctor and 2Z-teardown):
- 2B-endpoint-show.yaml: shows endpoint config (table + JSON output)
- 2C-code-download.yaml: negative-path test (container agent returns AgentNotCodeBased)
- 2D-delete.yaml: deletes agent with --force, confirms removal via show

All scenarios tested locally: 6/6 PASS.

Co-authored-by: Jian Wu <wujia@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 4 Tier 1 scenarios that copy fixtures used a hardcoded /mnt/c/Repos/...
fallback path (Travis's machine). Replace with bash :? expansion so that
missing AZD_AGENTS_FIXTURES fails immediately with a clear message instead
of a cryptic 'No such file or directory'.
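
In bash, the `${AZD_AGENTS_FIXTURES:?message}` form aborts with the given message when the variable is unset or empty. For readers more at home in Python, the same fail-fast idea can be sketched as below; `require_env` is a hypothetical helper written for this example and is not part of the scenarios.

```python
from __future__ import annotations

import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] | None = None) -> str:
    """Return the variable's value, failing fast if it is unset or empty."""
    source = os.environ if env is None else env
    value = source.get(name, "")
    if not value:
        # Mirrors the intent of bash's ${VAR:?message}: stop with a clear error.
        raise RuntimeError(f"{name} must be set to the fixtures directory")
    return value


# Example with an explicit mapping instead of the real environment.
fixtures_dir = require_env("AZD_AGENTS_FIXTURES", {"AZD_AGENTS_FIXTURES": "/tmp/fixtures"})
```

Failing at the point of lookup gives a message that names the missing variable, instead of a later copy step failing on a path that does not exist.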

Co-authored-by: Jian Wu <wujia@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
v1212 pushed a commit to v1212/azure-dev that referenced this pull request Jun 11, 2026
MCP-tool-driven workflow for running cli-interactive-tester scenarios.
Manual dispatch only (workflow_dispatch) with tier selection (0, 0+1, 0+1+2).

Key design:
- All scenarios executed via cli-interactive-tester MCP tool (not shell parsing)
- Tool installed via git clone + pip install -e from coreai-microsoft repo
- Checkout hardcoded to trangevi/test-scenarios (until PR Azure#8524 merges)
- ubuntu-22.04 runner (consistent with existing pipelines)
- profile.local.yaml generated from GitHub secrets at runtime
- Tier 2 includes always-run teardown step for resource cleanup
- Results uploaded as artifacts

Blocking: python -m auto_test_tool.runner batch mode needs to be confirmed
or implemented. Without it, scenarios cannot run headlessly in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
v1212 pushed a commit to v1212/azure-dev that referenced this pull request Jun 11, 2026
Copilot-driven pipeline using cli-interactive-tester MCP tool.
Same architecture as local testing:
  Copilot CLI (LLM) ↔ MCP ↔ cli-interactive-tester ↔ tmux ↔ azd CLI

Design:
- workflow_dispatch only (tier selector: 0 / 0+1 / 0+1+2)
- ubuntu-22.04 runner
- cli-interactive-tester installed via git clone + pip install -e
- MCP config generated for Copilot to connect to the tool
- Copilot reads scenario goals and drives terminal autonomously
- Tier 2 has always-run teardown for Azure resource cleanup
- Results uploaded as artifacts (HTML reports + screenshots)

Checkout: hardcoded to trangevi/test-scenarios (until PR Azure#8524 merges)

Blocking: need to confirm how to invoke Copilot CLI headlessly in CI
(copilot --mcp-config --prompt-file, gh copilot run, or Extensions API)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
v1212 pushed a commit to v1212/azure-dev that referenced this pull request Jun 11, 2026
Agentic Workflow (.md source) for Copilot-driven E2E testing.
Uses gh-aw framework — same pattern as extension-pr-labeler.

Architecture:
  gh-aw framework → Copilot CLI (LLM) ↔ MCP ↔ cli-interactive-tester ↔ tmux ↔ azd

Key design:
- gh-aw .md source file (compile with 'gh aw compile' to generate .lock.yml)
- cli-interactive-tester registered as MCP tool in frontmatter
- Copilot reads scenario YAML goals and drives terminal autonomously
- workflow_dispatch with tier selector (0 / 0+1 / 0+1+2)
- Setup: Go build, Python 3.12, tmux, uv, Azure login, test profile
- Checkout: trangevi/test-scenarios (until PR Azure#8524 merges)

TODO:
- Confirm cli-interactive-tester repo visibility (public/private)
- Run 'gh aw compile' to generate .lock.yml
- Configure secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID,
  FOUNDRY_PROJECT_ENDPOINT, GH_TOKEN, COPILOT_GITHUB_TOKEN

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
v1212 pushed a commit to v1212/azure-dev that referenced this pull request Jun 11, 2026
Agentic Workflow for Copilot-driven E2E testing.
Uses gh-aw framework — same pattern as extension-pr-labeler.

Architecture:
  gh-aw framework → Copilot CLI (LLM) ↔ MCP ↔ cli-interactive-tester ↔ tmux ↔ azd

Key design:
- cli-interactive-tester registered as MCP tool in frontmatter
- Copilot reads scenario YAML goals and drives terminal autonomously
- workflow_dispatch with tier selector (0 / 0+1 / 0+1+2)
- Setup: Go build, Python 3.12, tmux, uv, Azure login, test profile
- Checkout: trangevi/test-scenarios (until PR Azure#8524 merges)

TODO:
- Confirm cli-interactive-tester repo visibility (public/private)
- Run 'gh aw compile' to generate .lock.yml
- Configure secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID,
  FOUNDRY_PROJECT_ENDPOINT, GH_TOKEN, COPILOT_GITHUB_TOKEN

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
v1212 pushed a commit to v1212/azure-dev that referenced this pull request Jun 11, 2026
Copilot CLI-driven pipeline using cli-interactive-tester MCP tool.
Same architecture as local testing — Copilot reads scenario goals
and drives terminal via MCP protocol.

Implementation:
- Copilot CLI installed via npm install -g @github/copilot
- Auth via COPILOT_GITHUB_TOKEN (Fine-grained PAT, Copilot Requests perm)
- MCP config in ~/.copilot/mcp-config.json (auto-loaded by Copilot)
- Execution: copilot -p prompt --allow-tool=... --no-ask-user
- workflow_dispatch with tier selector (0 / 0+1 / 0+1+2)
- ubuntu-22.04 runner
- Checkout: trangevi/test-scenarios (until PR Azure#8524 merges)
- Tier 2 has always-run teardown for Azure resource cleanup
- Results uploaded as artifacts

TODO:
- Confirm --allow-tool syntax for MCP-registered tools
- Configure COPILOT_PAT secret (Fine-grained PAT)
- Confirm cli-interactive-tester repo visibility
- Create prompt-ci-run.md in scenarios directory

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@glharper glharper enabled auto-merge (squash) June 11, 2026 14:43

@therealjohn therealjohn left a comment

Approved, but I'm requesting changes to block merging until someone on the AZD team can also take a look. I'll ping them and re-approve once they've reviewed.

Development

Successfully merging this pull request may close these issues.

Initial pass at scenario testing

5 participants