2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
"version": "0.4.1",
"version": "0.6.0",
"keywords": [
"agentops",
"evaluation",
2 changes: 1 addition & 1 deletion .github/plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
"version": "0.4.1",
"version": "0.6.0",
"keywords": [
"agentops",
"evaluation",
2 changes: 1 addition & 1 deletion .github/workflows/_build.yml
@@ -39,7 +39,7 @@ jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
ref: ${{ inputs.checkout_ref || github.ref }}
fetch-depth: 0 # Full history required for setuptools-scm
2 changes: 1 addition & 1 deletion .github/workflows/agentops-watchdog.yml
@@ -44,7 +44,7 @@ jobs:
timeout-minutes: 30
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7

- name: Azure login (OIDC)
uses: azure/login@v3
12 changes: 6 additions & 6 deletions .github/workflows/ci.yml
@@ -38,7 +38,7 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Install uv
uses: astral-sh/setup-uv@v7
@@ -66,7 +66,7 @@ jobs:
os: [ubuntu-latest, windows-latest]
python-version: ["3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Install uv
uses: astral-sh/setup-uv@v7
@@ -93,7 +93,7 @@ jobs:
runs-on: ubuntu-latest
needs: test
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Install uv
uses: astral-sh/setup-uv@v7
@@ -125,7 +125,7 @@ jobs:
permissions:
id-token: write # Required for PyPI Trusted Publishing (OIDC)
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
fetch-depth: 0 # Full history for setuptools-scm

@@ -162,7 +162,7 @@ jobs:
needs: publish-dev
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
fetch-depth: 0

@@ -215,7 +215,7 @@ jobs:
build-vsix:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

# CI uses the committed package.json version as-is (no publish, dry-run only).
# The version in package.json is synced by cut-release.yml when a release branch is created.
2 changes: 1 addition & 1 deletion .github/workflows/cut-release.yml
@@ -48,7 +48,7 @@ jobs:
echo "version=$VERSION" >> "$GITHUB_ENV"

- name: Checkout develop
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
ref: develop
fetch-depth: 0
18 changes: 9 additions & 9 deletions .github/workflows/e2e.yml
@@ -36,7 +36,7 @@ jobs:
offline-smoke:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Set up Python
uses: actions/setup-python@v6
@@ -69,7 +69,7 @@ jobs:
unit-tests-with-coverage:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Set up Python
uses: actions/setup-python@v6
@@ -127,7 +127,7 @@ jobs:
hosted_agent_name: ${{ steps.create_hosted_agent.outputs.agent_name }}
suffix: ${{ steps.suffix.outputs.value }}
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- id: suffix
name: Compute per-run suffix
@@ -256,7 +256,7 @@ jobs:
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- uses: actions/setup-python@v6
with:
python-version: "3.12"
@@ -307,7 +307,7 @@ jobs:
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- uses: actions/setup-python@v6
with:
python-version: "3.12"
@@ -359,7 +359,7 @@ jobs:
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- uses: actions/setup-python@v6
with:
python-version: "3.12"
@@ -409,7 +409,7 @@ jobs:
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- uses: actions/setup-python@v6
with:
python-version: "3.12"
@@ -464,7 +464,7 @@ jobs:
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Azure login (OIDC)
uses: ./.github/actions/azure-oidc-login
with:
@@ -529,7 +529,7 @@ jobs:
if: always()
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Download all artifacts
uses: actions/download-artifact@v8
with:
6 changes: 3 additions & 3 deletions .github/workflows/release.yml
@@ -98,7 +98,7 @@ jobs:
needs: publish-testpypi
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
fetch-depth: 0

@@ -185,7 +185,7 @@ jobs:
env:
VSIX_FILE: agentops-skills.vsix
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
fetch-depth: 0 # Full history for version derivation

@@ -255,7 +255,7 @@ jobs:
permissions:
contents: write
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Download Python dist artifacts
uses: actions/download-artifact@v8
4 changes: 2 additions & 2 deletions .github/workflows/staging.yml
@@ -79,7 +79,7 @@ jobs:
needs: publish-testpypi
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
fetch-depth: 0

@@ -137,7 +137,7 @@ jobs:
runs-on: ubuntu-latest
environment: staging
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7

- name: Sync VSIX version from branch name
run: |
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,17 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres

## [Unreleased]

## [0.6.0] - 2026-06-26

### Added
- **Retrieval telemetry can now be imported as evaluation datasets.** The new
`telemetry_imports` config contract and `agentops telemetry validate`,
`agentops telemetry preview`, and `agentops telemetry import` commands let
teams turn reviewed retrieval telemetry into dataset-backed eval rows with
`response_source: dataset`. Grey-box HTTP agents can map `response_fields` from
`$response.context`, and the evaluation docs now cover the import workflow and
contract.

### Changed
- **Prompt-agent PR validation now uses sandbox instead of dev.** Generated
GitHub and Azure DevOps PR workflows stage prompt-agent candidates in the
@@ -178,6 +189,25 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
tutorial are updated to describe the new contract.
([#214](https://github.com/Azure/agentops/issues/214))

### Fixed
- **Clean installs now include the pager dependency used by explain commands.**
`agentops explain`, `agentops init explain`, and `agentops doctor explain`
import Click directly to render long manual output, so `click>=8.1,<9` is now
declared as a runtime dependency instead of relying on transitive installs.

- **`agentops eval init` now works with both old and new `azure.ai.agents` azd
extensions.** Version 0.1.40 of the extension renamed the eval subcommand from
`azd ai agent eval init` to `azd ai agent eval generate`, which made
`agentops eval init` hard-fail with `Command "init" is deprecated, use 'azd ai
agent eval generate' instead`. AgentOps now invokes `generate` first and
transparently falls back to the legacy `init` subcommand when an older
extension does not recognise `generate`. The fallback only triggers on
subcommand-name/deprecation errors; genuine failures (authentication, project
endpoint, timeouts) are still surfaced immediately and unchanged. All
previously passed flags (`--project-endpoint`, `--agent`,
`--gen-instruction-file`, `--eval-model`, `--dataset`, `--evaluator`) and the
recipe discovery/persistence behaviour are preserved.
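
The fallback described in this entry can be sketched roughly as follows. This is a minimal illustration, not the AgentOps implementation: the function name `run_eval_recipe`, the injectable `runner`, and the `FALLBACK_MARKERS` strings are all hypothetical, and the real error text that AgentOps matches on is not shown in the changelog.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical markers for "the subcommand name is the problem".
# The strings AgentOps actually checks are an assumption here.
FALLBACK_MARKERS = ("unknown command", "deprecated")

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _default_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture output so stderr can be inspected before deciding to fall back.
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


def run_eval_recipe(flags: Sequence[str], runner: Runner = _default_runner) -> subprocess.CompletedProcess:
    """Try `generate` first; retry with `init` only on subcommand-name errors."""
    base = ["azd", "ai", "agent", "eval"]
    result = runner([*base, "generate", *flags])
    if result.returncode == 0:
        return result
    stderr = (result.stderr or "").lower()
    if any(marker in stderr for marker in FALLBACK_MARKERS):
        # Older extension: same flags, legacy subcommand.
        return runner([*base, "init", *flags])
    # Genuine failures (auth, endpoint, timeout) are returned unchanged.
    return result
```

Passing the runner in as a parameter keeps the decision logic testable without azd installed; a caller would normally rely on the default.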

## [0.4.0] - 2026-06-14

### Added
100 changes: 99 additions & 1 deletion README.md
@@ -55,12 +55,110 @@ practices.
## Learn more

For setup guides, tutorials, architecture, CI/CD guidance, Doctor checks, and
evaluator reference, see:
evaluator reference, start with the documentation site:

<p align="center">
<a href="https://aka.ms/agentops-accelerator"><b>https://aka.ms/agentops-accelerator</b></a>
</p>

## Run a first evaluation

```powershell
az login
$env:AZURE_AI_FOUNDRY_PROJECT_ENDPOINT = "https://<resource>.services.ai.azure.com/api/projects/<project>"
$env:AZURE_OPENAI_ENDPOINT = "https://<openai-resource>.openai.azure.com"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o-mini"
agentops eval analyze
agentops eval run
agentops doctor --evidence-pack
```

For Foundry targets, use either `project_endpoint:` in `agentops.yaml` or
`AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`. Config wins when both are set.
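
As a rough sketch of that precedence rule (not the AgentOps source), the lookup could look like the function below. The name `resolve_project_endpoint` is hypothetical, and treating `project_endpoint` as a top-level key of the parsed `agentops.yaml` is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"


def resolve_project_endpoint(
    config: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the config value when set, otherwise the environment variable."""
    env = os.environ if env is None else env
    configured = config.get("project_endpoint")  # assumed top-level key
    if isinstance(configured, str) and configured.strip():
        return configured.strip()
    # Treat a missing or empty environment variable as "not set".
    return env.get(ENV_VAR) or None
```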

Outputs land in `.agentops/results/latest/`:

- `results.json` - machine-readable (versioned, stable schema)
- `report.md` - human-readable, PR-friendly

Release evidence lands in `.agentops/release/latest/`:

- `evidence.json` - machine-readable production-readiness projection
- `evidence.md` - PR/release summary

Capture the first successful run as a baseline:

```powershell
New-Item -ItemType Directory -Force .agentops\baseline | Out-Null
Copy-Item .agentops\results\latest\results.json .agentops\baseline\results.json
```

To see a visible comparison, publish a new agent version whose prompt
paraphrases exact-answer requests instead of copying them, update
`agentops.yaml` to that new `name:version`, and compare against the
baseline:

```powershell
agentops eval run --baseline .agentops/baseline/results.json
```

The report grows a `Comparison vs Baseline` section with per-metric deltas.
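
To make "per-metric deltas" concrete, here is a small sketch of the arithmetic, assuming each run has already been reduced to a mapping of metric name to score. The function `metric_deltas` and the flat mapping shape are illustrative assumptions; the actual `results.json` schema is not shown here and may differ.

```python
from typing import Dict, Mapping


def metric_deltas(
    baseline: Mapping[str, float],
    current: Mapping[str, float],
) -> Dict[str, float]:
    """Return current minus baseline for every metric present in both runs."""
    shared = sorted(set(baseline) & set(current))
    # Round to keep floating-point noise out of the reported delta.
    return {name: round(current[name] - baseline[name], 4) for name in shared}
```

A positive delta means the current run scored higher than the baseline on that metric; metrics that appear in only one run are skipped in this sketch.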

---

## Commands

Install optional extras as needed: `[agent]` for Doctor/Cockpit and `[mcp]` for MCP.

- `agentops --version` - show installed version.
- `agentops init` - bootstrap config and seed data.
- `agentops eval analyze` - check eval readiness.
- `agentops eval init` - bootstrap an azd `eval.yaml` recipe and wire `execution: azd`.
- `agentops eval run [--baseline PATH]` - run an evaluation.
- `agentops eval promote-traces --source FILE [--apply]` - promote local trace export files.
- `agentops telemetry validate NAME` - validate an Azure Monitor or Application Insights import.
- `agentops telemetry preview NAME --rows N` - preview telemetry import rows.
- `agentops telemetry import NAME --apply` - write the imported telemetry dataset.
- `agentops report generate` - regenerate `report.md`.
- `agentops workflow analyze` - recommend CI/CD shape.
- `agentops workflow generate` - generate CI/CD workflows.
- `agentops skills install` - install Copilot or Claude skills.
- `agentops mcp serve` - start the MCP server.
- `agentops doctor [--evidence-pack]` - run readiness checks.
- `agentops cockpit` - open the local Cockpit.
- `agentops agent serve` - serve Doctor as a Copilot Extension.

## AgentOps Cockpit

`agentops cockpit` opens a localhost command center for the current workspace.
It combines eval history, Doctor findings, workflow status, and links to the
matching Foundry and Azure Monitor views.

Cockpit sections, in display order:

- **Foundry connection** - project, tenant, agent, App Insights.
- **Foundry launchpad** - links for the agent, project, and telemetry.
- **Observability readiness** - tracing, evals, red team, alerts.
- **AgentOps Doctor** - latest Doctor findings.
- **Eval gate summary** - local and CI gate history.
- **Quality gate summary** - score trends and regressions.
- **Production signal** - App Insights health snapshot.
- **CI/CD Pipelines** - GitHub Actions status.
- **Next actions** - contextual recommendations.

## Documentation

- [Foundry Prompt Agent tutorial](docs/tutorial-prompt-agent.md) - use this when the Foundry target is `agent: name:version`. Walks the sandbox to dev journey with a PR gate.
- [Hosted or HTTP Agent tutorial](docs/tutorial-hosted-agent-quickstart.md) - use this when the target is a Foundry hosted or HTTP endpoint URL. Same sandbox to dev journey for endpoint-based agents.
- [End-to-end tutorial](docs/tutorial-end-to-end.md) - extends either of the above with the full sandbox to dev to qa to prod promotion, Foundry red-team scans, and trace-to-regression promotion.
- [Evaluation paths](docs/evaluation.md) - choose static dataset, grey-box HTTP, or telemetry/trace import.
- [Core concepts](docs/concepts.md)
- [How it works](docs/how-it-works.md)
- [Doctor explained](docs/doctor-explained.md)
- [CI/CD with GitHub Actions](docs/ci-github-actions.md)
- [Built-in evaluator reference](docs/foundry-evaluation-sdk-built-in-evaluators.md)
- [Release process](docs/release-process.md)

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development, testing, and contribution guidance.