diff --git a/AGENTS.md b/AGENTS.md
index 0f284f4..38194f4 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -14,7 +14,7 @@ The repository provides:
 - A normalized output contract (`results.json`, `report.md`) for CI and PRs
 - A release evidence contract (`evidence.json`, `evidence.md`) for production promotion reviews
 - A local Cockpit (`agentops cockpit`) that links out to Foundry for runtime
-  observability and surfaces Doctor findings AgentOps owns end-to-end
+  observability and surfaces Doctor findings AgentOps handles end-to-end
 - A Doctor (`agentops doctor`) for readiness, regression, and OpEx checks
 - AI Landing Zone deployment readiness checks that connect official preflight, azd/Bicep workflow deployment, AgentOps eval gates, and private-network runner
diff --git a/README.md b/README.md
index 0264af9..7d8e3df 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@

AgentOps Accelerator

-Evaluate. Ship. Observe. Own.
+Evaluate. Ship. Observe. Operate.
Continuous evaluation, safety testing, observability, and release readiness for Microsoft Foundry agents.

@@ -21,7 +21,7 @@ Continuous evaluation, safety testing, observability, and release readiness for
 License: MIT

-AgentOps Accelerator helps Microsoft Foundry agent teams evaluate quality, prepare releases, monitor behavior, and stay accountable after launch. It gives you a practical starting point for agent operations, with Foundry integration as the default path and deeper setup guidance in the full docs.
+AgentOps Accelerator helps Microsoft Foundry agent teams evaluate quality, prepare releases, monitor behavior, and operate reliably after launch. It gives you a practical starting point for agent operations, with Foundry integration as the default path and deeper setup guidance in the full docs.

 ## Get started

@@ -46,7 +46,7 @@ Use AgentOps Accelerator when you need to:
 - Compare changes across versions
 - Capture release evidence
 - Monitor agent quality and regressions
-- Give teams a repeatable way to own agent behavior in production
+- Give teams a repeatable way to operate agents responsibly in production

 The accelerator keeps the local workflow simple, then points you to the full docs when you are ready to configure pipelines, dashboards, and release
diff --git a/docs/ci-github-actions.md b/docs/ci-github-actions.md
index 0dde1f9..0233db1 100644
--- a/docs/ci-github-actions.md
+++ b/docs/ci-github-actions.md
@@ -201,7 +201,7 @@ prompt.
 ### 4. Choose deployment mode

 AgentOps is azd-first for deployment: AgentOps runs the evaluation gate,
-while Azure Developer CLI owns infrastructure, packaging, deployment, and
+while Azure Developer CLI manages infrastructure, packaging, deployment, and
 hooks declared in `azure.yaml`.

 Before choosing manually, run:
@@ -371,11 +371,11 @@ agentops workflow analyze --format markdown --out agentops-workflow-plan.md

 Use the output as the plan for your coding agent:

-1. AgentOps owns repo-side eval gates, Doctor readiness checks, artifacts, and
+1. AgentOps handles repo-side eval gates, Doctor readiness checks, artifacts, and
    Cockpit visibility.
-2. `azd` owns `provision`, `deploy`, and hooks for app/infra lifecycle when
+2. `azd` manages `provision`, `deploy`, and hooks for app/infra lifecycle when
    `azure.yaml` is present or can be added.
-3. Foundry owns hosted agents, evaluations, traces, and operations.
+3. Foundry manages hosted agents, evaluations, traces, and operations.
 4. Project-specific steps such as indexing data, seeding search, building containers, updating app config, or running private-network post-provision work stay in the accelerator's azd hooks or existing deployment tooling.
@@ -425,8 +425,8 @@ contract to gate deploys:
 | `2` | Eval ran, one or more thresholds failed | ❌ fail (deploy never runs) |
 | `1` | Runtime / config error | ❌ fail |

-For prompt-agent cloud eval, Foundry owns the managed evaluation run and
-AgentOps owns the CI exit code. A threshold failure exits `2`, so the PR/deploy
+For prompt-agent cloud eval, Foundry runs the managed evaluation and
+AgentOps enforces the CI exit code. A threshold failure exits `2`, so the PR/deploy
 gate fails with the failing threshold rows in `report.md`.

 ## Artifacts
diff --git a/docs/how-it-works.md b/docs/how-it-works.md
index 29acc72..def4519 100644
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -14,7 +14,7 @@ is the proof?** It:
 4. Returns CI-friendly exit codes: `0` pass, `2` threshold failure, `1` error.
 5. Writes release evidence with `agentops doctor --evidence-pack`.

-Foundry owns agent creation, deployment, runtime, traces, monitoring,
+Foundry manages agent creation, deployment, runtime, traces, monitoring,
 red-teaming, datasets, and Microsoft-hosted evaluation drilldown. AgentOps references the candidate those tools produced and adds the repo-controlled release proof:
diff --git a/docs/tutorial-end-to-end.md b/docs/tutorial-end-to-end.md
index e9b4bb7..8e3975f 100644
--- a/docs/tutorial-end-to-end.md
+++ b/docs/tutorial-end-to-end.md
@@ -26,7 +26,7 @@ review.
 |---|---|---|---|---|
 | 1 | Define the agent goal and risks | Foundry docs, VS Code, Copilot | Helps define what must be proven before release. | Success criteria and risk list |
 | 2 | Choose Prompt Agent or Hosted Agent | Foundry portal, Foundry Toolkit, team architecture | Later references the target as `name:version` or URL. | Target type decision |
-| 3 | Provision the **sandbox** and **dev** environments (separate Foundry projects for prompt agents; separate endpoints for hosted agents) | Foundry portal, `microsoft-foundry` skill, your platform | No ownership of create/deploy. | Two environments scoped to author and shared dev work |
+| 3 | Provision the **sandbox** and **dev** environments (separate Foundry projects for prompt agents; separate endpoints for hosted agents) | Foundry portal, `microsoft-foundry` skill, your platform | No AgentOps create/deploy role. | Two environments scoped to author and shared dev work |
 | 4 | Author and iterate in **sandbox** | Foundry playground (prompt agents) or local app (hosted agents), `agentops eval run` | Local eval gate before opening a PR. | Working sandbox-validated agent |
 | 5 | Configure release checks | AgentOps CLI and skills | Creates `agentops.yaml` and repo-side release contract. | Release checklist in repo |
 | 6 | Open PR | Generated PR workflow with `--doctor-gate critical` | Routes to the right runner, normalizes proof, and blocks the PR on critical Doctor findings. | PR gate signal |
@@ -554,7 +554,7 @@ PR as evidence. Production deploy workflows always run Doctor with
 `--severity-fail critical` regardless of this flag.

 No tutorial-only Action replacement is needed. The generated workflow keeps the
-evaluation in Foundry while AgentOps owns the CI threshold decision and the
+evaluation in Foundry while AgentOps enforces the CI threshold decision and the
 `results.json` / `report.md` artifacts. The detailed managed-eval view stays in Foundry Evaluations through the link in the AgentOps report.
diff --git a/docs/tutorial-hosted-agent-quickstart.md b/docs/tutorial-hosted-agent-quickstart.md
index 3b0fbd0..76749d0 100644
--- a/docs/tutorial-hosted-agent-quickstart.md
+++ b/docs/tutorial-hosted-agent-quickstart.md
@@ -8,7 +8,7 @@ or cloud-hosted URL (your **dev** environment) for CI.

 This path validates the AgentOps local route in a two-environment arrangement:

-- Foundry or your app platform owns hosting and runtime operations in
+- Foundry or your app platform manages hosting and runtime operations in
   each environment.
 - AgentOps invokes the endpoint from CI, applies repo thresholds, writes normalized `results.json`, runs Doctor with `--severity-fail critical`
@@ -124,7 +124,7 @@ green dev → ready for promotion to qa / prod
 | Open PR | GitHub or Azure DevOps + generated PR workflow | PR workflow runs eval against the **dev URL** and Doctor with `--severity-fail critical`. | PR gate (eval thresholds + critical Doctor findings block merge). |
 | Merge + deploy to dev | Your existing deploy pipeline (Foundry Toolkit, azd, ACA, AKS) + generated dev deploy workflow | Update the dev endpoint with the new commit and re-evaluate. | Deploy-time gate with the same `--severity-fail critical` (always strict on deploy). |
 | Observe runtime | Foundry Operate, Azure Monitor, Application Insights | Confirm traces, latency, errors, and metrics exist. | Checks whether telemetry is wired. |
-| Review readiness | AgentOps Doctor and Cockpit | Check CI, eval, telemetry, evidence, and links. | Primary owner of repo-side release proof. |
+| Review readiness | AgentOps Doctor and Cockpit | Check CI, eval, telemetry, evidence, and links. | Primary repo-side release proof surface. |

 > **Architectural note.** For hosted endpoints the natural regression
 > gate runs at **deploy time** (post-merge), not PR time. The PR
diff --git a/docs/tutorial-prompt-agent.md b/docs/tutorial-prompt-agent.md
index 3324d33..2acbfa8 100644
--- a/docs/tutorial-prompt-agent.md
+++ b/docs/tutorial-prompt-agent.md
@@ -8,10 +8,10 @@ Cockpit.

 This path validates the Foundry-native multi-environment route:

-- Foundry owns the prompt agent runtime, cloud evaluation execution, traces,
+- Foundry manages the prompt agent runtime, cloud evaluation execution, traces,
   Rubric evaluator definitions, traces, Guardrails, red-team scans, and Operate dashboards in **each environment**.
-- AgentOps owns repo-side readiness: source-controlled prompts, CI gates,
+- AgentOps manages repo-side readiness: source-controlled prompts, CI gates,
   Doctor blocking, release evidence, threshold enforcement, ASSERT/ACS evidence references, and Cockpit.

@@ -111,10 +111,10 @@ have a real `foundry-agent.json` artifact to open.

 | Step | Main tool | What you do | AgentOps role |
 |---|---|---|---|
-| Create two Foundry projects | Foundry portal (or `microsoft-foundry` skill) | Create `travel-agent-sandbox` (where you author) and `travel-agent-dev` (left empty — CI seeds it). | No ownership; AgentOps consumes the published baseline from sandbox and bootstraps dev. |
+| Create two Foundry projects | Foundry portal (or `microsoft-foundry` skill) | Create `travel-agent-sandbox` (where you author) and `travel-agent-dev` (left empty — CI seeds it). | No AgentOps create/deploy role; AgentOps consumes the published baseline from sandbox and bootstraps dev. |
 | Author in sandbox | Foundry playground | Iterate on the prompt safely in sandbox Foundry. | Optional spot-check via local `agentops eval run`. |
 | Promote the prompt to git | Editor | Copy validated instructions into `.agentops/prompts/travel-agent.md`. | The CI gate reads this file. |
-| First green PR + dev deploy | GitHub Actions + Foundry dev project | Push prompt, open PR, watch CI auto-bootstrap the first version of `travel-agent` in dev from `prompt_agent_bootstrap` (the dev project is still empty at this point), evaluate it, run Doctor; merge; deploy lands in dev. | Owns the gate, the bootstrap-on-first-deploy, the threshold decision, the Doctor blocking step, the deploy artifact, and the release evidence. |
+| First green PR + dev deploy | GitHub Actions + Foundry dev project | Push prompt, open PR, watch CI auto-bootstrap the first version of `travel-agent` in dev from `prompt_agent_bootstrap` (the dev project is still empty at this point), evaluate it, run Doctor; merge; deploy lands in dev. | Runs the gate, bootstrap-on-first-deploy, threshold decision, Doctor blocking step, deploy artifact, and release evidence. |
 | Force a regression | Editor + GitHub Actions | Edit the prompt to a worse version, push, observe BOTH eval threshold failure AND Doctor regression CRITICAL. | Catches the regression at PR time, not after merge. |
 | Fix and redeploy | Editor + GitHub Actions | Restore prompt, push, PR green, merge, deploy. | Records the recovery. |
 | Review readiness | AgentOps Doctor + Cockpit | Check CI, eval, telemetry, evidence, and links. | Turns scattered signals into release blockers, warnings, evidence files, and next actions. |
@@ -1316,12 +1316,12 @@ agentops doctor --workspace . --evidence-pack
 `evidence.json` and `evidence.md` now include the suite/run id, total cases, violation counts, attack-success-rate, and SHA-256 hashes for both artifacts — without claiming AgentOps invented the verdicts. The verdicts
-come from ASSERT and PyRIT; AgentOps owns orchestration, normalization,
+come from ASSERT and PyRIT; AgentOps handles orchestration, normalization,
 and gating.

 ## 13. Generate the PR + dev deploy workflows

-> **Pipeline ownership.** This tutorial uses `agentops workflow generate`
+> **Pipeline responsibility.** This tutorial uses `agentops workflow generate`
 > because the workflow is the release-readiness contract: it stages the prompt
 > agent, runs eval thresholds, Doctor checks, and writes release evidence. For a
 > full `azd` / AI Landing Zone app, you can also use `azd pipeline config` to
@@ -1396,7 +1396,7 @@ the PR template. The table below summarizes the three values:
 |---|---|
 | `critical` (default) | The PR step fails if Doctor reports any critical findings. Use this to catch regressions that pass thresholds but still drift meaningfully (for example, `groundedness` 5.0 → 4.0). |
 | `warning` | The PR step fails on warnings or critical findings. Tighter; useful for late-stage hardening. |
-| `none` | Doctor runs advisory only. The PR step never fails because of Doctor. Use this only if you have a separate scheduled Doctor pipeline that owns the readiness call. |
+| `none` | Doctor runs advisory only. The PR step never fails because of Doctor. Use this only if you have a separate scheduled Doctor pipeline that makes the readiness call. |

 Deploy templates always run with `--severity-fail critical` regardless of `--doctor-gate`. The gate flag affects the PR template only; deploys are
diff --git a/plugins/agentops/skills/agentops-workflow/SKILL.md b/plugins/agentops/skills/agentops-workflow/SKILL.md
index 90cb635..0214d1a 100644
--- a/plugins/agentops/skills/agentops-workflow/SKILL.md
+++ b/plugins/agentops/skills/agentops-workflow/SKILL.md
@@ -39,7 +39,7 @@ AgentOps reuses **azd** for app/infrastructure deployment when the repo already
 has an azd project, and stays **Foundry-native** for prompt-agent candidate workflows. Do not invent a parallel deployment system. AgentOps should gate quality and record proof; `azd provision`, `azd deploy`, azd hooks, Foundry
-Toolkit, the `microsoft-foundry` skill, and project tooling own lifecycle
+Toolkit, the `microsoft-foundry` skill, and project tooling manage lifecycle
 actions.

 For Foundry prompt-agent configs (`agent: name:version`), the generated eval gate
@@ -452,11 +452,11 @@ needed, put it behind azd's native hook mechanism in `azure.yaml`.
 For Azure AI accelerators copied from templates, use AgentOps to make the landing-zone path actionable:

-1. AgentOps owns eval gates, Doctor, reports, Cockpit readiness, and the
+1. AgentOps handles eval gates, Doctor, reports, Cockpit readiness, and the
    workflow guardrails around deployment.
-2. Foundry owns hosted agents, prompt-agent versions, evaluations, traces,
+2. Foundry manages hosted agents, prompt-agent versions, evaluations, traces,
    monitoring, datasets, and operations.
-3. azd/Bicep/AILZ owns app and infrastructure deploy when `azure.yaml` or
+3. azd/Bicep/AILZ manages app and infrastructure deploy when `azure.yaml` or
    `infra/*.bicep` exists.
 4. Project-specific steps such as indexing, data seeding, model deployment, container build/push, App Config updates, or private-network post-provision
@@ -500,7 +500,7 @@ Prompt-agent workflows:
 This avoids the bad pattern of evaluating one agent version and deploying a different prompt. The invariant is: **evaluated version == deployed version**.
-Foundry manages agent versions; AgentOps owns the repo-side gate and
+Foundry manages agent versions; AgentOps enforces the repo-side gate and
 deployment record. For multi-environment prompt-agent workflows (sandbox → dev → qa → prod), strongly recommend adding the `prompt_agent_bootstrap` block so operators do not have to manually
diff --git a/src/agentops/templates/skills/agentops-workflow/SKILL.md b/src/agentops/templates/skills/agentops-workflow/SKILL.md
index 39cba19..a0110df 100644
--- a/src/agentops/templates/skills/agentops-workflow/SKILL.md
+++ b/src/agentops/templates/skills/agentops-workflow/SKILL.md
@@ -39,7 +39,7 @@ AgentOps reuses **azd** for app/infrastructure deployment when the repo already
 has an azd project, and stays **Foundry-native** for prompt-agent candidate workflows. Do not invent a parallel deployment system. AgentOps should gate quality and record proof; `azd provision`, `azd deploy`, azd hooks, Foundry
-Toolkit, the `microsoft-foundry` skill, and project tooling own lifecycle
+Toolkit, the `microsoft-foundry` skill, and project tooling manage lifecycle
 actions.

 For Foundry prompt-agent configs (`agent: name:version`), the generated eval gate
@@ -453,11 +453,11 @@ needed, put it behind azd's native hook mechanism in `azure.yaml`.
 For Azure AI accelerators copied from templates, use AgentOps to make the landing-zone path actionable:

-1. AgentOps owns eval gates, Doctor, reports, Cockpit readiness, and the
+1. AgentOps handles eval gates, Doctor, reports, Cockpit readiness, and the
    workflow guardrails around deployment.
-2. Foundry owns hosted agents, prompt-agent versions, evaluations, traces,
+2. Foundry manages hosted agents, prompt-agent versions, evaluations, traces,
    monitoring, datasets, and operations.
-3. azd/Bicep/AILZ owns app and infrastructure deploy when `azure.yaml` or
+3. azd/Bicep/AILZ manages app and infrastructure deploy when `azure.yaml` or
    `infra/*.bicep` exists.
 4. Project-specific steps such as indexing, data seeding, model deployment, container build/push, App Config updates, or private-network post-provision
@@ -501,7 +501,7 @@ Prompt-agent workflows:
 This avoids the bad pattern of evaluating one agent version and deploying a different prompt. The invariant is: **evaluated version == deployed version**.
-Foundry manages agent versions; AgentOps owns the repo-side gate and
+Foundry manages agent versions; AgentOps enforces the repo-side gate and
 deployment record. For multi-environment prompt-agent workflows (sandbox → dev → qa → prod), strongly recommend adding the `prompt_agent_bootstrap` block so operators do not have to manually