diff --git a/.agents/skills/autoreview/SKILL.md b/.agents/skills/autoreview/SKILL.md
index bac660ece..1adcde7e9 100644
--- a/.agents/skills/autoreview/SKILL.md
+++ b/.agents/skills/autoreview/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: autoreview
-description: "Pre-commit/ship code review: Codex default; optional Claude, Pi, Droid, Copilot, or OpenCode."
+description: "Pre-commit/ship code review: Codex default; optional Claude, Pi, Droid, Copilot, Cursor Agent, or OpenCode."
 ---
 
 # Auto Review
@@ -11,7 +11,7 @@ Codex review is the default when no engine is set. It uses `gpt-5.5` by default,
 
 Use when:
 
-- user asks for Codex review / Claude review / Pi review / Droid review / OpenCode review / autoreview / second-model review
+- user asks for Codex review / Claude review / Pi review / Droid review / Cursor Agent review / OpenCode review / autoreview / second-model review
 - after non-trivial code edits, before final/commit/ship
 - reviewing a local branch or PR branch after fixes
 
@@ -197,10 +197,12 @@ For models with slashes or extra colons, prefer keyed form:
 ```bash
 "$AUTOREVIEW" --engine pi --model anthropic/claude-sonnet-4 --thinking high
 "$AUTOREVIEW" --engine opencode --model opencode/north-mini-code-free --thinking high
+"$AUTOREVIEW" --engine cursor-agent --model gpt-5
 "$AUTOREVIEW" --engine droid --model claude-opus-4-8 --thinking low
 "$AUTOREVIEW" --reviewers codex,pi --model codex=gpt-5.5 --model pi=anthropic/claude-sonnet-4
 "$AUTOREVIEW" --reviewers codex,opencode --model codex=gpt-5.5 --model opencode=opencode/north-mini-code-free
 "$AUTOREVIEW" --reviewers codex,droid --model codex=gpt-5.5 --model droid=claude-opus-4-8
+"$AUTOREVIEW" --reviewers codex,cursor-agent --model codex=gpt-5.5 --model cursor-agent=gpt-5
 ```
 
 ## Models and thinking
@@ -214,7 +216,7 @@ Recommended model defaults:
 | **codex** (default) | `gpt-5.5` | OpenAI's current GPT-5.5 alias |
 | **claude** | `claude-fable-5` | Anthropic's most capable widely released Claude model |
 
-CLI flags and environment variables override these defaults. Droid, Copilot, Pi, and OpenCode do not get built-in model defaults here because their provider catalogs are external to the Codex/Claude closeout path and may vary by installation.
+CLI flags and environment variables override these defaults. Droid, Copilot, Cursor Agent, Pi, and OpenCode do not get built-in model defaults here because their provider catalogs are external to the Codex/Claude closeout path and may vary by installation.
 
 | Engine | Model flag | Example model IDs | Thinking flag | Accepted levels |
 |--------|------------|-------------------|---------------|-----------------|
@@ -222,6 +224,7 @@ CLI flags and environment variables override these defaults. Droid, Copilot, Pi,
 | **claude** | `claude --model X` | `claude-fable-5`, `claude-opus-4-8`, `claude-sonnet-4-6`, `claude-haiku-4-5` | `--effort Y` | `low`, `medium`, `high`, `xhigh`, `max` |
 | **droid** | `droid exec --model X` | `claude-opus-4-8`, Factory model IDs | `-r, --reasoning-effort Y` | `off`, `none`, `low`, `medium`, `high` |
 | **copilot** | `copilot --model X` | `gpt-5.2`, Copilot model aliases | not supported | n/a |
+| **cursor-agent** | `cursor-agent --model X` | `gpt-5`, Cursor model aliases | not supported | n/a |
 | **pi** | `pi --model X` | `anthropic/claude-sonnet-4`, `openai/gpt-4o` | `--thinking Y` | `off`, `minimal`, `low`, `medium`, `high`, `xhigh` |
 | **opencode** | `opencode run -m X` | `opencode/north-mini-code-free`, OpenCode provider/model IDs | `--variant Y` | `minimal`, `low`, `medium`, `high`, `max` |
 
@@ -243,6 +246,9 @@ Examples matching current `main` behavior:
 # GitHub Copilot (model only; no thinking knob)
 "$AUTOREVIEW" --engine copilot --model gpt-5.2
 
+# Cursor Agent (model only; no thinking knob)
+"$AUTOREVIEW" --engine cursor-agent --model gpt-5
+
 # Pi with explicit model and thinking level
 "$AUTOREVIEW" --engine pi --model anthropic/claude-sonnet-4 --thinking high --pi-bin pi
 
@@ -263,7 +269,9 @@ CLI flags take precedence over environment variables.
 | `AUTOREVIEW__THINKING` | Per-engine thinking override |
 | `AUTOREVIEW_CLAUDE_FALLBACK_MODEL` | Claude-only fallback chain |
 
-Codex maps thinking to `model_reasoning_effort`. Claude maps thinking to `--effort`. Droid maps thinking to `-r, --reasoning-effort`. Pi maps thinking to `--thinking`. OpenCode maps thinking to `--variant`. Copilot rejects `--thinking`. Only Claude accepts `--fallback-model`; global CLI/env fallback requires at least one Claude reviewer, and engine-specific fallback overrides require that reviewer to be selected. Non-Claude fallback overrides, including `AUTOREVIEW__FALLBACK_MODEL`, fail closed instead of being silently ignored.
+Use uppercase engine IDs with `-` replaced by `_` in environment variables, such as `AUTOREVIEW_CURSOR_AGENT_MODEL=gpt-5`.
+
+Codex maps thinking to `model_reasoning_effort`. Claude maps thinking to `--effort`. Droid maps thinking to `-r, --reasoning-effort`. Pi maps thinking to `--thinking`. OpenCode maps thinking to `--variant`. Copilot and Cursor Agent reject `--thinking`. Only Claude accepts `--fallback-model`; global CLI/env fallback requires at least one Claude reviewer, and engine-specific fallback overrides require that reviewer to be selected. Non-Claude fallback overrides, including `AUTOREVIEW__FALLBACK_MODEL`, fail closed instead of being silently ignored.
 
 ## Review engine isolation
 
@@ -275,8 +283,9 @@ When autoreview runs inside the repository under review, external reviewer CLIs
 | **claude** | `--safe-mode --setting-sources user --strict-mcp-config --disallowedTools mcp__*` plus explicit `--allowedTools` (`--safe-mode` requires Claude Code `v2.1.169+`) | Claude Code [CLI reference](https://code.claude.com/docs/en/cli-reference) |
 | **pi** | `--no-approve --no-session --no-context-files --no-extensions --no-skills --no-prompt-templates --no-themes`, plus read-only tool allowlist | Pi CLI `--help`; requires Pi `v0.79.0+` |
 | **opencode** | `opencode run --dir --pure --format json`, prompt over stdin, neutral subprocess cwd, injected deny-by-default permissions, project config disabled | OpenCode CLI `--help` |
+| **cursor-agent** | `cursor-agent --print --workspace --trust --mode ask --sandbox enabled`, prompt over stdin, neutral subprocess cwd | Cursor Agent `--help` |
 
-Codex `--ignore-user-config` skips config loading for the exec run. Autoreview reconstructs only the documented `cli_auth_credentials_store`, `forced_login_method`, and `forced_chatgpt_workspace_id` settings from `CODEX_HOME/config.toml`, keeping authentication and workspace restrictions usable without forwarding unrelated user configuration. The explicit repo trust override and zero project-doc budget keep reviewed-repo `AGENTS.md` and `.codex/` trust surfaces out of the review prompt. `--ignore-rules` skips user/project execpolicy rules. Claude `--safe-mode` disables project hooks, skills, plugins, MCP servers, and CLAUDE.md while preserving normal authentication, model selection, built-in tools, and permissions; managed settings policy can still apply. `--setting-sources user` avoids project/local settings from the reviewed checkout, and current Claude Code docs note the project-skill blocking behavior was fixed in `v2.1.69`. `--strict-mcp-config` and `--disallowedTools mcp__*` keep MCP unavailable to the review run. `--bare` is not used here because Claude's headless docs say it skips OAuth and keychain reads. Pi `--no-approve` ignores project-local files for one run; the helper requires Pi `v0.79.0+` plus help output that advertises every required isolation flag because older legacy binaries can ignore unknown flags. The current package is `@earendil-works/pi-coding-agent`; deprecated `@mariozechner/pi-coding-agent` `0.73.x` is intentionally rejected. Pi version/help probes and the review command run from neutral temporary directories, not the reviewed repo. Pi `--no-context-files` removes `AGENTS.md`/`CLAUDE.md`, the resource-disable flags keep `.pi` extensions, skills, prompts, and themes out of the run, `--no-session` avoids writing review sessions, and the read-only allowlist omits `bash`, `edit`, and `write`. OpenCode starts from a neutral temporary directory, points at the reviewed repo with `--dir`, disables project config through `OPENCODE_DISABLE_PROJECT_CONFIG=1`, and injects `OPENCODE_CONFIG_CONTENT`; permissions default to deny, allow read/grep/glob, preserve OpenCode's `.env` ask rules, and gate `websearch`/`webfetch` with `--no-web-search`. The injected config also clears command/instruction/plugin arrays and disables write/edit/bash/task/skill/todowrite tools without changing user auth storage. The helper sends the review prompt over stdin rather than argv and extracts the final structured JSON from `type: "text"` events. OpenCode rejects `--no-tools`.
+Codex `--ignore-user-config` skips config loading for the exec run. Autoreview reconstructs only the documented `cli_auth_credentials_store`, `forced_login_method`, and `forced_chatgpt_workspace_id` settings from `CODEX_HOME/config.toml`, keeping authentication and workspace restrictions usable without forwarding unrelated user configuration. The explicit repo trust override and zero project-doc budget keep reviewed-repo `AGENTS.md` and `.codex/` trust surfaces out of the review prompt. `--ignore-rules` skips user/project execpolicy rules. Claude `--safe-mode` disables project hooks, skills, plugins, MCP servers, and CLAUDE.md while preserving normal authentication, model selection, built-in tools, and permissions; managed settings policy can still apply. `--setting-sources user` avoids project/local settings from the reviewed checkout, and current Claude Code docs note the project-skill blocking behavior was fixed in `v2.1.69`. `--strict-mcp-config` and `--disallowedTools mcp__*` keep MCP unavailable to the review run. `--bare` is not used here because Claude's headless docs say it skips OAuth and keychain reads. Pi `--no-approve` ignores project-local files for one run; the helper requires Pi `v0.79.0+` plus help output that advertises every required isolation flag because older legacy binaries can ignore unknown flags. The current package is `@earendil-works/pi-coding-agent`; deprecated `@mariozechner/pi-coding-agent` `0.73.x` is intentionally rejected. Pi version/help probes and the review command run from neutral temporary directories, not the reviewed repo. Pi `--no-context-files` removes `AGENTS.md`/`CLAUDE.md`, the resource-disable flags keep `.pi` extensions, skills, prompts, and themes out of the run, `--no-session` avoids writing review sessions, and the read-only allowlist omits `bash`, `edit`, and `write`. OpenCode starts from a neutral temporary directory, points at the reviewed repo with `--dir`, disables project config through `OPENCODE_DISABLE_PROJECT_CONFIG=1`, and injects `OPENCODE_CONFIG_CONTENT`; permissions default to deny, allow read/grep/glob, preserve OpenCode's `.env` ask rules, and gate `websearch`/`webfetch` with `--no-web-search`. The injected config also clears command/instruction/plugin arrays and disables write/edit/bash/task/skill/todowrite tools without changing user auth storage. Cursor Agent requires a trusted workspace for headless runs, so autoreview trusts only a helper-owned empty temporary workspace and passes the review bundle over stdin; the reviewed repo is present only inside the prompt bundle and is never the trusted Cursor workspace. Cursor Agent rejects `--no-web-search` because the CLI does not expose a CLI-level web-search disable switch. The helper sends the review prompt over stdin rather than argv and extracts the final structured JSON from `type: "text"` or `type: "result"` events. OpenCode and Cursor Agent reject `--no-tools`.
 
 ## Context Efficiency
 
@@ -315,13 +324,13 @@ The helper:
 - otherwise uses current PR base if `gh pr view` works
 - otherwise uses `origin/main` for non-main branches
 - does not fetch automatically during branch review; the selected base ref must already resolve locally
-- supports `--engine codex`, `claude`, `droid`, `copilot`, `pi`, and `opencode`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
+- supports `--engine codex`, `claude`, `droid`, `copilot`, `cursor-agent`, `pi`, and `opencode`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
 - resolves bare `git`, `gh`, reviewer, and PowerShell shell commands from absolute `PATH` entries only, never from the reviewed checkout; explicit relative `--*-bin` paths are resolved from the reviewed repository root
 - use `--mode commit --commit ` for already-committed work, especially clean `main` after landing
 - should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
 - writes only to stdout unless `--output`, `--json-output`, or live streamed engine stderr is set
 - supports `--dry-run`, `--parallel-tests`, `--parallel-tests-shell`, `--prompt`, repo-relative `--prompt-file`, repo-relative `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
-- supports `--stream-engine-output` or `AUTOREVIEW_STREAM_ENGINE_OUTPUT=1` for live engine text while preserving structured validation; Codex and Claude hide tool/file event details, emit compact activity summaries, and report usage at turn completion
+- supports `--stream-engine-output` or `AUTOREVIEW_STREAM_ENGINE_OUTPUT=1` for live engine text while preserving structured validation; Codex, Claude, and Cursor Agent hide tool/file event details, emit compact activity summaries, and report usage at turn completion
 - supports opt-in review panels with `--panel` / `--reviewers`, plus per-engine `--model`, `--thinking`, and Claude `--fallback-model`
 - uses built-in model defaults `codex=gpt-5.5` and `claude=claude-fable-5`; honors `AUTOREVIEW_MODEL`, `AUTOREVIEW_THINKING`, `AUTOREVIEW_FALLBACK_MODEL`, and per-engine `AUTOREVIEW__MODEL` / `AUTOREVIEW__THINKING` environment overrides when CLI flags are omitted
 - allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with auth-only user settings, read-only sandbox, reviewed-repo instruction/config/rule isolation flags, and structured output
@@ -329,6 +338,7 @@ The helper:
 - runs Droid with `droid exec` in read-only mode, forwards `--model` and `-r, --reasoning-effort`, and switches `--output-format` to `stream-json` when streaming is enabled
 - runs Pi `v0.79.0+` from neutral temporary directories with `--no-approve`, `--no-session`, disabled Pi context/resource loading, and built-in read-only tools (`read,grep,find,ls`) when tools are enabled
 - runs OpenCode with `opencode run --dir --pure --format json` from a neutral temporary directory, forwards `--model` and `--variant`, injects deny-by-default permissions, disables project config loading, and passes the review prompt over stdin
+- runs Cursor Agent from a helper-owned temporary workspace with `--trust`, `--mode ask`, and sandboxing enabled, forwards `--model`, passes the review prompt over stdin, and rejects `--no-web-search`
 - prints `review still running: elapsed=s pid=` to stderr at long-running intervals while waiting for the selected review engine, unless streamed output or compact Codex activity has been visible recently
 - prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0
 - exits nonzero when accepted/actionable findings are present
diff --git a/.agents/skills/autoreview/scripts/autoreview b/.agents/skills/autoreview/scripts/autoreview
index bcbf0b8c8..91f938d8b 100755
--- a/.agents/skills/autoreview/scripts/autoreview
+++ b/.agents/skills/autoreview/scripts/autoreview
@@ -19,7 +19,7 @@ from pathlib import Path
 from typing import Any, Callable
 
 
-ENGINES = ("codex", "claude", "droid", "copilot", "pi", "opencode")
+ENGINES = ("codex", "claude", "droid", "copilot", "cursor-agent", "pi", "opencode")
 SAFE_GIT_CONFIG_ARGS = (
     "-c",
     "core.fsmonitor=false",
@@ -92,6 +92,7 @@ THINKING_LEVELS_BY_ENGINE = {
     "claude": {"low", "medium", "high", "xhigh", "max"},
     "droid": {"off", "none", "low", "medium", "high"},
     "copilot": set(),
+    "cursor-agent": set(),
     "pi": {"off", "minimal", "low", "medium", "high", "xhigh"},
     "opencode": {"minimal", "low", "medium", "high", "max"},
 }
@@ -1024,7 +1025,7 @@ def build_prompt(repo: Path, target: str, target_ref: str | None, bundle: str, e
     {json.dumps(SCHEMA, indent=2)}
     - Do not modify files.
     - Do not invoke nested reviewers or review tools.
-    - Forbidden nested review commands include: codex review, autoreview, claude review, oracle review.
+    - Forbidden nested review commands include: codex review, autoreview, claude review, cursor-agent, oracle review.
     - You may use read-only tools and web search to inspect files, dependency contracts, upstream docs, current behavior, and security implications.
     - Shell commands, if available, must be read-only inspection commands. Do not run tests, formatters, package installs, generators, network mutation commands, git mutation commands, or commands that write files.
     - Report only actionable defects introduced or exposed by this change.
@@ -1414,6 +1415,46 @@ def run_copilot(args: argparse.Namespace, repo: Path, prompt: str) -> str:
     return result.stdout
 
 
+def run_cursor_agent(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    if args.thinking:
+        raise SystemExit("--thinking is not supported by the cursor-agent engine")
+    if not args.tools:
+        raise SystemExit("--no-tools is not supported by the cursor-agent engine; use --engine claude --no-tools for a no-tools run")
+    if not args.web_search:
+        raise SystemExit("--no-web-search is not supported by the cursor-agent engine; use an engine with a CLI-level web-search disable switch")
+    with tempfile.TemporaryDirectory(prefix="autoreview-cursor-agent.") as tempdir:
+        # Trust only the helper-owned empty workspace, never the reviewed repo.
+        # Cursor may load trusted project hooks/config before model instructions apply.
+        cmd = [
+            resolve_command(args.cursor_agent_bin, repo),
+            "--print",
+            "--output-format",
+            "stream-json" if args.stream_engine_output else "json",
+            "--trust",
+            "--workspace",
+            tempdir,
+            "--mode",
+            "ask",
+            "--sandbox",
+            "enabled",
+        ]
+        if args.model:
+            cmd.extend(["--model", args.model])
+        result = run_with_heartbeat(
+            cmd,
+            Path(tempdir),
+            input_text=prompt,
+            label="cursor-agent",
+            stream_output=args.stream_engine_output,
+            stream_display=CursorAgentStreamDisplay() if args.stream_engine_output else None,
+            resolve_root=repo,
+            env=safe_engine_env(repo, [Path(cmd[0]).parent]),
+        )
+        if result.returncode != 0:
+            raise SystemExit(f"cursor-agent engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+        return result.stdout
+
+
 def build_opencode_cmd(args: argparse.Namespace, repo: Path) -> list[str]:
     cmd = [
         resolve_command(args.opencode_bin, repo),
@@ -1603,6 +1644,41 @@ class ClaudeStreamDisplay:
         return text
 
 
+class CursorAgentStreamDisplay(ClaudeStreamDisplay):
+    def __call__(self, name: str, line: str) -> str | None:
+        if name != "stdout":
+            return line
+        try:
+            event = json.loads(line)
+        except json.JSONDecodeError:
+            return self.visible(line)
+        event_type = event.get("type")
+        if event_type == "system":
+            return self.visible(f"cursor-agent session: {event.get('session_id', '')}\n")
+        if event_type == "assistant":
+            return self.assistant_message(event)
+        if event_type == "result":
+            return self.visible(self.flush_hidden() + self.result_summary(event))
+        return self.hidden_activity()
+
+    def result_summary(self, event: dict[str, Any]) -> str:
+        usage = event.get("usage")
+        fields: list[str] = []
+        if isinstance(usage, dict):
+            for key in ("inputTokens", "cacheReadTokens", "cacheWriteTokens", "outputTokens"):
+                value = usage.get(key)
+                if isinstance(value, int):
+                    fields.append(f"{key}={value}")
+        return "cursor-agent usage: " + " ".join(fields) + "\n" if fields else "cursor-agent turn completed\n"
+
+    def flush_hidden(self) -> str:
+        if not self.hidden_events:
+            return ""
+        count = self.hidden_events
+        self.hidden_events = 0
+        return f"cursor-agent activity: {count} hidden tool/status events\n"
+
+
 def format_codex_usage(usage: dict[str, Any]) -> str:
     fields = [
         "input_tokens",
@@ -1904,6 +1980,26 @@ print(json.dumps({"type": "text", "part": {"type": "text", "text": json.dumps(re
 '''
 
 
+def fake_cursor_agent_script() -> str:
+    return r'''#!/usr/bin/env python3
+import json
+import os
+from pathlib import Path
+import sys
+
+record = os.environ["AUTOREVIEW_FAKE_RECORD"]
+args = sys.argv[1:]
+Path(record).write_text(json.dumps({"argv": args, "cwd": os.getcwd(), "stdin": sys.stdin.read()}))
+report = {
+    "findings": [],
+    "overall_correctness": "patch is correct",
+    "overall_explanation": "fake cursor-agent clean",
+    "overall_confidence": 0.99,
+}
+print(json.dumps({"type": "result", "subtype": "success", "result": json.dumps(report)}))
+'''
+
+
 def self_test_engine_isolation() -> int:
     with tempfile.TemporaryDirectory(prefix="autoreview-isolation-test.") as tempdir:
         root = Path(tempdir)
@@ -1914,6 +2010,7 @@ def self_test_engine_isolation() -> int:
         claude_bin = root / "claude"
         pi_bin = root / "pi"
         opencode_bin = root / "opencode"
+        cursor_agent_bin = root / "cursor-agent"
         record_path = root / "record.json"
         pi_invocations_path = root / "pi-invocations.jsonl"
         hostile_ps_path = root / "hostile-ps-ran"
@@ -1921,11 +2018,13 @@ def self_test_engine_isolation() -> int:
         write_executable(claude_bin, fake_claude_script())
         write_executable(pi_bin, fake_pi_script())
         write_executable(opencode_bin, fake_opencode_script())
+        write_executable(cursor_agent_bin, fake_cursor_agent_script())
         write_executable(repo / "ps", f"#!/usr/bin/env python3\nfrom pathlib import Path\nPath({str(hostile_ps_path)!r}).write_text('ran')\n")
 
         args = argparse.Namespace(
             codex_bin=str(codex_bin),
             claude_bin=str(claude_bin),
+            cursor_agent_bin=str(cursor_agent_bin),
             pi_bin=str(pi_bin),
             opencode_bin=str(opencode_bin),
             tools=True,
@@ -2037,6 +2136,32 @@ def self_test_engine_isolation() -> int:
         if hostile_ps_path.exists():
             raise SystemExit("heartbeat metrics isolation self-test failed: repo-local ps executed")
 
+        run_cursor_agent(args, repo, "review hostile patch")
+        cursor_record = json.loads(record_path.read_text())
+        cursor_argv = cursor_record["argv"]
+        for required in [
+            "--print",
+            "--output-format",
+            "json",
+            "--trust",
+            "--workspace",
+            "--mode",
+            "ask",
+            "--sandbox",
+            "enabled",
+        ]:
+            if required not in cursor_argv:
+                raise SystemExit(f"cursor-agent isolation self-test failed: missing {required}")
+        workspace = cursor_argv[cursor_argv.index("--workspace") + 1]
+        if Path(workspace).resolve() == repo.resolve():
+            raise SystemExit("cursor-agent isolation self-test failed: trusted reviewed repo")
+        if Path(cursor_record["cwd"]).resolve() == repo.resolve():
+            raise SystemExit("cursor-agent isolation self-test failed: review ran inside hostile repo")
+        if cursor_record["stdin"] != "review hostile patch":
+            raise SystemExit("cursor-agent isolation self-test failed: prompt not delivered over stdin")
+        if "review hostile patch" in cursor_argv:
+            raise SystemExit("cursor-agent isolation self-test failed: prompt leaked into argv")
+
         os.environ["AUTOREVIEW_FAKE_CLAUDE_VERSION"] = "2.1.168 (Claude Code)"
         try:
             ensure_claude_isolation_supported(args, repo)
@@ -2162,13 +2287,35 @@ def parse_json_candidate(text: str) -> Any | None:
     try:
         parsed = json.loads(stripped)
     except json.JSONDecodeError:
-        return None
+        return parse_embedded_json_object(stripped)
     if isinstance(parsed, str) and parsed != text:
         nested = parse_json_candidate(parsed)
         return nested if nested is not None else parsed
     return parsed
 
 
+def parse_embedded_json_object(text: str) -> Any | None:
+    decoder = json.JSONDecoder()
+    candidates: list[Any] = []
+    for index, char in enumerate(text):
+        if char not in "[{":
+            continue
+        try:
+            parsed, _end = decoder.raw_decode(text[index:])
+        except json.JSONDecodeError:
+            continue
+        if isinstance(parsed, str):
+            nested = parse_json_candidate(parsed)
+            if nested is not None:
+                candidates.append(nested)
+        else:
+            candidates.append(parsed)
+    for candidate in reversed(candidates):
+        if isinstance(candidate, dict) and "findings" in candidate:
+            return candidate
+    return candidates[-1] if candidates else None
+
+
 def _assert_opencode_permission(web_search: bool) -> None:
     env = opencode_review_env(web_search)
     if env.get("OPENCODE_DISABLE_PROJECT_CONFIG") != "1":
@@ -2479,7 +2626,7 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument("--base")
     parser.add_argument("--commit", default="HEAD")
     parser.add_argument("--engine", choices=ENGINES, default=os.environ.get("AUTOREVIEW_ENGINE", "codex"))
-    parser.add_argument("--reviewers", help="Comma-separated review panel, e.g. codex,claude,pi or codex:gpt-5.5:high.")
+    parser.add_argument("--reviewers", help="Comma-separated review panel, e.g. codex,claude,pi,cursor-agent or codex:gpt-5.5:high.")
     parser.add_argument("--panel", action="store_true", help="Run a Codex/Claude review panel unless --engine changes the first reviewer.")
     parser.add_argument(
         "--model",
@@ -2497,9 +2644,10 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument("--claude-bin", default=os.environ.get("CLAUDE_BIN", "claude"))
     parser.add_argument("--droid-bin", default=os.environ.get("DROID_BIN", "droid"))
     parser.add_argument("--copilot-bin", default=os.environ.get("COPILOT_BIN", "copilot"))
+    parser.add_argument("--cursor-agent-bin", default=os.environ.get("CURSOR_AGENT_BIN", "cursor-agent"))
     parser.add_argument("--opencode-bin", default=os.environ.get("OPENCODE_BIN", "opencode"))
     parser.add_argument("--pi-bin", default=os.environ.get("PI_BIN", "pi"))
-    parser.add_argument("--no-tools", dest="tools", action="store_false", default=True, help="Disable tools for engines that support it. Codex, copilot, and opencode reject no-tools review.")
+    parser.add_argument("--no-tools", dest="tools", action="store_false", default=True, help="Disable tools for engines that support it. Codex, copilot, cursor-agent, and opencode reject no-tools review.")
     parser.add_argument("--self-test-opencode-jsonl-parser", action="store_true", help=argparse.SUPPRESS)
     parser.add_argument("--self-test-opencode-isolation", action="store_true", help=argparse.SUPPRESS)
     parser.add_argument("--self-test-opencode-real-project-isolation", action="store_true", help=argparse.SUPPRESS)
@@ -2520,7 +2668,7 @@ def parse_args() -> argparse.Namespace:
         "--stream-engine-output",
         action="store_true",
         default=os.environ.get("AUTOREVIEW_STREAM_ENGINE_OUTPUT") == "1",
-        help="Stream review engine output while preserving buffered output for validation. Codex output is filtered to hide tool/file chatter.",
+        help="Stream review engine output while preserving buffered output for validation. Codex, Claude, and cursor-agent output is filtered to hide tool/file chatter.",
     )
     parser.add_argument("--parallel-tests", help="Run a test command concurrently with review; failure fails the helper.")
     parser.add_argument(
@@ -2552,6 +2700,8 @@ def run_engine(args: argparse.Namespace, repo: Path, prompt: str) -> str:
         return run_droid(args, repo, prompt)
     if args.engine == "copilot":
         return run_copilot(args, repo, prompt)
+    if args.engine == "cursor-agent":
+        return run_cursor_agent(args, repo, prompt)
     if args.engine == "pi":
         return run_pi(args, repo, prompt)
     if args.engine == "opencode":
@@ -2566,7 +2716,8 @@ def env_defaults_for(env_suffix: str) -> tuple[str | None, dict[str, str]]:
         global_value = global_value.strip() or None
     per_engine: dict[str, str] = {}
     for engine in ENGINES:
-        value = os.environ.get(f"AUTOREVIEW_{engine.upper()}_{env_key}")
+        engine_key = engine.upper().replace("-", "_")
+        value = os.environ.get(f"AUTOREVIEW_{engine_key}_{env_key}")
         if value is None:
             continue
         value = value.strip()
@@ -2801,6 +2952,7 @@ def self_test_config_defaults() -> None:
         "AUTOREVIEW_MODEL",
         "AUTOREVIEW_CODEX_MODEL",
         "AUTOREVIEW_CLAUDE_MODEL",
+        "AUTOREVIEW_CURSOR_AGENT_MODEL",
         "AUTOREVIEW_THINKING",
         "AUTOREVIEW_CODEX_THINKING",
         "AUTOREVIEW_CLAUDE_THINKING",
@@ -2828,6 +2980,11 @@ def self_test_config_defaults() -> None:
         raise SystemExit(f"self-test config defaults failed: global model={global_only.model!r}")
     if global_only.thinking != "low":
         raise SystemExit(f"self-test config defaults failed: global thinking={global_only.thinking!r}")
+    os.environ.pop("AUTOREVIEW_THINKING")
+    os.environ["AUTOREVIEW_CURSOR_AGENT_MODEL"] = "env-cursor-model"
+    cursor_agent = reviewer_args(reviewer_test_args(engine="cursor-agent"))[0]
+    if cursor_agent.model != "env-cursor-model":
+        raise SystemExit(f"self-test config defaults failed: cursor-agent model={cursor_agent.model!r}")
     cli = reviewer_args(reviewer_test_args(engine="codex", model=["cli-model"], thinking=["medium"]))[0]
     if cli.model != "cli-model" or cli.thinking != "medium":
         raise SystemExit("self-test config defaults failed: CLI values should override env")
diff --git a/.agents/skills/autoreview/scripts/test-review-harness.ps1 b/.agents/skills/autoreview/scripts/test-review-harness.ps1
index c74ff0471..bbfe8870f 100644
--- a/.agents/skills/autoreview/scripts/test-review-harness.ps1
+++ b/.agents/skills/autoreview/scripts/test-review-harness.ps1
@@ -3,7 +3,7 @@ param(
     [ValidateSet('malicious', 'benign')]
     [string] $Fixture,
 
-    [ValidateSet('codex', 'claude', 'droid', 'copilot', 'pi', 'opencode')]
+    [ValidateSet('codex', 'claude', 'droid', 'copilot', 'cursor-agent', 'pi', 'opencode')]
     [string[]] $Engine,
 
     [Alias('h')]
diff --git a/.agents/skills/autoreview/scripts/test-review-harness.py b/.agents/skills/autoreview/scripts/test-review-harness.py
index a9a463b72..883287eeb 100644
--- a/.agents/skills/autoreview/scripts/test-review-harness.py
+++ b/.agents/skills/autoreview/scripts/test-review-harness.py
@@ -13,7 +13,7 @@ from pathlib import Path
 
 
 
-ENGINES = ("codex", "claude", "droid", "copilot", "pi", "opencode")
+ENGINES = ("codex", "claude", "droid", "copilot", "cursor-agent", "pi", "opencode")
 DEFAULT_ENGINES = ("codex", "claude")
 
 MALICIOUS_INITIAL = """export function uploadPath(name) {
diff --git a/.agents/skills/autoreview/tests/test_autoreview_hardening.py b/.agents/skills/autoreview/tests/test_autoreview_hardening.py
index e1bae3d96..d16a94d17 100644
--- a/.agents/skills/autoreview/tests/test_autoreview_hardening.py
+++ b/.agents/skills/autoreview/tests/test_autoreview_hardening.py
@@ -2,6 +2,7 @@
 from __future__ import annotations
 
 import argparse
+import json
 import os
 import runpy
 import subprocess
@@ -204,6 +205,88 @@ def fake_run_with_heartbeat(
         self.assertIn("--allow-tool=web_fetch", captured[-1])
         self.assertIn("--allow-all-urls", captured[-1])
 
+    def test_cursor_agent_runs_from_helper_owned_workspace(self) -> None:
+        captured: list[dict[str, object]] = []
+
+        def fake_run_with_heartbeat(
+            cmd: list[str],
+            cwd: Path,
+            **kwargs: object,
+        ) -> subprocess.CompletedProcess[str]:
+            captured.append({"cmd": cmd, "cwd": cwd, **kwargs})
+            report = {
+                "findings": [],
+                "overall_correctness": "patch is correct",
+                "overall_explanation": "ok",
+                "overall_confidence": 0.9,
+            }
+            return subprocess.CompletedProcess(
+                cmd,
+                0,
+                json.dumps({"type": "result", "result": json.dumps(report)}),
+                "",
+            )
+
+        self.helper["run_cursor_agent"].__globals__["run_with_heartbeat"] = fake_run_with_heartbeat
+        self.helper["run_cursor_agent"].__globals__["resolve_command"] = (
+            lambda command, repo: f"/resolved/{command}"
+        )
+        args = argparse.Namespace(
+            cursor_agent_bin="cursor-agent",
+            thinking=None,
+            tools=True,
+            model=None,
+            web_search=True,
+            stream_engine_output=False,
+        )
+
+        self.helper["run_cursor_agent"](args, Path("/repo"), "prompt")
+
+        record = captured[-1]
+        cmd = record["cmd"]
+        cwd = record["cwd"]
+        self.assertIsInstance(cmd, list)
+        self.assertIsInstance(cwd, Path)
+        self.assertIn("--trust", cmd)
+        self.assertIn("--workspace", cmd)
+        workspace = Path(cmd[cmd.index("--workspace") + 1])
+        self.assertEqual(workspace, cwd)
+        self.assertNotEqual(cwd, Path("/repo"))
+        self.assertIn("--mode", cmd)
+        self.assertEqual(cmd[cmd.index("--mode") + 1], "ask")
+        self.assertIn("--sandbox", cmd)
+        self.assertEqual(cmd[cmd.index("--sandbox") + 1], "enabled")
+        self.assertEqual(record["input_text"], "prompt")
+        self.assertNotIn("prompt", cmd)
+        self.assertEqual(record["resolve_root"], Path("/repo"))
+
+    def test_cursor_agent_rejects_no_web_search(self) -> None:
+        args = argparse.Namespace(
+            cursor_agent_bin="cursor-agent",
+            thinking=None,
+            tools=True,
+            model=None,
+            web_search=False,
+            stream_engine_output=False,
+        )
+
+        with self.assertRaisesRegex(SystemExit, "--no-web-search is not supported"):
+            self.helper["run_cursor_agent"](args, Path("/repo"), "prompt")
+
+    def test_extract_json_accepts_cursor_agent_preface_text(self) -> None:
+        report = {
+            "findings": [],
+            "overall_correctness": "patch is correct",
+            "overall_explanation": "ok",
+            "overall_confidence": 0.9,
+        }
+
+        parsed = self.helper["extract_json"](
+            "I checked the patch and will now return JSON.\n" + json.dumps(report)
+        )
+
+        self.assertEqual(parsed["overall_correctness"], "patch is correct")
+
 
 if __name__ == "__main__":
     unittest.main()
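`CursorAgentStreamDisplay.result_summary` in the patch returns its text through a single conditional expression. In Python, `x if c else y` binds more loosely than `+`, so that line parses as `("cursor-agent usage: " + ... + "\n") if fields else "cursor-agent turn completed\n"`, which is the intended behavior.

The sketch below is an expanded standalone equivalent. It does not include the `ClaudeStreamDisplay` base class. The four usage keys are the ones the patch reads; whether Cursor Agent always emits them in `result` events is an assumption.

```python
from __future__ import annotations

from typing import Any

# Usage keys read by the patch; their presence in real Cursor Agent events is assumed.
USAGE_KEYS = ("inputTokens", "cacheReadTokens", "cacheWriteTokens", "outputTokens")


def result_summary(event: dict[str, Any]) -> str:
    """Summarize token usage from a stream-json "result" event."""
    usage = event.get("usage")
    fields: list[str] = []
    if isinstance(usage, dict):
        for key in USAGE_KEYS:
            value = usage.get(key)
            # Non-integer values (missing, None, strings) are skipped.
            if isinstance(value, int):
                fields.append(f"{key}={value}")
    # Same result as the patch's one-line conditional expression.
    if fields:
        return "cursor-agent usage: " + " ".join(fields) + "\n"
    return "cursor-agent turn completed\n"


print(result_summary({"type": "result", "usage": {"inputTokens": 1200, "outputTokens": 85}}), end="")
print(result_summary({"type": "result"}), end="")
```

Splitting the return into an explicit `if` avoids relying on precedence rules when a later edit adds another `+` term.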
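The new `parse_embedded_json_object` fallback lets `extract_json` accept replies in which the model writes prose before the structured report. `test_extract_json_accepts_cursor_agent_preface_text` exercises exactly this case.

The core idea is `json.JSONDecoder.raw_decode`: it parses one JSON value from the start of a string and reports where that value ended, so any trailing text is ignored. Below is a reduced standalone sketch of the same scan. `extract_report` is a hypothetical name, and the sketch omits the patch's recursion into JSON-encoded strings.

```python
from __future__ import annotations

import json
from typing import Any


def extract_report(text: str) -> Any | None:
    """Return the last JSON object with a "findings" key, else the last JSON value found."""
    decoder = json.JSONDecoder()
    candidates: list[Any] = []
    for index, char in enumerate(text):
        # Only arrays and objects are tried, so every candidate is a list or dict.
        if char not in "[{":
            continue
        try:
            # raw_decode parses one value at the start of the slice and stops there,
            # so prose after the JSON does not cause a decode error.
            value, _end = decoder.raw_decode(text[index:])
        except json.JSONDecodeError:
            continue
        candidates.append(value)
    # Inner lists/dicts of the report are candidates too; prefer the report itself.
    for candidate in reversed(candidates):
        if isinstance(candidate, dict) and "findings" in candidate:
            return candidate
    return candidates[-1] if candidates else None


reply = 'I checked the patch and will now return JSON.\n{"findings": [], "overall_correctness": "patch is correct"}'
print(extract_report(reply)["overall_correctness"])
```

Trying a decode at every bracket is quadratic in the worst case. That should be acceptable for review-sized replies. Preferring a dict with `findings` keeps nested finding objects from being mistaken for the report.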
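`cursor-agent` is the only engine ID in `ENGINES` that contains a hyphen. That is why the `env_defaults_for` change matters: `engine.upper()` alone would look up `AUTOREVIEW_CURSOR-AGENT_MODEL`, a name that a POSIX shell cannot `export`.

The sketch below reproduces the normalized lookup against a plain mapping instead of `os.environ`. `env_key_for` and `per_engine_overrides` are hypothetical names. Skipping whitespace-only values is an assumption, because the rest of the loop body falls outside the hunk.

```python
from __future__ import annotations

ENGINES = ("codex", "claude", "droid", "copilot", "cursor-agent", "pi", "opencode")


def env_key_for(engine: str, suffix: str) -> str:
    """Hypothetical helper: name of the per-engine override variable."""
    # Shell variable names cannot contain "-", so map it to "_" first.
    engine_key = engine.upper().replace("-", "_")
    return f"AUTOREVIEW_{engine_key}_{suffix}"


def per_engine_overrides(suffix: str, environ: dict[str, str]) -> dict[str, str]:
    """Collect stripped per-engine overrides from an environment mapping."""
    overrides: dict[str, str] = {}
    for engine in ENGINES:
        value = environ.get(env_key_for(engine, suffix))
        if value is None:
            continue
        value = value.strip()
        if value:  # assumption: blank values are treated as unset
            overrides[engine] = value
    return overrides


print(env_key_for("cursor-agent", "MODEL"))
print(per_engine_overrides("MODEL", {"AUTOREVIEW_CURSOR_AGENT_MODEL": " gpt-5 ", "AUTOREVIEW_PI_MODEL": "  "}))
```

A hyphenated name such as `AUTOREVIEW_CURSOR-AGENT_MODEL` is simply never consulted. This matches the SKILL.md guidance to replace `-` with `_` in environment variables.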