This directory contains a tool for evaluating the SAST (Static Application Security Testing) tool on individual GitHub pull requests.
The evaluation tool allows you to run the Claude Code Security Reviewer on any GitHub PR to analyze its security findings. This is useful for:
- Testing the tool on specific PRs
- Evaluating performance and accuracy
- Debugging security analysis issues
Requirements:
- Python 3.9+
- Git 2.20+ (for worktree support)
- GitHub CLI (gh) for API access
- Environment variables:
  - ANTHROPIC_API_KEY: Required for Claude API access
  - GITHUB_TOKEN: Recommended for GitHub API rate limits
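The requirements above can be checked before starting a run. The following is a minimal sketch, not part of the evaluation tool: `check_prerequisites` is a hypothetical helper, and it only confirms that `git` and `gh` are on the PATH, so it does not verify the Git 2.20+ version requirement.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Return (errors, warnings) for the documented prerequisites.

    Hypothetical helper for illustration; the evaluation tool may perform
    different checks, or none at all.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    errors, warnings = [], []

    if version < (3, 9):
        errors.append("Python 3.9+ is required")

    # Only presence on PATH is checked here, not the installed version.
    for tool in ("git", "gh"):
        if which(tool) is None:
            errors.append(f"'{tool}' was not found on PATH")

    # ANTHROPIC_API_KEY is documented as required, GITHUB_TOKEN as recommended.
    if not env.get("ANTHROPIC_API_KEY"):
        errors.append("ANTHROPIC_API_KEY is not set")
    if not env.get("GITHUB_TOKEN"):
        warnings.append("GITHUB_TOKEN is not set; GitHub API rate limits may apply")

    return errors, warnings
```

Calling `check_prerequisites()` with no arguments inspects the current process; the optional parameters exist only so the check can be exercised without touching the real environment.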
Run an evaluation on a single PR:
python -m claudecode.evals.run_eval example/repo#123 --verbose
- PR specification: Required positional argument in the format owner/repo#pr_number
- --output-dir PATH: Directory for results (default: ./eval_results)
- --work-dir PATH: Directory where git repositories will be cloned and stored (default: ~/code/audit)
- --verbose: Enable verbose logging to see detailed progress
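To make the argument format concrete, here is a sketch of how a command line with this shape could be parsed using `argparse`. It is an illustration under assumptions, not the tool's actual parser: `PRSpec`, `parse_pr_spec`, `build_parser`, and the regular expression are hypothetical; only the option names and defaults come from the list above.

```python
import argparse
import re
from typing import NamedTuple


class PRSpec(NamedTuple):
    owner: str
    repo: str
    number: int


# Assumed grammar: owner and repo contain no "/", "#", or whitespace,
# followed by "#" and a decimal PR number.
_PR_SPEC = re.compile(r"([^/#\s]+)/([^/#\s]+)#(\d+)")


def parse_pr_spec(text: str) -> PRSpec:
    """Parse 'owner/repo#pr_number' into its three parts."""
    match = _PR_SPEC.fullmatch(text)
    if match is None:
        raise argparse.ArgumentTypeError(
            f"expected owner/repo#pr_number, got {text!r}"
        )
    owner, repo, number = match.groups()
    return PRSpec(owner, repo, int(number))


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="run_eval")
    parser.add_argument("pr", type=parse_pr_spec, help="owner/repo#pr_number")
    parser.add_argument("--output-dir", default="./eval_results")
    parser.add_argument("--work-dir", default="~/code/audit")
    parser.add_argument("--verbose", action="store_true")
    return parser
```

With this sketch, `build_parser().parse_args(["example/repo#123", "--verbose"])` yields a namespace whose `pr` field holds the owner, repository, and PR number separately, while a malformed specification is rejected with a usage error.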
The evaluation generates a JSON file in the output directory with:
- Success/failure status
- Runtime metrics
- Security findings count
- Detailed findings with file, line, severity, and descriptions
Example output file: pr_example_repo_123.json
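The result file can be post-processed with a few lines of Python. The sketch below makes two assumptions that are not confirmed here: the file name follows the pattern suggested by the single example (`pr_<owner>_<repo>_<number>.json`), and the JSON has a top-level `findings` list whose entries carry a `severity` key. Both key names are hypothetical, so check them against a real output file before relying on them.

```python
import json
from collections import Counter
from pathlib import Path


def result_path(output_dir, owner, repo, number):
    """Guess the result file path; the naming pattern is inferred from one example."""
    return Path(output_dir) / f"pr_{owner}_{repo}_{number}.json"


def load_result(path):
    """Read one evaluation result file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_by_severity(result):
    """Count findings per severity.

    The "findings" and "severity" keys are assumed names, not a documented schema.
    """
    findings = result.get("findings", [])
    return Counter(finding.get("severity", "unknown") for finding in findings)
```

For example, `count_by_severity(load_result(result_path("./eval_results", "example", "repo", 123)))` would summarize the findings for the PR used in the command above, provided the assumed keys match the real file.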
The evaluation tool uses git worktrees for efficient repository management:
- Clones the repository once as a base
- Creates lightweight worktrees for each PR evaluation
- Automatically handles cleanup of worktrees
- Runs the SAST audit in the PR-specific worktree
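The worktree lifecycle described above can be sketched with plain git commands. This is an illustration of the general technique, not the tool's implementation: the directory layout, the `pr-<number>` local ref name, and the helpers `plan_worktree` and `run_with_worktree` are assumptions. The git commands themselves (`git clone`, fetching GitHub's `pull/<number>/head` ref, `git worktree add`, `git worktree remove`) are standard.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorktreePlan:
    base: Path       # shared base clone, created once per repository
    worktree: Path   # lightweight checkout for one PR
    clone: list
    fetch: list
    add: list
    remove: list


def plan_worktree(work_dir, owner, repo, number):
    """Build the git commands for one PR evaluation (hypothetical layout)."""
    root = Path(work_dir).expanduser()
    base = root / f"{owner}_{repo}"
    worktree = root / f"{owner}_{repo}_pr_{number}"
    ref = f"pr-{number}"
    url = f"https://github.com/{owner}/{repo}.git"
    git = ["git", "-C", str(base)]
    return WorktreePlan(
        base=base,
        worktree=worktree,
        clone=["git", "clone", url, str(base)],
        # GitHub exposes each PR head as refs/pull/<number>/head.
        fetch=git + ["fetch", "origin", f"+pull/{number}/head:{ref}"],
        add=git + ["worktree", "add", "--detach", str(worktree), ref],
        remove=git + ["worktree", "remove", "--force", str(worktree)],
    )


def run_with_worktree(plan, audit, run=subprocess.run):
    """Clone once, create the PR worktree, run `audit` in it, then clean up."""
    if not plan.base.exists():
        run(plan.clone, check=True)
    run(plan.fetch, check=True)
    run(plan.add, check=True)
    try:
        return audit(plan.worktree)
    finally:
        # Remove the worktree even if the audit raises.
        run(plan.remove, check=True)
```

Because worktrees share the object store of the base clone, evaluating a second PR from the same repository only needs a fetch and a new checkout rather than another full clone. The `audit` callable stands in for whatever runs the SAST audit inside the worktree directory.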