SkillFlowGuard

A lightweight workflow-level security auditor for agent skill ecosystems.

SkillFlowGuard detects risks that emerge when individually reasonable skills are composed into a workflow, such as recommendation chains, artifact handoffs, permission escalation, and hidden natural-language coordination signals.

Why SkillFlowGuard?

Modern agent systems often compose multiple tools, skills, or graph nodes into a single workflow. A single skill may look safe in isolation, but the workflow can become risky when:

one skill recommends another downstream skill,
one skill writes an artifact that a later skill reads,
a workflow moves from local-only access to network access,
natural-language instructions imply hidden handoffs or execution coupling.

SkillFlowGuard audits these cross-skill relationships before execution.
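As a rough illustration of one cross-skill relationship, the sketch below flags an artifact handoff: a later skill reads a file that an earlier skill wrote. The `Skill` dataclass and `find_artifact_handoffs` function are hypothetical names for this example only and are not SkillFlowGuard's internal API.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical, simplified skill record used only for this sketch."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def find_artifact_handoffs(workflow):
    """Return (writer, reader, artifact) triples where a later skill
    reads a file that an earlier skill in the workflow wrote."""
    handoffs = []
    for index, writer in enumerate(workflow):
        for reader in workflow[index + 1:]:
            # Sort so the result order is deterministic.
            for artifact in sorted(writer.writes & reader.reads):
                handoffs.append((writer.name, reader.name, artifact))
    return handoffs


workflow = [
    Skill("code-review", writes={"report.json"}),
    Skill("report-exporter", reads={"report.json"}),
]
handoffs = find_artifact_handoffs(workflow)
print(handoffs)
```

Neither skill is suspicious in isolation; the handoff only becomes visible when the pair is inspected in workflow order.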

Features

Workflow-level risk detection across composed agent skills
Structured, document, and optional LLM-assisted analysis
Generic JSON and LangGraph-style workflow import adapters
Text, JSON, dashboard-style HTML, SARIF, and GitHub Code Scanning output
CI-oriented controls: --fail-on, --min-level, and TOML config
Config-based suppressions for accepted findings with required reasons
Baseline comparison for CI workflows that should fail only on new findings
Stable finding fingerprints for baseline comparison and SARIF integrations
Synthetic evaluation benchmark with pytest and GitHub Actions coverage

Installation

git clone https://github.com/Calvin1989/SkillFlowGuard.git
cd SkillFlowGuard
pip install -e .

For optional LLM support:

pip install -e ".[llm]"

Quick Start

Analyze a workflow:

skillflowguard analyze examples/suspicious_chain --extract-doc

Generate JSON:

skillflowguard analyze examples/suspicious_chain --extract-doc --format json

Generate SARIF for security tooling:

skillflowguard analyze examples/suspicious_chain --extract-doc --format sarif --output reports/suspicious.sarif

Filter displayed findings by severity:

skillflowguard analyze examples/suspicious_chain --extract-doc --min-level high

Generate a dashboard-style HTML report:

skillflowguard analyze examples/suspicious_chain --extract-doc --format html --output reports/suspicious.html

The HTML report is a zero-dependency static file with severity filter controls, expandable finding cards, a baseline delta section, and a list of suppressed findings. Open it directly in a browser; no server or build step is needed.

Use a project-level config file:

skillflowguard analyze examples/suspicious_chain --config examples/skillflowguard.toml

Example config:

[analysis]
extract_doc = true
format = "text"
min_level = "medium"

[[suppressions]]
rule = "cross_skill_recommendation"
reason = "Accepted in the sample workflow."

Invalid config values are rejected with friendly CLI errors and exit code 2.

Suppressions hide accepted findings from normal report output while preserving them in JSON under suppressed_findings.

Suppression rules are validated against the built-in rule catalog to avoid silent typos.
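The validation described above might look roughly like the following sketch. It assumes the `[[suppressions]]` entries have already been parsed into a list of dictionaries; `validate_suppressions` is a hypothetical helper, not the tool's actual function, and the rule names are the ones listed under Detection Rules.

```python
# Rule IDs taken from the Detection Rules table in this README.
KNOWN_RULES = {
    "cross_skill_recommendation",
    "workspace_anchor_dependency",
    "permission_escalation",
    "description_permission_mismatch",
    "combined_high_risk_chain",
    "over_privileged_skill",
}


def validate_suppressions(suppressions):
    """Return a list of error strings; an empty list means the entries are valid.

    Hypothetical helper: each entry must name a known rule and give a reason.
    """
    errors = []
    for index, entry in enumerate(suppressions):
        rule = entry.get("rule", "")
        reason = str(entry.get("reason", "")).strip()
        if rule not in KNOWN_RULES:
            errors.append(f"suppressions[{index}]: unknown rule {rule!r}")
        if not reason:
            errors.append(f"suppressions[{index}]: a reason is required")
    return errors


ok = validate_suppressions(
    [{"rule": "cross_skill_recommendation", "reason": "Accepted in the sample workflow."}]
)
typo = validate_suppressions([{"rule": "cross_skill_recomendation", "reason": "oops"}])
print(ok, typo)
```

Rejecting unknown rule names up front means a misspelled suppression fails loudly instead of silently suppressing nothing.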

--min-level filters displayed findings only. Risk score, risk level, and --fail-on are still based on the full analysis result.
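The distinction between display filtering and gating can be sketched as follows. The function names and the finding dictionaries are assumptions made for illustration; only the severity names come from this README.

```python
# Severity ordering assumed for this sketch.
SEVERITY_ORDER = {"medium": 1, "high": 2, "critical": 3}


def at_least(level, threshold):
    """True when `level` is at or above `threshold`."""
    return SEVERITY_ORDER[level] >= SEVERITY_ORDER[threshold]


def displayed(findings, min_level):
    """Findings shown in the report (the --min-level behaviour)."""
    return [f for f in findings if at_least(f["level"], min_level)]


def gate_fails(findings, fail_on):
    """Gate decision computed on the full result (the --fail-on behaviour)."""
    return any(at_least(f["level"], fail_on) for f in findings)


findings = [
    {"rule": "cross_skill_recommendation", "level": "medium"},
    {"rule": "workspace_anchor_dependency", "level": "high"},
]

shown = displayed(findings, "critical")  # display filter hides both findings
fails = gate_fails(findings, "high")     # the gate still sees the high finding
print(shown, fails)
```

Keeping the two concerns separate means a quiet report cannot accidentally turn a failing CI gate green.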

Example Output

Summary:
  Findings: 4
  Risk Score: 0.85
  Risk Level: HIGH
  Document Extraction: ON
  LLM Analysis: OFF

Detected Risks:
  [MEDIUM] code-review recommends report-exporter, which appears later in the workflow
  [HIGH] code-review writes [report.json], and report-exporter reads them later
  [HIGH] report-exporter requests network access after code-review used local-only permissions
  [CRITICAL] recommendation + artifact dependency + network access appear in one chain

Detection Rules

List built-in rules:

skillflowguard rules
skillflowguard rules --format json

| Rule | Level | Description |
| --- | --- | --- |
| `cross_skill_recommendation` | Medium | A skill recommends another downstream skill. |
| `workspace_anchor_dependency` | High | A skill writes an artifact that a later skill reads. |
| `permission_escalation` | High | The workflow moves from local-only permissions to network access. |
| `description_permission_mismatch` | Medium | A skill claims local/offline behavior but requests network permission. |
| `combined_high_risk_chain` | Critical | Recommendation, artifact dependency, and network access occur together. |
| `over_privileged_skill` | Medium | A skill combines read, write, and network privileges, which may increase blast radius if misused. |

Rule metadata is centralized and reused by the rules CLI command and SARIF report descriptors.
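To make one of these rules concrete, here is a minimal sketch of a `permission_escalation`-style check. It assumes each skill is represented as a `(name, permissions)` pair and that network access is labeled `"network"`; both are illustrative assumptions, and the real rule implementation may differ.

```python
def find_permission_escalation(workflow):
    """Return (local_skill, network_skill) for the first network-capable skill
    that runs after a local-only skill, or None if there is no such pair.

    `workflow` is a list of (skill_name, set_of_permissions) in execution order.
    """
    first_local_only = None
    for name, permissions in workflow:
        if "network" in permissions:
            if first_local_only is not None:
                return (first_local_only, name)
        elif first_local_only is None:
            # Remember the earliest skill that had no network access.
            first_local_only = name
    return None


escalating = [
    ("code-review", {"read", "write"}),
    ("report-exporter", {"read", "network"}),
]
network_first = [
    ("fetcher", {"network"}),
    ("summarizer", {"read"}),
]
print(find_permission_escalation(escalating))
print(find_permission_escalation(network_first))
```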

Import External Workflows

Generic JSON import:

skillflowguard import generic-json examples/generic_adapter_input.json --output imported/generic_chain
skillflowguard analyze imported/generic_chain

LangGraph-style import:

skillflowguard import langgraph-style examples/langgraph_style_input.json --output imported/langgraph_chain
skillflowguard analyze imported/langgraph_chain

The LangGraph-style adapter supports deterministic graph-style JSON with nodes, edges, and an entrypoint. It does not parse arbitrary LangGraph Python programs.
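The sketch below shows one way a graph-style description with nodes, edges, and an entrypoint could be turned into a deterministic skill order. The field names (`id`, `source`, `target`, `entrypoint`) are assumptions for illustration; consult `examples/langgraph_style_input.json` for the schema the adapter actually accepts.

```python
from collections import deque


def linearize(graph):
    """Breadth-first walk from the entrypoint, following edges in listed order."""
    node_ids = {node["id"] for node in graph["nodes"]}
    if graph["entrypoint"] not in node_ids:
        raise ValueError("entrypoint is not a declared node")

    successors = {node_id: [] for node_id in node_ids}
    for edge in graph["edges"]:
        if edge["source"] not in node_ids or edge["target"] not in node_ids:
            raise ValueError(f"edge references an unknown node: {edge}")
        successors[edge["source"]].append(edge["target"])

    order = []
    seen = {graph["entrypoint"]}
    queue = deque([graph["entrypoint"]])
    while queue:
        current = queue.popleft()
        order.append(current)
        for target in successors[current]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


graph = {
    "entrypoint": "code-review",
    "nodes": [{"id": "code-review"}, {"id": "report-exporter"}],
    "edges": [{"source": "code-review", "target": "report-exporter"}],
}
order = linearize(graph)
print(order)
```

Because the input is plain JSON, the traversal is reproducible and does not require importing or executing any user code.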

Optional LLM Analysis

LLM mode extracts subtle semantic signals from SKILL.md, such as implicit skill pairing or artifact handoff language.

Default provider:

skillflowguard analyze examples/subtle_chain --llm

OpenAI-compatible provider:

skillflowguard analyze examples/subtle_chain --llm \
  --llm-provider openai-compatible \
  --llm-base-url <provider-base-url> \
  --llm-model <model-name> \
  --llm-api-key-env <ENV_VAR_NAME>

--llm sends SKILL.md content to the configured provider. Do not use it on sensitive documents unless authorized.

Evaluation

SkillFlowGuard includes a synthetic benchmark under evaluation/.

python evaluation/run_eval.py --mode structured
python evaluation/run_eval.py --mode extract-doc
python evaluation/run_eval.py --mode llm-mock

Markdown summary:

python evaluation/run_eval.py --mode structured --format markdown

Current rule-level results on 21 manually labeled synthetic workflow cases:

| Mode | Precision | Recall | F1 |
| --- | --- | --- | --- |
| `structured` | 1.000 | 0.566 | 0.723 |
| `extract-doc` | 1.000 | 0.755 | 0.860 |
| `llm-mock` | 1.000 | 1.000 | 1.000 |

llm-mock is deterministic and does not call a real LLM API. The benchmark is synthetic and should not be interpreted as real-world detection performance.
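For readers checking the table, F1 is the harmonic mean of precision and recall. The short sketch below recomputes the F1 column from the reported precision and recall values; `f1_score` is a helper written for this example and is not part of `evaluation/run_eval.py`.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported = [
    ("structured", 1.000, 0.566),
    ("extract-doc", 1.000, 0.755),
    ("llm-mock", 1.000, 1.000),
]
for mode, precision, recall in reported:
    print(f"{mode}: F1 = {f1_score(precision, recall):.3f}")
```

With precision fixed at 1.000, the differences in F1 between modes come entirely from recall.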

CI and Code Scanning

CI gate example:

skillflowguard analyze examples/suspicious_chain --extract-doc --fail-on high

SARIF output can be uploaded to GitHub Code Scanning.
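For orientation, a SARIF 2.1.0 log is a JSON object with a `version` and a list of `runs`, each holding a tool descriptor and results. The sketch below builds a minimal log of that shape. The severity-to-level mapping, the `skillflowguard/v1` fingerprint key, and the input finding fields are assumptions for illustration and may not match what `--format sarif` actually emits.

```python
import json

# Assumed mapping from SkillFlowGuard severities to SARIF result levels.
SARIF_LEVEL = {"medium": "warning", "high": "error", "critical": "error"}


def to_sarif(findings):
    """Build a minimal SARIF 2.1.0 log from hypothetical finding dictionaries."""
    rule_ids = sorted({finding["rule"] for finding in findings})
    return {
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": "SkillFlowGuard",
                        "rules": [{"id": rule_id} for rule_id in rule_ids],
                    }
                },
                "results": [
                    {
                        "ruleId": finding["rule"],
                        "level": SARIF_LEVEL[finding["level"]],
                        "message": {"text": finding["detail"]},
                        # Hypothetical key name for the stable fingerprint.
                        "partialFingerprints": {
                            "skillflowguard/v1": finding["fingerprint"]
                        },
                    }
                    for finding in findings
                ],
            }
        ],
    }


sarif_log = to_sarif(
    [
        {
            "rule": "permission_escalation",
            "level": "high",
            "detail": "report-exporter requests network access after code-review",
            "fingerprint": "example-fingerprint",
        }
    ]
)
print(json.dumps(sarif_log, indent=2))
```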

Baseline Comparison

For CI workflows that accumulate accepted findings over time, --baseline compares the current report against a previously accepted JSON report. Only findings not present in the baseline are considered new.

# Generate a baseline report
skillflowguard analyze examples/suspicious_chain --extract-doc --format json --output baseline.json

# Compare against the baseline, fail only on new high+ findings
skillflowguard analyze examples/suspicious_chain --extract-doc --baseline baseline.json --fail-on-new high

The JSON output includes a baseline section with new_findings, blocking_new_findings, and counts.

Finding identity is based on rule + detail. --fail-on-new supports medium, high, and critical thresholds.

Findings include stable fingerprints for baseline comparison, suppressions, and SARIF partialFingerprints.

Text reports include baseline comparison counts and list blocking new findings when --fail-on-new is used.
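The baseline logic can be sketched in a few lines. This example assumes a fingerprint is a truncated SHA-256 hash of rule + detail and that findings are dictionaries with `rule`, `detail`, and `level` keys; the actual fingerprint algorithm and report schema are not specified here, so treat every name below as hypothetical.

```python
import hashlib

SEVERITY_ORDER = {"medium": 1, "high": 2, "critical": 3}


def fingerprint(rule, detail):
    """Assumed fingerprint: short SHA-256 digest of rule + detail."""
    return hashlib.sha256(f"{rule}\n{detail}".encode("utf-8")).hexdigest()[:16]


def new_findings(current, baseline):
    """Findings in `current` whose fingerprint is absent from `baseline`."""
    known = {fingerprint(f["rule"], f["detail"]) for f in baseline}
    return [f for f in current if fingerprint(f["rule"], f["detail"]) not in known]


def blocking_new_findings(new, threshold):
    """New findings at or above the --fail-on-new style threshold."""
    minimum = SEVERITY_ORDER[threshold]
    return [f for f in new if SEVERITY_ORDER[f["level"]] >= minimum]


baseline = [
    {"rule": "cross_skill_recommendation", "detail": "a recommends b", "level": "medium"},
]
current = baseline + [
    {"rule": "permission_escalation", "detail": "b requests network", "level": "high"},
]

new = new_findings(current, baseline)
blocking = blocking_new_findings(new, "high")
print(len(new), len(blocking))
```

Because identity depends only on rule and detail, a finding that was accepted in the baseline stays accepted even if unrelated findings are added or removed around it.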

Pre-commit Integration

SkillFlowGuard can run as a pre-commit hook to block risky workflows before they enter your repository.

From GitHub

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/Calvin1989/SkillFlowGuard
    rev: v2.14.0
    hooks:
      - id: skillflowguard
        args: ["examples/suspicious_chain", "--extract-doc", "--fail-on", "high"]

Install and run:

pre-commit install
pre-commit run skillflowguard --all-files

Local Example

A local config is included for quick demos:

pre-commit run --config examples/pre_commit/.pre-commit-config.yaml --all-files

Quick Demo

Run the interview demo script to generate sample reports in all formats:

# PowerShell
scripts/demo.ps1

# POSIX shell
sh scripts/demo.sh

Localized HTML Reports

HTML reports support localized labels:

skillflowguard analyze examples/suspicious_chain --extract-doc --format html --report-language zh --output reports/demo/suspicious.zh.html

Rule IDs and fingerprints remain stable across languages.

See Interview Demo Kit for a full walkthrough and talking points.

Testing

pytest

Current suite:

148 passed

Documentation

Project Structure

skillflowguard/
  adapters/
  loader.py
  doc_parser.py
  llm_doc_parser.py
  rule_metadata.py
  rules.py
  analyzer.py
  report.py
  config.py
  cli.py

examples/
evaluation/
tests/
docs/
