A deterministic, offline compiler that turns a natural-language description of a bioinformatics analysis into a self-contained, agent-executable RO-Crate package — with a full-lifecycle conversational shell wrapped around the executing package.
The compiler classifies the intake, selects an archetype, builds a task DAG, and emits a package. An execution harness then drives an agent (Claude Code, a shell script, or anything callable with a package path) against the emitted DAG. The emitted package is an ECAA (Evidence-Carrying Analysis Artifact) v0.2 package: a typed RO-Crate that carries, alongside the analysis itself, the claims it supports, the evidence backing each claim, and the decision record that produced them.

An embedded ECAA validator checks the package against a machine-checkable contract over those subgraphs. Emit-time validation defaults to pure-Rust JSON Schema (schema_only). On a plain emit it is advisory (warn-only) unless ECAA_VALIDATION_BLOCK_ON_FAIL=1 is set; the local conformance gate (make conformance, ECAA_CONFORMANCE_MODE=1) runs it block-on-fail. The contract establishes machine-checkable consistency between an analysis's claims, evidence, decisions, and execution provenance. It does not establish biological validity; treat verdicts as a hygiene floor, not a quality ceiling.
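The advisory versus block-on-fail behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `EmitOutcome`, `block_on_fail`, and `gate_emit` are hypothetical names, and treating only the exact value `1` as "set" is an assumption based on the variable being written as `ECAA_VALIDATION_BLOCK_ON_FAIL=1`.

```rust
/// Hypothetical outcome type for emit-time validation (not the project's API).
#[derive(Debug, PartialEq, Eq)]
pub enum EmitOutcome {
    /// Validation produced no findings; the package is emitted.
    Emitted,
    /// Findings exist but the run is advisory: emit anyway and surface warnings.
    EmittedWithWarnings(Vec<String>),
    /// Findings exist and block-on-fail is active: refuse to emit.
    Blocked(Vec<String>),
}

/// Interprets the raw value of ECAA_VALIDATION_BLOCK_ON_FAIL.
/// `value` is what `std::env::var("ECAA_VALIDATION_BLOCK_ON_FAIL").ok()` would
/// yield, passed in explicitly so the decision stays a pure function.
/// Assumption: only the exact string "1" enables blocking.
pub fn block_on_fail(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Decides what happens to a package given the validator's findings.
pub fn gate_emit(findings: Vec<String>, block: bool) -> EmitOutcome {
    if findings.is_empty() {
        EmitOutcome::Emitted
    } else if block {
        EmitOutcome::Blocked(findings)
    } else {
        EmitOutcome::EmittedWithWarnings(findings)
    }
}
```

Keeping the environment lookup outside the decision function means the same findings always yield the same outcome for a given flag value, which is in keeping with the compiler's deterministic design.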
| Component | Crate / dir | Role |
|---|---|---|
| Compiler | crates/{core, cli} | Classifier → DAG → emitter. Synchronous, no LLM dependency. |
| Conversation shim | crates/conversation | Closed tool vocabulary wraps the compiler. LLM is a UX shim only. |
| Chat server | crates/server | Axum HTTP + SSE backend at /api/chat/* and /api/git/*. |
| Execution harness | crates/harness | Loops an agent subprocess against ready tasks. Local / Mock / AWS / SLURM executors. |
| ECAA validator | crates/{ecaa-conformance, ecaa-types} + docs/ecaa-spec/ | Emits + validates the ECAA conformance contract. ecaa-conformance re-exports core's public API and is the conformance harness a second implementer runs against their own packages; it is not an independent reimplementation of core. |
| Web UI | ui/ | React 18 + Vite + TypeScript chat surface. |

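As an illustration of what "loops an agent subprocess against ready tasks" can mean for a task DAG, here is a minimal sketch. It is not the harness's implementation: `ready_tasks`, `drive`, and the string task IDs are hypothetical, and the agent subprocess is stood in for by a closure that returns `true` on success.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the IDs of tasks that are not yet done and whose dependencies
/// are all done. `deps` maps each task ID to the IDs it depends on.
/// BTreeMap iteration is sorted, so the result order is stable.
pub fn ready_tasks(
    deps: &BTreeMap<String, Vec<String>>,
    done: &BTreeSet<String>,
) -> Vec<String> {
    deps.iter()
        .filter(|(id, _)| !done.contains(*id))
        .filter(|(_, needs)| needs.iter().all(|n| done.contains(n)))
        .map(|(id, _)| id.clone())
        .collect()
}

/// Repeatedly runs every ready task until nothing is ready or no task
/// succeeds in a round. Returns the IDs of completed tasks in run order.
/// `run` stands in for invoking the agent on one task.
pub fn drive<F: FnMut(&str) -> bool>(
    deps: &BTreeMap<String, Vec<String>>,
    mut run: F,
) -> Vec<String> {
    let mut done: BTreeSet<String> = BTreeSet::new();
    let mut order = Vec::new();
    loop {
        let ready = ready_tasks(deps, &done);
        if ready.is_empty() {
            break; // everything finished, or the rest is blocked
        }
        let mut progressed = false;
        for id in ready {
            if run(id.as_str()) {
                done.insert(id.clone());
                order.push(id);
                progressed = true;
            }
        }
        if !progressed {
            break; // every ready task failed; stop instead of spinning
        }
    }
    order
}
```

With tasks `a`, `b` (depends on `a`), and `c` (depends on `a` and `b`), a closure that always succeeds completes them in the order `a`, `b`, `c`; if `b` fails, only `a` completes and `c` is never attempted.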
Current ECAA profile IRI: https://w3id.org/ecaa/v0.2. The active spec files are in docs/ecaa-spec/: v0.2.md, ecaa-v0.2.ttl, ecaa-v0.2.shacl.ttl, and ecaa-v0.2.jsonld.
Linux x86-64 is the primary supported target. macOS works for dev. Windows requires WSL2.
```bash
# 1. System tools
sudo apt-get install -y build-essential pkg-config libssl-dev mold git curl   # Debian/Ubuntu
# or: sudo dnf install -y @development-tools openssl-devel mold git curl      # Fedora/RHEL
# or: brew install mold openssl@3 pkg-config                                  # macOS

# 2. Rust toolchain (auto-installs the pinned channel from rust-toolchain.toml)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"

# 3. Cargo dev tools
cargo install --locked cargo-nextest cargo-hakari

# 4. Node.js 20+ (for the UI + Playwright)
# install per your platform; verify: node --version   # ≥ 20

# 5. Build everything
make build      # Rust workspace (debug)
make install    # binaries to ~/.cargo/bin
(cd ui && npm install)
```

`make doctor` prints toolchain readiness; `make help` lists targets.
Two terminals:
```bash
# terminal A: chat server on :3000
make dev-server

# terminal B: Vite dev UI on :5173 (proxies /api/* to :3000)
make dev-ui
```

The chat surface boots in offline mode without an API key (the UI renders but assistant turns are mocked). For LLM-mediated chat:
```bash
export ECAA_ANTHROPIC_API_KEY=<your key>
make dev-server
```

For local production configuration, start from .env.example. Its active defaults are loopback/local and durable under $HOME/.ecaa-workflow; live API, eval, debug, AWS, SLURM, and external-validator knobs are commented until deliberately enabled.
Smoke-test the compiler against a bundled scenario:
```bash
ecaa-workflow intake \
  --input testdata/scenarios/01-bulk-rnaseq-ibd/request.md \
  --output /tmp/ibd-package
ecaa-workflow dag --package /tmp/ibd-package
```

```bash
make test            # cargo test --workspace
make test-ui         # Vitest + axe a11y
make check           # test + tsc --noEmit
make e2e-playwright  # mocked Playwright tier
```

- Compiler is synchronous. `tokio` is allowed in `server`, `conversation`, and `cli` (for `serve` only). Never in `core` or `harness`. Harness uses `ureq` (sync).
- Deterministic output. Emitted packages are byte-reproducible. Use `BTreeMap`, not `HashMap`. Avoid timestamps and random IDs outside `uuid_short()`.
- LLM as UX shim. Closed tool vocabulary (`Tool::COUNT` asserted at compile time). High-impact actions are gated by deterministic server state, not LLM inference.
- Confirmation discipline. `emit_package` returns `PreconditionFailure` unless `session.user_confirmed == true`. The button click is a server-side action the LLM observes only via `get_session_state`.
- ECAA conformance. Every normal emitted package carries the eight required ECAA v0.2 sidecars: `runtime/intake-conversation.jsonl`, `runtime/decisions.jsonl`, `runtime/validation-reports.jsonl`, `runtime/proofs.jsonl`, `runtime/claim-verification.json`, `runtime/verifier-decisions.jsonl`, `runtime/assumptions.jsonl`, and `runtime/audit-proof-report.json`. There is no reduced ECAA mode switch; `ECAA_ABLATE_*` flags are the only code-backed way to emit non-conformant control artifacts.

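
Three of the invariants above (sorted-map determinism, a compile-time check on the tool count, and the confirmation gate) can be sketched together in a few lines. Everything here is illustrative: the `Tool` variants, the `Session` struct, the `render` helper, and the signature of `emit_package` are assumptions made for the example, since the real definitions are not shown in this README.

```rust
use std::collections::BTreeMap;

/// Hypothetical closed tool vocabulary; the real variants are not listed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tool {
    GetSessionState,
    EmitPackage,
}

impl Tool {
    pub const ALL: [Tool; 2] = [Tool::GetSessionState, Tool::EmitPackage];
    pub const COUNT: usize = Self::ALL.len();
}

// Compile-time assertion: the build fails if the vocabulary size drifts.
const _: () = assert!(Tool::COUNT == 2);

/// Hypothetical error returned when a precondition is not met.
#[derive(Debug, PartialEq, Eq)]
pub struct PreconditionFailure(pub &'static str);

/// Hypothetical server-side session state; the flag is set by a server
/// action (the button click), never by the LLM.
pub struct Session {
    pub user_confirmed: bool,
}

/// Renders fields in key order. BTreeMap iterates in sorted order, so the
/// same input always produces the same bytes; HashMap gives no such guarantee.
pub fn render(fields: &BTreeMap<String, String>) -> String {
    fields.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
}

/// Refuses to emit unless the confirmation flag is set on the session.
pub fn emit_package(
    session: &Session,
    fields: &BTreeMap<String, String>,
) -> Result<String, PreconditionFailure> {
    if !session.user_confirmed {
        return Err(PreconditionFailure("user has not confirmed"));
    }
    Ok(render(fields))
}
```

Inserting `b` before `a` still renders `a` first, which is the property that makes byte-reproducible output practical. The gate reads only server state, so nothing the model says can flip it.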
config/ is the source of truth for modalities, archetypes, atoms, compute profiles, gene panels, plot affordances, and downstream-policy contracts. config/archetypes/ and config/stage-atoms/ carry their own READMEs. Runtime environment variables are catalogued in .env.example.
User guide: USERS.md. Methods summary: METHODS.md. Contributor guide: CONTRIBUTING.md. ECAA spec: docs/ecaa-spec/.
Apache-2.0 — see LICENSE.