Merged
Makefile.toml: 2 changes (1 addition, 1 deletion)
@@ -811,7 +811,7 @@ workspace = false
command = "bash"
args = [
"-lc",
-"set -euo pipefail; docker compose -f docker-compose.baseline.yml run --build --rm baseline-runner bash scripts/real-world-live-adapters.sh",
+"set -euo pipefail; lightrag_start=\"$(printenv ELF_LIGHTRAG_CONTEXT_START || true)\"; graphiti_start=\"$(printenv ELF_GRAPHITI_ZEP_SMOKE_START || true)\"; status=0; if [ \"$lightrag_start\" = \"1\" ]; then docker compose -f docker-compose.baseline.yml --profile lightrag up -d lightrag; fi; if [ \"$graphiti_start\" = \"1\" ]; then docker compose -f docker-compose.baseline.yml --profile graphiti-zep up -d graphiti-falkordb; fi; docker compose -f docker-compose.baseline.yml run --build --rm -e ELF_REAL_WORLD_LIVE_ENABLE_RAGFLOW -e ELF_REAL_WORLD_LIVE_ENABLE_LIGHTRAG -e ELF_REAL_WORLD_LIVE_ENABLE_GRAPHRAG -e ELF_REAL_WORLD_LIVE_ENABLE_GRAPHITI_ZEP -e ELF_REAL_WORLD_LIVE_ENABLE_GRAPHIFY -e ELF_RAGFLOW_SMOKE_START -e ELF_RAGFLOW_SMOKE_ACCEPT_RESOURCE_ENVELOPE -e ELF_RAGFLOW_SMOKE_ALLOW_ARM -e ELF_RAGFLOW_SMOKE_PULL_IMAGE -e ELF_RAGFLOW_SMOKE_CLEANUP -e ELF_RAGFLOW_SMOKE_DEVICE -e ELF_RAGFLOW_API_PORT -e ELF_RAGFLOW_API_BASE -e ELF_RAGFLOW_API_KEY -e RAGFLOW_API_KEY -e ELF_RAGFLOW_SMOKE_STARTUP_ATTEMPTS -e ELF_RAGFLOW_SMOKE_STARTUP_INTERVAL_SECONDS -e ELF_RAGFLOW_SMOKE_COMPOSE_TIMEOUT_SECONDS -e ELF_RAGFLOW_REPO_URL -e ELF_RAGFLOW_REF -e ELF_RAGFLOW_IMAGE -e ELF_RAGFLOW_COMPOSE_PROJECT -e ELF_LIGHTRAG_CONTEXT_START -e ELF_LIGHTRAG_API_BASE -e ELF_LIGHTRAG_ADAPTER_ID -e ELF_LIGHTRAG_ADAPTER_NAME -e ELF_LIGHTRAG_STARTUP_ATTEMPTS -e ELF_LIGHTRAG_STARTUP_INTERVAL_SECONDS -e ELF_LIGHTRAG_INDEX_ATTEMPTS -e ELF_LIGHTRAG_INDEX_INTERVAL_SECONDS -e ELF_GRAPHRAG_SMOKE_RUN -e ELF_GRAPHRAG_SMOKE_WORK_DIR -e ELF_GRAPHRAG_SMOKE_INSTALL -e ELF_GRAPHRAG_VERSION -e ELF_GRAPHRAG_PACKAGE -e ELF_GRAPHRAG_REF -e ELF_GRAPHRAG_CHAT_MODEL -e ELF_GRAPHRAG_EMBEDDING_MODEL -e ELF_GRAPHRAG_API_BASE -e ELF_GRAPHRAG_API_KEY -e ELF_GRAPHRAG_INDEX_METHOD -e ELF_GRAPHRAG_QUERY_METHOD -e ELF_GRAPHRAG_TIMEOUT_SECONDS -e ELF_GRAPHRAG_MAX_DOCS -e ELF_GRAPHRAG_MAX_INPUT_CHARS -e ELF_GRAPHITI_ZEP_SMOKE_START -e ELF_GRAPHITI_ZEP_SMOKE_RUN -e ELF_GRAPHITI_ZEP_SMOKE_WORK_DIR -e ELF_GRAPHITI_ZEP_SMOKE_INSTALL -e ELF_GRAPHITI_ZEP_VERSION -e ELF_GRAPHITI_ZEP_PACKAGE -e ELF_GRAPHITI_ZEP_REF -e ELF_GRAPHITI_ZEP_API_BASE -e ELF_GRAPHITI_ZEP_API_KEY -e ELF_GRAPHITI_ZEP_LLM_MODEL -e ELF_GRAPHITI_ZEP_EMBEDDING_MODEL -e ELF_GRAPHITI_ZEP_FALKORDB_HOST -e ELF_GRAPHITI_ZEP_FALKORDB_PORT -e ELF_GRAPHITI_ZEP_FALKORDB_DATABASE -e ELF_GRAPHITI_ZEP_TIMEOUT_SECONDS -e ELF_GRAPHITI_ZEP_STARTUP_ATTEMPTS -e ELF_GRAPHITI_ZEP_STARTUP_INTERVAL_SECONDS -e ELF_GRAPHIFY_SMOKE_RUN -e ELF_GRAPHIFY_SMOKE_WORK_DIR -e ELF_GRAPHIFY_SMOKE_INSTALL -e ELF_GRAPHIFY_PACKAGE -e ELF_GRAPHIFY_REF -e ELF_GRAPHIFY_TIMEOUT_SECONDS -e ELF_GRAPHIFY_QUERY_BUDGET baseline-runner bash scripts/real-world-live-adapters.sh || status=$?; if [ \"$lightrag_start\" = \"1\" ]; then docker compose -f docker-compose.baseline.yml --profile lightrag stop lightrag lightrag-mock-provider >/dev/null 2>&1 || true; fi; if [ \"$graphiti_start\" = \"1\" ]; then docker compose -f docker-compose.baseline.yml --profile graphiti-zep stop graphiti-falkordb >/dev/null 2>&1 || true; fi; exit \"$status\"",
]
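The new task packs its whole lifecycle into one `bash -lc` string. It reads two optional start flags, brings up the matching Compose profiles, runs `baseline-runner` with a long list of `-e NAME` pass-through variables, and remembers the runner's exit code. It then stops whatever it started and exits with that code. A minimal sketch of the same control flow, unrolled into a function, follows. It is illustrative only. `run_task`, `start_service`, `stop_service`, `run_suite`, `PASSTHROUGH`, `CALLS`, and `SUITE_EXIT` are hypothetical names. The stubs replace the real `docker compose` calls so the sketch runs without Docker, and `PASSTHROUGH` lists only three of the task's variables.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stubs standing in for the docker compose calls; they only
# record what would run, so this sketch executes without Docker.
CALLS=""
start_service() { CALLS+="start:$1 "; }
stop_service() { CALLS+="stop:$1 "; }
# Stand-in for `docker compose ... run --build --rm ... baseline-runner ...`.
# SUITE_EXIT (hypothetical) simulates the runner's exit code.
run_suite() { CALLS+="run:$# "; return "${SUITE_EXIT:-0}"; }

# A small subset of the task's pass-through variable names, for illustration.
PASSTHROUGH=(ELF_LIGHTRAG_CONTEXT_START ELF_GRAPHITI_ZEP_SMOKE_START ELF_GRAPHIFY_SMOKE_RUN)

run_task() {
  local lightrag_start graphiti_start name status=0
  # `|| true` keeps an unset variable from aborting under `set -e`;
  # the flag is simply empty. printenv only sees exported variables.
  lightrag_start="$(printenv ELF_LIGHTRAG_CONTEXT_START || true)"
  graphiti_start="$(printenv ELF_GRAPHITI_ZEP_SMOKE_START || true)"

  # Build the `-e NAME` pairs from an array instead of one hand-written line.
  local env_args=()
  for name in "${PASSTHROUGH[@]}"; do
    env_args+=(-e "$name")
  done

  if [ "$lightrag_start" = "1" ]; then start_service lightrag; fi
  if [ "$graphiti_start" = "1" ]; then start_service graphiti-falkordb; fi

  # Record a failing exit code instead of exiting, so cleanup always runs.
  run_suite "${env_args[@]}" || status=$?

  # Best-effort cleanup: stop errors are ignored, as in the task.
  if [ "$lightrag_start" = "1" ]; then stop_service lightrag || true; fi
  if [ "$graphiti_start" = "1" ]; then stop_service graphiti-falkordb || true; fi

  return "$status"
}
```

With `ELF_LIGHTRAG_CONTEXT_START=1` exported and a runner that fails, the sketch still stops `lightrag` before returning the runner's code. Generating the `-e` arguments from an array is one way to keep a list this long reviewable. Whether it fits the repository's `Makefile.toml` conventions is an assumption the diff does not settle.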

README.md: 22 changes (17 additions, 5 deletions)
@@ -161,10 +161,18 @@ provider-backed ELF evidence was required.
1 incomplete, 2 blocked, and 12 not_encoded jobs.
- Expanded adapter-pack coverage after XY-834: the real-world external adapter
manifest now includes `research_gate` records for RAGFlow, LightRAG, GraphRAG,
-Graphiti/Zep, Letta, LangGraph, nanograph, llm-wiki, gbrain, graphify, and deeper
-qmd/OpenViking profiles. These records carry source/setup/runtime/resource/retry
-metadata and typed `blocked`, `incomplete`, `wrong_result`, or `not_encoded` states;
-they are not fixture-backed or live adapter pass evidence.
+Graphiti/Zep, Letta, LangGraph, nanograph, llm-wiki, gbrain, and deeper
+qmd/OpenViking profiles, while graphify now has a scored tiny Docker smoke record.
+These records carry source/setup/runtime/resource/retry metadata and typed
+`blocked`, `incomplete`, `wrong_result`, or `not_encoded` states; they are not
+fixture-backed or live adapter pass evidence.
+- Graph/RAG scored-smoke promotion after XY-900: RAGFlow, LightRAG, GraphRAG,
+Graphiti/Zep, and graphify smokes now emit scored or typed non-pass
+`real_world_job` adapter reports when run. graphify currently reaches a tiny Docker
+graph/report smoke and scores `wrong_result`; the other in-scope projects remain
+typed blocked or incomplete without explicit service, resource, or provider setup.
+These reports preserve the smoke-only boundary and do not create an ELF win claim
+against graph/RAG strengths.
- The benchmark runner and report publisher are checked in and Docker-isolated:
`cargo make baseline-live-docker`, `cargo make baseline-backfill-docker`,
`cargo make baseline-production-private-addendum`,
@@ -183,6 +191,7 @@ Detailed evidence and interpretation:
- [Real-World Comparison Report - June 10, 2026](docs/guide/benchmarking/2026-06-10-real-world-comparison-report.md)
- [Live Real-World Adapter Sweep Report - June 10, 2026](docs/guide/benchmarking/2026-06-10-live-real-world-sweep-report.md)
- [Post-Adapter Production Adoption Refresh - June 10, 2026](docs/guide/benchmarking/2026-06-10-production-adoption-refresh.md)
+- [Graph/RAG Scored Smoke Adapter Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-graph-rag-scored-smoke-adapter-report.md)
- [Live Baseline Benchmark Runbook](docs/guide/benchmarking/live_baseline_benchmark.md)
- [Single-User Production Runbook](docs/guide/single_user_production.md)
- Benchmark contract:
@@ -254,6 +263,9 @@ Detailed comparison, mechanism-level analysis, and source map:
- [Real-World Comparison Report - June 10, 2026](docs/guide/benchmarking/2026-06-10-real-world-comparison-report.md)
- [Live Real-World Adapter Sweep Report - June 10, 2026](docs/guide/benchmarking/2026-06-10-live-real-world-sweep-report.md)
- [Post-Adapter Production Adoption Refresh - June 10, 2026](docs/guide/benchmarking/2026-06-10-production-adoption-refresh.md)
+- [Competitor Strength Evidence Matrix - June 11, 2026](docs/guide/benchmarking/2026-06-11-competitor-strength-evidence-matrix.md)
+- [Temporal History Competitor Gap Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-temporal-history-competitor-gap-report.md)
+- [Graph/RAG Scored Smoke Adapter Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-graph-rag-scored-smoke-adapter-report.md)
- [Live Baseline Benchmark Runbook](docs/guide/benchmarking/live_baseline_benchmark.md)
- [Real-World Agent Memory Benchmark](docs/guide/benchmarking/real_world_agent_memory_benchmark.md)
- [External Memory Improvement Plan](docs/guide/research/external_memory_improvement_plan.md)
@@ -263,7 +275,7 @@ Detailed comparison, mechanism-level analysis, and source map:
- [Real-World Benchmark Dimension Research Run](docs/research/2026-06-09-xy-841-external-memory-benchmark-dimensions.json)
- [RAG/Graph Adapter Feasibility Research Run](docs/research/2026-06-10-xy-882-rag-graph-adapter-feasibility.json)

-Latest real-world benchmark report: June 10, 2026. Latest external research refresh:
+Latest real-world benchmark report: June 11, 2026. Latest external research refresh:
June 10, 2026.

## Documentation