From 8408429274a5ef16a907ffca7bbd8fd8d2dac70c Mon Sep 17 00:00:00 2001
From: Yvette Carlisle
Date: Tue, 23 Jun 2026 03:55:48 +0800
Subject: [PATCH] {"schema":"decodex/commit/1","summary":"Add P3 competitor-strength absorption closeout report","authority":"XY-1072"}

---
 README.md                                     |  23 ++-
 ...competitor-strength-absorption-report.json | 182 ++++++++++++++++++
 .../tests/real_world_job_benchmark.rs         | 146 ++++++++++++++
 ...3-competitor-strength-absorption-report.md | 150 +++++++++++++++
 docs/evidence/benchmarking/index.md           |   1 +
 docs/log.md                                   |   8 +
 6 files changed, 504 insertions(+), 6 deletions(-)
 create mode 100644 apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json
 create mode 100644 docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md

diff --git a/README.md b/README.md
index 954fb64..fd00959 100644
--- a/README.md
+++ b/README.md
@@ -299,6 +299,15 @@ provider-backed ELF evidence was required.
   comparison gates. This improves auditability only: no graph-memory parity,
   OpenViking trajectory win/tie/loss, hosted Zep, private-corpus, or provider-backed
   quality claim is made.
+- P3 competitor-strength absorption closeout after XY-1072: the June 23 closeout
+  publishes a product-by-product absorption report for qmd, PageIndex/OpenKB,
+  mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, RAGFlow, GraphRAG, and LightRAG.
+  ELF is strongest at governed source-linked memory and knowledge authority, while
+  qmd replay/debug ergonomics, PageIndex/OpenKB tree/wiki artifacts, mem0/OpenMemory
+  history and UI/export, Letta core/archive, Graphiti/Zep temporal graph validity,
+  OpenViking trajectory, and graph/RAG citation/navigation remain optimization inputs
+  or typed blockers. The report makes P4 queue items inspectable but applies no
+  `decodex:queued:elf` label.
 - Operator-approved public-proxy addendum after XY-930: the June 19 follow-up runs
   `cargo make baseline-production-private-addendum` with a simulated/public-proxy
   production corpus manifest approved for this stage. The run records 12 documents,
@@ -441,6 +450,7 @@ Detailed evidence and interpretation:
 - [mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md)
 - [Temporal and Trajectory Adapter Coverage Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md)
 - [Graph/RAG Adapter Matrix Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md)
+- [P3 Competitor-Strength Absorption Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md)
 - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md)
 - [Single-User Production Runbook](docs/runbook/single_user_production.md)
 - Benchmark contract:
@@ -536,6 +546,7 @@ Detailed comparison, mechanism-level analysis, and source map:
 - [mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md)
 - [Temporal and Trajectory Adapter Coverage Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md)
 - [Graph/RAG Adapter Matrix Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md)
+- [P3 Competitor-Strength Absorption Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md)
 - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md)
 - [Real-World Agent Memory Benchmark](docs/runbook/benchmarking/real_world_agent_memory_benchmark.md)
 - [External Memory Improvement Plan](docs/evidence/external_memory/external_memory_improvement_plan.md)
@@ -554,12 +565,12 @@ Report - June 20, 2026, and the Live Knowledge-Page Rebuild/Lint Report - June 2
 2026; June 22 adds the P1 Memory Authority Closeout Report, P2 Knowledge Workspace
 PageIndex/OpenKB Closeout Report, PageIndex/OpenKB Same-Corpus Adapter Report, and
 mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report;
-June 23 adds the Temporal and Trajectory Adapter Coverage Report and the Graph/RAG
-Adapter Matrix Report after the June 19 XY-930 operator-approved public-proxy
-production addendum and service-native Dreaming readback, the qmd debug-ergonomics
-Dreaming retest, the June 17 competitor-strength closeout, and the June 16 temporal
-reconciliation, live consolidation self-check, proactive-brief, and scheduled-memory
-scoring evidence.
+June 23 adds the Temporal and Trajectory Adapter Coverage Report, the Graph/RAG
+Adapter Matrix Report, and the P3 Competitor-Strength Absorption Report after the
+June 19 XY-930 operator-approved public-proxy production addendum and service-native
+Dreaming readback, the qmd debug-ergonomics Dreaming retest, the June 17
+competitor-strength closeout, and the June 16 temporal reconciliation, live
+consolidation self-check, proactive-brief, and scheduled-memory scoring evidence.
 
 ## Documentation
 
diff --git a/apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json b/apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json
new file mode 100644
index 0000000..bd9cd5b
--- /dev/null
+++ b/apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json
@@ -0,0 +1,182 @@
+{
+  "schema": "elf.p3_competitor_strength_absorption_report/v1",
+  "authority": "XY-1072",
+  "phase": "P3 competitor-strength adapters closeout",
+  "generated_at": "2026-06-23T00:00:00Z",
+  "report_markdown": "docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md",
+  "self_assessment": {
+    "verdict": "pass_with_p4_queue_ready_after_main_thread_acceptance",
+    "strongest_at": "ELF is strongest at governed, source-linked, reviewable memory and knowledge authority across Source Library, Memory Authority, Knowledge Workspace, graph-lite reports, Dreaming review queue, and recall/debug readback.",
+    "p4_queue_ready_after_main_thread_acceptance": true,
+    "p4_queued_label_applied": false,
+    "typed_non_pass_states_are_not_wins": true
+  },
+  "rerun_evidence": [
+    {
+      "command": "cargo make real-world-memory-pageindex-openkb",
+      "status": "pass",
+      "artifact_json": "tmp/real-world-memory/pageindex-openkb/report.json",
+      "result": "2 jobs, 0 pass, 0 wrong_result, 0 incomplete, 2 blocked"
+    },
+    {
+      "command": "cargo make real-world-memory-mem0-openmemory-letta",
+      "status": "pass",
+      "artifact_json": "tmp/real-world-memory/mem0-openmemory-letta/report.json",
+      "result": "4 jobs, 1 pass, 0 wrong_result, 0 incomplete, 3 blocked"
+    },
+    {
+      "command": "cargo make real-world-memory-context-trajectory",
+      "status": "pass",
+      "artifact_json": "tmp/real-world-memory/context-trajectory/report.json",
+      "result": "3 jobs, 0 pass, 0 wrong_result, 0 incomplete, 3 blocked"
+    },
+    {
+      "command": "cargo make real-world-memory-graph-rag",
+      "status": "pass",
+      "artifact_json": "tmp/real-world-memory/graph-rag/report.json",
+      "result": "5 jobs, 0 pass, 1 wrong_result, 1 incomplete, 3 blocked"
+    }
+  ],
+  "product_strengths": [
+    {
+      "product": "qmd",
+      "current_status": "mixed",
+      "absorbed_by_elf": "ELF recall/debug now exposes trace hydration, replay commands, candidate-drop visibility, and selected-but-not-narrated evidence in the operator-debug slice.",
+      "remains_stronger_elsewhere": "qmd still has the default top-k JSON artifact and short local CLI replay edge; expansion, dense/sparse, fusion, and rerank attribution parity is not proven.",
+      "blocked_or_missing_adapter": "Comparable qmd-style immediate candidate replay with expansion, fusion, rerank, and dropped-candidate details.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md",
+      "p4_queue_item": "qmd_candidate_replay_parity"
+    },
+    {
+      "product": "VectifyAI PageIndex",
+      "current_status": "blocked",
+      "absorbed_by_elf": "ELF Source Library has long-document source records, hydrated excerpts, source refs, and explicit same-corpus PageIndex blocker requirements.",
+      "remains_stronger_elsewhere": "PageIndex remains the reference for vectorless long-document tree retrieval and PageIndex MCP direction.",
+      "blocked_or_missing_adapter": "Contained PageIndex tree artifact, cited node paths, traversal output, MCP readback, and setup/runtime metadata mapped to ELF source ids.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md",
+      "p4_queue_item": "source_library_tree_and_wiki_adapters"
+    },
+    {
+      "product": "VectifyAI OpenKB",
+      "current_status": "blocked",
+      "absorbed_by_elf": "ELF Knowledge Workspace has source-linked project/entity/concept/issue pages, stale lint, watch/rebuild, and version-diff readback.",
+      "remains_stronger_elsewhere": "OpenKB remains the reference for compiled wiki export, saved exploration, concept/entity indexes, lint, watch, and recompile workflow.",
+      "blocked_or_missing_adapter": "Contained OpenKB wiki export, entity/concept index export, lint output, saved exploration state, and watch/recompile trace mapped to ELF source ids.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md",
+      "p4_queue_item": "source_library_tree_and_wiki_adapters"
+    },
+    {
+      "product": "mem0/OpenMemory",
+      "current_status": "split_pass_and_blocked",
+      "absorbed_by_elf": "The P3 slice maps mem0 SDK Memory.history, scoped search, and local get_all export-style output to source ids and keeps OpenMemory product evidence separate.",
+      "remains_stronger_elsewhere": "mem0 remains stronger on explicit local SDK ADD, UPDATE, DELETE history readback; OpenMemory remains the product UI/export reference.",
+      "blocked_or_missing_adapter": "OpenMemory product container, app database export, browser/API/export-helper readback, hosted Platform evidence, and optional graph memory remain unproven.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md",
+      "p4_queue_item": "memory_history_export_and_core_archive"
+    },
+    {
+      "product": "Letta",
+      "current_status": "blocked",
+      "absorbed_by_elf": "The P3 slice names ELF core-block and archival source ids that a contained Letta export/readback must map before scoring.",
+      "remains_stronger_elsewhere": "Letta remains the reference for core/archive memory product modeling and export/readback shape.",
+      "blocked_or_missing_adapter": "Exported Letta core block JSON, archival passage/readback/search JSON, visibility/provenance metadata, and source ids.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md",
+      "p4_queue_item": "memory_history_export_and_core_archive"
+    },
+    {
+      "product": "Graphiti/Zep",
+      "current_status": "blocked",
+      "absorbed_by_elf": "The Graphiti/Zep fixture now names current facts, historical facts, provider boundary evidence, and the blocked trace stage.",
+      "remains_stronger_elsewhere": "Graphiti/Zep remains the temporal graph validity reference; hosted Zep and provider-backed graph quality are not proven locally.",
+      "blocked_or_missing_adapter": "Provider-backed Graphiti search output that maps current and historical facts to validity windows and same-corpus source ids.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md",
+      "p4_queue_item": "temporal_trajectory_graph_rag_adapters"
+    },
+    {
+      "product": "OpenViking",
+      "current_status": "blocked",
+      "absorbed_by_elf": "The context-trajectory fixtures expose same-corpus, hierarchy, recursive-expansion, rejected-sibling, decoy, and comparison gates as typed blockers.",
+      "remains_stronger_elsewhere": "OpenViking remains the reference for filesystem-like context URIs, hierarchy selection, staged retrieval trajectory, and recursive expansion.",
+      "blocked_or_missing_adapter": "Comparable same-corpus staged artifacts, selected hierarchy nodes, rejected siblings or decoys, pruned branches, and expansion paths.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md",
+      "p4_queue_item": "temporal_trajectory_graph_rag_adapters"
+    },
+    {
+      "product": "RAGFlow",
+      "current_status": "blocked_or_not_encoded",
+      "absorbed_by_elf": "The adapter matrix turns RAGFlow retrieval, citation, navigation, stale-source, faithfulness, and knowledge-compilation expectations into explicit rows.",
+      "remains_stronger_elsewhere": "RAGFlow remains a document-processing and RAG workflow reference; no same-corpus quality pass exists.",
+      "blocked_or_missing_adapter": "Answer text and selected reference chunks with document ids, chunk ids, content, metadata, and stale-source outputs mapped to evidence ids.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md",
+      "p4_queue_item": "temporal_trajectory_graph_rag_adapters"
+    },
+    {
+      "product": "GraphRAG",
+      "current_status": "blocked_or_not_encoded",
+      "absorbed_by_elf": "The adapter matrix names output-table, citation, graph/community navigation, faithfulness, and stale-source requirements without claiming parity.",
+      "remains_stronger_elsewhere": "GraphRAG remains the reference for graph-oriented retrieval, community reports, and graph summary synthesis.",
+      "blocked_or_missing_adapter": "Mapped documents, text_units, communities, reports, entities, relationships, local-search answers, and unsupported/stale claim lint.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md",
+      "p4_queue_item": "temporal_trajectory_graph_rag_adapters"
+    },
+    {
+      "product": "LightRAG",
+      "current_status": "incomplete_or_not_encoded",
+      "absorbed_by_elf": "The adapter matrix records context/source reference, retrieval, navigation, faithfulness, stale-source, and knowledge-compilation coverage gaps.",
+      "remains_stronger_elsewhere": "LightRAG remains the lightweight graph/RAG architecture reference; context export is incomplete in the current evidence.",
+      "blocked_or_missing_adapter": "Opt-in Docker API output with context, file paths, snippets, source references, and answer checking mapped to evidence ids.",
+      "evidence_report": "docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md",
+      "p4_queue_item": "temporal_trajectory_graph_rag_adapters"
+    }
+  ],
+  "p4_optimization_queue": [
+    {
+      "key": "qmd_candidate_replay_parity",
+      "priority": "P0",
+      "ready_after_main_thread_acceptance": true,
+      "queued_label_applied": false,
+      "scope": "Emit comparable immediate candidate replay artifacts with expansion, dense/sparse, fusion, rerank, dropped evidence, and one-command replay lines."
+    },
+    {
+      "key": "adapter_outcome_grammar_and_metrics",
+      "priority": "P0",
+      "ready_after_main_thread_acceptance": true,
+      "queued_label_applied": false,
+      "scope": "Harden public comparison grammar, typed outcomes, expected evidence recall, irrelevant context ratio, unsupported-claim counts, and resource metrics."
+    },
+    {
+      "key": "source_library_tree_and_wiki_adapters",
+      "priority": "P1",
+      "ready_after_main_thread_acceptance": true,
+      "queued_label_applied": false,
+      "scope": "Materialize PageIndex tree artifacts and OpenKB wiki/index/lint/watch outputs over the same corpus."
+    },
+    {
+      "key": "memory_history_export_and_core_archive",
+      "priority": "P1",
+      "ready_after_main_thread_acceptance": true,
+      "queued_label_applied": false,
+      "scope": "Harden mem0/OpenMemory history/export comparison and Letta core/archive export/readback mapping."
+    },
+    {
+      "key": "temporal_trajectory_graph_rag_adapters",
+      "priority": "P1",
+      "ready_after_main_thread_acceptance": true,
+      "queued_label_applied": false,
+      "scope": "Materialize Graphiti/Zep temporal validity, OpenViking trajectory, and RAGFlow/GraphRAG/LightRAG citation/navigation artifacts."
+    }
+  ],
+  "claim_boundaries": {
+    "allowed": [
+      "ELF is strongest at governed source-linked memory and knowledge authority in the checked-in evidence.",
+      "P3 absorbed competitor strengths into ELF-owned evidence surfaces, same-corpus blockers, and P4 optimization inputs.",
+      "The P4 optimization queue is ready for main-thread inspection after this closeout passes self-assessment."
+    ],
+    "not_allowed": [
+      "Typed non-pass states are not wins.",
+      "Do not claim ELF broadly beats qmd, PageIndex, OpenKB, mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, RAGFlow, GraphRAG, or LightRAG.",
+      "Do not claim private-corpus, hosted, provider-backed, UI/export, graph/RAG, or core/archive parity from fixture-only, blocked, incomplete, wrong-result, or not-encoded evidence.",
+      "Do not apply decodex:queued:elf to a P4 issue until the main thread accepts the P3 closeout."
+    ]
+  }
+}
diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs
index 25025e1..8f1e3a2 100644
--- a/apps/elf-eval/tests/real_world_job_benchmark.rs
+++ b/apps/elf-eval/tests/real_world_job_benchmark.rs
@@ -291,6 +291,10 @@ fn graph_rag_adapter_matrix_report_json_path() -> Result<PathBuf> {
     report_snapshot_path("2026-06-23-graph-rag-adapter-matrix-report.json")
 }
 
+fn p3_competitor_strength_absorption_report_json_path() -> Result<PathBuf> {
+    report_snapshot_path("2026-06-23-p3-competitor-strength-absorption-report.json")
+}
+
 fn operator_approved_public_proxy_private_addendum_report_json_path() -> Result<PathBuf> {
     report_snapshot_path(
         "2026-06-19-operator-approved-public-proxy-production-private-addendum.json",
@@ -377,6 +381,14 @@ fn graph_rag_adapter_matrix_report_markdown_path() -> Result<PathBuf> {
         .join("2026-06-23-graph-rag-adapter-matrix-report.md"))
 }
 
+fn p3_competitor_strength_absorption_report_markdown_path() -> Result<PathBuf> {
+    Ok(workspace_root()?
+        .join("docs")
+        .join("evidence")
+        .join("benchmarking")
+        .join("2026-06-23-p3-competitor-strength-absorption-report.md"))
+}
+
 fn graph_topic_map_report_markdown_path() -> Result<PathBuf> {
     Ok(workspace_root()?
         .join("docs")
@@ -4550,6 +4562,140 @@ fn graph_rag_adapter_matrix_report_preserves_no_parity_claims() -> Result<()> {
     Ok(())
 }
 
+#[test]
+fn p3_competitor_strength_absorption_report_preserves_claim_boundaries() -> Result<()> {
+    let report = serde_json::from_str::<Value>(&fs::read_to_string(
+        p3_competitor_strength_absorption_report_json_path()?,
+    )?)?;
+    let markdown = fs::read_to_string(p3_competitor_strength_absorption_report_markdown_path()?)?;
+    let benchmarking_index = fs::read_to_string(benchmarking_index_path()?)?;
+    let readme = fs::read_to_string(readme_path()?)?;
+
+    assert_eq!(
+        report.pointer("/schema").and_then(Value::as_str),
+        Some("elf.p3_competitor_strength_absorption_report/v1")
+    );
+    assert_eq!(report.pointer("/authority").and_then(Value::as_str), Some("XY-1072"));
+    assert_eq!(
+        report.pointer("/self_assessment/verdict").and_then(Value::as_str),
+        Some("pass_with_p4_queue_ready_after_main_thread_acceptance")
+    );
+    assert_eq!(
+        report.pointer("/self_assessment/p4_queued_label_applied").and_then(Value::as_bool),
+        Some(false)
+    );
+    assert_eq!(
+        report
+            .pointer("/self_assessment/typed_non_pass_states_are_not_wins")
+            .and_then(Value::as_bool),
+        Some(true)
+    );
+
+    let products = array_at(&report, "/product_strengths")?;
+
+    for product in [
+        "qmd",
+        "VectifyAI PageIndex",
+        "VectifyAI OpenKB",
+        "mem0/OpenMemory",
+        "Letta",
+        "Graphiti/Zep",
+        "OpenViking",
+        "RAGFlow",
+        "GraphRAG",
+        "LightRAG",
+    ] {
+        find_by_field(products, "/product", product)?;
+    }
+
+    let qmd = find_by_field(products, "/product", "qmd")?;
+    let pageindex = find_by_field(products, "/product", "VectifyAI PageIndex")?;
+    let mem0 = find_by_field(products, "/product", "mem0/OpenMemory")?;
+    let graphiti = find_by_field(products, "/product", "Graphiti/Zep")?;
+    let lightrag = find_by_field(products, "/product", "LightRAG")?;
+
+    assert_eq!(qmd.pointer("/current_status").and_then(Value::as_str), Some("mixed"));
+    assert!(
+        qmd.pointer("/remains_stronger_elsewhere")
+            .and_then(Value::as_str)
+            .is_some_and(|value| value.contains("top-k JSON"))
+    );
+    assert_eq!(pageindex.pointer("/current_status").and_then(Value::as_str), Some("blocked"));
+    assert_eq!(
+        mem0.pointer("/current_status").and_then(Value::as_str),
+        Some("split_pass_and_blocked")
+    );
+    assert_eq!(graphiti.pointer("/current_status").and_then(Value::as_str), Some("blocked"));
+    assert_eq!(
+        lightrag.pointer("/current_status").and_then(Value::as_str),
+        Some("incomplete_or_not_encoded")
+    );
+
+    let queue = array_at(&report, "/p4_optimization_queue")?;
+
+    for key in [
+        "qmd_candidate_replay_parity",
+        "adapter_outcome_grammar_and_metrics",
+        "source_library_tree_and_wiki_adapters",
+        "memory_history_export_and_core_archive",
+        "temporal_trajectory_graph_rag_adapters",
+    ] {
+        let item = find_by_field(queue, "/key", key)?;
+
+        assert_eq!(
+            item.pointer("/ready_after_main_thread_acceptance").and_then(Value::as_bool),
+            Some(true)
+        );
+        assert_eq!(item.pointer("/queued_label_applied").and_then(Value::as_bool), Some(false));
+    }
+
+    assert_product_queue_items_reference_queue(products, queue)?;
+
+    assert!(array_contains_str(
+        &report,
+        "/claim_boundaries/not_allowed",
+        "Typed non-pass states are not wins."
+    )?);
+    assert!(array_contains_str(
+        &report,
+        "/claim_boundaries/not_allowed",
+        "Do not apply decodex:queued:elf to a P4 issue until the main thread accepts the P3 closeout."
+    )?);
+    assert!(markdown.contains("P3 is decision-ready for main-thread inspection"));
+    assert!(markdown.contains("Typed non-pass states are not wins"));
+    assert!(markdown.contains("No P4 issue receives `decodex:queued:elf`"));
+    assert!(benchmarking_index.contains("2026-06-23-p3-competitor-strength-absorption-report.md"));
+    assert!(readme.contains("P3 competitor-strength absorption closeout after XY-1072"));
+    assert!(readme.contains("`decodex:queued:elf` label"));
+
+    Ok(())
+}
+
+fn assert_product_queue_items_reference_queue(products: &[Value], queue: &[Value]) -> Result<()> {
+    let queue_keys = queue
+        .iter()
+        .filter_map(|item| item.pointer("/key").and_then(Value::as_str))
+        .collect::<Vec<_>>();
+
+    for product in products {
+        let product_name = product
+            .pointer("/product")
+            .and_then(Value::as_str)
+            .ok_or_else(|| eyre::eyre!("product row is missing product name"))?;
+        let queue_item = product
+            .pointer("/p4_queue_item")
+            .and_then(Value::as_str)
+            .ok_or_else(|| eyre::eyre!("product {product_name} is missing p4_queue_item"))?;
+
+        assert!(
+            queue_keys.contains(&queue_item),
+            "product {product_name} references missing P4 queue item {queue_item}"
+        );
+    }
+
+    Ok(())
+}
+
 fn find_matrix_row<'a>(rows: &'a [Value], adapter: &str, dimension: &str) -> Result<&'a Value> {
     rows.iter()
         .find(|row| {
diff --git a/docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md b/docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md
new file mode 100644
index 0000000..1eb515b
--- /dev/null
+++ b/docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md
@@ -0,0 +1,150 @@
+---
+type: Evidence
+title: "P3 Competitor-Strength Absorption Report - June 23, 2026"
+description: "P3 closeout report for competitor-strength absorption, remaining external strengths, typed blockers, and the P4 optimization queue."
+resource: docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md
+status: active
+authority: evidence
+owner: benchmarking
+last_verified: 2026-06-23
+tags:
+  - docs
+  - evidence
+  - benchmarking
+  - p3-closeout
+source_refs:
+  - apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json
+code_refs:
+  - Makefile.toml
+  - apps/elf-eval/fixtures/real_world_external_adapters/pageindex_openkb/
+  - apps/elf-eval/fixtures/real_world_external_adapters/mem0_openmemory_letta/
+  - apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/
+  - apps/elf-eval/fixtures/real_world_memory/context_trajectory/
+related:
+  - docs/spec/agent_memory_knowledge_system_v1.md
+  - docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md
+  - docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md
+  - docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md
+  - docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md
+  - docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md
+drift_watch:
+  - docs/evidence/benchmarking/2026-06-23-p3-competitor-strength-absorption-report.md
+  - apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json
+  - docs/evidence/benchmarking/index.md
+  - README.md
+---
+# P3 Competitor-Strength Absorption Report - June 23, 2026
+
+Purpose: Close XY-1072 by publishing which competitor strengths ELF absorbed, which
+remain stronger elsewhere, and which adapters are still blocked before P4 quality
+hardening.
+Status: evidence
+Read this when: You need the P3 closeout answer for qmd, PageIndex/OpenKB,
+mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, RAGFlow, GraphRAG, and LightRAG.
+Not this document: A P4 queue action, hosted/private-corpus proof, or broad
+ELF-over-every-competitor claim.
+Inputs: The June 19 qmd report, June 22 PageIndex/OpenKB and
+mem0/OpenMemory/Letta reports, June 23 temporal/trajectory report, June 23 graph/RAG
+adapter matrix, and their focused rerun commands.
+
+## Executive Judgment
+
+P3 is decision-ready for main-thread inspection. It is not a P4 queue action.
+
+ELF is strongest at governed, source-linked, reviewable memory and knowledge
+authority: Source Library records, Memory Authority correction/history, Knowledge
+Workspace pages, graph-lite reports, Dreaming review queue, and recall/debug readback
+are all typed, source-linked, and bounded. P3 absorbed competitor strengths by turning
+them into ELF-owned evidence surfaces, same-corpus adapter blockers, and concrete P4
+optimization inputs.
+
+The competitor picture is still mixed. qmd keeps the default top-k JSON and short
+local replay edge. PageIndex/OpenKB, OpenMemory UI/export, Letta core/archive,
+Graphiti/Zep temporal validity, OpenViking trajectory, and graph/RAG citation or
+navigation strengths remain blocked, incomplete, or not encoded until comparable
+same-corpus artifacts exist. Typed non-pass states are not wins.
+
+No P4 issue receives `decodex:queued:elf` from this closeout. The queue below is
+ready for main-thread inspection only after this report and validation evidence are
+accepted.
+
+## Rerun Evidence
+
+| Command | Result | Evidence |
+| --- | --- | --- |
+| `cargo make real-world-memory-pageindex-openkb` | `pass` | PageIndex/OpenKB slice remains 2 jobs, 0 pass, 0 wrong_result, 0 incomplete, and 2 blocked. |
+| `cargo make real-world-memory-mem0-openmemory-letta` | `pass` | mem0/OpenMemory/Letta slice remains 4 jobs, 1 pass, 0 wrong_result, 0 incomplete, and 3 blocked. |
+| `cargo make real-world-memory-context-trajectory` | `pass` | OpenViking context-trajectory slice remains 3 jobs, 0 pass, 0 wrong_result, 0 incomplete, and 3 blocked. |
+| `cargo make real-world-memory-graph-rag` | `pass` | Representative graph/RAG slice remains 5 jobs, 0 pass, 1 wrong_result, 1 incomplete, and 3 blocked; the adapter matrix records 0 pass rows. |
+
+Checked-in closeout snapshot:
+
+- `apps/elf-eval/fixtures/report_snapshots/2026-06-23-p3-competitor-strength-absorption-report.json`
+
+## Product Strengths And ELF Response
+
+| Product/reference | What ELF absorbed | What remains stronger elsewhere or blocked | P4 optimization input |
+| --- | --- | --- | --- |
+| qmd | ELF recall/debug exposes trace hydration, replay commands, candidate-drop visibility, and selected-but-not-narrated evidence in the operator-debug slice. | qmd still has the default top-k JSON artifact and short local CLI replay edge; expansion, dense/sparse, fusion, and rerank attribution parity is not proven. | `qmd_candidate_replay_parity` |
+| VectifyAI PageIndex | ELF Source Library has long-document source records, hydrated excerpts, source refs, and explicit same-corpus PageIndex blocker requirements. | PageIndex remains the vectorless long-document tree retrieval and PageIndex MCP reference until tree artifacts, cited node paths, traversal output, and MCP readback map to ELF source ids. | `source_library_tree_and_wiki_adapters` |
+| VectifyAI OpenKB | ELF Knowledge Workspace has source-linked project/entity/concept/issue pages, stale lint, watch/rebuild, and version-diff readback. | OpenKB remains the compiled wiki, saved exploration, concept/entity index, lint, watch, and recompile workflow reference until contained exports map to ELF source ids. | `source_library_tree_and_wiki_adapters` |
+| mem0/OpenMemory | The P3 slice maps mem0 SDK `Memory.history`, scoped search, and local `get_all` export-style output to source ids while keeping OpenMemory product evidence separate. | mem0 remains stronger on explicit local SDK ADD, UPDATE, DELETE history readback; OpenMemory UI/export remains blocked until product-container and app-database exports map same-corpus rows. | `memory_history_export_and_core_archive` |
+| Letta | The P3 slice names ELF core-block and archival source ids that a contained Letta export/readback must map before scoring. | Letta remains the core/archive memory model and export/readback reference until exported core block JSON, archival passage/readback/search JSON, visibility/provenance metadata, and source ids exist. | `memory_history_export_and_core_archive` |
+| Graphiti/Zep | The fixture now names current facts, historical facts, provider-boundary evidence, and the blocked trace stage. | Graphiti/Zep remains the temporal graph validity reference; hosted Zep and provider-backed graph quality are not proven locally. | `temporal_trajectory_graph_rag_adapters` |
+| OpenViking | Context-trajectory fixtures expose same-corpus, hierarchy, recursive-expansion, rejected-sibling, decoy, and comparison gates as typed blockers. | OpenViking remains the filesystem-like URI, hierarchy selection, staged retrieval trajectory, and recursive expansion reference until comparable staged artifacts exist. | `temporal_trajectory_graph_rag_adapters` |
+| RAGFlow | The adapter matrix turns retrieval, citation, navigation, stale-source, faithfulness, and knowledge-compilation expectations into explicit rows. | RAGFlow remains blocked or not encoded until answers and selected reference chunks map document ids, chunk ids, content, metadata, and stale-source outputs to evidence ids. | `temporal_trajectory_graph_rag_adapters` |
+| GraphRAG | The adapter matrix names output-table, citation, graph/community navigation, faithfulness, and stale-source requirements without claiming parity. | GraphRAG remains blocked or not encoded until documents, text units, communities, reports, entities, relationships, local-search answers, and unsupported/stale claim lint map to evidence ids. | `temporal_trajectory_graph_rag_adapters` |
+| LightRAG | The adapter matrix records context/source reference, retrieval, navigation, faithfulness, stale-source, and knowledge-compilation coverage gaps. | LightRAG remains incomplete or not encoded until Docker API output exposes context, file paths, snippets, source references, and answer checking mapped to evidence ids. | `temporal_trajectory_graph_rag_adapters` |
+
+## What ELF Is Strongest At
+
+ELF's durable strength is not a single retrieval trick. It is governed memory change
+control backed by source evidence:
+
+- Source material remains source material until an explicit reviewable memory path
+  promotes it.
+- Memory changes have policy decisions, history, correction, rollback, and recall
+  debug readback.
+- Knowledge pages are derived, cited, linted, rebuildable, and version-diffed.
+- Graph-lite and Dreaming outputs stay source-backed and reviewable.
+- Recall/debug surfaces show selected, dropped, stale, blocked, not-requested, and
+  reviewable context instead of hiding missing evidence behind a broad score.
+
+That makes ELF strongest as an integrated agent memory and knowledge authority
+system. It does not make ELF stronger than each competitor on that competitor's own
+specialty.
+
+## P4 Optimization Queue
+
+The P4 queue is ready for main-thread inspection after this closeout passes
+self-assessment. No queue label is applied here.
+
+| Priority | Queue item | Scope |
+| --- | --- | --- |
+| P0 | `qmd_candidate_replay_parity` | Emit comparable immediate candidate replay artifacts with expansion, dense/sparse, fusion, rerank, dropped evidence, and one-command replay lines. |
+| P0 | `adapter_outcome_grammar_and_metrics` | Harden public comparison grammar, typed outcomes, expected evidence recall, irrelevant context ratio, unsupported-claim counts, and resource metrics. |
+| P1 | `source_library_tree_and_wiki_adapters` | Materialize PageIndex tree artifacts and OpenKB wiki/index/lint/watch outputs over the same corpus. |
+| P1 | `memory_history_export_and_core_archive` | Harden mem0/OpenMemory history/export comparison and Letta core/archive export/readback mapping. |
+| P1 | `temporal_trajectory_graph_rag_adapters` | Materialize Graphiti/Zep temporal validity, OpenViking trajectory, and RAGFlow/GraphRAG/LightRAG citation/navigation artifacts. |
+
+## Claim Boundaries
+
+Allowed:
+
+- ELF is strongest at governed source-linked memory and knowledge authority in the
+  checked-in evidence.
+- P3 absorbed competitor strengths into ELF-owned evidence surfaces, same-corpus
+  blockers, and P4 optimization inputs.
+- The P4 optimization queue is ready for main-thread inspection after this closeout
+  passes self-assessment.
+
+Not allowed:
+
+- Typed non-pass states are not wins.
+- Do not claim ELF broadly beats qmd, PageIndex, OpenKB, mem0/OpenMemory, Letta,
+  Graphiti/Zep, OpenViking, RAGFlow, GraphRAG, or LightRAG.
+- Do not claim private-corpus, hosted, provider-backed, UI/export, graph/RAG, or
+  core/archive parity from fixture-only, blocked, incomplete, wrong-result, or
+  not-encoded evidence.
+- Do not apply `decodex:queued:elf` to a P4 issue until the main thread accepts the
+  P3 closeout.
diff --git a/docs/evidence/benchmarking/index.md b/docs/evidence/benchmarking/index.md
index d80e5bc..58cb526 100644
--- a/docs/evidence/benchmarking/index.md
+++ b/docs/evidence/benchmarking/index.md
@@ -55,3 +55,4 @@ Routes to: Benchmarking evidence concepts under `docs/evidence/benchmarking/`.
 - `2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md`: mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026; adds `cargo make real-world-memory-mem0-openmemory-letta`, maps mem0 SDK history/export outputs to source ids, preserves OpenMemory UI/export as a product blocker, preserves Letta core/archive readback as typed blockers, and makes no hosted/product parity claim.
 - `2026-06-23-temporal-trajectory-adapter-coverage-report.md`: Temporal and Trajectory Adapter Coverage Report - June 23, 2026; refreshes Graphiti/Zep temporal-validity and OpenViking context-trajectory adapter evidence with trace-stage typed blockers, source ids, and explicit no-parity boundaries.
 - `2026-06-23-graph-rag-adapter-matrix-report.md`: Graph/RAG Adapter Matrix Report - June 23, 2026; adds manifest-backed RAGFlow, GraphRAG, and LightRAG rows for retrieval, citation, navigation, stale-source behavior, answer faithfulness, and knowledge compilation while preserving 0 pass rows and no graph/RAG parity claim.
+- `2026-06-23-p3-competitor-strength-absorption-report.md`: P3 Competitor-Strength Absorption Report - June 23, 2026; closes XY-1072 by naming which qmd, PageIndex/OpenKB, mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, RAGFlow, GraphRAG, and LightRAG strengths ELF absorbed, which remain stronger elsewhere or blocked, and which P4 optimization queue items are ready for main-thread inspection without applying a queue label.
diff --git a/docs/log.md b/docs/log.md
index 2ccf716..f3e305c 100644
--- a/docs/log.md
+++ b/docs/log.md
@@ -99,3 +99,11 @@ logs.
   plus a drift audit covering the new admin rebuild endpoint, changed/unchanged/
   stale/blocked section output, stale-section/changed-claim/missing-citation/conflict
   classifications, and reviewable memory-candidate proposal routing.
+
+## 2026-06-23
+
+- Added the P3 competitor-strength absorption closeout report for XY-1072, plus a
+  checked-in snapshot and guard test that keeps qmd, PageIndex/OpenKB,
+  mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, RAGFlow, GraphRAG, and LightRAG
+  claim boundaries explicit while leaving P4 queue labels unapplied pending
+  main-thread acceptance.