diff --git a/README.md b/README.md
index 173714aa..11b5fe2d 100644
--- a/README.md
+++ b/README.md
@@ -178,6 +178,7 @@ Detailed comparison, mechanism-level analysis, and source map:
 - [Live Baseline Benchmark Report - June 9, 2026](docs/guide/benchmarking/2026-06-09-live-baseline-report.md)
 - [Live Baseline Benchmark Runbook](docs/guide/benchmarking/live_baseline_benchmark.md)
+- [External Memory Improvement Plan](docs/guide/research/external_memory_improvement_plan.md)
 - [Detailed External Comparison](docs/guide/research/comparison_external_projects.md)
 - [Research Projects Inventory](docs/guide/research/research_projects_inventory.md)
 - [Agent Memory Selection Research Run](docs/research/2026-06-08-agent-memory-selection.json)
diff --git a/docs/guide/research/external_memory_improvement_plan.md b/docs/guide/research/external_memory_improvement_plan.md
new file mode 100644
index 00000000..f288685e
--- /dev/null
+++ b/docs/guide/research/external_memory_improvement_plan.md
@@ -0,0 +1,569 @@
+# External Memory Improvement Plan - June 9, 2026
+
+Goal: Convert the June 2026 live benchmark, external memory-system research, and Dexter radar operating pattern into an issue-ready ELF improvement plan.
+Read this when: Deciding what to implement next before using ELF as a personal production memory system.
+Inputs: `README.md`, `docs/guide/benchmarking/2026-06-09-live-baseline-report.md`, `docs/guide/research/comparison_external_projects.md`, `docs/guide/research/research_projects_inventory.md`, current Linear readback, and the local Dexter Pattern Radar automation pattern.
+Depends on: `docs/governance.md`, `docs/spec/system_elf_memory_service_v2.md`, and the checked-in live baseline runner.
+Outputs: Prioritized gaps, issue queue, parallelization plan, acceptance criteria, and follow-up radar model.
+
+## Summary Judgment
+
+ELF is currently a credible personal-production candidate for an evidence-bound agent memory service, but it should not be treated as fully proven until the P0 items below land.
+
+The objective position is:
+
+- Better than the tested alternatives on evidence-bound writes, deterministic ingestion boundaries, source-of-truth discipline, rebuildable indexing, multi-tenant service shape, and the current encoded Docker benchmark.
+- Comparable to the best tested alternative, qmd, on local retrieval quality under the smoke scenario, but ELF has a stronger service/provenance model while qmd has stronger local retrieval-debug ergonomics.
+- Behind agentmemory, claude-mem/OpenMemory-style tools, and some managed-memory products on operator UX, visible memory inspection, and turn-by-turn operational comfort.
+- Behind Graphiti/Zep, Letta, and mem0-style systems on some memory semantics: temporal graph validity, explicit memory history, core-vs-archival blocks, and reviewable memory evolution.
+- Not yet proven on large private personal corpus migration, repeated batch backfill, cold-start persistence across every adapter, or long-running unattended production operation.
+
+So the answer is not "ELF is universally better." The current evidence supports "ELF is the better foundation for this repo's desired high-trust, evidence-linked memory system, and it can become the better personal-production choice if the P0 work lands and is benchmarked."
+
+## Evidence Base
+
+### Live Benchmark Evidence
+
+Checked-in report: `docs/guide/benchmarking/2026-06-09-live-baseline-report.md`.
+
+Current encoded result:
+
+- ELF provider stress run: `live-baseline-20260609010854`, `Qwen3-Embedding-8B`, 4096-dimensional provider embeddings, 480 documents, 16 queries, 8 of 8 encoded checks passing, elapsed 1163 seconds.
+- All-project smoke run: `live-baseline-20260609022837`.
+- ELF and qmd passed every encoded smoke check.
+- agentmemory passed same-corpus retrieval but failed or could not complete lifecycle checks.
+- mem0, memsearch, and claude-mem returned wrong same-corpus retrieval results in the encoded smoke.
+- OpenViking was incomplete because its local embedding dependency could not complete inside the Docker runner.
+
+What this proves:
+
+- ELF's current service path can run real provider embeddings through Docker-isolated benchmark scripts.
+- ELF's strict provenance/service model does not prevent it from passing the encoded retrieval checks.
+- 4096-dimensional provider embeddings are operationally usable for the tested scale.
+
+What this does not prove:
+
+- It does not prove ELF beats every project on all retrieval workloads.
+- It does not prove long-running personal production safety.
+- It does not prove private-corpus migration quality.
+- It does not prove viewer/operator ergonomics are competitive.
+- It does not prove every adapter's lifecycle behavior is correctly represented.
+
+### External Project Activity Snapshot
+
+Captured from GitHub API on June 9, 2026. Activity is only a refresh signal, not a quality ranking.
+
+| Project | Stars | Last push | Latest release | Why keep tracking |
+| --- | ---: | --- | --- | --- |
+| rohitg00/agentmemory | 21969 | 2026-06-08 | v0.9.27 | Coding-agent continuity, packaging, viewer, benchmark claims |
+| mem0ai/mem0 | 58095 | 2026-06-09 | cli-node-v0.2.8 | Memory lifecycle, hosted/OpenMemory ecosystem, graph option |
+| zilliztech/memsearch | 1948 | 2026-06-01 | v0.4.6 | Markdown-first store and hybrid retrieval ergonomics |
+| tobi/qmd | 26294 | 2026-06-08 | v2.5.3 | Strong local retrieval pipeline and transparent debug workflow |
+| thedotmack/claude-mem | 81336 | 2026-06-08 | v13.4.1 | Progressive disclosure, auto-capture loop, local viewer |
+| volcengine/OpenViking | 25368 | 2026-06-09 | v0.3.24 | Hierarchical context model and staged retrieval trajectory |
+| nvk/llm-wiki | 547 | 2026-05-23 | v0.10.2 | Evidence-to-knowledge page compilation |
+| garrytan/gbrain | 21723 | 2026-06-08 | none | Human-operable knowledge memory shape |
+| GoogleCloudPlatform/generative-ai | 17001 | 2026-06-09 | none | Managed memory/dreaming reference patterns |
+| safishamsi/graphify | 63545 | 2026-06-08 | v0.8.36 | Graph-compressed navigation and graph reports |
+| nanograph/nanograph | 149 | 2026-05-17 | v1.3.0 | Typed graph ergonomics |
+| letta-ai/letta | 23219 | 2026-05-14 | 0.16.8 | Core memory blocks vs archival memory |
+| langchain-ai/langgraph | 34219 | 2026-06-07 | 1.2.4 | Replay-first state and regression workflow |
+| getzep/graphiti | 27194 | 2026-06-09 | v0.29.2 | Temporal graph memory semantics |
+| infiniflow/ragflow | 82243 | 2026-06-09 | v0.25.6 | Full RAG app benchmark reference |
+| HKUDS/LightRAG | 36316 | 2026-06-09 | v1.5.0 | Lightweight graph/RAG architecture |
+| microsoft/graphrag | 33574 | 2026-06-05 | v3.1.0 | GraphRAG indexing and community reports |
+| virattt/dexter | 26927 | 2026-06-03 | v2026.6.3 | Radar operating model and research-worker patterns |
+
+### Failure Semantics
+
+Use these terms in future benchmark reports and Linear issues:
+
+| Term | Meaning | Example |
+| --- | --- | --- |
+| `pass` | Encoded check completed and returned expected result. | ELF same-corpus retrieval and lifecycle checks pass. |
+| `wrong_result` | The system completed but returned an incorrect memory or missed the expected evidence. | mem0/memsearch/claude-mem smoke retrieval mismatch. |
+| `lifecycle_fail` | Retrieval may work, but update/delete/cold-start/persistence behavior is wrong or incomplete. | agentmemory adapter passing retrieval but not lifecycle. |
+| `incomplete` | The benchmark could not reach the behavioral check due to install/runtime/dependency failure. | OpenViking local embedding install failure in Docker. |
+| `not_encoded` | Capability is not currently covered by the benchmark, so no pass/fail claim is allowed. | Viewer quality, batch backfill UX, graph temporal validity. |
+| `blocked` | A safe test cannot run without external credentials, manual setup, or a dependency outside the issue scope. | Private corpus evaluation before sanitized corpus exists. |
+
+## Priority Program
+
+### P0 - Personal Production Readiness
+
+These items decide whether ELF is safe and comfortable enough for single-user production use.
+
+#### P0.1 Batch Ingest and Backfill Throughput
+
+Problem:
+The current provider stress result is acceptable for 480 documents, but production adoption needs predictable bulk loading and recovery behavior for a larger personal memory corpus.
+
+Adopt from:
+
+- qmd and memsearch: practical local indexing ergonomics.
+- LangGraph-style replay discipline: rerunnable import paths with explicit progress.
+- ELF's own outbox/worker architecture.
+
+Implementation shape:
+
+- Add a bulk ingest/backfill command or HTTP job surface that accepts generated or file-backed note batches.
+- Use micro-batched embedding requests.
+- Add bounded concurrent embedding workers.
+- Use durable job rows with checkpointed offsets and retry state.
+- Use batch Qdrant upserts.
+- Preserve Postgres as source of truth; Qdrant remains rebuildable.
+- Expose batch progress and per-stage timing in report artifacts.
+
+Acceptance:
+
+- Docker-only benchmark profile for 480, 2k, and 10k document backfills.
+- Backfill can be interrupted and resumed without duplicate source notes.
+- Search quality after resume equals a clean run for the same manifest.
+- Provider credentials stay in `.env`; no host-global install path is required.
+
+Linear mapping:
+
+- New issue required: `[ELF prod P0] Add resumable batch ingest and backfill benchmark`.
+- Parallelizable with P0.2 and P0.4.
+
+#### P0.2 Private Production Corpus Benchmark
+
+Problem:
+The generated benchmark is useful but not enough to decide personal production adoption. A sanitized real corpus is needed.
+
+Adopt from:
+
+- agentmemory: coding-agent continuity scenarios.
+- qmd: local query/debug workflow.
+- LangGraph: replayable regression cases.
+
+Implementation shape:
+
+- Build a private/sanitized corpus manifest for real project memory: issues, PRs, worktrees, runbooks, decisions, and stalled-lane recovery notes.
+- Define task-oriented queries: "resume lane", "find prior decision", "explain stale blocker", "recover exact command", "compare project status".
+- Include cold-start, update, delete/expiry, and contradictory-memory cases.
+- Keep the actual private corpus out of public docs if needed, but commit the manifest schema and synthetic fixtures.
+
+Acceptance:
+
+- Benchmark reports separate public generated corpus from private production corpus.
+- Every query has expected evidence ids and allowed alternates.
+- Results record precision, wrong-result count, latency, provider, dimensions, and cost proxy.
+- Any claim that ELF is production-ready must cite this report.
+
+Linear mapping:
+
+- New issue required: `[ELF prod P0] Add private-corpus production adoption benchmark`.
+- Blocks a final "use as personal production memory" decision.
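A minimal sketch of how the P0.2 acceptance criteria (expected evidence ids, allowed alternates, precision, wrong-result count) might be scored. The manifest shape, the `QueryCase` fields, and the `score_run` helper are illustrative assumptions, not part of the existing ELF benchmark runner.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryCase:
    """One manifest entry. Hypothetical schema, not the ELF manifest format."""
    query: str
    expected_ids: frozenset[str]
    allowed_alternates: frozenset[str] = field(default_factory=frozenset)


def score_run(cases, results, top_k=5):
    """Score ranked evidence ids per query against the manifest.

    `results` maps query text to a ranked list of evidence ids.
    A query is counted as `wrong_result` when none of its top-k ids
    is an expected id or an allowed alternate.
    """
    passed = 0
    wrong_queries = []
    precision_sum = 0.0
    for case in cases:
        accepted = case.expected_ids | case.allowed_alternates
        top = results.get(case.query, [])[:top_k]
        relevant = [evidence_id for evidence_id in top if evidence_id in accepted]
        precision_sum += len(relevant) / len(top) if top else 0.0
        if relevant:
            passed += 1
        else:
            wrong_queries.append(case.query)
    total = len(cases)
    return {
        "queries": total,
        "pass": passed,
        "wrong_result": len(wrong_queries),
        "wrong_queries": wrong_queries,
        "mean_precision_at_k": precision_sum / total if total else 0.0,
    }
```

Latency, provider, dimensions, and cost proxy would be recorded alongside this summary in the report; they are left out here to keep the sketch focused on the evidence-matching rule.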
+
+#### P0.3 Single-User Production Runbook and Recovery Contract
+
+Problem:
+Docker compose and strict config now exist, but production use needs backup, restore, upgrade, and disaster-recovery instructions.
+
+Adopt from:
+
+- memsearch: simple local store expectations.
+- Docker-first deployment discipline from the new live baseline.
+- ELF governance: explicit config and source-of-truth boundaries.
+
+Implementation shape:
+
+- Document a single-user production profile using Docker Compose for Postgres, Qdrant, API, worker, and MCP if needed.
+- Add backup/restore commands for Postgres.
+- Add Qdrant rebuild instructions from Postgres.
+- Add health checks, migration checks, and rollback notes.
+- Document provider `.env` expectations and what must not be committed.
+
+Acceptance:
+
+- Fresh machine restore proves notes/search work after Postgres restore and Qdrant rebuild.
+- Runbook includes exact commands and fail-closed warnings.
+- No host-global service install is required.
+
+Linear mapping:
+
+- New issue required: `[ELF prod P0] Add single-user production runbook with backup and restore`.
+- Parallelizable with P0.1 after config paths are stable.
+
+#### P0.4 Retrieval Observability and Viewer Follow-Through
+
+Problem:
+For daily use, API-only debugging is too slow. ELF now has a base read-only viewer path, but retrieval tuning still needs first-class panels.
+
+Adopt from:
+
+- claude-mem/OpenMemory-style viewer ergonomics.
+- qmd transparent expansion/fusion/rerank controls.
+- OpenViking staged retrieval trajectory.
+
+Implementation shape:
+
+- Extend the viewer with search session timelines, candidate lists, dense/BM25/fusion/rerank scores, relation context, latency, and provider metadata.
+- Add a `GET /v2/searches/{id}` or equivalent trace readback if not already exposed for every panel.
+- Keep the viewer read-only for P0.
+- Add direct links from benchmark failures to trace ids where possible.
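The trace readback described in the P0.4 implementation shape could support a check like the one below, which reports the first pipeline stage where an expected memory disappears. This is a sketch under stated assumptions: the trace shape (a mapping from stage name to ranked candidate ids) and the stage names `dense`, `bm25`, `fusion`, `rerank`, and `final` are hypothetical, not the actual ELF trace schema.

```python
def explain_drop(trace: dict[str, list[str]], expected_id: str) -> str:
    """Report where `expected_id` left the pipeline.

    `trace` maps a stage name to its ranked candidate ids (assumed shape).
    Dense and BM25 are treated as parallel retrieval stages; fusion,
    rerank, and final selection are treated as sequential.
    """
    retrieved_by = [
        stage for stage in ("dense", "bm25") if expected_id in trace.get(stage, [])
    ]
    if not retrieved_by:
        return "not_retrieved: absent from dense and bm25 candidates"

    last_seen = "+".join(retrieved_by)
    for stage in ("fusion", "rerank", "final"):
        if expected_id not in trace.get(stage, []):
            return f"dropped_at:{stage} (last seen in {last_seen})"
        last_seen = stage

    rank = trace["final"].index(expected_id) + 1
    return f"kept: final rank {rank}"
```

A benchmark `wrong_result` row could carry this string next to its trace id, so the viewer panel and the report point at the same stage.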
+
+Acceptance:
+
+- A benchmark wrong-result can be debugged from viewer panels without raw database queries.
+- The viewer shows which stage dropped or reranked the expected memory.
+- Read-only authorization and no-mutation behavior are tested.
+
+Linear mapping:
+
+- Existing: XY-19 base read-only viewer is done.
+- Existing follow-up: XY-27 should be prioritized from Backlog to active after P0.1/P0.2 are queued.
+
+#### P0.5 Durable External Adapter and Lifecycle Benchmark Coverage
+
+Problem:
+The current all-project smoke found adapter-level ambiguity. It is not enough to say "agentmemory failed" if the adapter uses an in-memory or incomplete lifecycle path.
+
+Adopt from:
+
+- agentmemory: actual durable package behavior and benchmark claims.
+- ELF benchmark runner: Docker-isolated reproducibility.
+
+Implementation shape:
+
+- Replace mock/in-memory external adapters with durable local modes where feasible.
+- For every external adapter, mark which behaviors are real, mocked, unsupported, or blocked.
+- Add lifecycle checks: update, delete/expire, cold-start reload, and same-corpus retrieval.
+- Keep failures typed with the terms in this document.
+
+Acceptance:
+
+- agentmemory adapter either passes durable lifecycle checks or is explicitly marked blocked with evidence.
+- OpenViking incomplete state records a pinned dependency failure and retry path.
+- qmd smoke pass remains covered and gains scale/stress profiles.
+
+Linear mapping:
+
+- Existing: XY-801 created the initial agentmemory import/baseline boundary and is done.
+- New issue required: `[ELF benchmark P0] Make external adapters lifecycle-durable and fail-typed`.
+
+### P1 - Memory Quality and Product Differentiation
+
+These items make ELF not merely usable, but materially better than adjacent memory products for high-trust agent work.
+
+#### P1.1 Reviewable Consolidation Worker
+
+Problem:
+ELF has the right evidence-bound source model, but long-term memory quality needs consolidation without hidden mutation.
+
+Adopt from:
+
+- Gemini/managed memory "dreaming" direction, but with explicit review.
+- Always-On Memory Agent: background consolidation loop.
+- Dexter: proposal-only memo/readback artifacts.
+
+Implementation shape:
+
+- Implement consolidation jobs over immutable notes/events/traces.
+- Write derived proposals, not source-note rewrites.
+- Include source ids, confidence, unsupported-claim flags, conflicts, and review state.
+- Add apply/discard/defer transitions.
+
+Acceptance:
+
+- Every proposed derived memory is traceable to source evidence.
+- No derived proposal can silently replace source truth.
+- Consolidation output appears in viewer/readback.
+
+Linear mapping:
+
+- Existing foundation: XY-800 is done.
+- New follow-up required: `[ELF vNext P1] Implement reviewable consolidation worker and proposal review flow`.
+
+#### P1.2 Knowledge Memory Pages
+
+Problem:
+Many compact memories remain hard to navigate unless compiled into stable, provenance-linked entity/project/concept pages.
+
+Adopt from:
+
+- llm-wiki and gbrain: maintained knowledge pages.
+- ELF provenance model: every page section cites notes/events.
+
+Implementation shape:
+
+- Build derived pages for entities, concepts, projects, issues, and decisions.
+- Add backlinks, source coverage, stale/unsupported-claim lint, and rebuild commands.
+- Keep pages derived and rebuildable, not authoritative source truth.
+
+Acceptance:
+
+- A project page can be rebuilt from notes and preserves citations.
+- Lint catches unsupported claims and stale source references.
+- Viewer/search can surface page snippets with provenance.
+
+Linear mapping:
+
+- Existing: XY-286 is the right epic and should be expanded with smaller implementation issues.
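The P1.2 lint rule (catch unsupported claims and stale source references) can be illustrated with a short sketch. The `PageSection` shape, the finding labels, and the idea of passing in a set of currently live note ids are assumptions made for illustration; the real derived-page model is still to be defined under XY-286.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSection:
    """One section of a derived knowledge page (hypothetical shape)."""
    heading: str
    text: str
    source_ids: tuple[str, ...]


def lint_page(sections, live_note_ids):
    """Return (heading, finding, source_id) tuples for a derived page.

    - `unsupported_claim`: the section cites no source at all.
    - `stale_source`: the section cites an id that is no longer a live note.
    """
    findings = []
    for section in sections:
        if not section.source_ids:
            findings.append((section.heading, "unsupported_claim", None))
            continue
        for source_id in section.source_ids:
            if source_id not in live_note_ids:
                findings.append((section.heading, "stale_source", source_id))
    return findings
```

Because pages stay derived and rebuildable, a non-empty findings list would be a reason to rebuild or flag the page, never to edit source notes.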
+
+#### P1.3 Temporal Graph-Lite Validity
+
+Problem:
+ELF already persists structured relations, but production memory needs time-aware facts: what was true when, what superseded it, and why.
+
+Adopt from:
+
+- Graphiti/Zep: temporal graph memory semantics.
+- nanograph: typed graph/query ergonomics, without replacing Postgres.
+
+Implementation shape:
+
+- Add `valid_from` and `valid_to` (or `invalidated_at`) semantics for relation facts.
+- Keep append-only relation history.
+- Add APIs for current facts vs historical facts.
+- Extend search `relation_context` to respect temporal validity.
+
+Acceptance:
+
+- Contradictory facts do not overwrite silently.
+- Search can choose current-only or historical relation context.
+- Tests cover invalidation and old-state replay.
+
+Linear mapping:
+
+- Existing related: XY-70 covers graph-lite typed schema/query.
+- New issue required: `[ELF graph P1] Add temporal validity to graph-lite facts`.
+
+#### P1.4 Memory History and Evolution API
+
+Problem:
+Users and agents need to inspect how a memory changed over time, especially when an LLM proposed an update.
+
+Adopt from:
+
+- mem0: lifecycle/event history.
+- ELF ingest decision table: existing audit direction.
+
+Implementation shape:
+
+- Add memory event history for add, update, ignore, reject, expire, derived, applied, and invalidated transitions.
+- Expose history readbacks via HTTP/MCP.
+- Link ingest decisions to note/relation versions.
+
+Acceptance:
+
+- A user can explain why a memory currently exists and what earlier evidence changed it.
+- History survives restart and migration.
+- Benchmark lifecycle checks include history expectations.
+
+Linear mapping:
+
+- New issue required: `[ELF memory P1] Add memory history and evolution readback API`.
+
+#### P1.5 Core Memory Blocks vs Archival Memory
+
+Problem:
+Some memories should be intentionally small, always-attached operating context; most memory should remain retrievable archival context.
+
+Adopt from:
+
+- Letta: core memory blocks vs archival memory.
+- ELF scope controls: explicit attachment and sharing.
+
+Implementation shape:
+
+- Add scoped, read-only memory blocks for stable agent/project instructions.
+- Keep block attachment explicit per tenant/project/agent.
+- Do not let blocks bypass evidence or policy boundaries.
+- Keep blocks inspectable in viewer and MCP readback.
+
+Acceptance:
+
+- Agents can request their attached core blocks separately from search.
+- Blocks have source/provenance metadata and audit history.
+- Archival search remains independent.
+
+Linear mapping:
+
+- New issue required: `[ELF memory P1] Add scoped core memory blocks with archival separation`.
+
+#### P1.6 Search Trajectory and Query Planning
+
+Problem:
+ELF already has expansion, hybrid retrieval, and reranking, but external tools expose the route more clearly.
+
+Adopt from:
+
+- qmd: weighted fusion and local debug knobs.
+- OpenViking: staged retrieval trajectory and recursive retrieval.
+- graphify: graph-compressed navigation hints.
+
+Implementation shape:
+
+- Add stable trace schema for query expansion, dense retrieval, BM25 retrieval, fusion, rerank, graph context, and final selection.
+- Add optional recursive or staged retrieval profiles.
+- Expose search-plan hints without making them hidden authority.
+
+Acceptance:
+
+- Every search result can explain its path.
+- Tuning can be done through config/profile changes and benchmark replay.
+- Wrong-result reports show stage-level cause.
+
+Linear mapping:
+
+- Existing related: XY-27 retrieval observability.
+- New issue may be needed after XY-27: `[ELF retrieval P1] Add staged search trajectory profiles`.
+
+### P2 - Ongoing Intelligence and Ecosystem Parity
+
+These items keep ELF improving after the first production cut.
+
+#### P2.1 ELF External Memory Pattern Radar
+
+Problem:
+External memory projects are moving quickly. Manual one-off reviews will go stale.
+
+Adopt from:
+
+- Local Dexter Pattern Radar automation.
+- Decodex radar evidence discipline.
+
+Implementation shape:
+
+- Create a weekly Codex automation for ELF memory-system radar.
+- Track upstream deltas for agentmemory, mem0, qmd, claude-mem, OpenViking, Graphiti, Letta, LightRAG, GraphRAG, and related projects.
+- Maintain a structured cursor file plus prose memory.
+- For every candidate pattern, produce an architecture-fit matrix:
+  - upstream change
+  - reusable pattern
+  - ELF verdict: covered, reject, or gap
+  - product value
+  - duplicate/coverage evidence
+  - safety boundary
+  - issue decision
+  - acceptance evidence
+- Search Linear before creating issues.
+- Create issues only when repo evidence shows a real gap.
+
+Acceptance:
+
+- A no-issue run records why ELF is already covered or why a pattern is rejected.
+- A new issue includes source links, repo evidence, non-goals, and validation criteria.
+- The radar never treats external runtime adoption as the default.
+
+Linear mapping:
+
+- New issue required: `[ELF ops P2] Add weekly external memory pattern radar automation`.
+
+#### P2.2 Broaden Benchmark Adapter Coverage
+
+Problem:
+The current smoke covers the first project set, but broader claims need RAGFlow, LightRAG, GraphRAG, and deeper qmd/OpenViking profiles.
+
+Adopt from:
+
+- RAGFlow, LightRAG, GraphRAG: graph/RAG baselines.
+- Current Docker live benchmark.
+
+Implementation shape:
+
+- Add D1/D2 research runs before implementation for large RAG systems.
+- Add adapters only when Docker isolation is practical.
+- Track install time, resource needs, and failure mode separately from retrieval quality.
+
+Acceptance:
+
+- Reports separate unsupported, blocked, incomplete, and wrong-result states.
+- No external project is marked worse solely because setup is heavier.
+- Claims remain scoped to encoded checks.
+
+Linear mapping:
+
+- New issue required: `[ELF benchmark P2] Add expanded RAG and graph-memory baseline adapters`.
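The P2.2 acceptance rules, combined with the terms from the Failure Semantics table, might be encoded roughly as follows. This is a sketch only: the run-record keys (`encoded`, `needs_external_setup`, `reached_check`, `retrieval_ok`, `lifecycle_ok`, `install_seconds`) and the precedence order are assumptions, not an existing ELF report schema.

```python
from enum import Enum


class Outcome(str, Enum):
    """Typed benchmark states, named after the Failure Semantics table."""
    PASS = "pass"
    WRONG_RESULT = "wrong_result"
    LIFECYCLE_FAIL = "lifecycle_fail"
    INCOMPLETE = "incomplete"
    NOT_ENCODED = "not_encoded"
    BLOCKED = "blocked"


def classify(run: dict) -> Outcome:
    """Map one adapter run record (hypothetical keys) to a typed outcome.

    Setup cost such as `install_seconds` is deliberately ignored here, so a
    heavier install can never make a project look worse on quality.
    """
    if not run.get("encoded", True):
        return Outcome.NOT_ENCODED
    if run.get("needs_external_setup", False):
        return Outcome.BLOCKED
    if not run.get("reached_check", False):
        return Outcome.INCOMPLETE
    if not run.get("retrieval_ok", False):
        return Outcome.WRONG_RESULT
    if not run.get("lifecycle_ok", True):
        return Outcome.LIFECYCLE_FAIL
    return Outcome.PASS


def report_row(project: str, run: dict) -> dict:
    """Keep quality outcome and setup cost in separate report columns."""
    return {
        "project": project,
        "outcome": classify(run).value,
        "install_seconds": run.get("install_seconds"),
    }
```

Keeping `install_seconds` as its own column lets a report show that a system was slow to set up without folding that into its retrieval verdict.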
+
+#### P2.3 CLI and SDK Ergonomics
+
+Problem:
+ELF is service-first. External projects often feel easier for a local developer because their CLI path is direct.
+
+Adopt from:
+
+- qmd, memsearch, agentmemory: local CLI ergonomics.
+
+Implementation shape:
+
+- Add CLI wrappers for add/search/status/backfill/report if they are still missing or scattered.
+- Keep commands thin over HTTP/MCP contracts.
+- Link commands to benchmark and runbook workflows.
+
+Acceptance:
+
+- A local user can add notes, search, view status, run backfill, and generate benchmark report from documented commands.
+- CLI output includes trace ids and source ids.
+
+Linear mapping:
+
+- New issue required after P0 runbook: `[ELF dx P2] Add local CLI wrappers for production memory workflows`.
+
+## Issue Queue
+
+| Order | Priority | Issue | Existing mapping | Parallelizable | Blocks |
+| ---: | --- | --- | --- | --- | --- |
+| 1 | P0 | Add resumable batch ingest and backfill benchmark | New | yes | production corpus migration |
+| 2 | P0 | Add private-corpus production adoption benchmark | New | yes | final adoption claim |
+| 3 | P0 | Add single-user production runbook with backup and restore | New | yes | unattended use |
+| 4 | P0 | Prioritize retrieval observability panels | XY-27, after XY-19 | yes | efficient tuning |
+| 5 | P0 | Make external adapters lifecycle-durable and fail-typed | New, follows XY-801 | yes | fair external comparison |
+| 6 | P1 | Implement reviewable consolidation worker and proposal review flow | follows XY-800 | partly | knowledge pages |
+| 7 | P1 | Split XY-286 into derived page storage, rebuild, lint, and viewer/search integration | XY-286 | partly | durable knowledge layer |
+| 8 | P1 | Add temporal validity to graph-lite facts | follows/relates XY-70 | yes | time-aware relation context |
+| 9 | P1 | Add memory history and evolution readback API | New | yes | lifecycle auditability |
+| 10 | P1 | Add scoped core memory blocks with archival separation | New | yes | agent operating context |
+| 11 | P1 | Add staged search trajectory profiles | New or XY-27 follow-up | after XY-27 | advanced retrieval tuning |
+| 12 | P2 | Add weekly external memory pattern radar automation | New | yes | ongoing parity |
+| 13 | P2 | Add expanded RAG and graph-memory baseline adapters | New | yes | broader public comparison |
+| 14 | P2 | Add local CLI wrappers for production memory workflows | New | after P0.3 | local ergonomics |
+
+## Parallel Development Plan
+
+Safe concurrent lanes:
+
+- Lane A: P0.1 batch ingest/backfill.
+- Lane B: P0.2 private-corpus benchmark and manifest schema.
+- Lane C: P0.3 production runbook and backup/restore proof.
+- Lane D: P0.5 adapter lifecycle benchmark hardening.
+- Lane E: XY-27 retrieval observability panels.
+- Lane F: P2.1 radar automation, because it is mostly automation/config/docs and should not touch runtime code.
+
+Avoid running concurrently without coordination:
+
+- P1.1 consolidation worker and P1.2 knowledge pages, because knowledge pages should build on the reviewed derived proposal model.
+- P1.3 temporal graph validity and XY-70 typed graph work, unless ownership is split cleanly between storage semantics and query ergonomics.
+- P1.6 staged search trajectory and XY-27 viewer panels, unless the trace schema is agreed first.
+
+Recommended Decodex queue order:
+
+1. Queue P0.2 and P0.3 first because they define adoption evidence and recovery expectations.
+2. Queue P0.1 and P0.5 in parallel because they exercise different implementation surfaces.
+3. Promote XY-27 after the trace data needed by P0.5 is clear.
+4. Start P1.1 only after P0.2 has enough corpus scenarios to evaluate consolidation quality.
+5. Split XY-286 after P1.1 defines derived proposal semantics.
+
+## Non-Goals
+
+- Do not replace ELF core storage with any external memory runtime.
+- Do not make Qdrant authoritative.
+- Do not treat graph memory as a separate hidden source of truth.
+- Do not allow background consolidation to mutate source notes silently.
+- Do not benchmark with host-global installs when Docker isolation is feasible.
+- Do not claim overall superiority from a benchmark dimension that is not encoded.
+- Do not create new Linear issues from radar output without duplicate search and repo evidence.
+
+## Production Adoption Gate
+
+For personal production use, the minimum acceptable gate is:
+
+- P0.1 batch ingest/backfill passes generated scale checks and resume checks.
+- P0.2 private corpus benchmark has a passing or explicitly bounded result.
+- P0.3 backup/restore runbook is tested on Docker Compose.
+- P0.4/XY-27 gives enough viewer traceability to debug bad retrieval without raw SQL.
+- P0.5 benchmark reports use typed failure states for external comparisons.
+
+After that gate, ELF can reasonably be used as the personal production memory system with known limitations. Before that gate, ELF is a strong foundation with promising benchmark evidence, but the adoption risk is still too high to call it production-proven.
diff --git a/docs/guide/research/index.md b/docs/guide/research/index.md
index d9d85967..d3fb7912 100644
--- a/docs/guide/research/index.md
+++ b/docs/guide/research/index.md
@@ -10,6 +10,7 @@ Outputs: The smallest comparison or inventory document needed for implementation
 - `research_projects_inventory.md`: audited and pending external projects, research depth, and current planning surface.
 - `comparison_external_projects.md`: detailed capability comparison, project trade-offs, source map, and research-backed ELF directions.
+- `external_memory_improvement_plan.md`: prioritized June 2026 improvement backlog, issue queue, parallelization plan, and production-adoption gate from benchmark and external-project evidence.
 - `agentmemory_adapter.md`: fixture-backed agentmemory import and baseline adapter boundary for `elf-eval`.
 ## Machine-Readable Runs