From ca2da6f7a6cec69286a3801b907353d956172c52 Mon Sep 17 00:00:00 2001 From: Yvette Carlisle Date: Mon, 22 Jun 2026 11:31:16 +0800 Subject: [PATCH] {"schema":"decodex/commit/1","summary":"Codify the Agent Memory Knowledge System contract and gate command alias","authority":"XY-1059"} --- Makefile.toml | 7 + README.md | 2 + docs/index.md | 3 + docs/log.md | 8 + docs/spec/agent_memory_knowledge_system_v1.md | 284 ++++++++++++++++++ docs/spec/index.md | 1 + docs/spec/system_version_registry.md | 11 + 7 files changed, 316 insertions(+) create mode 100644 docs/spec/agent_memory_knowledge_system_v1.md diff --git a/Makefile.toml b/Makefile.toml index 967d9f02..846970cc 100644 --- a/Makefile.toml +++ b/Makefile.toml @@ -988,6 +988,7 @@ args = [ # | check-docs | command | | # | check-rust | command | | # | check-trace-gate | command | | +# | checks | composite | | [tasks.check] clear = true @@ -1024,6 +1025,12 @@ args = [ "scripts/trace-gate.sh", ] +[tasks.checks] +workspace = false +dependencies = [ + "check", +] + # Clean # | task | type | cwd | # | -------------------------- | ------- | --- | diff --git a/README.md b/README.md index daaeaf4e..2b207830 100644 --- a/README.md +++ b/README.md @@ -496,6 +496,8 @@ proactive-brief, and scheduled-memory scoring evidence. ## Documentation - Start here: `docs/index.md` +- Agent Memory + Knowledge System product contract: + `docs/spec/agent_memory_knowledge_system_v1.md` - Runbook index: `docs/runbook/index.md` - Single-user production runbook: [docs/runbook/single_user_production.md](docs/runbook/single_user_production.md) diff --git a/docs/index.md b/docs/index.md index c4a952cb..d5850e2b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -28,6 +28,9 @@ The split below is by question type, not by human-versus-agent audience. - Need contracts, invariants, schemas, enums, state machines, or required behavior -> `docs/spec/` +- Need the Agent Memory + Knowledge System product boundary, P0-P5 roadmap, + Decodex phase gate, or competitor absorption rules -> + `docs/spec/agent_memory_knowledge_system_v1.md` - Need runbooks, migrations, validation steps, troubleshooting, or operational sequences -> `docs/runbook/` - Need the single-user production backup, restore, and Qdrant rebuild path -> diff --git a/docs/log.md b/docs/log.md index e6f0f7ed..332a4e8b 100644 --- a/docs/log.md +++ b/docs/log.md @@ -75,3 +75,11 @@ logs. `elf.graph_report/v1` through service, HTTP, and MCP readback, using existing Postgres graph-lite facts with sourced, inferred, ambiguous, stale, and superseded markers while keeping `valid_from`/`valid_to` as the internal temporal vocabulary. + +## 2026-06-22 + +- Added `docs/spec/agent_memory_knowledge_system_v1.md` for XY-1059, codifying the + Agent Memory + Knowledge System product boundary, P0-P5 roadmap, Decodex + phase-gate rule, competitor absorption boundaries, validation expectations, and + phase closeout checklist. +- Linked the new product contract from the docs root index and spec index. diff --git a/docs/spec/agent_memory_knowledge_system_v1.md b/docs/spec/agent_memory_knowledge_system_v1.md new file mode 100644 index 00000000..91c02ba1 --- /dev/null +++ b/docs/spec/agent_memory_knowledge_system_v1.md @@ -0,0 +1,284 @@ +--- +type: Spec +title: "Agent Memory and Knowledge System v1" +description: "Define the ELF Agent Memory + Knowledge System product contract, roadmap, phase gate, and claim boundaries." +resource: docs/spec/agent_memory_knowledge_system_v1.md +status: active +authority: normative +owner: spec +last_verified: 2026-06-22 +tags: + - docs + - spec + - agent-memory + - knowledge +source_refs: + - docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md + - docs/runbook/benchmarking/real_world_agent_memory_benchmark.md + - docs/spec/real_world_agent_memory_benchmark_v1.md + - docs/spec/system_elf_memory_service_v2.md +code_refs: + - Makefile.toml +related: + - docs/spec/system_elf_memory_service_v2.md + - docs/spec/system_knowledge_pages_v1.md + - docs/spec/system_recall_debug_panel_v1.md + - docs/spec/system_graph_memory_postgres_v1.md + - docs/spec/system_memory_summary_v1.md +drift_watch: + - docs/spec/agent_memory_knowledge_system_v1.md + - docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md + - docs/runbook/benchmarking/real_world_agent_memory_benchmark.md + - Makefile.toml +--- +# Agent Memory and Knowledge System v1 + +Purpose: Define the ELF Agent Memory + Knowledge System product contract, roadmap, +phase gate, and claim boundaries. +Status: normative +Read this when: You are shaping product work, opening implementation issues, reviewing +Agent Memory + Knowledge System claims, or deciding which phase may be queued. +Not this document: Low-level service API semantics, benchmark fixture schemas, +operator run commands, or implementation details for one subsystem. +Defines: `elf.agent_memory_knowledge_system/v1` product boundary, P0-P5 roadmap, +phase-gate rules, agent-facing surfaces, UI role, benchmark metrics, competitor +absorption rules, and phase closeout checklist. + +## Product Contract + +ELF is an open-source Agent Memory + Knowledge System. + +ELF turns sources into traceable knowledge, promotes reliable knowledge into agent +memory, and makes recall explainable, correctable, rollbackable, and benchmarked. + +The lead wedge is source-linked memory authority plus recall/debug quality. ELF must +not be positioned as a generic RAG framework, wiki compiler, hosted memory SDK, +graph database, or document-search replacement. + +## System Boundary + +The product is composed of six typed layers: + +| Layer | Authority | Required boundary | +| --- | --- | --- | +| Source Library | Captured documents, excerpts, imports, and source refs. | Sources remain evidence. Derived memory and pages must cite sources instead of replacing them. | +| Memory Authority | Notes, core blocks, ingest decisions, history, corrections, and rollback evidence. | Memory writes are policy-gated, evidence-linked, auditable, and reversible. | +| Knowledge Workspace | Derived project, entity, concept, issue, and decision pages. | Pages are rebuildable derived artifacts with citations, lint, and stale-source detection. | +| Graph-lite Facts | Postgres-backed relation facts and temporal markers. | Graph facts are source-backed context, not a separate authority store. | +| Dreaming Review | Reviewable consolidation, summary, brief, tag, correction, and promotion proposals. | Derived proposals must be reviewable and must not mutate sources without an explicit accepted transition. | +| Recall Debug | Search traces, dropped candidates, source/doc/page/graph/proposal rows, and replay aids. | Recall must expose why context was selected, dropped, unavailable, blocked, or not requested. | + +Existing subsystem specs own their detailed contracts. This document owns how those +subsystems fit into the Agent Memory + Knowledge System product boundary. + +## Non-Goals + +- Do not turn ELF into a broad RAGFlow, OpenKB, PageIndex, mem0, Zep, Letta, qmd, + OpenViking, agentmemory, claude-mem, or memsearch replacement. +- Do not weaken Postgres source-of-truth, source-ref, evidence-binding, English-gate, + scope, lifecycle, or review boundaries to match another product's ergonomics. +- Do not claim hosted managed-memory, private-corpus, provider-backed, UI/export, + graph/RAG, core/archive, context-trajectory, or long-document parity without + same-corpus checked-in or operator-owned evidence for that exact claim. +- Do not collapse `blocked`, `incomplete`, `not_encoded`, `wrong_result`, or + `unsupported_claim` states into pass claims. +- Do not queue later phases while the current accepted phase is still under review. + +## Data Model Direction + +All implementation phases must preserve the source-to-memory authority chain: + +1. Sources are captured as documents, excerpts, event audits, issue/PR records, or + other source refs with stable provenance. +2. Candidate knowledge is derived from sources as proposals, page sections, graph + facts, summaries, or memory candidates. +3. Promotion into memory records an explicit policy decision, source refs, actor, + confidence, importance, lifecycle state, and audit trail. +4. Correction and rollback create durable history instead of silently rewriting the + evidence chain. +5. Recall reads from typed surfaces and returns enough trace data to debug selection, + demotion, filtering, staleness, and missing anchors. + +Postgres remains the authority for notes, docs metadata, graph-lite facts, derived +pages, proposal review state, traces, and audit history. Qdrant and any future +retrieval index remain derived and rebuildable. + +## Agent-Facing Surfaces + +Agent-facing tools must be thin MCP or HTTP facades over typed service behavior. +Business logic and policy remain in `elf-api` and `elf-service`. + +Current and future Agent Memory + Knowledge System work should use these surface +families: + +| Surface family | Examples | Boundary | +| --- | --- | --- | +| Source capture and hydration | `elf_docs_put`, `elf_docs_search_l0`, `elf_docs_excerpts_get` | Capture and retrieve source evidence without promoting it to memory by default. | +| Memory write and readback | `elf_notes_ingest`, `elf_events_ingest`, `elf_searches_create`, `elf_searches_notes`, `elf_core_blocks_get`, `elf_entity_memory_get` | Writes must preserve policy and evidence decisions; reads must honor scopes and lifecycle. | +| Provenance and history | `elf_admin_note_provenance_get`, `elf_admin_memory_history_get`, trace bundle tools | Debug memory authority without raw database access in normal workflows. | +| Knowledge and graph context | Knowledge page search/readback, `elf_graph_query`, graph report surfaces | Expose derived knowledge and graph facts as labeled context, not authoritative note hits. | +| Dreaming review | Dreaming review queue and proposal review surfaces | Keep proposals reviewable; auto-apply is limited to explicitly accepted low-risk derived organization cases. | +| Recall debug | `elf_recall_debug_panel`, trace and trajectory readback | Show selected, dropped, available, reviewable, blocked, and not-requested context. | + +New MCP tools must name the underlying authority layer, link to the owning spec, and +preserve read/write boundaries. A readback tool must not become a hidden mutation path. + +## UI Role + +The UI is an operator console for source review, memory authority, knowledge pages, +proposal review, graph/topic inspection, and recall debugging. + +The UI must: + +- label authoritative notes, derived pages, graph facts, proposals, and trace rows + differently; +- show citations, lint state, review state, lifecycle state, and rollback/correction + affordances where applicable; +- prefer typed service readback over raw store inspection; +- avoid presenting derived pages or proposals as current memory unless they have been + promoted through the relevant authority path. + +The UI is not the source of truth and must not bypass API, MCP, scope, review, or +write-policy contracts. + +## Roadmap + +The roadmap phases below are product phases. They are not broad permission to queue or +implement every item in a phase at once. + +| Phase | Name | Scope | Gate to leave phase | +| --- | --- | --- | --- | +| P0 | Product contract and phase gate | Codify this product boundary, roadmap, competitor absorption rules, validation expectations, and closeout checklist. | Docs are reviewed, repo docs validation passes, claim boundaries match the June 20 closeout evidence, and the main thread accepts the next phase. | +| P1 | Memory Authority MVP loop | Deliver one source-backed memory-authority vertical slice: capture source evidence, create/review one proposal through a proposal inbox, record the authority ledger, apply/correct/rollback, recall through agent-facing tools, and debug stale/correction behavior. | The slice has service tests, provenance/history evidence, recall/debug readback, and at least one real-world stale/correction benchmark job. | +| P2 | Knowledge Workspace | Promote source-linked project/entity/concept/timeline pages with rebuild, lint, watch, search, and version-diff readback. | Pages stay derived, every section is cited or explicitly unsupported, stale-source lint runs, and benchmark reports publish citation/staleness metrics. | +| P3 | Competitor-strength adapters | Add contained comparison adapters for qmd replay, PageIndex/OpenKB, mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, graph/RAG references, and other accepted deltas. | Each adapter preserves typed non-pass states and emits same-corpus evidence before any parity, win, tie, or loss claim. | +| P4 | Benchmark and quality hardening | Expand adversarial jobs, public comparison grammar, quality metrics, latency/cost/resource reporting, and unsupported-claim detection. | Reports preserve job/suite/project typed states, expected evidence recall, irrelevant context ratio, unsupported claims, and resource metrics. | +| P5 | Productization | Improve local setup, agent recipes, operator UI, privacy/delete/export boundaries, and production-quality workflows. | Operator workflows have documented setup, privacy/delete/export semantics, and validation evidence without weakening source authority. | + +### First Implementation Phase Constraint + +The first implementation issue after P0 must be the smallest coherent P1 vertical +slice. It may touch only the surfaces needed to prove one source-linked +memory-authority loop end to end. + +The first P1 issue must not build the full Knowledge Workspace, broad operator UI, +external adapter pack, hosted memory behavior, graph/RAG parity, or product-wide +rewrites. Those are later phases unless a main-thread decision explicitly narrows and +accepts a different next slice. + +## Decodex Phase Gate + +Decodex execution for this project is single-phase gated: + +- Only the next accepted phase may carry the service-scoped queue label + `decodex:queued:elf`. +- Later-phase issues must remain unqueued while the current phase is running, under + review, or waiting for main-thread acceptance. +- After each phase lands, the main thread must review evidence, tests, benchmark + results, claim boundaries, and next-phase readiness before any later issue receives + `decodex:queued:elf`. +- `decodex:active:elf` means runtime ownership of an active lane. It is not a request + to start additional phases. +- `In Review` is a PR-backed handoff state. It is not phase acceptance by itself. + +P0 is the current phase for this contract. No P1 issue should be queued until the P0 +change is reviewed and accepted by the main thread. + +## Competitor Absorption Rules + +External projects are references for targeted improvements. They are not hidden +dependencies and are not automatic proof that ELF is weaker or stronger. + +| Competitor/reference | Strength to absorb | Claim boundary | +| --- | --- | --- | +| qmd | Transparent expansion, fusion, rerank, top-k, and compact replay ergonomics. | Preserve qmd's debug edge until ELF emits comparable replay artifacts. | +| VectifyAI PageIndex | Long-document tree retrieval and PageIndex MCP ecosystem direction. | No win/tie/loss claim until a same-corpus adapter compares tree artifacts with ELF source refs and recall debug rows. | +| VectifyAI OpenKB | Compiled Markdown wiki, concept/entity pages, lint, watch, and recompile workflows. | Absorb into Knowledge Workspace without treating derived wiki pages as source memory. | +| OpenViking | Filesystem-like context URIs, hierarchy selection, staged trajectory, and recursive expansion. | Keep trajectory/hierarchy claims blocked until same-corpus staged artifacts exist. | +| mem0/OpenMemory | Entity-scoped history, hosted ecosystem, UI/export, and optional graph memory direction. | Separate local SDK history evidence from hosted, UI/export, and optional graph-memory parity. | +| Letta | Core/archive memory split and export/readback model. | No core/archive parity claim until contained Letta export/readback artifacts include source ids. | +| Graphiti/Zep and graph/RAG projects | Temporal graph validity, citation/navigation, and graph retrieval references. | Graph-lite reports are ELF-native evidence, not broad graph/RAG parity. | +| agentmemory and claude-mem | Capture hooks, local viewers, continuity UX, and progressive disclosure. | Improve operator UX and capture audit without dropping evidence, scope, or write-policy gates. | +| memsearch | Markdown-first canonical store, incremental reindex, and local hybrid retrieval. | Treat as workflow inspiration; ELF's source-of-truth remains Postgres plus typed source refs. | + +Allowed claims: + +- ELF is the strongest measured integrated Agent Knowledge OS product in the June 20, + 2026 checked-in matrix. +- ELF has complete same-repo evidence across the six Agent Knowledge OS layers in + that matrix. +- Competitor strengths remain optimization inputs and comparison targets. + +Disallowed claims: + +- ELF broadly beats every competitor on every competitor-owned strength. +- Reference-only, blocked, incomplete, wrong-result, or not-tested evidence is a pass. +- Public-proxy or local fixture evidence proves private-corpus or provider-backed + production quality. + +## Benchmark Metrics + +Phase closeout and comparison reports must use the real-world benchmark vocabulary +instead of broad leaderboards. + +Required quality dimensions are: + +- `answer_correctness` +- `evidence_grounding` +- `trap_avoidance` +- `uncertainty_handling` +- `workflow_helpfulness` + +Use optional dimensions when the phase touches them: + +- `lifecycle_behavior` +- `debuggability` +- `latency_resource` +- `personalization_fit` + +Reports must preserve typed outcomes: + +- `pass` +- `wrong_result` +- `lifecycle_fail` +- `incomplete` +- `blocked` +- `not_encoded` +- `unsupported_claim` + +Relevant phase reports should also publish expected evidence recall, irrelevant context +ratio, unsupported-claim counts, stale-answer counts, source-ref coverage, citation +coverage, freshness/rationale coverage, proposal lineage completeness, source mutation +count, trace explainability counters, and latency/cost/resource metrics when those +metrics apply to the touched phase. + +## Validation + +Repository-native validation is authoritative. + +- Use `Makefile.toml` as the source of truth for task names. +- For docs-only phase work, run at least `cargo make check-docs` before claiming the + docs are validation-ready. +- Before a PR handoff or any push that refreshes a PR head, run the registered + Decodex workflow gate: `cargo make fmt`, `cargo make lint-fix`, then + `cargo make checks`. In this Makefile tree, `checks` aliases the repo-native + aggregate `check` task. +- If a phase changes commands, schemas, config, runtime behavior, status semantics, + or benchmark claims, update the owning docs and include drift evidence as required + by `docs/policy.md`. + +## Phase Closeout Checklist + +Every phase closeout must answer these checks before the next phase can be queued: + +- Evidence: source refs, artifacts, traces, screenshots, or reports prove the claims + made by the phase. +- Tests: repo-native validation ran, and failures are either fixed or recorded as + explicit blockers. +- Benchmark: relevant real-world jobs or typed benchmark reports exist, or untouched + areas are explicitly `not_encoded` or out of scope. +- Claim boundary: the closeout does not convert blocked, incomplete, wrong-result, + not-tested, public-proxy, local fixture, or reference-only evidence into parity or + production claims. +- Next-phase readiness: the next phase has one accepted issue narrow enough for + Decodex to execute without broad rewrites, and no later issue is queued. diff --git a/docs/spec/index.md b/docs/spec/index.md index 37a0927b..695c9c54 100644 --- a/docs/spec/index.md +++ b/docs/spec/index.md @@ -31,6 +31,7 @@ Question this index answers: "what must remain true?" ## Documents +- `agent_memory_knowledge_system_v1.md`: Agent Memory and Knowledge System v1. - `external_memory_pattern_radar_v1.md`: External Memory Pattern Radar v1. - `production_corpus_manifest_v1.md`: Production Corpus Manifest v1. - `real_world_agent_memory_benchmark_v1.md`: Real-World Agent Memory Benchmark v1. diff --git a/docs/spec/system_version_registry.md b/docs/spec/system_version_registry.md index 4785b614..e7e6a6aa 100644 --- a/docs/spec/system_version_registry.md +++ b/docs/spec/system_version_registry.md @@ -28,6 +28,17 @@ This document is normative. When a new versioned identifier is introduced, it mu ## Registry +### Agent Memory + Knowledge System product contract + +- Identifier: `elf.agent_memory_knowledge_system/v1`. +- Type: Product boundary, roadmap, phase-gate, and claim-boundary contract. +- Defined in: `docs/spec/agent_memory_knowledge_system_v1.md`. +- Consumers: Decodex phase planning, issue scoping, product documentation, benchmark + closeout review, and implementation agents deciding which phase may be queued. +- Bump rule: Introduce a new identifier only when product phases, phase-gate + semantics, authority-layer boundaries, or claim-boundary rules become incompatible + with this contract. + ### HTTP API version - Identifier: `/v2` (URL path prefix).