7 changes: 7 additions & 0 deletions Makefile.toml
@@ -988,6 +988,7 @@ args = [
# | check-docs | command | |
# | check-rust | command | |
# | check-trace-gate | command | |
# | checks | composite | |

[tasks.check]
clear = true
@@ -1024,6 +1025,12 @@ args = [
"scripts/trace-gate.sh",
]

[tasks.checks]
workspace = false
dependencies = [
"check",
]

# Clean
# | task | type | cwd |
# | -------------------------- | ------- | --- |
2 changes: 2 additions & 0 deletions README.md
@@ -496,6 +496,8 @@ proactive-brief, and scheduled-memory scoring evidence.
## Documentation

- Start here: `docs/index.md`
- Agent Memory + Knowledge System product contract:
`docs/spec/agent_memory_knowledge_system_v1.md`
- Runbook index: `docs/runbook/index.md`
- Single-user production runbook:
[docs/runbook/single_user_production.md](docs/runbook/single_user_production.md)
3 changes: 3 additions & 0 deletions docs/index.md
@@ -28,6 +28,9 @@ The split below is by question type, not by human-versus-agent audience.

- Need contracts, invariants, schemas, enums, state machines, or required behavior ->
`docs/spec/`
- Need the Agent Memory + Knowledge System product boundary, P0-P5 roadmap,
Decodex phase gate, or competitor absorption rules ->
`docs/spec/agent_memory_knowledge_system_v1.md`
- Need runbooks, migrations, validation steps, troubleshooting, or operational sequences ->
`docs/runbook/`
- Need the single-user production backup, restore, and Qdrant rebuild path ->
8 changes: 8 additions & 0 deletions docs/log.md
@@ -75,3 +75,11 @@ logs.
`elf.graph_report/v1` through service, HTTP, and MCP readback, using existing
Postgres graph-lite facts with sourced, inferred, ambiguous, stale, and superseded
markers while keeping `valid_from`/`valid_to` as the internal temporal vocabulary.

## 2026-06-22

- Added `docs/spec/agent_memory_knowledge_system_v1.md` for XY-1059, codifying the
Agent Memory + Knowledge System product boundary, P0-P5 roadmap, Decodex
phase-gate rule, competitor absorption boundaries, validation expectations, and
phase closeout checklist.
- Linked the new product contract from the docs root index and spec index.
284 changes: 284 additions & 0 deletions docs/spec/agent_memory_knowledge_system_v1.md
@@ -0,0 +1,284 @@
---
type: Spec
title: "Agent Memory and Knowledge System v1"
description: "Define the ELF Agent Memory + Knowledge System product contract, roadmap, phase gate, and claim boundaries."
resource: docs/spec/agent_memory_knowledge_system_v1.md
status: active
authority: normative
owner: spec
last_verified: 2026-06-22
tags:
- docs
- spec
- agent-memory
- knowledge
source_refs:
- docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md
- docs/runbook/benchmarking/real_world_agent_memory_benchmark.md
- docs/spec/real_world_agent_memory_benchmark_v1.md
- docs/spec/system_elf_memory_service_v2.md
code_refs:
- Makefile.toml
related:
- docs/spec/system_elf_memory_service_v2.md
- docs/spec/system_knowledge_pages_v1.md
- docs/spec/system_recall_debug_panel_v1.md
- docs/spec/system_graph_memory_postgres_v1.md
- docs/spec/system_memory_summary_v1.md
drift_watch:
- docs/spec/agent_memory_knowledge_system_v1.md
- docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md
- docs/runbook/benchmarking/real_world_agent_memory_benchmark.md
- Makefile.toml
---
# Agent Memory and Knowledge System v1

Purpose: Define the ELF Agent Memory + Knowledge System product contract, roadmap,
phase gate, and claim boundaries.
Status: normative
Read this when: You are shaping product work, opening implementation issues, reviewing
Agent Memory + Knowledge System claims, or deciding which phase may be queued.
Not this document: Low-level service API semantics, benchmark fixture schemas,
operator run commands, or implementation details for one subsystem.
Defines: `elf.agent_memory_knowledge_system/v1` product boundary, P0-P5 roadmap,
phase-gate rules, agent-facing surfaces, UI role, benchmark metrics, competitor
absorption rules, and phase closeout checklist.

## Product Contract

ELF is an open-source Agent Memory + Knowledge System.

ELF turns sources into traceable knowledge, promotes reliable knowledge into agent
memory, keeps that memory correctable and reversible through rollback, and makes
recall explainable and benchmarked.

The lead wedge is source-linked memory authority plus recall/debug quality. ELF must
not be positioned as a generic RAG framework, wiki compiler, hosted memory SDK,
graph database, or document-search replacement.

## System Boundary

The product is composed of six typed layers:

| Layer | Authority | Required boundary |
| --- | --- | --- |
| Source Library | Captured documents, excerpts, imports, and source refs. | Sources remain evidence. Derived memory and pages must cite sources instead of replacing them. |
| Memory Authority | Notes, core blocks, ingest decisions, history, corrections, and rollback evidence. | Memory writes are policy-gated, evidence-linked, auditable, and reversible. |
| Knowledge Workspace | Derived project, entity, concept, issue, and decision pages. | Pages are rebuildable derived artifacts with citations, lint, and stale-source detection. |
| Graph-lite Facts | Postgres-backed relation facts and temporal markers. | Graph facts are source-backed context, not a separate authority store. |
| Dreaming Review | Reviewable consolidation, summary, brief, tag, correction, and promotion proposals. | Derived proposals must be reviewable and must not mutate sources without an explicit accepted transition. |
| Recall Debug | Search traces, dropped candidates, source/doc/page/graph/proposal rows, and replay aids. | Recall must expose why context was selected, dropped, unavailable, blocked, or not requested. |

Existing subsystem specs own their detailed contracts. This document owns how those
subsystems fit into the Agent Memory + Knowledge System product boundary.
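
The layer boundaries can be made explicit in code by tagging every context row with its layer and deriving how it may be treated from that tag. A minimal Rust sketch, assuming hypothetical `Layer` and `Role` types that are not part of `elf-service`; the role mapping paraphrases the "Required boundary" column of the table.

```rust
/// The six typed layers from the System Boundary table.
/// Hypothetical type for illustration; not an `elf-service` API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    SourceLibrary,
    MemoryAuthority,
    KnowledgeWorkspace,
    GraphLiteFacts,
    DreamingReview,
    RecallDebug,
}

/// How content from a layer may be treated (hypothetical paraphrase of the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    /// Captured evidence that derived artifacts cite instead of replacing.
    Evidence,
    /// Policy-gated, auditable, reversible memory.
    Authority,
    /// Rebuildable context derived from sources; never a second authority store.
    Derived,
    /// Reviewable changes that need an explicit accepted transition.
    Proposal,
    /// Diagnostic rows explaining recall selection.
    Trace,
}

impl Layer {
    pub const ALL: [Layer; 6] = [
        Layer::SourceLibrary,
        Layer::MemoryAuthority,
        Layer::KnowledgeWorkspace,
        Layer::GraphLiteFacts,
        Layer::DreamingReview,
        Layer::RecallDebug,
    ];

    pub fn role(self) -> Role {
        match self {
            Layer::SourceLibrary => Role::Evidence,
            Layer::MemoryAuthority => Role::Authority,
            Layer::KnowledgeWorkspace | Layer::GraphLiteFacts => Role::Derived,
            Layer::DreamingReview => Role::Proposal,
            Layer::RecallDebug => Role::Trace,
        }
    }
}
```

Keeping the mapping in one exhaustive `match` means a seventh layer would fail to compile until someone decides its role.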

## Non-Goals

- Do not turn ELF into a broad RAGFlow, OpenKB, PageIndex, mem0, Zep, Letta, qmd,
OpenViking, agentmemory, claude-mem, or memsearch replacement.
- Do not weaken Postgres source-of-truth, source-ref, evidence-binding, English-gate,
scope, lifecycle, or review boundaries to match another product's ergonomics.
- Do not claim hosted managed-memory, private-corpus, provider-backed, UI/export,
graph/RAG, core/archive, context-trajectory, or long-document parity without
same-corpus checked-in or operator-owned evidence for that exact claim.
- Do not collapse `blocked`, `incomplete`, `not_encoded`, `wrong_result`,
`lifecycle_fail`, or `unsupported_claim` states into pass claims.
- Do not queue later phases while the current accepted phase is still under review.

## Data Model Direction

All implementation phases must preserve the source-to-memory authority chain:

1. Sources are captured as documents, excerpts, event audits, issue/PR records, or
other source refs with stable provenance.
2. Candidate knowledge is derived from sources as proposals, page sections, graph
facts, summaries, or memory candidates.
3. Promotion into memory records an explicit policy decision, source refs, actor,
confidence, importance, lifecycle state, and audit trail.
4. Correction and rollback create durable history instead of silently rewriting the
evidence chain.
5. Recall reads from typed surfaces and returns enough trace data to debug selection,
demotion, filtering, staleness, and missing anchors.
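
Steps 3 and 4 can be sketched as an append-only history: a promotion without source refs, a policy decision, or an actor is rejected, and rollback appends a restored copy instead of deleting later entries. This is an illustration under assumptions: `Promotion`, `MemoryHistory`, and the `rollback_to:<index>` decision string are hypothetical, and the real ledger lives in Postgres.

```rust
/// Lifecycle state recorded with a promotion (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Lifecycle {
    Active,
    Deprecated,
}

/// One promotion or correction record carrying the fields step 3 requires (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Promotion {
    pub policy_decision: String,
    pub source_refs: Vec<String>,
    pub actor: String,
    pub confidence: f32,
    pub importance: f32,
    pub lifecycle: Lifecycle,
}

/// Append-only history: corrections and rollbacks add entries, never overwrite them.
#[derive(Debug, Default)]
pub struct MemoryHistory {
    entries: Vec<Promotion>,
}

impl MemoryHistory {
    /// Record a promotion or correction after checking the authority-chain fields.
    pub fn record(&mut self, p: Promotion) -> Result<(), String> {
        if p.source_refs.is_empty() {
            return Err("promotion requires at least one source ref".to_string());
        }
        if p.policy_decision.trim().is_empty() || p.actor.trim().is_empty() {
            return Err("promotion requires a policy decision and an actor".to_string());
        }
        if !(0.0_f32..=1.0).contains(&p.confidence) {
            return Err("confidence must be within [0, 1]".to_string());
        }
        self.entries.push(p);
        Ok(())
    }

    /// Roll back by appending a copy of an earlier entry, keeping the evidence chain intact.
    pub fn rollback_to(&mut self, index: usize, actor: &str) -> Result<(), String> {
        let mut restored = self.entries.get(index).cloned().ok_or("no such history entry")?;
        restored.actor = actor.to_string();
        restored.policy_decision = format!("rollback_to:{index}");
        self.entries.push(restored);
        Ok(())
    }

    /// The current memory state is the most recent entry.
    pub fn current(&self) -> Option<&Promotion> {
        self.entries.last()
    }

    pub fn entries(&self) -> &[Promotion] {
        &self.entries
    }
}
```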

Postgres remains the authority for notes, docs metadata, graph-lite facts, derived
pages, proposal review state, traces, and audit history. Qdrant and any future
retrieval index remain derived and rebuildable.

## Agent-Facing Surfaces

Agent-facing tools must be thin MCP or HTTP facades over typed service behavior.
Business logic and policy remain in `elf-api` and `elf-service`.

Current and future Agent Memory + Knowledge System work should use these surface
families:

| Surface family | Examples | Boundary |
| --- | --- | --- |
| Source capture and hydration | `elf_docs_put`, `elf_docs_search_l0`, `elf_docs_excerpts_get` | Capture and retrieve source evidence without promoting it to memory by default. |
| Memory write and readback | `elf_notes_ingest`, `elf_events_ingest`, `elf_searches_create`, `elf_searches_notes`, `elf_core_blocks_get`, `elf_entity_memory_get` | Writes must preserve policy and evidence decisions; reads must honor scopes and lifecycle. |
| Provenance and history | `elf_admin_note_provenance_get`, `elf_admin_memory_history_get`, trace bundle tools | Debug memory authority without raw database access in normal workflows. |
| Knowledge and graph context | Knowledge page search/readback, `elf_graph_query`, graph report surfaces | Expose derived knowledge and graph facts as labeled context, not authoritative note hits. |
| Dreaming review | Dreaming review queue and proposal review surfaces | Keep proposals reviewable; auto-apply is limited to explicitly accepted low-risk derived organization cases. |
| Recall debug | `elf_recall_debug_panel`, trace and trajectory readback | Show selected, dropped, available, reviewable, blocked, and not-requested context. |

New MCP tools must name the underlying authority layer, link to the owning spec, and
preserve read/write boundaries. A readback tool must not become a hidden mutation path.
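
One way to keep a readback tool from becoming a hidden mutation path is to let the type system enforce it: readback facades receive only a shared reference to a read-only trait, while write facades need a mutable reference to a write trait. A minimal Rust sketch; `NoteReadback`, `NoteWrite`, `InMemoryStore`, and the two facade functions are hypothetical stand-ins, not the real `elf-api` or MCP handlers.

```rust
/// Read-only operations; a readback facade only ever receives `&dyn NoteReadback`.
pub trait NoteReadback {
    fn search_notes(&self, query: &str) -> Vec<String>;
}

/// Write operations extend readback and require `&mut self`.
pub trait NoteWrite: NoteReadback {
    fn ingest_note(&mut self, text: &str, source_ref: &str) -> Result<usize, String>;
}

/// Toy in-memory store standing in for the real service (hypothetical).
#[derive(Default)]
pub struct InMemoryStore {
    notes: Vec<(String, String)>, // (text, source_ref)
}

impl NoteReadback for InMemoryStore {
    fn search_notes(&self, query: &str) -> Vec<String> {
        self.notes
            .iter()
            .filter(|(text, _)| text.contains(query))
            .map(|(text, _)| text.clone())
            .collect()
    }
}

impl NoteWrite for InMemoryStore {
    fn ingest_note(&mut self, text: &str, source_ref: &str) -> Result<usize, String> {
        if source_ref.is_empty() {
            return Err("notes must carry a source ref".to_string());
        }
        self.notes.push((text.to_string(), source_ref.to_string()));
        Ok(self.notes.len() - 1)
    }
}

/// Readback facade (for example a search tool): it cannot reach `ingest_note`.
pub fn readback_tool(store: &dyn NoteReadback, query: &str) -> Vec<String> {
    store.search_notes(query)
}

/// Write facade (for example an ingest tool): mutation needs an explicit `&mut`.
pub fn ingest_tool(store: &mut dyn NoteWrite, text: &str, source_ref: &str) -> Result<usize, String> {
    store.ingest_note(text, source_ref)
}
```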

## UI Role

The UI is an operator console for source review, memory authority, knowledge pages,
proposal review, graph/topic inspection, and recall debugging.

The UI must:

- label authoritative notes, derived pages, graph facts, proposals, and trace rows
differently;
- show citations, lint state, review state, lifecycle state, and rollback/correction
affordances where applicable;
- prefer typed service readback over raw store inspection;
- avoid presenting derived pages or proposals as current memory unless they have been
promoted through the relevant authority path.

The UI is not the source of truth and must not bypass API, MCP, scope, review, or
write-policy contracts.
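
The labeling rules can be sketched as pure functions the console calls before rendering a row. This assumes hypothetical `ContextKind` and `ContextItem` types and a `promoted` flag meaning the item went through the relevant authority path; the badge strings are illustrative.

```rust
/// Kinds of rows the operator console renders (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContextKind {
    AuthoritativeNote,
    DerivedPage,
    GraphFact,
    Proposal,
    TraceRow,
}

pub struct ContextItem {
    pub kind: ContextKind,
    /// True once the item went through the relevant authority (promotion) path.
    pub promoted: bool,
    pub text: String,
}

impl ContextKind {
    /// A distinct badge per kind so derived rows never look like notes.
    pub fn badge(self) -> &'static str {
        match self {
            ContextKind::AuthoritativeNote => "note",
            ContextKind::DerivedPage => "derived page",
            ContextKind::GraphFact => "graph fact",
            ContextKind::Proposal => "proposal",
            ContextKind::TraceRow => "trace",
        }
    }
}

pub fn render_label(item: &ContextItem) -> String {
    format!("[{}] {}", item.kind.badge(), item.text)
}

/// Only notes, or derived pages and proposals that were promoted, may appear as current memory.
pub fn counts_as_current_memory(item: &ContextItem) -> bool {
    match item.kind {
        ContextKind::AuthoritativeNote => true,
        ContextKind::DerivedPage | ContextKind::Proposal => item.promoted,
        ContextKind::GraphFact | ContextKind::TraceRow => false,
    }
}
```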

## Roadmap

The roadmap phases below are product phases. They are not broad permission to queue or
implement every item in a phase at once.

| Phase | Name | Scope | Gate to leave phase |
| --- | --- | --- | --- |
| P0 | Product contract and phase gate | Codify this product boundary, roadmap, competitor absorption rules, validation expectations, and closeout checklist. | Docs are reviewed, repo docs validation passes, claim boundaries match the June 20, 2026 closeout evidence, and the main thread accepts the next phase. |
| P1 | Memory Authority MVP loop | Deliver one source-backed memory-authority vertical slice: capture source evidence, create/review one proposal through a proposal inbox, record the authority ledger, apply/correct/rollback, recall through agent-facing tools, and debug stale/correction behavior. | The slice has service tests, provenance/history evidence, recall/debug readback, and at least one real-world stale/correction benchmark job. |
| P2 | Knowledge Workspace | Promote source-linked project/entity/concept/timeline pages with rebuild, lint, watch, search, and version-diff readback. | Pages stay derived, every section is cited or explicitly unsupported, stale-source lint runs, and benchmark reports publish citation/staleness metrics. |
| P3 | Competitor-strength adapters | Add contained comparison adapters for qmd replay, PageIndex/OpenKB, mem0/OpenMemory, Letta, Graphiti/Zep, OpenViking, graph/RAG references, and other accepted deltas. | Each adapter preserves typed non-pass states and emits same-corpus evidence before any parity, win, tie, or loss claim. |
| P4 | Benchmark and quality hardening | Expand adversarial jobs, public comparison grammar, quality metrics, latency/cost/resource reporting, and unsupported-claim detection. | Reports preserve job/suite/project typed states, expected evidence recall, irrelevant context ratio, unsupported claims, and resource metrics. |
| P5 | Productization | Improve local setup, agent recipes, operator UI, privacy/delete/export boundaries, and production-quality workflows. | Operator workflows have documented setup, privacy/delete/export semantics, and validation evidence without weakening source authority. |

### First Implementation Phase Constraint

The first implementation issue after P0 must be the smallest coherent P1 vertical
slice. It may touch only the surfaces needed to prove one source-linked
memory-authority loop end to end.

The first P1 issue must not build the full Knowledge Workspace, broad operator UI,
external adapter pack, hosted memory behavior, graph/RAG parity, or product-wide
rewrites. Those are later phases unless a main-thread decision explicitly narrows and
accepts a different next slice.
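
A reviewer could check a proposed first P1 issue mechanically by tagging what it touches. A minimal sketch, assuming hypothetical `ScopeItem` tags derived from the P1 roadmap row and the exclusion list in this section; the boolean parameter stands in for an explicit main-thread decision.

```rust
/// Scope tags a proposed issue may touch (hypothetical labels).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScopeItem {
    // Parts of the single memory-authority loop.
    SourceCapture,
    ProposalInbox,
    AuthorityLedger,
    ApplyCorrectRollback,
    AgentRecall,
    StaleCorrectionDebug,
    // Work reserved for later phases.
    FullKnowledgeWorkspace,
    BroadOperatorUi,
    ExternalAdapterPack,
    HostedMemory,
    GraphRagParity,
    ProductWideRewrite,
}

impl ScopeItem {
    fn reserved_for_later_phase(self) -> bool {
        matches!(
            self,
            Self::FullKnowledgeWorkspace
                | Self::BroadOperatorUi
                | Self::ExternalAdapterPack
                | Self::HostedMemory
                | Self::GraphRagParity
                | Self::ProductWideRewrite
        )
    }
}

/// List out-of-scope tags in a proposed first P1 issue; empty means the scope fits.
/// `main_thread_accepted_exception` models an explicit main-thread decision.
pub fn first_p1_violations(scope: &[ScopeItem], main_thread_accepted_exception: bool) -> Vec<ScopeItem> {
    if main_thread_accepted_exception {
        return Vec::new();
    }
    scope.iter().copied().filter(|item| item.reserved_for_later_phase()).collect()
}
```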

## Decodex Phase Gate

Decodex execution for this project is single-phase gated:

- Only the next accepted phase may carry the service-scoped queue label
`decodex:queued:elf`.
- Later-phase issues must remain unqueued while the current phase is running, under
review, or waiting for main-thread acceptance.
- After each phase lands, the main thread must review evidence, tests, benchmark
results, claim boundaries, and next-phase readiness before any later issue receives
`decodex:queued:elf`.
- `decodex:active:elf` means runtime ownership of an active lane. It is not a request
to start additional phases.
- `In Review` is a PR-backed handoff state. It is not phase acceptance by itself.

P0 is the current phase for this contract. No P1 issue should be queued until the P0
change is reviewed and accepted by the main thread.
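
The single-phase rule can be sketched as a small state machine: only an explicit main-thread acceptance advances the gate, and only issues in the one queueable phase receive the label. Assumptions: `Phase`, `PhaseGate`, and its methods are hypothetical; the label string is the one this section names. `In Review` and `decodex:active:elf` deliberately have no effect on the state.

```rust
/// Product phases from the roadmap table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    P0,
    P1,
    P2,
    P3,
    P4,
    P5,
}

impl Phase {
    pub fn next(self) -> Option<Phase> {
        match self {
            Phase::P0 => Some(Phase::P1),
            Phase::P1 => Some(Phase::P2),
            Phase::P2 => Some(Phase::P3),
            Phase::P3 => Some(Phase::P4),
            Phase::P4 => Some(Phase::P5),
            Phase::P5 => None,
        }
    }
}

pub const QUEUED_LABEL: &str = "decodex:queued:elf";

/// Main-thread gate state (hypothetical). Only `accept` changes it.
pub struct PhaseGate {
    pub last_accepted: Option<Phase>,
}

impl PhaseGate {
    /// The single phase whose issues may currently be queued.
    pub fn queueable_phase(&self) -> Option<Phase> {
        match self.last_accepted {
            None => Some(Phase::P0),
            Some(phase) => phase.next(),
        }
    }

    /// Returns the queue label only for issues in the queueable phase.
    pub fn queue_label_for(&self, issue_phase: Phase) -> Option<&'static str> {
        (self.queueable_phase() == Some(issue_phase)).then_some(QUEUED_LABEL)
    }

    /// Record main-thread acceptance of the phase currently under review.
    pub fn accept(&mut self, phase: Phase) -> Result<(), String> {
        if self.queueable_phase() != Some(phase) {
            return Err(format!("{phase:?} is not the phase under review"));
        }
        self.last_accepted = Some(phase);
        Ok(())
    }
}
```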

## Competitor Absorption Rules

External projects are references for targeted improvements. They are not hidden
dependencies and are not automatic proof that ELF is weaker or stronger.

| Competitor/reference | Strength to absorb | Claim boundary |
| --- | --- | --- |
| qmd | Transparent expansion, fusion, rerank, top-k, and compact replay ergonomics. | Treat qmd as holding the replay/debug edge until ELF emits comparable replay artifacts. |
| VectifyAI PageIndex | Long-document tree retrieval and PageIndex MCP ecosystem direction. | No win/tie/loss claim until a same-corpus adapter compares tree artifacts with ELF source refs and recall debug rows. |
| VectifyAI OpenKB | Compiled Markdown wiki, concept/entity pages, lint, watch, and recompile workflows. | Absorb into Knowledge Workspace without treating derived wiki pages as source memory. |
| OpenViking | Filesystem-like context URIs, hierarchy selection, staged trajectory, and recursive expansion. | Keep trajectory/hierarchy claims blocked until same-corpus staged artifacts exist. |
| mem0/OpenMemory | Entity-scoped history, hosted ecosystem, UI/export, and optional graph memory direction. | Separate local SDK history evidence from hosted, UI/export, and optional graph-memory parity. |
| Letta | Core/archive memory split and export/readback model. | No core/archive parity claim until contained Letta export/readback artifacts include source ids. |
| Graphiti/Zep and graph/RAG projects | Temporal graph validity, citation/navigation, and graph retrieval references. | Graph-lite reports are ELF-native evidence, not broad graph/RAG parity. |
| agentmemory and claude-mem | Capture hooks, local viewers, continuity UX, and progressive disclosure. | Improve operator UX and capture audit without dropping evidence, scope, or write-policy gates. |
| memsearch | Markdown-first canonical store, incremental reindex, and local hybrid retrieval. | Treat as workflow inspiration; ELF's source-of-truth remains Postgres plus typed source refs. |

Allowed claims:

- ELF is the strongest measured integrated Agent Knowledge OS product in the June 20,
2026 checked-in matrix.
- ELF has complete same-repo evidence across the six Agent Knowledge OS layers in
that matrix.
- Competitor strengths remain optimization inputs and comparison targets.

Disallowed claims:

- ELF broadly beats every competitor on every competitor-owned strength.
- Reference-only, blocked, incomplete, wrong-result, or not-tested evidence is a pass.
- Public-proxy or local fixture evidence proves private-corpus or provider-backed
production quality.
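
The allowed and disallowed claims reduce to a few checks on the evidence behind a positive claim. A sketch under assumptions: `Evidence`, `Corpus`, `Claim`, and `check_claim` are hypothetical, the outcome enum is a subset of the benchmark vocabulary, and only competitor parity and private-corpus quality claims are modeled.

```rust
/// Typed outcome of the evidence behind a claim (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    WrongResult,
    Incomplete,
    Blocked,
    NotTested,
}

/// Where the evidence was measured.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Corpus {
    PublicProxy,
    LocalFixture,
    PrivateOperatorOwned,
}

#[derive(Debug, Clone, Copy)]
pub struct Evidence {
    pub corpus: Corpus,
    /// True when ELF and the competitor ran on the same corpus.
    pub same_corpus_as_competitor: bool,
    /// True when the competitor is only cited, with no contained run.
    pub reference_only: bool,
    pub outcome: Outcome,
}

/// Positive claims this sketch gates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Claim {
    CompetitorParity,
    PrivateCorpusQuality,
}

pub fn check_claim(claim: Claim, evidence: &Evidence) -> Result<(), &'static str> {
    if evidence.reference_only {
        return Err("reference-only evidence is not a measurement");
    }
    if evidence.outcome != Outcome::Pass {
        return Err("non-pass evidence cannot back a claim");
    }
    match claim {
        Claim::CompetitorParity if !evidence.same_corpus_as_competitor => {
            Err("parity needs same-corpus evidence")
        }
        Claim::PrivateCorpusQuality if evidence.corpus != Corpus::PrivateOperatorOwned => {
            Err("public-proxy or fixture evidence cannot prove private-corpus quality")
        }
        _ => Ok(()),
    }
}
```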

## Benchmark Metrics

Phase closeout and comparison reports must use the real-world benchmark vocabulary
instead of broad leaderboards.

Required quality dimensions are:

- `answer_correctness`
- `evidence_grounding`
- `trap_avoidance`
- `uncertainty_handling`
- `workflow_helpfulness`

Include these optional dimensions when the phase touches them:

- `lifecycle_behavior`
- `debuggability`
- `latency_resource`
- `personalization_fit`

Reports must preserve typed outcomes:

- `pass`
- `wrong_result`
- `lifecycle_fail`
- `incomplete`
- `blocked`
- `not_encoded`
- `unsupported_claim`
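
Preserving typed outcomes means a report counts each state separately and computes any pass rate from `pass` alone. A minimal sketch; `Outcome` and `summarize` are hypothetical, and the string keys mirror the outcome list.

```rust
use std::collections::BTreeMap;

/// Typed outcomes; none of them is folded into another.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    WrongResult,
    LifecycleFail,
    Incomplete,
    Blocked,
    NotEncoded,
    UnsupportedClaim,
}

impl Outcome {
    pub fn as_str(self) -> &'static str {
        match self {
            Outcome::Pass => "pass",
            Outcome::WrongResult => "wrong_result",
            Outcome::LifecycleFail => "lifecycle_fail",
            Outcome::Incomplete => "incomplete",
            Outcome::Blocked => "blocked",
            Outcome::NotEncoded => "not_encoded",
            Outcome::UnsupportedClaim => "unsupported_claim",
        }
    }
}

/// Per-outcome counts plus the pass rate; only `Pass` counts toward the rate.
pub fn summarize(outcomes: &[Outcome]) -> (BTreeMap<&'static str, usize>, Option<f64>) {
    let mut counts = BTreeMap::new();
    for outcome in outcomes {
        *counts.entry(outcome.as_str()).or_insert(0) += 1;
    }
    let passes = outcomes.iter().filter(|o| **o == Outcome::Pass).count();
    let pass_rate = if outcomes.is_empty() {
        None
    } else {
        Some(passes as f64 / outcomes.len() as f64)
    };
    (counts, pass_rate)
}
```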

Phase reports should also publish the following metrics whenever they apply to the
touched phase: expected evidence recall, irrelevant context ratio, unsupported-claim
counts, stale-answer counts, source-ref coverage, citation coverage,
freshness/rationale coverage, proposal lineage completeness, source mutation count,
trace explainability counters, and latency/cost/resource metrics.
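
Two of these metrics can be computed from evidence refs alone. This sketch assumes definitions that the owning benchmark spec may state differently: expected evidence recall is the share of expected refs present in the retrieved context, and irrelevant context ratio is the share of retrieved items (duplicates included) that match no expected ref. Both are `None` when undefined; `ContextMetrics` and `context_metrics` are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub struct ContextMetrics {
    pub expected_evidence_recall: Option<f64>,
    pub irrelevant_context_ratio: Option<f64>,
}

pub fn context_metrics(expected: &[&str], retrieved: &[&str]) -> ContextMetrics {
    let expected_set: HashSet<&str> = expected.iter().copied().collect();
    let retrieved_set: HashSet<&str> = retrieved.iter().copied().collect();
    // Distinct expected refs that appear anywhere in the retrieved context.
    let hits = expected_set.intersection(&retrieved_set).count();
    // Retrieved items, duplicates included, that match no expected ref.
    let irrelevant = retrieved.iter().filter(|r| !expected_set.contains(*r)).count();
    ContextMetrics {
        expected_evidence_recall: (!expected_set.is_empty())
            .then(|| hits as f64 / expected_set.len() as f64),
        irrelevant_context_ratio: (!retrieved.is_empty())
            .then(|| irrelevant as f64 / retrieved.len() as f64),
    }
}
```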

## Validation

Repository-native validation is authoritative.

- Use `Makefile.toml` as the source of truth for task names.
- For docs-only phase work, run at least `cargo make check-docs` before claiming the
docs are validation-ready.
- Before a PR handoff or any push that refreshes a PR head, run the registered
Decodex workflow gate: `cargo make fmt`, `cargo make lint-fix`, then
`cargo make checks`. In this Makefile tree, `checks` aliases the repo-native
aggregate `check` task.
- If a phase changes commands, schemas, config, runtime behavior, status semantics,
or benchmark claims, update the owning docs and include drift evidence as required
by `docs/policy.md`.
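
The PR-handoff gate order can be encoded so it runs the same way every time. A sketch assuming `cargo make` is installed and `Makefile.toml` defines `fmt`, `lint-fix`, and `checks`; `GATE_TASKS`, `gate_commands`, and `run_gate` are hypothetical helpers, and `Makefile.toml` remains the source of truth for task names.

```rust
use std::process::Command;

/// Gate order from this section; in `Makefile.toml`, `checks` depends on `check`.
pub const GATE_TASKS: [&str; 3] = ["fmt", "lint-fix", "checks"];

/// Build, without spawning, one `cargo make <task>` command per gate step.
pub fn gate_commands() -> Vec<Command> {
    GATE_TASKS
        .iter()
        .map(|&task| {
            let mut cmd = Command::new("cargo");
            cmd.args(["make", task]);
            cmd
        })
        .collect()
}

/// Run the steps in order and stop at the first failure.
pub fn run_gate() -> Result<(), String> {
    for mut cmd in gate_commands() {
        let status = cmd.status().map_err(|e| e.to_string())?;
        if !status.success() {
            return Err(format!("gate step failed: {cmd:?}"));
        }
    }
    Ok(())
}
```

Stopping at the first failure keeps `checks` from running against unformatted or unfixed code.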

## Phase Closeout Checklist

Every phase closeout must answer these checks before the next phase can be queued:

- Evidence: source refs, artifacts, traces, screenshots, or reports prove the claims
made by the phase.
- Tests: repo-native validation ran, and failures are either fixed or recorded as
explicit blockers.
- Benchmark: relevant real-world jobs or typed benchmark reports exist, or untouched
areas are explicitly `not_encoded` or out of scope.
- Claim boundary: the closeout does not convert blocked, incomplete, wrong-result,
not-tested, public-proxy, local fixture, or reference-only evidence into parity or
production claims.
- Next-phase readiness: the next phase has one accepted issue narrow enough for
Decodex to execute without broad rewrites, and no later issue is queued.
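
The checklist can be recorded as a struct whose unmet items block queuing. A sketch with hypothetical field names; each boolean records a reviewer judgment, and the code does not verify the evidence itself.

```rust
/// Answers to the phase closeout checklist (hypothetical field names).
#[derive(Debug, Clone)]
pub struct Closeout {
    pub evidence_backs_claims: bool,
    pub tests_ran_and_failures_triaged: bool,
    pub benchmark_reported_or_scoped_out: bool,
    pub claims_within_boundary: bool,
    pub next_issue_accepted: bool,
    pub later_issues_queued: usize,
}

impl Closeout {
    /// Names of the checklist items that are not yet satisfied.
    pub fn unmet(&self) -> Vec<&'static str> {
        let mut unmet = Vec::new();
        if !self.evidence_backs_claims {
            unmet.push("evidence");
        }
        if !self.tests_ran_and_failures_triaged {
            unmet.push("tests");
        }
        if !self.benchmark_reported_or_scoped_out {
            unmet.push("benchmark");
        }
        if !self.claims_within_boundary {
            unmet.push("claim boundary");
        }
        if !self.next_issue_accepted || self.later_issues_queued > 0 {
            unmet.push("next-phase readiness");
        }
        unmet
    }

    pub fn ready_for_next_phase(&self) -> bool {
        self.unmet().is_empty()
    }
}
```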
1 change: 1 addition & 0 deletions docs/spec/index.md
@@ -31,6 +31,7 @@ Question this index answers: "what must remain true?"

## Documents

- `agent_memory_knowledge_system_v1.md`: Agent Memory and Knowledge System v1.
- `external_memory_pattern_radar_v1.md`: External Memory Pattern Radar v1.
- `production_corpus_manifest_v1.md`: Production Corpus Manifest v1.
- `real_world_agent_memory_benchmark_v1.md`: Real-World Agent Memory Benchmark v1.
11 changes: 11 additions & 0 deletions docs/spec/system_version_registry.md
@@ -28,6 +28,17 @@ This document is normative. When a new versioned identifier is introduced, it mu

## Registry

### Agent Memory + Knowledge System product contract

- Identifier: `elf.agent_memory_knowledge_system/v1`.
- Type: Product boundary, roadmap, phase-gate, and claim-boundary contract.
- Defined in: `docs/spec/agent_memory_knowledge_system_v1.md`.
- Consumers: Decodex phase planning, issue scoping, product documentation, benchmark
closeout review, and implementation agents deciding which phase may be queued.
- Bump rule: Introduce a new identifier only when product phases, phase-gate
semantics, authority-layer boundaries, or claim-boundary rules become incompatible
with this contract.
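
Consumers can pin the identifier and reject a bumped one instead of guessing compatibility. A minimal sketch for `name/vN` identifiers such as this one; `parse_versioned_id` and `is_compatible` are hypothetical, and they deliberately reject path-prefix identifiers like `/v2`, which have no name segment.

```rust
/// Split an identifier such as `elf.agent_memory_knowledge_system/v1` into name and major version.
pub fn parse_versioned_id(id: &str) -> Option<(&str, u32)> {
    let (name, version) = id.rsplit_once('/')?;
    let major = version.strip_prefix('v')?.parse().ok()?;
    if name.is_empty() {
        return None;
    }
    Some((name, major))
}

/// A consumer pinned to one identifier rejects any other name or major version.
pub fn is_compatible(id: &str, expected_name: &str, expected_major: u32) -> bool {
    parse_versioned_id(id) == Some((expected_name, expected_major))
}
```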

### HTTP API version

- Identifier: `/v2` (URL path prefix).