Merged
3 changes: 3 additions & 0 deletions .dockerignore
@@ -2,4 +2,7 @@
**/.next
**/node_modules
**/npm-debug.log
.worktrees
.git
target
tmp
32 changes: 32 additions & 0 deletions Makefile.toml
@@ -261,6 +261,38 @@ args = [
]


# Competitive parity
# | task | type | cwd |
# | ------------------- | ------- | --- |
# | parity-docker | command | |
# | parity-docker-clean | command | |

[tasks.parity-docker]
workspace = false
command = "docker"
args = [
    "compose",
    "-f",
    "docker-compose.parity.yml",
    "run",
    "--build",
    "--rm",
    "parity-runner",
]

[tasks.parity-docker-clean]
workspace = false
command = "docker"
args = [
    "compose",
    "-f",
    "docker-compose.parity.yml",
    "down",
    "-v",
    "--remove-orphans",
]


# Meta
# | task | type | cwd |
# | ------ | --------- | --- |
2 changes: 1 addition & 1 deletion README.md
@@ -131,7 +131,7 @@ This table compares capability coverage, not overall project quality.
| Source-of-truth + rebuildable derived index | ✅ | ⚠️ | ✅ | ⚠️ | ⚠️ | ⚠️ | ✅ |
| Hierarchical/recursive retrieval strategy | ⚠️ (in progress) | ⚠️ | ✅ | ⚠️ | ⚠️ | ⚠️ | ⚠️ |
| Progressive context loading (L0/L1/L2 style) | ⚠️ (in progress) | ⚠️ | ✅ | ⚠️ | — | ⚠️ | — |
| Built-in web memory inspector/viewer | | ✅ | — | ✅ (OpenMemory) | — | ✅ | — |
| Built-in web memory inspector/viewer | | ✅ | — | ✅ (OpenMemory) | — | ✅ | — |
| Hosted managed option | — | — | — | ✅ | — | — | — |
| Multi-tenant scope semantics | ✅ | ⚠️ | ⚠️ | ✅ | — | — | — |
| TTL/lifecycle policy controls | ✅ | ⚠️ | ⚠️ | ✅ | — | ⚠️ | — |
53 changes: 53 additions & 0 deletions docker-compose.parity.yml
@@ -0,0 +1,53 @@
name: elf-parity-gate

services:
  postgres:
    image: pgvector/pgvector:pg18
    environment:
      POSTGRES_DB: postgres
      POSTGRES_PASSWORD: elf_dev_password
      POSTGRES_USER: elf_dev
    healthcheck:
      test:
        - CMD-SHELL
        - pg_isready -U elf_dev -d postgres
      interval: 2s
      timeout: 5s
      retries: 30
    volumes:
      - elf-parity-postgres-data:/var/lib/postgresql

  qdrant:
    image: qdrant/qdrant:v1.16.3
    volumes:
      - elf-parity-qdrant-data:/qdrant/storage

  parity-runner:
    build:
      context: .
      dockerfile: docker/parity/Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      qdrant:
        condition: service_started
    environment:
      CARGO_HOME: /usr/local/cargo
      ELF_HARNESS_COLLECTION: elf_parity_consolidation
      ELF_HARNESS_DB_NAME: elf_parity_consolidation
      ELF_HARNESS_RUN_ID: parity-docker
      ELF_PG_DSN: postgres://elf_dev:elf_dev_password@postgres:5432/postgres
      ELF_QDRANT_GRPC_URL: http://qdrant:6334
      ELF_QDRANT_HTTP_URL: http://qdrant:6333
    volumes:
      - elf-parity-cargo-registry:/usr/local/cargo/registry
      - elf-parity-cargo-git:/usr/local/cargo/git
      - elf-parity-target:/workspace/target
      - ./tmp/parity:/workspace/tmp/parity

volumes:
  elf-parity-cargo-git:
  elf-parity-cargo-registry:
  elf-parity-postgres-data:
  elf-parity-qdrant-data:
  elf-parity-target:
23 changes: 23 additions & 0 deletions docker/parity/Dockerfile
@@ -0,0 +1,23 @@
FROM rust:1-bookworm

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        bash \
        ca-certificates \
        clang \
        cmake \
        curl \
        git \
        jq \
        libssl-dev \
        perl \
        pkg-config \
        postgresql-client \
        protobuf-compiler \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

COPY . /workspace

CMD ["bash", "scripts/parity-docker-gate.sh"]
80 changes: 80 additions & 0 deletions docs/guide/competitive_parity_testing.md
@@ -0,0 +1,80 @@
# Competitive Parity Testing

Goal: Run the Docker-only parity gate that decides whether ELF has enough evidence to be compared against external memory systems.
Read this when: You need to prove ELF meets the minimum adoption bar instead of relying on architecture claims.
Preconditions: Docker and Docker Compose are available on the host.
Depends on: `docs/spec/system_competitive_parity_gate_v1.md`, `docs/guide/research/agentmemory_adapter.md`, and `Makefile.toml`.
Verification: `cargo make parity-docker` exits successfully and writes `tmp/parity/competitive-parity-report.json` with `verdict = "pass"`.

## Run

Start the gate from the repository root:

```sh
cargo make parity-docker
```

This command invokes Docker Compose on the host. The adapter check and the
service-backed ELF run execute inside containers, and Postgres, Qdrant, the Cargo
registry cache, and the Rust build target live in Docker-managed containers or volumes.

The report is written to:

```text
tmp/parity/competitive-parity-report.json
```

## Clean Up

Remove parity containers and Docker-managed volumes:

```sh
cargo make parity-docker-clean
```

The cleanup command removes Postgres, Qdrant, Cargo cache, and Rust target volumes
for the parity environment. It does not remove the host report directory under
`tmp/parity/`.

## Current Gate Coverage

The checked-in gate currently proves this minimum set:

- the agentmemory fixture adapter maps the sanitized sample into 2 note candidates,
2 doc candidates, 1 baseline query, and 1 explicit ignored item;
- note candidate source references keep the agentmemory fixture resolver and origin
identifiers;
- unsupported agentmemory memory kinds are rejected with the preserved reason
`unsupported_memory_kind`;
- ELF can run a Postgres/Qdrant-backed retrieval and consolidation harness in Docker;
- consolidation preserves or improves recall while keeping retrieved context size no
larger than the baseline run;
- the local admin viewer route (`GET /viewer`) returns HTTP 200 during the Docker
service run.
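As a rough illustration of the adapter-coverage bullets above, the sketch below compares what the adapter emitted against the expected shape of the sanitized sample. `AdapterSummary` and `check_agentmemory_sample` are hypothetical names, not ELF's adapter API; only the expected counts and the `unsupported_memory_kind` reason come from this guide.

```rust
// Hypothetical summary of what the agentmemory fixture adapter emitted.
// Field names are illustrative assumptions, not ELF's actual adapter types.
#[derive(Debug, Default)]
struct AdapterSummary {
    note_candidates: usize,
    doc_candidates: usize,
    baseline_queries: usize,
    ignored_reasons: Vec<String>,
}

// Collects every violated expectation so a failing run reports all of them
// instead of stopping at the first mismatch.
fn check_agentmemory_sample(s: &AdapterSummary) -> Vec<String> {
    let mut failures = Vec::new();
    let expected = [
        ("note candidates", s.note_candidates, 2),
        ("doc candidates", s.doc_candidates, 2),
        ("baseline queries", s.baseline_queries, 1),
        ("ignored items", s.ignored_reasons.len(), 1),
    ];
    for (label, actual, want) in expected {
        if actual != want {
            failures.push(format!("expected {want} {label}, got {actual}"));
        }
    }
    // The rejection must carry the preserved reason, not just any reason.
    if !s.ignored_reasons.iter().any(|r| r == "unsupported_memory_kind") {
        failures.push("missing ignored item with reason unsupported_memory_kind".to_string());
    }
    failures
}

fn main() {
    let summary = AdapterSummary {
        note_candidates: 2,
        doc_candidates: 2,
        baseline_queries: 1,
        ignored_reasons: vec!["unsupported_memory_kind".to_string()],
    };
    let failures = check_agentmemory_sample(&summary);
    if failures.is_empty() {
        println!("adapter_coverage: pass");
    } else {
        println!("adapter_coverage: fail: {failures:?}");
    }
}
```

Returning the full list of failures (rather than a bool) keeps the report useful when several counts drift at once.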

This is not enough for personal production adoption by itself. It is the required
floor that prevents subjective comparisons from being mistaken for evidence.

## Production Adoption Expansion

Before using ELF as personal production memory infrastructure, extend the same gate
with private data and live baselines:

1. Build a sanitized private fixture pack from real personal coding-agent memory
cases. Keep the source fixture out of the repository unless it has been reviewed
for secrets and sensitive content.
2. Run the adapter/import/retrieval path against that private fixture pack inside
Docker.
3. Add at least one live containerized external baseline, starting with agentmemory,
against the same retrieval cases.
4. Keep the acceptance decision strict: ELF is not adopted if it loses on retrieval
quality, migration fidelity, operator inspectability, or failure recovery without
a documented compensating advantage.

## Failure Handling

When `cargo make parity-docker` fails:

- keep `tmp/parity/competitive-parity-report.json` if it was written;
- inspect `tmp/parity/consolidation-harness.log` for service-backed failures;
- fix the failing gate dimension before expanding to broader baselines;
- do not lower thresholds to make a comparison pass.
2 changes: 2 additions & 0 deletions docs/guide/index.md
@@ -62,6 +62,8 @@ Then structure the body for execution:

## Guide subfolders

- `docs/guide/competitive_parity_testing.md` for running the Docker-only adoption
gate against external memory-system baselines.
- `docs/guide/development/` for repository-development workflows.
- `docs/guide/research/` for external comparisons and decision-support materials that are
non-normative.
2 changes: 2 additions & 0 deletions docs/spec/index.md
@@ -35,6 +35,8 @@ Question this index answers: "what must remain true?"
and storage invariants.
- `system_consolidation_proposals_v1.md`: Reviewable derived consolidation run and
proposal contract over immutable source evidence.
- `system_competitive_parity_gate_v1.md`: Docker-only adoption gate that decides
whether ELF meets or exceeds selected external memory-system baselines.

## Spec document contract

147 changes: 147 additions & 0 deletions docs/spec/system_competitive_parity_gate_v1.md
@@ -0,0 +1,147 @@
# Competitive Parity Gate v1 Specification

Purpose: Define the adoption gate ELF must pass before it can be treated as production-eligible memory infrastructure.
Status: normative
Read this when: You are deciding whether ELF is at least as usable as the external memory systems it is being compared against.
Not this document: A market survey, implementation plan, or claim that architecture alone makes ELF better.
Defines: `elf.competitive_parity_gate/v1` dimensions, Docker isolation rules, baseline families, hard thresholds, and report schema.

Related inputs:

- `docs/research/2026-06-08-agent-memory-selection.json`
- `docs/guide/research/comparison_external_projects.md`
- `docs/guide/research/agentmemory_adapter.md`
- `docs/spec/system_elf_memory_service_v2.md`
- `docs/spec/system_consolidation_proposals_v1.md`

## Core Rule

ELF is adoption-eligible only when current test evidence shows that it meets or
exceeds the selected baseline projects in user-visible value. A design advantage,
unchecked capability table, or speculative architecture claim is not sufficient.

The gate must fail closed. If ELF cannot run the comparison, preserve evidence,
retrieve expected memory, expose inspection surfaces, or cleanly isolate state, the
gate result is `fail`.
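A minimal sketch of the fail-closed rule, assuming dimension results are recorded as strings keyed by the dimension names from the Gate Dimensions table in this spec. `gate_verdict` is a hypothetical helper, not part of the checked-in runner.

```rust
use std::collections::BTreeMap;

// Dimension names taken from the Gate Dimensions table of this spec.
const REQUIRED_DIMENSIONS: [&str; 9] = [
    "docker_isolation",
    "adapter_coverage",
    "provenance_integrity",
    "unsafe_rejection",
    "retrieval_quality",
    "context_efficiency",
    "source_safety",
    "operator_inspectability",
    "cleanup",
];

// Fail closed: the gate passes only if every required dimension is present
// and explicitly recorded as "pass". A missing entry, "fail", or any other
// value (for example an error marker) yields "fail".
fn gate_verdict(dimensions: &BTreeMap<String, String>) -> &'static str {
    let all_pass = REQUIRED_DIMENSIONS
        .iter()
        .all(|name| dimensions.get(*name).map(String::as_str) == Some("pass"));
    if all_pass {
        "pass"
    } else {
        "fail"
    }
}

fn main() {
    let mut dimensions: BTreeMap<String, String> = REQUIRED_DIMENSIONS
        .iter()
        .map(|name| (name.to_string(), "pass".to_string()))
        .collect();
    println!("all dimensions pass -> {}", gate_verdict(&dimensions));

    // Dropping one dimension (e.g. the run crashed before cleanup) must fail.
    dimensions.remove("cleanup");
    println!("cleanup missing -> {}", gate_verdict(&dimensions));
}
```

The key design choice is that absence is treated the same as failure, so a run that crashes partway can never produce a passing report.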

## Contract Schema

Canonical schema identifier:

```text
elf.competitive_parity_gate/v1
```

Every parity report must carry:

```json
{
"schema": "elf.competitive_parity_gate.report/v1",
"gate_schema": "elf.competitive_parity_gate/v1"
}
```

## Docker Isolation

Competitive parity runs must use Docker Compose as the execution boundary.

Required properties:

- The host may invoke `docker compose`, but benchmark code, service processes,
Postgres, Qdrant, Cargo builds, and test commands must run inside containers.
- The parity compose file must not publish service ports to the host by default.
- Postgres, Qdrant, Cargo registry, Cargo git cache, and Rust target output must use
Docker-managed volumes.
- The only allowed host artifact is the parity report directory, normally
`tmp/parity/`.
- A parity runner must refuse to run on the host unless an explicit
`ELF_PARITY_ALLOW_HOST=1` override is supplied for debugging.
- Cleanup must be possible with `docker compose -f docker-compose.parity.yml down -v
--remove-orphans`.
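One way a runner could enforce the host-refusal rule is sketched below. It assumes that the presence of `/.dockerenv` indicates a Docker-started container, which is a common Docker convention rather than a guarantee. The checked-in runner is a shell script (`scripts/parity-docker-gate.sh`), so this Rust version is illustrative only.

```rust
use std::env;
use std::path::Path;

// Heuristic container check (assumption): Docker usually creates /.dockerenv
// inside containers it starts. Other runtimes may not, so a real runner might
// combine this with other signals.
fn running_in_container() -> bool {
    Path::new("/.dockerenv").exists()
}

// Pure decision logic, kept separate so it can be tested without Docker.
// Only the exact value "1" counts as an explicit debugging override.
fn host_run_allowed(in_container: bool, allow_host: Option<&str>) -> bool {
    in_container || allow_host == Some("1")
}

fn main() {
    let allow = env::var("ELF_PARITY_ALLOW_HOST").ok();
    if host_run_allowed(running_in_container(), allow.as_deref()) {
        println!("parity runner may proceed");
    } else {
        // A real runner would exit non-zero here before touching any state.
        eprintln!("refusing to run on the host; set ELF_PARITY_ALLOW_HOST=1 to debug");
    }
}
```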

## Baseline Families

The gate tracks baseline families separately so evidence can grow without changing
the core contract:

- `agentmemory_fixture`: sanitized offline agentmemory-style session exports mapped
through the ELF-owned fixture adapter.
- `agentmemory_live_container`: future containerized agentmemory service comparisons
against the same private evaluation cases.
- `claude_mem_fixture`: future fixture import and retrieval comparison for
progressive-disclosure Claude memory workflows.
- `mem0_openmemory_fixture`: future local OpenMemory-style workflow comparison.
- `qmd_memsearch_fixture`: future local retrieval-quality comparison against
CLI/MCP-first hybrid retrieval systems.

External projects are baselines and product references. They must not become hidden
runtime dependencies of ELF core memory semantics unless a separate design spec
explicitly adopts that dependency.

## Gate Dimensions

Each completed gate report must evaluate these dimensions:

| Dimension | Meaning | First hard threshold |
| --------- | ------- | -------------------- |
| `docker_isolation` | The full run used container services and container-local build state. | `pass` |
| `adapter_coverage` | Baseline fixture records are mapped into candidate ELF notes, docs, queries, and ignored reasons. | agentmemory sample emits 2 note candidates, 2 doc candidates, 1 baseline query, and 1 ignored item |
| `provenance_integrity` | Candidate writes keep source-system, session, and item references. | agentmemory note candidate provenance completeness is `1.0` |
| `unsafe_rejection` | Unsupported or unsafe external memory items are rejected explicitly. | at least one ignored item with reason `unsupported_memory_kind` |
| `retrieval_quality` | ELF returns the expected memory for parity queries after normal ingestion/indexing. | consolidation harness after-run recall is not below baseline recall |
| `context_efficiency` | Retrieval/consolidation does not require more context to preserve recall. | consolidation harness after-run context chars are not above baseline |
| `source_safety` | Consolidation output remains derived and reviewable; authoritative source records are not destructively rewritten. | consolidation proposal/source immutability contract remains satisfied |
| `operator_inspectability` | A local operator can inspect memory state without write authority. | admin `GET /viewer` returns 200 during the Docker service run |
| `cleanup` | Test state can be removed without host database or vector-store residue. | documented compose cleanup command exists and succeeds when run |

These are minimum thresholds. Passing them only proves that the checked-in gate is
alive. Personal production use requires the same gate shape to pass against a larger
private fixture pack and at least one live containerized baseline.
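The numeric thresholds for `retrieval_quality`, `context_efficiency`, and `provenance_integrity` reduce to simple comparisons. The sketch below assumes the harness exposes a recall value and a retrieved-context character count per run; `HarnessRun` and the helper names are hypothetical. Ties pass, matching the "not below" and "not above" wording in the table.

```rust
// Hypothetical per-run metrics from the consolidation harness; field names
// are assumptions for illustration.
#[derive(Debug, Clone, Copy)]
struct HarnessRun {
    recall: f64,
    context_chars: u64,
}

// retrieval_quality: after-run recall must not be below baseline recall.
// A NaN recall compares false and therefore fails, which keeps the check closed.
fn retrieval_quality_passes(baseline: HarnessRun, after: HarnessRun) -> bool {
    after.recall >= baseline.recall
}

// context_efficiency: after-run context chars must not exceed the baseline.
fn context_efficiency_passes(baseline: HarnessRun, after: HarnessRun) -> bool {
    after.context_chars <= baseline.context_chars
}

// provenance_integrity: completeness must be exactly 1.0. Comparing counts
// avoids floating-point equality, and zero candidates is treated as failure.
fn provenance_complete(with_provenance: usize, total: usize) -> bool {
    total > 0 && with_provenance == total
}

fn main() {
    // Illustrative values only, not measured results.
    let baseline = HarnessRun { recall: 0.75, context_chars: 4_000 };
    let after = HarnessRun { recall: 0.75, context_chars: 3_200 };
    println!("retrieval_quality: {}", retrieval_quality_passes(baseline, after));
    println!("context_efficiency: {}", context_efficiency_passes(baseline, after));
    println!("provenance_integrity: {}", provenance_complete(2, 2));
}
```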

## First Gate Scope

The first checked-in executable gate covers:

- Docker-only execution through `docker-compose.parity.yml`.
- Offline `agentmemory_fixture` adapter validation using the sanitized sample fixture.
- Service-backed ELF consolidation/retrieval validation using Postgres and Qdrant
containers.
- Admin viewer availability during the service-backed run.
- A machine-readable report under `tmp/parity/competitive-parity-report.json`.

The first gate does not claim broad market superiority. It establishes a hard,
repeatable lower bound that must stay green before broader baselines are meaningful.

## Report Schema

Parity reports must be JSON objects with at least:

- `schema`: `elf.competitive_parity_gate.report/v1`
- `gate_schema`: `elf.competitive_parity_gate/v1`
- `gate_id`: stable or timestamped run identifier
- `verdict`: `pass` or `fail`
- `docker_only`: boolean
- `baselines`: object keyed by baseline family
- `dimensions`: object keyed by gate dimension
- `thresholds`: object describing the hard thresholds used by the run
- `artifacts`: object with relative paths to preserved run evidence

Reports may include extra metrics, but extra fields must not weaken the hard
thresholds in this spec.
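A minimal envelope check for the required report fields might look like the sketch below. It assumes the report has already been decoded into a struct and simplifies the nested objects to string maps. `ParityReport`, `validate_report`, and the sample values are hypothetical; the schema identifiers, allowed verdicts, and relative-path rule come from this section.

```rust
use std::collections::BTreeMap;
use std::path::Path;

const REPORT_SCHEMA: &str = "elf.competitive_parity_gate.report/v1";
const GATE_SCHEMA: &str = "elf.competitive_parity_gate/v1";

// Simplified in-memory view of the required top-level fields. Nested objects
// are flattened to string maps here; a real report carries richer JSON.
#[allow(dead_code)] // some fields mirror the required list but are not checked below
struct ParityReport {
    schema: String,
    gate_schema: String,
    gate_id: String,
    verdict: String,
    docker_only: bool,
    baselines: BTreeMap<String, String>,
    dimensions: BTreeMap<String, String>,
    thresholds: BTreeMap<String, String>,
    artifacts: BTreeMap<String, String>,
}

fn validate_report(r: &ParityReport) -> Result<(), String> {
    if r.schema != REPORT_SCHEMA {
        return Err(format!("schema must be {}, got {:?}", REPORT_SCHEMA, r.schema));
    }
    if r.gate_schema != GATE_SCHEMA {
        return Err(format!("gate_schema must be {}, got {:?}", GATE_SCHEMA, r.gate_schema));
    }
    if r.gate_id.trim().is_empty() {
        return Err("gate_id must not be empty".to_string());
    }
    if r.verdict != "pass" && r.verdict != "fail" {
        return Err(format!("verdict must be \"pass\" or \"fail\", got {:?}", r.verdict));
    }
    // Artifacts must be relative so the evidence stays portable.
    for (name, path) in &r.artifacts {
        let p = Path::new(path);
        if p.is_absolute() || p.has_root() {
            return Err(format!("artifact {name:?} must use a relative path, got {path:?}"));
        }
    }
    Ok(())
}

fn sample_report() -> ParityReport {
    let mut artifacts = BTreeMap::new();
    // Example value only; assumed to be relative to the report directory.
    artifacts.insert("consolidation_log".to_string(), "consolidation-harness.log".to_string());
    ParityReport {
        schema: REPORT_SCHEMA.to_string(),
        gate_schema: GATE_SCHEMA.to_string(),
        gate_id: "parity-docker".to_string(),
        verdict: "pass".to_string(),
        docker_only: true,
        baselines: BTreeMap::new(),
        dimensions: BTreeMap::new(),
        thresholds: BTreeMap::new(),
        artifacts,
    }
}

fn main() {
    match validate_report(&sample_report()) {
        Ok(()) => println!("report envelope is valid"),
        Err(e) => println!("invalid report: {e}"),
    }
}
```

This only checks the envelope; whether each dimension actually passed is a separate, fail-closed decision.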

## Adoption Decision

Treat ELF as `not_adoptable_for_production` while any of these are true:

- The Docker parity gate fails.
- The gate only passes the checked-in toy fixture and has not passed a private
personal fixture pack.
- At least one selected external baseline outperforms ELF on retrieval quality,
migration fidelity, operator inspectability, or failure recovery without a
documented compensating ELF advantage.
- Evidence cannot be reproduced from the report artifacts.

Treat ELF as `personal_production_candidate` only after the Docker gate passes on
both the checked-in fixture and a private personal fixture pack, and after ELF is no
worse than at least one live external baseline on the selected acceptance metrics.
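The decision rules above can be sketched as a single conjunction where any gap keeps ELF not adoptable. `AdoptionEvidence` and `adoption_status` are hypothetical names. The sketch also simplifies "no worse than at least one live baseline" to "at least one live baseline compared and no uncompensated losses", which is an assumption about how the metrics are aggregated.

```rust
// Hypothetical evidence record; field names are illustrative, not an ELF type.
#[derive(Debug, Clone, Copy)]
struct AdoptionEvidence {
    gate_passed_checked_in_fixture: bool,
    gate_passed_private_fixture: bool,
    // Number of live, containerized external baselines actually compared.
    live_baselines_compared: usize,
    // Baselines that beat ELF on retrieval quality, migration fidelity,
    // operator inspectability, or failure recovery without a documented
    // compensating ELF advantage.
    uncompensated_losses: usize,
    // Whether the result can be reproduced from the report artifacts.
    evidence_reproducible: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum AdoptionStatus {
    NotAdoptableForProduction,
    PersonalProductionCandidate,
}

impl AdoptionStatus {
    fn as_str(&self) -> &'static str {
        match self {
            AdoptionStatus::NotAdoptableForProduction => "not_adoptable_for_production",
            AdoptionStatus::PersonalProductionCandidate => "personal_production_candidate",
        }
    }
}

// Every condition must hold; any single gap keeps ELF not adoptable.
fn adoption_status(e: &AdoptionEvidence) -> AdoptionStatus {
    let candidate = e.gate_passed_checked_in_fixture
        && e.gate_passed_private_fixture
        && e.live_baselines_compared >= 1
        && e.uncompensated_losses == 0
        && e.evidence_reproducible;
    if candidate {
        AdoptionStatus::PersonalProductionCandidate
    } else {
        AdoptionStatus::NotAdoptableForProduction
    }
}

fn main() {
    // Example: only the checked-in fixture gate has passed so far.
    let evidence = AdoptionEvidence {
        gate_passed_checked_in_fixture: true,
        gate_passed_private_fixture: false,
        live_baselines_compared: 0,
        uncompensated_losses: 0,
        evidence_reproducible: true,
    };
    println!("{}", adoption_status(&evidence).as_str());
}
```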