diff --git a/docs/guide/benchmarking/2026-06-11-elf-qmd-memory-evolution-diagnostic.md b/docs/guide/benchmarking/2026-06-11-elf-qmd-memory-evolution-diagnostic.md
new file mode 100644
index 00000000..bf4e53a1
--- /dev/null
+++ b/docs/guide/benchmarking/2026-06-11-elf-qmd-memory-evolution-diagnostic.md
@@ -0,0 +1,211 @@
+# ELF/qmd Memory-Evolution Diagnostic - June 11, 2026
+
+Goal: Explain the fresh live memory-evolution failures for ELF and qmd, and turn the
+measured gaps into benchmark and optimization directions without implementing those
+optimizations here.
+Read this when: You need to decide whether ELF currently beats qmd on
+current-vs-historical memory, supersession, delete/tombstone handling, or temporal
+relation validity.
+Inputs: Fresh local runs of `cargo make real-world-memory-evolution` and
+`cargo make real-world-memory-live-adapters` on commit `87a388b`.
+Outputs: Fixture evidence, live ELF/qmd job-level diagnosis, claim boundaries, and
+future iteration directions.
+
+## Executive Judgment
+
+ELF does not yet have a production-quality live memory-evolution win. The fixture
+suite passes, but the live adapter path still fails five of six current-vs-historical
+jobs.
+
+The narrow fresh result is:
+
+- Fixture memory-evolution: `5/5` pass.
+- ELF live memory-evolution: `1/6` pass, `5/6` wrong_result.
+- qmd live memory-evolution: `0/6` pass, `6/6` wrong_result.
+
+ELF is better than qmd on this fresh live slice only in a limited sense: ELF retrieves
+all required memory-evolution evidence and passes the delete/TTL tombstone job; qmd
+misses three required evidence links and fails the delete/TTL job.
+
+That is not enough to claim ELF has solved memory evolution. The main live ELF gap is
+not basic retrieval. ELF retrieves the current evidence, rationale evidence, and often
+the relevant historical evidence, but the answer and trace do not explicitly encode
+that a historical fact was superseded, invalidated, or preserved as history. The
+scorer therefore records no conflict detection and assigns `0.0` lifecycle behavior
+on the five supersession jobs.
+
+For a memory system meant to support real agents, this is a P0 product-quality gap:
+users ask for more than the newest note. They ask what changed, why, what used to be
+true, which source is current, and whether an old conclusion is stale.
+
+## Fresh Runs
+
+| Command | Result | Runtime |
+| --- | --- | ---: |
+| `cargo make real-world-memory-evolution` | pass | 50.34 seconds |
+| `cargo make real-world-memory-live-adapters` | pass | 112.26 seconds |
+
+The live adapter command emitted repeated Qdrant client/server compatibility warnings,
+but it completed and wrote ELF and qmd reports. Treat the warnings as a
+benchmark-harness risk, not as a run failure.
+
+## Fixture Baseline
+
+`cargo make real-world-memory-evolution` proves the benchmark contract itself can
+score the intended behavior:
+
+| Metric | Value |
+| --- | ---: |
+| Jobs | `5` |
+| Pass | `5` |
+| Wrong result | `0` |
+| Mean score | `1.000` |
+| Expected evidence recall | `11/11` |
+| Evidence coverage | `11/11` |
+| Conflict detections | `5` |
+| Update rationales available | `5` |
+| History-readback encoded jobs | `1` |
+
+This is fixture evidence. It proves the scenario contract is encoded and scored. It
+does not prove the ELF live service or qmd CLI path can produce the same behavior.
+
+## Live Full-Sweep Context
+
+The fresh live sweep changed the qmd full-suite shape compared with the previous
+coverage audit:
+
+| Adapter | Jobs | Pass | Wrong result | Blocked | Not encoded | Mean score | Mean latency | Expected evidence recall | Evidence coverage |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| ELF live service adapter | `38` | `18` | `5` | `2` | `13` | `0.525` | `8.620 ms` | `41/77` | `48/84` |
+| qmd live CLI adapter | `38` | `17` | `6` | `2` | `13` | `0.486` | `691.163 ms` | `38/77` | `45/84` |
+
+Do not turn this into a broad win claim. The one-job pass gap comes entirely from this
+memory-evolution slice: qmd failed the delete/TTL job that ELF passed.
+
+## Live Memory-Evolution Result
+
+| Adapter | Jobs | Pass | Wrong result | Mean score | Expected evidence matched | Produced evidence |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: |
+| ELF live service adapter | `6` | `1` | `5` | `0.492` | `13/13` | `13` |
+| qmd live CLI adapter | `6` | `0` | `6` | `0.325` | `10/13` | `10` |
+
+### Job Matrix
+
+| Job | ELF status | ELF score | qmd status | qmd score | Diagnosis |
+| --- | --- | ---: | --- | ---: | --- |
+| `memory-evolution-benchmark-verdict-001` | wrong_result | `0.40` | wrong_result | `0.15` | ELF retrieved current verdict, caveat, and rationale, but did not cite the old not-ready verdict as historical. qmd also missed the private-corpus caveat evidence. |
+| `memory-evolution-deploy-method-001` | wrong_result | `0.40` | wrong_result | `0.40` | Both retrieved current production runbook and supersession rationale, but neither explicitly preserved the old quickstart path as historical conflict evidence. |
+| `memory-evolution-issue-state-001` | wrong_result | `0.40` | wrong_result | `0.40` | Both answered the current done state and resolution rationale, but neither surfaced the earlier blocked state as superseded history. |
+| `memory-evolution-preference-001` | wrong_result | `0.40` | wrong_result | `0.15` | ELF retrieved current preference and rationale, but did not preserve the old terse preference as historical. qmd only returned the rationale evidence. |
+| `memory-evolution-relation-temporal-001` | wrong_result | `0.35` | wrong_result | `0.35` | Both retrieved current and historical owners, but neither produced a scored temporal-validity explanation or update rationale. |
+| `memory-evolution-delete-ttl-001` | pass | `1.00` | wrong_result | `0.50` | ELF retrieved both tombstone and current plan evidence. qmd retrieved only the current plan and missed the tombstone. |
+
+### Dimension Pattern
+
+For ELF's five wrong-result jobs, the pattern is consistent:
+
+| Dimension | Score pattern |
+| --- | --- |
+| `answer_correctness` | `0.0` on all five wrong-result jobs |
+| `evidence_grounding` | `1.0` on all five wrong-result jobs |
+| `lifecycle_behavior` | `0.0` on all five wrong-result jobs |
+| `trap_avoidance` | `1.0` on all five wrong-result jobs |
+
+That means ELF finds the required evidence and avoids presenting stale facts as
+current, but its answers are not lifecycle-aware. They do not present the historical
+version as a first-class part of the answer, so the benchmark cannot credit conflict
+detection.
+
+qmd has the same lifecycle pattern, plus evidence misses:
+
+| qmd miss | Effect |
+| --- | --- |
+| `verdict-bounded-private-caveat` missing | Benchmark verdict job drops to `0.15`. |
+| `pref-current-concise-rationale` missing | Preference job drops to `0.15`. |
+| `delete-tombstone` missing | Delete/TTL job is `wrong_result` despite answering the current plan. |
+
+## What This Says About ELF
+
+ELF currently looks strong at current-fact retrieval and typed source-of-truth
+discipline. It is not yet strong enough at memory evolution.
+
+The missing product behavior is a temporal reconciliation layer:
+
+1. Detect that current and historical evidence both relate to the same claim.
+2. Explain which evidence is current and which is historical.
+3. Preserve old facts when the user asks what changed.
+4. Mark superseded facts as no longer current without deleting their historical value.
+5. Expose tombstones and invalidation evidence as answerable lifecycle facts.
+6. Emit trace artifacts that show conflict candidates, current winner, historical
+   loser, and update rationale.
+
+This is why the fixture can pass while the live path fails. The fixture response is a
+curated memory-evolution answer. The live adapters are retrieval-backed materializers,
+not full temporal reconciliation engines.
+
+## What ELF Should Borrow
+
+These are optimization directions, not implemented changes in this report:
+
+| Source/reference | Useful idea for ELF | Benchmark gate before claiming progress |
+| --- | --- | --- |
+| Graphiti/Zep | Temporal fact validity windows, invalidation, and current/historical graph facts. | Run the Graphiti/Zep temporal graph adapter and compare current, historical, and future-validity jobs. |
+| mem0/OpenMemory | Entity-scoped memory history and user-visible memory lifecycle inspection. | Add entity/preference history readback and UI/export evidence checks. |
+| Letta | Core memory blocks separate from archival memory. | Add core-vs-archival jobs that distinguish always-loaded operating context from retrieved history. |
+| qmd | Local replay and candidate inspection ergonomics. | Emit ELF trace hydration with conflict candidates, demoted historical facts, and replay commands. |
+| Existing ELF production ops | Tombstone and deletion semantics. | Extend delete/TTL scoring from one isolated job into update/delete/recreate history cases. |
+
+## Next Benchmark And Report Directions
+
+1. Live temporal reconciliation report
+   - Score whether ELF can answer "what changed?" with current evidence,
+     historical evidence, and update rationale in the same answer.
+   - Include trace hydration for current winner, historical loser, and conflict
+     resolution reason.
+
+2. Graphiti/Zep temporal graph comparison
+   - Use the existing Graphiti/Zep research gate as the next real adapter target.
+   - The goal is not to copy a graph database blindly; it is to measure validity
+     windows and supersession semantics against ELF.
+
+3. mem0/OpenMemory history comparison
+   - Measure preference/entity history, correction, deletion, and user-visible
+     inspection.
+   - This directly maps to personal agent-memory expectations.
+
+4. qmd tombstone/delete diagnostic
+   - qmd is already the retrieval-debug reference, but it missed the delete tombstone
+     in this run.
+   - Keep this as a measured qmd gap before using qmd as a lifecycle reference.
+
+5. ELF trace-candidate conflict profile
+   - Add a report that shows top candidates for conflict jobs, not only final mapped
+     evidence ids.
+   - This should make it obvious whether historical evidence was absent, present but
+     unselected, or selected but not narrated.
+
+## Claim Boundaries
+
+Allowed claims:
+
+- The fixture memory-evolution suite passes.
+- In the fresh live memory-evolution run, ELF outscored qmd and passed one job qmd
+  failed.
+- ELF retrieved all required memory-evolution evidence in the live run.
+- ELF still failed five of six live memory-evolution jobs because current-vs-historical
+  conflict detection was not encoded in the answer behavior.
+
+Not allowed:
+
+- Do not claim ELF has solved memory evolution.
+- Do not claim ELF broadly beats qmd as a memory system.
+- Do not promote fixture memory-evolution pass into live production proof.
+- Do not treat Graphiti/Zep, mem0/OpenMemory, or Letta as beaten; their strongest
+  scenarios still need comparable adapter reports.
+
+## Bottom Line
+
+The next ELF iteration should prioritize temporal reconciliation over more generic
+retrieval work. Retrieval is good enough to find the needed evidence in this slice;
+the failing behavior is deciding and explaining how current, historical, deleted, and
+superseded memories relate.
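
The temporal reconciliation steps listed under "What This Says About ELF" in the diagnostic above can be sketched outside the patch. This is a minimal Python sketch under stated assumptions, not ELF's implementation: the `Evidence` and `Reconciliation` types, the `claim_key` and `recorded_on` fields, the evidence ids, and the dates are all hypothetical. It covers only grouping evidence by claim, choosing a current winner by recency, and narrating the superseded fact alongside the current one. Update rationale and tombstones are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Evidence:
    evidence_id: str  # hypothetical evidence id
    claim_key: str    # hypothetical key shared by evidence about the same claim
    recorded_on: str  # hypothetical ISO date; a later date sorts as newer
    text: str


@dataclass
class Reconciliation:
    claim_key: str
    current: Evidence
    historical: list[Evidence]

    @property
    def conflict_detected(self) -> bool:
        # A conflict exists when older evidence competes for the same claim.
        return bool(self.historical)


def reconcile(evidence: list[Evidence]) -> list[Reconciliation]:
    """Group evidence by claim, pick the newest as current, keep the rest as history."""
    by_claim: dict[str, list[Evidence]] = defaultdict(list)
    for item in evidence:
        by_claim[item.claim_key].append(item)
    results = []
    for claim_key, items in by_claim.items():
        ordered = sorted(items, key=lambda e: e.recorded_on)
        results.append(Reconciliation(claim_key, ordered[-1], ordered[:-1]))
    return results


def narrate(result: Reconciliation) -> str:
    """Render current and superseded facts together, each with its evidence id."""
    lines = [f"Current: {result.current.text} [{result.current.evidence_id}]"]
    for old in result.historical:
        lines.append(f"Superseded: {old.text} [{old.evidence_id}]")
    return "\n".join(lines)


# Made-up evidence shaped like the issue-state job: blocked first, done later.
evidence = [
    Evidence("issue-blocked", "issue-state", "2026-05-01", "issue is blocked"),
    Evidence("issue-done", "issue-state", "2026-06-01", "issue is done"),
]
result = reconcile(evidence)[0]
print(narrate(result))
```

Recency is the simplest possible winner rule and is used here only for illustration. A real reconciliation layer would more likely rely on explicit supersession or invalidation links, as the Graphiti/Zep row of the borrow table suggests.
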
diff --git a/docs/guide/benchmarking/index.md b/docs/guide/benchmarking/index.md
index 81e90780..1cc0563b 100644
--- a/docs/guide/benchmarking/index.md
+++ b/docs/guide/benchmarking/index.md
@@ -61,6 +61,10 @@ cleanup, use `docs/guide/single_user_production.md`.
 - `2026-06-11-elf-qmd-retrieval-debug-profile.md`: fresh ELF/qmd retrieval-debug
   profile with real-world retrieval-suite evidence, 480-document stress baseline
   evidence, qmd top-10 artifact inspection, and explicit rerank/fusion non-claims.
+- `2026-06-11-elf-qmd-memory-evolution-diagnostic.md`: fresh ELF/qmd
+  memory-evolution diagnostic showing fixture pass, live ELF/qmd current-vs-historical
+  wrong-result patterns, qmd tombstone evidence miss, and temporal-reconciliation
+  iteration directions.
 - `real_world_agent_memory_benchmark.md`: operator overview for the v1 real-world
   agent memory benchmark contract, including suite taxonomy, typed report states,
   knowledge-compilation fixture tasks, and the production-ops fixture target.
diff --git a/docs/research/2026-06-11-elf-qmd-memory-evolution-diagnostic.json b/docs/research/2026-06-11-elf-qmd-memory-evolution-diagnostic.json
new file mode 100644
index 00000000..f7a639ae
--- /dev/null
+++ b/docs/research/2026-06-11-elf-qmd-memory-evolution-diagnostic.json
@@ -0,0 +1,197 @@
+{
+  "schema": "elf.memory_evolution_diagnostic_report/v1",
+  "run_id": "2026-06-11-elf-qmd-memory-evolution-diagnostic",
+  "commit": "87a388b6f33ff0142359876e5d9632fc096ee956",
+  "created_at": "2026-06-11",
+  "scope": "ELF versus qmd live memory-evolution behavior, current-vs-historical conflict diagnosis, and optimization directions",
+  "commands": [
+    {
+      "command": "cargo make real-world-memory-evolution",
+      "status": "pass",
+      "runtime_seconds": 50.34,
+      "artifact": "tmp/real-world-memory/evolution-report.json"
+    },
+    {
+      "command": "cargo make real-world-memory-live-adapters",
+      "status": "pass",
+      "runtime_seconds": 112.26,
+      "artifact": "tmp/real-world-memory/live-adapters/"
+    }
+  ],
+  "fixture_memory_evolution": {
+    "job_count": 5,
+    "pass": 5,
+    "wrong_result": 0,
+    "mean_score": 1.0,
+    "expected_evidence_total": 11,
+    "expected_evidence_matched": 11,
+    "conflict_detection_count": 5,
+    "update_rationale_available_count": 5,
+    "history_readback_encoded_count": 1
+  },
+  "live_full_sweep_context": {
+    "elf": {
+      "job_count": 38,
+      "pass": 18,
+      "wrong_result": 5,
+      "blocked": 2,
+      "not_encoded": 13,
+      "mean_score": 0.525,
+      "mean_latency_ms": 8.62,
+      "expected_evidence_total": 77,
+      "expected_evidence_matched": 41,
+      "evidence_required_count": 84,
+      "evidence_covered_count": 48
+    },
+    "qmd": {
+      "job_count": 38,
+      "pass": 17,
+      "wrong_result": 6,
+      "blocked": 2,
+      "not_encoded": 13,
+      "mean_score": 0.486,
+      "mean_latency_ms": 691.163,
+      "expected_evidence_total": 77,
+      "expected_evidence_matched": 38,
+      "evidence_required_count": 84,
+      "evidence_covered_count": 45
+    }
+  },
+  "live_memory_evolution": {
+    "elf": {
+      "jobs": 6,
+      "pass": 1,
+      "wrong_result": 5,
+      "mean_score": 0.4916666666666667,
+      "expected_evidence_total": 13,
+      "expected_evidence_matched": 13,
+      "produced_evidence_total": 13,
+      "diagnosis": "ELF retrieved all required evidence but failed supersession jobs because conflict detection and lifecycle-aware current-vs-historical answer behavior were not emitted."
+    },
+    "qmd": {
+      "jobs": 6,
+      "pass": 0,
+      "wrong_result": 6,
+      "mean_score": 0.325,
+      "expected_evidence_total": 13,
+      "expected_evidence_matched": 10,
+      "produced_evidence_total": 10,
+      "diagnosis": "qmd had the same missing conflict-detection pattern and additionally missed three required evidence links, including the delete tombstone."
+    }
+  },
+  "job_diagnosis": [
+    {
+      "job_id": "memory-evolution-benchmark-verdict-001",
+      "elf_status": "wrong_result",
+      "elf_score": 0.4,
+      "qmd_status": "wrong_result",
+      "qmd_score": 0.15,
+      "diagnosis": "ELF retrieved current verdict, caveat, and rationale but did not cite the old not-ready verdict as historical; qmd also missed private-corpus caveat evidence."
+    },
+    {
+      "job_id": "memory-evolution-deploy-method-001",
+      "elf_status": "wrong_result",
+      "elf_score": 0.4,
+      "qmd_status": "wrong_result",
+      "qmd_score": 0.4,
+      "diagnosis": "Both retrieved the current runbook and supersession rationale but did not preserve the old quickstart path as historical conflict evidence."
+    },
+    {
+      "job_id": "memory-evolution-issue-state-001",
+      "elf_status": "wrong_result",
+      "elf_score": 0.4,
+      "qmd_status": "wrong_result",
+      "qmd_score": 0.4,
+      "diagnosis": "Both answered the current done state and rationale but did not surface the earlier blocked state as superseded history."
+    },
+    {
+      "job_id": "memory-evolution-preference-001",
+      "elf_status": "wrong_result",
+      "elf_score": 0.4,
+      "qmd_status": "wrong_result",
+      "qmd_score": 0.15,
+      "diagnosis": "ELF retrieved current preference and rationale but did not preserve the old terse preference as historical; qmd only returned rationale evidence."
+    },
+    {
+      "job_id": "memory-evolution-relation-temporal-001",
+      "elf_status": "wrong_result",
+      "elf_score": 0.35,
+      "qmd_status": "wrong_result",
+      "qmd_score": 0.35,
+      "diagnosis": "Both retrieved current and historical owners but did not emit scored temporal-validity explanation or update rationale."
+    },
+    {
+      "job_id": "memory-evolution-delete-ttl-001",
+      "elf_status": "pass",
+      "elf_score": 1.0,
+      "qmd_status": "wrong_result",
+      "qmd_score": 0.5,
+      "diagnosis": "ELF retrieved tombstone and current plan evidence; qmd retrieved only the current plan and missed the tombstone."
+    }
+  ],
+  "elf_failure_pattern": {
+    "wrong_result_jobs": 5,
+    "answer_correctness_score": 0.0,
+    "evidence_grounding_score": 1.0,
+    "lifecycle_behavior_score": 0.0,
+    "trap_avoidance_score": 1.0,
+    "interpretation": "The issue is lifecycle-aware reconciliation and narration, not basic evidence retrieval."
+  },
+  "claim_boundary": {
+    "fixture_claim": "fixture_memory_evolution_passes",
+    "live_claim": "elf_narrowly_outscores_qmd_on_this_fresh_slice_but_does_not_solve_memory_evolution",
+    "not_allowed": [
+      "ELF broadly beats qmd as a memory system",
+      "ELF has solved temporal memory evolution",
+      "fixture pass is production proof",
+      "Graphiti/Zep, mem0/OpenMemory, or Letta are beaten"
+    ]
+  },
+  "optimization_directions": [
+    {
+      "direction": "temporal_reconciliation_layer",
+      "description": "Detect current and historical evidence for the same claim, choose the current winner, preserve the historical loser, and cite update rationale."
+    },
+    {
+      "direction": "history_readback_and_note_version_links",
+      "description": "Expose add/update/delete/ignore history and version links for user preference and entity memory changes."
+    },
+    {
+      "direction": "tombstone_and_invalidation_evidence",
+      "description": "Treat deletion and TTL tombstones as answerable evidence instead of only suppressing stale retrieval."
+    },
+    {
+      "direction": "trace_conflict_candidates",
+      "description": "Hydrate trace artifacts with conflict candidates, current winners, historical losers, dropped candidates, and replay commands."
+    }
+  ],
+  "borrow_from": [
+    {
+      "project": "Graphiti/Zep",
+      "borrow": "temporal fact windows, invalidation, supersession, and graph fact provenance",
+      "benchmark_gate": "Graphiti/Zep temporal graph adapter for current, historical, and future-valid facts"
+    },
+    {
+      "project": "mem0/OpenMemory",
+      "borrow": "entity-scoped history, lifecycle inspection, and memory UI/readback",
+      "benchmark_gate": "entity and preference history readback with correction and deletion evidence"
+    },
+    {
+      "project": "Letta",
+      "borrow": "core memory blocks versus archival memory",
+      "benchmark_gate": "core-vs-archival jobs for operating context and historical retrieval"
+    },
+    {
+      "project": "qmd",
+      "borrow": "local replay and candidate inspection ergonomics",
+      "benchmark_gate": "ELF trace hydration with conflict candidates and replay commands"
+    }
+  ],
+  "next_reports": [
+    "Live temporal reconciliation report",
+    "Graphiti/Zep temporal graph comparison",
+    "mem0/OpenMemory history comparison",
+    "qmd tombstone/delete diagnostic",
+    "ELF trace-candidate conflict profile"
+  ]
+}
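
The per-job scores in `job_diagnosis` and the adapter means in `live_memory_evolution` can be cross-checked against each other. The following is a minimal Python sketch, assuming only the fields shown in the JSON report above. It embeds an excerpt of the report rather than reading the file, and the `mean_score` helper is hypothetical, not part of the ELF benchmark harness.

```python
import json

# Excerpt of the report above: only job ids, scores, and recorded means are copied.
report = json.loads("""
{
  "live_memory_evolution": {
    "elf": {"mean_score": 0.4916666666666667},
    "qmd": {"mean_score": 0.325}
  },
  "job_diagnosis": [
    {"job_id": "memory-evolution-benchmark-verdict-001", "elf_score": 0.4, "qmd_score": 0.15},
    {"job_id": "memory-evolution-deploy-method-001", "elf_score": 0.4, "qmd_score": 0.4},
    {"job_id": "memory-evolution-issue-state-001", "elf_score": 0.4, "qmd_score": 0.4},
    {"job_id": "memory-evolution-preference-001", "elf_score": 0.4, "qmd_score": 0.15},
    {"job_id": "memory-evolution-relation-temporal-001", "elf_score": 0.35, "qmd_score": 0.35},
    {"job_id": "memory-evolution-delete-ttl-001", "elf_score": 1.0, "qmd_score": 0.5}
  ]
}
""")


def mean_score(jobs, adapter):
    """Average the per-job score for one adapter ("elf" or "qmd")."""
    scores = [job[f"{adapter}_score"] for job in jobs]
    return sum(scores) / len(scores)


for adapter in ("elf", "qmd"):
    recomputed = mean_score(report["job_diagnosis"], adapter)
    recorded = report["live_memory_evolution"][adapter]["mean_score"]
    # Compare with a tolerance because the means are floating-point values.
    consistent = abs(recomputed - recorded) < 1e-9
    print(f"{adapter}: recomputed={recomputed:.3f} consistent={consistent}")
```

Both recomputed means agree with the recorded values, which round to the `0.492` and `0.325` shown in the markdown report's live memory-evolution table.
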