diff --git a/README.md b/README.md
index 828d1821..60535d0f 100644
--- a/README.md
+++ b/README.md
@@ -143,6 +143,10 @@ with the production embedding provider path, `Qwen3-Embedding-8B`, and
   passed same-corpus retrieval but failed lifecycle/cold-start coverage. memsearch,
   mem0, OpenViking, and claude-mem remained `incomplete` or wrong-result typed states;
   those states are reported as limitations, not hidden as proof.
+- Real-world agent memory aggregate after the P1 benchmark batch: 38 fixture-backed
+  jobs across 11 suites, 35 pass, 1 incomplete, 2 blocked, 0 wrong-result,
+  0 not-encoded, and 0 unsupported-claim results. The remaining non-pass jobs are
+  production-ops operator boundaries and are not counted as benchmark wins.
 - The benchmark runner and report publisher are checked in and Docker-isolated:
   `cargo make baseline-live-docker`, `cargo make baseline-backfill-docker`,
   `cargo make baseline-production-private-addendum`,
@@ -157,19 +161,30 @@ Detailed evidence and interpretation:
 - [Live Baseline Benchmark Report - June 9, 2026](docs/guide/benchmarking/2026-06-09-live-baseline-report.md)
 - [Synthetic Production Corpus Benchmark Report - June 9, 2026](docs/guide/benchmarking/2026-06-09-production-corpus-report.md)
 - [Production Adoption Gate Report - June 9, 2026](docs/guide/benchmarking/2026-06-09-production-adoption-gate-report.md)
+- [Real-World Comparison Report - June 10, 2026](docs/guide/benchmarking/2026-06-10-real-world-comparison-report.md)
 - [Live Baseline Benchmark Runbook](docs/guide/benchmarking/live_baseline_benchmark.md)
 - [Single-User Production Runbook](docs/guide/single_user_production.md)
-- Future benchmark contract:
+- Benchmark contract:
   [Real-World Agent Memory Benchmark v1](docs/spec/real_world_agent_memory_benchmark_v1.md).
-  This contract defines job-level suites for agent work. Checked-in fixture runners now
-  cover a smoke work-resume slice and proposal-only consolidation cases through
-  `cargo make real-world-job-smoke` and `cargo make real-world-memory-consolidation`,
-  and `cargo make real-world-memory` now reports the first external adapter coverage
-  manifest for ELF, qmd, agentmemory, mem0/OpenMemory, claude-mem, memsearch, and
-  OpenViking. Those real-world reports still distinguish fixture-backed and
-  live-baseline-only evidence from true live real-world adapter runs; no external
-  project has a live real-world suite win until an adapter actually executes
-  `real_world_job` prompts and scoring.
+  This contract defines job-level suites for agent work. `cargo make real-world-memory`
+  now reports fixture-backed ELF evidence plus the external adapter coverage manifest
+  for ELF, qmd, agentmemory, mem0/OpenMemory, claude-mem, memsearch, and OpenViking.
+  The report still distinguishes fixture-backed and live-baseline-only evidence from
+  true live real-world adapter runs; no external project has a live real-world suite win
+  until an adapter actually executes `real_world_job` prompts and scoring.
+
+Evidence-backed position after the June 10 real-world report:
+
+- ELF is better evidenced than the tested alternatives on evidence-bound writes,
+  deterministic ingestion boundaries, Postgres source-of-truth plus rebuildable Qdrant
+  indexing, scoped service APIs, and fixture-backed provenance/resume/evolution checks.
+- ELF and qmd are both strong in the current encoded retrieval evidence: qmd remains
+  the local retrieval-debug baseline, while ELF has the stronger service and provenance
+  contract.
+- ELF is still behind or not yet proven on live real-world external adapters,
+  private-corpus production quality, credentialed production-ops gates, qmd-style local
+  debug knobs, agentmemory/claude-mem/OpenMemory-style continuity UX, OpenViking-style
+  context trajectory, and hosted managed memory.
 
 Quick comparison snapshot (objective/high-level).
 This table compares capability coverage, not overall project quality.
@@ -222,7 +237,8 @@ Detailed comparison, mechanism-level analysis, and source map:
 - [Agent Memory Selection Research Run](docs/research/2026-06-08-agent-memory-selection.json)
 - [Real-World Benchmark Dimension Research Run](docs/research/2026-06-09-xy-841-external-memory-benchmark-dimensions.json)
 
-Latest external research refresh: June 9, 2026.
+Latest real-world benchmark report: June 10, 2026. Latest external research refresh:
+June 9, 2026.
 
 ## Documentation
 
diff --git a/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json b/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json
index c66ebd56..1c37fc4c 100644
--- a/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json
+++ b/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json
@@ -20,7 +20,7 @@
       "evidence_class": "fixture_backed",
       "docker_default": true,
       "host_global_installs_required": false,
-      "overall_status": "wrong_result",
+      "overall_status": "incomplete",
       "setup": {
         "status": "pass",
         "evidence": "The checked-in real_world_memory fixtures parse and score through the ELF fixture runner.",
@@ -28,13 +28,13 @@
         "artifact": "tmp/real-world-memory/real-world-memory-report.json"
       },
       "run": {
-        "status": "wrong_result",
-        "evidence": "The current fixture set reports 27 jobs, 25 pass, 1 wrong_result, and 1 not_encoded.",
+        "status": "incomplete",
+        "evidence": "The current fixture set reports 38 jobs, 35 pass, 1 incomplete, 2 blocked, 0 wrong_result, 0 not_encoded, and 0 unsupported_claim.",
         "command": "cargo make real-world-memory",
         "artifact": "tmp/real-world-memory/real-world-memory-report.json"
       },
       "result": {
-        "status": "wrong_result",
+        "status": "incomplete",
         "evidence": "This is fixture-backed ELF scoring, not a live external adapter result.",
         "artifact": "tmp/real-world-memory/real-world-memory-report.md"
       },
@@ -66,40 +66,50 @@
           "status": "pass",
           "evidence": "Checked-in work-resume fixtures are encoded and passing."
         },
+        {
+          "suite_id": "project_decisions",
+          "status": "pass",
+          "evidence": "Checked-in project-decision fixtures cover accepted decisions, reversals, current validation gates, rationale, and bounded caveats."
+        },
         {
           "suite_id": "retrieval",
           "status": "pass",
-          "evidence": "Checked-in retrieval fixtures are encoded; one deliberate operator-debug wrong-result case is reported under operator_debugging_ux."
+          "evidence": "Checked-in retrieval fixtures cover alternate phrasing, distractors, multi-hop routing, current-versus-obsolete selection, and minimal context."
         },
         {
           "suite_id": "memory_evolution",
-          "status": "not_encoded",
-          "evidence": "The relation temporal-validity case is deliberately not_encoded until temporal graph validity is implemented."
+          "status": "pass",
+          "evidence": "Checked-in memory-evolution fixtures cover current-versus-historical facts and the relation temporal-validity case is encoded."
         },
         {
-          "suite_id": "operator_debugging_ux",
-          "status": "wrong_result",
-          "evidence": "The aggregate fixture set includes one deliberate wrong-result trace attribution case."
+          "suite_id": "consolidation",
+          "status": "pass",
+          "evidence": "Proposal-only consolidation fixtures are encoded and passing without source mutation."
         },
         {
-          "suite_id": "capture_integration",
+          "suite_id": "knowledge_compilation",
           "status": "pass",
-          "evidence": "The redaction and capture-boundary fixture is encoded and passing."
+          "evidence": "Knowledge page fixtures are encoded and passing with citation and rebuild metrics."
         },
         {
-          "suite_id": "personalization",
+          "suite_id": "operator_debugging_ux",
           "status": "pass",
-          "evidence": "The scoped preference fixture is encoded and passing."
+          "evidence": "Operator-debugging fixtures now expose stage attribution and dropped-candidate evidence without raw SQL."
         },
         {
-          "suite_id": "consolidation",
+          "suite_id": "capture_integration",
           "status": "pass",
-          "evidence": "Proposal-only consolidation fixtures are encoded and passing without source mutation."
+          "evidence": "The redaction and capture-boundary fixture is encoded and passing."
         },
         {
-          "suite_id": "knowledge_compilation",
+          "suite_id": "production_ops",
+          "status": "incomplete",
+          "evidence": "Production-ops fixtures encode restore, Qdrant rebuild, backfill resume, resource-envelope interpretation, plus typed incomplete and blocked operator boundaries."
+        },
+        {
+          "suite_id": "personalization",
           "status": "pass",
-          "evidence": "Knowledge page fixtures are encoded and passing with citation and rebuild metrics."
+          "evidence": "The scoped preference fixture is encoded and passing."
         }
       ],
       "evidence": [
@@ -115,7 +125,8 @@
         }
       ],
       "notes": [
-        "This adapter record exists to keep ELF fixture results separate from live external adapter results."
+        "This adapter record exists to keep ELF fixture results separate from live external adapter results.",
+        "The remaining non-pass ELF fixture states are production-ops operator boundaries: a Docker local-embedding dependency, provider credentials, and an operator-owned private corpus manifest."
       ],
       "follow_up": {
         "title": "[ELF benchmark vNext] Replace fixture-only ELF answers with live real-world adapter execution where appropriate",
diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs
index eb1d38ca..04a8b409 100644
--- a/apps/elf-eval/tests/real_world_job_benchmark.rs
+++ b/apps/elf-eval/tests/real_world_job_benchmark.rs
@@ -224,7 +224,7 @@ fn real_world_report_includes_external_adapter_coverage_manifest() -> Result<()>
 		report
 			.pointer("/external_adapters/summary/overall_status_counts/wrong_result")
 			.and_then(Value::as_u64),
-		Some(4)
+		Some(3)
 	);
 	assert_eq!(
 		report
@@ -236,7 +236,7 @@ fn real_world_report_includes_external_adapter_coverage_manifest() -> Result<()>
 		report
 			.pointer("/external_adapters/summary/overall_status_counts/incomplete")
 			.and_then(Value::as_u64),
-		Some(1)
+		Some(2)
 	);
 	assert_eq!(
 		report
@@ -258,6 +258,7 @@ fn real_world_report_includes_external_adapter_coverage_manifest() -> Result<()>
 	let openviking = find_by_field(adapters, "/adapter_id", "openviking_live_baseline")?;
 
 	assert_eq!(elf.pointer("/evidence_class").and_then(Value::as_str), Some("fixture_backed"));
+	assert_eq!(elf.pointer("/overall_status").and_then(Value::as_str), Some("incomplete"));
 	assert_eq!(qmd.pointer("/overall_status").and_then(Value::as_str), Some("pass"));
 	assert_eq!(qmd.pointer("/suites/0/status").and_then(Value::as_str), Some("not_encoded"));
 	assert_eq!(
diff --git a/docs/guide/benchmarking/2026-06-10-real-world-comparison-report.md b/docs/guide/benchmarking/2026-06-10-real-world-comparison-report.md
new file mode 100644
index 00000000..1082526c
--- /dev/null
+++ b/docs/guide/benchmarking/2026-06-10-real-world-comparison-report.md
@@ -0,0 +1,177 @@
+# Real-World Comparison Report - June 10, 2026
+
+Goal: Publish the post-P1 real-world agent memory benchmark evidence and adoption
+implications.
+Read this when: You need the checked-in evidence behind README-level real-world
+benchmark claims after XY-833 and XY-861 through XY-864 landed.
+Inputs: Generated reports under `tmp/real-world-memory/` and `tmp/real-world-job/`,
+`apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json`,
+and the live-baseline reports linked from this guide.
+Depends on: `docs/spec/real_world_agent_memory_benchmark_v1.md`,
+`docs/guide/benchmarking/real_world_agent_memory_benchmark.md`, and
+`docs/guide/benchmarking/live_baseline_benchmark.md`.
+Verification: The commands listed below were run from branch `y/elf-xy-865`. The
+generated reports used runner version
+`0.2.0-89d30dc04a854771f2a62f607e1d13498ccb3073-aarch64-apple-darwin`; the working
+tree also contained the adapter manifest refresh recorded here.
+
+## Context
+
+Dependency batch state at report time:
+
+| Issue | Result | PR |
+| --- | --- | --- |
+| XY-833 operator-debugging UX repair | Done | `https://github.com/hack-ink/ELF/pull/147` |
+| XY-861 project-decision suite | Done | `https://github.com/hack-ink/ELF/pull/151` |
+| XY-862 production-ops suite | Done | `https://github.com/hack-ink/ELF/pull/148` |
+| XY-863 graph temporal validity | Done | `https://github.com/hack-ink/ELF/pull/150` |
+| XY-864 external adapter comparison contract | Done | `https://github.com/hack-ink/ELF/pull/149` |
+
+This report is for the XY-865 branch `y/elf-xy-865` and PR title
+`XY-865: [ELF benchmark vNext P1] Publish real-world comparison report and adoption plan`.
+
+No private-corpus or credentialed provider checks were run for this report because no
+operator-owned private manifest or routed provider credentials were supplied. Those
+paths remain typed `blocked` boundaries, not passes.
+
+## Commands
+
+| Command | Generated artifact | Run ID | Generated at |
+| --- | --- | --- | --- |
+| `cargo make real-world-memory` | `tmp/real-world-memory/real-world-memory-report.{json,md}` | `real-world-memory` | `2026-06-10T04:21:32.545027Z` |
+| `cargo make real-world-memory-project-decisions` | `tmp/real-world-memory/project-decisions/report.{json,md}` | `real-world-memory-project-decisions` | `2026-06-10T04:21:52.403238Z` |
+| `cargo make real-world-memory-production-ops` | `tmp/real-world-memory/production-ops-report.{json,md}` | `real-world-memory-production-ops` | `2026-06-10T04:21:59.520163Z` |
+| `cargo make real-world-memory-evolution` | `tmp/real-world-memory/evolution-report.{json,md}` | `real-world-memory-evolution` | `2026-06-10T04:22:06.325152Z` |
+| `cargo make real-world-job-operator-ux` | `tmp/real-world-job/real-world-job-operator-ux-report.{json,md}` | `real-world-job-operator-ux` | `2026-06-10T04:22:12.28938Z` |
+
+All generated reports used runner version
+`0.2.0-89d30dc04a854771f2a62f607e1d13498ccb3073-aarch64-apple-darwin`.
+
+## Aggregate Result
+
+`cargo make real-world-memory` now reports `38` jobs across all `11` encoded real-world
+suites:
+
+| Metric | Value |
+| --- | ---: |
+| Pass | `35` |
+| Incomplete | `1` |
+| Blocked | `2` |
+| Wrong result | `0` |
+| Lifecycle fail | `0` |
+| Not encoded | `0` |
+| Unsupported claim | `0` |
+| Mean score | `0.921` |
+| Evidence coverage | `82/82` (`1.000`) |
+| Source-ref coverage | `82/82` (`1.000`) |
+| Quote coverage | `82/82` (`1.000`) |
+| Expected evidence recall | `75/75` (`1.000`) |
+| Redaction leaks | `0` |
+| Scope violations | `0` |
+| Temporal validity gaps | `0` |
+| Qdrant rebuild cases | `2/2` pass |
+
+Suite-level outcomes:
+
+| Suite | Jobs | Status | Mean score | Interpretation |
+| --- | ---: | --- | ---: | --- |
+| `trust_source_of_truth` | 1 | `pass` | `1.000` | Source-of-truth rebuild fixture passed. |
+| `work_resume` | 5 | `pass` | `1.000` | Resume and exact next-action fixtures passed. |
+| `project_decisions` | 5 | `pass` | `1.000` | Current decisions, reversals, rationale, and caveats passed. |
+| `retrieval` | 5 | `pass` | `1.000` | Retrieval fixtures with distractors and obsolete context passed. |
+| `memory_evolution` | 6 | `pass` | `1.000` | Current-vs-historical and temporal relation validity passed. |
+| `consolidation` | 4 | `pass` | `1.000` | Proposal-only consolidation passed with `0` source mutations. |
+| `knowledge_compilation` | 2 | `pass` | `1.000` | Derived page fixtures passed with citation/rebuild checks. |
+| `operator_debugging_ux` | 1 | `pass` | `1.000` | Aggregate stage-attribution fixture passed. |
+| `capture_integration` | 2 | `pass` | `1.000` | Redaction and capture-boundary fixtures passed. |
+| `production_ops` | 6 | `incomplete` | `0.500` | Three jobs passed, one is a typed dependency `incomplete`, and two are typed operator `blocked`. |
+| `personalization` | 1 | `pass` | `1.000` | Scoped preference correction passed. |
+
+## Focused P1 Slices
+
+| Command | Jobs | Status summary | Evidence notes |
+| --- | ---: | --- | --- |
+| `cargo make real-world-memory-project-decisions` | 5 | `5` pass | Current decision, historical/reversed decision, validation gate, tradeoff rationale, and private-manifest caveat all passed. |
+| `cargo make real-world-memory-evolution` | 5 | `5` pass | Temporal relation validity is now encoded and passing; stale answers `0`, conflict detections `5`, update rationales `5`. |
+| `cargo make real-world-job-operator-ux` | 5 | `5` pass | Dropped evidence, rerank promotion, provider latency, rebuild change, and misleading relation-context debug cases passed with raw SQL needed `0`. |
+| `cargo make real-world-memory-production-ops` | 6 | `3` pass, `1` incomplete, `2` blocked | Restore/Qdrant rebuild, interrupted backfill resume, and resource envelope passed; local embedding dependency, provider credentials, and private manifest remain typed non-pass boundaries. |
+
+## External Adapter Evidence
+
+The real-world runner loads
+`apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json`.
+That manifest is an evidence ledger, not a leaderboard. It keeps three evidence classes
+separate:
+
+| Evidence class | Count | Meaning |
+| --- | ---: | --- |
+| `fixture_backed` | 1 | ELF fixture scoring through checked-in real-world jobs. |
+| `live_baseline_only` | 6 | Docker same-corpus/lifecycle evidence from the live-baseline runner only. |
+| `live_real_world` | 0 | No external project currently executes `real_world_job` prompts and scoring. |
+
+Adapter-level status after refreshing the manifest:
+
+| Project | Evidence class | Overall status | What is proven | What is not proven |
+| --- | --- | --- | --- | --- |
+| ELF | `fixture_backed` | `incomplete` | Fixture-backed real-world scoring passes 10 of 11 suites, with production-ops typed boundaries preserved. | A live end-to-end real-world service adapter is not encoded. |
+| qmd | `live_baseline_only` | `pass` | Docker same-corpus retrieval, update, delete, and cold-start live-baseline checks pass. | qmd does not yet run any real-world job suite. |
+| agentmemory | `live_baseline_only` | `lifecycle_fail` | Same-corpus retrieval can run through the current adapter. | Durable storage/cold-start lifecycle and real-world suites are blocked by the current in-memory adapter path. |
+| mem0/OpenMemory | `live_baseline_only` | `wrong_result` | Local OSS setup is represented separately from hosted/OpenMemory claims. | Same-corpus retrieval was not a clean pass and no real-world job adapter is encoded. |
+| memsearch | `live_baseline_only` | `wrong_result` | Markdown-first design remains a source-of-truth ergonomics reference. | Same-corpus retrieval was not a clean pass and real-world suites are incomplete/not encoded. |
+| OpenViking | `live_baseline_only` | `incomplete` | Hierarchical context trajectory remains a reference direction. | Docker local-embedding setup must be pinned before fair retrieval or real-world jobs can run. |
+| claude-mem | `live_baseline_only` | `wrong_result` | Progressive disclosure and local viewer remain UX references. | Current Docker evidence is not a clean same-corpus pass and progressive disclosure jobs are not encoded. |
+
+External summary counters: `7` adapter records, `6` external projects, `7` Docker-default,
+`0` host-global-install requirements, `0` live real-world adapters, `3` external
+wrong-result overall states, `1` lifecycle-fail state, and `1` external incomplete state.
+
+## Remaining Gaps
+
+Every remaining non-pass state is either a follow-up or an explicit non-goal for this
+report:
+
+| Gap | Status | Follow-up or non-goal |
+| --- | --- | --- |
+| ELF production-ops cold-start dependency fixture | `incomplete` | `[ELF benchmark P0] Pin Docker-compatible local embedding dependency for cold-start adapter checks`. |
+| ELF provider-backed production-ops gate | `blocked` | Run only with routed operator credentials; credentials were not supplied for this report. |
+| ELF private production corpus | `blocked` | Supply an operator-owned sanitized private manifest; private-corpus checks were a non-goal without that manifest. |
+| ELF fixture-backed scoring is not live service execution | `not_encoded` capability | `[ELF benchmark vNext] Replace fixture-only ELF answers with live real-world adapter execution where appropriate`. |
+| qmd real-world job adapter | `not_encoded` suites | Add a qmd adapter that executes `real_world_job` prompts and scoring before claiming real-world suite parity. |
+| agentmemory durable lifecycle | `lifecycle_fail` / `blocked` | `[ELF benchmark P0] Make agentmemory adapter lifecycle-durable and fail-typed`. |
+| mem0/OpenMemory same-corpus and real-world coverage | `wrong_result` / `not_encoded` | Add/fix a local OSS adapter before claiming lifecycle, personalization, or OpenMemory UI parity. |
+| memsearch same-corpus and real-world coverage | `wrong_result` / `incomplete` | Fix Docker same-corpus retrieval/reindex evidence before scoring Markdown-first real-world jobs. |
+| OpenViking Docker local embedding path | `incomplete` | `[ELF benchmark adapter] Pin OpenViking Docker local embedding dependency path`. |
+| claude-mem durable/progressive-disclosure adapter | `wrong_result` / `not_encoded` | Add durable local repository and progressive-disclosure job coverage before UX parity claims. |
+
+## Adoption Implications
+
+What ELF is better at in the current evidence:
+
+- Evidence-bound writes, deterministic ingestion boundaries, source-of-truth discipline,
+  rebuildable Qdrant indexing, scoped service APIs, and audited fixture-backed real-world
+  provenance are better evidenced than in the currently tested alternatives.
+- The P1 fixture batch removed the previous real-world `wrong_result` and `not_encoded`
+  aggregate gaps for project decisions, temporal relation validity, and operator
+  debugging UX.
+
+Where ELF is comparable or still being tested:
+
+- qmd remains the strongest local retrieval-debug baseline. It passes current
+  live-baseline checks, while ELF has the stronger evidence/provenance service contract.
+- The fixture-backed retrieval and memory-evolution suites pass, but this is not the
+  same as running every external project against the same real-world jobs.
+
+Where ELF is behind or not yet proven:
+
+- No project has a live real-world adapter win, and that includes ELF as a live service
+  adapter; the current ELF result is fixture-backed.
+- Production-ops is intentionally not a full pass because credentialed and
+  private-corpus checks need operator-owned inputs.
+- ELF still needs to absorb external strengths: qmd-style local debug knobs,
+  agentmemory/claude-mem/OpenMemory-style continuity and viewer ergonomics,
+  OpenViking-style context trajectory, mem0-style entity history, and memsearch-style
+  canonical local-store ergonomics.
+
+The current adoption statement is therefore: ELF is the best-supported foundation in
+this repository for high-trust evidence-linked agent memory, but this report does not
+claim overall external superiority or private-corpus production proof.
diff --git a/docs/guide/benchmarking/index.md b/docs/guide/benchmarking/index.md
index e6ea0bff..7cbb67ec 100644
--- a/docs/guide/benchmarking/index.md
+++ b/docs/guide/benchmarking/index.md
@@ -37,6 +37,9 @@ cleanup, use `docs/guide/single_user_production.md`.
 - `2026-06-09-operator-debugging-ux-report.md`: checked-in real-world job
   operator-debugging UX report with trace/viewer links, raw-SQL avoidance, root-cause
   step counts, dropped-candidate visibility, and repair-action clarity.
+- `2026-06-10-real-world-comparison-report.md`: checked-in post-P1 real-world
+  comparison report with aggregate fixture evidence, external-adapter evidence classes,
+  remaining typed gaps, and adoption implications.
 - `real_world_agent_memory_benchmark.md`: operator overview for the v1 real-world
   agent memory benchmark contract, including suite taxonomy, typed report states,
   knowledge-compilation fixture tasks, and the production-ops fixture target.