From db09505dc1a9455c034c6677e281b285d9fddcb1 Mon Sep 17 00:00:00 2001 From: Yvette Carlisle Date: Fri, 19 Jun 2026 22:50:50 +0800 Subject: [PATCH] {"schema":"decodex/commit/1","summary":"Record XY-930 public-proxy production addendum","authority":"XY-930"} --- README.md | 21 +- ...lic-proxy-production-private-addendum.json | 227 ++++++++++++++++++ .../tests/real_world_job_benchmark.rs | 113 +++++++++ ...ublic-proxy-production-private-addendum.md | 157 ++++++++++++ docs/evidence/benchmarking/index.md | 1 + docs/log.md | 4 + 6 files changed, 518 insertions(+), 5 deletions(-) create mode 100644 apps/elf-eval/fixtures/report_snapshots/2026-06-19-operator-approved-public-proxy-production-private-addendum.json create mode 100644 docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md diff --git a/README.md b/README.md index e4ca0368..81b99066 100644 --- a/README.md +++ b/README.md @@ -207,6 +207,13 @@ provider-backed ELF evidence was required. This improves local Dreaming runtime authority and auditability, but it does not prove Pulse, ChatGPT Tasks, Claude Dreams, hosted managed-memory, or private-corpus parity. +- Operator-approved public-proxy addendum after XY-930: the June 19 follow-up runs + `cargo make baseline-production-private-addendum` with a simulated/public-proxy + production corpus manifest approved for this stage. The run records 12 documents, + 8 queries, 8/8 query passes, 8/8 full checks, 0 wrong_result, and 0 blocked while + using local `local-hash` embeddings. This closes the proxy/simulated-corpus stage; + it does not prove real private-corpus production quality or provider-backed + embedding quality. - Full-suite live real-world adapter sweep after XY-926: ELF and qmd emit Docker-isolated `live_real_world` records for all 55 checked-in jobs across 13 suites through `cargo make real-world-memory-live-adapters`. 
Both keep the original @@ -325,6 +332,7 @@ Detailed evidence and interpretation: - [OpenViking Trajectory Materialization Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-openviking-trajectory-materialization-report.md) - [Service-Native Dreaming Readback Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-service-native-dreaming-readback-report.md) - [OpenMemory UI/Export Product Readback Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md) +- [Operator-Approved Public-Proxy Production-Private Addendum - June 19, 2026](docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Single-User Production Runbook](docs/runbook/single_user_production.md) - Benchmark contract: @@ -346,7 +354,8 @@ Evidence-backed position after the June 16 temporal reconciliation report: the local retrieval-debug baseline and now has full-suite live sweep evidence with typed non-pass states, while ELF has the stronger service and provenance contract. - ELF is still behind or not yet proven on full-suite live real-world pass parity, - private-corpus production quality, credentialed production-ops gates, + real private-corpus production quality, provider-backed private-corpus quality, + credentialed production-ops gates, qmd-style local debug knobs, agentmemory/claude-mem/OpenMemory-style capture and continuity UX, OpenViking-style context trajectory, and hosted managed memory. 
@@ -412,6 +421,7 @@ Detailed comparison, mechanism-level analysis, and source map: - [Graph/RAG Citation and Navigation Promotion Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md) - [qmd Debug-Ergonomics Dreaming Retest Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md) - [OpenMemory UI/Export Product Readback Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md) +- [Operator-Approved Public-Proxy Production-Private Addendum - June 19, 2026](docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Real-World Agent Memory Benchmark](docs/runbook/benchmarking/real_world_agent_memory_benchmark.md) - [External Memory Improvement Plan](docs/evidence/external_memory/external_memory_improvement_plan.md) @@ -424,10 +434,11 @@ Detailed comparison, mechanism-level analysis, and source map: - [Dreaming Product Surface Follow-Up Research](docs/research/dreaming_product_surface_followup.md) Latest real-world benchmark report: June 19, 2026. Latest external research refresh: -June 11, 2026; June 19 adds service-native Dreaming readback after the qmd -debug-ergonomics Dreaming retest, the June 17 competitor-strength closeout, and the -June 16 temporal reconciliation, live consolidation self-check, proactive-brief, and -scheduled-memory scoring evidence. +June 11, 2026; June 19 adds the XY-930 operator-approved public-proxy production +addendum and service-native Dreaming readback after the qmd debug-ergonomics Dreaming +retest, the June 17 competitor-strength closeout, and the June 16 temporal +reconciliation, live consolidation self-check, proactive-brief, and scheduled-memory +scoring evidence. 
## Documentation diff --git a/apps/elf-eval/fixtures/report_snapshots/2026-06-19-operator-approved-public-proxy-production-private-addendum.json b/apps/elf-eval/fixtures/report_snapshots/2026-06-19-operator-approved-public-proxy-production-private-addendum.json new file mode 100644 index 00000000..ce86fc4c --- /dev/null +++ b/apps/elf-eval/fixtures/report_snapshots/2026-06-19-operator-approved-public-proxy-production-private-addendum.json @@ -0,0 +1,227 @@ +{ + "schema": "elf.operator_approved_public_proxy_baseline_report/v1", + "report_id": "xy-930-operator-approved-public-proxy-production-private-addendum-2026-06-19", + "authority": "XY-930", + "created_at": "2026-06-19T14:40:13Z", + "goal": "Record the operator-approved simulated/public-proxy production-corpus run through the fail-closed production-private addendum path while preserving private-corpus and provider-backed claim boundaries.", + "command": { + "command": "ELF_BASELINE_PROJECTS=ELF ELF_BASELINE_PRODUCTION_CORPUS_MANIFEST=/workspace/tmp/.json ELF_BASELINE_PRIVATE_ADDENDUM=tmp/live-baseline/operator-approved-public-proxy-addendum.md cargo make baseline-production-private-addendum", + "status": "pass", + "run_id": "live-baseline-20260619143959", + "report_artifact": "tmp/live-baseline/live-baseline-report.json", + "markdown_artifact": "tmp/live-baseline/operator-approved-public-proxy-addendum.md", + "project_head": "56c68e6518ed7c255d6c21b867315277670fc995" + }, + "corpus": { + "profile": "production-private", + "runner_track": "private_production", + "manifest_kind": "operator_approved_public_proxy", + "manifest_id": "operator-approved-public-proxy-prod-corpus-2026-06-19", + "document_count": 12, + "query_count": 8, + "source_boundary": "The manifest is sanitized generated/public-proxy material approved for this run; source text and local manifest paths are not checked in.", + "runner_label_boundary": "The runner labels the track private_production because the fail-closed production-private entrypoint 
was exercised. This report does not convert the proxy corpus into real private-corpus proof." + }, + "embedding": { + "mode": "local", + "provider_id": "local", + "model": "local-hash", + "dimensions": 256, + "timeout_ms": 1000, + "api_base": "http://127.0.0.1", + "path": "/embeddings", + "provider_backed_quality_proven": false + }, + "summary": { + "project": "ELF", + "project_status": "pass", + "retrieval_status": "retrieval_pass", + "total": 1, + "pass": 1, + "fail": 0, + "wrong_result": 0, + "lifecycle_fail": 0, + "incomplete": 0, + "blocked": 0, + "not_encoded": 0, + "reason": "ELF added the operator-approved public-proxy corpus, rebuilt Qdrant, and returned expected evidence for every query." + }, + "check_summary": { + "total": 8, + "pass": 8, + "fail": 0, + "wrong_result": 0, + "lifecycle_fail": 0, + "incomplete": 0, + "blocked": 0, + "not_encoded": 0 + }, + "query_summary": { + "total": 8, + "pass": 8, + "fail": 0, + "wrong_result_count": 0, + "latency_ms_mean": 10.842727625, + "latency_ms_p50": 8.186716, + "latency_ms_p95": 30.443385, + "latency_ms_p99": 30.443385, + "latency_ms_max": 30.443385 + }, + "queries": [ + { + "id": "q-resume-xy930-policy", + "task": "resume_lane", + "trace_id": "882fc41f-7ea0-42c1-a04e-a62713b8e7d0", + "expected_evidence": "issue-xy930-policy", + "top_evidence": "issue-xy930-policy", + "matched": true, + "latency_ms": 9.300164 + }, + { + "id": "q-recover-private-command", + "task": "recover_exact_command", + "trace_id": "929516c3-03d9-4d9f-aa7d-cc5a5c76e9d3", + "expected_evidence": "runbook-private-command", + "top_evidence": "runbook-private-command", + "matched": true, + "latency_ms": 30.443385 + }, + { + "id": "q-explain-provider-blocker", + "task": "explain_stale_blocker", + "trace_id": "66e32fc2-71b1-40bf-b1d3-7e60427a2573", + "expected_evidence": "blocker-provider-missing", + "top_evidence": "blocker-provider-missing", + "matched": true, + "latency_ms": 8.186716 + }, + { + "id": "q-find-proxy-boundary", + "task": 
"find_prior_decision", + "trace_id": "93651b26-6584-4883-ae30-ff9928cace59", + "expected_evidence": "decision-proxy-boundary", + "top_evidence": "decision-proxy-boundary", + "matched": true, + "latency_ms": 7.743761 + }, + { + "id": "q-compare-dreaming-graphrag", + "task": "compare_project_status", + "trace_id": "b4a71e95-1571-4b7d-9fa6-e6e8be1b62a1", + "expected_evidence": "issue-xy986-dreaming", + "top_evidence": "issue-xy986-dreaming", + "matched": true, + "latency_ms": 7.350473 + }, + { + "id": "q-detect-sdk-ui-export", + "task": "detect_contradiction_update", + "trace_id": "6790eab4-561c-4c9e-abc4-728580f359c5", + "expected_evidence": "issue-xy987-openmemory", + "top_evidence": "issue-xy987-openmemory", + "matched": true, + "latency_ms": 7.606096 + }, + { + "id": "q-recover-addendum-safety", + "task": "recover_exact_command", + "trace_id": "11fa7d80-7a95-4b6f-861f-ae43acf469e0", + "expected_evidence": "runbook-addendum-safety", + "top_evidence": "runbook-addendum-safety", + "matched": true, + "latency_ms": 7.805386 + }, + { + "id": "q-resume-cleanup", + "task": "resume_lane", + "trace_id": "7e44260b-330d-4168-ab98-7fae99e5318f", + "expected_evidence": "worktree-cleanup", + "top_evidence": "worktree-cleanup", + "matched": true, + "latency_ms": 8.30584 + } + ], + "backfill": { + "source_count": 12, + "completed_count": 12, + "batch_size": 32, + "worker_concurrency": 1, + "attempts": 2, + "interrupted_after": 6, + "completed_before_resume": 6, + "completed_after_resume": 12, + "skipped_completed": 6, + "duplicate_source_notes": 0, + "elapsed_seconds": 0.175270381 + }, + "resource_envelope": { + "elapsed_seconds": 1.313984156, + "rss_kb": 37656, + "max_rss_kb": 1500000, + "postgres_database_bytes": 11867839, + "corpus_dir_bytes": 1422, + "report_dir_bytes": 15289, + "checkpoint_file_bytes": 3094 + }, + "cost_proxy": { + "scope": "primary corpus note text plus declared same-corpus query text", + "estimated_input_chars": 1542, + "estimated_input_tokens": 386, + 
"configured_usd_per_1k_tokens": null, + "estimated_usd": null + }, + "improvement_regression_readback": { + "previous_state": [ + "XY-930 was blocked on the absent operator-owned production corpus manifest and absent credentialed provider setup." + ], + "improved": [ + "The fail-closed production-private manifest path is now exercised with an operator-approved public-proxy corpus.", + "Same-corpus retrieval improved from blocked by missing manifest to 8/8 pass on the approved proxy corpus.", + "Backfill resume, update, delete, cold-start, concurrent write/search, and resource-envelope checks all remained pass." + ], + "unchanged": [ + "Real private-corpus production quality is still not proven.", + "Provider-backed embedding quality is still not proven because this run used local-hash embeddings.", + "Broad competitor superiority is unchanged; this run only covers the ELF private-entrypoint proxy signal." + ], + "regressed": [] + }, + "claim_boundaries": { + "allowed": [ + "The production-private addendum entrypoint passed on the operator-approved public-proxy corpus.", + "The run produced 8/8 query passes, 0 wrong_result, 0 lifecycle_fail, 0 blocked, 0 incomplete, and 0 not_encoded.", + "The run is useful as a proxy signal for XY-930 planning and benchmark continuity." + ], + "not_allowed": [ + "Do not call this real private-corpus production proof.", + "Do not claim provider-backed production quality; embedding mode was local.", + "Do not treat the runner track private_production as a private data authority claim.", + "Do not use this single ELF proxy run as broad competitor-superiority evidence." + ] + }, + "public_dataset_candidates": [ + { + "name": "SWE-bench", + "url": "https://github.com/swe-bench/SWE-bench", + "used_in_this_run": false, + "note": "Candidate public issue/PR corpus for a future downloadable proxy expansion." 
+ }, + { + "name": "SWE-bench original dataset description", + "url": "https://www.swebench.com/original.html", + "used_in_this_run": false, + "note": "Describes the public 12-repository, 2294-task benchmark; not downloaded for this run." + } + ], + "next_optimization_direction": { + "immediate": [ + "Keep this report as the XY-930 public-proxy closure evidence.", + "Use the same addendum path for future public/downloaded corpora before any real private corpus is introduced." + ], + "when_operator_inputs_exist": [ + "Run the same profile with a real private production corpus manifest.", + "Run provider-backed embeddings with ELF_BASELINE_ELF_EMBEDDING_MODE=provider and a routed provider setup.", + "Compare proxy, real-private, and provider-backed results for retrieval deltas before claiming production quality." + ] + } +} diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs index 42ebd39a..be370dce 100644 --- a/apps/elf-eval/tests/real_world_job_benchmark.rs +++ b/apps/elf-eval/tests/real_world_job_benchmark.rs @@ -254,6 +254,12 @@ fn graph_rag_citation_navigation_promotion_report_json_path() -> Result<PathBuf> report_snapshot_path("2026-06-19-graph-rag-citation-navigation-promotion-report.json") } +fn operator_approved_public_proxy_private_addendum_report_json_path() -> Result<PathBuf> { + report_snapshot_path( + "2026-06-19-operator-approved-public-proxy-production-private-addendum.json", + ) +} + fn openviking_trajectory_materialization_report_markdown_path() -> Result<PathBuf> { Ok(workspace_root()? .join("docs") @@ -294,6 +300,14 @@ fn graph_rag_citation_navigation_promotion_report_markdown_path() -> Result<PathBuf> +fn operator_approved_public_proxy_private_addendum_report_markdown_path() -> Result<PathBuf> { + Ok(workspace_root()?
+ .join("docs") + .join("evidence") + .join("benchmarking") + .join("2026-06-19-operator-approved-public-proxy-production-private-addendum.md")) +} + fn live_temporal_reconciliation_report_json_path() -> Result<PathBuf> { report_snapshot_path("2026-06-16-live-temporal-reconciliation-report.json") } @@ -3437,6 +3451,105 @@ fn assert_service_native_dreaming_docs(markdown: &str, benchmarking_index: &str, assert!(readme.contains("real-world-memory-service-native-dreaming")); } +#[test] +fn operator_approved_public_proxy_private_addendum_preserves_boundary() -> Result<()> { + let report = serde_json::from_str::<Value>(&fs::read_to_string( + operator_approved_public_proxy_private_addendum_report_json_path()?, + )?)?; + let markdown = fs::read_to_string( + operator_approved_public_proxy_private_addendum_report_markdown_path()?, + )?; + let benchmarking_index = fs::read_to_string(benchmarking_index_path()?)?; + let readme = fs::read_to_string(readme_path()?)?; + + assert_eq!( + report.pointer("/schema").and_then(Value::as_str), + Some("elf.operator_approved_public_proxy_baseline_report/v1") + ); + assert_eq!(report.pointer("/authority").and_then(Value::as_str), Some("XY-930")); + assert_eq!(report.pointer("/command/status").and_then(Value::as_str), Some("pass")); + assert_eq!( + report.pointer("/command/run_id").and_then(Value::as_str), + Some("live-baseline-20260619143959") + ); + assert_eq!( + report.pointer("/corpus/profile").and_then(Value::as_str), + Some("production-private") + ); + assert_eq!( + report.pointer("/corpus/runner_track").and_then(Value::as_str), + Some("private_production") + ); + assert_eq!( + report.pointer("/corpus/manifest_kind").and_then(Value::as_str), + Some("operator_approved_public_proxy") + ); + assert_eq!( + report.pointer("/corpus/manifest_id").and_then(Value::as_str), + Some("operator-approved-public-proxy-prod-corpus-2026-06-19") + ); + assert_eq!(report.pointer("/embedding/mode").and_then(Value::as_str), Some("local")); + assert_eq!(
report.pointer("/embedding/provider_backed_quality_proven").and_then(Value::as_bool), + Some(false) + ); + assert_eq!(report.pointer("/summary/project_status").and_then(Value::as_str), Some("pass")); + assert_eq!(report.pointer("/summary/wrong_result").and_then(Value::as_u64), Some(0)); + assert_eq!(report.pointer("/summary/blocked").and_then(Value::as_u64), Some(0)); + assert_eq!(report.pointer("/summary/incomplete").and_then(Value::as_u64), Some(0)); + assert_eq!(report.pointer("/check_summary/total").and_then(Value::as_u64), Some(8)); + assert_eq!(report.pointer("/check_summary/pass").and_then(Value::as_u64), Some(8)); + assert_eq!( + report.pointer("/query_summary/wrong_result_count").and_then(Value::as_u64), + Some(0) + ); + assert_eq!(report.pointer("/backfill/completed_count").and_then(Value::as_u64), Some(12)); + assert_eq!(report.pointer("/backfill/duplicate_source_notes").and_then(Value::as_u64), Some(0)); + + let queries = array_at(&report, "/queries")?; + let provider = find_by_field(queries, "/id", "q-explain-provider-blocker")?; + + assert_eq!(queries.len(), 8); + assert_eq!( + provider.pointer("/top_evidence").and_then(Value::as_str), + Some("blocker-provider-missing") + ); + assert_eq!(provider.pointer("/matched").and_then(Value::as_bool), Some(true)); + assert!(array_contains_str( + &report, + "/claim_boundaries/not_allowed", + "Do not call this real private-corpus production proof." + )?); + assert!(array_contains_str( + &report, + "/claim_boundaries/not_allowed", + "Do not claim provider-backed production quality; embedding mode was local." + )?); + assert!(array_contains_str( + &report, + "/improvement_regression_readback/unchanged", + "Real private-corpus production quality is still not proven." + )?); + assert!(array_contains_str( + &report, + "/next_optimization_direction/when_operator_inputs_exist", + "Run provider-backed embeddings with ELF_BASELINE_ELF_EMBEDDING_MODE=provider and a routed provider setup." 
+ )?); + assert!(markdown.contains("proxy corpus pass")); + assert!(markdown.contains("Do not call this real private-corpus production proof.")); + assert!(markdown.contains("| Embedding mode | `local` |")); + assert!( + benchmarking_index + .contains("2026-06-19-operator-approved-public-proxy-production-private-addendum.md") + ); + assert!(benchmarking_index.contains("not real private-corpus or provider-backed proof")); + assert!(readme.contains("Operator-approved public-proxy addendum after XY-930")); + assert!(readme.contains("8/8 query passes")); + assert!(readme.contains("does not prove real private-corpus production quality")); + + Ok(()) +} + #[test] fn openmemory_ui_export_product_recheck_preserves_blocked_boundary() -> Result<()> { let report = serde_json::from_str::<Value>(&fs::read_to_string( diff --git a/docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md b/docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md new file mode 100644 index 00000000..2da5237b --- /dev/null +++ b/docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md @@ -0,0 +1,157 @@ +--- +type: Evidence +title: "Operator-Approved Public-Proxy Production-Private Addendum - June 19, 2026" +description: "Checked-in benchmark evidence record for the XY-930 operator-approved public-proxy run through the production-private addendum path." +resource: docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md +status: active +authority: current_state +owner: evidence +last_verified: 2026-06-19 +tags: + - docs + - evidence + - benchmarking +--- +# Operator-Approved Public-Proxy Production-Private Addendum - June 19, 2026 + +Goal: Close the current XY-930 blocker with an operator-approved simulated/public-proxy +production corpus while preserving the private-corpus and provider-backed evidence +boundaries.
+Read this when: You need to know whether the fail-closed +`ELF_BASELINE_PRODUCTION_CORPUS_MANIFEST` path can run without a real private corpus, +and which claims remain disallowed. +Inputs: +`apps/elf-eval/fixtures/report_snapshots/2026-06-19-operator-approved-public-proxy-production-private-addendum.json`, +`tmp/live-baseline/live-baseline-report.json`, and +`tmp/live-baseline/operator-approved-public-proxy-addendum.md`. +Outputs: A public-safe report snapshot, a production-private addendum run, and explicit +claim boundaries for simulated/public-proxy versus real private/provider evidence. + +## Executive Judgment + +The XY-930 proxy run is complete: the production-private addendum entrypoint passed on +an operator-approved simulated/public-proxy corpus. + +The command exercised the fail-closed production-private manifest path and published: + +- 12 documents. +- 8 queries. +- 8/8 full checks passing. +- 8/8 same-corpus query matches. +- 0 wrong_result. +- 0 lifecycle_fail. +- 0 blocked. +- 0 incomplete. +- 0 not_encoded. + +This improves the lane from "blocked by missing manifest" to "proxy corpus pass." It +does not prove real private-corpus production quality, provider-backed embedding +quality, or broad competitor superiority. + +## Command Evidence + +| Command | Status | Run ID | Artifacts | +| --- | --- | --- | --- | +| `ELF_BASELINE_PROJECTS=ELF ELF_BASELINE_PRODUCTION_CORPUS_MANIFEST=/workspace/tmp/.json ELF_BASELINE_PRIVATE_ADDENDUM=tmp/live-baseline/operator-approved-public-proxy-addendum.md cargo make baseline-production-private-addendum` | `pass` | `live-baseline-20260619143959` | `tmp/live-baseline/live-baseline-report.json`; `tmp/live-baseline/operator-approved-public-proxy-addendum.md` | + +The runner reports corpus profile `production-private` and track +`private_production` because the production-private entrypoint was used. 
The manifest +itself is `operator-approved-public-proxy-prod-corpus-2026-06-19`, so the track label +must not be read as real private data authority. + +## Run Summary + +| Field | Value | +| --- | --- | +| Project | `ELF` | +| Commit | `56c68e6518ed7c255d6c21b867315277670fc995` | +| Corpus profile | `production-private` | +| Corpus track | `private_production` | +| Corpus manifest | `operator-approved-public-proxy-prod-corpus-2026-06-19` | +| Manifest kind | `operator_approved_public_proxy` | +| Embedding mode | `local` | +| Embedding model | `local-hash` | +| Query mean latency | `10.842727625 ms` | +| Query P50/P95/P99 | `8.186716 ms` / `30.443385 ms` / `30.443385 ms` | +| Resource envelope | `1.313984156s`, `37656` RSS KB | +| Cost proxy | `386` estimated input tokens; no configured cost rate | + +## Query Evidence + +| Query | Task | Expected Evidence | Top Evidence | Trace ID | Latency | +| --- | --- | --- | --- | --- | --- | +| `q-resume-xy930-policy` | `resume_lane` | `issue-xy930-policy` | `issue-xy930-policy` | `882fc41f-7ea0-42c1-a04e-a62713b8e7d0` | `9.300164 ms` | +| `q-recover-private-command` | `recover_exact_command` | `runbook-private-command` | `runbook-private-command` | `929516c3-03d9-4d9f-aa7d-cc5a5c76e9d3` | `30.443385 ms` | +| `q-explain-provider-blocker` | `explain_stale_blocker` | `blocker-provider-missing` | `blocker-provider-missing` | `66e32fc2-71b1-40bf-b1d3-7e60427a2573` | `8.186716 ms` | +| `q-find-proxy-boundary` | `find_prior_decision` | `decision-proxy-boundary` | `decision-proxy-boundary` | `93651b26-6584-4883-ae30-ff9928cace59` | `7.743761 ms` | +| `q-compare-dreaming-graphrag` | `compare_project_status` | `issue-xy986-dreaming` | `issue-xy986-dreaming` | `b4a71e95-1571-4b7d-9fa6-e6e8be1b62a1` | `7.350473 ms` | +| `q-detect-sdk-ui-export` | `detect_contradiction_update` | `issue-xy987-openmemory` | `issue-xy987-openmemory` | `6790eab4-561c-4c9e-abc4-728580f359c5` | `7.606096 ms` | +| `q-recover-addendum-safety` | 
`recover_exact_command` | `runbook-addendum-safety` | `runbook-addendum-safety` | `11fa7d80-7a95-4b6f-861f-ae43acf469e0` | `7.805386 ms` | +| `q-resume-cleanup` | `resume_lane` | `worktree-cleanup` | `worktree-cleanup` | `7e44260b-330d-4168-ab98-7fae99e5318f` | `8.30584 ms` | + +## Backfill And Lifecycle Evidence + +- Backfill source count: `12`. +- Completed count: `12`. +- Batch size: `32`. +- Worker concurrency: `1`. +- Resume probe: interrupted after `6/12`, then resumed to `12/12`. +- Skipped completed on resume: `6`. +- Duplicate source notes: `0`. +- Encoded checks passed: resumable backfill, same-corpus retrieval, async worker + indexing, update replacement, delete suppression, cold-start recovery, concurrent + write/search, and resource envelope. + +## Improvement/Regression Readback + +Improved: + +- XY-930 no longer depends on a human-supplied real private manifest for this proxy + stage. +- The production-private addendum path moved from missing-manifest blocked to 8/8 + pass on the approved public-proxy corpus. +- Resume, lifecycle, cold-start, concurrent write/search, and resource checks stayed + green. + +Unchanged: + +- Real private-corpus production quality is still not proven. +- Provider-backed embedding quality is still not proven because this run used + `local-hash`. +- Broad competitor superiority is unchanged; this run only covers the ELF + private-entrypoint proxy signal. + +Regressed: none. + +## Claim Boundaries + +Allowed: + +- The production-private addendum entrypoint passed on the operator-approved + public-proxy corpus. +- This stage produced 8/8 query passes, 0 wrong_result, 0 lifecycle_fail, 0 blocked, + 0 incomplete, and 0 not_encoded. +- The result is a useful proxy signal for XY-930 planning and benchmark continuity. + +Not allowed: + +- Do not call this real private-corpus production proof. +- Do not claim provider-backed production quality; embedding mode was local. 
+- Do not treat the runner track `private_production` as a private data authority + claim. +- Do not use this single ELF proxy run as broad competitor-superiority evidence. + +## Next Optimization Direction + +Immediate: + +- Keep this report as the XY-930 public-proxy closure evidence. +- Reuse the same addendum path for future public/downloaded corpora before any real + private corpus is introduced. + +When operator-owned inputs exist: + +- Run the same profile with a real private production corpus manifest. +- Run provider-backed embeddings with `ELF_BASELINE_ELF_EMBEDDING_MODE=provider`. +- Compare proxy, real-private, and provider-backed results before claiming production + quality. diff --git a/docs/evidence/benchmarking/index.md b/docs/evidence/benchmarking/index.md index dee67261..6c92eaf3 100644 --- a/docs/evidence/benchmarking/index.md +++ b/docs/evidence/benchmarking/index.md @@ -40,5 +40,6 @@ Routes to: Benchmarking evidence concepts under `docs/evidence/benchmarking/`. - `2026-06-19-letta-core-archive-export-readback-report.md`: Letta Core/Archive Export-Readback Report - June 19, 2026; adds a Docker-contained Letta materialization/report command while preserving all six core/archive comparison scenarios as typed blockers until exported core block JSON, archival readback/search JSON, and source ids exist. - `2026-06-19-openmemory-ui-export-product-readback-report.md`: OpenMemory UI/Export Product Readback Report - June 19, 2026; refreshes the product UI/export recheck and preserves the scenario as blocked because the export helper still needs Docker access to a running OpenMemory product container. - `2026-06-19-openviking-trajectory-materialization-report.md`: OpenViking Trajectory Materialization Report - June 19, 2026; materializes the context-trajectory fixture slice through a dedicated repo task while preserving staged retrieval, hierarchy selection, and recursive/context expansion as typed blockers. 
+- `2026-06-19-operator-approved-public-proxy-production-private-addendum.md`: Operator-Approved Public-Proxy Production-Private Addendum - June 19, 2026; closes the current XY-930 proxy/simulated-corpus stage with 8/8 query pass, 0 wrong_result, and explicit boundaries that this is not real private-corpus or provider-backed proof. - `2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md`: qmd Debug-Ergonomics Dreaming Retest Report - June 19, 2026; confirms qmd's default top-k/replay edge is unchanged while ELF keeps the narrow operator-debug trace/stage visibility wins. - `2026-06-19-service-native-dreaming-readback-report.md`: Service-Native Dreaming Readback Report - June 19, 2026; materializes memory summary, proactive brief, and scheduled-memory derived outputs through `ElfService` readback with 9 pass, 0 wrong_result, and 2 typed XY-930 blockers. diff --git a/docs/log.md b/docs/log.md index 43d22370..8bc421b7 100644 --- a/docs/log.md +++ b/docs/log.md @@ -55,3 +55,7 @@ logs. - Added the graph/RAG citation/navigation promotion report and snapshot for XY-985, preserving representative graph/RAG outcomes as typed non-pass while recording graphify evidence-linked output and the remaining adapter-specific blockers. +- Added the operator-approved public-proxy production-private addendum report and + snapshot for XY-930, recording `baseline-production-private-addendum` as 8/8 pass + on the simulated/public-proxy corpus while preserving real private-corpus and + provider-backed production quality as unproven.
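
Reviewer cross-check (not part of the patch): the `query_summary` figures in the report snapshot can be recomputed from the eight per-query `latency_ms` values in its `queries` array. The sketch below is a minimal std-only illustration. The helper names `mean` and `percentile` are hypothetical, and the index rule `floor(p * n)` clamped to the last element is an assumption that happens to reproduce the recorded P50/P95/P99 values; the patch does not show how the runner actually computes percentiles.

```rust
// Per-query latencies in milliseconds, copied from the `queries` array
// of the report snapshot in this patch.
const LATENCIES_MS: [f64; 8] = [
    9.300164, 30.443385, 8.186716, 7.743761, 7.350473, 7.606096, 7.805386, 8.30584,
];

/// Arithmetic mean of the samples (hypothetical helper; expects a non-empty slice).
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Percentile by sorted index `floor(p * n)`, clamped to the last element.
/// ASSUMPTION: this index rule is chosen only because it matches the values
/// recorded in the snapshot; the runner's real method is not shown in the patch.
/// Panics on an empty slice.
fn percentile(samples: &[f64], p: f64) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let idx = ((p * sorted.len() as f64).floor() as usize).min(sorted.len() - 1);
    sorted[idx]
}

fn main() {
    println!("mean = {:.9} ms", mean(&LATENCIES_MS));
    for p in [0.50, 0.95, 0.99] {
        println!("p{:.0} = {} ms", p * 100.0, percentile(&LATENCIES_MS, p));
    }
}
```

With only eight samples, P95, P99, and the maximum all land on the single slowest query (`q-recover-private-command`), so the tail figures describe one observation rather than a distribution.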