Project remains listed but becomes unqueryable after MCP restart with stale WAL/SHM sidecars #790


### Environment

- **Source tag:** `DeusData/codebase-memory-mcp` `v0.8.1`
- **Resolved source commit:** `f0c9be19c5d74b84f418d807bfdce7b5d6a261ff` (lightweight tag; verified by `git ls-remote` and the GitHub API `git/refs/tags/v0.8.1` endpoint)
- **Binary variant:** standard non-UI
- **Build:** locally compiled on macOS 10.15.8 Catalina x86_64 with Apple Clang 12.0.0 (`clang-1200.0.32.29`); `MACOSX_DEPLOYMENT_TARGET=10.15`; `CC=cc`; `CXX=c++`; the resulting `LC_BUILD_VERSION` is `minos=10.15, sdk=10.15.6` (Catalina-compatible).
- **Binary SHA-256:** `5a859337d243a0f4be913d764bb73a0e01f83e68225cf63c0f000e468df166de`
- **Cache:** isolated `CBM_CACHE_DIR` per test (no shared or prior-pilot caches were used)
- **Fixture:** a generic 3-file React project at `/Users/test/cbm-repro/sample-react-project/` (`package.json`, `src/main.tsx`, `src/App.tsx`) so that no Astryx or other private project names appear in the cache filenames or in the indexer's output
- **No installer script, no `install.sh`, no auto-configuration of any agent.** The binary is driven exclusively over raw stdio MCP (`bin-source-built/codebase-memory-mcp` invoked as a subprocess; `initialize`, `notifications/initialized`, `tools/list`, `tools/call`; nothing else)

### Expected behavior

After the MCP process exits (for any reason) and is restarted with the same `CBM_CACHE_DIR`:

1. The previously-indexed project remains registered.
2. `index_status`, `get_graph_schema`, `get_architecture`, `search_graph`, `search_code`, and `get_code_snippet` all return the previously-indexed data.
3. Either the WAL is checkpointed on graceful close, or the WAL is recovered on next start, so that the project record is visible in the main `.db` file after restart.
4. The project's `.db-wal` and `.db-shm` sidecar files (if any) are either checkpointed (truncated to zero) on close, or they reflect a state that the next start can recover from.
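
Items 3 and 4 describe standard SQLite WAL behavior. As a minimal sketch using Python's built-in `sqlite3` module (not the project's C store code; the `projects` table and all helper names are hypothetical), the following shows a close-time `wal_checkpoint(TRUNCATE)` leaving a zero-length WAL, with the row still visible to a fresh connection:

```python
import os
import sqlite3
import tempfile


def write_then_checkpoint(db_path):
    """Write one row in WAL mode and checkpoint before closing.

    Returns (busy_flag, wal_size_after_checkpoint).
    """
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS projects (name TEXT PRIMARY KEY)")
    conn.execute("INSERT OR REPLACE INTO projects (name) VALUES ('demo')")
    conn.commit()
    # TRUNCATE moves all WAL frames into the main file, then truncates the WAL to zero bytes.
    busy = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
    wal_path = db_path + "-wal"
    wal_size = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
    conn.close()
    return busy, wal_size


def read_projects(db_path):
    """Open a fresh connection, as a restarted process would, and list the rows."""
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("SELECT name FROM projects")]
    finally:
        conn.close()


def demo():
    """Run the write/checkpoint/reopen cycle in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        busy, wal_size = write_then_checkpoint(db_path)
        return busy, wal_size, read_projects(db_path)
```

Calling `demo()` returns the busy flag, the WAL size after the checkpoint, and the rows seen by the second connection. Whether the binary's close path behaves this way is exactly what this issue questions.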

### Actual behavior

**Before termination (in the same process):**
- `initialize`, `notifications/initialized`, `tools/list` succeed.
- `index_repository` succeeds with `status:"indexed"`, `nodes:8`, `edges:10` for the 3-file `sample-react-project` fixture.
- `list_projects` returns the project with `name`, `root_path`, `nodes:8`, `edges:10`, `size_bytes:1769472`.
- `index_status` returns `{"project":"…","nodes":8,"edges":10,"status":"ready"}`.
- `search_code` returns the project file matches.

**After termination and restart (separate fresh process, same `CBM_CACHE_DIR`):**
- `list_projects` returns the same project entry (same `name`, `nodes`, `edges`, `size_bytes`).
- `index_status` returns:

  ```json
  {
    "error": "project not found or not indexed",
    "hint": "Use list_projects to see all indexed projects, then pass the project name.",
    "available_projects": ["…sample-react-project"],
    "count": 1
  }
  ```

- `search_code` returns the same `"project not found or not indexed"` error.
- The project's `.db` file **remains on disk** with the same size (1,769,472 bytes) and the same SHA-256 as immediately before termination.
- After the restart sequence runs queries against the same cache, `.db-wal` and `.db-shm` sidecar files appear in the cache directory and persist.

The project's file is on disk; the project is in `list_projects`; but the query tools return "project not found". The project is **registered but unreachable through queries**.
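
The "registered but unreachable" state can be detected mechanically from the error payload shown above. The following is a sketch only: it assumes the tool's response text is the JSON object quoted in this report, and it matches the project by suffix because the listed name is elided here. The function name is hypothetical.

```python
import json


def listed_but_unqueryable(index_status_text, project):
    """Return True when the server lists `project` yet refuses to query it."""
    try:
        payload = json.loads(index_status_text)
    except ValueError:
        return False  # not JSON, so not the error payload described above
    if not isinstance(payload, dict):
        return False
    if payload.get("error") != "project not found or not indexed":
        return False
    # The listed name may carry a path-derived prefix, so match on the suffix.
    names = payload.get("available_projects", [])
    return any(str(name).endswith(project) for name in names)
```

A healthy `index_status` response (one with `"status":"ready"`) yields `False`; the post-restart error payload yields `True`.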

### Process shutdown behavior (kept separate from the persistence defect)

The lifecycle tests used four independent caches, each starting empty and each receiving one termination event. The observed exit behavior was:

| Test | Termination sent | Process exited within 15 s window? | Exit code | Time to exit |
|------|-------------------|------------------------------------|-----------|--------------|
| A | close client stdin cleanly (no signal) | yes | 0 | 0.0 s |
| B | SIGINT | yes | 0 | 0.41 s |
| C | SIGTERM | yes | 0 | 0.21 s |
| D | SIGKILL | yes | -9 | 0.2 s |

In all four isolated tests above, the process exited within the test window. A separate controlled run (in a different, earlier pilot) did observe the binary remaining in `S` (sleeping) state after SIGTERM was sent; that observation is **out of scope for this issue** and will be reported separately if it proves relevant. The persistence defect observed here is independent of the termination mode: in all four tests the `.db` file remained on disk with the same size, and after the restart the query tools reported "project not found" while `list_projects` continued to list the project.
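
The exit code `-9` in row D matches the convention of Python's `subprocess` module, which reports death by signal as the negated signal number. The sketch below measures exit code and time-to-exit in that style. It is an illustration under stated assumptions, not the test harness used for the table: the child is a stand-in Python sleeper rather than the MCP binary, the helper name is hypothetical, and it is POSIX-only.

```python
import signal
import subprocess
import sys
import time


def terminate_and_time(sig, window=15.0):
    """Start a stand-in child, send `sig`, and return (returncode, seconds_to_exit).

    returncode is None if the child is still alive after `window` seconds.
    """
    # Stand-in child: a Python process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    start = time.monotonic()
    child.send_signal(sig)
    try:
        returncode = child.wait(timeout=window)
    except subprocess.TimeoutExpired:
        child.kill()  # clean up so the sketch never leaves a stray process
        child.wait()
        return None, window
    return returncode, time.monotonic() - start
```

For example, `terminate_and_time(signal.SIGKILL)` returns a negative return code equal to `-signal.SIGKILL` plus the elapsed time.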

### Diagnostic result (disposable cache only)

In an isolated disposable cache, after the restart sequence returned the "project not found" responses, **removing the `.db-wal` and `.db-shm` sidecar files caused the project's queries to succeed again on the next restart** (without re-indexing). The `.db` file itself was not deleted and was not modified.

This is a **diagnostic action performed on a disposable test cache only**, used to isolate the failure to the sidecar files. It is not proposed as a production workaround. The result suggests that the sidecar files, when present, cause the query open path (`cbm_store_open_path_query`, as used by `resolve_store` in `mcp.c`) to fail either the integrity check or the project-record lookup against the main `.db` file. A production fix should make the open path tolerate (or actively recover from) the sidecar state, rather than require users to delete sidecars.
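
Before deleting anything, it can help to record which databases actually have sidecars. The following read-only sketch does that; the helper name is hypothetical, and it assumes the usual SQLite naming in which sidecars are the database filename plus `-wal` or `-shm`, as seen in this cache.

```python
import os


def sidecar_report(cache_dir):
    """Map each *.db file in cache_dir to its sidecar sizes (None if absent)."""
    report = {}
    for name in sorted(os.listdir(cache_dir)):
        if not name.endswith(".db"):
            continue  # skip the sidecars themselves and unrelated files
        entry = {}
        for suffix in ("-wal", "-shm"):
            path = os.path.join(cache_dir, name + suffix)
            entry[suffix] = os.path.getsize(path) if os.path.isfile(path) else None
        report[name] = entry
    return report
```

Running it before and after the restart sequence shows when the sidecars appear without modifying the cache.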

### Compact before/after cache inventory

This is the cache inventory observed in one of the four tests (the others are identical except for the project-DB SHA-256, which is unique per test). All numbers are exact and reproducible; see `evidence/upstream-prep/lifecycle-*/test.log` for the full per-test logs.

| Stage | File | Size (bytes) | SHA-256 (first 16 hex) |
|-------|------|--------------|------------------------|
| Before termination | `…sample-react-project.db` | 1,769,472 | `b7e13d8b157b09a7` (test B; varies per test) |
| Before termination | `_config.db` | 12,288 | `e8f7566de75fe3f6` (constant across all tests) |
| Immediately after exit | `…sample-react-project.db` | 1,769,472 | `b7e13d8b157b09a7` (unchanged) |
| Immediately after exit | `_config.db` | 12,288 | `e8f7566de75fe3f6` (unchanged) |
| After restart sequence (queries failed) | `…sample-react-project.db` | 1,769,472 | `b7e13d8b157b09a7` (unchanged) |
| After restart sequence (queries failed) | `_config.db` | 12,288 | `e8f7566de75fe3f6` (unchanged) |

**The project DB is preserved across the lifecycle event in all four tests.** Across the four tests, the project DB SHA-256s differ (each indexing run produces a fresh DB) but the sizes are identical (1,769,472 bytes), and within each test the project DB SHA-256 is identical before, after exit, and after restart.
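
An inventory in the same shape as the table above (file, size, first 16 hex characters of the SHA-256) can be produced with a short script. This is a sketch with a hypothetical helper name, not the tool that generated the logged numbers.

```python
import hashlib
import os


def cache_inventory(cache_dir):
    """Return sorted (filename, size_bytes, first 16 hex chars of SHA-256) tuples."""
    rows = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Hash in 1 MiB chunks so large databases are not read into memory at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        rows.append((name, os.path.getsize(path), digest.hexdigest()[:16]))
    return rows
```

Comparing the output taken before termination, after exit, and after the restart sequence reproduces the three stages of the table.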

### Relationship to PR #387 (merged) — context only

PR #387 ("fix(store): checkpoint WAL on close and startup to prevent orphan accumulation") was intended to:

1. **Checkpoint the WAL before close**, so the next open sees the same rows the previous process wrote.
2. **Recover stale WAL at startup** (`PRAGMA wal_checkpoint(PASSIVE)`), so an ungraceful exit doesn't poison the next open.
3. Prevent orphan WAL/SHM sidecar accumulation.

The v0.8.1 reproduction above indicates that, under the tested stdio lifecycle, at least one of these three is incomplete or has regressed: after the restart the sidecar files are present in the cache and the `.db` is on disk, but the query open path does not return the project's data. Each test starts from a **fresh** cache in which the indexer wrote the data and the process then exited, so the failing state is produced by the binary's own lifecycle rather than inherited from earlier runs. Deleting the sidecars left by the failed restart clears that state, but users should be able to expect the binary's own close/restart logic to handle it.

PR #387 should be checked for whether the close-time checkpoint covers the project-record write path and whether the startup-time recovery handles the case where the `.db` is on disk but the sidecars are present.
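
For reference, stock SQLite recovers a leftover WAL on the next open by itself. The sketch below demonstrates that baseline with Python's `sqlite3` module: a child process commits a row in WAL mode and exits without closing its connection, and the parent then reads the row and issues the same `PRAGMA wal_checkpoint(PASSIVE)` that item 2 mentions. The table `t` and all names are hypothetical, and this says nothing about how the project's store layer opens its databases.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child process: commit one row in WAL mode, then exit without closing the
# connection, so no close-time checkpoint runs and the sidecars stay on disk.
CHILD = """
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1])
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
os._exit(0)
"""


def recover_after_abrupt_exit():
    """Return (wal_bytes_left_by_child, rows_seen_by_next_open)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        subprocess.run([sys.executable, "-c", CHILD, db_path], check=True)
        wal_bytes = os.path.getsize(db_path + "-wal")
        conn = sqlite3.connect(db_path)
        try:
            # SQLite reads committed frames from the leftover WAL transparently.
            rows = conn.execute("SELECT x FROM t").fetchall()
            # Optionally fold the recovered frames back into the main file.
            conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()
        finally:
            conn.close()
        return wal_bytes, rows
```

If the binary's query open path uses different open flags or runs additional checks before the lookup, its behavior may diverge from this baseline; that divergence is the area this issue asks maintainers to examine.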

### Relationship to issue #277 (open) — context only

Issue #277 ("New files not indexed — WAL-checkpoint blocked on successfully-indexed project") is in the same WAL/recovery family. The defect observed here is **not** the same symptom (here: queryable in the same process, unqueryable after restart; #277: new files not detected). However, the underlying root-cause family (WAL state not being committed before close, or not being recovered on next open) is plausibly the same.

### Relationship to issue #557 (open) — context only

Issue #557 describes a *different* defect: the binary **unlinks the project DB** after determining it to be corrupt (the `deleting corrupt db` path in `resolve_store`). The v0.8.1 reproduction in this issue does **not** reproduce that symptom. In all four tests above, the `.db` file is preserved with the same size and the same SHA-256. The defect in this issue is a *lesser* manifestation in the same WAL/recovery area: the sidecar files are present and the next open cannot read the project, but the file is not unlinked. (See issue #557 for the "delete corrupt db" code path; the controlled lifecycle tests in this issue did not reach it.)

### Minimal reproduction

All commands below are public-safe; no private names, paths, or model names appear. The fixture is generic; the cache is disposable; the reproduction is one shell command plus one Python script.

```bash
# Variables (set BINARY, CACHE, FIXTURE; create an empty CACHE; use a small generic fixture)
# They are exported so that the Python script below can read them from os.environ.
export BINARY=/Users/test/cbm-repro/bin-source-built/codebase-memory-mcp   # locally compiled v0.8.1
export CACHE=/Users/test/cbm-repro/cache-wal-restart-XXXX                    # fresh, empty
export FIXTURE=/Users/test/cbm-repro/sample-react-project                    # 3 files
mkdir -p "$CACHE"

# Step 1 — start the binary, index, and confirm queryability in the same process.
# Step 2 — terminate (any method; here we use stdin close which is the most reproducible).
# Step 3 — restart with the same CACHE.
# Step 4 — list_projects still works; index_status and search_code return "project not found".
# Step 5 — diagnostic: remove sidecars, restart, queries succeed again.
#
# All four lifecycle tests (close-stdin / SIGINT / SIGTERM / SIGKILL) produce the same Step 4 result.

python3 - <<'PY'
import os, json, select, subprocess, time, signal
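BINARY, CACHE, FIXTURE = os.environ['BINARY'], os.environ['CACHE'], os.environ['FIXTURE']
# The "project" arguments below must match the name that list_projects reports;
# adjust "sample-react-project" if the server derives a longer name from the path.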

def send(p, r):
    p.stdin.write((json.dumps(r) + "\n").encode())
    p.stdin.flush()

def recv(p, rid, t=15):
    end = time.time() + t
    buf = b""
    while time.time() < end:
        r, _, _ = select.select([p.stdout], [], [], 0.3)
        if r:
            c = os.read(p.stdout.fileno(), 4096)
            if not c: return None
            buf += c
            while b"\n" in buf:
                l, buf = buf.split(b"\n", 1)
                if not l.strip(): continue
                try: x = json.loads(l.decode())
                except ValueError: continue  # skip lines that are not valid JSON
                if x.get("id") == rid: return x
    return None

def text_of(resp):
    if not resp or "result" not in resp: return ""
    return resp["result"].get("content", [{}])[0].get("text", "")

# --- Pass 1: index + query, then exit via stdin close ---
p = subprocess.Popen([BINARY], stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0,
                     env={**os.environ, "CBM_CACHE_DIR": CACHE})
send(p, {"jsonrpc":"2.0","id":1,"method":"initialize",
         "params":{"protocolVersion":"2024-11-05","capabilities":{},
                  "clientInfo":{"name":"r","version":"1"}}})
recv(p, 1)
send(p, {"jsonrpc":"2.0","method":"notifications/initialized","params":{}})
send(p, {"jsonrpc":"2.0","id":2,"method":"tools/call",
         "params":{"name":"index_repository","arguments":{"repo_path": FIXTURE}}})
recv(p, 2, 60)
send(p, {"jsonrpc":"2.0","id":3,"method":"tools/call",
         "params":{"name":"index_status","arguments":{"project":"sample-react-project"}}})
print("index_status in pass 1:", text_of(recv(p, 3)))
p.stdin.close()  # close stdin; this is the "EOF" termination mode
p.wait(timeout=15)  # make sure pass 1 has fully exited before restarting

# --- Pass 2: restart with the same CACHE ---
p = subprocess.Popen([BINARY], stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0,
                     env={**os.environ, "CBM_CACHE_DIR": CACHE})
send(p, {"jsonrpc":"2.0","id":1,"method":"initialize",
         "params":{"protocolVersion":"2024-11-05","capabilities":{},
                  "clientInfo":{"name":"r","version":"1"}}})
recv(p, 1)
send(p, {"jsonrpc":"2.0","method":"notifications/initialized","params":{}})
send(p, {"jsonrpc":"2.0","id":2,"method":"tools/call",
         "params":{"name":"list_projects","arguments":{}}})
print("list_projects in pass 2:", text_of(recv(p, 2)))
send(p, {"jsonrpc":"2.0","id":3,"method":"tools/call",
         "params":{"name":"index_status","arguments":{"project":"sample-react-project"}}})
print("index_status in pass 2:", text_of(recv(p, 3)))
send(p, {"jsonrpc":"2.0","id":4,"method":"tools/call",
         "params":{"name":"search_code","arguments":{"pattern":"App","project":"sample-react-project"}}})
print("search_code in pass 2:", text_of(recv(p, 4)))
p.stdin.close()
p.wait(timeout=15)  # let pass 2 exit before the cache is inspected
PY

# Inspect the cache; the project .db is on disk; the .db-wal and .db-shm may also be present.
ls -la "$CACHE"

# Diagnostic only — disposable cache. Do NOT do this in production.
rm -f "$CACHE"/*.db-wal "$CACHE"/*.db-shm

# Restart again; index_status and search_code now succeed without re-indexing.
```

### Why this is not just a CLI-wrapper or field-name issue

The four raw-MCP query tools tested in this issue (`index_status`, `search_code`, `list_projects`, `get_architecture`) all use the live schema's canonical `project` field (the field name is `project`, not `project_name`; the related CLI vs MCP schema mismatch is reported in a separate draft). The query calls in this issue's reproduction succeed in the same process and fail in the restart process with the exact same field values. The defect is in the server-side state at the moment of `cbm_store_open_path_query` in `resolve_store`, not in the client's field name.

### Local verdict

The local controlled pilot verdict is `CODEBASE MEMORY PILOT BLOCKED — NO OPENCLAW CONFIG CHANGES RETAINED`. The locally-compiled v0.8.1 binary has not been registered as an MCP server in the test environment; `openclaw.json` was not modified; no skills, hooks, or instructions were installed. No part of this controlled pilot was performed against an authoritative or production repository. The standard non-UI binary and a 3-file generic React fixture were used. All cache directories used by the test were disposable and isolated per test.
