Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions Makefile.cbm
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,8 @@ CYPHER_SRCS = src/cypher/cypher.c
# MCP server module (new)
MCP_SRCS = src/mcp/mcp.c

MEMORY_SRCS = src/personal_memory/memory.c

# Discover module (new)
DISCOVER_SRCS = \
src/discover/language.c \
Expand Down Expand Up @@ -304,7 +306,7 @@ TRE_CFLAGS = -std=c11 -g -O1 -w -Ivendored/tre
YYJSON_SRC = vendored/yyjson/yyjson.c

# All production sources
PROD_SRCS = $(FOUNDATION_SRCS) $(STORE_SRCS) $(CYPHER_SRCS) $(MCP_SRCS) $(DISCOVER_SRCS) $(GRAPH_BUFFER_SRCS) $(PIPELINE_SRCS) $(SIMHASH_SRCS) $(SEMANTIC_SRCS) $(TRACES_SRCS) $(WATCHER_SRCS) $(GIT_SRCS) $(CLI_SRCS) $(UI_SRCS) $(YYJSON_SRC)
PROD_SRCS = $(FOUNDATION_SRCS) $(STORE_SRCS) $(CYPHER_SRCS) $(MCP_SRCS) $(MEMORY_SRCS) $(DISCOVER_SRCS) $(GRAPH_BUFFER_SRCS) $(PIPELINE_SRCS) $(SIMHASH_SRCS) $(SEMANTIC_SRCS) $(TRACES_SRCS) $(WATCHER_SRCS) $(GIT_SRCS) $(CLI_SRCS) $(UI_SRCS) $(YYJSON_SRC)
EXISTING_C_SRCS = $(EXTRACTION_SRCS) $(LSP_SRCS) $(TS_RUNTIME_SRC) \
$(GRAMMAR_SRCS) $(AC_LZ4_SRCS) $(ZSTD_SRCS) $(SQLITE_WRITER_SRC)

Expand Down Expand Up @@ -692,7 +694,7 @@ SYSROOT = $(shell xcrun --show-sdk-path 2>/dev/null)
SYSROOT_FLAG = $(if $(SYSROOT),-isysroot $(SYSROOT),)

# Our source files (excluding vendored, grammars, tree-sitter runtime)
LINT_SRCS = $(FOUNDATION_SRCS) $(STORE_SRCS) $(CYPHER_SRCS) $(MCP_SRCS) \
LINT_SRCS = $(FOUNDATION_SRCS) $(STORE_SRCS) $(CYPHER_SRCS) $(MCP_SRCS) $(MEMORY_SRCS) \
$(DISCOVER_SRCS) $(GRAPH_BUFFER_SRCS) $(PIPELINE_SRCS) $(SIMHASH_SRCS) $(SEMANTIC_SRCS) \
$(TRACES_SRCS) $(WATCHER_SRCS) $(CLI_SRCS) $(EXTRACTION_SRCS) $(AC_LZ4_SRCS) \
$(ZSTD_SRCS) $(SQLITE_WRITER_SRC) $(MAIN_SRC)
Expand Down
45 changes: 39 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@

**The fastest and most efficient code intelligence engine for AI coding agents.** Full-indexes an average repository in milliseconds, the Linux kernel (28M LOC, 75K files) in 3 minutes. Answers structural queries in under 1ms. Ships as a single static binary for macOS, Linux, and Windows — download, run `install`, done.

High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 158 languages, enhanced with [**Hybrid LSP** semantic type resolution](#hybrid-lsp) for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 11 coding agents.
High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 158 languages, enhanced with [**Hybrid LSP** semantic type resolution](#hybrid-lsp) for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 15 MCP tools. Zero dependencies. Plug and play across 11 coding agents.

> **Research** — The design and benchmarks behind this project are described in the preprint [*Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP*](https://arxiv.org/abs/2603.27277) (arXiv:2603.27277). Evaluated across 31 real-world repositories: 83% answer quality, 10× fewer tokens, 2.1× fewer tool calls vs. file-by-file exploration.

Expand All @@ -37,7 +37,7 @@ High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-si
- **11 agents, one command** — `install` auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro — configures MCP entries, instruction files, and pre-tool hooks for each.
- **Built-in graph visualization** — 3D interactive UI at `localhost:9749` (optional UI binary variant).
- **Infrastructure-as-code indexing** — Dockerfiles, Kubernetes manifests, and Kustomize overlays indexed as graph nodes with cross-references. `Resource` nodes for K8s kinds, `Module` nodes for Kustomize overlays with `IMPORTS` edges to referenced resources.
- **14 MCP tools** — search, trace, architecture, impact analysis, Cypher queries, dead code detection, cross-service HTTP linking, ADR management, and more.
- **15 MCP tools** — search, trace, architecture, impact analysis, Cypher queries, dead code detection, cross-service HTTP linking, ADR and personal memory management, and more.

## Quick Start

Expand Down Expand Up @@ -136,7 +136,7 @@ Removes all agent configs, skills, hooks, and instructions. Does not remove the

### Graph & analysis
- **Architecture overview**: `get_architecture` returns languages, packages, entry points, routes, hotspots, boundaries, layers, and clusters in a single call
- **Architecture Decision Records**: `manage_adr` persists architectural decisions across sessions
- **Architecture Decision Records**: `manage_adr` persists project ADRs; `manage_memory` stores personal repo memory locally outside source repos
- **Louvain community detection**: Discovers functional modules by clustering call edges
- **Git diff impact mapping**: `detect_changes` maps uncommitted changes to affected symbols with risk classification
- **Call graph**: Resolves function calls across files and packages (import-aware, type-inferred)
Expand Down Expand Up @@ -176,6 +176,7 @@ Removes all agent configs, skills, hooks, and instructions. Does not remove the

### Distribution & operation
- **Single static binary, zero infrastructure**: SQLite-backed, persists to `~/.cache/codebase-memory-mcp/`
- **Personal memory**: Local repo knowledge lives in `manage_memory` under the user data dir; it is never written to the repo unless you explicitly export artifacts
- **Auto-sync**: Background watcher detects file changes and re-indexes automatically
- **Route nodes**: REST endpoints are first-class graph entities
- **CLI mode**: `codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*"}'`
Expand All @@ -197,6 +198,20 @@ Commit a single compressed file to your repo and your teammates skip the reindex

The result is similar in spirit to graphify's `graphify-out/` directory, but as a single compressed file with explicit two-tier export, integrity-checked import, and zero merge friction.

## Personal Repo Memory

Use `manage_memory` when you want ADR-style knowledge to stay local and private.

- **Storage**: local SQLite `memory.db` under the user data dir, separate from repo files
- **Override**: set `CBM_MEMORY_DIR=/path/to/private/memory`
- **No upload**: `manage_memory` never writes `.codebase-memory/` or changes tracked files
- **Branch aware**: memory is keyed by repo identity, branch, and document type; feature branches can keep overlays while `main` keeps base memory
- **LLM workflow**: call `manage_memory(mode="get")` at session start, investigate with graph tools if empty/stale, then update with `manage_memory(mode="update", content="...")`
- **Branch promotion**: after branch work, copy branch memory into base with `manage_memory(mode="promote", branch="feature")`
- **Maintenance**: list entries with `manage_memory(mode="list")`, remove one branch/doc with `manage_memory(mode="delete", branch="...")`, inspect paths with `manage_memory(mode="settings")`

`manage_adr(scope="personal", ...)` is also accepted as a compatibility path to the same personal store.

## How It Works

codebase-memory-mcp is a **structural analysis backend** — it builds and queries the knowledge graph. It does **not** include an LLM. Instead, it relies on your MCP client (Claude Code, or any MCP-compatible agent) to be the intelligence layer.
Expand Down Expand Up @@ -349,7 +364,7 @@ Add to `~/.claude/.mcp.json` (global) or project `.mcp.json`:
}
```

Restart your agent. Verify with `/mcp` — you should see `codebase-memory-mcp` with 14 tools.
Restart your agent. Verify with `/mcp` — you should see `codebase-memory-mcp` with 15 tools.

</details>

Expand Down Expand Up @@ -420,6 +435,7 @@ codebase-memory-mcp cli --raw search_graph '{"label": "Function"}' | jq '.result
| `get_architecture` | Codebase overview: languages, packages, routes, hotspots, clusters, ADR. |
| `search_code` | Grep-like text search within indexed project files. |
| `manage_adr` | CRUD for Architecture Decision Records. |
| `manage_memory` | Local personal repo memory outside source repos. |
| `ingest_traces` | Ingest runtime traces to validate HTTP_CALLS edges. |

## Graph Data Model
Expand Down Expand Up @@ -458,14 +474,28 @@ Layered: hardcoded patterns (`.git`, `node_modules`, etc.) → `.gitignore` hier
codebase-memory-mcp config list # show all settings
codebase-memory-mcp config set auto_index true # auto-index on session start
codebase-memory-mcp config set auto_index_limit 50000 # max files for auto-index
codebase-memory-mcp config set auto_update false # disable startup update checks
codebase-memory-mcp config get memory_dir # local personal memory directory
codebase-memory-mcp config set memory_enabled true # local per-repo memory (default)
codebase-memory-mcp config set memory_dir ~/private/cbm # override personal memory dir
codebase-memory-mcp config reset auto_index # reset to default
```

Default memory config:
- `memory_enabled=true`: `manage_memory` is enabled by default.
- `memory_default_scope=personal`: LLM workflows should prefer local personal memory.
- `memory_dir=<user data dir>`: global user-home memory DB location, override with config or `CBM_MEMORY_DIR`.

Default update config:
- `auto_update=true`: MCP startup checks GitHub for newer releases and shows a one-shot notice.
- Set `auto_update=false` to disable network update checks. Manual `codebase-memory-mcp update` still works.

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CBM_CACHE_DIR` | `~/.cache/codebase-memory-mcp` | Override the database storage directory. All project indexes and config are stored here. |
| `CBM_MEMORY_DIR` | macOS: `~/Library/Application Support/codebase-memory-mcp`; Linux: `~/.local/share/codebase-memory-mcp`; Windows: `%LOCALAPPDATA%/codebase-memory-mcp` | Override local personal memory storage. `manage_memory` writes `memory.db` here and does not touch the repo. |
| `CBM_DIAGNOSTICS` | `false` | Set to `1` or `true` to enable periodic diagnostics output to `/tmp/cbm-diagnostics-<pid>.json`. |
| `CBM_DOWNLOAD_URL` | *(GitHub releases)* | Override the download URL for updates. Used for testing or self-hosted deployments. |
| `CBM_LOG_LEVEL` | `info` | Set the minimum log level. Accepted values (case-insensitive): `debug`, `info`, `warn`, `error`, `none` — or their numeric equivalents `0`–`4` matching the internal enum. Logs go to stderr; stdout is reserved for MCP JSON-RPC. |
Expand All @@ -475,6 +505,9 @@ codebase-memory-mcp config reset auto_index # reset to default
```bash
# Store indexes in a custom directory
export CBM_CACHE_DIR=~/my-projects/cbm-data

# Store personal memory in a private directory
export CBM_MEMORY_DIR=~/private/cbm-memory
```

## Custom File Extensions
Expand All @@ -497,7 +530,7 @@ Project config overrides global for conflicting extensions. Unknown language val

## Persistence

SQLite databases stored at `~/.cache/codebase-memory-mcp/`. Persists across restarts (WAL mode, ACID-safe). To reset: `rm -rf ~/.cache/codebase-memory-mcp/`.
Project index SQLite databases are stored at `~/.cache/codebase-memory-mcp/`. Personal memory is stored separately in the user data dir or `CBM_MEMORY_DIR`. Both persist across restarts (WAL mode, ACID-safe). To reset indexes: `rm -rf ~/.cache/codebase-memory-mcp/`. To reset personal memory, remove `memory.db` from `manage_memory(mode="settings")` output.

## Troubleshooting

Expand Down Expand Up @@ -556,7 +589,7 @@ Also supported (not yet benchmarked): Ada, Agda, Apex, Assembly (NASM), Astro, A
```
src/
main.c Entry point (MCP stdio server + CLI + install/update/config)
mcp/ MCP server (14 tools, JSON-RPC 2.0, session detection, auto-index)
mcp/ MCP server (15 tools, JSON-RPC 2.0, session detection, auto-index)
cli/ Install/uninstall/update/config (10 agents, hooks, instructions)
store/ SQLite graph storage (nodes, edges, traversal, search, Louvain)
pipeline/ Multi-pass indexing (structure → definitions → calls → HTTP links → config → tests)
Expand Down
8 changes: 4 additions & 4 deletions docs/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@
"featureList": [
"Indexes 158 programming languages via vendored tree-sitter grammars",
"Hybrid LSP semantic type resolution for Python, TypeScript/JavaScript, PHP, C#, Go, C/C++, Java, Kotlin, and Rust",
"14 MCP tools for structural search, call-path tracing, and Cypher graph queries",
"15 MCP tools for structural search, call-path tracing, and Cypher graph queries",
"Semantic vector code search via bundled nomic-embed-code embeddings (no API key, fully local)",
"Semantic graph edges (SEMANTICALLY_RELATED) and near-clone detection (SIMILAR_TO, MinHash + LSH)",
"Cross-service linking for HTTP, gRPC, GraphQL, tRPC, and pub/sub channels with confidence scoring",
Expand Down Expand Up @@ -699,8 +699,8 @@ <h3>3D graph visualization</h3>
<p>An optional UI binary serves an interactive 3D graph at <code>localhost:9749</code> to explore nodes, edges, and clusters visually.</p>
</div>
<div class="feature">
<h3>14 MCP tools</h3>
<p><code>search_graph</code>, <code>trace_path</code>, <code>detect_changes</code>, <code>query_graph</code> (Cypher), <code>get_architecture</code>, <code>get_code_snippet</code>, <code>manage_adr</code>, and 7 more.</p>
<h3>15 MCP tools</h3>
<p><code>search_graph</code>, <code>trace_path</code>, <code>detect_changes</code>, <code>query_graph</code> (Cypher), <code>get_architecture</code>, <code>get_code_snippet</code>, <code>manage_adr</code>, <code>manage_memory</code>, and 7 more.</p>
</div>
<div class="feature">
<h3>Cypher graph queries</h3>
Expand Down Expand Up @@ -728,7 +728,7 @@ <h3>Change-impact analysis</h3>
</div>
<div class="feature">
<h3>Architecture Decision Records</h3>
<p><code>manage_adr</code> persists architectural decisions alongside the graph, so design rationale survives across sessions and teammates.</p>
<p><code>manage_adr</code> persists project ADRs; <code>manage_memory</code> stores personal repo memory locally outside source repos.</p>
</div>
</div>
</section>
Expand Down
2 changes: 1 addition & 1 deletion docs/llms.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
- License: MIT, open source.
- Languages: 158 (158 vendored tree-sitter grammars compiled into the binary).
- Hybrid LSP type resolution: 9 language families (Python, TypeScript/JavaScript/JSX/TSX, PHP, C#, Go, C/C++, Java, Kotlin, Rust) — a lightweight C implementation of language type-resolution algorithms, structurally inspired by and compatible with major language servers including tsserver, pyright, gopls, Roslyn, Eclipse JDT, and rust-analyzer.
- MCP tools: 14 (search_graph incl. semantic_query vector search, trace_path (alias: trace_call_path), query_graph (Cypher), detect_changes, get_architecture, get_code_snippet, manage_adr, and more).
- MCP tools: 15 (search_graph incl. semantic_query vector search, trace_path (alias: trace_call_path), query_graph (Cypher), detect_changes, get_architecture, get_code_snippet, manage_adr, manage_memory, and more).
- Semantic search: natural-language code discovery via bundled nomic-embed-code embeddings (768-dim, compiled into the binary); 11-signal combined scoring; fully local, no API key.
- Semantic & similarity edges: SEMANTICALLY_RELATED (vocabulary-mismatch matches) and SIMILAR_TO (MinHash + LSH near-clone / duplicate detection).
- Cross-repo intelligence: CROSS_* edges link nodes across multiple repos indexed in one store; multi-galaxy 3D layout and cross-repo architecture summary.
Expand Down
6 changes: 3 additions & 3 deletions scripts/smoke-invariants.sh
Original file line number Diff line number Diff line change
Expand Up @@ -446,9 +446,9 @@ inv_mcp_initialize() {
}

# ── Invariant 4: tools/list returns all expected tools ─────────────────────
# Cross-check against the canonical 14-tool list (TOOLS[] in src/mcp/mcp.c).
EXPECTED_TOOLS="index_repository search_graph query_graph trace_path get_code_snippet get_graph_schema get_architecture search_code list_projects delete_project index_status detect_changes manage_adr ingest_traces"
EXPECTED_TOOL_COUNT=14
# Cross-check against the canonical 15-tool list (TOOLS[] in src/mcp/mcp.c).
EXPECTED_TOOLS="index_repository search_graph query_graph trace_path get_code_snippet get_graph_schema get_architecture search_code list_projects delete_project index_status detect_changes manage_adr manage_memory ingest_traces"
EXPECTED_TOOL_COUNT=15
inv_tools_list() {
if ! mcp_alive; then
fail "tools-list" "server not alive"
Expand Down
Loading
Loading