Skip to content

Da7-Tech/mind

Repository files navigation

mind — brain-like memory for any coding agent

tests python deps recall@1 license English العربية

One Python file. Zero dependencies. Zero API keys. Fully offline. Multilingual — engineered for EN + AR, measured on 10 languages.

Your coding agent forgets everything between sessions. mind gives it a memory that works the way yours does: a weighted concept graph that recalls by spreading activation (not flat search), forgets by the Ebbinghaus curve (unused memories fade, reinforced ones harden), and reorganizes itself while you sleep through a deterministic dream cycle — no LLM calls, no token bill, every decision explained in a journal you can read.

It plugs into every agent at once: one memory, exported to AGENTS.md (Codex, Cursor, Zed, ...), CLAUDE.md (Claude Code) and GEMINI.md — and adopted automatically by .cursorrules, .windsurfrules, .clinerules and .roo/rules/mind.md in projects that already use those tools.

curl -fsSLO https://raw.githubusercontent.com/Da7-Tech/mind/v6.1.3/mind.py
python3 -c "import hashlib;h=hashlib.sha256(open('mind.py','rb').read()).hexdigest();assert h=='27a40a5d44ea9b2fdce0eb18bb90bcbb885d98064e085cc2579cf3b16c18be97',h;print('mind.py: OK')"
python3 mind.py init
python3 mind.py remember "the project database is postgres 16"
python3 mind.py recall "which database do we use"
# recall for "which database do we use" — 1 results [0.20 ms]
#   1. [0.033] (direct) the project database is postgres 16
python3 mind.py dream        # between sessions: forget, consolidate, promote

That's the whole install — pinned to a release tag and integrity-checked with nothing but python3 (works on Windows too). No server, no vector store, no embedding model, no configuration file.

Measured, not vibes

Every number below comes from python3 bench/bench.py — rerun it yourself (Python 3.14 on Apple M-series — latencies are environment-dependent, rerun for your hardware; 20 bilingual queries with known answers against distractor-filled graphs):

graph size recall@1 recall@5 median latency p95
100 nodes 1.00 1.00 ~0.5–0.7 ms ~2–4 ms
1,000 nodes 1.00 1.00 ~2–3 ms ~11–16 ms

(The long-standing 0.95 miss — a category question, "what css framework", against a memory that only said "tailwind" — is closed by the curated concept seed in 5.5.0. Synonymy outside the seed and the corpus still misses; see limitations.)

Dream determinism: PASS — identical memory state always produces the identical consolidation plan.

180-day soak (bench/soak.py — the real code driven through an injected clock with a realistic workload: daily/weekly/monthly facts + 357 junk notes

  • a dream every night): core-fact survival 15/15 across all cadence tiers, junk older than the grace window surviving: 0/256, working memory 8/8 hot slots held by core facts, graph size bounded (~106 nodes), recall on the aged graph ~0.4 ms. The soak caught two real calibration bugs before release (facts pruned one day before their first monthly recall; decayed weight vetoing exact matches) — both fixed with regression tests.

Multilingual, measured (bench/multilang.py, runs in CI): the tool is engineered for English + Arabic, but the core (Unicode tokenizer, IDF, char-n-gram fallback) is language-neutral, and 5.6.0 adds character-bigram indexing for scripts written without spaces. Recall@1 on 8 languages it was never tuned for, each against distractor noise:

French German Spanish Russian Turkish Chinese Japanese Korean
3/3 3/3 3/3 3/3 3/3 3/3 3/3 3/3

(Chinese/Japanese were 3/6 before the bigram tokenizer. 3 queries per language is a smoke benchmark, not the 20-query EN/AR suite; heavy inflection without stemming will cost precision in cases it doesn't cover. Thai is tokenized the same way but not yet benchmarked.)

Discrimination, measured (bench/discrim.py, runs in CI): two independent audits correctly pointed out that needle-in-clean-noise recall says nothing about telling apart facts that SHARE vocabulary — so this benchmark uses only lexically competing distractors, including the audits' exact failure cases ("what is my name" vs "file name must match the class name", in both English and Arabic). Current score: 12/12, gated at ≥ 0.85 in CI so discrimination can never silently regress again.

Fuzzed (bench/fuzz.py, seeded + deterministic): 420 adversarial cases in the full run (CI runs the 160-case quick set on every push) — hostile graph files (NaN/Infinity, wrong-typed fields, control characters, truncated JSON, dangling edges) and hostile CLI input. Contract: never a traceback, never data loss, the graph always loads clean afterwards. Its first run caught a real defect that six earlier audit rounds had missed (see CHANGELOG 5.5.0).

Mutation-tested (bench/mutate.py): first-order defects are injected (flipped comparisons, broken arithmetic, nudged constants) and the suite must catch them. Its first run exposed 17 behaviors the tests didn't actually pin down — each is now locked by a dedicated regression test (raw kill rate went 33% → 43% on the seeded 120-mutant sample; the raw number is published because hiding it would be the exact sin this tool exists to catch). Surviving mutants are triaged in the tool's output: unreachable get() defaults, display-only constants, and ranking-calibration values guarded by the CI benchmark gate (recall@1 ≥ 0.9) rather than unit assertions.

Test suite: 165 tests, stdlib unittest, python3 -m unittest discover -s tests — including regression tests for concurrency (parallel writers must not lose each other's memories), destructive-op gating, corrupt-graph recovery, and a mutation-kill class where every test pins a behavior the suite previously didn't bite on.

How it works — three layers, like a brain

Layer 1  WORKING MEMORY   .mind/ACTIVE.md  → injected into agent rule files
         the ~200-300 tokens the agent always sees: hottest memories + cortex index

Layer 2  HIPPOCAMPUS      .mind/graph.json → weighted concept graph
         recall = spreading activation (≤3 hops) fused with direct keyword
         matches via Reciprocal Rank Fusion + IDF, re-ranked by offline
         hash embeddings; near-duplicate results are diversified (pattern
         separation); fuzzy fallback finds memories from partial cues
         (pattern completion)

Layer 3  CORTEX           .mind/cortex/*.md → consolidated durable knowledge
         fed by the dreamer when a cluster of related memories recurs

DREAMER  between sessions  python3 mind.py dream [--dry-run]
         light sleep  count + clear session signals (telemetry, reported in the journal)
         deep sleep   Ebbinghaus decay  R = e^(−t/S)  — stability S grows
                      with each confirmed recall; weak unused nodes pruned;
                      weak edges pruned (synaptic pruning)
         REM          cluster related memories → promote recurring themes
                      to cortex; flag contradictions (never auto-delete)

Wrong memory? Reconsolidation is temporal fusion (6.0.0), not erasure:

python3 mind.py correct "database mysql" "the database is postgres 16"
# the MySQL fact is CLOSED (valid_to = now, superseded_by = <new id>),
# the transition is an explicit `supersedes` edge, and recall stops
# returning the old fact — but the graph still knows you migrated.
# Closed facts stay in the graph through the 45-day grace window, then
# archive; beyond that the lineage lives in the successor's history
# entries and the journal (forever), so `--at` answers within the
# retention window and `why` answers always.

Provenance & time — "where did this fact come from, and is it still true?"

Every fact answers both questions, from the moment it is learned:

  • Write-time provenance: every node records origin (who wrote it — agents set MIND_BY / MIND_SESSION env vars — and via which command), and every mutation (remember / link / confirm / correct / prune) appends to .mind/journal.jsonl — an append-only log that is never rotated and never cleared (unlike signals.jsonl, which is session telemetry). Even after a fact is pruned, the journal keeps its lineage. Journal appends are single O_APPEND writes (safe under concurrent writers on local filesystems); if the journal is unwritable the memory write still succeeds with a warning — availability over completeness, and why says plainly when provenance is missing.
  • Truth validity, separate from attention: weight says how salient a memory is; valid_from/valid_to say whether it is true. Decay touches only salience — nothing is ever marked false by forgetting. Only an explicit correct (or a contradiction you resolve) closes a fact's validity.
  • Ask the graph:
python3 mind.py why a1b2c3d4e5f6        # origin, validity, history, events
python3 mind.py entity "database"       # every fact about a term — current
                                        # and superseded, with intervals
python3 mind.py recall "which db" --at 2026-01-15   # what was true THEN

Honest scope: entity resolution is lexical (normalization + stemming + the concept seed unify spellings, inflections, and AR↔EN variants of the same term) — pronouns and free descriptions ("the new hire" = "Sara") are not resolved; that needs a model and would break the zero-dependency promise.

Why not just use ___?

why not just use

spreading-activation recall sleep consolidation temporal validity provenance log works with any agent zero setup / zero keys consolidation costs $0 (no LLM)
mem0 ~ metadata ✗ (SDK) ✗ needs LLM + embedder keys
Letta (MemGPT) ✓ sleep-time compute ✗ (own server) ✗ full platform ✗ burns tokens
Zep / Graphiti ~ graph traversal ✓ bi-temporal (its core strength) ✗ Neo4j + LLM keys
HippoRAG 2 ✓ PageRank ✗ (batch RAG lib) ✗ GPU/API
OpenClaw dreams ✗ (OpenClaw only) ✓ inside OpenClaw ✗ burns tokens
claude-mem ✗ compression ~ session refs ~ several agents ✗ Bun + worker + Chroma
mind ✓ deterministic ✓ valid-time + --at ✓ append-only journal + why ✓ AGENTS/CLAUDE/GEMINI ✓ one file

Credit where due: Graphiti's bi-temporal model is the reference point for temporal knowledge graphs — mind's valid-time layer (6.0.0) is the zero-dependency, deterministic take on the same idea, not a claim to have out-modeled it.

Honest note: Brain Memory is the closest project in spirit (files + decay + sleep phases) — credit where due. mind differs in being a single copy-able file, bilingual EN/AR at the tokenizer level, fully deterministic in consolidation, and shipping with a reproducible benchmark instead of claims.

Commands

command what it does
init create .mind/ + export agent files
remember "text" add a memory node
link "a" "b" [rel] connect two memories (weighted edge)
recall "question" spreading-activation recall (prints memory ids)
confirm <id> [...] reinforce memories that actually answered you
correct "old" "new" reconsolidate a wrong memory (history kept)
dream [--dry-run] run the sleep cycle; journal in .mind/dreams/
export regenerate agent rule files
status health report

Reinforcement is explicit: recall is a pure read (repeated queries can't skew weights); when a recalled memory actually answers the question, the agent runs confirm <id> — that hardens the memory (+2 weeks stability) and restrengthens its edges. The exported agent instructions teach this loop, and every dream weakens all edges slightly (synaptic homeostasis), so connections that never earn a confirmation decay and prune away.

Safety properties

  • Atomic, durable, symlink-refusing writes everywhere — O_NOFOLLOW + fsync-before-rename (survives power loss), the lock file itself is opened symlink-safe, and every internal write also rejects a symlinked parent directory so nothing can escape the .mind/ boundary
  • Never silently destroys data: corrupt graphs are quarantined, not erased; memories pruned by decay are archived to .mind/archive.md — and if the archive cannot be written, nothing is pruned at all; user content in AGENTS.md/CLAUDE.md is preserved outside guard markers
  • dream --dry-run previews the full plan without touching disk
  • File-locked saves — safe under concurrent agent processes
  • Memory files are plain JSON + Markdown: git diff them, sync them, read them

Honest limitations

  • Recall is lexical + graph-structural + a curated concept seed (83 unambiguous tool→category mappings: tailwind→css, hetzner→cloud, sentry→errors...). Cross-domain synonymy outside that seed and the corpus still misses; polysemous words (black, express, spring...) are deliberately excluded from the seed because a false category on an everyday sentence is worse than a missed synonym. True embeddings would close the remainder at the cost of the zero-dependency promise; a pluggable backend is on the roadmap.
  • Arabic stemming is light (prefix/suffix + broken-plural seed), not a full morphological analyzer. Other languages get no stemming or stopword lists at all — IDF and character n-grams carry them (measured above), but inflection-heavy queries will miss more often than in EN/AR.
  • No-space scripts (Chinese, Japanese, Korean, Thai) are indexed as character bigrams, not true word segmentation — the standard search-engine tradeoff: excellent recall, occasional false bigram overlap between unrelated phrases.
  • Optimized for personal/project agent memory (10²–10³ nodes), not enterprise RAG over millions of documents — use a real graph DB for that.
  • Tokens shorter than 3 characters (db, ai, os) are not indexed — write them out once ("database", "openai") and the co-occurrence index bridges the rest.
  • Timestamps are naive local time compared lexicographically (ISO). On a single machine this is exact; syncing one .mind/ across machines in different time zones can skew validity ordering by the zone offset.
  • Node ids are md5[:12] of the text — content addressing only; no security property is (or should be) derived from them.
  • A fact recalled fewer than twice and untouched for longer than the 45-day grace window decays out of the graph (into the archive). Facts you need less often than ~every six weeks should live in cortex notes, not the hippocampus — that's the brain deal: use it or archive it.

Using with Hermes, Claude Code, Codex, Gemini CLI...

mind init writes the working memory into AGENTS.md, CLAUDE.md and GEMINI.md with guard markers, preserving your existing content. If the project already has .cursorrules, .windsurfrules, .clinerules or a .roo/ directory, those rule files are kept in sync too — adopted, never imposed on projects that don't use them. Any agent that reads those files gets the memory and the instructions to use it — nothing else to configure. A ready-made Hermes skill lives in SKILL.md.

Development

python3 -m unittest discover -s tests   # 165 tests
python3 bench/bench.py                  # reproduce the EN/AR numbers
python3 bench/multilang.py              # 8 untuned languages
python3 bench/soak.py                   # 180 simulated days
python3 bench/discrim.py                # competing-distractor recall
python3 bench/fuzz.py                   # 420 adversarial cases
python3 bench/mutate.py                 # does the suite actually bite?

Exit codes are a contract: 0 success (including "no results"), 1 runtime/library failure, 2 usage error.

Design rationale: docs/DESIGN.md · Arabic README: README.ar.md · License: MIT

Contributing

Issues and PRs welcome — the roadmap issues are scoped and ready to pick up. Ground rules: keep mind.py a single stdlib-only file, every change needs a test, and claims need measurements (bench/bench.py must stay green). Questions → Discussions.

If mind remembers something useful for you, a ⭐ helps other agents' humans find it.

About

Brain-like memory for any coding agent: spreading-activation recall, Ebbinghaus forgetting, deterministic dream consolidation. One Python file, zero dependencies, offline, EN+AR.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages