Independent benchmark: LoCoMo / LongMemEval subset

The built-in benchmark (`bench/bench.py`) is honest but self-authored. Memory tools are usually compared on LoCoMo and LongMemEval, and self-reported numbers there are famously contested, so an **independent, reproducible** harness would stand out. Task: add a `bench/locomo.py` that downloads a public subset, maps it to remember/recall calls, and publishes the score together with the exact commit and command that produced it. The zero-dependency constraint applies to `mind.py` only; the harness may use whatever it needs.
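A rough sketch of what the scoring loop in `bench/locomo.py` could look like, to make the task concrete. Everything here is an assumption for illustration: the sample schema (`turns`, `questions`, `answer`), the `run_subset` and `toy_memory` names, and the substring-match scoring are hypothetical, and `toy_memory` is only a stand-in for whatever remember/recall interface `mind.py` actually exposes. The download step is left out because the issue does not name a dataset URL or format.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

Remember = Callable[[str], None]
Recall = Callable[[str, int], List[str]]


def current_commit() -> str:
    """Return the current git commit hash, or 'unknown' outside a checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def toy_memory() -> Tuple[Remember, Recall]:
    """Hypothetical stand-in store: ranks notes by word overlap with the query."""
    notes: List[str] = []

    def remember(text: str) -> None:
        notes.append(text)

    def recall(query: str, k: int) -> List[str]:
        words = set(query.lower().split())
        ranked = sorted(notes, key=lambda n: -len(words & set(n.lower().split())))
        return ranked[:k]

    return remember, recall


def run_subset(samples, new_memory: Callable[[], Tuple[Remember, Recall]], k: int = 5) -> dict:
    """Replay each sample into a fresh memory, then score its questions.

    Assumed schema: {"turns": [str, ...],
                     "questions": [{"question": str, "answer": str}, ...]}
    A question counts as a hit if the gold answer appears (case-insensitive)
    in any of the top-k recalled texts.
    """
    hits = total = 0
    for sample in samples:
        remember, recall = new_memory()  # fresh store so samples do not leak
        for turn in sample["turns"]:
            remember(turn)
        for qa in sample["questions"]:
            retrieved = recall(qa["question"], k)
            answer = qa["answer"].lower()
            hits += any(answer in text.lower() for text in retrieved)
            total += 1
    return {
        "questions": total,
        "hits": hits,
        "accuracy": hits / total if total else 0.0,
        "commit": current_commit(),       # published alongside the score
        "command": " ".join(sys.argv),    # exact invocation, for reproduction
    }
```

Passing the memory in as a factory keeps the harness decoupled from `mind.py`, so the same loop could be pointed at other tools for comparison. Substring matching is a deliberately simple metric; a published score should state which metric was used.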

Independent benchmark: LoCoMo / LongMemEval subset #4
