
Independent benchmark: LoCoMo / LongMemEval subset #4

Description

@Da7-Tech

The built-in benchmark (bench/bench.py) is honest but self-authored. Tools in the memory space are compared on LoCoMo / LongMemEval, and self-reported numbers there are famously contested, so an independent, reproducible harness would stand out. Task: write a bench/locomo.py that downloads a public subset, maps it to remember/recall calls, and publishes the score together with the exact commit and command. The zero-dependency constraint applies to mind.py only; the harness may use whatever it needs.
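As a starting point, here is a minimal sketch of how the scoring loop and the commit + command record could look. Everything in it is an assumption for illustration: `Sample`, `StubMemory`, `score`, and `provenance` are hypothetical names, the stub only stands in for whatever remember/recall interface mind.py actually exposes, and the substring hit test is a placeholder rather than the official LoCoMo / LongMemEval metric. Downloading and parsing the public subset is left out because the dataset source and format are not pinned down here.

```python
import re
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One benchmark item: turns to store, then a question with a gold answer."""
    turns: list[str]
    question: str
    answer: str


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class StubMemory:
    """Hypothetical stand-in; swap in the real mind.py remember/recall calls."""
    items: list[str] = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.items.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Naive word-overlap ranking, only so the sketch runs end to end.
        q = _tokens(query)
        ranked = sorted(self.items, key=lambda t: -len(q & _tokens(t)))
        return ranked[:k]


def score(samples, make_memory, k: int = 3) -> float:
    """Fraction of samples whose gold answer appears in the top-k recalls."""
    hits = 0
    for s in samples:
        mem = make_memory()  # fresh store per sample, so nothing leaks across items
        for turn in s.turns:
            mem.remember(turn)
        retrieved = mem.recall(s.question, k=k)
        if any(s.answer.lower() in r.lower() for r in retrieved):
            hits += 1
    return hits / len(samples) if samples else 0.0


def provenance() -> dict:
    """Record the exact commit and command line alongside the score."""
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        commit = "unknown"  # git missing, or not inside a repository
    return {"commit": commit, "command": " ".join(sys.argv)}


if __name__ == "__main__":
    demo = [
        Sample(
            turns=["Alice adopted a dog named Biscuit in May.",
                   "Bob moved to Lisbon last year."],
            question="What is the name of Alice's dog?",
            answer="Biscuit",
        ),
    ]
    print({"score": score(demo, StubMemory, k=1), **provenance()})
```

Building a fresh store per sample keeps items independent; whether the real benchmark should instead share one store across a whole conversation is a design choice the harness would need to state next to the published number.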


Labels: benchmark (Measurement and evaluation), help wanted (Extra attention is needed)
