Agent memory persistence — Benchmark Sources & Consensus
Agent ability to retain, transfer, and recall context across sessions — measured by task success rates before and after memory handoffs between models or restarts.
Platforms tracked: Hermes · OpenClaw · NemoClaw · Claude Cowork · ChatGPT
Consensus across 1 source
The one source tracked so far reports VEKTOR Slipstream narrowly ahead of Microsoft PAM on Transfer Continuity Score (0.894 vs 0.880) across 50 engineering scenarios, with a memory lift ratio of 6.61x vs 2.51x.
All Sources
We aggregate published benchmarks; we never run our own tests and never pick winners. Each row links back to the original publication.
| Source | Date | Finding | Methodology | Quality |
|---|---|---|---|---|
| Medium / Vektor Memory | 2026-05-31 | VEKTOR Slipstream scores 0.894 Transfer Continuity Score vs Microsoft PAM's 0.880 across 50 engineering scenarios; memory lift ratio 6.61x vs 2.51x. | Transfer Continuity Score (task success with vs without memory transfer); 50 engineering scenarios across Q&A, coding, planning; GPT-4 Turbo baseline | high |
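
The methodology column describes the score only as "task success with vs without memory transfer", so the exact formulas behind the published numbers are not known here. As a rough illustration of that kind of comparison, the sketch below assumes a lift ratio is the success rate with transferred memory divided by the success rate without it. The `ScenarioResult` type, the function names, and the toy data are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ScenarioResult:
    """One scenario run twice: with and without transferred memory (hypothetical shape)."""
    succeeded_with_memory: bool
    succeeded_without_memory: bool


def success_rate(flags: Iterable[bool]) -> float:
    """Fraction of scenarios that succeeded."""
    flags = list(flags)
    if not flags:
        raise ValueError("at least one scenario is required")
    return sum(flags) / len(flags)


def memory_lift(results: List[ScenarioResult]) -> Tuple[float, float, float]:
    """Return (rate_with, rate_without, lift) under the ASSUMED definition
    lift = rate_with / rate_without. This is not the source's confirmed formula."""
    rate_with = success_rate(r.succeeded_with_memory for r in results)
    rate_without = success_rate(r.succeeded_without_memory for r in results)
    if rate_without == 0:
        # No baseline successes: the ratio is unbounded.
        return rate_with, rate_without, float("inf")
    return rate_with, rate_without, rate_with / rate_without


# Toy data only; these are not the benchmark's scenarios.
toy_results = [
    ScenarioResult(True, True),
    ScenarioResult(True, False),
    ScenarioResult(True, False),
    ScenarioResult(False, False),
]

print(memory_lift(toy_results))  # (0.75, 0.25, 3.0)
```

Under this reading, a lift figure says how much memory transfer helps relative to a no-memory baseline, while the success rate with memory says how well tasks go overall. That would be one way two systems could score close on one number and far apart on the other.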
How we work
OpenClawDatabase aggregates and links to published benchmarks. We don't run our own tests, and we don't pick winners. Our weekly benchmark-aggregator routine scans 7+ live leaderboards (OpenRouter, Aider, SWE-bench, GAIA, LMSYS, BigCodeBench, MMLU-Pro) plus relevant Reddit and Hacker News threads, then writes structured entries into /assets/benchmarks.json. Every row here links back to the original publication.
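
For a sense of what one structured entry might look like, the sketch below builds a record from the columns of the table above and serializes it as JSON. The field names, the `make_entry` helper, and the set of allowed quality labels are assumptions for illustration. The actual schema of /assets/benchmarks.json is not shown on this page.

```python
import json
from datetime import date

# Assumed label set; only "high" appears in the table on this page.
ALLOWED_QUALITY = {"high", "medium", "low"}


def make_entry(source: str, published: str, finding: str,
               methodology: str, quality: str) -> dict:
    """Build one benchmark entry (hypothetical schema mirroring the table columns)."""
    date.fromisoformat(published)  # raises ValueError if not YYYY-MM-DD
    if quality not in ALLOWED_QUALITY:
        raise ValueError(f"unknown quality label: {quality!r}")
    return {
        "source": source,
        "date": published,
        "finding": finding,
        "methodology": methodology,
        "quality": quality,
    }


entry = make_entry(
    source="Medium / Vektor Memory",
    published="2026-05-31",
    finding=("VEKTOR Slipstream scores 0.894 Transfer Continuity Score vs "
             "Microsoft PAM's 0.880 across 50 engineering scenarios; "
             "memory lift ratio 6.61x vs 2.51x."),
    methodology=("Transfer Continuity Score (task success with vs without "
                 "memory transfer); 50 engineering scenarios across Q&A, "
                 "coding, planning; GPT-4 Turbo baseline"),
    quality="high",
)

print(json.dumps(entry, indent=2))
```

Validating the date and quality label at construction time keeps malformed rows out of the file, whatever the real schema turns out to be.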