Last updated: 2026-05-31

Agent memory persistence — Benchmark Sources & Consensus

An agent's ability to retain, transfer, and recall context across sessions, measured by task success rates before and after a memory handoff (to a different model or across a restart).

Platforms tracked: Hermes · Openclaw · Nemoclaw · Claude Cowork · ChatGPT

Consensus across 1 source

In the one source tracked (published by Vektor Memory), VEKTOR Slipstream scores 0.894 on Transfer Continuity Score versus 0.880 for Microsoft PAM across 50 engineering scenarios, and reports a memory lift ratio of 6.61x versus 2.51x.

All Sources

We aggregate published benchmarks; we never run our own tests and never pick winners. Each row links back to the original publication.

Source: Medium / Vektor Memory
Date: 2026-05-31
Finding: VEKTOR Slipstream scores 0.894 on Transfer Continuity Score vs Microsoft PAM's 0.880 across 50 engineering scenarios; memory lift ratio 6.61x vs 2.51x.
Methodology: Transfer Continuity Score (task success with vs without memory transfer); 50 engineering scenarios across Q&A, coding, and planning; GPT-4 Turbo baseline.
Quality: high
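The page gives the metric names but not their exact formulas, so the sketch below only illustrates the kind of arithmetic involved. It assumes two hypothetical definitions: a continuity score equal to the success rate after a handoff with memory transferred divided by the success rate before the handoff, and a memory lift ratio equal to the success rate with memory transferred divided by the success rate of a cold start with no memory. The names ScenarioRun, continuity_score, and memory_lift are illustrative, and the ten scenarios are invented toy data, not the published results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioRun:
    """Outcome of one scenario under three conditions (hypothetical setup)."""
    before_handoff: bool  # solved in the original session
    with_transfer: bool   # solved after the handoff, memory transferred
    cold_start: bool      # solved after the handoff, no memory transferred


def success_rate(flags: list[bool]) -> float:
    """Fraction of scenarios that succeeded."""
    if not flags:
        raise ValueError("need at least one scenario")
    return sum(flags) / len(flags)


def continuity_score(runs: list[ScenarioRun]) -> float:
    """Assumed definition: share of pre-handoff success kept after transfer."""
    before = success_rate([r.before_handoff for r in runs])
    if before == 0:
        raise ValueError("no pre-handoff successes; ratio undefined")
    return success_rate([r.with_transfer for r in runs]) / before


def memory_lift(runs: list[ScenarioRun]) -> float:
    """Assumed definition: success with memory transfer / success without it."""
    cold = success_rate([r.cold_start for r in runs])
    if cold == 0:
        raise ValueError("no cold-start successes; ratio undefined")
    return success_rate([r.with_transfer for r in runs]) / cold


# Toy data only (10 invented scenarios), not the published benchmark results.
runs = (
    [ScenarioRun(True, True, True)] * 3
    + [ScenarioRun(True, True, False)] * 6
    + [ScenarioRun(True, False, False)] * 1
)
print(f"continuity={continuity_score(runs):.2f}")  # continuity=0.90
print(f"lift={memory_lift(runs):.2f}x")            # lift=3.00x
```

Under these assumed definitions a continuity score near 1 means little is lost at the handoff, while the lift ratio compares against a no-memory baseline, which is why the two numbers can differ so much in scale.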

How we work

OpenClawDatabase aggregates and links to published benchmarks. We don't run our own tests, and we don't pick winners. Our weekly benchmark-aggregator routine scans 7+ live leaderboards (OpenRouter, Aider, SWE-bench, GAIA, LMSYS, BigCodeBench, MMLU-Pro) plus relevant Reddit and Hacker News threads, then writes structured entries into /assets/benchmarks.json. Every row here links back to the original publication.
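A minimal sketch of what one structured entry might look like, assuming a flat record whose fields mirror this page's table columns (source, date, finding, methodology, quality) plus a task name and a link back to the original publication. The class BenchmarkEntry, the helpers to_json and source_count, and the example.com URL are hypothetical; the real schema of /assets/benchmarks.json is not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkEntry:
    # Hypothetical schema: fields mirror this page's table columns,
    # not the real layout of /assets/benchmarks.json.
    task: str
    source: str
    date: str         # ISO 8601 date, so string order equals date order
    finding: str
    methodology: str
    quality: str      # e.g. "high"
    url: str          # link back to the original publication


def to_json(entries: list[BenchmarkEntry]) -> str:
    """Serialize entries as a JSON array, newest first."""
    ordered = sorted(entries, key=lambda e: e.date, reverse=True)
    return json.dumps([asdict(e) for e in ordered], indent=2)


def source_count(entries: list[BenchmarkEntry], task: str) -> int:
    """Number of distinct sources recorded for one task."""
    return len({e.source for e in entries if e.task == task})


entry = BenchmarkEntry(
    task="Agent memory persistence",
    source="Medium / Vektor Memory",
    date="2026-05-31",
    finding="VEKTOR Slipstream 0.894 vs Microsoft PAM 0.880 Transfer Continuity Score",
    methodology="50 engineering scenarios across Q&A, coding, planning",
    quality="high",
    url="https://example.com/placeholder",  # placeholder, not the real link
)
print(to_json([entry]))
print(source_count([entry], "Agent memory persistence"))  # 1
```

Counting distinct sources per task is one plausible way to produce a line such as "Consensus across 1 source"; that is an assumption about the pipeline, not a description of it.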
