Last updated: 2026-06-28

Agent memory persistence — Benchmark Sources & Consensus

Name: Agent memory persistence benchmark sources
Creator: OpenClawDatabase
License: https://creativecommons.org/licenses/by/4.0/

Agent ability to retain, transfer, and recall context across sessions — measured by task success rates before and after memory handoffs between models or restarts.
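As a rough illustration of that measurement, the sketch below computes task success rates before and after a handoff from per-task pass/fail results. The function names and the sample data are hypothetical and are not taken from any tracked benchmark.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks that succeeded (0.0 for an empty run)."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def handoff_change(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Change in success rate across a handoff, in percentage points."""
    return 100.0 * (success_rate(after) - success_rate(before))


# Hypothetical pass/fail results for the same five tasks.
before_handoff = [True, True, True, False, True]   # 4 of 5 succeed
after_handoff = [True, False, True, False, True]   # 3 of 5 succeed

change = handoff_change(before_handoff, after_handoff)
print(f"{change:+.1f} pts")
```

A negative value means the agent lost ground across the handoff; a value near zero means context survived the transfer.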

Platforms tracked: Hermes · OpenClaw · Nemoclaw · Claude Cowork · ChatGPT

Consensus across 2 sources

Both sources report that persistent-memory layers improve agent task success. VEKTOR Slipstream scores 0.894 on Transfer Continuity Score versus 0.880 for Microsoft PAM, and World Model MCP cuts repeat coding-agent mistakes by 10.2 pts (paired delta) on 49 SWE-bench instances.

All Sources

We aggregate published benchmarks; we never run our own tests and never pick winners. Each row links back to the original publication.

Source	Date	Finding	Methodology	Quality
Medium / Vektor Memory	2026-05-31	VEKTOR Slipstream scores 0.894 on Transfer Continuity Score vs 0.880 for Microsoft PAM across 50 engineering scenarios; memory lift ratio 6.61x vs 2.51x.	Transfer Continuity Score (task success with vs without memory transfer); 50 engineering scenarios across Q&A, coding, planning; GPT-4 Turbo baseline	high
Hacker News / GitHub	2026-06-24	World Model MCP, a harness-neutral memory layer, cut repeat coding-agent mistakes by 10.2 pts (paired delta) on 49 SWE-bench instances.	Pre-registered SWE-bench Verified repeat-mistake test; 49 instances; paired comparison	high
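
The headline figures in the table include a paired delta and a lift ratio. Neither source's exact formula is reproduced on this page, so the sketch below assumes common definitions: the paired delta is the mean per-instance difference between the memory and no-memory runs, in percentage points, and the lift ratio is the success rate with memory divided by the success rate without it. All names and data are hypothetical.

```python
from typing import Sequence


def paired_delta_pts(with_memory: Sequence[bool],
                     without_memory: Sequence[bool]) -> float:
    """Mean per-instance difference (with - without), in percentage points.

    Assumes both runs cover the same instances in the same order.
    """
    if len(with_memory) != len(without_memory):
        raise ValueError("paired runs must cover the same instances")
    if not with_memory:
        raise ValueError("no instances to compare")
    diffs = [int(w) - int(wo) for w, wo in zip(with_memory, without_memory)]
    return 100.0 * sum(diffs) / len(diffs)


def lift_ratio(rate_with: float, rate_without: float) -> float:
    """Assumed definition: success rate with memory over the rate without."""
    if rate_without <= 0:
        raise ValueError("baseline success rate must be positive")
    return rate_with / rate_without


# Hypothetical results on ten instances: 7 pass with memory, 5 without.
with_mem = [True, True, True, True, True, True, True, False, False, False]
without_mem = [True, True, True, True, True, False, False, False, False, False]

delta = paired_delta_pts(with_mem, without_mem)
ratio = lift_ratio(sum(with_mem) / len(with_mem),
                   sum(without_mem) / len(without_mem))
print(f"paired delta: {delta:+.1f} pts, lift ratio: {ratio:.2f}x")
```

Pairing matters because each instance acts as its own control, so differences in instance difficulty cancel out of the comparison.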

How we work

OpenClawDatabase aggregates and links to published benchmarks. We don't run our own tests, and we don't pick winners. Our weekly benchmark-aggregator routine scans 7+ live leaderboards (OpenRouter, Aider, SWE-bench, GAIA, LMSYS, BigCodeBench, MMLU-Pro) plus relevant Reddit and Hacker News threads, then writes structured entries into /assets/benchmarks.json. Every row here links back to the original publication.
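
The schema of /assets/benchmarks.json is not documented on this page. As a minimal sketch, assuming each entry simply mirrors the table columns above, a structured entry could be modeled and serialized as follows. The `BenchmarkEntry` class, its field names, and the `to_json` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BenchmarkEntry:
    # Hypothetical fields mirroring the table columns; the real schema may differ.
    source: str
    date: str          # ISO 8601 date, e.g. "2026-06-24"
    finding: str
    methodology: str
    quality: str


def to_json(entries: List[BenchmarkEntry]) -> str:
    """Serialize entries to a JSON array of objects."""
    return json.dumps([asdict(e) for e in entries], indent=2, ensure_ascii=False)


entries = [
    BenchmarkEntry(
        source="Hacker News / GitHub",
        date="2026-06-24",
        finding="World Model MCP cut repeat coding-agent mistakes by 10.2 pts "
                "(paired delta) on 49 SWE-bench instances.",
        methodology="Pre-registered SWE-bench Verified repeat-mistake test; "
                    "49 instances; paired",
        quality="high",
    )
]

serialized = to_json(entries)
print(serialized)
```

Keeping one flat object per source row would make each entry easy to trace back to its original publication.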
