Last updated: 2026-04-06

Persistent Memory Architecture — Episodes, Facts & Compression

Memory is the feature that separates Hermes from every other open-source agent. Where OpenClaw keeps a conversation window, Hermes keeps a database — one that grows across months, compresses intelligently, and is retrieved selectively based on what's relevant to the current task. This guide explains exactly how it works and how to tune it.

The Three Memory Types

Hermes organises everything it knows into three distinct memory types, each stored separately and retrieved differently:

Type       | What it stores                                                                           | Retention                                      | Retrieval method
Episodic   | Raw session logs — what happened, when, in what order                                    | Full fidelity for 30 days, then compressed     | Recency + semantic similarity to current task
Semantic   | Compressed facts extracted from episodes — names, decisions, preferences, relationships | Indefinite — never auto-deleted                | Keyword and semantic search
Procedural | Learned patterns — what approaches worked, what failed, how the user prefers tasks done | Indefinite — updated by self-reflection cycle  | Task-type matching at task start

At the start of each task, Hermes assembles a memory context from all three types: recent episodes that look relevant, facts that match key entities in the task, and procedural patterns that apply to this task type. This assembled context — not raw conversation history — is what gets injected into the model's context window.
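The assembled context can be pictured as a small structure with one slot per store. This is a sketch with illustrative field names — not Hermes' actual internal schema:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryContext:
    episodes: list[str] = field(default_factory=list)    # relevant episode summaries
    facts: list[str] = field(default_factory=list)       # matching semantic facts
    procedures: list[str] = field(default_factory=list)  # applicable learned patterns

    def render(self) -> str:
        """Flatten the three stores into the text injected into the model."""
        sections = [
            ("Recent episodes", self.episodes),
            ("Known facts", self.facts),
            ("Learned procedures", self.procedures),
        ]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(f"- {item}" for item in items)
            for title, items in sections
            if items
        )

ctx = MemoryContext(
    episodes=["2026-04-05: drafted Q2 report outline"],
    facts=["Project 'Atlas' uses Python 3.12 and PostgreSQL"],
    procedures=["For reports: executive summary first, then bullets"],
)
```

The point is that the model never sees raw history — it sees this curated, per-task rendering.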

The Memory Lifecycle

1. Session Recording (Episodic store)

Every turn of every conversation is stored as a raw episode. Each episode has a timestamp, session ID, task context, the messages exchanged, and a vector embedding for similarity search. Episodes are marked raw when stored and stay that way until the extraction and compression jobs process them.

2. Fact Extraction (Semantic store)

Within an hour of a session ending, Hermes runs a lightweight background job using the light model (Haiku by default) to extract facts from new raw episodes:

# This extraction happens automatically — you don't trigger it manually.
# But you can check its output:
hermes memory facts list --recent 20

# Example output:
# [fact:042] User prefers reports in bullet point format (confidence: high)
# [fact:043] Project "Atlas" uses Python 3.12 and PostgreSQL
# [fact:044] User's GitHub username is: your-handle
# [fact:045] Deadline for Q2 report: 2026-06-30

3. Episode Compression

After 30 days, raw episodes are compressed. The compression job:

  1. Groups related episodes into clusters (by task, topic, or time window)
  2. Uses the light model to write a dense summary of each cluster
  3. Stores the summary as a compressed episode, discarding the raw logs
  4. Retains any facts already extracted to the semantic store — those are not affected

Compression reduces storage by roughly 90% while preserving the information Hermes needs for future retrieval. A 6-month episodic log typically compresses from several hundred MB to under 20 MB.
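The cluster-then-summarise pass can be sketched as follows. Here `summarize` is a local placeholder for the light-model call that writes the dense summary:

```python
from collections import defaultdict

def summarize(episodes: list[str]) -> str:
    # Placeholder: in Hermes this is a call to the light model (Haiku),
    # not a local function.
    return f"Summary of {len(episodes)} related episodes."

def compress(raw: list[tuple[str, str]]) -> dict[str, str]:
    """raw is a list of (topic, episode_text) pairs. Returns one compressed
    summary per topic cluster; the raw logs are discarded afterwards."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for topic, text in raw:
        clusters[topic].append(text)        # step 1: group related episodes
    return {topic: summarize(episodes)      # steps 2-3: summarise each cluster,
            for topic, episodes in clusters.items()}  # keep only the summary

compressed = compress([
    ("atlas", "Refactored the ingestion job"),
    ("atlas", "Fixed the PostgreSQL migration"),
    ("q2-report", "Drafted the executive summary"),
])
```

Facts already extracted to the semantic store (step 4) live in a separate table, so nothing in this pass touches them.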

4. Self-Reflection (Procedural store)

After each task completes, Hermes runs a reflection pass — a short model call asking: "What did I do well? What could I do differently? What should I remember about how this type of task works?" The output goes to REFLECTIONS.md in the workspace and to the procedural memory store. At the start of similar future tasks, these reflections surface automatically.

You can read the reflection log:

hermes memory reflections list

# Or read the file directly:
cat ~/.hermes/workspace/REFLECTIONS.md

Memory Backends

SQLite (default — personal use)

{
  "memory": {
    "backend": "sqlite",
    "path": "~/.hermes/memory.db",
    "vacuumSchedule": "weekly",   // auto-vacuum to reclaim space
    "walMode": true               // write-ahead logging for better concurrency
  }
}

SQLite is the default for good reason: zero configuration, single file, trivial to back up (cp ~/.hermes/memory.db ~/backup/). It handles millions of episodes without performance problems. The only reason to switch to PostgreSQL is if multiple machines need to share the same memory store.

PostgreSQL (team/production use)

{
  "memory": {
    "backend": "postgres",
    "connectionString": "${HERMES_DB_URL}",
    // e.g. postgres://user:password@localhost:5432/hermes
    "poolSize": 5,
    "sslMode":  "require"
  }
}

PostgreSQL enables multiple Hermes daemons to share a memory store — useful if you run Hermes on both a VPS and a local machine and want them to share context. The schema is applied automatically on first connection, and you can also run the migration explicitly:

hermes db migrate   # apply schema to a fresh PostgreSQL database

Memory Retrieval — How Hermes Finds Relevant Context

When a new task arrives, Hermes queries the memory store using a multi-pass retrieval strategy:

  1. Recency pass: Always include the last 3 episodes regardless of relevance
  2. Semantic pass: Embed the task description, run vector similarity search against episode embeddings, include the top 5 results
  3. Entity pass: Extract named entities from the task (project names, people, domains), pull all facts tagged with those entities
  4. Procedural pass: Match the task's inferred type (research, writing, coding, monitoring) against procedural patterns
  5. Deduplication: Merge overlapping results, rank by combined recency + relevance score, trim to fit context budget
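The recency, semantic, and deduplication passes can be sketched like this, assuming each candidate is a dict with an `id`, a similarity `score`, and an `age` rank (0 = newest) — all illustrative names:

```python
def retrieve(episodes, task_matches, *, recency_n=3, top_k=5, min_score=0.65):
    # Recency pass: always keep the N most recent episodes.
    recent = sorted(episodes, key=lambda e: e["age"])[:recency_n]
    # Semantic pass: keep the top-K matches above the relevance threshold.
    semantic = sorted(
        (e for e in task_matches if e["score"] >= min_score),
        key=lambda e: e["score"],
        reverse=True,
    )[:top_k]
    # Deduplication: merge the passes, dropping repeated episode ids.
    seen, merged = set(), []
    for e in recent + semantic:
        if e["id"] not in seen:
            seen.add(e["id"])
            merged.append(e)
    return merged

candidates = [
    {"id": 1, "age": 0, "score": 0.90},  # newest, highly similar
    {"id": 2, "age": 1, "score": 0.50},  # recent but below threshold
    {"id": 3, "age": 2, "score": 0.70},  # older but relevant
]
picked = retrieve(candidates, candidates, recency_n=2)
```

Note that episode 2 survives only via the recency pass — recency is unconditional, the similarity threshold applies only to the semantic pass. The entity and procedural passes query different stores and are omitted here.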

The context budget is configurable:

{
  "memory": {
    "retrieval": {
      "contextBudgetTokens": 20000,  // how many tokens of memory to inject per task
      "recencyEpisodes":    3,       // always include N most recent
      "semanticTopK":       5,       // semantic search result count
      "minRelevanceScore":  0.65     // discard results below this similarity threshold
    }
  }
}

Increasing contextBudgetTokens improves recall but costs more

A higher budget means more memory injected per task — which means more input tokens charged per API call. For most tasks, 20,000 tokens of memory context is plenty. For complex projects with months of history, 40,000–60,000 may be warranted. Monitor your API spend and adjust accordingly — see the Cost Optimisation guide for general token budgeting strategies.
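A back-of-envelope calculation makes the trade-off concrete. The $3-per-million-token input price below is illustrative only — check your provider's current rates:

```python
# Illustrative input price; substitute your model's actual rate.
PRICE_PER_MTOK = 3.00  # dollars per million input tokens

def memory_cost_per_task(budget_tokens: int) -> float:
    """Added input cost, in dollars, of injecting this much memory per task."""
    return budget_tokens / 1_000_000 * PRICE_PER_MTOK

default_cost = memory_cost_per_task(20_000)  # 0.06 → 6 cents per task
deep_cost = memory_cost_per_task(60_000)     # 0.18 → 18 cents per task
```

At dozens of tasks per day, tripling the budget is the difference between a few dollars and ten-plus dollars a month in memory overhead alone.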

Vector Embeddings

Semantic retrieval depends on vector embeddings. Hermes generates embeddings when episodes are stored and when tasks are submitted. The embedding model is configured separately from the main model:

{
  "memory": {
    "embeddings": {
      "provider":   "anthropic",           // anthropic | openai | local
      "model":      "text-embedding-3-small",  // used if provider is openai
      // For Anthropic, uses the built-in embedding endpoint
      // For local: use ollama with nomic-embed-text
      "dimensions": 1536,
      "batchSize":  100   // embed up to 100 episodes per batch job
    }
  }
}

Using OpenAI's text-embedding-3-small for embeddings while using Claude for generation is a common cost-saving pattern — embedding calls are cheap (~$0.02/million tokens) and the model quality difference for retrieval is minimal.
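This is also where the minRelevanceScore threshold from the retrieval config applies: relevance is cosine similarity between the task embedding and each stored embedding. A toy example with 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

task_vec = [1.0, 0.0, 0.0]
close = [0.9, 0.1, 0.0]       # similar direction → passes the 0.65 threshold
unrelated = [0.0, 1.0, 0.0]   # orthogonal → similarity 0, discarded
```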

For fully local embeddings with no API cost:

# Pull a local embedding model via Ollama
ollama pull nomic-embed-text

# Configure Hermes to use it
hermes config set memory.embeddings.provider "ollama"
hermes config set memory.embeddings.model "nomic-embed-text"

Manual Memory Management

You can manually add, edit, and delete memory entries:

# Add a fact manually (useful for bootstrapping a new install)
hermes memory fact add "User's timezone is Europe/London (UTC+1 in summer)"
hermes memory fact add "Primary project is 'Atlas' — B2B SaaS, Python/PostgreSQL stack"
hermes memory fact add "Preferred report format: executive summary first, bullet points, tables"

# Search memory
hermes memory search "Atlas project"
hermes memory search --type facts "deadline"
hermes memory search --type episodes "GitHub"

# Delete a fact
hermes memory fact delete fact:044

# Compact memory manually (useful before a big task to ensure retrieval is optimal)
hermes memory compact

# Full memory stats
hermes memory status --verbose
# Episodes: 142 (raw: 12, compressed: 130)
# Facts: 89
# Reflections: 28
# Embeddings: 142 episode + 89 fact vectors
# DB size: 18.4 MB
# Last vacuum: 2026-04-01
# Last compression: 2026-04-05

MEMORY.md — Manual Seed File

Like OpenClaw's MEMORY.md, Hermes reads ~/.hermes/workspace/MEMORY.md at daemon start and injects it into every task context. Use it for facts you want Hermes to always know, regardless of retrieval scoring:

# ~/.hermes/workspace/MEMORY.md

## Always Remember
- My name: [Your name]
- My timezone: Europe/London
- Primary project: Atlas — Python 3.12, PostgreSQL, deployed on Hetzner
- GitHub username: your-handle
- I prefer concise updates — one sentence per item unless I ask for more

## Do Not
- Refer me to professionals for general questions
- Add disclaimers to every response

Backing Up and Migrating Memory

# Back up the SQLite store (stop daemon first for clean copy)
hermes stop
cp ~/.hermes/memory.db ~/backups/hermes-memory-$(date +%F).db
hermes start

# Or use the built-in export (works while running — uses WAL snapshot)
hermes memory export --output ~/hermes-memory-export.json
# Exports all episodes, facts, and reflections as JSON

# Import on a new machine
hermes memory import ~/hermes-memory-export.json
# Re-generates embeddings automatically (may take a few minutes for large stores)
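Before importing on a new machine, it can be worth sanity-checking the export. The top-level keys below ("episodes", "facts", "reflections") are assumptions based on what the export is documented to contain — verify them against a real export file:

```python
import json

def summarise_export(path: str) -> dict[str, int]:
    """Count the entries under each (assumed) top-level key of an export."""
    with open(path) as f:
        data = json.load(f)
    return {key: len(data.get(key, []))
            for key in ("episodes", "facts", "reflections")}
```

A zero count where you expected hundreds of episodes is a sign the export was taken against the wrong database.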

See also: Long-Running Tasks & Scheduling · Quick Start