Recall — Persistent Memory for Any Agent Harness

Recall is a SQLite-backed persistent memory layer for coding agents. Stop-hook extraction captures sessions as you work, MCP tools expose them mid-session, hybrid search (FTS5 + embeddings) retrieves them, and a tiered L0/L1 recall block injects identity + top-ranked records at every session start. Works across Claude Code, OpenCode, and Pi from one local database.

All coding agents forget when a session ends. Recall doesn't — it extracts, indexes, and recalls what matters across every session, across every agent you use.

Built on the Model Context Protocol. One SQLite file. No phone-home. No vendor lock-in.

Stable on Claude Code. Beta on Pi and alpha on OpenCode (MCP works; lifecycle extensions are early). Codex CLI and Gemini CLI are on the roadmap. See Roadmap.

Jump to the Docs

The Problem

AI agents have no memory between sessions. Context is lost. You repeat yourself. Decisions made last week are forgotten today. Every new session re-learns the basics.

How Recall Fixes It

Install once, then forget about it. Recall runs silently in the background:

┌──────────┐    ┌────────────────┐    ┌──────────────┐    ┌───────────────┐    ┌──────────────┐
│ You Work │───▶│ Stop hook fires│───▶│ Auto-Extract │───▶│ SQLite + FTS5 │───▶│ Next Session │
└─────▲────┘    │ (end of turn)  │    └──────────────┘    └───────────────┘    └──────┬───────┘
      │         └────────────────┘                                                    │
      └───────────────────────────── Memory Available ───────────────────────────────┘

Auto-extraction — sessions are parsed into structured summaries incrementally as you work (Stop hook fires at the end of every turn, not only when you exit)
Full-text + semantic search — find anything from any past session
Tiered session-start context — L0 identity (who you are) + L1 importance-ranked top records load automatically
Zero friction — no workflow changes, no manual steps
MCP integration — your agent searches memory automatically through standard MCP tools

Why Recall

Four things that set Recall apart from cloud-hosted memory layers and from agent-specific scratch files:

Local-first, zero infrastructure. One SQLite file at ~/.claude/memory.db. WAL mode, 0600 perms. No vector database, no graph database, no agent server, no API keys for retrieval. Nothing leaves your machine — no telemetry, no phone-home. Optional Ollama for embeddings (also local).
Multi-agent native. One memory layer across the agents you actually use. Stable on Claude Code today; Pi and OpenCode connect via MCP; Codex CLI and Gemini CLI on the way. Memories captured by one agent are searchable from any other agent on the same machine.
Structured taxonomy, not a flat blob. Decisions (with supersede/revert lifecycle and confidence scoring), learnings, breadcrumbs, and curated Library of Alexandria entries — each has a purpose and a query path. Importance scoring (1–10) surfaces what matters first.
Hybrid search that works offline. FTS5 keyword search ships with SQLite — no embedding infrastructure required to find anything. Optional Ollama embeddings layer on top for semantic queries. Both are merged via Reciprocal Rank Fusion. Lose Ollama, lose nothing — the keyword path keeps working.

Quick Start

git clone https://github.com/edheltzel/Recall.git
cd Recall
./install.sh

Verify it works:

mem stats        # Database overview
mem doctor       # Health check

Restart your agent (Claude Code, Pi, or OpenCode) to load the MCP server and hooks.

First run: set your identity

Recall's tiered SessionRecall injects a small identity file at the top of every session (the L0 tier — your role, projects, tools, and working preferences). Without it, L0 is empty and every new session has to re-learn the basics.

mem onboard

A 7-question interview that writes ~/.claude/MEMORY/identity.md. Run it once. Re-run whenever your role, active projects, or working preferences change. Use | (not ,) to separate values so a phrase like "no force-push, ever" survives as a single entry.
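To see why the separator matters, here is a minimal sketch of pipe-based splitting. The `splitAnswer` helper is hypothetical; the actual parser behind `mem onboard` is not shown in this README and may behave differently.

```typescript
// Hypothetical helper: split one interview answer into entries on "|".
// Illustration only; not Recall's actual parser.
function splitAnswer(answer: string): string[] {
  return answer
    .split("|")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

const preferences = splitAnswer("no force-push, ever | small commits | tests first");
console.log(preferences);
// [ 'no force-push, ever', 'small commits', 'tests first' ]

// A comma-based split would break the first entry in two:
console.log("no force-push, ever".split(",").map((s) => s.trim()));
// [ 'no force-push', 'ever' ]
```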

Updating

From inside Claude Code, /Recall:update prints the current vs. latest release and the exact command to run. From a shell:

./update.sh --check   # version check only
./update.sh           # full update: pull, build, migrate, re-register hooks

Uninstalling

./uninstall.sh --dry-run   # preview, touch nothing
./uninstall.sh             # surgical remove; preserves memory.db + backups
./uninstall.sh --purge     # also destroy memory.db + backup tree (confirmed)

Full installation guide — prerequisites, platform support, session extraction setup, uninstalling

How Recall Works

Recall operates as three integrated layers — data flows in automatically, gets stored in a searchable database, and surfaces when you (or Claude) need it.

┌──────────────────────────────────────────────────────────────────────┐
│                        DATA ENTRY POINTS                             │
│                                                                      │
│  ┌────────────┐  ┌────────────┐  ┌──────────────┐  ┌────────────┐    │
│  │ CLI Direct │  │ MCP Server │  │  Stop Hook   │  │   Batch    │    │
│  │  mem add   │  │ (Claude    │  │ SessionExt-  │  │  Extract   │    │
│  │  mem dump  │  │  Code)     │  │  ract.ts     │  │  (cron)    │    │
│  └─────┬──────┘  └─────┬──────┘  └──────┬───────┘  └─────┬──────┘    │
└────────┼────────────────┼────────────────┼────────────────┼──────────┘
         │                │                │                │
         ▼                ▼                ▼                ▼
┌───────────────────────────────────────────────────────────────────────┐
│                      PROCESSING LAYER                                 │
│                                                                       │
│  Direct Inserts:              Session Extraction Pipeline:            │
│  mem add breadcrumb ──┐       Read JSONL                              │
│  mem add decision  ───┤         → Filter noise (tool results)         │
│  mem add learning  ───┤         → Dedup check (.extraction_tracker)   │
│  memory_add (MCP)  ───┤         → Acquire lock                        │
│                       │         → Claude Haiku extract                │
│                       │           (>120K? chunk → meta-extract)       │
│                       │           (fallback: Ollama)                  │
│                       │         → Quality gate                        │
│                       │           (requires SUMMARY + MAIN IDEAS)     │
│                       │              │                                │
└───────────────────────┼──────────────┼────────────────────────────────┘
                        │              │
                        ▼              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    STORAGE LAYER (Dual-Write)                        │
│                                                                      │
│  SQLite (~/.claude/memory.db)       Memory Files (~/.claude/MEMORY/) │
│  ┌────────────────────────────┐     ┌──────────────────────────────┐ │
│  │ sessions ←── messages      │     │ DISTILLED.md    (archive)    │ │
│  │ decisions    learnings     │     │ HOT_RECALL.md   (last 10)    │ │
│  │ breadcrumbs  loa_entries   │     │ SESSION_INDEX.json           │ │
│  │ embeddings (768-dim vecs)  │     │ DECISIONS.log                │ │
│  │                            │     │ REJECTIONS.log               │ │
│  │ FTS5 indexes (auto-sync)   │     │ ERROR_PATTERNS.json          │ │
│  │ WAL mode · 0600 perms      │     └──────────────────────────────┘ │
│  └────────────────────────────┘                                      │
└──────────────────────────────────────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────────────┐
│                      RETRIEVAL LAYER                                 │
│                                                                      │
│  ┌───────────────┐  ┌────────────────┐  ┌─────────────────────────┐  │
│  │Keyword (FTS5) │  │Semantic (Embed)│  │  Hybrid (RRF Fusion)    │  │
│  │mem search     │  │mem semantic    │  │  mem hybrid (DEFAULT)   │  │
│  │memory_search  │  │embed → Ollama  │  │  FTS5 rank ─┐           │  │
│  │               │  │cosine sim      │  │  Embed rank ─┤→ merged  │  │
│  └───────────────┘  └────────────────┘  │  RRF(k=60) ◄┘           │  │
│                                         └─────────────────────────┘  │
│  Direct: mem recent · mem show · memory_recall · context_for_agent   │
└──────────────────────────────────────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────────────┐
│  CONSUMERS:  Coding agents (MCP)  ·  CLI user (mem)  ·  Sub-agents   │
└──────────────────────────────────────────────────────────────────────┘
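
Two steps in the processing layer above, chunking transcripts larger than 120K characters and the quality gate, can be sketched as follows. This is an illustration under assumptions, not Recall's implementation: `chunkTranscript` and `passesQualityGate` are hypothetical names, the fixed-width split is a simplification (the real pipeline may split on message boundaries), and the gate is assumed to look for the two section names as plain text.

```typescript
const CHUNK_LIMIT = 120000; // characters, per the ">120K" note in the diagram

// Split a transcript into chunks of at most `limit` characters.
// Assumption: a plain fixed-width split; real chunking may respect messages.
function chunkTranscript(text: string, limit: number = CHUNK_LIMIT): string[] {
  if (text.length <= limit) return [text];
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += limit) {
    chunks.push(text.slice(start, start + limit));
  }
  return chunks;
}

// Quality gate: accept an extraction only if both required sections appear.
// Assumption: sections are detected by their names appearing in the text.
function passesQualityGate(extraction: string): boolean {
  return extraction.includes("SUMMARY") && extraction.includes("MAIN IDEAS");
}

console.log(chunkTranscript("x".repeat(250000)).length); // 3
console.log(passesQualityGate("SUMMARY\n...\nMAIN IDEAS\n...")); // true
console.log(passesQualityGate("SUMMARY\n...")); // false
```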

Session Lifecycle

Session starts — A SessionStart hook injects two tiers of context: L0 identity (your ~/.claude/MEMORY/identity.md, always on) and L1 top records (top 12 by importance score, with 4 slots reserved for curated Library of Alexandria entries). L2/L3 stay on disk and are pulled on demand via MCP search.
During the session — your agent searches memory via MCP tools (memory_search, memory_hybrid_search, memory_recall, context_for_agent) before falling back to git history. Decisions, learnings, and breadcrumbs are recorded in real-time with memory_add.
End of every turn — A Stop hook fires SessionExtract.ts, which self-spawns a background process (non-blocking). It checks .extraction_tracker.json and only re-extracts if the conversation has grown meaningfully since last time — so capture is incremental, not just an "on exit" event.
Extraction pipeline — The conversation JSONL is filtered, deduplicated, and sent to the claude CLI running Haiku (with chunking for large sessions >120K chars). Optional Ollama fallback if the CLI fails. A quality gate rejects low-quality extractions before they're stored.
PreCompact flush — When Claude Code is about to compact its context, a PreCompact hook (SessionPreCompact.ts) flushes the in-flight messages first, so the squashed window is never lost.
Dual-write storage — Results are written to SQLite (the only query surface — every CLI/MCP read hits this) and to markdown artifacts (DISTILLED.md, HOT_RECALL.md, etc., write-only, human-readable).
Batch catchup (optional): A cron job (BatchExtract.ts) sweeps any sessions the Stop hook missed during crashes or interruptions. It also ingests the sessions that the OpenCode plugin and Pi extension drop into ~/.claude/MEMORY/{opencode,pi}-sessions/. install.sh prints the registration command at the end; opt in by running it once, because nothing is auto-scheduled.
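
The incremental check in the Stop-hook step can be pictured with a small sketch. Everything here is an assumption made for illustration: the README does not document the schema of .extraction_tracker.json, so the `TrackerEntry` shape, the `shouldExtract` function, and the `MIN_GROWTH_CHARS` threshold are all hypothetical.

```typescript
// Hypothetical tracker entry; the real file's schema is not documented here.
interface TrackerEntry {
  extractedChars: number; // transcript size when it was last extracted
}

type Tracker = { [sessionId: string]: TrackerEntry };

// Assumed threshold for "grown meaningfully"; the real value is unknown.
const MIN_GROWTH_CHARS = 2000;

function shouldExtract(tracker: Tracker, sessionId: string, currentChars: number): boolean {
  const entry = tracker[sessionId];
  if (entry === undefined) return true; // never extracted before
  return currentChars - entry.extractedChars >= MIN_GROWTH_CHARS;
}

const tracker: Tracker = { "session-a": { extractedChars: 5000 } };
console.log(shouldExtract(tracker, "session-a", 5500)); // false: grew by only 500
console.log(shouldExtract(tracker, "session-a", 7500)); // true: grew by 2500
console.log(shouldExtract(tracker, "session-b", 100)); // true: not tracked yet
```

A size-based check like this keeps the hook cheap on every turn, since most turns exit before any model call is made.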

Search Strategies

| Strategy | Command | How it works |
|---|---|---|
| Keyword | `mem search "query"` | FTS5 full-text search across all tables |
| Semantic | `mem embed semantic "query"` | Ollama embeddings → cosine similarity (requires Ollama) |
| Hybrid (default) | `mem "query"` | Both keyword + semantic, merged with Reciprocal Rank Fusion (k=60). Falls back to keyword-only if Ollama is unavailable |
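
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula, where each document scores the sum of 1 / (k + rank) over every ranked list it appears in, with k = 60 as quoted above. The function name `rrfFuse` and the document ids are made up for the example; Recall's own merge code may differ in details such as tie-breaking or weighting.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank starts at 1. Illustrative sketch, not Recall's actual code.
function rrfFuse(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

const keywordHits = ["d1", "d2", "d3"]; // hypothetical FTS5 ranking
const semanticHits = ["d3", "d1", "d4"]; // hypothetical embedding ranking
console.log(rrfFuse([keywordHits, semanticHits]));
// [ 'd1', 'd3', 'd2', 'd4' ]

// With no semantic results, the fused order is just the keyword order.
console.log(rrfFuse([keywordHits, []]));
// [ 'd1', 'd2', 'd3' ]
```

Because RRF uses only rank positions, it needs no score normalization between FTS5 and cosine similarity.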

Architecture deep-dive — database tables, FTS5 indexes, extraction pipeline details

What You Get

Auto-captured session memory — extracted incrementally (Stop hook on every turn) via Claude Haiku, with BatchExtract.ts cron sweeper as a crash-recovery safety net
MCP server (mem-mcp) — memory_search, memory_hybrid_search, memory_recall, memory_add, memory_dump, context_for_agent exposed to your agent mid-session
Hybrid search — FTS5 keyword search + optional Ollama embeddings, fused via Reciprocal Rank Fusion. Lose Ollama, lose nothing — keyword path keeps working
Tiered SessionRecall (v0.7.0+) — L0 identity (~/.claude/MEMORY/identity.md) + L1 top 12 records ranked by importance, with 4 reserved slots for curated Library of Alexandria entries. L2/L3 fetched on demand
Importance scoring (1–10) — every record carries an importance score that drives what surfaces in L1. Manage with mem pin / mem unpin / mem importance backfill
PreCompact flush — SessionPreCompact.ts writes in-flight messages to SQLite before Claude compacts its context window, so the squashed chunk is never lost
Decision lifecycle — mem decision supersede/revert tracks when a decision was replaced or rolled back; confidence scoring (high/medium/low) on every decision and learning
Cross-host ingestion — OpenCode plugin and Pi extension drop sessions into ~/.claude/MEMORY/{opencode,pi}-sessions/; BatchExtract pulls them into the same SQLite DB. One memory layer across agents
Library of Alexandria — curated knowledge entries (session distillations, imported docs, telos goals, quotes) with Fabric extract_wisdom analysis. Default importance 8 — these get reserved L1 slots
Breadcrumbs, decisions, learnings — three structured record types for non-session memory, addable from CLI (mem add), MCP (memory_add), or slash commands (/Recall:add)
Benchmark harness — mem benchmark run B measures wake-up context efficiency against locked baselines so regressions are visible
Onboarding — mem onboard runs a 7-question interview that writes your L0 identity file
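
One way to picture the L1 tier (top 12 by importance, with 4 slots reserved for Library of Alexandria entries) is the sketch below. The selection rule is an assumption: the README states the slot counts but not the exact algorithm, so `selectL1`, the `MemoryRecord` shape, and the fill order are hypothetical.

```typescript
// Hypothetical record shape for illustration only.
interface MemoryRecord {
  id: number;
  kind: "decision" | "learning" | "breadcrumb" | "loa";
  importance: number; // 1 to 10
}

// Assumed rule: give up to `reservedLoa` slots to the top LoA entries,
// then fill the remaining slots with the highest-importance records left.
function selectL1(records: MemoryRecord[], total = 12, reservedLoa = 4): MemoryRecord[] {
  const byImportance = (a: MemoryRecord, b: MemoryRecord) => b.importance - a.importance;
  const loa = records
    .filter((r) => r.kind === "loa")
    .sort(byImportance)
    .slice(0, reservedLoa);
  const taken = new Set<MemoryRecord>(loa);
  const rest = records
    .filter((r) => !taken.has(r))
    .sort(byImportance)
    .slice(0, total - loa.length);
  return loa.concat(rest).sort(byImportance);
}

// Ten decisions at importance 9 would fill all 12 slots with non-LoA records
// under a plain top-12 sort; the reservation keeps four LoA entries visible.
const sample: MemoryRecord[] = [];
for (let i = 0; i < 10; i++) sample.push({ id: i, kind: "decision", importance: 9 });
for (let i = 0; i < 6; i++) sample.push({ id: 100 + i, kind: "loa", importance: 8 });

const l1 = selectL1(sample);
console.log(l1.length); // 12
console.log(l1.filter((r) => r.kind === "loa").length); // 4
```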

Measured wake-up efficiency

Suite B measures the byte cost of session-start memory injection. Latest tracked run (2026-04-18, scope atlas-recall):

| Variant | Chars | Tokens (est., 4 chars/token) |
|---|---|---|
| v2 tiered SessionRecall (L0 + L1 top 12) | 5,306 | ~1,327 |
| v1 flat-blob SessionRecall (simulated) | 8,020 | ~2,005 |
| CLAUDE.md static baseline | 8,760 | ~2,190 |

On this corpus, v2 is about 34% smaller than v1 (equivalently, v1 is about 51% larger than v2). CLAUDE.md is hand-written static context and Recall is auto-extracted dynamic memory, so the two are complementary, not competitors. Numbers scale with your own DB and L0 identity; reproduce with mem benchmark run B. Methodology and caveats live in benchmarks/README.md.
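
The arithmetic behind the table can be checked with a few lines. This sketch only reproduces the 4-characters-per-token estimate and the size comparison from the published character counts; `estimateTokens` and `reduction` are illustrative helpers, not part of the `mem benchmark` harness.

```typescript
const CHARS_PER_TOKEN = 4; // the estimate used in the table above

function estimateTokens(chars: number): number {
  return Math.round(chars / CHARS_PER_TOKEN);
}

// Fractional size reduction of `next` relative to `previous`.
function reduction(previous: number, next: number): number {
  return (previous - next) / previous;
}

const v2Chars = 5306;
const v1Chars = 8020;

console.log(estimateTokens(v2Chars)); // 1327
console.log(estimateTokens(v1Chars)); // 2005
console.log((reduction(v1Chars, v2Chars) * 100).toFixed(1)); // "33.8" (v2 vs v1, percent smaller)
console.log((v1Chars / v2Chars).toFixed(2)); // "1.51" (v1 size as a multiple of v2)
```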

CLI at a Glance

mem "kubernetes auth"          # Search your memory
mem onboard                    # Seed your L0 identity tier (one-time)
mem dump "Session Title"       # Save this session
mem add decision "Use X" ...   # Record a decision
mem decision list              # List decisions with status and confidence
mem pin decisions 42           # Pin a record to high importance
mem benchmark run B            # Measure wake-up context efficiency
mem prune                      # Preview stale records for removal
mem stats                      # See what's stored
mem doctor                     # Health check

See it in action

Screenshots: Search, Stats, Health Check, Recent Memory

Full CLI reference

For AI Agents

If you're an AI agent reading this repository:

| What you need | Where to find it |
|---|---|
| Using Recall from Claude Code (MCP tools, CLI, core rules) | `FOR_CLAUDE.md` |
| Using Recall from OpenCode | `FOR_OPENCODE.md` |
| Using Recall from Pi | `FOR_PI.md` |
| Developing Recall (build, test, conventions) | `CLAUDE.md` |

Roadmap

Recall is built around two integration surfaces: MCP (memory search and add, available from inside the agent) and lifecycle hooks (auto-extraction, session-start context injection, pre-compact flushes). Different agents support different surfaces — the table below tracks where each one stands.

| Agent | MCP | Lifecycle hooks | Status |
|---|---|---|---|
| Claude Code | ✅ | ✅ Stop · SessionStart · PreCompact | Stable (reference implementation) |
| Pi | ✅ | ⚠ Beta: `recall-compaction` + `recall-extract` extensions | In progress |
| OpenCode | ✅ | ⚠ Alpha: `recall-extract` plugin | In progress |
| Codex CLI | n/a | n/a | Coming soon |
| Gemini CLI | n/a | n/a | Coming soon |

Candidate — Cursor: both .cursor/hooks.json and MCP are first-class; the integration model maps cleanly onto Recall's existing hook architecture. Tracked but not started.

Have an agent you'd like to see supported? Open an issue — Recall is designed to be agent-agnostic, and any host that speaks MCP is a candidate.

Documentation

| Guide | Description |
|---|---|
| Installation | Prerequisites, install, verify, session extraction |
| CLI Reference | All commands and options |
| MCP Tools | Tools available to AI agents |
| Architecture | Database, search, extraction pipeline |
| Slash Commands | `/Recall:*` commands for Claude Code |
| Upgrading | Update, backup, migration system |
| Troubleshooting | Common issues and fixes |
| Changelog | Release notes and breaking changes |
| Acknowledgments | Ideas borrowed, reshaped, and rejected, with credits to original authors |

Acknowledgments

Recall's tiered session-start context (L0 identity + L1 importance-ranked), PreCompact hook, and importance scoring were inspired by MemPalace (Milla Jovovich, Ben Sigman — MIT). We reshaped every adopted idea to fit Recall's SQLite + FTS5 architecture and rejected others (PALACE_PROTOCOL behavioral injection, KG triples) where they didn't survive review. See ACKNOWLEDGMENTS.md for the full what-we-took, what-we-rejected, and credits to the independent critics (lhl, roman-rr, danilchenko, tentenco) whose analyses shaped our reshape decisions.

License

MIT
