Local-first semantic memory + self-learning for AI coding agents. ~300 lines of
Python, one dependency (numpy), zero servers, every artifact a plain file you can read,
grep, and git diff.
It gives a file-based agent memory (like Claude Code's) two capabilities that are usually locked behind heavyweight multi-agent frameworks, plus a hook that applies them automatically:
| Capability | What it does | Heavyweight-framework equivalent |
|---|---|---|
| Vector recall | Semantic search over your markdown memory files | A binary HNSW vector database |
| Self-learning | A trajectory log of what worked, retrieved alongside facts so past solutions inform new work | A "ReasoningBank" / trajectory store |
| Auto-recall | A `UserPromptSubmit` hook injects relevant memory into every prompt automatically | An always-on retrieval agent |
Run it yourself (self-contained; touches only a temp dir):

```bash
./examples/demo.sh
```

Real output (M = stored fact, T = learned trajectory). Watch step 4: the lesson recorded in step 3 outranks the static note on a related query. That's the self-learning loop working.
### 2. Semantic recall — ask in your own words
```console
$ memdb.py search "how do I publish a website and clear the cache" -k 2
1. [M 0.282] CDN deploy notes
   Deploy static sites by uploading build output then purging the CDN cache
```
### 3. Self-learning — record what worked
```console
$ memdb.py learn --task "Fix 502 from a CDN" --did "..." --avoid "Token was revoked; use dashboard"
Learned trajectory -> trajectories/fix-502-from-a-cdn.md
```
### 4. The lesson now surfaces on a related query
```console
$ memdb.py search "getting a 502 error behind the cdn" -k 2
1. [T 0.750] fix-502-from-a-cdn      <- the lesson, ranked first
2. [M 0.256] CDN deploy notes
```
### 5. The auto-recall hook (fed a prompt the way Claude Code feeds it)
```console
$ echo '{"prompt":"deploy the site and clear the cache"}' | recall_hook.py
<vector-memory-recall>
Possibly-relevant saved memory/trajectories for this request:
- [fact 0.54] CDN deploy notes — memory/cdn-deploy.md
</vector-memory-recall>
```
To turn this into a GIF for the repo header:
```bash
asciinema rec demo.cast -c './examples/demo.sh' && agg demo.cast demo.gif
```
Most "agent memory" frameworks ship a vector DB daemon, an extra MCP server, swarm consensus, and a benchmark table — a lot of trust surface for two good ideas. This project distills those two ideas (semantic recall + trajectory learning) into something you fully own: no daemon, no API key required, nothing hidden in a binary store.
The default backend is TF-IDF cosine similarity in pure numpy — instant, offline, free, and deterministic. That's enough for hundreds of memory files. When you want true paraphrase/synonym matching at larger scale, set one env var to switch to real embeddings via any OpenAI-compatible endpoint (OpenAI, DashScope, Together, local Ollama, …).
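For a sense of what switching to real embeddings involves on the wire, here is a minimal standard-library sketch of a call to an OpenAI-compatible `/embeddings` endpoint. It assumes the usual OpenAI request shape (`model`, `input`) and response shape (`data[i].embedding` plus an `index`); the function names and the concrete `https://api.openai.com/v1` default are illustrative assumptions, not this project's code.

```python
import json
import urllib.request

# Assumed concrete value for the "OpenAI" default; any compatible base URL works.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def build_embedding_request(texts, api_key, base_url=DEFAULT_BASE_URL,
                            model="text-embedding-3-small"):
    """Build (but do not send) a POST to an OpenAI-compatible /embeddings endpoint."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding_response(body):
    """Return one vector per input, in input order, from an OpenAI-style response."""
    data = json.loads(body)["data"]
    # Each item carries its input position in "index"; sort on it explicitly.
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


def embed(texts, api_key, **kwargs):
    """Send the request and decode the vectors (network call)."""
    req = build_embedding_request(texts, api_key, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_embedding_response(resp.read())
```

Splitting request building from response parsing keeps the network call at the edge, so the rest can be checked offline.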
```bash
git clone https://github.com/Javierg720/agent-vector-memory.git
cd agent-vector-memory
python3 -m pip install numpy   # the only dependency
./install.sh                   # optional: registers the auto-recall hook
```

Point it at your memory folder (a directory of markdown files with `name:`/`description:` frontmatter; Claude Code's `~/.claude/memory` works out of the box):
```bash
export VECTOR_MEMORY_DIR="$HOME/.claude/memory"
```

```bash
# semantic recall over facts + learned lessons
python3 scripts/memdb.py search "deploy a static site and clear the CDN cache" -k 5

# record what worked (the self-learning step)
python3 scripts/memdb.py learn \
  --task "Deploy static site behind a CDN" \
  --did "Upload build/ over SFTP; purge cache via the dashboard" \
  --avoid "Don't rely on the API token — it was revoked; use the dashboard" \
  --tags "deploy,cdn" --date "2026-06-14"

python3 scripts/memdb.py list    # list trajectories
python3 scripts/memdb.py index   # rebuild the index
```

`search` and `learn` rebuild the index automatically; you rarely call `index` by hand.
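The `learn` step turns a task like "Fix 502 from a CDN" into a file such as `trajectories/fix-502-from-a-cdn.md`. A minimal sketch of that step, assuming a lowercase-and-hyphenate slug and a frontmatter block with `name:` and `task:` keys; the `tags:`/`date:` keys, the body headings, and the `write_trajectory` name are hypothetical, not the real `memdb.py` internals.

```python
import re
from pathlib import Path


def slugify(task):
    """Lowercase, collapse runs of non-alphanumerics to one hyphen, trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")


def write_trajectory(home, task, did, avoid="", tags="", date=""):
    """Write one trajectory as markdown under <home>/trajectories/ (hypothetical layout)."""
    slug = slugify(task)
    path = Path(home) / "trajectories" / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # name:/task: frontmatter is what the indexer reads; tags/date are assumed extras.
    lines = ["---", f"name: {slug}", f"task: {task}"]
    if tags:
        lines.append(f"tags: {tags}")
    if date:
        lines.append(f"date: {date}")
    lines += ["---", "", "## What worked", did]
    if avoid:
        lines += ["", "## Avoid", avoid]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Because a trajectory is just another markdown file with frontmatter, it is indexed and searched exactly like a fact.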
1. Recall before a non-trivial task: pull relevant facts and past lessons.
2. Do the work.
3. Learn after a non-obvious solve: record a trajectory so next time is faster.

Run `install.sh` to register the hook, and step 1 happens on every prompt automatically.
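Step 1 is what the hook automates. A sketch of its shape, assuming the recall block format shown in the demo output and a hypothetical `search(prompt)` callable standing in for the real search; `format_recall` and `main` are illustrative names, and the `0.10` threshold and 4-hit cap mirror the documented `VECTOR_MEMORY_THRESHOLD` and `VECTOR_MEMORY_MAX_HITS` defaults.

```python
import json
import sys

HEADER = "Possibly-relevant saved memory/trajectories for this request:"


def format_recall(hits, threshold=0.10, max_hits=4):
    """Render hits as a recall block, or "" if none clears the threshold.

    hits: (kind, score, title, path) tuples, best first (hypothetical shape).
    """
    kept = [h for h in hits if h[1] >= threshold][:max_hits]
    if not kept:
        return ""
    lines = ["<vector-memory-recall>", HEADER]
    for kind, score, title, path in kept:
        lines.append(f"- [{kind} {score:.2f}] {title} \u2014 {path}")  # em dash, as in the demo output
    lines.append("</vector-memory-recall>")
    return "\n".join(lines)


def main(search, stdin=sys.stdin, stdout=sys.stdout):
    """Read UserPromptSubmit JSON, print a recall block if any, always return 0.

    search: hypothetical callable prompt -> hits, standing in for memdb's search.
    """
    try:
        prompt = json.load(stdin).get("prompt", "")
        block = format_recall(search(prompt)) if prompt.strip() else ""
        if block:
            stdout.write(block + "\n")
    except Exception:
        pass  # stay silent on any error so prompt submission never breaks
    return 0
```

A real hook script would finish with something like `sys.exit(main(search=...))`, wired to the actual search. The catch-all `except` carries the hook's contract: whatever goes wrong, it prints nothing and exits 0.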
All optional, all environment variables:
| Var | Default | Purpose |
|---|---|---|
| `VECTOR_MEMORY_DIR` | `~/.claude/memory` | Folder of markdown "fact" files to index |
| `VECTOR_MEMORY_HOME` | repo dir | Where `trajectories/` and `index/` live |
| `VECTOR_MEMORY_THRESHOLD` | `0.10` | Min score for the auto-recall hook to surface a hit |
| `VECTOR_MEMORY_MAX_HITS` | `4` | Max hits the hook injects |
| `EMBED_API_KEY` | unset | Set to switch from local TF-IDF to embeddings |
| `EMBED_BASE_URL` | OpenAI | Any OpenAI-compatible `/embeddings` endpoint |
| `EMBED_MODEL` | `text-embedding-3-small` | Embedding model name |
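One way to read these settings, sketched with the defaults from the table. `load_config` is a hypothetical helper, and `https://api.openai.com/v1` is an assumed concrete value for the "OpenAI" default; taking `env` as a parameter makes it testable without touching the real environment.

```python
import os
from pathlib import Path

# Assumed concrete URL for the "OpenAI" default in the table.
OPENAI_BASE_URL = "https://api.openai.com/v1"


def load_config(env=None, repo_dir="."):
    """Read the optional settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "memory_dir": Path(env.get("VECTOR_MEMORY_DIR", "~/.claude/memory")).expanduser(),
        "home": Path(env.get("VECTOR_MEMORY_HOME", repo_dir)).expanduser(),
        "threshold": float(env.get("VECTOR_MEMORY_THRESHOLD", "0.10")),
        "max_hits": int(env.get("VECTOR_MEMORY_MAX_HITS", "4")),
        # A non-empty EMBED_API_KEY is what switches the backend away from TF-IDF.
        "backend": "embeddings" if env.get("EMBED_API_KEY") else "tfidf",
        "embed_base_url": env.get("EMBED_BASE_URL", OPENAI_BASE_URL),
        "embed_model": env.get("EMBED_MODEL", "text-embedding-3-small"),
    }
```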
- Documents are markdown files with `name:` and `description:`/`task:` frontmatter. Facts live in `VECTOR_MEMORY_DIR`; trajectories live in `trajectories/`. Both are indexed and searched together.
- Local backend builds a TF-IDF matrix (title/description weighted 3×), L2-normalizes rows, and ranks by cosine similarity to the query vector. Pure numpy.
- Embedding backend calls an OpenAI-compatible endpoint, caches vectors in `index/vectors.json`, and ranks the same way.
- The hook (`scripts/recall_hook.py`) reads Claude Code's `UserPromptSubmit` JSON from stdin, searches against the prompt, and prints a short recall block to stdout, which Claude Code injects as context. It exits 0 and stays silent on any error, so it can never break prompt submission.
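The local backend can be sketched in a few dozen lines of numpy. This is an illustration under stated assumptions, not the project's code: the frontmatter parser handles only flat `key: value` lines, `name:` is treated as the title, the 3× weighting is done by repeating the name and description (or `task:`) tokens, and the smoothed IDF formula is one common variant (the README does not say which one `memdb.py` uses). `TfidfIndex` and its helpers are hypothetical names.

```python
import re

import numpy as np

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def parse_frontmatter(text):
    """Split '---'-delimited `key: value` frontmatter from the markdown body."""
    meta, body = {}, text
    if text.startswith("---\n"):
        head, sep, rest = text[4:].partition("\n---")
        if sep:
            for line in head.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            body = rest.lstrip("\n")
    return meta, body


def doc_tokens(text):
    """Tokens for one document, with name + description/task counted 3 times."""
    meta, body = parse_frontmatter(text)
    head = meta.get("name", "") + " " + meta.get("description", meta.get("task", ""))
    return tokenize(head) * 3 + tokenize(body)


class TfidfIndex:
    def __init__(self, docs):
        """docs: mapping of document id -> raw markdown text."""
        self.ids = list(docs)
        token_lists = [doc_tokens(docs[i]) for i in self.ids]
        vocab = sorted({t for toks in token_lists for t in toks})
        self.vocab = {t: j for j, t in enumerate(vocab)}
        tf = np.zeros((len(self.ids), len(self.vocab)))
        for row, toks in enumerate(token_lists):
            for t in toks:
                tf[row, self.vocab[t]] += 1
        df = np.count_nonzero(tf, axis=0)
        # Smoothed IDF: an assumed variant, not necessarily memdb.py's formula.
        self.idf = np.log((1 + len(self.ids)) / (1 + df)) + 1
        self.matrix = self._l2_normalize(tf * self.idf)

    @staticmethod
    def _l2_normalize(m):
        norms = np.linalg.norm(m, axis=-1, keepdims=True)
        return m / np.where(norms == 0, 1.0, norms)  # leave all-zero vectors as zeros

    def search(self, query, k=5):
        """Return up to k (doc id, cosine score) pairs, best first."""
        q = np.zeros(len(self.vocab))
        for t in tokenize(query):
            j = self.vocab.get(t)
            if j is not None:
                q[j] += 1
        q = self._l2_normalize(q * self.idf)
        scores = self.matrix @ q  # rows and q are unit length, so dot = cosine
        order = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in order]
```

Normalizing every row once at build time means each query costs a single matrix-vector product, which is why plain numpy is fast enough for a few hundred files.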
MIT — see LICENSE.