A cognitive memory system for LLMs that implements a human-inspired 3-axis memory architecture.
LLMs face a fundamental constraint: finite context windows.
┌─────────────┬─────────────┬─────────────┬─────────────┐
│  Session 1  │  Session 2  │  Session 3  │   Current   │
│   (lost)    │   (lost)    │   (lost)    │  (active)   │
└─────────────┴─────────────┴─────────────┴─────────────┘
Current workarounds fall short:
| Approach | Limitation |
|---|---|
| Summarization | Information loss, extra LLM calls |
| Sliding Window | Important early context lost |
| Full History | Hits token limits quickly |
| RAG | Not optimized for conversation context |
Memory Indexer provides Zero Context Engineering: you focus on your prompt, we handle all memory management.
Before (manual context management):
class ChatService:
    def chat(self, message):
        # You manage: history, summarization, token counting,
        # context assembly, profile loading, fact extraction...
        if self.count_tokens(self.history) > MAX_TOKENS:
            self.history = self.summarize(self.history)

After (with Memory Indexer):
class ChatService:
    async def chat(self, message):  # async, because the body awaits
        await memory.store(session, message)         # Auto-classify, auto-place
        context = await memory.recall(message)       # Intelligent retrieval
        return await llm.generate(context, message)  # Done.

"The goal of memory is not to transmit the most accurate information over time, but to guide and optimize intelligent decision-making by only preserving valuable information." (Richards & Frankland, 2017)
| What It Is | What It Isn't |
|---|---|
| General-purpose memory primitives | A chatbot framework |
| Cognitive science-based architecture | A vector database |
| MCP server for any LLM client | Tied to specific use cases |
| Domain-agnostic building blocks | An opinionated application |
The 3-Axis Memory Model gives each memory three orthogonal dimensions:
Type × Scope × Tier = What × When × Where
| Axis | Values | Cognitive Basis |
|---|---|---|
| Type | Episodic, Semantic, Procedural, Fact, Reflection | Tulving's memory classification |
| Scope | Turn, Topic, Session, User | Temporal reach (seconds → forever) |
| Tier | Buffer, Short, Long, Archive | Atkinson-Shiffrin + Baddeley |
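One way to picture the three axes is as independent enumerations, so that any type can pair with any scope and any tier. The sketch below is illustrative only: the class names (`MemoryType`, `Scope`, `Tier`, `MemoryCoordinates`) are hypothetical and are not taken from the Memory Indexer SDK.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class MemoryType(Enum):
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    FACT = "fact"
    REFLECTION = "reflection"


class Scope(Enum):
    TURN = "turn"
    TOPIC = "topic"
    SESSION = "session"
    USER = "user"


class Tier(Enum):
    BUFFER = 0
    SHORT = 1
    LONG = 2
    ARCHIVE = 3


@dataclass(frozen=True)
class MemoryCoordinates:
    """Hypothetical container: one value per axis, chosen independently."""
    type: MemoryType
    scope: Scope
    tier: Tier


# Orthogonal axes mean every combination is a valid coordinate.
ALL_COORDINATES = [
    MemoryCoordinates(t, s, r) for t, s, r in product(MemoryType, Scope, Tier)
]

# Example: a user-level fact that has been promoted to the archive tier.
example = MemoryCoordinates(MemoryType.FACT, Scope.USER, Tier.ARCHIVE)
print(len(ALL_COORDINATES))  # 5 types x 4 scopes x 4 tiers = 80
```

Treating the axes as separate fields, rather than one combined category, is what lets a single memory change tier without changing its type or scope.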
Tier Promotion Pipeline (Atkinson-Shiffrin + Tulving):
┌───────────────────────────────────────────────────────┐
│ Buffer (T0) - Sensory Store                           │
│   TTL: 60s idle | 500 tokens | 3 turns                │
├───────────────────────────────────────────────────────┤
│ Short (T1) - Working Memory (Baddeley; Miller's 7±2)  │
│   Capacity: 9 items, auto-promote when exceeded       │
├───────────────────────────────────────────────────────┤
│ Long (T2) - Episodic Memory                           │
│   Session-level events and experiences                │
├───────────────────────────────────────────────────────┤
│ Archive (T3) - Semantic Memory                        │
│   Promotion: Confidence ≥ 0.8 AND Confirms ≥ 3        │
└───────────────────────────────────────────────────────┘
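The thresholds in the diagram can be read as three small predicates. The following is a minimal sketch, not the library's implementation: the function names are hypothetical, and it assumes that any one of the three Buffer limits triggers a flush, that limits are inclusive, and that the oldest Short-tier items are promoted first. The source only states the numbers.

```python
# Thresholds copied from the tier diagram.
BUFFER_IDLE_SECONDS = 60
BUFFER_MAX_TOKENS = 500
BUFFER_MAX_TURNS = 3
SHORT_CAPACITY = 9
ARCHIVE_MIN_CONFIDENCE = 0.8
ARCHIVE_MIN_CONFIRMS = 3


def buffer_should_flush(idle_seconds: float, tokens: int, turns: int) -> bool:
    """Assumption: reaching any one of the three limits flushes the buffer."""
    return (
        idle_seconds >= BUFFER_IDLE_SECONDS
        or tokens >= BUFFER_MAX_TOKENS
        or turns >= BUFFER_MAX_TURNS
    )


def short_overflow(items: list) -> list:
    """Return the items to promote once Short holds more than 9 entries.

    Assumption: the oldest entries (front of the list) are promoted first.
    """
    excess = len(items) - SHORT_CAPACITY
    return items[:excess] if excess > 0 else []


def ready_for_archive(confidence: float, confirmations: int) -> bool:
    """Both conditions must hold: confidence >= 0.8 AND confirmations >= 3."""
    return (
        confidence >= ARCHIVE_MIN_CONFIDENCE
        and confirmations >= ARCHIVE_MIN_CONFIRMS
    )


print(buffer_should_flush(idle_seconds=10, tokens=520, turns=1))
print(short_overflow(list(range(11))))
print(ready_for_archive(confidence=0.9, confirmations=2))
```

Note that the Archive rule is a conjunction: a highly confident memory that has been confirmed only twice stays in the Long tier.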
| Operation | Latency | Throughput |
|---|---|---|
| Store | ~2.3 μs | 435K ops/s |
| Recall (limit 5) | ~1.5 μs | 667K ops/s |
| Store→Recall workflow | ~3.8 μs | 263K ops/s |
Measured with in-memory storage and mock embeddings. See Benchmark Details for full results.
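The throughput column follows directly from the latency column if operations run one after another on a single thread (an assumption; the source does not describe the benchmark harness). A quick check of the arithmetic:

```python
def ops_per_second(latency_us: float) -> float:
    """Sequential throughput implied by a per-operation latency in microseconds."""
    return 1_000_000 / latency_us


rows = [("Store", 2.3), ("Recall (limit 5)", 1.5), ("Store + Recall", 3.8)]
for name, latency_us in rows:
    print(f"{name}: {ops_per_second(latency_us) / 1000:.0f}K ops/s")
```

The workflow latency is also just the sum of its parts: 2.3 μs + 1.5 μs = 3.8 μs.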
dotnet tool install -g MemoryIndexer.Mcp

Configure Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json):
{
"mcpServers": {
"memory-indexer": {
"command": "memory-indexer-mcp"
}
}
}

dotnet add package MemoryIndexer.Sdk

// Register your embedding service BEFORE AddMemoryIndexer()
services.AddSingleton<IEmbeddingService>(myEmbeddingService);
// InMemory storage (default)
services.AddMemoryIndexer(options =>
{
    options.Embedding.Dimensions = 1536; // Match your embedding model
});
// Or with SQLite persistent storage
services.AddMemoryIndexer(options =>
{
    options.Storage.ConnectionString = "memories.db";
    options.Embedding.Dimensions = 1536;
}).WithSqliteVec();
// Store
await memoryService.StoreAsync("user123", "User prefers dark mode", importance: 0.8f);
// Recall
var results = await memoryService.RecallAsync("user123", "UI preferences", limit: 5);

A web-based chat demo of the Context Budget API: intelligent recall replaces the full conversation history.
Traditional: messages = [msg1, msg2, ... msgN] → Token cost: O(n)
This Demo: context = recall(query, budget=2000) → Token cost: O(1)
Features:
- Token-budget-aware context building (RecentHeavy, Balanced, SemanticHeavy strategies)
- 4-tier memory visualization (Buffer → Short → Long → Archive)
- Session isolation with cross-session user facts
- Flexible embeddings (inject your own IEmbeddingService) with LLM support (GpuStack/OpenAI)
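To make the token-budget idea concrete, here is a minimal sketch of budget-aware context building. It is not the demo's actual algorithm: the per-strategy split ratios, the whitespace word count used as a tokenizer stand-in, and the function names are all assumptions made for illustration. Only the three strategy names come from the list above.

```python
# Hypothetical share of the budget reserved for recent turns, per strategy.
STRATEGY_RECENT_SHARE = {"RecentHeavy": 0.7, "Balanced": 0.5, "SemanticHeavy": 0.3}


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def take_within(items: list[str], budget: int) -> list[str]:
    """Keep items in order until the next one would exceed the budget."""
    chosen, used = [], 0
    for item in items:
        cost = estimate_tokens(item)
        if used + cost > budget:
            break
        chosen.append(item)
        used += cost
    return chosen


def build_context(recent: list[str], semantic: list[str],
                  budget: int, strategy: str = "Balanced") -> list[str]:
    """Assemble context from recent turns (newest first) and semantic hits
    (most relevant first) without exceeding the token budget."""
    recent_budget = int(budget * STRATEGY_RECENT_SHARE[strategy])
    picked_recent = take_within(recent, recent_budget)
    used = sum(estimate_tokens(m) for m in picked_recent)
    # Whatever the recent turns did not use goes to semantic hits.
    candidates = [m for m in semantic if m not in picked_recent]
    picked_semantic = take_within(candidates, budget - used)
    return picked_recent + picked_semantic


recent = ["a b c", "d e f g", "h i"]
semantic = ["x y z w v", "a b c", "q"]
print(build_context(recent, semantic, budget=10))
```

Because selection stops at the budget, the assembled context stays bounded however long the conversation grows, which is the sense in which the token cost is O(1).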
cd samples/MemoryChatApp
.\start-dev.ps1 # Opens frontend + backend

An AI vs AI demo in which two LLM agents play 20 Questions using only memory recall, with no chat history injection.
Traditional: messages: [Q1, A1, Q2, A2, ... Q19, A19] → O(n) growing context
This Demo: user: "Alpha says: Yes" → O(1) constant context
What It Proves:
- Agents build coherent multi-turn strategy via `memory_recall()` only
- O(1) context maintenance regardless of conversation length
- Memory isolation between agents works correctly
cd samples/TwentyQuestionsGame
dotnet run # Auto-detect LLM provider
dotnet run -- --local # Use local ONNX model (no API key)

Memory Indexer provides the IMemoryStore interface for custom storage implementations. Use it to integrate with PostgreSQL, Qdrant, Redis, Pinecone, or any other storage system.
using MemoryIndexer.Utilities;
public class MyPostgresStore : IMemoryStore
{
    public async Task<MemoryUnit> StoreAsync(MemoryUnit memory, CancellationToken ct)
    {
        memory.PrepareForStore();  // Extension: sets Id, CreatedAt, UpdatedAt
        memory.ValidateForStore(); // Extension: validates required fields
        // Your storage logic here
        await _db.Memories.AddAsync(MapToEntity(memory), ct);
        await _db.SaveChangesAsync(ct);
        return memory;
    }
    // ... implement other IMemoryStore methods
}
// Register your custom store
services.AddSingleton<IMemoryStore, MyPostgresStore>();
services.AddMemoryIndexer(options => options.Embedding.Dimensions = 1536);

See Custom IMemoryStore Implementation Guide for complete patterns, including hybrid PostgreSQL+Qdrant setups.
| Document | Description |
|---|---|
| Architecture | System design, 3-axis model, tier/type details |
| Intelligence | Conflict resolution, adaptive retrieval, graph traversal |
| Evaluation | KPIs, NIAH tests, multi-needle scenarios |
| Health | Health checks, Kubernetes probes |
| Benchmarks | Performance measurements |
| Guides | Configuration, custom storage, usage patterns |
| Roadmap | Feature timeline and status |
Draws on recent LLM memory research and established cognitive psychology:
- MemGPT: OS-inspired virtual memory paging
- Mem0/Mem0g: Graph-based memory networks
- H-MEM: Hierarchical memory with index routing
- Cognitive Psychology: Atkinson-Shiffrin, Baddeley, Tulving models
MIT License - see LICENSE for details.
Built by iyulab