Memory Indexer

A cognitive memory system for LLMs implementing human-inspired 3-axis memory architecture.

The Problem

LLMs face a fundamental constraint: finite context windows.

┌─────────────────────────────────────────────────────┐
│  Session 1  │  Session 2  │  Session 3  │  Current  │
│   (lost)    │   (lost)    │   (lost)    │  (active) │
└─────────────────────────────────────────────────────┘

Current workarounds fall short:

| Approach | Limitation |
|---|---|
| Summarization | Information loss, extra LLM calls |
| Sliding Window | Important early context lost |
| Full History | Hits token limits quickly |
| RAG | Not optimized for conversation context |

The Solution

Memory Indexer provides Zero Context Engineering: you focus on your prompt, we handle all memory management.

Before (manual context management):

class ChatService:
    def chat(self, message):
        # You manage: history, summarization, token counting,
        # context assembly, profile loading, fact extraction...
        if self.count_tokens(self.history) > MAX_TOKENS:
            self.history = self.summarize(self.history)  # 😓

After (with Memory Indexer):

class ChatService:
    async def chat(self, message):
        await memory.store(session, message)           # Auto-classify, auto-place
        context = await memory.recall(message)         # Intelligent retrieval
        return await llm.generate(context, message)    # Done.
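
To make the "after" flow concrete, here is a self-contained sketch that can be run as-is. `InMemoryMemory`, `EchoLLM`, and the word-overlap scoring are hypothetical stand-ins chosen for illustration; they are not the Memory Indexer API, which performs classification and embedding-based retrieval.

```python
import asyncio


class InMemoryMemory:
    """Hypothetical stand-in for a memory service (not the real API)."""

    def __init__(self):
        self._items = []  # list of (session, text) pairs

    async def store(self, session, text):
        self._items.append((session, text))

    async def recall(self, query, limit=3):
        # Toy relevance: number of words shared by the query and a stored text.
        words = set(query.lower().split())
        scored = [
            (len(words & set(text.lower().split())), text)
            for _, text in self._items
        ]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in ranked[:limit]]


class EchoLLM:
    """Hypothetical LLM stub that reports how many memories it received."""

    async def generate(self, context, message):
        return f"[{len(context)} memories] {message}"


class ChatService:
    def __init__(self, memory, llm, session="s1"):
        self.memory, self.llm, self.session = memory, llm, session

    async def chat(self, message):
        await self.memory.store(self.session, message)   # store first, as above
        context = await self.memory.recall(message)      # then recall
        return await self.llm.generate(context, message)


async def main():
    service = ChatService(InMemoryMemory(), EchoLLM())
    print(await service.chat("I prefer dark mode"))
    print(await service.chat("which mode do I prefer"))


if __name__ == "__main__":
    asyncio.run(main())
```

The service itself holds no history list and does no token counting; everything it sends to the LLM comes from `recall`.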

"The goal of memory is not to transmit the most accurate information over time, but to guide and optimize intelligent decision-making by only preserving valuable information." (Richards & Frankland, 2017)

Role & Scope

| What It Is | What It Isn't |
|---|---|
| General-purpose memory primitives | A chatbot framework |
| Cognitive science-based architecture | A vector database |
| MCP server for any LLM client | Tied to specific use cases |
| Domain-agnostic building blocks | An opinionated application |

Core Architecture

The 3-Axis Memory Model gives each memory three orthogonal dimensions:

Type × Scope × Tier = What × When × Where

| Axis | Values | Cognitive Basis |
|---|---|---|
| Type | Episodic, Semantic, Procedural, Fact, Reflection | Tulving's memory classification |
| Scope | Turn, Topic, Session, User | Temporal reach (seconds → forever) |
| Tier | Buffer, Short, Long, Archive | Atkinson-Shiffrin + Baddeley |

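One way to picture the three axes is as independent enumerations combined into a single coordinate. The sketch below takes its values from the table; the class names (`MemoryType`, `Scope`, `Tier`, `MemoryCoordinates`) are illustrative assumptions and may not match the SDK's actual types.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):  # "What"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    FACT = "fact"
    REFLECTION = "reflection"


class Scope(Enum):  # "When"
    TURN = "turn"
    TOPIC = "topic"
    SESSION = "session"
    USER = "user"


class Tier(Enum):  # "Where"
    BUFFER = 0
    SHORT = 1
    LONG = 2
    ARCHIVE = 3


@dataclass(frozen=True)
class MemoryCoordinates:
    """Hypothetical container: one value per axis."""
    type: MemoryType
    scope: Scope
    tier: Tier


def all_coordinates():
    # Orthogonal axes: every combination of the three values is a valid position.
    return [
        MemoryCoordinates(t, s, r)
        for t in MemoryType
        for s in Scope
        for r in Tier
    ]


print(len(all_coordinates()))  # 5 types x 4 scopes x 4 tiers
```

Because the axes are independent, a user-scoped fact can sit in any tier, and moving it between tiers changes neither its type nor its scope.
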
Tier Promotion Pipeline (Atkinson-Shiffrin + Tulving):

┌─────────────────────────────────────────────────────┐
│  Buffer (T0) - Sensory Store                        │
│  TTL: 60s idle │ 500 tokens │ 3 turns               │
├─────────────────────────────────────────────────────┤
│  Short (T1) - Working Memory (Miller's 7±2)         │
│  Capacity: 9 items, auto-promote when exceeded      │
├─────────────────────────────────────────────────────┤
│  Long (T2) - Episodic Memory                        │
│  Session-level events and experiences               │
├─────────────────────────────────────────────────────┤
│  Archive (T3) - Semantic Memory                     │
│  Promotion: Confidence ≥ 0.8 AND Confirms ≥ 3       │
└─────────────────────────────────────────────────────┘
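
The thresholds in the diagram can be read as simple predicates. The sketch below is one possible interpretation, not the library's implementation: the function names are hypothetical, and it assumes that reaching any single buffer limit triggers a flush and that the short tier promotes its oldest items first.

```python
# Thresholds copied from the diagram above.
BUFFER_TTL_SECONDS = 60
BUFFER_MAX_TOKENS = 500
BUFFER_MAX_TURNS = 3
SHORT_CAPACITY = 9
ARCHIVE_MIN_CONFIDENCE = 0.8
ARCHIVE_MIN_CONFIRMS = 3


def buffer_should_flush(idle_seconds, tokens, turns):
    """Assumption: reaching any one of the three limits flushes the buffer."""
    return (
        idle_seconds >= BUFFER_TTL_SECONDS
        or tokens >= BUFFER_MAX_TOKENS
        or turns >= BUFFER_MAX_TURNS
    )


def short_overflow(items):
    """Return the items to promote once capacity is exceeded.

    Assumption: items are ordered oldest first and the oldest are promoted.
    """
    excess = len(items) - SHORT_CAPACITY
    return items[:excess] if excess > 0 else []


def ready_for_archive(confidence, confirms):
    """Both conditions from the diagram must hold."""
    return confidence >= ARCHIVE_MIN_CONFIDENCE and confirms >= ARCHIVE_MIN_CONFIRMS


print(buffer_should_flush(idle_seconds=10, tokens=120, turns=1))
print(short_overflow(list(range(11))))
print(ready_for_archive(confidence=0.9, confirms=2))
```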

Benchmark Summary

| Operation | Latency | Throughput |
|---|---|---|
| Store | ~2.3 μs | 435K ops/s |
| Recall (limit 5) | ~1.5 μs | 667K ops/s |
| Store→Recall workflow | ~3.8 μs | 263K ops/s |

In-memory storage with mock embeddings. See Benchmark Details for full results.
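
The throughput figures are consistent with the reciprocal of the latency figures. The check below assumes operations run one at a time on a single thread; that is an assumption, since the benchmark harness is not described here.

```python
def ops_per_second(latency_us):
    """Throughput implied by a per-operation latency given in microseconds."""
    return 1_000_000 / latency_us


benchmarks = [
    ("Store", 2.3),
    ("Recall (limit 5)", 1.5),
    ("Store→Recall workflow", 3.8),
]

for name, latency_us in benchmarks:
    print(f"{name}: {ops_per_second(latency_us) / 1000:.0f}K ops/s")
```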

Quick Start

As MCP Server

dotnet tool install -g MemoryIndexer.Mcp

Configure Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "memory-indexer": {
      "command": "memory-indexer-mcp"
    }
  }
}
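
If the config file already lists other MCP servers, the entry has to be merged rather than pasted over the file. A minimal sketch of that merge follows; `merge_mcp_server` is a hypothetical helper, and only the `mcpServers` key and the `memory-indexer-mcp` command come from the snippet above.

```python
import json


def merge_mcp_server(config, name="memory-indexer", command="memory-indexer-mcp"):
    """Return a copy of `config` with one server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command}
    merged["mcpServers"] = servers
    return merged


# Example: an existing config that already registers another (made-up) server.
existing = {"mcpServers": {"other-tool": {"command": "other-tool-mcp"}}}
print(json.dumps(merge_mcp_server(existing), indent=2))
```

Load the existing file with `json.load`, pass the result through the helper, and write it back with `json.dump`. Existing entries are preserved.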

As SDK

dotnet add package MemoryIndexer.Sdk

// Register your embedding service BEFORE AddMemoryIndexer()
services.AddSingleton<IEmbeddingService>(myEmbeddingService);

// InMemory storage (default)
services.AddMemoryIndexer(options =>
{
    options.Embedding.Dimensions = 1536;  // Match your embedding model
});

// Or with SQLite persistent storage
services.AddMemoryIndexer(options =>
{
    options.Storage.ConnectionString = "memories.db";
    options.Embedding.Dimensions = 1536;
}).WithSqliteVec();

// Store
await memoryService.StoreAsync("user123", "User prefers dark mode", importance: 0.8f);

// Recall
var results = await memoryService.RecallAsync("user123", "UI preferences", limit: 5);

Samples

MemoryChatApp: a web-based chat demonstrating the Context Budget API, where intelligent recall replaces the full conversation history.

Traditional: messages = [msg1, msg2, ... msgN]  → Token cost: O(n)
This Demo:   context = recall(query, budget=2000)  → Token cost: O(1)
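
The O(1) claim holds because the context size is capped by the budget, not by the history length. The sketch below illustrates this with a greedy packer. `build_context` and `estimate_tokens` are hypothetical names, the one-token-per-word estimate is a crude assumption, and the real Context Budget API may select memories differently.

```python
def estimate_tokens(text):
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def build_context(candidates, budget):
    """Greedily pack the highest-scoring texts that fit within the budget.

    candidates: list of (score, text) pairs.
    Returns (chosen_texts, tokens_used).
    """
    chosen, used = [], 0
    for _, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen, used


def fake_history(n):
    # Each message is 9 words long; the score simply grows with recency.
    return [(i / n, f"message number {i} with some extra filler words here") for i in range(n)]


for n in (10, 1000):
    _, used = build_context(fake_history(n), budget=50)
    print(f"history={n} messages, context tokens used={used}")
```

Whether the history holds 10 or 1000 messages, the assembled context never exceeds the budget.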

Features:

  • Token-budget-aware context building (RecentHeavy, Balanced, SemanticHeavy strategies)
  • 4-tier memory visualization (Buffer → Short → Long → Archive)
  • Session isolation with cross-session user facts
  • Flexible embeddings (inject your own IEmbeddingService) with LLM support (GpuStack/OpenAI)

cd samples/MemoryChatApp
.\start-dev.ps1               # Opens frontend + backend

TwentyQuestionsGame: an AI vs AI demo in which two LLM agents play 20 Questions using only memory recall, with no chat history injection.

Traditional: messages: [Q1, A1, Q2, A2, ... Q19, A19]  ← O(n) growing context
This Demo:   user: "Alpha says: Yes"                   ← O(1) constant context

What It Proves:

  • Agents build coherent multi-turn strategy via memory_recall() only
  • O(1) context maintenance regardless of conversation length
  • Memory isolation between agents works correctly

cd samples/TwentyQuestionsGame
dotnet run                    # Auto-detect LLM provider
dotnet run -- --local         # Use local ONNX model (no API key)
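
Memory isolation between agents can be pictured as a store partitioned by agent id, so that one agent's recall never sees the other's memories. The class below is a hypothetical illustration using substring matching; it is not how the sample or the SDK implements isolation.

```python
from collections import defaultdict


class IsolatedMemory:
    """Hypothetical per-agent memory partition."""

    def __init__(self):
        self._by_agent = defaultdict(list)

    def store(self, agent_id, text):
        self._by_agent[agent_id].append(text)

    def recall(self, agent_id, keyword, limit=5):
        # Only this agent's partition is searched; returns the newest matches.
        own = self._by_agent.get(agent_id, [])
        matches = [t for t in own if keyword.lower() in t.lower()]
        return matches[-limit:]


memory = IsolatedMemory()
memory.store("alpha", "The secret word is giraffe")
memory.store("beta", "Alpha says: Yes")

print(memory.recall("alpha", "giraffe"))
print(memory.recall("beta", "giraffe"))
```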

Custom Storage Backends

Memory Indexer exposes the IMemoryStore interface for custom storage implementations. Implement it to integrate with PostgreSQL, Qdrant, Redis, Pinecone, or any other storage system.

using MemoryIndexer.Utilities;

public class MyPostgresStore : IMemoryStore
{
    public async Task<MemoryUnit> StoreAsync(MemoryUnit memory, CancellationToken ct)
    {
        memory.PrepareForStore();   // Extension: sets Id, CreatedAt, UpdatedAt
        memory.ValidateForStore();  // Extension: validates required fields

        // Your storage logic here
        await _db.Memories.AddAsync(MapToEntity(memory), ct);
        await _db.SaveChangesAsync(ct);
        return memory;
    }
    // ... implement other IMemoryStore methods
}

// Register your custom store
services.AddSingleton<IMemoryStore, MyPostgresStore>();
services.AddMemoryIndexer(options => options.Embedding.Dimensions = 1536);

See Custom IMemoryStore Implementation Guide for complete patterns including hybrid PostgreSQL+Qdrant setups.

Documentation

| Document | Description |
|---|---|
| Architecture | System design, 3-axis model, tier/type details |
| Intelligence | Conflict resolution, adaptive retrieval, graph traversal |
| Evaluation | KPIs, NIAH tests, multi-needle scenarios |
| Health | Health checks, Kubernetes probes |
| Benchmarks | Performance measurements |
| Guides | Configuration, custom storage, usage patterns |
| Roadmap | Feature timeline and status |

Research Foundation

Built on recent LLM memory research and established cognitive psychology:

  • MemGPT: OS-inspired virtual memory paging
  • Mem0/Mem0g: Graph-based memory networks
  • H-MEM: Hierarchical memory with index routing
  • Cognitive Psychology: Atkinson-Shiffrin, Baddeley, Tulving models

License

MIT License - see LICENSE for details.


Built by iyulab
