Proposal: HIPPOCAMPUS Pre-Computed Concept Index for Retrieval Optimization #17

@globalcaos

Problem

Runtime vector search for memory retrieval is expensive and query-dependent. Every recall requires embedding the query, scanning the vector space, and ranking results, all at inference time. As memory stores grow, the cost of an exact scan scales linearly with the number of stored chunks.

Proposed Solution: Pre-Computed Concept Index

Instead of searching at inference time, pre-compute a concept-to-memory mapping during offline consolidation (the equivalent of "sleep"). At retrieval time, anchor words detected in the query map directly to pre-indexed memory clusters: an expected O(1) dictionary lookup per anchor instead of a runtime kNN search.

How it works

  1. Build phase (offline, e.g. nightly consolidation):

    • Define an anchor vocabulary (concepts the agent frequently reasons about)
    • For each anchor, embed it and find the k-nearest memory chunks
    • Store the mapping: concept → [chunk_ids]
  2. Retrieval phase (inference time):

    • Detect anchor words in the current query/context
    • Look up pre-computed chunk lists — no embedding, no search
    • Fall through to traditional vector search only for novel/unseen concepts
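The two phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in this issue: the function names (`build_concept_index`, `retrieve`), the toy two-dimensional vectors, the regex tokenizer used for anchor detection, and the `fallback_search` callback are all hypothetical. A real build would use the agent's embedding model and vector store.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_concept_index(anchor_vecs, chunk_vecs, k=2):
    """Build phase (offline): map each anchor to its k nearest chunk ids."""
    index = {}
    for anchor, avec in anchor_vecs.items():
        ranked = sorted(
            chunk_vecs,
            key=lambda cid: cosine(avec, chunk_vecs[cid]),
            reverse=True,
        )
        index[anchor] = ranked[:k]
    return index


def retrieve(query, index, fallback_search):
    """Retrieval phase: dictionary lookups only; search only for novel queries."""
    # Assumption: anchors are single lowercase tokens found by a simple regex.
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    chunk_ids = []
    for tok in tokens:
        for cid in index.get(tok, []):  # one dict lookup per token
            if cid not in chunk_ids:
                chunk_ids.append(cid)
    if chunk_ids:
        return chunk_ids
    # No known anchor in the query: fall through to runtime vector search.
    return fallback_search(query)


# Toy data standing in for real embeddings (hypothetical values).
anchor_vecs = {"postgres": [1.0, 0.0], "sleep": [0.0, 1.0]}
chunk_vecs = {"c1": [0.9, 0.1], "c2": [0.1, 0.9], "c3": [0.7, 0.3]}

index = build_concept_index(anchor_vecs, chunk_vecs, k=2)
known = retrieve("How does Postgres handle this?", index, lambda q: ["fallback"])
novel = retrieve("Something entirely new", index, lambda q: ["fallback"])
```

Here `known` resolves through the index without any embedding call, while `novel` contains no anchor and is handed to the fallback.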

Two-tier architecture

| Tier | Built | Indexes | Purpose |
| --- | --- | --- | --- |
| Episodic | Real-time (on memory store) | Raw events with temporal context | Recent, unprocessed recall |
| Semantic | Nightly (post-consolidation) | Abstracted knowledge | Stable, high-precision recall |

Retrieval checks the semantic tier first (higher precision), then falls through to the episodic tier (higher recency). The staleness gap in the semantic tier is intentional: you can't abstract an event before reflecting on it. This mirrors hippocampal replay during slow-wave sleep, followed by cortical consolidation.
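The tier ordering can be sketched as follows. The issue does not specify when retrieval falls through, so this sketch assumes it does so whenever the semantic tier returns fewer than `limit` chunks. The names `record_episode` and `tiered_retrieve` and the `limit` parameter are hypothetical.

```python
def record_episode(episodic_index, anchors, chunk_id):
    """Episodic tier: updated in real time, whenever a memory is stored."""
    for anchor in anchors:
        episodic_index.setdefault(anchor, []).append(chunk_id)


def tiered_retrieve(anchors, semantic_index, episodic_index, limit=5):
    """Check the semantic tier first; consult the episodic tier only if needed."""
    results = []
    for tier in (semantic_index, episodic_index):
        for anchor in anchors:
            for cid in tier.get(anchor, []):
                if cid not in results:
                    results.append(cid)
        if len(results) >= limit:
            break  # enough hits from this tier, skip the next one
    return results[:limit]


# Semantic tier as left by the last nightly build (hypothetical contents).
semantic_index = {"postgres": ["s1"]}

# Episodic tier filled as events arrive during the day.
episodic_index = {}
record_episode(episodic_index, ["postgres"], "e1")
record_episode(episodic_index, ["postgres", "deploy"], "e2")

mixed = tiered_retrieve(["postgres"], semantic_index, episodic_index, limit=2)
recent_only = tiered_retrieve(["deploy"], semantic_index, episodic_index)
```

In this sketch `mixed` leads with the consolidated chunk and is topped up from recent events, while `recent_only` covers a concept the nightly build has not seen yet.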

What we've built so far

Honest status: It's not yet wired into our live retrieval path. We still use runtime semantic search (Gemini embeddings via memory_search). The index exists and rebuilds nightly, but we haven't replaced the hot path with it yet. This is a design proposal, not a battle-tested system.

Research paper

We've written a paper exploring the neuroscience analogy and the math behind the approach:

The core insight from neuroscience, following the hippocampal indexing theory: the human hippocampus doesn't store memories, it indexes them. On this view, hippocampal damage prevents forming new memories not because storage fails, but because indexing breaks.

Why this could matter for Hexis

Hexis already has a rich memory taxonomy and Postgres + pgvector for retrieval. This concept index could sit as an optimization layer on top:

  • Pre-compute concept mappings during Hexis's consolidation/heartbeat cycles
  • Reduce inference-time vector search calls for frequently accessed memory types
  • Backend-agnostic — works with Postgres/pgvector just as well as SQLite
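To illustrate the backend-agnostic point, the mapping can live in an ordinary relational table that needs no vector extension at lookup time. The sketch below uses Python's standard `sqlite3` module. The table name `concept_index`, its columns, and the helper names are assumptions made for illustration and are not taken from Hexis or from our current code.

```python
import sqlite3


def save_index(conn, index):
    """Replace the stored index with a freshly built one, in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept_index ("
        " concept TEXT NOT NULL,"
        " rank_pos INTEGER NOT NULL,"
        " chunk_id TEXT NOT NULL,"
        " PRIMARY KEY (concept, rank_pos))"
    )
    rows = [
        (concept, pos, cid)
        for concept, chunk_ids in index.items()
        for pos, cid in enumerate(chunk_ids)
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM concept_index")
        conn.executemany(
            "INSERT INTO concept_index (concept, rank_pos, chunk_id)"
            " VALUES (?, ?, ?)",
            rows,
        )


def lookup(conn, concept):
    """Return the chunk ids for one concept, in their stored rank order."""
    cur = conn.execute(
        "SELECT chunk_id FROM concept_index"
        " WHERE concept = ? ORDER BY rank_pos",
        (concept,),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
save_index(conn, {"postgres": ["c1", "c3"], "sleep": ["c2"]})
postgres_chunks = lookup(conn, "postgres")
missing = lookup(conn, "unknown")
```

The same table shape should carry over to Postgres, with the placeholder syntax adjusted for whichever driver is used. An empty result, as in `missing`, is the signal to fall through to vector search.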

What we'd like

Your feedback on whether this approach has merit for Hexis's retrieval path. We're genuinely looking to learn, not to sell — if the idea doesn't fit, that's useful feedback too.
