Sean-Koval/context-engineering

ContextEngine

An open-source context management platform for LLM-powered agents

Python 3.12+ | License: MIT

The Problem

LLM agents experience context degradation well before hitting token limits. Quality drops significantly after ~50 tool calls, typically around 60-70% context capacity, a phenomenon we call "pre-rot".

Quality │
   100% │████████████████████████
        │                        ████████
    80% │                                ████████
        │                                        ████
    60% │                                            ████
        │────────────────────────────────────────────────────
            0%    25%    50%    65%    80%    100%
                      Token Usage

        │◄──── Safe ────►│◄ Pre-Rot ►│◄── Degraded ──►│
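
The three zones in the chart can be expressed as a small classifier. This is a minimal sketch, not ContextEngine's own detection logic: the `Zone` enum, the `classify_usage` function, and the 65% / 80% thresholds are all assumptions, loosely read off the chart above, and would need tuning per model and workload.

```python
from enum import Enum


class Zone(Enum):
    SAFE = "safe"
    PRE_ROT = "pre-rot"
    DEGRADED = "degraded"


# Hypothetical thresholds read off the chart; not values taken from the library.
PRE_ROT_START = 0.65
DEGRADED_START = 0.80


def classify_usage(used_tokens: int, total_tokens: int) -> Zone:
    """Map a token-usage fraction onto the three zones in the chart."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    fraction = used_tokens / total_tokens
    if fraction < PRE_ROT_START:
        return Zone.SAFE
    if fraction < DEGRADED_START:
        return Zone.PRE_ROT
    return Zone.DEGRADED
```

The point of a pre-rot zone is that compression should be triggered when `classify_usage` first returns `Zone.PRE_ROT`, while there is still headroom, rather than waiting for `Zone.DEGRADED`.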

The Solution

ContextEngine treats context as a first-class data structure (graph-based, not flat strings) and applies intelligent compression through a tiered strategy:

  1. Lossless (100% recoverable): Externalize payloads, deduplicate, collapse tool chains
  2. Compaction (80-95% recoverable): Schema compression, entity-centric filtering
  3. Summarization (last resort): Hierarchical, task-aware, and incremental summarization
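
The escalation order above can be sketched as a loop that tries the least lossy tier first and stops as soon as the context fits. This is an illustrative model that works on token counts only; `Tier`, `Strategy`, and `compress_until_fits` are hypothetical names, not the `CompressionPipeline` API, and the toy ratios are made up.

```python
from collections.abc import Callable
from enum import IntEnum


class Tier(IntEnum):
    LOSSLESS = 1
    COMPACTION = 2
    SUMMARIZATION = 3


# A strategy takes the current token count and returns the count after it runs.
Strategy = Callable[[int], int]


def compress_until_fits(
    tokens: int,
    target: int,
    strategies: dict[Tier, list[Strategy]],
    max_tier: Tier = Tier.SUMMARIZATION,
) -> tuple[int, list[Tier]]:
    """Apply tiers from least to most lossy, stopping once under target."""
    used: list[Tier] = []
    for tier in sorted(Tier):
        if tokens <= target or tier > max_tier:
            break
        for strategy in strategies.get(tier, []):
            tokens = strategy(tokens)
        used.append(tier)
    return tokens, used


# Toy strategies with made-up ratios, for illustration only.
demo = {
    Tier.LOSSLESS: [lambda t: t // 2],
    Tier.COMPACTION: [lambda t: t // 2],
    Tier.SUMMARIZATION: [lambda t: t // 5],
}
final_tokens, tiers_used = compress_until_fits(80_000, 30_000, demo)
```

In this run the lossless and compaction tiers are enough to get under the target, so the summarization tier is never reached. Capping `max_tier` guarantees that irreversible steps are never taken, at the cost of possibly staying over budget.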

Features

  • Graph-based Context: Typed nodes (messages, tool calls, artifacts) with relationships
  • Smart Compression: Targets 10-20x total compression (3-5x from reversible strategies) with minimal information loss
  • Pre-rot Detection: Act before quality degrades, not after
  • Entity Tracking: NER-powered entity extraction and importance scoring
  • Semantic Search: Embedding-based similarity and duplicate detection
  • Recovery Manifests: Track all compression operations for potential rollback
  • Tiered Storage: Hot/warm/cold storage with automatic migration policies
  • Tool Caching: Semantic and exact-match caching with pattern detection
  • Predictive Prefetch: Learn tool patterns and prefetch likely next calls
  • Observable: OpenTelemetry tracing and Prometheus metrics built-in
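
To make the exact-match half of tool caching concrete, here is a minimal sketch. It assumes tool arguments are JSON-serializable keyword arguments; `ExactMatchToolCache` and its methods are hypothetical names for illustration, not the `context-tools` API, and semantic matching is out of scope.

```python
import hashlib
import json
from typing import Any, Callable


class ExactMatchToolCache:
    """Cache tool results keyed by tool name plus canonicalized arguments."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tool: str, args: dict[str, Any]) -> str:
        # sort_keys makes the key independent of argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool: str, args: dict[str, Any], run: Callable[..., Any]) -> Any:
        """Return the cached result, or run the tool and remember its result."""
        key = self._key(tool, args)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = run(**args)
        self._store[key] = result
        return result

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Hashing a canonical JSON form means `{"pattern": "a", "root": "."}` and `{"root": ".", "pattern": "a"}` share one cache entry. A real cache would also need invalidation for tools whose results change over time, which this sketch omits.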

Installation

# Using uv (recommended)
uv add context-core context-compression context-memory context-tools

# Or with pip
pip install context-core context-compression context-memory context-tools

Quick Start

from context_core import ContextGraph, TokenBudget, EntityTracker, SemanticIndex
from context_compression import CompressionPipeline, CompressionTier

# Create a context graph
graph = ContextGraph(session_id="my-session")

# Add messages and tool calls
graph.add_message(role="user", content="Find all Python files in the src directory")
call = graph.add_tool_call("glob", {"pattern": "src/**/*.py"})
graph.add_tool_result(call.id, ["src/main.py", "src/utils.py", "src/config.py"])

# Track token budget with pre-rot detection
budget = TokenBudget(total_tokens=100_000)
budget.allocate("context", graph.total_tokens)

if budget.status.needs_compression:
    # Apply intelligent compression
    pipeline = CompressionPipeline()
    results = pipeline.compress(graph, max_tier=CompressionTier.COMPACTION)

    print(f"Saved {sum(r.tokens_saved for r in results)} tokens")

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        ContextEngine                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │context-core │  │  context-   │  │  context-   │              │
│  │             │  │ compression │  │   observe   │              │
│  │ • Graph     │  │             │  │             │              │
│  │ • Entities  │  │ • Pipeline  │  │ • Tracing   │              │
│  │ • Semantic  │  │ • Lossless  │  │ • Metrics   │              │
│  │ • Budget    │  │ • Compaction│  │ • Events    │              │
│  │ • Tokenizer │  │ • Summary   │  │             │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  context-   │  │  context-   │  │  context-   │              │
│  │   memory    │  │    tools    │  │ multiagent  │  (Phase 4)   │
│  │             │  │             │  │             │              │
│  │ • Tiered    │  │ • Caching   │  │ • Broker    │              │
│  │ • Working   │  │ • Patterns  │  │ • Handoff   │              │
│  │ • Retrieval │  │ • Prefetch  │  │ • Sync      │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Packages

| Package | Description | Status | Tests |
|---|---|---|---|
| context-core | Graph, entities, semantic index, token budget | ✅ Complete | 358 |
| context-compression | Compression pipeline with 9 strategies | ✅ Complete | 311 |
| context-observe | OpenTelemetry tracing, Prometheus metrics | ✅ Complete | - |
| context-memory | Storage backends, tiered storage, retrieval | ✅ Complete | 307 |
| context-tools | Tool caching, patterns, prefetching | ✅ Complete | 283 |
| context-multiagent | Broker, handoff, shared memory | 📅 Phase 4 | - |

Compression Strategies

Lossless (100% Recoverable)

| Strategy | Description | Compression |
|---|---|---|
| ExternalizePayloads | Move large outputs to external storage | 2-5x |
| DeduplicateSemantically | Remove near-duplicate content | 1.5-2x |
| CollapseToolChains | Merge sequential related tool calls | 2-3x |
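
As an illustration of why payload externalization is fully recoverable, the sketch below swaps large content for a content-hash reference and resolves it back on demand. `PayloadStore`, `externalize`, `restore`, the `ref:sha256:` prefix, and the 200-character threshold are all hypothetical choices made for this example, not the `ExternalizePayloads` implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PayloadStore:
    """In-memory stand-in for external storage, keyed by content hash."""

    blobs: dict[str, str] = field(default_factory=dict)

    def put(self, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        self.blobs[digest] = content
        return digest

    def get(self, digest: str) -> str:
        return self.blobs[digest]


REF_PREFIX = "ref:sha256:"  # hypothetical reference format


def externalize(content: str, store: PayloadStore, max_chars: int = 200) -> str:
    """Replace content longer than max_chars with a short reference."""
    if len(content) <= max_chars:
        return content
    return REF_PREFIX + store.put(content)


def restore(value: str, store: PayloadStore) -> str:
    """Inverse of externalize: resolve a reference back to the full payload."""
    if value.startswith(REF_PREFIX):
        return store.get(value[len(REF_PREFIX):])
    return value
```

Because the reference is derived from the content itself, identical payloads share one stored blob, and `restore(externalize(x, store), store)` returns `x` for any externalized payload. A graph-based implementation would more likely mark references with a typed node than with a string prefix, which also avoids confusing short content that happens to start with the prefix.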

Compaction (80-95% Recoverable)

| Strategy | Description | Compression |
|---|---|---|
| SchemaCompression | Extract and reference repeated JSON schemas | 2-4x |
| EntityCentricCompression | Keep only entity-relevant sentences | 2-3x |
| TaskRelevanceCompression | Filter by current task relevance | 2-4x |
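
One way to picture schema extraction is a list of JSON records that all share the same keys: the key list is stored once and each record becomes a row of values. This is a simplified sketch under that uniform-keys assumption; `compress_records` and `expand_records` are hypothetical helpers, not the `SchemaCompression` strategy itself, and the sketch makes no claim about how the library reaches its stated recoverability range.

```python
from typing import Any


def compress_records(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Store the shared key list once and each record as a row of values."""
    if not records:
        return {"schema": [], "rows": []}
    schema = list(records[0].keys())
    for record in records:
        # Key views compare like sets, so key order does not matter here.
        if record.keys() != records[0].keys():
            raise ValueError("records do not share one schema")
    rows = [[record[key] for key in schema] for record in records]
    return {"schema": schema, "rows": rows}


def expand_records(packed: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild the original list of dicts from schema plus rows."""
    return [dict(zip(packed["schema"], row)) for row in packed["rows"]]
```

The saving grows with the number of records, since each repeated key name is emitted once instead of once per record.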

Summarization (Irreversible, Last Resort)

| Strategy | Description | Compression |
|---|---|---|
| HierarchicalSummarization | Bottom-up multi-level summaries | 5-10x |
| TaskAwareSummarization | Task-focused with relevance scoring | 5-10x |
| IncrementalSummarization | Streaming updates to running summary | 5-10x |

Development

# Clone the repository
git clone https://github.com/Sean-Koval/context-engineering.git
cd context-engineering

# Install dependencies
uv sync

# Install all packages in development mode
uv pip install -e packages/context-core -e packages/context-compression -e packages/context-memory -e packages/context-tools -e packages/context-observe

# Run tests
uv run pytest

# Format and lint
uv run ruff format .
uv run ruff check --fix .

# Type check
uv run ty check .

Project Structure

context-engineering/
├── packages/
│   ├── context-core/           # Foundation package (358 tests)
│   │   ├── src/context_core/
│   │   │   ├── graph/          # ContextGraph, nodes, edges
│   │   │   ├── entities/       # EntityTracker, NER backends
│   │   │   ├── semantic/       # SemanticIndex, vector stores
│   │   │   ├── budget/         # TokenBudget, pre-rot detection
│   │   │   └── tokenizer/      # Tokenizer protocol, implementations
│   │   └── tests/
│   ├── context-compression/    # Compression pipeline (311 tests)
│   │   ├── src/context_compression/
│   │   │   ├── strategies/
│   │   │   │   ├── lossless/   # Externalize, deduplicate, collapse
│   │   │   │   ├── compaction/ # Schema, entity-centric, task
│   │   │   │   └── summarization/ # Hierarchical, task-aware, incremental
│   │   │   ├── recovery/       # Manifest, operations
│   │   │   └── pipeline.py     # CompressionPipeline orchestrator
│   │   └── tests/
│   ├── context-memory/         # Persistent storage (307 tests)
│   │   ├── src/context_memory/
│   │   │   ├── backends/       # FileSystem, SQLite, Postgres, Redis
│   │   │   ├── retrieval/      # Semantic, Entity, Temporal, Ensemble
│   │   │   ├── artifacts/      # Versioned artifact management
│   │   │   ├── tiered.py       # Hot/warm/cold tiered storage
│   │   │   ├── working.py      # Working memory with LRU cache
│   │   │   └── eviction.py     # Multi-tier eviction strategies
│   │   └── tests/
│   ├── context-tools/          # Tool optimization (283 tests)
│   │   ├── src/context_tools/
│   │   │   ├── cache/          # ToolCallCache, semantic matching
│   │   │   ├── patterns/       # ToolUsagePatterns, antipattern detection
│   │   │   ├── compression/    # ToolResultCompressor, schema extraction
│   │   │   └── prefetch/       # ToolPrefetcher, argument prediction
│   │   └── tests/
│   └── context-observe/        # Observability
│       ├── src/context_observe/
│       │   ├── tracer.py       # OpenTelemetry integration
│       │   ├── metrics.py      # Prometheus metrics
│       │   └── events.py       # Structured logging
│       └── tests/
├── specs/                      # Technical specifications
├── docs/                       # Research and analysis
├── INDEX.md                    # Implementation progress tracking
├── TASK_BOARD.md               # Granular task breakdown
└── MASTER_ROADMAP.md           # Vision and architecture

Roadmap

| Phase | Focus | Status |
|---|---|---|
| Phase 1 | Foundation (Graph, Entities, Semantic, Budget) | ✅ Complete |
| Phase 2 | Compression (Pipeline, 9 Strategies, Recovery) | ✅ Complete |
| Phase 3 | Memory & Tools (Storage, Caching, Patterns) | ✅ Complete |
| Phase 4 | Multi-Agent (Broker, Handoff, Sync) | 📅 Planned |

Key Metrics

| Metric | Target | Current |
|---|---|---|
| Context utilization before degradation | 90%+ | ✅ |
| Reversible compression ratio | 3-5x | ✅ |
| Total compression ratio | 10-20x | ✅ |
| Test coverage | 90%+ | 1,259 tests |
| Memory retrieval p99 latency | < 100ms | ✅ |
| Tool cache hit rate | > 60% | ✅ |

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

For Coding Agents

Each spec file in specs/ contains:

  • Complete Python code with type hints
  • Pydantic models for all data structures
  • Implementation checklists
  • Test specifications

Use TASK_BOARD.md for granular task breakdown with dependencies.

License

MIT License - see LICENSE for details.

Acknowledgments


ContextEngine - Because context is too valuable to waste.
