feat(context): Add configurable context window management for multi-backend support #278

@akaszubski

Summary

Extend autonomous-dev's context management to support non-Claude backends (GPT-4, Gemini, Ollama, vLLM, etc.) by adding configurable context window sizes, custom compaction strategies, and checkpoint/resume integration around context operations.

What Does NOT Work

Pattern: Hardcoded Claude-specific limits fail for other backends

  • Current system assumes Claude's 200K token context window
  • GPT-4 has 128K limit, Gemini 1.5 has 1M limit, Llama-3.1 has 128K, Mixtral has 32K
  • No mechanism to configure context window size per backend
  • Session state manager has no awareness of backend-specific constraints

Pattern: Auto-compaction assumption fails for non-Claude backends

Pattern: One-size-fits-all compaction fails for different backends

  • Claude handles compaction automatically and effectively
  • Other backends may need explicit summarization logic
  • No pluggable compaction strategy mechanism
  • Cannot customize compaction per backend requirements

Scenarios

Fresh Install

What happens: User installs autonomous-dev with default configuration

  • Uses Claude backend (default)
  • Context window = 200K tokens (Claude default)
  • Auto-compaction managed by Claude Code automatically
  • No custom compaction logic needed

Expected behavior:

  • No configuration required for Claude users
  • Default .env values work out of the box
  • Backward compatible with existing workflows

Update/Upgrade - Valid Existing Data

What happens: User has existing .env with Claude configuration

  • Upgrade to multi-backend support version
  • Existing .env has no CONTEXT_WINDOW_SIZE variable

Expected behavior:

  • Auto-detect backend from .env (if BACKEND=claude or not set)
  • Set context window to Claude default (200K)
  • Preserve existing configuration
  • No user prompts needed

Update/Upgrade - Non-Claude Backend

What happens: User switches to GPT-4 backend

  • Sets BACKEND=openai in .env
  • No CONTEXT_WINDOW_SIZE specified

Expected behavior:

  • Auto-detect GPT-4 context limit (128K)
  • Enable custom compaction (CUSTOM_CONTEXT_COMPACTION=true)
  • Create checkpoint before first compaction
  • Use Anchored Iterative Summarization strategy
  • Log warning about reduced context capacity vs Claude

Update/Upgrade - Ollama Self-Hosted

What happens: User runs Ollama locally with Llama-3.1-70B

  • Sets BACKEND=ollama in .env
  • Custom CONTEXT_WINDOW_SIZE=8192 (locally configured limit, well below the model's 128K maximum)

Expected behavior:

  • Respect custom window size
  • Trigger compaction at 85% (configurable COMPACTION_THRESHOLD_PCT)
  • Use summarization strategy appropriate for smaller windows
  • Create checkpoint before compaction
  • Resume seamlessly after compaction

Update/Upgrade - User Customizations

What happens: User has custom .env settings

  • Custom CONTEXT_WINDOW_SIZE=100000
  • Custom COMPACTION_THRESHOLD_PCT=90
  • Custom COMPACTION_STRATEGY=truncate

Expected behavior:

  • Never overwrite user's explicit settings
  • Respect custom context window size
  • Use custom thresholds and strategies
  • Log info about active custom configuration

Implementation Approach

1. Configuration Layer (.env variables)

File: plugins/autonomous-dev/.env.example

Add backend-specific context window configuration:

# Context Window Configuration (Issue #278)
CONTEXT_WINDOW_SIZE=200000          # Max tokens (default: Claude's 200K)
COMPACTION_THRESHOLD_PCT=85         # Trigger compaction at 85% full (170K for 200K window)
BACKEND=claude                      # Backend: claude, openai, gemini, ollama, custom

# Custom Compaction Strategy (for non-Claude backends)
CUSTOM_CONTEXT_COMPACTION=false     # true = implement our own compaction, false = rely on Claude
COMPACTION_STRATEGY=auto            # auto (backend default) | summarize | truncate | clustering
CHECKPOINT_BEFORE_COMPACTION=true   # Create checkpoint before compaction (safety)

2. Context Window Manager Library

File: plugins/autonomous-dev/lib/context_window_manager.py (NEW)

Create new library that:

  • Reads backend configuration from .env
  • Provides context window size for current backend
  • Calculates when to trigger compaction (threshold percentage)
  • Detects whether custom compaction is needed
  • Exposes API for session state manager integration

Key functions:

def get_context_window_size() -> int:
    """Get configured or auto-detected context window size."""
    
def get_compaction_threshold() -> int:
    """Calculate compaction trigger point (window_size * threshold_pct)."""
    
def should_trigger_compaction(current_tokens: int) -> bool:
    """Check if current token count exceeds threshold."""
    
def get_backend_name() -> str:
    """Get configured backend name."""
    
def needs_custom_compaction() -> bool:
    """Check if CUSTOM_CONTEXT_COMPACTION=true."""

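A minimal sketch of how these functions might fit together, assuming the .env values have already been loaded into the process environment. The default-window table and the fallback for unlisted backends are assumptions taken from the limits quoted in this issue, not existing code.

```python
import os

# Assumed defaults, taken from the backend limits listed in this issue.
BACKEND_DEFAULT_WINDOWS = {
    "claude": 200_000,
    "openai": 128_000,
    "gemini": 1_000_000,
}
# Assumption: backends without a known default fall back to the Claude value.
FALLBACK_WINDOW = 200_000


def get_backend_name() -> str:
    """Get configured backend name (claude when BACKEND is unset)."""
    return os.environ.get("BACKEND", "claude").strip().lower()


def get_context_window_size() -> int:
    """Get configured or auto-detected context window size."""
    raw = os.environ.get("CONTEXT_WINDOW_SIZE")
    if raw is not None:
        return int(raw)  # an explicit user setting always wins
    return BACKEND_DEFAULT_WINDOWS.get(get_backend_name(), FALLBACK_WINDOW)


def get_compaction_threshold() -> int:
    """Calculate compaction trigger point (window_size * threshold_pct)."""
    pct = int(os.environ.get("COMPACTION_THRESHOLD_PCT", "85"))
    return get_context_window_size() * pct // 100


def should_trigger_compaction(current_tokens: int) -> bool:
    """Treat reaching the threshold as a trigger (an assumption of this sketch)."""
    return current_tokens >= get_compaction_threshold()


def needs_custom_compaction() -> bool:
    """Check if CUSTOM_CONTEXT_COMPACTION=true."""
    value = os.environ.get("CUSTOM_CONTEXT_COMPACTION", "false")
    return value.strip().lower() == "true"
```

With BACKEND=ollama and CONTEXT_WINDOW_SIZE=8192, the integer arithmetic gives a threshold of 6963 tokens, matching the Ollama test scenario. Input validation is deliberately left out here.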
3. Custom Compaction Strategies

File: plugins/autonomous-dev/lib/compaction_strategies.py (NEW)

Implement pluggable compaction strategies:

Strategy 1: AutoCompactionStrategy (Claude default)

  • Delegates to Claude Code's native compaction
  • No action needed - Claude handles it
  • Just monitor threshold for logging

Strategy 2: SummarizeCompactionStrategy (GPT-4, Ollama)

  • Uses Anchored Iterative Summarization (Factory.ai pattern)
  • Maintains structured summary with sections
  • Only summarizes newly-truncated spans (not entire history)
  • Preserves recent N messages (e.g., last 20)
  • Uses separate model call (Haiku for speed/cost)
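The core of the anchored approach, folding only the newly dropped span into an existing summary, could be sketched as follows. The function name, the message shape, and the `summarize` callable (which would wrap the separate model call) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def anchored_summarize(
    messages: List[Message],
    running_summary: str,
    summarize: Callable[[str, List[Message]], str],
    keep_recent: int = 20,
) -> Tuple[str, List[Message]]:
    """Fold only the newly dropped span into the existing summary.

    `summarize` receives the previous summary plus the dropped messages and
    returns the updated summary; earlier history is never re-summarized.
    """
    split = max(0, len(messages) - keep_recent)
    if split == 0:
        return running_summary, messages  # nothing old enough to drop
    dropped, kept = messages[:split], messages[split:]
    return summarize(running_summary, dropped), kept
```

Passing the summarizer in as a callable keeps the sketch independent of any particular model client and makes it easy to test with a stub.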

Strategy 3: TruncateCompactionStrategy (Fallback)

  • Removes oldest messages (FIFO)
  • Keeps recent N messages
  • Simple but loses historical context
  • Use when summarization not available

Strategy 4: ClusteringCompactionStrategy (Advanced)

  • Groups related memories into clusters
  • Merges similar context
  • 6-8x compression ratio
  • Use for large-scale batch processing

Interface:

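from abc import ABC, abstractmethod
from typing import Any, Dict

class CompactionStrategy(ABC):
    """Base class for pluggable compaction strategies."""

    @abstractmethod
    def compact(self, session_state: Dict[str, Any], keep_recent: int = 20) -> Dict[str, Any]:
        """Compact session state, return new state with reduced tokens."""
        pass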
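As an illustration of the interface, a truncate strategy might look like the sketch below. It assumes the session state stores its history under a `"messages"` key, which this issue does not specify.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class CompactionStrategy(ABC):
    @abstractmethod
    def compact(self, session_state: Dict[str, Any], keep_recent: int = 20) -> Dict[str, Any]:
        """Compact session state, return new state with reduced tokens."""


class TruncateCompactionStrategy(CompactionStrategy):
    """Drop the oldest messages (FIFO) and keep the most recent ones."""

    def compact(self, session_state: Dict[str, Any], keep_recent: int = 20) -> Dict[str, Any]:
        # Assumption: history lives in a list under the "messages" key.
        messages = list(session_state.get("messages", []))
        start = max(0, len(messages) - keep_recent)
        new_state = dict(session_state)  # shallow copy; the input is not mutated
        new_state["messages"] = messages[start:]
        return new_state
```

Returning a new dictionary rather than mutating the input keeps the pre-compaction state available if a rollback is needed.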

4. Checkpoint Integration

File: Extend plugins/autonomous-dev/lib/session_state_manager.py

Add checkpoint/resume around compaction:

  • create_compaction_checkpoint(): Save state before compaction
  • restore_from_checkpoint(): Rollback if compaction fails
  • cleanup_checkpoint(): Remove checkpoint after success

Integration pattern (reuses Issue #276/#277 infrastructure):

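# Before compaction
checkpoint_id = manager.create_compaction_checkpoint()
try:
    # compact() returns the new state, so keep the result
    state = compaction_strategy.compact(state)
    manager.cleanup_checkpoint(checkpoint_id)
except Exception:
    # Roll back to the pre-compaction state, then surface the error
    manager.restore_from_checkpoint(checkpoint_id)
    raise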
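To show the rollback path end to end, here is a self-contained sketch with an in-memory stand-in for the checkpoint API. `InMemoryCheckpoints` and `compact_safely` are hypothetical names, their signatures differ from the real manager, and this variant returns the restored state instead of re-raising so the session can continue.

```python
import copy
from typing import Any, Callable, Dict

State = Dict[str, Any]


class InMemoryCheckpoints:
    """Hypothetical stand-in for the session state manager's checkpoint API."""

    def __init__(self) -> None:
        self._saved: Dict[int, State] = {}
        self._next_id = 0

    def create_compaction_checkpoint(self, state: State) -> int:
        self._next_id += 1
        # Deep copy so later mutation of the live state cannot corrupt the snapshot.
        self._saved[self._next_id] = copy.deepcopy(state)
        return self._next_id

    def restore_from_checkpoint(self, checkpoint_id: int) -> State:
        return self._saved.pop(checkpoint_id)

    def cleanup_checkpoint(self, checkpoint_id: int) -> None:
        self._saved.pop(checkpoint_id, None)


def compact_safely(
    state: State,
    compact: Callable[[State], State],
    manager: InMemoryCheckpoints,
) -> State:
    """Run compaction; on failure, return the pre-compaction snapshot."""
    checkpoint_id = manager.create_compaction_checkpoint(state)
    try:
        new_state = compact(state)
    except Exception:
        return manager.restore_from_checkpoint(checkpoint_id)
    manager.cleanup_checkpoint(checkpoint_id)
    return new_state
```

The snapshot is taken before the strategy runs, so even a strategy that mutates its input and then fails cannot lose data.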

5. Batch State Manager Integration

File: Extend plugins/autonomous-dev/lib/batch_state_manager.py

Un-deprecate and enhance should_auto_clear():

  • Check CUSTOM_CONTEXT_COMPACTION environment variable
  • If false (Claude): Return False (let Claude handle it)
  • If true (custom backend): Use context_window_manager.should_trigger_compaction()
  • Create checkpoint before triggering compaction

Update from Issue #277:

from typing import Callable, Optional

def should_auto_clear(
    state: BatchState,
    checkpoint_callback: Optional[Callable[[], None]] = None
) -> bool:
    """Check if context should be auto-cleared.
    
    Issue #277: DEPRECATED for Claude backends (auto-compact handles it).
    Issue #278: RE-ACTIVATED for custom backends (CUSTOM_CONTEXT_COMPACTION=true).
    """
    # Check if custom compaction is enabled
    if not context_window_manager.needs_custom_compaction():
        return False  # Claude handles it automatically
    
    # Custom backend - check threshold
    needs_clear = context_window_manager.should_trigger_compaction(
        state.context_token_estimate
    )
    
    # Invoke checkpoint callback before compaction
    if needs_clear and checkpoint_callback:
        checkpoint_callback()
    
    return needs_clear

6. SessionStart Hook Enhancement

File: Extend plugins/autonomous-dev/hooks/SessionStart-batch-recovery.sh

Update hook to handle BOTH Claude auto-compact AND custom compaction:

  • Detect source="compact" (Claude auto-compact) OR
  • Detect compaction checkpoint file (custom compaction)
  • Load checkpoint and resume batch workflow
  • Display appropriate message based on compaction type

Test Scenarios

Fresh Install (No Existing Data)

Test: Install with defaults (Claude)

  • Creates .env.example with Claude defaults
  • Context window = 200K tokens
  • CUSTOM_CONTEXT_COMPACTION=false
  • No prompts needed
  • Auto-compaction works as before (backward compatible)

Test: Install with GPT-4 backend

  • Set BACKEND=openai before first use
  • Context window auto-set to 128K
  • CUSTOM_CONTEXT_COMPACTION=true (auto-enabled)
  • Compaction strategy = summarize
  • Checkpoint created on first compaction

Test: Install with Ollama backend

  • Set BACKEND=ollama and CONTEXT_WINDOW_SIZE=8192
  • Custom window respected
  • CUSTOM_CONTEXT_COMPACTION=true
  • Compaction triggers at 85% (6963 tokens)
  • Summarization uses Haiku model (cost-effective)

Update with Valid Existing Data

Test: Upgrade from v3.45.0 (Claude only)

  • Preserves existing .env settings
  • Adds new variables with defaults
  • No behavior change for Claude users
  • Backward compatible with existing workflows

Test: Switch from Claude to GPT-4

  • Update BACKEND=openai in .env
  • Context window reduces to 128K
  • Warning logged about reduced capacity
  • CUSTOM_CONTEXT_COMPACTION auto-enabled
  • Session state preserved during switch

Update with Invalid/Broken Data

Test: Invalid CONTEXT_WINDOW_SIZE in .env

  • Log error message with valid range (1K-2M tokens)
  • Fall back to backend default (128K for GPT-4, 200K for Claude)
  • Create backup of broken .env as .env.bak
  • Provide fix command in error message
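The validation and fallback part of this scenario could be sketched as follows. The exact bounds (1,000 and 2,000,000) are one reading of "1K-2M tokens", and `parse_window_size` is a hypothetical helper; the .env backup and fix-command steps are not covered.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Assumption: "1K-2M tokens" is read as 1,000 to 2,000,000 inclusive.
MIN_WINDOW = 1_000
MAX_WINDOW = 2_000_000


def parse_window_size(raw: Optional[str], backend_default: int) -> int:
    """Return a validated window size, falling back to the backend default."""
    if raw is None:
        return backend_default  # variable not set: nothing to validate
    try:
        value = int(raw)
    except ValueError:
        value = None
    if value is None or not (MIN_WINDOW <= value <= MAX_WINDOW):
        # %r escapes control characters, which limits log injection (CWE-117).
        logger.error(
            "Invalid CONTEXT_WINDOW_SIZE=%r; expected %d-%d tokens. "
            "Falling back to backend default %d.",
            raw, MIN_WINDOW, MAX_WINDOW, backend_default,
        )
        return backend_default
    return value
```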

Test: Missing .env file

  • Create new .env from .env.example
  • Use Claude defaults (backward compatible)
  • Log info message about file creation
  • No user prompts needed

Update with User Customizations

Test: Custom CONTEXT_WINDOW_SIZE=100000

  • Never overwrite user setting
  • Use custom size for compaction decisions
  • Log info about custom configuration
  • Respect user's explicit choice

Test: Custom compaction strategy

  • Honor COMPACTION_STRATEGY=truncate
  • Use specified strategy over backend default
  • Log which strategy is active
  • Allow runtime override via API

Rollback After Failure

Test: Compaction fails mid-operation

  • Checkpoint created before compaction
  • Restore state from checkpoint
  • Log rollback event to audit log
  • Session continues with pre-compaction state
  • No data loss

Test: Backend switch during batch processing

  • Detect backend change in .env
  • Complete current feature with old settings
  • Apply new settings for next feature
  • Create checkpoint before switch

Compaction Strategies

Test: Summarize strategy with GPT-4

  • Preserves recent 20 messages
  • Summarizes older messages
  • Uses Haiku for summarization (cost-effective)
  • Achieves 3-5x compression
  • Maintains context coherence

Test: Truncate strategy fallback

  • Removes oldest messages
  • Keeps recent 20 messages
  • Works when summarization unavailable
  • Logs truncation event

Test: Clustering strategy (advanced)

  • Groups related memories
  • Achieves 6-8x compression
  • Preserves semantic relationships
  • Works for large batch processing

Acceptance Criteria

Fresh Install

  • Creates .env.example with all backend configurations documented
  • Claude backend works with zero configuration (200K context window)
  • GPT-4 backend auto-configures to 128K when BACKEND=openai set
  • Ollama backend respects custom CONTEXT_WINDOW_SIZE setting
  • No user prompts needed for default backends
  • Backward compatible with existing Claude-only installations

Updates

  • Preserves valid existing .env configurations
  • Adds missing variables with backend-appropriate defaults
  • Never overwrites user's explicit custom settings
  • Logs info/warning messages about configuration changes
  • Provides fix commands for invalid configurations

Validation

  • Reports configuration errors clearly (invalid window size, unknown backend)
  • Provides specific fix commands in error messages
  • Validates CONTEXT_WINDOW_SIZE is within backend limits (1K-2M tokens)
  • Warns when custom size conflicts with backend capacity
  • Audit logs all configuration changes

Security

  • Validates .env file paths (no path traversal - CWE-22)
  • Prevents symlink attacks on checkpoint files (CWE-59)
  • Sets checkpoint file permissions to 0o600 (owner only)
  • Sanitizes backend names to prevent injection (CWE-117)
  • Audit logs checkpoint create/restore/cleanup events

Compaction

  • Creates checkpoint before every compaction operation
  • Restores from checkpoint if compaction fails
  • Cleans up checkpoint after successful compaction
  • Supports pluggable compaction strategies (auto, summarize, truncate, clustering)
  • Logs compaction events with before/after token counts
  • Preserves workflow methodology across compactions

Backend Integration

  • Auto-detects backend from .env BACKEND variable
  • Maps backend to correct context window size:
    • Claude: 200K tokens (default)
    • OpenAI GPT-4: 128K tokens
    • Google Gemini 1.5: 1M tokens
    • Ollama: User-configurable (model-dependent)
    • Custom: User-configurable
  • Selects appropriate default compaction strategy per backend
  • Allows user override of all auto-detected settings
  • Supports adding new backends without code changes (config-driven)

Backward Compatibility

Security Considerations

  • CWE-22 (Path Traversal): Validate batch_id and checkpoint paths (reuse Issue #276 validation)
  • CWE-59 (Symlink Following): File permissions validation (0o600 only)
  • CWE-117 (Log Injection): Sanitize backend names and configuration values before logging
  • CWE-502 (Insecure Deserialization): JSON-only serialization for checkpoints (no pickle)
  • Atomic writes: tempfile.mkstemp() → write → chmod(0o600) → replace() pattern
  • Locking: threading.RLock serializes concurrent access from threads within one process
  • Backup recovery: Automatic fallback to .bak files on corruption
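The atomic-write pattern named above can be sketched with the standard library alone. The function names and the checkpoint payload are illustrative; the project's existing helpers may differ.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_checkpoint(path: Path, state: Dict[str, Any]) -> None:
    """Write JSON atomically: mkstemp -> write -> chmod(0o600) -> replace."""
    path = Path(path)
    # Create the temp file in the target directory so os.replace() stays
    # on one filesystem and is therefore atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)  # JSON only, never pickle (CWE-502)
            handle.flush()
            os.fsync(handle.fileno())
        os.chmod(tmp_name, 0o600)  # owner read/write only
        os.replace(tmp_name, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def read_checkpoint(path: Path) -> Dict[str, Any]:
    """Load a checkpoint previously written by write_checkpoint()."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Readers see either the old checkpoint or the complete new one, never a partially written file.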

Environment Requirements

  • Python: 3.11+ (existing requirement)
  • Claude Code: 2.0+ with plugins (existing requirement)
  • Operating System: macOS, Linux (existing requirement)
  • Dependencies: No new dependencies (reuses existing libraries)
  • Optional: Haiku model access for summarization (cost-effective compaction)

Breaking Changes

should_auto_clear() Un-Deprecation

Issue #277: Deprecated should_auto_clear() assuming Claude handles compaction
Issue #278: Re-activates should_auto_clear() for custom backends

Migration:

Configuration Priority

New three-tier priority system:

  1. Environment variables (.env)
  2. PROJECT.md configuration
  3. Backend-specific defaults

Example:

# .env takes precedence
CONTEXT_WINDOW_SIZE=150000  # Overrides backend default

# If not in .env, check PROJECT.md
# If not in PROJECT.md, use backend default (200K for Claude, 128K for GPT-4)
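A sketch of the three-tier lookup, assuming both the .env values and the PROJECT.md settings have already been parsed into plain mappings. How PROJECT.md is parsed is not specified in this issue, and `resolve_window_size` is a hypothetical name.

```python
from typing import Mapping

# Backend defaults quoted in this issue; other backends need an explicit size.
BACKEND_DEFAULTS = {"claude": 200_000, "openai": 128_000}


def resolve_window_size(
    env: Mapping[str, str],
    project_config: Mapping[str, str],
    backend: str,
) -> int:
    """Resolve CONTEXT_WINDOW_SIZE: .env first, then PROJECT.md, then default."""
    for source in (env, project_config):
        raw = source.get("CONTEXT_WINDOW_SIZE")
        if raw:
            return int(raw)
    if backend not in BACKEND_DEFAULTS:
        raise ValueError(
            f"No default context window for backend {backend!r}; "
            "set CONTEXT_WINDOW_SIZE explicitly."
        )
    return BACKEND_DEFAULTS[backend]
```

Raising for backends without a default (for example Ollama) is a design choice of this sketch; falling back to a conservative size would be an alternative.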

Source of Truth

Verified: Existing codebase patterns

Web Research Verified:

  • Anchored Iterative Summarization (Factory.ai, 2024)
  • Semantic Compression research (6-8x compression, arxiv 2023)
  • Claude API context window specs (200K standard, 1M beta)
  • GPT-4 specs (128K context window)

Date Verified: 2026-01-28

Research Cached: .claude/cache/research_278.json (for /implement reuse)


Related Issues:

Estimated Implementation Time: 4-6 hours (one focused development session)

Priority: HIGH - Enables Claude Code usage with custom backends (Ollama, vLLM, self-hosted models)
