feat(context): Add configurable context window management for multi-backend support

## Summary

Extend autonomous-dev's context management to support non-Claude backends (GPT-4, Gemini, Ollama, vLLM, etc.) by adding configurable context window sizes, custom compaction strategies, and checkpoint/resume integration around context operations.

## What Does NOT Work

**Pattern: Hardcoded Claude-specific limits fail for other backends**
- Current system assumes Claude's 200K token context window
- GPT-4 Turbo has a 128K limit, Gemini 1.5 has a 1M limit, Llama-3.1 has 128K, and Mixtral 8x7B has 32K
- No mechanism to configure context window size per backend
- Session state manager has no awareness of backend-specific constraints

**Pattern: Auto-compaction assumption fails for non-Claude backends**
- Issue #277 deprecated should_auto_clear() assuming Claude handles compaction automatically
- Other backends (e.g., self-hosted Ollama or vLLM) do not auto-compact the way Claude does
- No mechanism to trigger custom compaction when the threshold is reached
- Users running custom backends will hit the hard limit and lose work

**Pattern: One-size-fits-all compaction fails for different backends**
- Claude handles compaction automatically and effectively
- Other backends may need explicit summarization logic
- No pluggable compaction strategy mechanism
- Cannot customize compaction per backend requirements

## Scenarios

### Fresh Install

**What happens**: User installs autonomous-dev with default configuration
- Uses Claude backend (default)
- Context window = 200K tokens (Claude default)
- Compaction is handled automatically by Claude Code
- No custom compaction logic needed

**Expected behavior**:
- No configuration required for Claude users
- Default `.env` values work out of the box
- Backward compatible with existing workflows

### Update/Upgrade - Valid Existing Data

**What happens**: User has existing `.env` with Claude configuration
- Upgrade to multi-backend support version
- Existing `.env` has no `CONTEXT_WINDOW_SIZE` variable

**Expected behavior**:
- Auto-detect backend from `.env` (treat it as Claude when `BACKEND=claude` or `BACKEND` is not set)
- Set context window to Claude default (200K)
- Preserve existing configuration
- No user prompts needed

### Update/Upgrade - Non-Claude Backend

**What happens**: User switches to GPT-4 backend
- Sets `BACKEND=openai` in `.env`
- No `CONTEXT_WINDOW_SIZE` specified

**Expected behavior**:
- Auto-detect GPT-4 context limit (128K)
- Enable custom compaction (CUSTOM_CONTEXT_COMPACTION=true)
- Create checkpoint before first compaction
- Use Anchored Iterative Summarization strategy
- Log warning about reduced context capacity vs Claude

### Update/Upgrade - Ollama Self-Hosted

**What happens**: User runs Ollama locally with Llama-3.1-70B
- Sets `BACKEND=ollama` in `.env`
- Custom `CONTEXT_WINDOW_SIZE=8192` (the context length configured for the local deployment, which may be well below the model's 128K maximum)

**Expected behavior**:
- Respect custom window size
- Trigger compaction at 85% (configurable COMPACTION_THRESHOLD_PCT)
- Use summarization strategy appropriate for smaller windows
- Create checkpoint before compaction
- Resume seamlessly after compaction

### Update/Upgrade - User Customizations

**What happens**: User has custom `.env` settings
- Custom `CONTEXT_WINDOW_SIZE=100000`
- Custom `COMPACTION_THRESHOLD_PCT=90`
- Custom `COMPACTION_STRATEGY=truncate`

**Expected behavior**:
- Never overwrite user's explicit settings
- Respect custom context window size
- Use custom thresholds and strategies
- Log info about active custom configuration

## Implementation Approach

### 1. Configuration Layer (.env variables)

**File**: `plugins/autonomous-dev/.env.example`

Add backend-specific context window configuration:

```bash
# Context Window Configuration (Issue #278)
CONTEXT_WINDOW_SIZE=200000          # Max tokens (default: Claude's 200K)
COMPACTION_THRESHOLD_PCT=85         # Trigger compaction at 85% full (170K for 200K window)
BACKEND=claude                      # Backend: claude, openai, gemini, ollama, custom

# Custom Compaction Strategy (for non-Claude backends)
CUSTOM_CONTEXT_COMPACTION=false     # true = implement our own compaction, false = rely on Claude
COMPACTION_STRATEGY=auto            # auto (backend default) | summarize | truncate | clustering
CHECKPOINT_BEFORE_COMPACTION=true   # Create checkpoint before compaction (safety)
```
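A minimal sketch of how these variables might be loaded, assuming the `.env` file has already been parsed into a string mapping. `ContextConfig`, `load_context_config`, and `BACKEND_DEFAULT_WINDOWS` are hypothetical names, and the rule that custom compaction defaults to on for non-Claude backends is inferred from the scenarios above rather than taken from existing code. Range validation is left out here.

```python
from dataclasses import dataclass
from typing import Mapping

# Defaults quoted in this issue; "ollama" and "custom" deliberately have none.
BACKEND_DEFAULT_WINDOWS = {"claude": 200_000, "openai": 128_000, "gemini": 1_000_000}


@dataclass(frozen=True)
class ContextConfig:
    backend: str
    window_size: int
    threshold_pct: int
    custom_compaction: bool
    strategy: str
    checkpoint_before_compaction: bool


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_context_config(env: Mapping[str, str]) -> ContextConfig:
    """Build a config from already-parsed .env values (hypothetical helper)."""
    backend = env.get("BACKEND", "claude").strip().lower()

    raw_window = env.get("CONTEXT_WINDOW_SIZE")
    if raw_window is not None:
        window_size = int(raw_window)  # an explicit user setting always wins
    elif backend in BACKEND_DEFAULT_WINDOWS:
        window_size = BACKEND_DEFAULT_WINDOWS[backend]
    else:
        raise ValueError(f"CONTEXT_WINDOW_SIZE is required for backend {backend!r}")

    # Assumption: when unset, custom compaction is enabled for non-Claude backends.
    raw_custom = env.get("CUSTOM_CONTEXT_COMPACTION")
    custom = _as_bool(raw_custom) if raw_custom is not None else backend != "claude"

    return ContextConfig(
        backend=backend,
        window_size=window_size,
        threshold_pct=int(env.get("COMPACTION_THRESHOLD_PCT", "85")),
        custom_compaction=custom,
        strategy=env.get("COMPACTION_STRATEGY", "auto"),
        checkpoint_before_compaction=_as_bool(
            env.get("CHECKPOINT_BEFORE_COMPACTION", "true")
        ),
    )
```

Calling `load_context_config(os.environ)` after the `.env` file has been loaded would be one way to wire this in.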

### 2. Context Window Manager Library

**File**: `plugins/autonomous-dev/lib/context_window_manager.py` (NEW)

Create new library that:
- Reads backend configuration from `.env`
- Provides context window size for current backend
- Calculates when to trigger compaction (threshold percentage)
- Detects whether custom compaction is needed
- Exposes API for session state manager integration

**Key functions**:
```python
def get_context_window_size() -> int:
    """Get configured or auto-detected context window size."""
    
def get_compaction_threshold() -> int:
    """Calculate compaction trigger point (window_size * threshold_pct)."""
    
def should_trigger_compaction(current_tokens: int) -> bool:
    """Check if current token count exceeds threshold."""
    
def get_backend_name() -> str:
    """Get configured backend name."""
    
def needs_custom_compaction() -> bool:
    """Check if CUSTOM_CONTEXT_COMPACTION=true."""
```
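The threshold arithmetic can be sketched as pure functions that take the window size and percentage explicitly, which keeps them easy to test. The names `threshold_tokens` and `is_over_threshold` are illustrative, not part of the proposed API, and treating "reached" as `>=` and rounding down are assumptions.

```python
def threshold_tokens(window_size: int, threshold_pct: int) -> int:
    """Token count at which compaction should start, rounded down."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold_pct must be in the range 1-100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return window_size * threshold_pct // 100


def is_over_threshold(current_tokens: int, window_size: int, threshold_pct: int) -> bool:
    """True once the current token estimate reaches the trigger point."""
    return current_tokens >= threshold_tokens(window_size, threshold_pct)
```

With the defaults above, `threshold_tokens(200_000, 85)` gives 170000, and an 8192-token window at 85% gives 6963.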

### 3. Custom Compaction Strategies

**File**: `plugins/autonomous-dev/lib/compaction_strategies.py` (NEW)

Implement pluggable compaction strategies:

**Strategy 1: AutoCompactionStrategy** (Claude default)
- Delegates to Claude Code's native compaction
- No action needed - Claude handles it
- Just monitor threshold for logging

**Strategy 2: SummarizeCompactionStrategy** (GPT-4, Ollama)
- Uses Anchored Iterative Summarization (Factory.ai pattern)
- Maintains structured summary with sections
- Only summarizes newly-truncated spans (not entire history)
- Preserves recent N messages (e.g., last 20)
- Uses separate model call (Haiku for speed/cost)

**Strategy 3: TruncateCompactionStrategy** (Fallback)
- Removes oldest messages (FIFO)
- Keeps recent N messages
- Simple but loses historical context
- Use when summarization not available

**Strategy 4: ClusteringCompactionStrategy** (Advanced)
- Groups related memories into clusters
- Merges similar context
- Targets a 6-8x compression ratio (figure from the semantic compression research listed under Source of Truth)
- Use for large-scale batch processing
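One way to read Strategy 2 is: keep a running summary, and on each compaction fold only the newly dropped span into it while leaving the most recent messages untouched. The sketch below follows that reading. The `messages` and `summary` state keys, the `Summarizer` signature, and the function name are all assumptions; the summarizer is injected so that any model call can sit behind it.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
# Hypothetical signature: (previous_summary, newly_dropped_messages) -> new_summary
Summarizer = Callable[[str, List[Message]], str]


def compact_with_anchored_summary(
    state: Dict[str, Any],
    summarize: Summarizer,
    keep_recent: int = 20,
) -> Dict[str, Any]:
    """Return a new state whose older messages are folded into the summary."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")

    messages: List[Message] = state.get("messages", [])
    if len(messages) <= keep_recent:
        return dict(state)  # nothing old enough to summarize

    split = len(messages) - keep_recent
    dropped, recent = messages[:split], messages[split:]

    # Only the newly dropped span is summarized; the earlier summary is
    # passed in as the anchor rather than being regenerated from scratch.
    new_summary = summarize(state.get("summary", ""), dropped)
    return {**state, "summary": new_summary, "messages": list(recent)}
```

Because the input state is never mutated, a failed summarizer call leaves the caller's state as it was.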

**Interface**:
```python
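from abc import ABC, abstractmethod
from typing import Any, Dict


class CompactionStrategy(ABC):
    """Base class for pluggable compaction strategies."""

    @abstractmethod
    def compact(self, session_state: Dict[str, Any], keep_recent: int = 20) -> Dict[str, Any]:
        """Compact session state, return new state with reduced tokens."""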
```
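As an illustration of a concrete strategy, here is a sketch of the truncate fallback written against that interface. The base class is repeated so the block runs on its own, and the assumption that messages live under a `messages` key in the session state is mine.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class CompactionStrategy(ABC):
    @abstractmethod
    def compact(self, session_state: Dict[str, Any], keep_recent: int = 20) -> Dict[str, Any]:
        """Compact session state, return new state with reduced tokens."""


class TruncateCompactionStrategy(CompactionStrategy):
    """Drop the oldest messages (FIFO) and keep the most recent ones."""

    def compact(self, session_state: Dict[str, Any], keep_recent: int = 20) -> Dict[str, Any]:
        if keep_recent < 0:
            raise ValueError("keep_recent must be non-negative")
        messages = session_state.get("messages", [])
        # messages[-0:] would return the whole list, so handle zero explicitly.
        kept = messages[-keep_recent:] if keep_recent else []
        # Return a new dict so the caller's state is left untouched.
        return {**session_state, "messages": list(kept)}
```

Returning a new state rather than mutating in place keeps rollback simple, since the pre-compaction state is still intact if anything fails afterwards.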

### 4. Checkpoint Integration

**File**: Extend `plugins/autonomous-dev/lib/session_state_manager.py`

Add checkpoint/resume around compaction:
- `create_compaction_checkpoint()`: Save state before compaction
- `restore_from_checkpoint()`: Rollback if compaction fails
- `cleanup_checkpoint()`: Remove checkpoint after success

**Integration pattern** (reuses Issue #276/#277 infrastructure):
```python
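# Checkpoint first so a failed compaction can be rolled back
checkpoint_id = manager.create_compaction_checkpoint()
try:
    # compact() returns the reduced state, so keep the result
    state = compaction_strategy.compact(state)
    manager.cleanup_checkpoint(checkpoint_id)
except Exception:
    # Roll back to the pre-compaction state, then surface the error
    manager.restore_from_checkpoint(checkpoint_id)
    raise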
```
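To make the rollback path concrete, the following sketch runs the same checkpoint, compact, cleanup sequence against an in-memory stand-in. `InMemoryCheckpointManager` and `compact_with_rollback` are hypothetical; the real manager would persist checkpoints to disk. Unlike the pattern above, this variant returns the restored state with a success flag instead of re-raising, which is one possible way to let the session continue.

```python
import copy
import uuid
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]


class InMemoryCheckpointManager:
    """Hypothetical stand-in for the session state manager's checkpoint API."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, State] = {}

    def create_compaction_checkpoint(self, state: State) -> str:
        checkpoint_id = uuid.uuid4().hex
        # Deep copy so later in-place mutation cannot corrupt the checkpoint.
        self._checkpoints[checkpoint_id] = copy.deepcopy(state)
        return checkpoint_id

    def restore_from_checkpoint(self, checkpoint_id: str) -> State:
        return copy.deepcopy(self._checkpoints[checkpoint_id])

    def cleanup_checkpoint(self, checkpoint_id: str) -> None:
        self._checkpoints.pop(checkpoint_id, None)


def compact_with_rollback(
    manager: InMemoryCheckpointManager,
    state: State,
    compact: Callable[[State], State],
) -> Tuple[State, bool]:
    """Return (state, compacted); on failure the pre-compaction state is restored."""
    checkpoint_id = manager.create_compaction_checkpoint(state)
    try:
        result, compacted = compact(state), True
    except Exception:
        result, compacted = manager.restore_from_checkpoint(checkpoint_id), False
    manager.cleanup_checkpoint(checkpoint_id)
    return result, compacted
```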

### 5. Batch State Manager Integration

**File**: Extend `plugins/autonomous-dev/lib/batch_state_manager.py`

**Un-deprecate and enhance should_auto_clear():**
- Check `CUSTOM_CONTEXT_COMPACTION` environment variable
- If false (Claude): Return False (let Claude handle it)
- If true (custom backend): Use `context_window_manager.should_trigger_compaction()`
- Create checkpoint before triggering compaction

**Update from Issue #277:**
```python
from typing import Callable, Optional


def should_auto_clear(
    state: BatchState,
    checkpoint_callback: Optional[Callable[[], None]] = None,
) -> bool:
    """Check if context should be auto-cleared.
    
    Issue #277: DEPRECATED for Claude backends (auto-compact handles it).
    Issue #278: RE-ACTIVATED for custom backends (CUSTOM_CONTEXT_COMPACTION=true).
    """
    # Check if custom compaction is enabled
    if not context_window_manager.needs_custom_compaction():
        return False  # Claude handles it automatically
    
    # Custom backend - check threshold
    needs_clear = context_window_manager.should_trigger_compaction(
        state.context_token_estimate
    )
    
    # Invoke checkpoint callback before compaction
    if needs_clear and checkpoint_callback:
        checkpoint_callback()
    
    return needs_clear
```

### 6. SessionStart Hook Enhancement

**File**: Extend `plugins/autonomous-dev/hooks/SessionStart-batch-recovery.sh`

Update hook to handle BOTH Claude auto-compact AND custom compaction:
- Detect `source="compact"` (Claude auto-compact) OR
- Detect compaction checkpoint file (custom compaction)
- Load checkpoint and resume batch workflow
- Display appropriate message based on compaction type

## Test Scenarios

### Fresh Install (No Existing Data)

**Test**: Install with defaults (Claude)
- [ ] Creates `.env.example` with Claude defaults
- [ ] Context window = 200K tokens
- [ ] CUSTOM_CONTEXT_COMPACTION=false
- [ ] No prompts needed
- [ ] Auto-compaction works as before (backward compatible)

**Test**: Install with GPT-4 backend
- [ ] Set `BACKEND=openai` before first use
- [ ] Context window auto-set to 128K
- [ ] CUSTOM_CONTEXT_COMPACTION=true (auto-enabled)
- [ ] Compaction strategy = summarize
- [ ] Checkpoint created on first compaction

**Test**: Install with Ollama backend
- [ ] Set `BACKEND=ollama` and `CONTEXT_WINDOW_SIZE=8192`
- [ ] Custom window respected
- [ ] CUSTOM_CONTEXT_COMPACTION=true
- [ ] Compaction triggers at 85% (6963 tokens)
- [ ] Summarization uses Haiku model (cost-effective)

### Update with Valid Existing Data

**Test**: Upgrade from v3.45.0 (Claude only)
- [ ] Preserves existing `.env` settings
- [ ] Adds new variables with defaults
- [ ] No behavior change for Claude users
- [ ] Backward compatible with existing workflows

**Test**: Switch from Claude to GPT-4
- [ ] Update `BACKEND=openai` in `.env`
- [ ] Context window reduces to 128K
- [ ] Warning logged about reduced capacity
- [ ] CUSTOM_CONTEXT_COMPACTION auto-enabled
- [ ] Session state preserved during switch

### Update with Invalid/Broken Data

**Test**: Invalid `CONTEXT_WINDOW_SIZE` in `.env`
- [ ] Log error message with valid range (1K-2M tokens)
- [ ] Fall back to backend default (128K for GPT-4, 200K for Claude)
- [ ] Create backup of broken `.env` as `.env.bak`
- [ ] Provide fix command in error message
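
The validation and fallback behavior in this test could look roughly like the sketch below. `resolve_window_size` is a hypothetical helper, and reading "1K-2M" as 1,000 to 2,000,000 tokens is an assumption. The `.env.bak` backup and the fix command are not shown.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Assumed reading of the "1K-2M tokens" range in this issue.
MIN_WINDOW_TOKENS = 1_000
MAX_WINDOW_TOKENS = 2_000_000


def resolve_window_size(raw: Optional[str], backend_default: int) -> int:
    """Return a validated window size, falling back to the backend default."""
    if raw is None or not raw.strip():
        return backend_default
    try:
        value: Optional[int] = int(raw.strip())
    except ValueError:
        value = None
    if value is None or not MIN_WINDOW_TOKENS <= value <= MAX_WINDOW_TOKENS:
        # %r escapes control characters, which also keeps the raw value
        # from injecting extra lines into the log.
        logger.error(
            "Invalid CONTEXT_WINDOW_SIZE=%r; expected an integer between %d and %d. "
            "Falling back to %d.",
            raw, MIN_WINDOW_TOKENS, MAX_WINDOW_TOKENS, backend_default,
        )
        return backend_default
    return value
```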

**Test**: Missing `.env` file
- [ ] Create new `.env` from `.env.example`
- [ ] Use Claude defaults (backward compatible)
- [ ] Log info message about file creation
- [ ] No user prompts needed

### Update with User Customizations

**Test**: Custom `CONTEXT_WINDOW_SIZE=100000`
- [ ] Never overwrite user setting
- [ ] Use custom size for compaction decisions
- [ ] Log info about custom configuration
- [ ] Respect user's explicit choice

**Test**: Custom compaction strategy
- [ ] Honor `COMPACTION_STRATEGY=truncate`
- [ ] Use specified strategy over backend default
- [ ] Log which strategy is active
- [ ] Allow runtime override via API

### Rollback After Failure

**Test**: Compaction fails mid-operation
- [ ] Checkpoint created before compaction
- [ ] Restore state from checkpoint
- [ ] Log rollback event to audit log
- [ ] Session continues with pre-compaction state
- [ ] No data loss

**Test**: Backend switch during batch processing
- [ ] Detect backend change in `.env`
- [ ] Complete current feature with old settings
- [ ] Apply new settings for next feature
- [ ] Create checkpoint before switch

### Compaction Strategies

**Test**: Summarize strategy with GPT-4
- [ ] Preserves recent 20 messages
- [ ] Summarizes older messages
- [ ] Uses Haiku for summarization (cost-effective)
- [ ] Achieves 3-5x compression
- [ ] Maintains context coherence

**Test**: Truncate strategy fallback
- [ ] Removes oldest messages
- [ ] Keeps recent 20 messages
- [ ] Works when summarization unavailable
- [ ] Logs truncation event

**Test**: Clustering strategy (advanced)
- [ ] Groups related memories
- [ ] Achieves 6-8x compression
- [ ] Preserves semantic relationships
- [ ] Works for large batch processing

## Acceptance Criteria

### Fresh Install

- [ ] Creates `.env.example` with all backend configurations documented
- [ ] Claude backend works with zero configuration (200K context window)
- [ ] GPT-4 backend auto-configures to 128K when `BACKEND=openai` set
- [ ] Ollama backend respects custom `CONTEXT_WINDOW_SIZE` setting
- [ ] No user prompts needed for default backends
- [ ] Backward compatible with existing Claude-only installations

### Updates

- [ ] Preserves valid existing `.env` configurations
- [ ] Adds missing variables with backend-appropriate defaults
- [ ] Never overwrites user's explicit custom settings
- [ ] Logs info/warning messages about configuration changes
- [ ] Provides fix commands for invalid configurations

### Validation

- [ ] Reports configuration errors clearly (invalid window size, unknown backend)
- [ ] Provides specific fix commands in error messages
- [ ] Validates `CONTEXT_WINDOW_SIZE` is within backend limits (1K-2M tokens)
- [ ] Warns when custom size conflicts with backend capacity
- [ ] Audit logs all configuration changes

### Security

- [ ] Validates `.env` file paths (no path traversal - CWE-22)
- [ ] Prevents symlink attacks on checkpoint files (CWE-59)
- [ ] Sets checkpoint file permissions to 0o600 (owner only)
- [ ] Sanitizes backend names before logging to prevent log injection (CWE-117)
- [ ] Audit logs checkpoint create/restore/cleanup events

### Compaction

- [ ] Creates checkpoint before every compaction operation
- [ ] Restores from checkpoint if compaction fails
- [ ] Cleans up checkpoint after successful compaction
- [ ] Supports pluggable compaction strategies (auto, summarize, truncate, clustering)
- [ ] Logs compaction events with before/after token counts
- [ ] Preserves workflow methodology across compactions

### Backend Integration

- [ ] Auto-detects backend from `.env` `BACKEND` variable
- [ ] Maps backend to correct context window size:
  - Claude: 200K tokens (default)
  - OpenAI GPT-4: 128K tokens
  - Google Gemini 1.5: 1M tokens
  - Ollama: User-configurable (model-dependent)
  - Custom: User-configurable
- [ ] Selects appropriate default compaction strategy per backend
- [ ] Allows user override of all auto-detected settings
- [ ] Supports adding new backends without code changes (config-driven)

### Backward Compatibility

- [ ] Claude users see no behavior change (CUSTOM_CONTEXT_COMPACTION=false by default)
- [ ] Existing `.env` files work without modification
- [ ] should_auto_clear() remains available for testing (not removed)
- [ ] Issue #277 deprecation warning updated to mention Issue #278 re-activation
- [ ] All existing tests pass

## Security Considerations

- **CWE-22 (Path Traversal)**: Validate batch_id and checkpoint paths (reuse Issue #276 validation)
- **CWE-59 (Symlink Following)**: Reject symlinked checkpoint paths before reading or writing, and restrict checkpoint file permissions to 0o600
- **CWE-117 (Log Injection)**: Sanitize backend names and configuration values before logging
- **CWE-502 (Insecure Deserialization)**: JSON-only serialization for checkpoints (no pickle)
- **Atomic writes**: tempfile.mkstemp() → write → chmod(0o600) → replace() pattern
- **Locking**: threading.RLock serializes concurrent access from threads within one process
- **Backup recovery**: Automatic fallback to .bak files on corruption
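
The atomic-write and symlink points above can be sketched as follows, assuming checkpoints are JSON files on a POSIX filesystem. The function names are hypothetical. The symlink check is a simple pre-check and still leaves a small time-of-check race, so it should be treated as illustrative rather than a complete defense.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_checkpoint_atomically(path: Path, payload: Dict[str, Any]) -> None:
    """mkstemp -> write -> chmod(0o600) -> replace, as listed above."""
    if path.is_symlink():
        raise ValueError(f"Refusing to write through symlink: {path}")
    # Create the temp file in the target directory so os.replace stays
    # on one filesystem and is therefore atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)  # JSON only, never pickle
            handle.flush()
            os.fsync(handle.fileno())
        os.chmod(tmp_name, 0o600)
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file; the previous checkpoint is untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def read_checkpoint(path: Path) -> Dict[str, Any]:
    """Load a checkpoint, refusing to follow a symlinked path."""
    if path.is_symlink():
        raise ValueError(f"Refusing to read through symlink: {path}")
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)
```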

## Environment Requirements

- **Python**: 3.11+ (existing requirement)
- **Claude Code**: 2.0+ with plugins (existing requirement)
- **Operating System**: macOS, Linux (existing requirement)
- **Dependencies**: No new dependencies (reuses existing libraries)
- **Optional**: Haiku model access for summarization (cost-effective compaction)

## Breaking Changes

### should_auto_clear() Un-Deprecation

**Issue #277**: Deprecated should_auto_clear() assuming Claude handles compaction
**Issue #278**: Re-activates should_auto_clear() for custom backends

**Migration**:
- Claude users: No change (CUSTOM_CONTEXT_COMPACTION=false by default)
- Custom backend users: CUSTOM_CONTEXT_COMPACTION is auto-enabled when `BACKEND` is non-Claude; set it explicitly in `.env` to override
- Tests: Update deprecation notices to reference both #277 (deprecated for Claude) and #278 (re-activated for custom)

### Configuration Priority

New three-tier priority system:
1. Environment variables (`.env`)
2. PROJECT.md configuration
3. Backend-specific defaults

Example:
```bash
# .env takes precedence
CONTEXT_WINDOW_SIZE=150000  # Overrides backend default

# If not in .env, check PROJECT.md
# If not in PROJECT.md, use backend default (200K for Claude, 128K for GPT-4)
```
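
A lookup that follows this priority order might be sketched as below. How PROJECT.md settings are parsed is not specified in this issue, so the sketch takes them as an already-parsed mapping; `resolve_setting` and its parameter names are hypothetical.

```python
from typing import Mapping, Tuple


def resolve_setting(
    name: str,
    env: Mapping[str, str],
    project_config: Mapping[str, str],
    backend_defaults: Mapping[str, str],
) -> Tuple[str, str]:
    """Return (value, source) from the first tier that defines the setting."""
    tiers = (
        (".env", env),
        ("PROJECT.md", project_config),
        ("backend default", backend_defaults),
    )
    for source, values in tiers:
        value = values.get(name)
        if value is not None and value != "":
            return value, source
    raise KeyError(f"No value configured for {name}")
```

Returning the source alongside the value makes it straightforward to log which tier supplied the active configuration.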

## Source of Truth

**Verified**: Existing codebase patterns
- `session_state_manager.py`: Checkpoint/atomic write patterns (Issue #247)
- `batch_state_manager.py`: Resume/crash recovery patterns (Issue #76)
- `ralph_loop_manager.py`: Checkpoint infrastructure (Issue #276/#277)
- `pool_config.py`: Environment-based configuration patterns
- `batch_resume_helper.py`: SessionStart hook integration (Issue #277)

**Web Research Verified**:
- Anchored Iterative Summarization (Factory.ai, 2024)
- Semantic Compression research (6-8x compression, arxiv 2023)
- Claude API context window specs (200K standard, 1M beta)
- GPT-4 specs (128K context window)

**Date Verified**: 2026-01-28

**Research Cached**: `.claude/cache/research_278.json` (for /implement reuse)

---

**Related Issues**:
- #277 (CLOSED): Claude auto-compact integration - deprecates should_auto_clear() for Claude
- #276 (CLOSED): RALPH checkpoint/resume infrastructure - foundation for compaction checkpoints
- #76 (CLOSED): Batch state-based auto-clearing - pattern for context management
- #179 (CLOSED): Cross-session memory layer - complementary context continuity
- #247 (CLOSED): Session state persistence - checkpoint infrastructure

**Estimated Implementation Time**: 4-6 hours (one focused development session)

**Priority**: HIGH - Enables Claude Code usage with custom backends (Ollama, vLLM, self-hosted models)
