cbonilla20/agent-local-coder

🤖 Agent Coder

A sophisticated AI coding assistant with multi-model architecture and comprehensive tool integration

Agent Coder is a local-first AI coding assistant that leverages multiple specialized models for enhanced reasoning, reflection, and code generation capabilities. Built on a three-model architecture with comprehensive tool integration, it provides professional-grade coding assistance without sending your code to external APIs.

✨ Key Features

🧠 Multi-Model Architecture

  • Primary Model: zdolny/qwen3-coder58k-tools for code generation and reasoning
  • Reflection Model: gemma2:2b for plan critique and quality improvement
  • Embedding Model: nomic-embed-text for semantic memory and context retrieval

🔄 Enhanced ReAct Cycle

  • Reasoning: Sophisticated plan generation and problem analysis
  • Reflection: AI-powered critique and plan refinement
  • Action: Tool execution and code generation
  • Observation: Result evaluation and iteration

🛠️ Comprehensive Tool Suite

  • Code Analysis: AST parsing, complexity metrics, dependency analysis
  • Project Structure: Framework detection, architecture analysis
  • Git Integration: Full version control workflow support
  • Test Generation: Automated test creation for Python/JavaScript
  • Package Management: Multi-manager support (pip, npm, yarn, cargo, etc.)
  • Documentation: Docstring and README generation
  • Code Quality: Linting, formatting, security auditing
  • File Operations: Advanced file I/O and shell command execution

🏠 Local-First Design

  • No external API calls - your code stays private
  • FAISS-based vector storage for conversation memory
  • Ollama integration for local LLM inference
  • Offline-capable with rich terminal interface

Performance Optimized

  • Dynamic iteration limits (8 for simple tasks, 15 for complex)
  • Smart memory search (only for complex requests)
  • Optimized LLM calls with selective reflection
  • Context size management to prevent bloat
  • 2-3x faster responses for simple requests
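
The dynamic iteration limits listed above could be chosen along these lines. This is a minimal sketch, not the project's actual logic: only the limits 8 and 15 come from this README, while `COMPLEX_HINTS`, `is_complex`, and `iteration_limit` are hypothetical names and the keyword heuristic is an assumption.

```python
SIMPLE_LIMIT = 8    # from the README: limit for simple tasks
COMPLEX_LIMIT = 15  # from the README: limit for complex tasks

# Hypothetical hints; the real agent may classify requests differently.
COMPLEX_HINTS = ("project", "refactor", "analyze", "architecture", "all files")


def is_complex(request: str) -> bool:
    """Treat long requests or ones containing a hint word as complex."""
    text = request.lower()
    return len(text.split()) > 40 or any(hint in text for hint in COMPLEX_HINTS)


def iteration_limit(request: str) -> int:
    """Pick the ReAct iteration budget for a request."""
    return COMPLEX_LIMIT if is_complex(request) else SIMPLE_LIMIT
```

The same classification could also gate memory search, so that simple requests skip it.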

🤖 Autonomous Operation

  • No approval requests for standard operations
  • Automatic tool execution for project analysis
  • Real-time activity logging with progress indicators
  • Smart completion detection to prevent infinite loops
  • Seamless workflow from planning to execution

Prerequisites

Hardware Requirements

  • GPU: NVIDIA GPU with 16GB+ VRAM (recommended: 24GB for larger models)
  • CPU: Modern multi-core processor (AMD Ryzen 5 or Intel Core i5+)
  • RAM: At least 32GB system RAM
  • Storage: 1TB NVMe SSD for fast model loading

Software Requirements

  • Python 3.12+
  • Ollama installed and running
  • Git (for development)

Installation

  1. Install Ollama:

    # Visit https://ollama.ai/ and install for your platform.
    # The models themselves are pulled in the next step.
  2. Pull the required models:

    # Primary model (main reasoning)
    ollama pull zdolny/qwen3-coder58k-tools
    
    # Reflection model (critique and refinement) 
    ollama pull gemma2:2b
    
    # Embedding model (memory storage)
    ollama pull nomic-embed-text
  3. Clone and install Agent Coder:

    git clone <repository-url>
    cd agent-local-coder
    
    # Install with uv (recommended)
    uv sync
    
    # Or with pip
    pip install -e .
  4. Verify installation:

    agent-coder models  # List available Ollama models
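
If `agent-coder models` is not available yet, the three required models can also be checked directly against Ollama. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint; `installed_models` and `missing_models` are hypothetical helper names, not part of Agent Coder.

```python
import json
from urllib.request import urlopen

# Default models named in this README.
REQUIRED_MODELS = ["zdolny/qwen3-coder58k-tools", "gemma2:2b", "nomic-embed-text"]


def installed_models(base_url="http://localhost:11434"):
    """Return the model names reported by Ollama's /api/tags endpoint."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [model["name"] for model in data.get("models", [])]


def missing_models(installed, required=REQUIRED_MODELS):
    """Return required names with no match in `installed`, ignoring ':latest'."""
    def base(name):
        return name[: -len(":latest")] if name.endswith(":latest") else name

    have = {base(name) for name in installed}
    return [name for name in required if base(name) not in have]
```

Calling `missing_models(installed_models())` returns an empty list when everything is pulled. If Ollama is not running, `installed_models()` raises an `OSError` subclass, which is a hint to start `ollama serve`.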

Usage

Interactive Chat Mode

Start an interactive session with the default three-model architecture:

# Default three-model setup
agent-coder chat

# Custom model selection
agent-coder chat --primary-model "custom-coder:7b" --reflection-model "phi3:mini"

# Disable reflection for faster responses
agent-coder chat --no-reflection

Available commands in interactive mode:

  • /help - Show available commands
  • /mode - Change agent mode (code generation, debugging, etc.)
  • /context <path> - Set project context to a directory
  • /memory - Show memory statistics
  • /history - Show conversation history
  • /clear - Clear conversation
  • /flush - Flush all memory and reset agent (fresh start)
  • /quit - Exit

Single Question Mode

Ask a single question:

# Basic usage with default models
agent-coder ask "How do I implement a binary search in Python?"

# With specific mode and custom models
agent-coder ask "Review this function for bugs" --mode code_review --reflection-model "phi3:mini"

# With project context (reflection enabled by default)
agent-coder ask "Add error handling to main.py" --context ./my-project

# Fast mode without reflection
agent-coder ask "Quick code snippet for file reading" --no-reflection

Agent Modes

  • code_generation: Generate new code based on requirements
  • code_review: Review existing code for issues and improvements
  • debug_assistance: Help debug errors and issues
  • refactoring: Improve code structure and maintainability
  • documentation: Generate documentation and comments
  • optimization: Optimize code for performance
  • interactive: Auto-detect the best mode based on input
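
To make the `interactive` mode more concrete, here is one way auto-detection could work. The mode names come from the list above; the keyword table and the `detect_mode` function are hypothetical, and the real agent may well use the LLM itself to pick a mode.

```python
from enum import Enum


class AgentMode(Enum):
    CODE_GENERATION = "code_generation"
    CODE_REVIEW = "code_review"
    DEBUG_ASSISTANCE = "debug_assistance"
    REFACTORING = "refactoring"
    DOCUMENTATION = "documentation"
    OPTIMIZATION = "optimization"
    INTERACTIVE = "interactive"


# Hypothetical keyword table, checked in order; the first match wins.
_KEYWORDS = [
    (AgentMode.CODE_REVIEW, ("review", "audit")),
    (AgentMode.DEBUG_ASSISTANCE, ("error", "traceback", "debug", "bug")),
    (AgentMode.REFACTORING, ("refactor", "restructure", "clean up")),
    (AgentMode.DOCUMENTATION, ("docstring", "document", "readme")),
    (AgentMode.OPTIMIZATION, ("optimize", "faster", "performance")),
]


def detect_mode(request: str) -> AgentMode:
    """Map a free-form request to a concrete mode, defaulting to generation."""
    text = request.lower()
    for mode, words in _KEYWORDS:
        if any(word in text for word in words):
            return mode
    return AgentMode.CODE_GENERATION
```

Because the table is ordered, a request such as "Review this function for bugs" resolves to `code_review` rather than `debug_assistance`.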

Architecture

Three-Model Architecture

Agent Coder is built around three models, each with a distinct role:

  1. Primary Model (zdolny/qwen3-coder58k-tools): Main reasoning and code generation
  2. Reflection Model (gemma2:2b): Fast critique and plan refinement
  3. Embedding Model (nomic-embed-text): Memory storage and semantic search

Core Components

  1. CoreAgent: Main orchestrator with enhanced ReAct cycle
  2. ReflectionModel: Fast critique and refinement system
  3. OllamaIntegration: Handles communication with all Ollama models
  4. MemoryManager: FAISS vector storage with Ollama embeddings
  5. ToolSystem: Extensible tools for shell commands and file operations
  6. CLI: Rich command-line interface with multi-model support

Enhanced ReAct Cycle with Reflection

The agent uses an enhanced Reasoning and Acting cycle:

  1. Thought: Analyze the problem and plan approach (Primary Model)
  2. Reflection: Critique and refine the plan (Reflection Model)
  3. Action: Execute tools or generate refined code (Primary Model)
  4. Observation: Review results and continue or conclude

Because the plan is critiqued before any action runs, this reflection step helps catch flawed decisions early and improves solution quality.
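
The four steps of the cycle can be sketched as a plain loop. This is an illustration under stated assumptions, not the `CoreAgent` implementation: `react_loop`, `Step`, and the `think` / `reflect` / `act` callables are hypothetical stand-ins for calls to the primary and reflection models.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    critique: str
    action: str
    observation: str


def react_loop(task, think, reflect, act, tools, max_iterations=8, use_reflection=True):
    """Run Thought -> Reflection -> Action -> Observation until 'finish' or the limit.

    think(task, history)            -> thought (primary model)
    reflect(thought)                -> critique (reflection model)
    act(thought, critique, history) -> (action_name, action_input) (primary model)
    tools                           -> dict mapping action_name to a callable
    """
    history = []
    for _ in range(max_iterations):
        thought = think(task, history)
        critique = reflect(thought) if use_reflection else ""
        action, arg = act(thought, critique, history)
        if action == "finish":
            # The action input doubles as the final answer.
            history.append(Step(thought, critique, action, arg))
            return arg, history
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        history.append(Step(thought, critique, action, observation))
    return None, history  # iteration budget exhausted without an answer
```

Passing `use_reflection=False` mirrors the `--no-reflection` flag: the loop skips the critique call and goes straight from thought to action.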

Memory System

  • Uses FAISS for efficient vector similarity search
  • Ollama-based embeddings (nomic-embed-text) for consistency
  • Stores conversation history and learned knowledge
  • Automatically retrieves relevant context for new requests
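
The retrieval idea behind the memory system can be illustrated without any dependencies. The sketch below deliberately swaps FAISS for a brute-force cosine search and swaps `nomic-embed-text` for a toy word-count embedding, so it shows the store-then-search pattern only; `TinyMemory`, `cosine`, and `toy_embed` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_embed(text, vocab=("git", "test", "file", "error")):
    """Toy embedding: counts of a few words. The real system embeds via Ollama."""
    words = text.lower().split()
    return [float(words.count(word)) for word in vocab]


class TinyMemory:
    """Brute-force stand-in for a FAISS index: stores (vector, text) pairs."""

    def __init__(self, embed):
        self.embed = embed  # callable: str -> list[float]
        self.items = []

    def add(self, text):
        self.items.append((self.embed(text), text))

    def search(self, query, k=3):
        """Return the k stored texts most similar to the query."""
        q = self.embed(query)
        scored = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in scored[:k]]
```

With a real embedding model the vectors have hundreds of dimensions, which is where an index such as FAISS pays off over a linear scan.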

Development

Running Tests

# Install dev dependencies
uv sync --dev

# Run tests
pytest

# Run with coverage
pytest --cov=agent_coder

Project Structure

src/agent_coder/
├── __init__.py          # Main package
├── __main__.py          # Entry point
├── agents/              # Agent implementations
│   └── core_agent.py    # Main ReAct agent
├── cli/                 # Command-line interface
│   └── main.py          # CLI implementation
├── core/                # Core data types
│   └── types.py         # Dataclasses and enums
├── memory/              # Memory management
│   └── memory_manager.py # FAISS-based memory
├── models/              # Model integrations
│   └── ollama_integration.py # Ollama client
└── tools/               # Tool system
    ├── base_tool.py     # Base tool class
    ├── file_io_tool.py  # File operations
    └── shell_tool.py    # Shell commands
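
The layout suggests a small base class that concrete tools extend. The following is a guess at that shape, not the contents of `base_tool.py` or `shell_tool.py`: `ToolResult`, the `run` signature, and the timeout default are all assumptions.

```python
import subprocess
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ToolResult:
    success: bool
    output: str


class BaseTool(ABC):
    """Common interface: every tool has a name and a run() returning ToolResult."""

    name: str = "base"

    @abstractmethod
    def run(self, **kwargs) -> ToolResult:
        ...


class ShellTool(BaseTool):
    name = "shell"

    def run(self, command: list[str], timeout: float = 30.0) -> ToolResult:
        """Run a command given as an argument list (no shell interpolation)."""
        try:
            proc = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing executable, permission problem, or timeout.
            return ToolResult(False, str(exc))
        return ToolResult(proc.returncode == 0, proc.stdout + proc.stderr)
```

Taking the command as a list rather than a string avoids shell injection when the arguments come from model output. A new tool would subclass `BaseTool` in the same way.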

Configuration

Model Configuration

The agent uses a three-model setup by default:

# List available models
agent-coder models

# Use different models
agent-coder chat --primary-model "custom-coder:7b" --reflection-model "phi3:mini"

# Performance vs Quality trade-offs
agent-coder chat --no-reflection          # Faster, single model
agent-coder chat --reflection-model "llama3:8b"  # Slower, higher quality reflection

Reflection Model Benefits

The reflection model acts as a second reviewer before anything is executed. It is intended to help in four ways:

  • Error Prevention: Catches flawed reasoning before execution
  • Plan Refinement: Improves initial approaches based on critique
  • Risk Assessment: Identifies potential issues early
  • Quality Assurance: Provides second opinion on technical decisions

Disable reflection (--no-reflection) for faster responses when quality is less critical.

Memory Configuration

Memory is stored in ./memory/ by default. The system automatically:

  • Generates embeddings for conversations
  • Stores them in a FAISS index
  • Retrieves relevant context for new requests

Examples

Code Generation

> Create a Python function to calculate factorial with proper error handling

The agent will:
1. Analyze the request
2. Generate the function with error handling
3. Provide explanation and suggestions
4. Store the interaction in memory
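
For reference, a response to this prompt might contain something like the function below. It is an illustrative answer written for this README, not captured agent output.

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```

The iterative form avoids hitting Python's recursion limit for large `n`.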

Code Review

> /mode code_review
> Review this function: def calc(x, y): return x/y

The agent will:
1. Identify potential issues (division by zero)
2. Suggest improvements
3. Provide best practices
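
One plausible result of such a review is sketched below. The name `divide` and the choice to raise `ValueError` are illustrative assumptions; a real review might equally suggest keeping `ZeroDivisionError` and documenting it.

```python
def divide(x: float, y: float) -> float:
    """Return x / y, rejecting a zero divisor with a descriptive error."""
    if y == 0:
        raise ValueError("y must be non-zero")
    return x / y
```

Compared with the original `calc`, this version has a descriptive name, type hints, a docstring, and an explicit check for the division-by-zero case.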

Project Context

> /context ./my-python-project
> Add logging to all functions in utils.py

The agent will:
1. Analyze the project structure
2. Read the target file
3. Add appropriate logging
4. Consider the project's existing patterns

Real-Time Activity Logging

The agent provides live updates showing its reasoning process:

📋 Planning approach for task...
🎯 Complex task detected: project_analysis
🤔 Iteration 1/15: Analyzing next step...
🔧 Executing: ProjectStructureTool
🛠️ ProjectStructureTool completed: ✅ Success
✅ Preparing final response...

This transparency helps you understand what the agent is doing and provides confidence that it's actively working on your request.

Troubleshooting

Common Issues

  1. Model not found:

    # Pull whichever model the error names, for example the default primary model:
    ollama pull zdolny/qwen3-coder58k-tools
  2. Ollama not running:

    ollama serve
  3. Memory issues:

    • Ensure you have enough RAM for the model
    • Consider using smaller models like qwen2.5-coder:3b
  4. Embedding errors:

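    # Embeddings are served by Ollama, so first confirm the embedding model is pulled
    ollama pull nomic-embed-text
    pip install sentence-transformers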

Performance Tips

  • Use GPU acceleration for faster inference
  • Keep conversations focused to improve memory retrieval
  • Use project context for better code understanding
  • Clear conversation history periodically

Potential New Features Backlog

  • Task splitting: break a query into several smaller tasks, each of which may or may not need an LLM call. A small LLM could decide how to split the query.
  • Show the output of executed tasks in the console, or at least a summary of each one.
  • Standardize operations with tools

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built with Ollama for local LLM inference
  • Uses FAISS for vector storage
  • CLI built with Rich and Click
