A sophisticated AI coding assistant with multi-model architecture and comprehensive tool integration
Agent Coder is a local-first AI coding assistant that leverages multiple specialized models for enhanced reasoning, reflection, and code generation capabilities. Built on a three-model architecture with comprehensive tool integration, it provides professional-grade coding assistance without sending your code to external APIs.
- Primary Model: zdolny/qwen3-coder58k-tools for code generation and reasoning
- Reflection Model: gemma2:2b for plan critique and quality improvement
- Embedding Model: nomic-embed-text for semantic memory and context retrieval
- Reasoning: Sophisticated plan generation and problem analysis
- Reflection: AI-powered critique and plan refinement
- Action: Tool execution and code generation
- Observation: Result evaluation and iteration
- Code Analysis: AST parsing, complexity metrics, dependency analysis
- Project Structure: Framework detection, architecture analysis
- Git Integration: Full version control workflow support
- Test Generation: Automated test creation for Python/JavaScript
- Package Management: Multi-manager support (pip, npm, yarn, cargo, etc.)
- Documentation: Docstring and README generation
- Code Quality: Linting, formatting, security auditing
- File Operations: Advanced file I/O and shell command execution
- No external API calls - your code stays private
- FAISS-based vector storage for conversation memory
- Ollama integration for local LLM inference
- Offline-capable with rich terminal interface
- Dynamic iteration limits (8 for simple tasks, 15 for complex)
- Smart memory search (only for complex requests)
- Optimized LLM calls with selective reflection
- Context size management to prevent bloat
- 2-3x faster responses for simple requests
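The iteration limits and selective memory search listed above can be sketched as a small planning step that runs before the main loop. This is a minimal illustration, not Agent Coder's actual code: only the 8/15 limits come from this README, while the keyword list, the 40-word threshold, and the names plan_run and RunPlan are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical hints; the real agent's complexity detection is not documented here.
COMPLEX_HINTS = ("project", "refactor", "analyze", "architecture", "test", "all files")

SIMPLE_ITERATION_LIMIT = 8    # README: 8 iterations for simple tasks
COMPLEX_ITERATION_LIMIT = 15  # README: 15 iterations for complex tasks


@dataclass
class RunPlan:
    is_complex: bool
    max_iterations: int
    search_memory: bool  # memory search is only worth its cost for complex requests


def plan_run(request: str) -> RunPlan:
    """Pick an iteration budget and memory policy from a crude complexity guess."""
    text = request.lower()
    is_complex = len(text.split()) > 40 or any(hint in text for hint in COMPLEX_HINTS)
    limit = COMPLEX_ITERATION_LIMIT if is_complex else SIMPLE_ITERATION_LIMIT
    return RunPlan(is_complex=is_complex, max_iterations=limit, search_memory=is_complex)
```

Skipping memory retrieval for simple requests avoids an embedding call and a vector search, which is one plausible source of the faster simple-request path.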
- No approval requests for standard operations
- Automatic tool execution for project analysis
- Real-time activity logging with progress indicators
- Smart completion detection to prevent infinite loops
- Seamless workflow from planning to execution
- GPU: NVIDIA GPU with 16GB+ VRAM (recommended: 24GB for larger models)
- CPU: Modern multi-core processor (AMD Ryzen 5 or Intel Core i5+)
- RAM: At least 32GB system RAM
- Storage: 1TB NVMe SSD for fast model loading
- Python 3.12+
- Ollama installed and running
- Git (for development)
1. Install Ollama:
# Visit https://ollama.ai/ and install for your platform
# Then pull the coding model:
ollama pull qwen2.5-coder:7b
2. Pull the required models:
# Primary model (main reasoning)
ollama pull zdolny/qwen3-coder58k-tools
# Reflection model (critique and refinement)
ollama pull gemma2:2b
# Embedding model (memory storage)
ollama pull nomic-embed-text
3. Clone and install Agent Coder:
git clone <repository-url>
cd agent-local-coder
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
4. Verify installation:
agent-coder models # List available Ollama models
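To check the three required models outside the CLI, you can query Ollama directly. This sketch assumes Ollama's REST API on its default address, http://localhost:11434, whose GET /api/tags endpoint lists locally pulled models; the helper names are hypothetical and not part of Agent Coder.

```python
import json
import urllib.request
from typing import Sequence

# Default models named in this README.
REQUIRED_MODELS = ("zdolny/qwen3-coder58k-tools", "gemma2:2b", "nomic-embed-text")


def normalize(name: str) -> str:
    """Add the implicit ':latest' tag Ollama uses for untagged model names."""
    last_part = name.rsplit("/", 1)[-1]
    return name if ":" in last_part else f"{name}:latest"


def missing_models(installed: Sequence[str], required: Sequence[str] = REQUIRED_MODELS) -> list[str]:
    """Return the required models that do not appear in the installed list."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server for its locally pulled models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]
```

With Ollama running, missing_models(installed_models()) returns an empty list when all three models are available. Untagged names such as nomic-embed-text are compared as nomic-embed-text:latest, which is how Ollama lists them.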
Start an interactive session with the three-model architecture:
# Default three-model setup
agent-coder chat
# Custom model selection
agent-coder chat --primary-model "custom-coder:7b" --reflection-model "phi3:mini"
# Disable reflection for faster responses
agent-coder chat --no-reflection

Available commands in interactive mode:
- /help - Show available commands
- /mode - Change agent mode (code generation, debugging, etc.)
- /context <path> - Set project context to a directory
- /memory - Show memory statistics
- /history - Show conversation history
- /clear - Clear conversation
- /flush - Flush all memory and reset agent (fresh start)
- /quit - Exit
Ask a single question:
# Basic usage with default models
agent-coder ask "How do I implement a binary search in Python?"
# With specific mode and custom models
agent-coder ask "Review this function for bugs" --mode code_review --reflection-model "phi3:mini"
# With project context (reflection enabled by default)
agent-coder ask "Add error handling to main.py" --context ./my-project
# Fast mode without reflection
agent-coder ask "Quick code snippet for file reading" --no-reflection

Available modes:
- code_generation: Generate new code based on requirements
- code_review: Review existing code for issues and improvements
- debug_assistance: Help debug errors and issues
- refactoring: Improve code structure and maintainability
- documentation: Generate documentation and comments
- optimization: Optimize code for performance
- interactive: Auto-detect the best mode based on input
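The interactive mode has to choose one of the other modes from the request text. One simple approach is keyword matching. The sketch below is a hypothetical illustration: the AgentMode enum mirrors the mode names listed above, but the keyword table and detect_mode are assumptions, and the real agent may instead ask a model to classify the request.

```python
from enum import Enum


class AgentMode(Enum):
    CODE_GENERATION = "code_generation"
    CODE_REVIEW = "code_review"
    DEBUG_ASSISTANCE = "debug_assistance"
    REFACTORING = "refactoring"
    DOCUMENTATION = "documentation"
    OPTIMIZATION = "optimization"
    INTERACTIVE = "interactive"  # the mode that triggers detection; never returned below


# Hypothetical keyword table, checked in order; the first match wins.
MODE_KEYWORDS = [
    (AgentMode.CODE_REVIEW, ("review", "audit")),
    (AgentMode.DEBUG_ASSISTANCE, ("debug", "traceback", "exception", "crash", "stack trace")),
    (AgentMode.REFACTORING, ("refactor", "clean up", "restructure")),
    (AgentMode.DOCUMENTATION, ("docstring", "document", "readme", "comment")),
    (AgentMode.OPTIMIZATION, ("optimize", "faster", "performance")),
]


def detect_mode(request: str) -> AgentMode:
    """Guess a concrete mode for an interactive request; default to code generation."""
    text = request.lower()
    for mode, keywords in MODE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return mode
    return AgentMode.CODE_GENERATION
```

Ordering matters: review is checked before debugging so that "Review this function for bugs" lands in code_review, matching the ask example above.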
Agent Coder implements a three-model architecture:
- Primary Model (zdolny/qwen3-coder58k-tools): Main reasoning and code generation
- Reflection Model (gemma2:2b): Fast critique and plan refinement
- Embedding Model (nomic-embed-text): Memory storage and semantic search
- CoreAgent: Main orchestrator with enhanced ReAct cycle
- ReflectionModel: Fast critique and refinement system
- OllamaIntegration: Handles communication with all Ollama models
- MemoryManager: FAISS vector storage with Ollama embeddings
- ToolSystem: Extensible tools for shell commands and file operations
- CLI: Rich command-line interface with multi-model support
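An extensible tool system usually pairs a common base class with a registry that dispatches by name. The sketch below shows that pattern under stated assumptions: BaseTool, ToolResult, ReadFileTool, and ToolRegistry are hypothetical names, and the actual interface in tools/base_tool.py may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ToolResult:
    success: bool
    output: str


class BaseTool(ABC):
    """Hypothetical base class: every tool has a name and a run() method."""
    name: str = "base"
    description: str = ""

    @abstractmethod
    def run(self, **kwargs) -> ToolResult: ...


class ReadFileTool(BaseTool):
    name = "read_file"
    description = "Read a UTF-8 text file"

    def run(self, **kwargs) -> ToolResult:
        path = kwargs.get("path")
        if not path:
            return ToolResult(False, "missing 'path' argument")
        try:
            return ToolResult(True, Path(path).read_text(encoding="utf-8"))
        except OSError as exc:  # missing file, permission error, directory, ...
            return ToolResult(False, str(exc))


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        self._tools[tool.name] = tool

    def execute(self, name: str, **kwargs) -> ToolResult:
        tool = self._tools.get(name)
        if tool is None:
            return ToolResult(False, f"unknown tool: {name}")
        return tool.run(**kwargs)
```

Returning a ToolResult instead of raising keeps failures visible to the agent as observations, so it can adjust its next step.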
The agent uses an enhanced ReAct (Reasoning and Acting) cycle:
- Thought: Analyze the problem and plan approach (Primary Model)
- Reflection: Critique and refine the plan (Reflection Model)
- Action: Execute tools or generate refined code (Primary Model)
- Observation: Review results and continue or conclude
This reflection step helps catch flawed plans before they are executed, which improves solution quality.
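The cycle can be sketched as a loop in which each model is a plain prompt-to-text function. Everything here is hypothetical: the prompts, the react_with_reflection and Step names, and the "DONE" completion marker are illustrative, and the real calls to the primary and reflection models through Ollama are left out so the loop can run with stub functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Model = Callable[[str], str]  # prompt in, text out


@dataclass
class Step:
    thought: str       # plan from the primary model
    critique: str      # feedback from the reflection model ("" if disabled)
    action: str        # refined action that was executed
    observation: str   # result of executing the action


def react_with_reflection(
    task: str,
    primary: Model,
    reflector: Optional[Model],
    act: Callable[[str], str],
    max_iterations: int = 8,
) -> list[Step]:
    steps: list[Step] = []
    for _ in range(max_iterations):
        observations = "\n".join(s.observation for s in steps)
        # Thought: the primary model plans the next step.
        thought = primary(f"Task: {task}\nObservations:\n{observations}\nPlan the next step.")
        # Reflection: a smaller model critiques the plan (skipped when reflector is None).
        critique = reflector(f"Critique this plan:\n{thought}") if reflector else ""
        # Action: the primary model refines the plan using the critique, then it runs.
        action = (
            primary(f"Plan:\n{thought}\nCritique:\n{critique}\nRefined action:")
            if critique
            else thought
        )
        observation = act(action)
        steps.append(Step(thought, critique, action, observation))
        # Observation: stop once the executor reports completion (hypothetical marker).
        if observation.startswith("DONE"):
            break
    return steps
```

Passing reflector=None mirrors --no-reflection: each iteration then makes one primary-model call instead of two primary calls plus one reflection call.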
- Uses FAISS for efficient vector similarity search
- Ollama-based embeddings (nomic-embed-text) for consistency
- Stores conversation history and learned knowledge
- Automatically retrieves relevant context for new requests
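Retrieval boils down to nearest-neighbour search over embedding vectors. To keep the sketch dependency-free it swaps in two stand-ins, labeled plainly: brute-force cosine similarity in place of a FAISS index, and a hypothetical hashed bag-of-words toy_embed in place of nomic-embed-text. MemoryStore and its methods are illustrative names, not the project's memory_manager.py API.

```python
import math
import zlib
from dataclasses import dataclass, field
from typing import Callable

Embedder = Callable[[str], list[float]]


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for nomic-embed-text: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


@dataclass
class MemoryStore:
    embed: Embedder = toy_embed
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(self.embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Brute-force nearest neighbours: the job a flat FAISS index does at scale."""
        q = self.embed(query)
        scored = [(t, cosine(q, v)) for t, v in zip(self.texts, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With FAISS the linear scan in search() is replaced by an index lookup, which matters once the memory holds many thousands of entries.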
# Install dev dependencies
uv sync --dev
# Run tests
pytest
# Run with coverage
pytest --cov=agent_coder

Project structure:
src/agent_coder/
├── __init__.py # Main package
├── __main__.py # Entry point
├── agents/ # Agent implementations
│ └── core_agent.py # Main ReAct agent
├── cli/ # Command-line interface
│ └── main.py # CLI implementation
├── core/ # Core data types
│ └── types.py # Dataclasses and enums
├── memory/ # Memory management
│ └── memory_manager.py # FAISS-based memory
├── models/ # Model integrations
│ └── ollama_integration.py # Ollama client
└── tools/ # Tool system
├── base_tool.py # Base tool class
├── file_io_tool.py # File operations
└── shell_tool.py # Shell commands
The agent uses a three-model setup by default. Models can be listed and overridden from the command line:
# List available models
agent-coder models
# Use different models
agent-coder chat --primary-model "custom-coder:7b" --reflection-model "phi3:mini"
# Performance vs Quality trade-offs
agent-coder chat --no-reflection # Faster, single model
agent-coder chat --reflection-model "llama3:8b" # Slower, higher-quality reflection

The reflection model is intended to improve quality in several ways:
- Error Prevention: Catches flawed reasoning before execution
- Plan Refinement: Improves initial approaches based on critique
- Risk Assessment: Identifies potential issues early
- Quality Assurance: Provides second opinion on technical decisions
Disable reflection (--no-reflection) for faster responses when quality is less critical.
Memory is stored in ./memory/ by default. The system automatically:
- Generates embeddings for conversations
- Stores them in a FAISS index
- Retrieves relevant context for new requests
> Create a Python function to calculate factorial with proper error handling
The agent will:
1. Analyze the request
2. Generate the function with error handling
3. Provide explanation and suggestions
4. Store the interaction in memory
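For reference, a function meeting this request might look like the following. It is a hand-written illustration of the expected shape of the answer, not captured agent output; rejecting non-integers and negative inputs is one reasonable reading of "proper error handling".

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError(f"factorial() expects an int, got {type(n).__name__}")
    if n < 0:
        raise ValueError("factorial() is undefined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```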
> /mode code_review
> Review this function: def calc(x, y): return x/y
The agent will:
1. Identify potential issues (division by zero)
2. Suggest improvements
3. Provide best practices
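A revised calc addressing the division-by-zero issue might look like this. It is a hand-written illustration, not the agent's actual review output, and raising ValueError with a clear message (rather than returning a sentinel such as None) is one design choice among several.

```python
def calc(x: float, y: float) -> float:
    """Divide x by y, rejecting a zero divisor with a clear error message."""
    if y == 0:
        raise ValueError("calc(): y must be non-zero")
    return x / y
```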
> /context ./my-python-project
> Add logging to all functions in utils.py
The agent will:
1. Analyze the project structure
2. Read the target file
3. Add appropriate logging
4. Consider the project's existing patterns
The agent provides live updates showing its reasoning process:
📋 Planning approach for task...
🎯 Complex task detected: project_analysis
🤔 Iteration 1/15: Analyzing next step...
🔧 Executing: ProjectStructureTool
🛠️ ProjectStructureTool completed: ✅ Success
✅ Preparing final response...
This transparency helps you understand what the agent is doing and provides confidence that it's actively working on your request.
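A progress reporter producing lines in the format shown above can be as small as one method per event. This is a hypothetical sketch (ActivityLog and its methods are not the project's API); the failure marker "❌ Failed" is an assumption, since the README only shows the success case.

```python
from typing import Callable


class ActivityLog:
    """Hypothetical progress reporter mirroring the README's log format."""

    def __init__(self, emit: Callable[[str], None] = print) -> None:
        self.emit = emit  # print by default; pass list.append to capture lines

    def planning(self) -> None:
        self.emit("📋 Planning approach for task...")

    def complex_task(self, kind: str) -> None:
        self.emit(f"🎯 Complex task detected: {kind}")

    def iteration(self, current: int, limit: int) -> None:
        self.emit(f"🤔 Iteration {current}/{limit}: Analyzing next step...")

    def tool_started(self, name: str) -> None:
        self.emit(f"🔧 Executing: {name}")

    def tool_finished(self, name: str, ok: bool) -> None:
        # "❌ Failed" is an assumed marker; only the success case appears in the README.
        self.emit(f"🛠️ {name} completed: {'✅ Success' if ok else '❌ Failed'}")

    def finishing(self) -> None:
        self.emit("✅ Preparing final response...")
```

Injecting the emit callable keeps the format testable and lets a rich terminal UI replace print without touching the agent loop.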
- Model not found: pull the missing model, for example:
ollama pull qwen2.5-coder:7b
- Ollama not running: start the server:
ollama serve
- Memory issues:
  - Ensure you have enough RAM for the model
  - Consider using smaller models such as qwen2.5-coder:3b
- Embedding errors:
pip install sentence-transformers
- Use GPU acceleration for faster inference
- Keep conversations focused to improve memory retrieval
- Use project context for better code understanding
- Clear conversation history periodically
- Split a query into several smaller tasks, some of which may not need an LLM call; a small LLM could decide how to split it
- Show the output of executed tasks in the console, or at least a summary of them
- Standardize operations through tools
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details.