🤖 Agent Coder

A sophisticated AI coding assistant with multi-model architecture and comprehensive tool integration

Agent Coder is a local-first AI coding assistant that leverages multiple specialized models for enhanced reasoning, reflection, and code generation capabilities. Built on a three-model architecture with comprehensive tool integration, it provides professional-grade coding assistance without sending your code to external APIs.

✨ Key Features

🧠 Multi-Model Architecture

Primary Model: zdolny/qwen3-coder58k-tools for code generation and reasoning
Reflection Model: gemma2:2b for plan critique and quality improvement
Embedding Model: nomic-embed-text for semantic memory and context retrieval

🔄 Enhanced ReAct Cycle

Reasoning: Sophisticated plan generation and problem analysis
Reflection: AI-powered critique and plan refinement
Action: Tool execution and code generation
Observation: Result evaluation and iteration

🛠️ Comprehensive Tool Suite

Code Analysis: AST parsing, complexity metrics, dependency analysis
Project Structure: Framework detection, architecture analysis
Git Integration: Full version control workflow support
Test Generation: Automated test creation for Python/JavaScript
Package Management: Multi-manager support (pip, npm, yarn, cargo, etc.)
Documentation: Docstring and README generation
Code Quality: Linting, formatting, security auditing
File Operations: Advanced file I/O and shell command execution
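
As a rough illustration of the kind of metrics an AST-based code analysis tool can report, the sketch below uses Python's standard `ast` module to list functions and imports and to estimate a branch-based complexity score. The helper name `analyze_source`, the set of branch nodes, and the scoring rule are assumptions made for this example, not Agent Coder's actual implementation.

```python
import ast

# Node types counted as decision points; this selection is an illustrative assumption.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)


def analyze_source(source: str) -> dict:
    """Return simple metrics for a piece of Python source (hypothetical helper)."""
    tree = ast.parse(source)
    functions = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
    branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    return {
        "functions": sorted(functions),
        "imports": sorted(imports),
        # Cyclomatic-style estimate: one base path plus one per decision point.
        "complexity": 1 + branches,
    }


SAMPLE = '''
import os

def safe_div(x, y):
    if y == 0:
        return None
    return x / y
'''

report = analyze_source(SAMPLE)
print(report)
```

A real tool would likely report metrics per function and handle syntax errors; this version only shows the traversal pattern.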

🏠 Local-First Design

No external API calls - your code stays private
FAISS-based vector storage for conversation memory
Ollama integration for local LLM inference
Offline-capable with rich terminal interface

⚡ Performance Optimized

Dynamic iteration limits (8 for simple tasks, 15 for complex)
Smart memory search (only for complex requests)
Optimized LLM calls with selective reflection
Context size management to prevent bloat
2-3x faster responses for simple requests
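
One way to picture the dynamic limits listed above is a small budget function that classifies a request and picks the iteration cap and memory-search setting accordingly. The limits 8 and 15 come from the list; the `RunBudget` type, the keyword hints, and the word-count threshold are hypothetical and only stand in for whatever heuristic the agent really uses.

```python
from dataclasses import dataclass

SIMPLE_LIMIT = 8    # iteration cap for simple tasks (from the list above)
COMPLEX_LIMIT = 15  # iteration cap for complex tasks (from the list above)

# Hypothetical heuristic: these hints and the length threshold are assumptions.
COMPLEX_HINTS = ("project", "refactor", "analyze", "architecture", "all files")


@dataclass
class RunBudget:
    max_iterations: int
    use_memory_search: bool


def plan_budget(request: str) -> RunBudget:
    """Pick an iteration cap and decide whether to search memory."""
    text = request.lower()
    is_complex = len(text.split()) > 40 or any(hint in text for hint in COMPLEX_HINTS)
    return RunBudget(
        max_iterations=COMPLEX_LIMIT if is_complex else SIMPLE_LIMIT,
        # Memory search is only enabled for complex requests.
        use_memory_search=is_complex,
    )


print(plan_budget("Quick code snippet for file reading"))
print(plan_budget("Analyze the project structure and refactor utils"))
```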

🤖 Autonomous Operation

No approval requests for standard operations
Automatic tool execution for project analysis
Real-time activity logging with progress indicators
Smart completion detection to prevent infinite loops
Seamless workflow from planning to execution

Prerequisites

Hardware Requirements

GPU: NVIDIA GPU with 16GB+ VRAM (recommended: 24GB for larger models)
CPU: Modern multi-core processor (AMD Ryzen 5 or Intel Core i5+)
RAM: At least 32GB system RAM
Storage: 1TB NVMe SSD for fast model loading

Software Requirements

Python 3.12+
Ollama installed and running
Git (for development)

Installation

Install Ollama:

# Visit https://ollama.ai/ and install for your platform
# Then pull a coding model (the three default models are pulled in the next step):
ollama pull qwen2.5-coder:7b

Pull the required models:

# Primary model (main reasoning)
ollama pull zdolny/qwen3-coder58k-tools

# Reflection model (critique and refinement) 
ollama pull gemma2:2b

# Embedding model (memory storage)
ollama pull nomic-embed-text

Clone and install Agent Coder:

git clone <repository-url>
cd agent-local-coder

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Verify installation:

agent-coder models  # List available Ollama models

Usage

Interactive Chat Mode

Start an interactive session with the three-model architecture:

# Default three-model setup
agent-coder chat

# Custom model selection
agent-coder chat --primary-model "custom-coder:7b" --reflection-model "phi3:mini"

# Disable reflection for faster responses
agent-coder chat --no-reflection

Available commands in interactive mode:

/help - Show available commands
/mode - Change agent mode (code generation, debugging, etc.)
/context <path> - Set project context to a directory
/memory - Show memory statistics
/history - Show conversation history
/clear - Clear conversation
/flush - Flush all memory and reset agent (fresh start)
/quit - Exit
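
The sketch below shows one simple way a chat loop could separate slash commands from ordinary messages. The command names are taken from the list above; the `parse_command` helper and its error handling are hypothetical and not taken from the Agent Coder source.

```python
from typing import Optional, Tuple

# Command names from the list above.
COMMANDS = {"help", "mode", "context", "memory", "history", "clear", "flush", "quit"}


def parse_command(line: str) -> Optional[Tuple[str, str]]:
    """Return (command, argument) for a slash command, or None for a normal message."""
    stripped = line.strip()
    if not stripped.startswith("/"):
        return None
    # Split on the first space: "/context ./src" -> ("context", "./src").
    name, _, argument = stripped[1:].partition(" ")
    if name not in COMMANDS:
        raise ValueError(f"Unknown command: /{name}")
    return name, argument.strip()


print(parse_command("/context ./my-project"))
print(parse_command("How do I read a file?"))
```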

Single Question Mode

Ask a single question:

# Basic usage with default models
agent-coder ask "How do I implement a binary search in Python?"

# With specific mode and custom models
agent-coder ask "Review this function for bugs" --mode code_review --reflection-model "phi3:mini"

# With project context (reflection enabled by default)
agent-coder ask "Add error handling to main.py" --context ./my-project

# Fast mode without reflection
agent-coder ask "Quick code snippet for file reading" --no-reflection

Agent Modes

code_generation: Generate new code based on requirements
code_review: Review existing code for issues and improvements
debug_assistance: Help debug errors and issues
refactoring: Improve code structure and maintainability
documentation: Generate documentation and comments
optimization: Optimize code for performance
interactive: Auto-detect the best mode based on input
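
The modes above map naturally onto an enum, and auto-detection can be approximated with keyword matching. In the sketch below the mode values match the names listed above, while the `AgentMode` class name, the keyword table, and the fallback to code generation are assumptions for illustration; the real detection logic may work differently.

```python
from enum import Enum


class AgentMode(Enum):
    CODE_GENERATION = "code_generation"
    CODE_REVIEW = "code_review"
    DEBUG_ASSISTANCE = "debug_assistance"
    REFACTORING = "refactoring"
    DOCUMENTATION = "documentation"
    OPTIMIZATION = "optimization"
    INTERACTIVE = "interactive"


# Hypothetical keyword table; entries are checked in order and the first match wins.
KEYWORDS = [
    (AgentMode.CODE_REVIEW, ("review",)),
    (AgentMode.DEBUG_ASSISTANCE, ("error", "traceback", "bug", "crash")),
    (AgentMode.REFACTORING, ("refactor", "clean up")),
    (AgentMode.DOCUMENTATION, ("docstring", "document", "readme")),
    (AgentMode.OPTIMIZATION, ("optimize", "faster", "performance")),
]


def detect_mode(request: str) -> AgentMode:
    """Guess a concrete mode for a request received in interactive mode."""
    text = request.lower()
    for mode, words in KEYWORDS:
        if any(word in text for word in words):
            return mode
    # Assumed default when nothing matches.
    return AgentMode.CODE_GENERATION


print(detect_mode("Review this function for bugs"))
print(detect_mode("Why does this crash with a traceback?"))
```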

Architecture

Three-Model Architecture

Agent Coder uses a three-model architecture:

Primary Model (zdolny/qwen3-coder58k-tools): Main reasoning and code generation
Reflection Model (gemma2:2b): Fast critique and plan refinement
Embedding Model (nomic-embed-text): Memory storage and semantic search

Core Components

CoreAgent: Main orchestrator with enhanced ReAct cycle
ReflectionModel: Fast critique and refinement system
OllamaIntegration: Handles communication with all Ollama models
MemoryManager: FAISS vector storage with Ollama embeddings
ToolSystem: Extensible tools for shell commands and file operations
CLI: Rich command-line interface with multi-model support
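
To make the idea of an extensible tool system concrete, here is a minimal sketch of a base class plus a registry that dispatches by tool name. The names `BaseTool`, `ToolResult`, `ToolRegistry`, and `EchoTool`, and the `run` signature, are hypothetical; they are not copied from `base_tool.py` and the real interfaces may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass
class ToolResult:
    success: bool
    output: str


class BaseTool(ABC):
    """Hypothetical base class: each tool has a name and a run method."""

    name: str

    @abstractmethod
    def run(self, **kwargs) -> ToolResult:
        ...


class EchoTool(BaseTool):
    """Trivial example tool that returns its input text."""

    name = "echo"

    def run(self, **kwargs) -> ToolResult:
        return ToolResult(True, str(kwargs.get("text", "")))


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        self._tools[tool.name] = tool

    def execute(self, name: str, **kwargs) -> ToolResult:
        tool = self._tools.get(name)
        if tool is None:
            # Report unknown tools as a failed result instead of raising.
            return ToolResult(False, f"Unknown tool: {name}")
        return tool.run(**kwargs)


registry = ToolRegistry()
registry.register(EchoTool())
print(registry.execute("echo", text="hello"))
print(registry.execute("missing"))
```

Returning a failed `ToolResult` for unknown tools lets the agent treat the failure as an observation and keep iterating, which is one reasonable design choice among several.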

Enhanced ReAct Cycle with Reflection

The agent uses an enhanced Reasoning and Acting cycle:

Thought: Analyze the problem and plan approach (Primary Model)
Reflection: Critique and refine the plan (Reflection Model)
Action: Execute tools or generate refined code (Primary Model)
Observation: Review results and continue or conclude

This reflection step is intended to catch poor decisions before they are executed and to improve solution quality.
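
The cycle can be sketched as a bounded loop in which the models are passed in as plain callables. Everything below is illustrative: the function name `react_loop`, its parameters, and the stub functions are assumptions, and the real `CoreAgent` will differ in how it builds prompts and detects completion.

```python
from typing import Callable, List

MAX_ITERATIONS = 8  # the simple-task limit listed under Performance Optimized


def react_loop(
    task: str,
    think: Callable[[str, List[str]], str],
    reflect: Callable[[str], str],
    act: Callable[[str], str],
    is_done: Callable[[str], bool],
    use_reflection: bool = True,
) -> List[str]:
    """Run Thought -> Reflection -> Action -> Observation until done or out of budget."""
    observations: List[str] = []
    for _ in range(MAX_ITERATIONS):
        plan = think(task, observations)      # Thought (primary model)
        if use_reflection:
            plan = reflect(plan)              # Reflection (reflection model)
        observation = act(plan)               # Action (tool call or code generation)
        observations.append(observation)      # Observation
        if is_done(observation):              # Completion check stops the loop early
            break
    return observations


# Stub "models" so the sketch runs without Ollama.
def think(task: str, observations: List[str]) -> str:
    return f"step {len(observations) + 1} for {task}"


def reflect(plan: str) -> str:
    return plan + " (refined)"


def act(plan: str) -> str:
    return f"ran: {plan}"


def is_done(observation: str) -> bool:
    return "step 3" in observation


trace = react_loop("demo", think, reflect, act, is_done)
print(trace)
```

Passing `use_reflection=False` skips the critique step, which mirrors the effect of the `--no-reflection` flag.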

Memory System

Uses FAISS for efficient vector similarity search
Ollama-based embeddings (nomic-embed-text) for consistency
Stores conversation history and learned knowledge
Automatically retrieves relevant context for new requests
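
The retrieval idea can be shown without any dependencies. The sketch below deliberately swaps in a toy bag-of-words vector for the `nomic-embed-text` embeddings and a brute-force cosine ranking for the FAISS index, so it demonstrates the store-then-search pattern only. `ToyMemory`, `embed`, and `cosine` are hypothetical names and not part of `MemoryManager`.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Store the text together with its vector.
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        # Brute-force ranking; a vector index would avoid scanning every item.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = ToyMemory()
memory.add("user asked how to parse json in python")
memory.add("user prefers pytest for unit tests")
memory.add("project uses click for the cli")
print(memory.search("which tests framework does the user prefer", k=1))
```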

Development

Running Tests

# Install dev dependencies
uv sync --dev

# Run tests
pytest

# Run with coverage
pytest --cov=agent_coder

Project Structure

src/agent_coder/
├── __init__.py          # Main package
├── __main__.py          # Entry point
├── agents/              # Agent implementations
│   └── core_agent.py    # Main ReAct agent
├── cli/                 # Command-line interface
│   └── main.py          # CLI implementation
├── core/                # Core data types
│   └── types.py         # Dataclasses and enums
├── memory/              # Memory management
│   └── memory_manager.py # FAISS-based memory
├── models/              # Model integrations
│   └── ollama_integration.py # Ollama client
└── tools/               # Tool system
    ├── base_tool.py     # Base tool class
    ├── file_io_tool.py  # File operations
    └── shell_tool.py    # Shell commands

Configuration

Model Configuration

The agent uses a three-model setup by default:

# List available models
agent-coder models

# Use different models
agent-coder chat --primary-model "custom-coder:7b" --reflection-model "phi3:mini"

# Performance vs Quality trade-offs
agent-coder chat --no-reflection          # Faster, single model
agent-coder chat --reflection-model "llama3:8b"  # Slower, higher quality reflection

Reflection Model Benefits

The reflection model is intended to improve quality in four ways:

Error Prevention: Catches flawed reasoning before execution
Plan Refinement: Improves initial approaches based on critique
Risk Assessment: Identifies potential issues early
Quality Assurance: Provides second opinion on technical decisions

Disable reflection (--no-reflection) for faster responses when quality is less critical.

Memory Configuration

Memory is stored in ./memory/ by default. The system automatically:

Generates embeddings for conversations
Stores them in a FAISS index
Retrieves relevant context for new requests

Examples

Code Generation

> Create a Python function to calculate factorial with proper error handling

The agent will:
1. Analyze the request
2. Generate the function with error handling
3. Provide explanation and suggestions
4. Store the interaction in memory
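
For illustration, a response to this prompt might resemble the function below. It is a hand-written example of the expected shape of the answer, not captured agent output.

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


print(factorial(5))
```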

Code Review

> /mode code_review
> Review this function: def calc(x, y): return x/y

The agent will:
1. Identify potential issues (division by zero)
2. Suggest improvements
3. Provide best practices
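
As an example of where such a review could lead, the sketch below rewrites `calc` with an explicit zero check, type hints, and a docstring. This is one plausible outcome written by hand; the agent's actual suggestions may differ.

```python
def calc(x: float, y: float) -> float:
    """Divide x by y, raising a clear error when y is zero."""
    if y == 0:
        raise ValueError("y must not be zero")
    return x / y


print(calc(6, 3))
```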

Project Context

> /context ./my-python-project
> Add logging to all functions in utils.py

The agent will:
1. Analyze the project structure
2. Read the target file
3. Add appropriate logging
4. Consider the project's existing patterns
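
One common way to add logging to many functions at once is a decorator, sketched below with the standard `logging` and `functools` modules. The logger name, the `logged` decorator, and the `slugify` function are hypothetical; the agent may instead insert explicit log calls that follow the project's existing patterns.

```python
import functools
import logging

logger = logging.getLogger("utils")  # logger name is an assumption


def logged(func):
    """Log each call and its return value at DEBUG level."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.debug("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("%s returned %r", func.__name__, result)
        return result

    return wrapper


@logged
def slugify(title: str) -> str:
    """Example utility function (hypothetical)."""
    return title.strip().lower().replace(" ", "-")


logging.basicConfig(level=logging.DEBUG)
print(slugify("Hello World"))
```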

Real-Time Activity Logging

The agent provides live updates showing its reasoning process:

📋 Planning approach for task...
🎯 Complex task detected: project_analysis
🤔 Iteration 1/15: Analyzing next step...
🔧 Executing: ProjectStructureTool
🛠️ ProjectStructureTool completed: ✅ Success
✅ Preparing final response...

This transparency helps you understand what the agent is doing and provides confidence that it's actively working on your request.

Troubleshooting

Common Issues

Model not found (pull whichever model the error names, for example the default primary model):
```
ollama pull zdolny/qwen3-coder58k-tools
```
Ollama not running:
```
ollama serve
```
Memory issues:
- Ensure you have enough RAM for the model
- Consider using smaller models like qwen2.5-coder:3b
Embedding errors:
```
pip install sentence-transformers
```

Performance Tips

Use GPU acceleration for faster inference
Keep conversations focused to improve memory retrieval
Use project context for better code understanding
Clear conversation history periodically

Potential New Features Backlog

Task splitting: break a query into several smaller tasks, each of which may or may not call an LLM. A small LLM could be used to decide how to split the query.
Task output: show the output of executed tasks in the console, or at least a summary of the tasks that ran.
Tool operations: standardize how operations are performed with tools.

Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

Built with Ollama for local LLM inference
Uses FAISS for vector storage
CLI built with Rich and Click
