Minimal POC of "dumb orchestrator – smart model". The LLM evolves itself by writing plugins (add_plugin / run_plugin). Inspired by MemPalace (96.6% on LongMemEval) and Claude Code’s TAOR loop. The core is immutable (<150 loc). HTTP transport is a plugin.
Instead of hard-coding complexity into the orchestrator, the model is given two tools:
add_plugin(name, code)— write and save a new plugin (or overwrite an existing one)run_plugin(name, input_data)— execute an already-loaded plugin by name
The core (orchestrator) is immutable and deliberately "dumb" (~150 lines). All intelligence — creating new capabilities, coordinating agents, managing memory, parsing data — lives in the LLM’s reasoning and in the plugins it generates. The model decides when to write a plugin and can hot-reload them at any time.
- MemPalace — the approach that dominated LongMemEval (96.6%) without complex RAG, simply by giving the model raw data and freedom to decide.
- Claude Code — the TAOR (Think‑Act‑Observe‑Repeat) architecture and the "dumb orchestrator, smart model" principle.
- Critique of over-engineered RAG pipelines — give the model a clean context and let it decide.
✅ Core & HTTP Plugin — implemented, tested, CI passing.
✅ Sprint 1 (Multi-Agent Foundation) — advanced metrics, tool reranking, and rejection handling completed.
⏳ Sprint 2 (Context & Reflection) — in progress: context repository and self-correction loop.
# 1. Install dependencies via Poetry or pip
poetry install
# or
pip install -e .
# 2. Create .env with your Anthropic API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
# 3. Start the orchestrator (HTTP server on port 8080)
poetry run python run.py
# or
python run.py
# 4. Send a request
curl -X POST http://localhost:8080/ \
-H "Content-Type: application/json" \
-d '{"prompt": "Write a plugin that returns the current time", "context": {}}'The server port can be overridden via the HTTP_PORT environment variable.
For more isolated plugin execution in WSL, use the Docker backend. In this mode, untrusted plugins run under a separate user and container filesystem boundary rather than sharing the orchestrator process privileges.
rawllm-core- orchestrator process user on host siderawllm-plugin- plugin subprocess user inside sandbox container
docker build -t rawllm/plugin-sandbox:latest -f docker/sandbox/Dockerfile .echo "SANDBOX_BACKEND=docker" >> .env
echo "SANDBOX_DOCKER_IMAGE=rawllm/plugin-sandbox:latest" >> .envpytest -qWhen docker backend is enabled, plugin execution uses isolated volumes:
rawllm_workspace(rw)rawllm_core_repo(ro snapshot)rawllm_plugin_store(ro snapshot)
run.py supports any OpenAI-compatible provider via the LLM_PROVIDER
environment variable (default: anthropic). All security and versioning
features are automatically active.
LLM_PROVIDER |
API key env var | Default model |
|---|---|---|
anthropic (default) |
ANTHROPIC_API_KEY |
claude-3-5-sonnet-20241022 |
groq |
GROQ_API_KEY |
llama3-70b-8192 |
gemini |
GEMINI_API_KEY |
gemini-2.0-flash |
openrouter |
OPEN_ROUTER_API_KEY |
qwen/qwen3-coder:free |
deepseek |
DEEPSEEK_API_KEY |
deepseek-chat |
ollama |
(none required) | llama3.2:3b |
ollama-qwen-coder |
(none required) | qwen2.5-coder:7b |
Override any default with LLM_MODEL and LLM_BASE_URL.
echo "GROQ_API_KEY=gsk_..." >> .env
LLM_PROVIDER=groq python run.pyecho "GEMINI_API_KEY=AIza..." >> .env
LLM_PROVIDER=gemini python run.pyecho "OPEN_ROUTER_API_KEY=sk-or-..." >> .env
LLM_PROVIDER=openrouter python run.py
# Use a specific free model:
LLM_PROVIDER=openrouter LLM_MODEL=google/gemma-3-27b-it:free python run.pyecho "DEEPSEEK_API_KEY=sk-..." >> .env
LLM_PROVIDER=deepseek python run.py# 1. Install Ollama: https://ollama.com/
ollama pull llama3.2:3b # or any model you prefer
# 2. Run
LLM_PROVIDER=ollama python run.py
# Custom model:
LLM_PROVIDER=ollama LLM_MODEL=mistral python run.py# 1. Pull the local coding model in WSL / host environment
ollama pull qwen2.5-coder:7b
# 2. Run RawLLM against the dedicated provider alias
LLM_PROVIDER=ollama-qwen-coder python run.pyIf the orchestrator itself runs in a container and Ollama stays on the host, override the endpoint explicitly:
LLM_PROVIDER=ollama-qwen-coder \
LLM_BASE_URL=http://host.docker.internal:11434/v1 \
python run.pyInstall the package in editable mode to get the rawllm command:
pip install -e .Or invoke directly without installing:
python cli.py <command>rawllm run # use default provider (anthropic)
rawllm run --provider groq # use a specific providerrawllm plugin list
rawllm plugin show my_plugin
rawllm plugin add my_plugin path/to/code.py
rawllm plugin rollback my_pluginUse module-level docstring in every plugin as a prompt for RawLLM. The docstring should describe plugin role, input/output contract, operational constraints, and failure behavior. See plugins/http.py as a reference template.
rawllm deps pending # list modules awaiting approval
rawllm deps approve requests # approve a module
rawllm deps reject requests # reject a modulerawllm metrics show # all plugins, table format
rawllm metrics show --plugin my_plugin # one plugin
rawllm metrics show --format json # JSON output
rawllm metrics evolution my_plugin # chronological timeline
rawllm metrics trajectory <id> # view specific execution trajectory
rawllm metrics success-rate # aggregate success scoresrawllm config show
rawllm config set LLM_PROVIDER groq
rawllm config set ALLOWED_REQUIREMENTS "json,datetime,requests"The statement "plugins run with the same privileges as the orchestrator" applies to trusted in-process plugins and the legacy subprocess backend. When
SANDBOX_BACKEND=dockeris enabled, untrusted plugins run in a separate container with reduced privileges and isolated mounted volumes. Do not load plugins from untrusted sources in a production environment. This project is a research POC — run it only inside a hardened isolated environment (sandbox, Docker, VM), and review Docker runtime permissions.
rawllm/
├── core/
│ ├── llm/ # LLM abstraction subpackage
│ │ ├── protocol.py # LLMClientProtocol structural Protocol
│ │ ├── registry.py # LLM_PROVIDERS — single source of truth
│ │ ├── factory.py # get_llm_client(provider) factory
│ │ └── clients/
│ │ ├── anthropic.py # AnthropicClient
│ │ └── openai_compat.py# OpenAICompatibleClient (Groq, Gemini, Ollama, …)
│ ├── plugin_manager.py # Plugin loading, hot-reload, versioning, sandbox
│ ├── tool_executor.py # Tool-call routing + dependency gating
│ ├── taor_loop.py # Think → Act → Observe → Repeat loop
│ ├── config.py # Settings: trusted_plugins, allowed_requirements
│ ├── tool_management.py # Tool reranking and rejection handling (Sprint 1)
│ ├── metrics.py # Event logging with success_score and trajectory tracking
│ └── utils.py # Shared utilities + extract_imports
├── plugins/
│ └── http.py # HTTP transport plugin (port set via HTTP_PORT)
├── plugins_store/ # Versioned plugin storage (created automatically)
│ ├── current/ # Symlinks to active versions
│ └── archive/{name}/ # Previous versions with metrics snapshots
├── cli.py # CLI entry point (rawllm)
├── system_prompt.txt # LLM system prompt
└── run.py # Unified entry point (Anthropic / Groq / Gemini / Ollama / …)
MIT — use the ideas freely, fork, and improve.
└── HOLOBIONT_ROADMAP.md # Development roadmap and future phases