A dual-LLM autonomous agent that pairs a small local model (the Cerebellum) with giant cloud LLMs (the Cerebrum) to produce work that is not just intelligent, but verifiably effective in the real world.
- Why CereWorker
- Architecture
- Quick Start
- How It Works
- Self-Improvement
- Packages
- Built-in Tools
- Skills
- Cerebellum Models
- Hippocampus: Memory System
- Configuration
- Development
- Acknowledgments
- License
Most AI agents today are built on a single giant LLM. They reason well, but they have critical blind spots:
- They forget. Context windows are finite. Prompt-engineered memory (like injecting past summaries) is fragile and degrades over time. The agent loses track of what it learned three conversations ago.
- They lie about their work. A giant LLM can confidently report "I wrote the file" or "I sent the request" without the action actually succeeding. There is no independent verification layer.
- They run on external schedules. Cron jobs and timers are rigid. They don't understand "check this when the system seems idle" or "run this more frequently when things are failing." The scheduling has no intelligence.
CereWorker solves these problems by splitting the agent into two cooperating brains, modeled after the human nervous system:
- The Cerebrum (giant LLM) handles complex reasoning, planning, conversation, and tool use. It is the thinker.
- The Cerebellum (small local LLM, Qwen3 0.6B in Docker) handles coordination, verification, and persistent memory. It is the doer's watchdog.
This isn't just an architectural novelty. The key insight: a 600M-parameter model cannot reason, but it can answer "yes or no." The Cerebellum never tries to think -- it only judges simple binary facts, while deterministic code does the heavy lifting. This keeps the small model from degrading the intelligence of the real brain.
Tasks are registered with natural language hints like "every 5 minutes" or "when idle." Deterministic code parses these into intervals and handles clear-cut cases: too early? skip. Way overdue? invoke. Only in the ambiguous zone (0.8x-2.0x the interval) does the Cerebellum get asked a single binary question: "Should this task run now? yes or no." The model is a tiebreaker, not a scheduler.
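The gate described above can be sketched in a few lines (illustrative only -- the function and type names are hypothetical; the 0.8x/2.0x thresholds are from the text):

```typescript
// Deterministic scheduling gate: the small model is only consulted
// in the ambiguous zone. Names here are illustrative, not CereWorker's API.
type Verdict = "skip" | "invoke" | "ask-cerebellum";

function decideTask(elapsedSec: number, intervalSec: number): Verdict {
  const ratio = elapsedSec / intervalSec;
  if (ratio < 0.8) return "skip";    // too early: no model call
  if (ratio > 2.0) return "invoke";  // way overdue: no model call
  return "ask-cerebellum";           // 0.8x-2.0x: one yes/no question
}

// An "every 5 minutes" task (300 s interval):
console.log(decideTask(120, 300)); // → "skip"
console.log(decideTask(900, 300)); // → "invoke"
console.log(decideTask(330, 300)); // → "ask-cerebellum"
```

Everything outside the ambiguous band is pure arithmetic; the model never sees it.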
When the Cerebrum says it wrote a file, programmatic checks verify the actual effect: does the file exist? Was it modified in the last 30 seconds? Is it non-empty? These are os.path.exists() and os.stat() calls, not LLM inference. Only after all checks pass does the Cerebellum get asked one yes/no question: "Is everything OK?" If any check fails, the result is immediately flagged without touching the model.
This means the Cerebrum gets a concrete warning appended to the tool output -- [Cerebellum warning: file does not exist, not recently modified] -- and can self-correct.
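Sketched in TypeScript (the text phrases the checks in Python terms; this is an equivalent illustration, not the actual verifier):

```typescript
import * as fs from "node:fs";

// Deterministic file-effect checks of the kind described above.
// Returns warnings; an empty array means all checks passed.
function verifyFileWrite(path: string, maxAgeSec = 30): string[] {
  if (!fs.existsSync(path)) return ["file does not exist"];
  const warnings: string[] = [];
  const st = fs.statSync(path);
  if ((Date.now() - st.mtimeMs) / 1000 > maxAgeSec) {
    warnings.push("not recently modified");
  }
  if (st.size === 0) warnings.push("file is empty");
  return warnings;
}
```

Only when the array comes back empty is the Cerebellum asked its single yes/no question; otherwise the warnings go straight into the tool output.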
The Cerebellum is the agent's instinctive layer -- it handles the things that should never require conscious thought.
Exec safety. Every shell command the Cerebrum wants to run passes through an exec policy before execution. Safe read-only binaries (ls, git status, node, etc.) execute immediately. Destructive patterns (rm -rf, curl | bash, git push --force) are blocked outright. In supervised mode (default), unknown commands pause and ask the user for approval. In full-auto mode (/auto on), the Cerebellum pre-screens commands instead -- it answers "is this command safe? yes or no" the same way it answers every other binary question. Hard-block patterns (rm -rf /, mkfs, dd to block devices) are always blocked regardless of mode, because some things should never require a judgment call.
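A minimal sketch of that three-tier policy (pattern lists abbreviated and illustrative; the real policy covers far more binaries and patterns):

```typescript
// Three-tier exec policy sketch: hard-block, destructive-block, safe,
// and "ask" for everything else. Illustrative, not CereWorker's actual lists.
type ExecDecision = "allow" | "block" | "ask";

const SAFE_BINARIES = new Set(["ls", "cat", "git", "node"]);
const HARD_BLOCK = [/rm\s+-rf\s+\//, /\bmkfs\b/, /\bdd\b.*\bof=\/dev\//];
const DESTRUCTIVE = [/rm\s+-rf\b/, /curl\b.*\|\s*bash/, /git\s+push\s+--force/];

function screenCommand(cmd: string): ExecDecision {
  if (HARD_BLOCK.some((re) => re.test(cmd))) return "block"; // always, any mode
  if (DESTRUCTIVE.some((re) => re.test(cmd))) return "block";
  const bin = cmd.trim().split(/\s+/)[0];
  if (SAFE_BINARIES.has(bin)) return "allow"; // read-only fast path
  return "ask"; // supervised: ask the user; full-auto: ask the Cerebellum
}

console.log(screenCommand("git status"));      // → "allow"
console.log(screenCommand("rm -rf /"));        // → "block"
console.log(screenCommand("terraform apply")); // → "ask"
```

Note the ordering: destructive patterns are checked before the safe-binary set, so `git push --force` is blocked even though `git` itself is safe.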
Emergency stop. The Cerebellum is the only component with low enough latency to respond instantly during a crisis. When the user types /stop -- which works even while the Cerebrum is mid-stream -- the Cerebellum triggers an immediate halt: the active stream is aborted via AbortController, all running sub-agents are cancelled, and the TUI confirms the stop. The Cerebrum is a slow thinker; the Cerebellum is a fast reflex. This is the architectural reason the emergency stop exists at the Cerebellum level rather than waiting for the Cerebrum to finish its current thought.
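The reflex can be illustrated with a plain AbortController (the 10-second timer is a stand-in for a real Cerebrum stream; names are illustrative):

```typescript
// Minimal sketch of the /stop reflex: one AbortController shared between
// the streaming call and the stop handler.
const controller = new AbortController();

async function streamCerebrum(signal: AbortSignal): Promise<string> {
  try {
    await new Promise<void>((resolve, reject) => {
      const t = setTimeout(resolve, 10_000); // pretend the stream takes 10 s
      signal.addEventListener("abort", () => {
        clearTimeout(t);
        reject(new DOMException("aborted", "AbortError"));
      });
    });
    return "completed";
  } catch {
    return "stopped"; // the TUI would confirm the stop here
  }
}

// /stop fires immediately, without waiting for the stream to finish:
const result = streamCerebrum(controller.signal);
controller.abort();
result.then((r) => console.log(r)); // → "stopped"
```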
Stream watchdog. The Cerebellum monitors the main Cerebrum stream for stalls. Every 15 seconds during active streaming, it checks when the last chunk or tool call was received. If the stream goes silent beyond a configurable threshold (default 30 seconds), the Cerebellum is asked: "the stream stalled for N seconds -- should we nudge it to continue?" If yes, the stalled stream is aborted and a system message is injected telling the Cerebrum to continue where it left off. The conversation automatically resumes. This prevents the common failure mode where complex multi-tool workflows (browser automation, long API chains) cause the LLM to silently stop mid-response. The TUI shows a [STALL] indicator during detected stalls.
Persistent memory. Instead of simulating memory through prompt injection (which is lossy and context-limited), the Cerebellum periodically fine-tunes itself on conversations between the user and the agent. Knowledge is burned into model parameters, not pasted into prompts. The fine-tuning happens automatically during idle time: the Cerebellum copies itself, trains on accumulated conversations, and hot-swaps the container with updated weights. The agent genuinely learns, and that learning survives across sessions without consuming context window.
The Cerebellum is a watchdog, not a thinker. It answers "yes" or "no" -- nothing more. Deterministic code handles scheduling math, file system checks, and output validation. The model is a final sanity gate that uses 3 tokens per verdict. This ensures the 0.6B model never degrades the quality of the Cerebrum's work -- it only catches the Cerebrum's mistakes.
The Orchestrator sits at the center. It routes user messages to the Cerebrum, executes tool calls, streams responses to the TUI, and listens to heartbeat events from the Cerebellum. It also manages sub-agents via the SubAgentManager. It emits typed events (message:cerebrum:chunk, tool:start, heartbeat:tick, agent:spawned, etc.) that the UI and other components subscribe to.
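Subscribing looks roughly like this (the event names are from the text; the bare EventEmitter is a stand-in for the Orchestrator's typed emitter):

```typescript
import { EventEmitter } from "node:events";

// Stand-in for the Orchestrator's typed event surface.
const orchestrator = new EventEmitter();

orchestrator.on("message:cerebrum:chunk", (chunk: string) => {
  process.stdout.write(chunk); // stream text to the TUI as it arrives
});
orchestrator.on("tool:start", (tool: string) => {
  console.log(`\n[tool] ${tool}`);
});
orchestrator.on("agent:spawned", (id: string) => {
  console.log(`[agent] spawned ${id}`);
});

// Simulated turn:
orchestrator.emit("message:cerebrum:chunk", "Hello");
orchestrator.emit("tool:start", "shell");
orchestrator.emit("agent:spawned", "worker-1");
```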
The Cerebrum wraps Vercel AI SDK 6 to provide a unified interface across providers. Switching from Claude to GPT to Gemini to a local Ollama model is a config change. The Orchestrator owns the tool registry and shared execution runtime, while the Cerebrum adapts those tools to provider-specific payloads. This keeps shell execution, file operations, browser automation, memory tools, and sub-agent tools on one normalized execution path.
The Cerebellum runs as a Python gRPC service inside a Docker container with a configurable small LLM (Qwen3 0.6B/1.7B, SmolLM2, Phi-4 Mini, or a custom model). The TypeScript side communicates with it via streaming RPCs defined in proto/cerebellum.proto. The container manages its own model weights and supports fine-tuning via LoRA, QLoRA, or full methods on a configurable schedule. After fine-tuning, the container can be hot-swapped with updated weights without interrupting the main process.
The Hippocampus is CereWorker's temporary memory layer, inspired by the brain structure that consolidates short-term memory into long-term storage. It stores session notes, decisions, and observations in ~/.cereworker/memory/ as markdown files (MEMORY.md for curated knowledge, YYYY-MM-DD.md for daily logs). The Cerebrum reads and writes to the Hippocampus during normal conversation via memory tools. Periodically, a curator process reviews the Hippocampus and selects memories worth permanently learning -- these are extracted as training pairs and fed into the Cerebellum's fine-tuning pipeline. This is how ephemeral context becomes permanent knowledge without consuming context window.
The SubAgentManager enables the Cerebrum to spawn independent workers for parallel tasks. Each sub-agent gets its own isolated conversation, session (session.json + transcript.jsonl), and memory directory (~/.cereworker/agents/<id>/memory/). Sub-agents share the same Cerebrum provider and tool registry but cannot spawn sub-sub-agents (preventing infinite recursion). The Cerebellum monitors sub-agent health via the ReportAgentStates RPC -- deterministic checks detect stalls and timeouts, and the model answers "should we retry this stalled agent? yes/no" for ambiguous cases. The Cerebrum manages sub-agents through three tools: spawn_agent, query_agents, and cancel_agent.
Channels are pluggable IM adapters. Each implements a simple interface: start(handler), stop(), send(msg), isAllowed(senderId). The channel manager starts all enabled channels and routes inbound messages through the orchestrator, so the agent can be reached via Slack, Discord, Telegram, Matrix, Feishu, WeChat, WhatsApp, Signal, or IRC simultaneously.
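In TypeScript terms, the contract amounts to something like this (types are illustrative; the real definitions in @cereworker/channels may differ):

```typescript
// The channel contract from the text, plus a trivial in-memory adapter.
interface ChannelMessage {
  senderId: string;
  text: string;
}

interface Channel {
  start(handler: (msg: ChannelMessage) => Promise<void>): Promise<void>;
  stop(): Promise<void>;
  send(msg: ChannelMessage): Promise<void>;
  isAllowed(senderId: string): boolean;
}

class EchoChannel implements Channel {
  private handler?: (msg: ChannelMessage) => Promise<void>;
  constructor(private readonly allowlist: Set<string>) {}
  async start(handler: (msg: ChannelMessage) => Promise<void>): Promise<void> {
    this.handler = handler;
  }
  async stop(): Promise<void> {
    this.handler = undefined;
  }
  async send(msg: ChannelMessage): Promise<void> {
    console.log(`-> ${msg.senderId}: ${msg.text}`);
  }
  isAllowed(senderId: string): boolean {
    return this.allowlist.has(senderId);
  }
}
```

Each adapter enforces its own allowlist, so a disallowed sender is dropped before the orchestrator ever sees the text.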
Linux / macOS:
```
curl -fsSL https://raw.githubusercontent.com/Producible/CereWorker/main/install.sh | bash
```

Windows (PowerShell):

```
irm https://raw.githubusercontent.com/Producible/CereWorker/main/install.ps1 | iex
```

The installer detects your OS, installs Node.js if missing, installs CereWorker via npm, sets up Docker for the Cerebellum, and launches the onboarding wizard.
Prerequisites: Node.js 22+, Docker (optional, for Cerebellum)
```
npm install -g @cereworker/cli
```

CereWorker persists conversations, pairing state, plans, and fine-tune material as plain JSON/JSONL files under `~/.cereworker/`, so runs can be inspected and diffed by hand.
The easiest way to get started is the interactive onboarding wizard:
```
cereworker onboard
```

The wizard walks you through:
- LLM provider -- Anthropic, OpenAI API, OpenAI Codex (ChatGPT OAuth), Google, OpenRouter, DeepSeek, xAI, Mistral, Together, Moonshot, MiniMax, MiniMax Portal (OAuth), or local (Ollama/vLLM)
- Cerebellum model -- choose from Qwen3, SmolLM2, Phi-4 Mini, or a custom checkpoint, with hardware-aware recommendations
- Fine-tuning -- method (Auto/LoRA/QLoRA/Full) and schedule, with GPU/RAM detection
- Messaging channels -- enable Slack, Discord, Telegram, Matrix, Feishu, WeChat, WhatsApp, Signal, or IRC
- Config output -- writes `~/.cereworker/config.yaml` with env var references for secrets
For providers with multiple connection modes, onboarding groups them as provider -> type. For example, you select OpenAI, then choose either API or Codex (ChatGPT OAuth). MiniMax works the same way with API and Portal (OAuth).
After onboarding, start the agent:
```
cereworker                  # interactive TUI
cereworker serve            # headless service (for production/systemd)
cereworker images           # show local Cerebellum image and container status
cereworker images upgrade   # pull latest Cerebellum Docker image
```

On first run, CereWorker discovers its identity through conversation -- it asks what you'd like to name it, what its role is, which recurring tasks it should handle, and what communication style you prefer. The answers are saved to `~/.cereworker/instance.json` and seeded as training data for the Cerebellum's first fine-tune cycle. Each instance develops a unique identity through fine-tuning that persists across sessions.
Or configure manually:
```
mkdir -p ~/.cereworker
cat > ~/.cereworker/config.yaml << 'EOF'
cerebrum:
  defaultProvider: anthropic
  defaultModel: claude-sonnet-4-6
  providers:
    anthropic:
      apiKey: ${ANTHROPIC_API_KEY}
EOF
ANTHROPIC_API_KEY=sk-... cereworker
```

For OpenAI Codex subscription OAuth, use the dedicated provider:
```
cereworker auth openai-codex
```

```
cerebrum:
  defaultProvider: openai-codex
  defaultModel: gpt-5.4
  providers:
    openai-codex:
      auth: oauth
```

For MiniMax Portal OAuth, use the dedicated provider:
```
cereworker auth minimax-portal
```

```
cerebrum:
  defaultProvider: minimax-portal
  defaultModel: MiniMax-M2.7
  providers:
    minimax-portal:
      auth: oauth
      baseUrl: https://api.minimax.io/anthropic
```

All cloud providers support custom model IDs through onboarding's Other (enter model ID) option or by editing `cerebrum.defaultModel` manually.
| Provider | Auth | Env var | Notes |
|---|---|---|---|
| Anthropic | API key | `ANTHROPIC_API_KEY` | Claude models |
| OpenAI | API key | `OPENAI_API_KEY` | Direct OpenAI API only |
| OpenAI Codex | OAuth | — | ChatGPT/Codex subscription flow via `cereworker auth openai-codex` |
| Google | API key or OAuth | `GOOGLE_API_KEY` | Gemini models |
| OpenRouter | API key | `OPENROUTER_API_KEY` | Default model is `auto` |
| DeepSeek | API key | `DEEPSEEK_API_KEY` | Uses DeepSeek OpenAI-compatible API |
| xAI | API key | `XAI_API_KEY` | Grok models with xAI-specific tool compatibility |
| Mistral | API key | `MISTRAL_API_KEY` | Uses stricter Mistral tool-call ID sanitation |
| Together | API key | `TOGETHER_API_KEY` | Curated Together-hosted open models |
| Moonshot | API key | `MOONSHOT_API_KEY` | Global/CN endpoint choice is stored in `baseUrl` |
| MiniMax | API key | `MINIMAX_API_KEY` | Global/CN endpoint choice is stored in `baseUrl`; Anthropic-compatible transport |
| MiniMax Portal | OAuth | — | Browser/device OAuth flow via `cereworker auth minimax-portal`; endpoint choice is stored in `baseUrl` |
| Local | none | — | Ollama, vLLM, or other local OpenAI-compatible endpoint |
```
git clone https://github.com/Producible/CereWorker.git
cd CereWorker
pnpm install
pnpm build
pnpm start
```

The onboarding wizard (`cereworker onboard`) automatically pulls the Cerebellum Docker image. To start it manually:
```
docker pull cereworker/cerebellum
docker run -d --name cereworker-cerebellum -p 50051:50051 cereworker/cerebellum
```

Or from source:

```
docker compose up -d cerebellum
```

CereWorker checks for new versions on startup. If a newer release is available on npm, the banner shows an update notice with the install command. The check is cached for 24 hours and runs in the background (no startup delay). `cereworker -v` also shows available updates.
When a new Cerebellum image is published, CereWorker automatically pulls it in the background on startup. To update the image explicitly:
```
cereworker images upgrade
```

This pulls the latest `cereworker/cerebellum` image from Docker Hub and removes the old container so it gets recreated with the new image on next start. To check the current image status:

```
cereworker images
```

Add channel config to `~/.cereworker/config.yaml`:
```
channels:
  slack:
    enabled: true
    botToken: xoxb-...
    appToken: xapp-...
  discord:
    enabled: true
    token: ...
    applicationId: "..."    # enables / slash command autocomplete
  telegram:
    enabled: true
    token: ...
  matrix:
    enabled: true
    homeserver: https://matrix.org
    token: ...
    userId: "@bot:matrix.org"
  feishu:
    enabled: true
    appId: cli_...
    appSecret: ...
    verificationToken: ...  # optional
    encryptKey: ...         # optional
  wechat:
    enabled: true
    puppet: wechaty-puppet-wechat4u   # or other puppet provider
    token: ...              # optional, depends on puppet
  whatsapp:
    enabled: true           # scans QR code on first run
  signal:
    enabled: true
    account: "+15551234567"               # E.164 phone number
    signalCliUrl: http://127.0.0.1:8080   # signal-cli REST daemon
  irc:
    enabled: true
    host: irc.libera.chat
    port: 6697
    tls: true
    nick: cereworker
    channels: ["#mychannel"]
```

When you type a message in the TUI:
- The Orchestrator appends it to the conversation and calls the Cerebrum
- The Cerebrum streams its response via AI SDK, emitting text chunks to the TUI in real-time
- If the Cerebrum decides to use a tool (shell, file, browser), the tool executes and the result feeds back into the LLM for the next reasoning step
- The final response is appended to the conversation
- Asynchronously, the Cerebellum is notified of the completed turn (for monitoring and future fine-tuning)
Running in parallel:
- The Cerebellum's heartbeat engine ticks every N seconds (configurable, default 30s)
- For each registered task, deterministic code checks elapsed time against the schedule: too early (< 0.8x interval)? skip. Way overdue (> 2.0x interval)? invoke. No model needed
- Only for tasks in the ambiguous zone does the Cerebellum get asked: "Should this run now? yes/no" (3 tokens, binary verdict)
- "Invoke" actions stream back to the TypeScript orchestrator via gRPC server-streaming
- The orchestrator executes the invoked tasks (which may trigger Cerebrum calls, tool runs, or notifications)
The Cerebrum can spawn independent sub-agents for parallel work:
- The Cerebrum calls `spawn_agent` with a task description, creating an async worker. Set `longRunning: true` for tasks that may take hours or days (unlimited timeout, session preserved for recovery)
- Each sub-agent gets its own isolated conversation, session directory, and Hippocampus memory at `~/.cereworker/agents/<id>/memory/`
- Sub-agents share the same Cerebrum provider and tools (except they cannot spawn further sub-agents). Long-running agents call `report_progress` to report intermediate status
- The Cerebellum is the sole lifecycle manager for sub-agents -- there are no hard timeouts on the TypeScript side. Every heartbeat tick, the Cerebellum evaluates agent health via `ReportAgentStates` with progress-aware logic:
  - Active: agent has recent activity or progress reports -- no action needed
  - Stalled (no progress): deterministic stall detection, LLM tiebreaker for ambiguous cases ("should we retry? yes/no")
  - Past deadline + making progress: LLM asked "agent exceeded deadline but reports progress -- should it continue?" before killing
  - Past deadline + no progress: timed out deterministically
- Corrective actions are applied automatically: `ping` (inject a prod message), `retry` (re-spawn), or `cancel`
- The Cerebrum can query agent status (including progress) via `query_agents` and cancel via `cancel_agent`. Users can monitor and stop agents via `/agents stop <id>` and `/agents info <id>`
- Restart recovery: if CereWorker restarts while agents are running, they are automatically resumed from their persisted `session.json` and text-backed conversation files. A resume system message is injected so the agent can continue where it left off
- Completed agents with `cleanup: "delete"` have their session directories removed; `cleanup: "keep"` agents (and all `longRunning` agents) persist for later reference
Long conversations no longer crash. Before each Cerebrum call, the orchestrator estimates total tokens (chars/4 heuristic with a 1.2x safety margin). When the estimate exceeds a configurable threshold (default 80% of the model's context window), older messages are summarized into a single system message via an LLM call, and only the most recent messages (default 10) are kept verbatim. This happens transparently -- the user sees no interruption.
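The trigger math is simple enough to sketch (illustrative, matching the heuristic described above):

```typescript
// Compaction trigger: estimate tokens as chars/4 with a 1.2x safety
// margin, compact once past threshold * contextWindow. Illustrative only.
function shouldCompact(
  messages: string[],
  contextWindow: number,
  threshold = 0.8
): boolean {
  const chars = messages.reduce((n, m) => n + m.length, 0);
  const estimatedTokens = (chars / 4) * 1.2;
  return estimatedTokens > contextWindow * threshold;
}

// With a 128k-token window, compaction triggers past ~102,400 estimated tokens:
console.log(shouldCompact(["x".repeat(400_000)], 128_000)); // → true
console.log(shouldCompact(["hello"], 128_000));             // → false
```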
Configure via:

```
cerebrum:
  contextWindow: 128000      # model context window size (tokens)
  compaction:
    enabled: true
    threshold: 0.8           # compact at 80% of context window
    keepRecentMessages: 10   # keep last N messages verbatim
```

Shell command execution has two operating modes, toggled via `/auto [on|off]`:
- Supervised mode (default): Safe read-only commands (`ls`, `cat`, `git status`, `node`, etc.) execute automatically. Destructive patterns (`rm -rf`, `curl | bash`, `git push --force`, etc.) are blocked outright. Unknown commands prompt the user for approval before executing.
- Full-auto mode: All commands execute without prompts. The Cerebellum pre-screens commands as a safety net. Hard-block patterns (`rm -rf /`, `mkfs`, `dd if=... of=/dev/...`) are always blocked regardless of mode.
Type /stop at any time -- even while the Cerebrum is streaming -- to immediately abort all operations. The orchestrator cancels the active stream, terminates all running sub-agents, and emits an emergency:stop event. The TUI confirms the stop.
The Cerebellum monitors the main Cerebrum stream for stalls. During active streaming, a watchdog checks every 15 seconds whether any chunks or tool results have been received. If the stream goes silent beyond the threshold (default 30 seconds), the Cerebellum is asked: "should we nudge it to continue?" If yes, the stalled stream is aborted via AbortSignal, a system message is injected, and the conversation automatically resumes. The TUI shows a [STALL] indicator during detected stalls.
Configure via:
```
cerebrum:
  streamStallThreshold: 30   # seconds before stall detection
  maxNudgeRetries: 2         # max nudge attempts per turn
```

This prevents the common failure mode where complex multi-tool workflows (browser automation, long API chains) cause the LLM to silently stop mid-response.
CereWorker supports a hub-spoke topology where one instance (the gateway) coordinates multiple instances (nodes) on other machines. The gateway's Cerebrum can call tools on any connected node as if they were local.
Gateway mode: One CereWorker instance runs the Cerebrum and starts a WebSocket server. Nodes connect and register their local tools. The gateway creates proxy tools (shell@workstation-gpu, file@nas-server, etc.) that the Cerebrum can call during normal reasoning. Tool invocations are forwarded over WebSocket, executed on the remote node (respecting the node's local exec-policy), and results flow back transparently.
Node mode: A CereWorker instance connects to a gateway and exposes its local tools. It receives invoke frames, executes them locally, and returns results. Emergency stop commands propagate from gateway to all nodes.
Standalone mode (default): No gateway behavior. Everything works as a single-machine agent.
Configure via:
```
# Gateway instance
gateway:
  mode: gateway
  port: 18800
  token: ${GATEWAY_TOKEN}   # shared secret for node auth
```

```
# Node instance
gateway:
  mode: node
  gatewayUrl: ws://gateway-host:18800
  nodeId: workstation-gpu
  token: ${GATEWAY_TOKEN}
  capabilities: [shell, file]   # tools to expose (empty = all)
```

Use `/nodes` in the TUI to see connected nodes (gateway mode) or connection status (node mode). For remote access, use Tailscale, WireGuard, or SSH tunnels -- do not expose the WebSocket port on the public internet without TLS.
CereWorker's browser tools support three backend modes, configured via tools.browser.mode:
Launch mode (default): Starts a Puppeteer-managed browser. Set headless: false to see the browser window. Best for automated scraping and testing where no existing session is needed.
Connect mode: Attaches to a running Chrome instance via CDP. Start Chrome with --remote-debugging-port=9222 and set mode: connect. Useful when you need access to an already-authenticated browser session from the command line.
Extension mode: Controls the user's actual Chrome browser through the CereWorker Browser Bridge extension. The agent sees the same tabs, cookies, and sessions as the user -- ideal for tasks like posting to social media, managing dashboards, or any workflow that requires an authenticated browser.
The extension is a static folder (no build step) located at packages/browser/extension/ in the source repo.
From source:

```
git clone https://github.com/Producible/CereWorker.git
# The extension is at CereWorker/packages/browser/extension/
```

From npm install:
```
# Print the extension directory path
cereworker extension-dir
```

Then load it into Chrome:
- Open `chrome://extensions` in Chrome
- Enable Developer mode (toggle in the top-right corner)
- Click Load unpacked and select the `extension/` directory
- The CereWorker Browser Bridge icon appears in your toolbar
Click the extension icon's three-dot menu -> Options (or right-click the icon -> Options):
- Relay Port: Must match your CereWorker config (default `18900`)
- Token: Must match `tools.browser.extension.token` in your config (leave empty if no token is set)
- Click Save & Test Connection to verify the relay server is reachable
Option A — Interactive CLI:
```
cereworker configure browser
# Select "Extension", set relay port and token interactively
```

Option B — Manual config (`~/.cereworker/config.yaml`):
```
tools:
  browser:
    mode: extension
    extension:
      relayPort: 18900
      # token: my-secret   # optional, must match extension options
```

Then start CereWorker:
```
cereworker
```

- Click the extension icon in Chrome -- the badge indicates status:
  - `ON` (green): Connected to CereWorker relay
  - `...` (yellow): Connecting
  - `!` (red): Connection error
  - Click again to disconnect
- The TUI status bar shows `[EXT]` when the extension is active
The agent can now navigate, click, type, take screenshots, and manage tabs in your actual browser. It automatically prefers httpFetch for API calls and only uses browser tools for pages requiring JavaScript rendering or interactive elements.
For production deployments, CereWorker can run without the TUI as a persistent background service:
```
cereworker serve
```

This starts the orchestrator, channels, and gateway/node connections headlessly. All log levels write to stderr (captured by journalctl) and to `~/.cereworker/cereworker.log`. A health HTTP endpoint is exposed for monitoring:
| Mode | Health endpoint | Port |
|---|---|---|
| Gateway | `/healthz` | `18801` (WS server uses `18800`) |
| Node | `/healthz` | `18800` |
| Standalone | `/healthz` | `18800` |
Systemd unit templates are provided in systemd/:
sudo cp systemd/cereworker-gateway.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now cereworker-gatewayThe service handles SIGTERM/SIGINT for graceful shutdown and restarts on failure. Both unit files include systemd hardening (NoNewPrivileges, ProtectSystem=strict, ProtectHome=read-only).
When a message arrives from Slack/Discord/Telegram/Matrix/Feishu/WeChat/WhatsApp/Signal/IRC:
- The channel adapter receives it and checks the sender against the allowlist
- If allowed, it forwards the message text to the orchestrator
- The orchestrator processes it the same way as a TUI message (Cerebrum reasoning + tools)
- The response is sent back through the same channel adapter
Slash command autocomplete: Discord registers Application Commands on startup (when applicationId is set), so users see all CereWorker commands in the / menu. Telegram registers commands via setMyCommands() for the same autocomplete experience. Other platforms process /command as plain text.
| Aspect | Traditional agent | CereWorker |
|---|---|---|
| Memory | Prompt injection, vector DB search | Fine-tuned into Cerebellum parameters |
| Scheduling | Cron expressions, fixed timers | Small LLM evaluates what needs attention |
| Verification | Trust LLM output | Cerebellum monitors actual disk/network effects |
| Context limits | Summarize and hope | Knowledge survives in model weights |
| Sub-agents | Hard timeouts, lost on restart | Cerebellum-managed lifecycle with progress tracking and restart recovery |
| Multi-node | Custom gateway or none | Built-in WebSocket gateway with remote tool proxying |
| Identity | Static config, swappable profiles | Learned through conversation, fine-tuned into weights |
| Cost | Every request hits giant LLM | Routine decisions handled by local 0.6B model |
Most AI agents today are static -- they improve only when a developer ships a new prompt or a new model version drops. CereWorker is designed to improve itself continuously through architectural mechanisms that no amount of prompt engineering can replicate.
Traditional agents simulate memory by stuffing summaries into the context window. This is lossy, expensive, and bounded by token limits. CereWorker's Hippocampus-to-Cerebellum pipeline turns conversations into actual model weight updates. Knowledge isn't recalled -- it's recognized, the way a person doesn't "look up" how to ride a bike. After fine-tuning, the Cerebellum responds faster, uses zero context tokens for learned knowledge, and retains it permanently across sessions.
A cron job runs whether it's useful or not. CereWorker's Heartbeat asks the Cerebellum "what needs attention right now?" every tick. As the Cerebellum fine-tunes on past decisions -- which tasks were invoked, which were pointless, which were deferred too long -- its scheduling judgment improves. The agent learns when to act, not just what to do.
When the Cerebellum catches the Cerebrum lying (e.g., claiming a file was written when it wasn't), that failure becomes a training signal. Over time, the Cerebellum builds an internal model of which tool outputs to trust and which to double-check. This is a safety property that emerges from architecture, not from a system prompt saying "please verify your work."
Every user's CereWorker diverges. A developer's Cerebellum learns to schedule builds and check test results. A researcher's learns to watch for new papers and summarize findings. A DevOps engineer's learns to correlate deploy times with incident frequency. The fine-tuning is unsupervised and domain-agnostic -- the architecture doesn't know what you do, but the weights will.
Other agents get more expensive as they get smarter (longer prompts, more retrieval, bigger context). CereWorker gets cheaper: knowledge that moves from Hippocampus files into Cerebellum weights no longer needs to be fetched, embedded, or injected. The more the agent learns, the less work the Cerebrum has to do per request.
| | Prompt-Engineered Agents | CereWorker |
|---|---|---|
| Memory | Injected into context (lossy, bounded) | Fine-tuned into weights (permanent, zero-cost at inference) |
| Scheduling | Static rules or rigid tool calls | Small LLM that improves its own judgment over time |
| Verification | "Please double-check" in system prompt | Independent model monitors actual side effects |
| Specialization | Same generic agent for every user | Weights diverge per user's domain and habits |
| Cost over time | Grows (more context, more retrieval) | Shrinks (learned knowledge exits the prompt) |
| Package | npm | Description |
|---|---|---|
| `@cereworker/cli` | | Ink 5 terminal UI |
| `@cereworker/core` | | Orchestrator, message model, typed events, conversation store |
| `@cereworker/cerebrum` | | AI SDK 6 multi-provider LLM abstraction + built-in tools |
| `@cereworker/cerebellum-client` | | gRPC client for the Cerebellum container |
| `@cereworker/channels` | | IM adapters (Slack, Discord, Telegram, Matrix, Feishu, WeChat, WhatsApp, Signal, IRC) |
| `@cereworker/browser` | | Browser automation (Puppeteer, CDP, Chrome extension) |
| `@cereworker/skills` | | SKILL.md plugin loader and registry |
| `@cereworker/hippocampus` | | Temporary memory store, memory tools, fine-tune curator |
| `@cereworker/gateway` | | WebSocket gateway for multi-node control |
| `@cereworker/config` | | YAML config with Zod validation, env var interpolation |
- Shell & File Operations -- Execute commands, read/write files, edit files by exact match, list directories, search file contents (grep/ripgrep), find files by glob pattern. Shell execution is governed by the exec policy: safe binaries (`ls`, `git`, `node`, etc.) auto-execute, destructive patterns are blocked, and unknown commands prompt for approval in supervised mode
- Browser Automation -- Navigate, screenshot, click, type, evaluate JS, wait for elements, manage tabs (list/switch/open/close). Three backend modes: launch (Puppeteer headless), connect (CDP to running Chrome via `--remote-debugging-port`), or extension (control the user's actual Chrome session via the CereWorker Browser Bridge extension)
- HTTP & Web Search -- Fetch URLs (`httpFetch`) with timeout and private-IP blocking. Search the web (`webSearch`) via DuckDuckGo, no API key required
- Memory (Hippocampus) -- Read/write MEMORY.md, append daily logs, search across memory files
- Sub-Agents -- Spawn parallel workers (`spawn_agent` with optional `longRunning: true` for hours/days tasks), check status and progress (`query_agents`), cancel (`cancel_agent`). Each sub-agent has isolated memory and session. Long-running agents call `report_progress` to report intermediate status. Agents are automatically recovered after a restart
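The private-IP guard on httpFetch can be sketched like this (covers common IPv4 private, loopback, and link-local ranges only; illustrative, not the actual implementation):

```typescript
// Reject URLs resolving to private/loopback/link-local IPv4 addresses
// before fetching. Illustrative sketch only.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (
    parts.length !== 4 ||
    parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)
  ) {
    return false; // not a valid dotted quad
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // link-local
  );
}

console.log(isPrivateIPv4("192.168.1.10")); // → true
console.log(isPrivateIPv4("8.8.8.8"));      // → false
```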
Skills are defined as SKILL.md files with YAML frontmatter:
```
---
name: github
description: "GitHub operations via gh CLI"
metadata:
  cereworker:
    requires:
      bins: ["gh"]
---

# GitHub Skill

Use the `gh` CLI to interact with GitHub...
```

Place skills in `~/.cereworker/skills/` or the project's `skills/` directory.
The Cerebellum supports multiple small LLMs, selectable during onboarding or via config:
| Model | HuggingFace ID | Size | Min RAM | Best for |
|---|---|---|---|---|
| Qwen3 0.6B | `Qwen/Qwen3-0.6B` | ~1.2 GB | 2 GB | CPU-only, low-memory systems |
| Qwen3 1.7B | `Qwen/Qwen3-1.7B` | ~3.4 GB | 4 GB | CPU with 8+ GB RAM |
| SmolLM2 360M | `HuggingFaceTB/SmolLM2-360M-Instruct` | ~720 MB | 1.5 GB | Ultra-lightweight, fastest |
| SmolLM2 1.7B | `HuggingFaceTB/SmolLM2-1.7B-Instruct` | ~3.4 GB | 4 GB | Good balance of speed and quality |
| Phi-4 Mini 3.8B | `microsoft/Phi-4-mini-instruct` | ~7.6 GB | 8 GB | GPU recommended, best quality |
| Custom | local path | varies | varies | Your own fine-tuned checkpoint |
Fine-tuning methods: Auto (detects your hardware), LoRA (GPU 4+ GB VRAM), QLoRA (GPU 2+ GB VRAM), Full (CPU with 16+ GB RAM or GPU with 8+ GB VRAM). Full fine-tuning works on CPU-only machines using float32 precision. Schedule: Auto (idle time), Hourly, Daily, or Weekly.
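The `auto` method can be pictured as a simple hardware check against the thresholds listed above. A hypothetical TypeScript sketch (the function and its ordering are illustrative, not the real selector):

```typescript
type FinetuneMethod = "lora" | "qlora" | "full";

// Illustrative auto-selection mirroring the documented hardware thresholds:
// full fine-tuning on a GPU with 8+ GB VRAM or a CPU with 16+ GB RAM,
// LoRA for GPUs with 4+ GB VRAM, QLoRA for GPUs with 2+ GB VRAM.
function pickFinetuneMethod(gpuVramGb: number, cpuRamGb: number): FinetuneMethod | null {
  if (gpuVramGb >= 8) return "full";  // GPU full fine-tuning
  if (gpuVramGb >= 4) return "lora";
  if (gpuVramGb >= 2) return "qlora";
  if (cpuRamGb >= 16) return "full";  // CPU-only full fine-tuning in float32
  return null;                        // hardware too constrained; skip fine-tuning
}
```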
The Hippocampus is CereWorker's temporary memory layer that bridges conversations and fine-tuning:
```
~/.cereworker/memory/
  MEMORY.md          # Curated long-term notes (always loaded)
  2026-03-08.md      # Today's session log
  2026-03-07.md      # Yesterday's log
~/.cereworker/conversations/
  <conversation-id>/
    meta.json
    messages.jsonl
~/.cereworker/pairing/
  requests.jsonl
  approved-users.jsonl
~/.cereworker/plans/
  <plan-id>.json
~/.cereworker/finetune/
  queue/
    discovery.jsonl
    conversations.jsonl
    curated-memory.jsonl
  rounds/
    <job-id>/
      manifest.json
      training.jsonl
      sources/
        discovery.jsonl
        conversations.jsonl
        curated-memory.jsonl
```
The Cerebrum reads and writes memory through four tools: `memory_read`, `memory_write`, `memory_log`, and `memory_search`. Periodically, a curator reviews the Hippocampus and asks the Cerebrum: "Which of these memories contain durable knowledge worth permanently learning?" The answer is extracted as instruction/response training pairs and queued for the Cerebellum's fine-tuning pipeline.
This creates a natural flow: conversation --> Hippocampus (files) --> curation (Cerebrum) --> fine-tuning queue --> per-round archive --> permanent knowledge (model weights).
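The pairs that land in the fine-tuning queue can be pictured as JSONL records. A hypothetical sketch of how one might be serialized (the field names and example content are illustrative; the actual schema is internal to CereWorker):

```typescript
// Illustrative shape of one curated training pair destined for
// a file like queue/curated-memory.jsonl (one JSON object per line).
interface TrainingPair {
  instruction: string;
  response: string;
  source: "discovery" | "conversations" | "curated-memory";
}

// Serializes a pair into a single JSONL line.
function toJsonlLine(pair: TrainingPair): string {
  return JSON.stringify(pair);
}

const line = toJsonlLine({
  instruction: "What commit message style does the user prefer?",
  response: "Imperative, one-line commit messages under 72 characters.",
  source: "curated-memory",
});
```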
On first start after upgrading from the old SQLite-backed layout, CereWorker exports legacy conversations.db data into the text layout above and keeps the original file as conversations.db.bak.
Config is loaded with cascading precedence:

- Built-in defaults
- `~/.cereworker/config.yaml` (global)
- `./.cereworker.yaml` (project-local)
- Environment variables (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `OPENROUTER_API_KEY`, `DEEPSEEK_API_KEY`, `XAI_API_KEY`, `MISTRAL_API_KEY`, `TOGETHER_API_KEY`, `MOONSHOT_API_KEY`, `MINIMAX_API_KEY`)
- CLI flags
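Cascading precedence means each layer is merged over the previous ones, key by key. A minimal TypeScript sketch of such a merge, assuming later layers win (illustrative only; the real loader also parses YAML and expands `${VAR}` references):

```typescript
type Config = { [key: string]: unknown };

// Deep-merges config layers in precedence order: later layers win.
// Nested objects are merged recursively; scalars and arrays are replaced.
function mergeLayers(...layers: Config[]): Config {
  const out: Config = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      const prev = out[key];
      if (
        value && typeof value === "object" && !Array.isArray(value) &&
        prev && typeof prev === "object" && !Array.isArray(prev)
      ) {
        out[key] = mergeLayers(prev as Config, value as Config);
      } else {
        out[key] = value;
      }
    }
  }
  return out;
}
```

Called as `mergeLayers(defaults, globalFile, projectFile, envOverrides, cliFlags)`, a project-local setting overrides the global file while unrelated keys survive from every layer.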
Full config example:

```yaml
# profile is learned through discovery on first run
# override here if needed:
# profile:
#   name: Cere
#   role: full-stack developer
#   traits: [concise, proactive]

cerebrum:
  defaultProvider: anthropic
  defaultModel: claude-sonnet-4-6
  providers:
    anthropic:
      apiKey: ${ANTHROPIC_API_KEY}
    # direct OpenAI API usage:
    # openai:
    #   apiKey: ${OPENAI_API_KEY}
    # OpenRouter:
    # openrouter:
    #   apiKey: ${OPENROUTER_API_KEY}
    # DeepSeek:
    # deepseek:
    #   apiKey: ${DEEPSEEK_API_KEY}
    # xAI:
    # xai:
    #   apiKey: ${XAI_API_KEY}
    # Mistral:
    # mistral:
    #   apiKey: ${MISTRAL_API_KEY}
    # Together:
    # together:
    #   apiKey: ${TOGETHER_API_KEY}
    # Moonshot global or CN:
    # moonshot:
    #   apiKey: ${MOONSHOT_API_KEY}
    #   baseUrl: https://api.moonshot.ai/v1
    # MiniMax global or CN:
    # minimax:
    #   apiKey: ${MINIMAX_API_KEY}
    #   baseUrl: https://api.minimax.io/anthropic
    # ChatGPT/Codex subscription OAuth:
    # openai-codex:
    #   auth: oauth
    # MiniMax Portal OAuth:
    # minimax-portal:
    #   auth: oauth
    #   baseUrl: https://api.minimax.io/anthropic
  contextWindow: 128000
  streamStallThreshold: 30  # seconds before Cerebellum nudges a stalled stream
  maxNudgeRetries: 2        # max nudge attempts per turn
  compaction:
    enabled: true
    threshold: 0.8
    keepRecentMessages: 10

cerebellum:
  enabled: true
  model:
    source: huggingface
    id: Qwen/Qwen3-0.6B
  finetune:
    enabled: true
    method: auto    # auto | lora | qlora | full
    schedule: auto  # auto | hourly | daily | weekly
  docker:
    autoStart: true

hippocampus:
  enabled: true
  directory: ~/.cereworker/memory
  maxDailyLogDays: 30
  autoLog: true

subAgents:
  enabled: true
  maxConcurrent: 5
  defaultTimeoutMinutes: 5
  monitorIntervalSeconds: 30

tools:
  shell:
    autoMode: false  # true = full-auto (no approval prompts)
  browser:
    enabled: true
    mode: launch     # launch | connect | extension
    headless: true
    cdpPort: 9222    # for connect mode
    extension:
      relayPort: 18900  # for extension mode
      # token: ${BROWSER_TOKEN}  # optional shared secret
  runtime:
    engine: enhanced  # enhanced | legacy
    maxResultChars: 20000
    loopDetection:
      enabled: false
      warningThreshold: 10
      criticalThreshold: 20

tui:
  showActivity: true  # false = hide tool call details and progress chatter
```
`tools.runtime.engine: enhanced` is the default tool runtime. It adds provider-aware tool-schema normalization, transcript repair, replay truncation, and loop detection while keeping the existing tool names and CLI UX unchanged. Set `legacy` only if you need the older behavior for compatibility or debugging.
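The loop detection it enables can be understood as tracking consecutive identical tool calls against the configured thresholds. A hypothetical TypeScript sketch of the idea (not the actual implementation):

```typescript
type LoopStatus = "ok" | "warning" | "critical";

// Counts how many times the same (tool, args) signature repeats in a row
// and maps the streak onto warning/critical thresholds, mirroring the
// warningThreshold/criticalThreshold config keys.
class LoopDetector {
  private lastSignature = "";
  private streak = 0;
  constructor(private warningThreshold = 10, private criticalThreshold = 20) {}

  record(tool: string, args: unknown): LoopStatus {
    const signature = `${tool}:${JSON.stringify(args)}`;
    this.streak = signature === this.lastSignature ? this.streak + 1 : 1;
    this.lastSignature = signature;
    if (this.streak >= this.criticalThreshold) return "critical";
    if (this.streak >= this.warningThreshold) return "warning";
    return "ok";
  }
}
```

Any change in tool name or arguments resets the streak, so normal varied tool use never trips the detector.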
```yaml
gateway:
  mode: standalone  # standalone | gateway | node
  port: 18800
  # token: ${GATEWAY_TOKEN}      # shared secret for node auth
  # gatewayUrl: ws://host:18800  # node mode: gateway address
  # nodeId: my-node              # node mode: unique identifier
  # capabilities: [shell, file]  # node mode: tools to expose

channels:
  telegram:
    enabled: true
    token: ${TELEGRAM_BOT_TOKEN}
```

```bash
pnpm install       # install deps
pnpm build         # build all packages
pnpm test          # run the full local test suite
pnpm typecheck     # type-check without emitting
pnpm dev           # run CLI in dev mode (tsx)
pnpm dev -- serve  # run headless service in dev mode
```

```bash
pnpm test:unit             # unit and focused integration tests
pnpm test:e2e:integration  # in-process service/orchestrator end-to-end tests
pnpm test:e2e:cli          # built CLI smoke tests
pnpm test:e2e:install      # clean-install smoke against a published package
```

The test layers are split intentionally:

- `test:unit` covers the normal Vitest suite without the heavier end-to-end cases.
- `test:e2e:integration` exercises the real `createService(...)` bridge with fake providers and channel inputs, including watchdog retry behavior and channel conversation routing.
- `test:e2e:cli` builds the CLI and spawns the real `cereworker` binary to cover flows like `cereworker -v`, `cereworker images`, `cereworker images upgrade`, `cereworker serve`, and rerunning onboarding with "keep current configuration".
- `test:e2e:install` installs a published CLI into a temporary prefix and runs a blank-machine smoke flow. Override the package under test with `CEREWORKER_PACKAGE_SPEC`, for example:

```bash
CEREWORKER_PACKAGE_SPEC=@cereworker/cli@latest pnpm test:e2e:install
```

GitHub Actions runs the new test layers in two stages:

- `.github/workflows/ci.yml` runs on pull requests and pushes to `main`, and gates changes on `typecheck`, `test:unit`, `test:e2e:integration`, and `test:e2e:cli`.
- `.github/workflows/install-smoke.yml` runs nightly, on manual dispatch, and on release tags. It validates a fresh npm install of the published CLI and performs a Docker image smoke check against the public Cerebellum image.
CereWorker is built around a practical agent architecture focused on channels, skills, persistent memory, and autonomous task execution, extended here with the Cerebellum/Cerebrum dual-LLM design.
MIT

