An autonomous AI thinking agent that runs 24/7. It maintains a priority-based stimulus queue, generates thoughts via LLM endpoints, filters repetitive output through anti-rumination detection, rotates across multiple model servers, and persists a stream of thoughts with quality scoring.
Most AI systems only think when asked. This daemon thinks continuously -- processing events, generating observations, making predictions, and building a persistent stream of scored thoughts. It is designed for systems that benefit from ambient intelligence: a background process that watches, thinks, and surfaces insights without being prompted.
The anti-rumination system is the key innovation. Without it, an autonomous LLM tends to loop on the same observations, restating them in slightly different words. The daemon uses topic signatures and concept overlap detection to suppress repetitive thoughts and push each thinking cycle toward something new.
                    +-------------------+
                    |  Context Sources  |
                    |    (pluggable)    |
                    +--------+----------+
                             |
                             v
                    +-------------------+
                    |   Event Poller    |
                    |  (60s interval)   |
                    +--------+----------+
                             |
                 creates Stimulus objects
                             |
                             v
+----------+        +-------------------+        +-------------------+
|  Timer   |------->|  Priority Queue   |------->|    LLM Client     |
| Prompts  |        |  P1=reactive      |        |  (multi-endpoint  |
| (5min,   |        |  P2=scheduled     |        |   rotation with   |
|  4hr)    |        |  P3=deliberative  |        |   health checks)  |
+----------+        |  P4=deep          |        +--------+----------+
                    +-------------------+                 |
                                                          v
                                                 +-------------------+
                                                 |  Anti-Rumination  |
                                                 |  Filter           |
                                                 | - topic sigs      |
                                                 | - concept overlap |
                                                 | - pattern detect  |
                                                 +--------+----------+
                                                          |
                                                   pass / discard
                                                          |
                                                          v
                                                 +-------------------+
                                                 |  Thought Stream   |
                                                 | (JSONL + scoring) |
                                                 +-------------------+
                                                          |
                                                          v
                                                 +-------------------+
                                                 |  Briefing Writer  |
                                                 | (30min summaries) |
                                                 +-------------------+
- Priority Queue: Stimuli are prioritized (P1 reactive > P2 scheduled > P3 deliberative > P4 deep). Reactive events (new data, alerts) preempt scheduled thinking.
- Multi-Endpoint LLM Client: Supports both OpenAI-compatible and Ollama endpoints. Auto-discovers models, rotates on failure, cools down unhealthy endpoints.
- Anti-Rumination Filter: Three-layer deduplication:
  - Topic signatures: Extracts key concepts, identifiers, and named entities from each thought
  - Concept overlap: Computes a Jaccard-style overlap between new and recent thoughts, normalized by the smaller topic set (overlap of 0.55 or more = duplicate)
  - Pattern detection: Catches self-narration patterns ("You are thinking about...")
- Quality Scoring: Each thought gets an importance score (0-10) and a self-assessed quality score (1-5) based on novelty, length, and content type.
- Thought Stream: Append-only JSONL file with full metadata (timestamp, model, latency, scores, stimulus info). Weekly rotation.
- Rolling Briefings: Every 30 minutes, the daemon summarizes its recent thoughts into a briefing document for other systems to consume.
- Deliberative Prompts: Rotating set of thinking prompts that force diverse cognitive modes (analytical, contrarian, forward-looking, meta-cognitive, creative).
- Self-Healing: Watches its own thought output rate. If no thought is produced for 30 minutes, resets HTTP clients and re-probes all endpoints.
- Memory Leak Protection: Monitors RSS and self-restarts if memory exceeds threshold.
- HTTP API: Local REST API for health checks, state inspection, pause/resume, and injecting session context.
- Pause Windows: Configurable time windows where the daemon pauses (e.g., during scheduled reflection jobs).
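The priority ordering in the feature list can be sketched with Python's `heapq`. This is a minimal illustration under stated assumptions, not the daemon's actual implementation: the `Stimulus` fields follow the stimulus example in the extension section of this README, while `StimulusQueue` and its tie-breaking counter are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Stimulus:
    priority: int      # 1=reactive, 2=scheduled, 3=deliberative, 4=deep
    layer: str
    trigger: str
    description: str


class StimulusQueue:
    """Hypothetical min-heap queue: a lower priority number is served first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within one priority

    def push(self, stimulus):
        heapq.heappush(self._heap, (stimulus.priority, next(self._counter), stimulus))

    def pop(self):
        """Return the most urgent stimulus, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = StimulusQueue()
queue.push(Stimulus(3, "deliberative", "timer", "What patterns span the last 24 hours?"))
queue.push(Stimulus(1, "reactive", "alert", "Error rate spiked"))
first = queue.pop()
print(first.layer)  # the reactive stimulus is served before the deliberative one
```

In this sketch, "preempt" only means that a reactive stimulus jumps ahead of queued work; it does not interrupt an LLM call that is already in flight.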
# Install dependencies
pip install requests
# Configure at least one LLM endpoint
export THINKING_DAEMON_LLM_URL="http://localhost:8800/v1/chat/completions"
export THINKING_DAEMON_MODEL="your-model-name"
# Create a system prompt
cat > system_prompt.md << 'EOF'
You are a thinking daemon. You observe data, make predictions, and surface insights.
When you have an important observation, prefix it with SAVE: to persist it.
When you make a prediction, prefix it with PREDICT: to track it.
Keep thoughts concise (20-80 words). Be specific. Reference real data.
EOF
# Run
python3 thinking_daemon.py --system-prompt system_prompt.md

| Variable | Default | Description |
|---|---|---|
| `THINKING_DAEMON_LLM_URL` | `http://localhost:8800/v1/chat/completions` | Primary OpenAI-compatible endpoint |
| `THINKING_DAEMON_MODEL` | (auto-discover) | Model name for primary endpoint |
| `THINKING_DAEMON_OLLAMA_URLS` | `http://localhost:11434` | Comma-separated Ollama URLs |
| `THINKING_DAEMON_MLX_URLS` | (same as primary) | Comma-separated MLX server URLs |
| `THINKING_DAEMON_OLLAMA_MODEL_PREFERENCES` | `qwen2.5:32b,llama3:70b` | Comma-separated model preferences for Ollama |
| `THINKING_DAEMON_PORT` | `8768` | HTTP API port |
| `THINKING_DAEMON_STREAM_FILE` | `./thought_stream.jsonl` | Path to thought stream output |
| `THINKING_DAEMON_STATE_FILE` | `./daemon_state.json` | Path to persistent state |
| Parameter | Default | Description |
|---|---|---|
| Poll interval | 60s | How often to check context sources for new events |
| Think interval | 5 min | How often to generate deliberative thoughts |
| Deep interval | 4 hours | How often to run deep analysis prompts |
| Briefing interval | 30 min | How often to write rolling briefings |
| LLM timeout | 120s | Timeout for LLM calls |
| Endpoint cooldown | 60s | How long to skip a failed endpoint |
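The endpoint cooldown can be pictured as round-robin rotation that skips any endpoint that failed within the last 60 seconds. The sketch below is an assumption about how such rotation might look, not the daemon's code; `EndpointRotator`, its method names, and the injectable `clock` parameter are all hypothetical.

```python
import time


class EndpointRotator:
    """Hypothetical round-robin rotation that skips endpoints in cooldown."""

    def __init__(self, urls, cooldown_s=60.0, clock=time.monotonic):
        self.urls = list(urls)
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failed_at = {}   # url -> time of the most recent failure
        self._next = 0

    def mark_failed(self, url):
        self._failed_at[url] = self._clock()

    def mark_healthy(self, url):
        self._failed_at.pop(url, None)

    def is_available(self, url):
        failed = self._failed_at.get(url)
        return failed is None or self._clock() - failed >= self.cooldown_s

    def pick(self):
        """Return the next available endpoint, or None if all are cooling down."""
        for _ in range(len(self.urls)):
            url = self.urls[self._next]
            self._next = (self._next + 1) % len(self.urls)
            if self.is_available(url):
                return url
        return None


rotator = EndpointRotator([
    "http://localhost:8800/v1/chat/completions",
    "http://localhost:11434",
])
rotator.mark_failed("http://localhost:11434")
print(rotator.pick())  # the failed endpoint is skipped until its cooldown expires
```

Passing the clock in as a parameter is a design choice made here so the cooldown can be exercised in tests without sleeping.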
The biggest challenge with autonomous LLM thinking is rumination -- the model saying the same thing over and over in slightly different words.
Each thought is decomposed into a set of topic tokens:
- Uppercase identifiers (2-5 letters, e.g., stock tickers, acronyms)
- Domain concept phrases (configurable list)
- Named entities extracted via regex
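As a rough illustration of the three token sources above, the following sketch builds a topic set with regular expressions. The concept list, both regexes, and the `extract_topics` name are assumptions made for this example; the daemon's real patterns may differ.

```python
import re

# Assumed example list; the README only says the concept list is configurable.
DOMAIN_CONCEPTS = ["error rate", "memory leak", "price action"]

# Uppercase identifiers of 2-5 letters (tickers, acronyms).
UPPER_ID = re.compile(r"\b[A-Z]{2,5}\b")
# Rough stand-in for named entities: two or more consecutive capitalized words.
NAMED_ENTITY = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def extract_topics(thought, concepts=DOMAIN_CONCEPTS):
    """Return the set of topic tokens found in a thought."""
    topics = set(UPPER_ID.findall(thought))
    lowered = thought.lower()
    topics.update(c for c in concepts if c in lowered)
    topics.update(m.lower() for m in NAMED_ENTITY.findall(thought))
    return topics


print(extract_topics("AAPL volume rose while the error rate at Acme Corp fell"))
```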
For each new thought, the daemon computes overlap with the last 8 thoughts:
overlap = |new_topics & old_topics| / min(|new_topics|, |old_topics|)
Using min() as the denominator means that if a thought's entire topic set is contained in a recent thought, it scores as a duplicate even if the recent thought covered more topics.
Thresholds:
- >= 0.55: Thought is discarded as a duplicate
- >= 0.50: Thought is discarded (anti-rumination filter)
- Quality scoring penalizes high overlap even below thresholds
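The overlap formula and the comparison against the last 8 thoughts can be sketched as follows. This is a minimal version under assumptions: it applies only the 0.55 cutoff (the 0.50 threshold is not modeled), it treats an empty topic set as zero overlap, and the names `topic_overlap` and `is_duplicate` are hypothetical.

```python
from collections import deque

DUPLICATE_THRESHOLD = 0.55   # duplicate cutoff stated in this README
RECENT_WINDOW = 8            # number of recent thoughts compared


def topic_overlap(new_topics, old_topics):
    """|new & old| / min(|new|, |old|); 0.0 if either set is empty (assumption)."""
    if not new_topics or not old_topics:
        return 0.0
    return len(new_topics & old_topics) / min(len(new_topics), len(old_topics))


def is_duplicate(new_topics, recent, threshold=DUPLICATE_THRESHOLD):
    """True if the new topic set overlaps too much with any recent thought."""
    return any(topic_overlap(new_topics, old) >= threshold for old in recent)


recent = deque(maxlen=RECENT_WINDOW)   # old entries fall off automatically
recent.append({"AAPL", "volume", "breakout", "earnings"})

# A thought whose topics are a subset of a recent thought scores 1.0.
print(topic_overlap({"AAPL", "volume"}, recent[0]))
print(is_duplicate({"TSLA", "recall"}, recent))
```

Because the denominator is the smaller set, a short thought that only restates part of a longer one is still flagged, which a plain Jaccard index (intersection over union) would score much lower.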
Known rumination patterns are caught directly:
- "You are thinking about..." (self-narration)
- Repeated status descriptions without new insight
- Listing known facts without analysis
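Of the patterns listed, self-narration is the easiest to catch with a regular expression; the other two are more semantic and are not modeled here. The sketch below assumes a simple list of compiled patterns. Only the first pattern comes from this README; the second is an invented example, and `matches_rumination_pattern` is a hypothetical name.

```python
import re

RUMINATION_PATTERNS = [
    # From the README: self-narration such as "You are thinking about..."
    re.compile(r"^\s*you are thinking about\b", re.IGNORECASE),
    # Assumed additional example, not taken from the daemon.
    re.compile(r"^\s*i am (?:currently )?(?:thinking|reflecting) (?:about|on)\b", re.IGNORECASE),
]


def matches_rumination_pattern(thought):
    """True if the thought opens with a known self-narration phrase."""
    return any(p.search(thought) for p in RUMINATION_PATTERNS)


print(matches_rumination_pattern("You are thinking about the error rate again."))
print(matches_rumination_pattern("Error rate doubled after the 10:30 deploy."))
```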
The daemon polls context sources for new events. To add your own:
- Create a context loader function that returns a text summary:

async def load_my_context() -> str:
    """Load current state for grounding thoughts in real data."""
    # Read your data sources
    # Return a text summary
    return "Current status: ..."

- Create an event poller that pushes stimuli to the queue:

async def poll_my_events(state: DaemonState, queue: PriorityQueue):
    """Check for new events and create stimuli."""
    # Check your event source
    new_events = check_for_events()
    for event in new_events:
        queue.push(Stimulus(
            priority=1,  # 1=reactive, 2=scheduled, 3=deliberative, 4=deep
            layer="reactive",
            trigger=event.type,
            description=event.description,
        ))

- Register them in the main loop.
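This README does not show what registration looks like, so the following is only one possible shape: a loop that awaits every registered poller once per cycle. `run_pollers`, `max_cycles`, and the demo poller are hypothetical, and a plain list stands in for the real queue.

```python
import asyncio


async def run_pollers(pollers, state, queue, interval_s=60.0, max_cycles=None):
    """Call every registered poller once per cycle; max_cycles=None runs forever."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for poller in pollers:
            try:
                await poller(state, queue)
            except Exception as exc:  # one failing source should not stop the loop
                print(f"poller {poller.__name__} failed: {exc}")
        cycle += 1
        await asyncio.sleep(interval_s)


async def poll_demo(state, queue):
    """Demo poller: pushes one placeholder stimulus per cycle."""
    queue.append("demo stimulus")


demo_queue = []
asyncio.run(run_pollers([poll_demo], state=None, queue=demo_queue,
                        interval_s=0, max_cycles=2))
print(len(demo_queue))  # one stimulus per cycle
```

Catching exceptions per poller is a choice made in this sketch so that a broken context source degrades one input rather than halting the daemon.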
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Uptime, thought counts |
| `/state` | GET | Full daemon state (paused, buffer, predictions) |
| `/thoughts?last=N` | GET | Last N thoughts from the stream |
| `/briefing` | GET | Latest rolling briefing |
| `/pause` | POST | Pause thinking |
| `/resume` | POST | Resume thinking |
| `/session-context` | POST | Inject context into the session buffer |
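A small client for these endpoints might look like the sketch below. It uses only the standard library so it runs without extra packages, and it assumes the default port 8768, JSON response bodies, and a JSON payload for `/session-context`; none of those response or payload shapes are confirmed by this README. `build_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8768"   # default THINKING_DAEMON_PORT


def build_request(path, method="GET", payload=None):
    """Build a request for the daemon API; payload is sent as JSON if given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def call(path, method="GET", payload=None, timeout=10):
    """Send the request and decode the body (JSON responses are assumed)."""
    request = build_request(path, method, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example calls (these require a running daemon):
# call("/health")
# call("/thoughts?last=5")
# call("/pause", method="POST")
# call("/session-context", method="POST", payload={"context": "..."})  # payload shape is an assumption
```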
You are a thinking daemon observing a complex system.
Your job is to notice patterns, make predictions, and surface insights.
Keep thoughts between 20-80 words. Be specific. Reference real data.
Prefix important observations with SAVE: to persist them.
Prefix predictions with PREDICT: so they can be tracked.
You are an autonomous market analysis daemon.
You observe price data, volume, and market structure.
Think about what is happening, why, and what might happen next.
Do not repeat yourself. Each thought must add something new.
You are an infrastructure thinking daemon.
You watch service health, error rates, and system metrics.
Think about root causes, correlations, and preventive actions.
Only SAVE observations that are actionable.
Each line in the JSONL stream:
{
"timestamp": "2024-03-15T10:30:00+00:00",
"thought_num": 42,
"layer": "deliberative",
"trigger": "timer",
"stimulus": "What patterns span the last 24 hours?",
"thought": "The actual generated thought text...",
"importance": 6,
"thought_quality_self_score": 4,
"model_used": "qwen2.5:32b",
"endpoint_used": "http://localhost:11434/api/chat",
"latency_ms": 3200,
"token_count": 85
}

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.example.thinking-daemon</string>
<key>ProgramArguments</key>
<array>
<string>/usr/bin/python3</string>
<string>/path/to/thinking_daemon.py</string>
<string>--system-prompt</string>
<string>/path/to/system_prompt.md</string>
</array>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/thinking-daemon.log</string>
<key>StandardErrorPath</key>
<string>/tmp/thinking-daemon.err</string>
</dict>
</plist>

[Unit]
Description=AI Thinking Daemon
After=network.target
[Service]
ExecStart=/usr/bin/python3 /path/to/thinking_daemon.py --system-prompt /path/to/system_prompt.md
Restart=always
RestartSec=10
Environment=THINKING_DAEMON_LLM_URL=http://localhost:8800/v1/chat/completions
[Install]
WantedBy=multi-user.target

License: MIT