ai-thinking-daemon

An autonomous AI thinking agent that runs 24/7. It maintains a priority-based stimulus queue, generates thoughts via LLM endpoints, filters repetitive output through anti-rumination detection, rotates across multiple model servers, and persists a stream of thoughts with quality scoring.

Why This Exists

Most AI systems only think when asked. This daemon thinks continuously -- processing events, generating observations, making predictions, and building a persistent stream of scored thoughts. It is designed for systems that benefit from ambient intelligence: a background process that watches, thinks, and surfaces insights without being prompted.

The anti-rumination system is the key innovation. Without it, an autonomous LLM loops on the same observations endlessly. The daemon uses topic signatures and concept overlap detection to suppress repetitive thoughts and force novelty in each thinking cycle.

Architecture

                     +-------------------+
                     |  Context Sources  |
                     |  (pluggable)      |
                     +--------+----------+
                              |
                              v
                     +-------------------+
                     |  Event Poller     |
                     |  (60s interval)   |
                     +--------+----------+
                              |
                    creates Stimulus objects
                              |
                              v
+----------+        +-------------------+        +-------------------+
| Timer    |------->|  Priority Queue   |------->|  LLM Client       |
| Prompts  |        |  P1=reactive      |        |  (multi-endpoint  |
| (5min,   |        |  P2=scheduled     |        |   rotation with   |
|  4hr)    |        |  P3=deliberative  |        |   health checks)  |
+----------+        |  P4=deep          |        +--------+----------+
                    +-------------------+                 |
                                                          v
                                                +-------------------+
                                                | Anti-Rumination   |
                                                | Filter            |
                                                | - topic sigs      |
                                                | - concept overlap |
                                                | - pattern detect  |
                                                +--------+----------+
                                                         |
                                              pass / discard
                                                         |
                                                         v
                                                +-------------------+
                                                | Thought Stream    |
                                                | (JSONL + scoring) |
                                                +-------------------+
                                                         |
                                                         v
                                                +-------------------+
                                                | Briefing Writer   |
                                                | (30min summaries) |
                                                +-------------------+

Features

Priority Queue: Stimuli are prioritized (P1 reactive > P2 scheduled > P3 deliberative > P4 deep). Reactive events (new data, alerts) preempt scheduled thinking.
Multi-Endpoint LLM Client: Supports both OpenAI-compatible and Ollama endpoints. Auto-discovers models, rotates on failure, cools down unhealthy endpoints.
Anti-Rumination Filter: Three-layer deduplication:
1. Topic signatures: Extracts key concepts, identifiers, and named entities from each thought
2. Concept overlap: Computes an overlap coefficient (shared topics divided by the size of the smaller topic set) between the new thought and recent thoughts (>= 0.55 = duplicate)
3. Pattern detection: Catches self-narration patterns ("You are thinking about...")
Quality Scoring: Each thought gets an importance score (0-10) and a self-assessed quality score (1-5) based on novelty, length, and content type.
Thought Stream: Append-only JSONL file with full metadata (timestamp, model, latency, scores, stimulus info). Weekly rotation.
Rolling Briefings: Every 30 minutes, the daemon summarizes its recent thoughts into a briefing document for other systems to consume.
Deliberative Prompts: Rotating set of thinking prompts that force diverse cognitive modes (analytical, contrarian, forward-looking, meta-cognitive, creative).
Self-Healing: Watches its own thought output rate. If no thought is produced for 30 minutes, resets HTTP clients and re-probes all endpoints.
Memory Leak Protection: Monitors its resident set size (RSS) and restarts itself if memory use exceeds a threshold.
HTTP API: Local REST API for health checks, state inspection, pause/resume, and injecting session context.
Pause Windows: Configurable time windows where the daemon pauses (e.g., during scheduled reflection jobs).
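The priority ordering in the feature list above (P1 reactive preempts P4 deep) maps naturally onto a binary heap. The following is a minimal sketch, assuming a heap-based queue with a FIFO tiebreak for equal priorities; `StimulusQueue` and this `Stimulus` dataclass are hypothetical stand-ins, not necessarily how thinking_daemon.py implements them:

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


# Hypothetical stand-in for the daemon's Stimulus type. Only (priority, seq)
# take part in ordering; the descriptive fields are excluded from comparison.
@dataclass(order=True)
class Stimulus:
    priority: int  # 1=reactive, 2=scheduled, 3=deliberative, 4=deep
    seq: int  # insertion counter: equal priorities pop in FIFO order
    layer: str = field(compare=False)
    trigger: str = field(compare=False)
    description: str = field(compare=False)


class StimulusQueue:
    """Min-heap keyed on (priority, seq), so P1 always pops before P4."""

    def __init__(self) -> None:
        self._heap: list[Stimulus] = []
        self._counter = itertools.count()

    def push(self, priority: int, layer: str, trigger: str, description: str) -> None:
        heapq.heappush(
            self._heap,
            Stimulus(priority, next(self._counter), layer, trigger, description),
        )

    def pop(self) -> Stimulus | None:
        """Return the most urgent stimulus, or None when the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self) -> int:
        return len(self._heap)
```

A reactive stimulus pushed after a deliberative one still comes out first, which is the preemption behaviour the feature list describes.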

Quick Start

# Install dependencies
pip install requests

# Configure at least one LLM endpoint
export THINKING_DAEMON_LLM_URL="http://localhost:8800/v1/chat/completions"
export THINKING_DAEMON_MODEL="your-model-name"

# Create a system prompt
cat > system_prompt.md << 'EOF'
You are a thinking daemon. You observe data, make predictions, and surface insights.
When you have an important observation, prefix it with SAVE: to persist it.
When you make a prediction, prefix it with PREDICT: to track it.
Keep thoughts concise (20-80 words). Be specific. Reference real data.
EOF

# Run
python3 thinking_daemon.py --system-prompt system_prompt.md

Configuration

Environment Variables

Variable	Default	Description
`THINKING_DAEMON_LLM_URL`	`http://localhost:8800/v1/chat/completions`	Primary OpenAI-compatible endpoint
`THINKING_DAEMON_MODEL`	(auto-discover)	Model name for primary endpoint
`THINKING_DAEMON_OLLAMA_URLS`	`http://localhost:11434`	Comma-separated Ollama URLs
`THINKING_DAEMON_MLX_URLS`	(same as primary)	Comma-separated MLX server URLs
`THINKING_DAEMON_OLLAMA_MODEL_PREFERENCES`	`qwen2.5:32b,llama3:70b`	Comma-separated model preferences for Ollama
`THINKING_DAEMON_PORT`	`8768`	HTTP API port
`THINKING_DAEMON_STREAM_FILE`	`./thought_stream.jsonl`	Path to thought stream output
`THINKING_DAEMON_STATE_FILE`	`./daemon_state.json`	Path to persistent state
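The variables above can be resolved with plain environment lookups. This is a hypothetical `load_config` helper that mirrors the documented defaults; the daemon's actual parsing may differ (for example, in how it treats blank values or trailing commas):

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def _split(value: str) -> list[str]:
    """Split a comma-separated variable, trimming whitespace and dropping blanks."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve settings using the defaults from the README's variable table."""
    llm_url = env.get("THINKING_DAEMON_LLM_URL", "http://localhost:8800/v1/chat/completions")
    return {
        "llm_url": llm_url,
        "model": env.get("THINKING_DAEMON_MODEL") or None,  # None -> auto-discover
        "ollama_urls": _split(env.get("THINKING_DAEMON_OLLAMA_URLS", "http://localhost:11434")),
        "mlx_urls": _split(env.get("THINKING_DAEMON_MLX_URLS", llm_url)),  # defaults to primary
        "ollama_model_preferences": _split(
            env.get("THINKING_DAEMON_OLLAMA_MODEL_PREFERENCES", "qwen2.5:32b,llama3:70b")
        ),
        "port": int(env.get("THINKING_DAEMON_PORT", "8768")),
        "stream_file": env.get("THINKING_DAEMON_STREAM_FILE", "./thought_stream.jsonl"),
        "state_file": env.get("THINKING_DAEMON_STATE_FILE", "./daemon_state.json"),
    }
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.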

Timing Parameters

Parameter	Default	Description
Poll interval	60s	How often to check context sources for new events
Think interval	5 min	How often to generate deliberative thoughts
Deep interval	4 hours	How often to run deep analysis prompts
Briefing interval	30 min	How often to write rolling briefings
LLM timeout	120s	Timeout for LLM calls
Endpoint cooldown	60s	How long to skip a failed endpoint
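The endpoint cooldown amounts to skipping a failed endpoint until its cooldown expires. A minimal round-robin sketch, assuming one monotonic-clock timestamp per endpoint; `EndpointRotator` is a hypothetical name, and the model auto-discovery and health probes described under Features are omitted:

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class Endpoint:
    url: str
    cooldown_until: float = 0.0  # clock value before which the endpoint is skipped


class EndpointRotator:
    """Round-robin over endpoints, skipping any that are still cooling down."""

    def __init__(self, urls: list[str], cooldown_s: float = 60.0, clock=time.monotonic):
        self.endpoints = [Endpoint(u) for u in urls]
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the behaviour can be tested
        self._next = 0

    def pick(self) -> Endpoint | None:
        """Return the next healthy endpoint, or None if all are cooling down."""
        now = self.clock()
        n = len(self.endpoints)
        for offset in range(n):
            ep = self.endpoints[(self._next + offset) % n]
            if ep.cooldown_until <= now:
                self._next = (self._next + offset + 1) % n
                return ep
        return None

    def mark_failed(self, ep: Endpoint) -> None:
        """Skip this endpoint for cooldown_s seconds after a failed call."""
        ep.cooldown_until = self.clock() + self.cooldown_s
```

A caller would `pick()` an endpoint, call it with the 120 s LLM timeout, and `mark_failed()` it on error before picking again.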

Anti-Rumination: How It Works

The biggest challenge with autonomous LLM thinking is rumination -- the model saying the same thing over and over in slightly different words.

Topic Signatures

Each thought is decomposed into a set of topic tokens:

Uppercase identifiers (2-5 letters, e.g., stock tickers, acronyms)
Domain concept phrases (configurable list)
Named entities extracted via regex
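The three token sources can be approximated with a few regular expressions. A minimal sketch; the patterns, the `CONCEPT_PHRASES` list, and the "two or more capitalised words" named-entity heuristic are all assumptions, not the daemon's actual rules:

```python
from __future__ import annotations

import re

# Hypothetical domain phrase list; the README says this list is configurable.
CONCEPT_PHRASES = ["error rate", "memory leak", "interest rate", "support level"]

UPPER_ID = re.compile(r"\b[A-Z]{2,5}\b")  # tickers, acronyms
# Crude named-entity heuristic: runs of two or more Capitalised words.
NAMED_ENTITY = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def topic_signature(thought: str) -> set[str]:
    """Reduce a thought to a set of topic tokens."""
    tokens = set(UPPER_ID.findall(thought))
    lowered = thought.lower()
    tokens.update(p for p in CONCEPT_PHRASES if p in lowered)
    tokens.update(m.lower() for m in NAMED_ENTITY.findall(thought))
    return tokens
```

For "BTC and ETH rallied after the Federal Reserve held the interest rate." this yields `{"BTC", "ETH", "federal reserve", "interest rate"}`.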

Concept Overlap Detection

For each new thought, the daemon computes overlap with the last 8 thoughts:

overlap = |new_topics & old_topics| / min(|new_topics|, |old_topics|)

Using min() as the denominator means that if a thought's entire topic set is contained in a recent thought, it scores as a duplicate even if the recent thought covered more topics.
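A worked example of why the min() denominator matters, using hypothetical topic sets: a new thought that mentions only BTC and ETH, and a recent thought that covered BTC, ETH, SOL, FOMC, and CPI.

```math
\begin{aligned}
T_{\text{new}} &= \{\text{BTC}, \text{ETH}\}, \qquad T_{\text{old}} = \{\text{BTC}, \text{ETH}, \text{SOL}, \text{FOMC}, \text{CPI}\} \\
\text{overlap} &= \frac{|T_{\text{new}} \cap T_{\text{old}}|}{\min(|T_{\text{new}}|, |T_{\text{old}}|)} = \frac{2}{\min(2, 5)} = 1.0 \\
\text{Jaccard} &= \frac{|T_{\text{new}} \cap T_{\text{old}}|}{|T_{\text{new}} \cup T_{\text{old}}|} = \frac{2}{5} = 0.4
\end{aligned}
```

With min(), the new thought scores 1.0 and is discarded; a Jaccard (union) denominator would have scored it 0.4, below both thresholds listed below, and let the repeat through.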

Thresholds:

>= 0.55: the thought is discarded as a duplicate
>= 0.50: the thought is discarded by the anti-rumination filter
Below 0.50: the thought is kept, but quality scoring still penalizes high overlap
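Putting the formula, the 8-thought window, and the thresholds together gives something like the following sketch. `RuminationFilter` and its verdict labels are hypothetical, and it assumes only accepted thoughts enter the comparison window:

```python
from __future__ import annotations

from collections import deque

DUPLICATE_AT = 0.55  # thresholds from the list above
RUMINATION_AT = 0.50


def overlap(new: set[str], old: set[str]) -> float:
    """|new & old| / min(|new|, |old|); 0.0 when either set is empty."""
    if not new or not old:
        return 0.0
    return len(new & old) / min(len(new), len(old))


class RuminationFilter:
    """Scores each new topic set against the last `window` kept thoughts."""

    def __init__(self, window: int = 8) -> None:
        self.recent: deque[set[str]] = deque(maxlen=window)

    def check(self, topics: set[str]) -> tuple[str, float]:
        """Return (verdict, highest overlap); only kept thoughts join the window."""
        worst = max((overlap(topics, old) for old in self.recent), default=0.0)
        if worst >= DUPLICATE_AT:
            return "duplicate", worst
        if worst >= RUMINATION_AT:
            return "rumination", worst
        self.recent.append(topics)
        return "keep", worst
```

Returning the highest overlap alongside the verdict lets a quality scorer penalize near-misses that were kept.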

Pattern Detection

Known rumination patterns are caught directly:

"You are thinking about..." (self-narration)
Repeated status descriptions without new insight
Listing known facts without analysis
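The first two kinds of pattern lend themselves to anchored regular expressions. The patterns below are illustrative assumptions, not the daemon's actual list; catching "facts without analysis" likely needs more than a regex and is not attempted here:

```python
import re

# Illustrative patterns only; extend with phrasings your model actually produces.
RUMINATION_PATTERNS = [
    re.compile(r"^\s*you are thinking about\b", re.IGNORECASE),  # self-narration
    re.compile(r"^\s*(i am|i'm) (currently )?(thinking|reflecting) about\b", re.IGNORECASE),
    # status restated with nothing new
    re.compile(r"^\s*(the )?(system|status) (is|remains) (unchanged|stable|the same)\b", re.IGNORECASE),
]


def is_rumination(thought: str) -> bool:
    """True if the thought opens with a known low-value pattern."""
    return any(p.search(thought) for p in RUMINATION_PATTERNS)
```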

Extending with Custom Context Sources

The daemon polls context sources for new events. To add your own:

Create a context loader function that returns a text summary:

async def load_my_context() -> str:
    """Load current state for grounding thoughts in real data."""
    # Read your data sources
    # Return a text summary
    return "Current status: ..."

Create an event poller that pushes stimuli to the queue:

async def poll_my_events(state: DaemonState, queue: PriorityQueue):
    """Check for new events and create stimuli."""
    # Check your event source (check_for_events is a placeholder for your own code)
    new_events = check_for_events()
    for event in new_events:
        queue.push(Stimulus(
            priority=1,  # 1=reactive, 2=scheduled, 3=deliberative, 4=deep
            layer="reactive",
            trigger=event.type,
            description=event.description,
        ))

Register them in the main loop.
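The daemon's main loop is not shown here, so the following is only a hypothetical sketch of one way to run registered pollers: an asyncio loop that calls every poller once per 60 s poll interval and isolates failures. `run_pollers` is an invented name; a poller that needs arguments, such as `poll_my_events`, could be wrapped with `functools.partial(poll_my_events, state, queue)`.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

Poller = Callable[[], Awaitable[None]]


async def run_pollers(pollers: list[Poller], interval_s: float = 60.0, cycles: int | None = None) -> None:
    """Call every registered poller once per interval.

    A failing poller is reported and skipped so it cannot stall the others.
    cycles=None runs forever; a number is handy for testing.
    """
    n = 0
    while cycles is None or n < cycles:
        results = await asyncio.gather(*(p() for p in pollers), return_exceptions=True)
        for poller, result in zip(pollers, results):
            if isinstance(result, Exception):
                print(f"poller {getattr(poller, '__name__', poller)} failed: {result!r}")
        n += 1
        if cycles is None or n < cycles:
            await asyncio.sleep(interval_s)
```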

HTTP API

Endpoint	Method	Description
`/health`	GET	Uptime, thought counts
`/state`	GET	Full daemon state (paused, buffer, predictions)
`/thoughts?last=N`	GET	Last N thoughts from the stream
`/briefing`	GET	Latest rolling briefing
`/pause`	POST	Pause thinking
`/resume`	POST	Resume thinking
`/session-context`	POST	Inject context into the session buffer
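A small standard-library client for the endpoints above. This is a sketch assuming the default port 8768, JSON response bodies, and that `/pause` and `/resume` accept an empty POST body; the `/session-context` payload format is not documented here, so it is not shown:

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8768"  # default THINKING_DAEMON_PORT


def api_url(path: str, base: str = BASE_URL, **params: object) -> str:
    """Build an API URL, e.g. api_url("/thoughts", last=5)."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}{path}{query}"


def get_json(path: str, **params: object):
    """GET an endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(api_url(path, **params), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post(path: str) -> int:
    """POST an empty body (e.g. /pause, /resume) and return the HTTP status."""
    req = urllib.request.Request(api_url(path), data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Typical use with the daemon running: `get_json("/health")`, `get_json("/thoughts", last=5)`, or `post("/pause")`.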

Example System Prompts

General Observer

You are a thinking daemon observing a complex system.
Your job is to notice patterns, make predictions, and surface insights.
Keep thoughts between 20-80 words. Be specific. Reference real data.
Prefix important observations with SAVE: to persist them.
Prefix predictions with PREDICT: so they can be tracked.

Market Analyst

You are an autonomous market analysis daemon.
You observe price data, volume, and market structure.
Think about what is happening, why, and what might happen next.
Do not repeat yourself. Each thought must add something new.

System Monitor

You are an infrastructure thinking daemon.
You watch service health, error rates, and system metrics.
Think about root causes, correlations, and preventive actions.
Only SAVE observations that are actionable.

Thought Stream Format

Each line in the JSONL stream:

{
  "timestamp": "2024-03-15T10:30:00+00:00",
  "thought_num": 42,
  "layer": "deliberative",
  "trigger": "timer",
  "stimulus": "What patterns span the last 24 hours?",
  "thought": "The actual generated thought text...",
  "importance": 6,
  "thought_quality_self_score": 4,
  "model_used": "qwen2.5:32b",
  "endpoint_used": "http://localhost:11434/api/chat",
  "latency_ms": 3200,
  "token_count": 85
}
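Because the stream is plain JSONL, other tools can consume it line by line. A minimal reader sketch; `read_thoughts` is a hypothetical helper, and skipping undecodable lines is an assumption meant to tolerate a line that is still being written:

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator


def read_thoughts(path: str | Path, min_importance: int = 0) -> Iterator[dict]:
    """Yield records from the JSONL stream, skipping blank or partial lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a line still being appended when we read
            if record.get("importance", 0) >= min_importance:
                yield record
```

For example, `for t in read_thoughts("thought_stream.jsonl", min_importance=7): print(t["timestamp"], t["thought"])` prints only the higher-importance thoughts.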

Running as a Service

macOS (launchd)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.thinking-daemon</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/path/to/thinking_daemon.py</string>
        <string>--system-prompt</string>
        <string>/path/to/system_prompt.md</string>
    </array>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/thinking-daemon.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/thinking-daemon.err</string>
</dict>
</plist>

Linux (systemd)

[Unit]
Description=AI Thinking Daemon
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/thinking_daemon.py --system-prompt /path/to/system_prompt.md
Restart=always
RestartSec=10
Environment=THINKING_DAEMON_LLM_URL=http://localhost:8800/v1/chat/completions

[Install]
WantedBy=multi-user.target

License

MIT