terry-li-hm/consilium-py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

77 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Consilium

Multi-model deliberation CLI. 4 frontier LLMs debate a question, then Claude judges and synthesizes.

consilium (Latin): counsel, deliberation, plan

Inspired by Andrej Karpathy's LLM Council, with added blind phase (anti-anchoring), explicit engagement requirements, rotating challenger role, and social calibration mode.

Models

Council (deliberators):

  • GPT (gpt-5.2-pro)
  • Gemini (gemini-3.1-pro-preview)
  • Grok (grok-4)
  • GLM (glm-5)

Judge: Claude Opus 4.6 (synthesizes + adds own perspective)

Installation

```shell
pip install consilium
```

Or with uv:

```shell
uv tool install consilium
```

Setup

Set your OpenRouter API key:

```shell
export OPENROUTER_API_KEY=sk-or-v1-...
```

Optional fallback keys (for flaky models):

```shell
export GOOGLE_API_KEY=AIza...      # Gemini fallback
export MOONSHOT_API_KEY=sk-...     # Kimi fallback
```

Usage

```shell
# Basic question
consilium "Should we use microservices or monolith?"

# With social calibration (for interview/networking questions)
consilium "What questions should I ask in the interview?" --social

# With persona context
consilium "Should I take the job?" --persona "builder who hates process work"

# Multiple rounds
consilium "Architecture decision" --rounds 3

# Save transcript
consilium "Career question" --output transcript.md

# Share via GitHub Gist
consilium "Important decision" --share

# List past sessions
consilium --sessions
```

All sessions are auto-saved to ~/.consilium/sessions/ for later review.

Options

| Flag | Description |
|------|-------------|
| `--rounds N` | Number of deliberation rounds (default: 1; exits early on consensus) |
| `--output FILE` | Save transcript to file |
| `--named` | Let models see real names during deliberation (may increase bias) |
| `--no-blind` | Skip the blind first pass (faster, but the first speaker anchors the others) |
| `--context TEXT` | Context hint for the judge (e.g., "architecture decision") |
| `--share` | Upload transcript to a secret GitHub Gist |
| `--social` | Enable social calibration mode (auto-detected for interview/networking questions) |
| `--persona TEXT` | Context about the person asking |
| `--challenger MODEL` | Which model starts as challenger (gpt/gemini/grok/kimi); rotates each round |
| `--domain DOMAIN` | Regulatory domain context (banking, healthcare, eu, fintech, bio) |
| `--followup` | Enable interactive drill-down after judge synthesis |
| `--quiet` | Suppress progress output |
| `--sessions` | List recent saved sessions |
| `--no-save` | Don't auto-save transcript to `~/.consilium/sessions/` |

How It Works

Blind First-Pass (Anti-Anchoring):

  1. All models generate short "claim sketches" independently and in parallel
  2. Each model commits to an initial position before seeing any other responses
  3. This prevents the "first speaker lottery", where whoever speaks first anchors the debate
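The blind pass described above amounts to fanning the same prompt out to every model at once. A minimal sketch, where `query_model` is a hypothetical stand-in for the real OpenRouter call (not the library's actual internals):

```python
# Sketch of the blind first-pass: every council model answers in parallel,
# and none sees another model's output before committing to a position.
from concurrent.futures import ThreadPoolExecutor

COUNCIL = ["gpt", "gemini", "grok", "glm"]

def query_model(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the OpenRouter API here.
    return f"{model}: claim sketch for {prompt!r}"

def blind_pass(question: str) -> dict[str, str]:
    # Fan out to all models simultaneously; collect each independent claim.
    prompt = f"Give a short claim sketch answering: {question}"
    with ThreadPoolExecutor(max_workers=len(COUNCIL)) as pool:
        futures = {m: pool.submit(query_model, m, prompt) for m in COUNCIL}
    return {m: f.result() for m, f in futures.items()}

claims = blind_pass("microservices or monolith?")
```

Because every model's claim is collected before any deliberation prompt is built, no model can anchor on another's answer.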

Deliberation Protocol:

  1. All models see everyone's blind claims, then deliberate
  2. Each model MUST explicitly AGREE, DISAGREE, or BUILD ON previous speakers by name
  3. After each round, the system checks for consensus (3/4 non-challengers agreeing triggers early exit)
  4. Judge synthesizes the full deliberation
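The consensus check in step 3 can be sketched as follows. This assumes a simple threshold of 3 agreeing non-challenger positions; the project's actual rule may differ in detail:

```python
# Sketch of consensus detection: the challenger's forced disagreement is
# excluded, and early exit triggers when 3 non-challengers hold the same
# position (threshold assumed for illustration).
def has_consensus(positions: dict[str, str], challenger: str) -> bool:
    votes = [p for model, p in positions.items() if model != challenger]
    # Find the most common position among non-challengers.
    top = max(set(votes), key=votes.count)
    return votes.count(top) >= 3

positions = {"gpt": "monolith", "gemini": "monolith",
             "grok": "monolith", "glm": "microservices"}
print(has_consensus(positions, challenger="glm"))   # True: 3/3 agree
print(has_consensus(positions, challenger="gpt"))   # False: only 2 agree
```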

Rotating Challenger:

  • One model each round is assigned the "challenger" role
  • The challenger MUST argue the contrarian position and identify weaknesses in emerging consensus
  • Role rotates each round (GPT R1 → Gemini R2 → Grok R3 → Kimi R4...) to ensure sustained disagreement
  • Challenger is excluded from consensus detection (forced disagreement shouldn't block early exit)
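The rotation above is just a modular walk through the council order. A sketch, with the ordering and `--challenger` start offset shown illustratively:

```python
# Sketch of rotating challenger assignment: round 1 takes the starting
# model, each later round advances one slot, wrapping around the council.
ROTATION = ["gpt", "gemini", "grok", "kimi"]

def challenger_for(round_num: int, start: str = "gpt") -> str:
    offset = ROTATION.index(start)
    return ROTATION[(offset + round_num - 1) % len(ROTATION)]

print(challenger_for(1))   # gpt
print(challenger_for(2))   # gemini
print(challenger_for(5))   # wraps back to gpt
```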

Anonymous Deliberation:

  • Models see each other as "Speaker 1", "Speaker 2", etc. during deliberation
  • Prevents models from playing favorites based on vendor reputation
  • Output transcript shows real model names for readability
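The anonymization scheme comes down to a pseudonym mapping applied during deliberation and reversed for the final transcript. A minimal sketch (the real implementation is not shown here):

```python
# Sketch of anonymous deliberation: models address each other as
# "Speaker N", and the final transcript is de-anonymized for readability.
MODELS = ["gpt", "gemini", "grok", "glm"]

aliases = {m: f"Speaker {i}" for i, m in enumerate(MODELS, start=1)}
reverse = {alias: m for m, alias in aliases.items()}

def deanonymize(transcript: str) -> str:
    # Swap pseudonyms back to real model names in the output.
    for alias, model in reverse.items():
        transcript = transcript.replace(alias, model)
    return transcript

print(deanonymize("Speaker 2 disagrees with Speaker 1"))
# -> gemini disagrees with gpt
```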

When to Use

Use the council when:

  • Making an important decision that benefits from diverse perspectives
  • You want models to actually debate, not just answer in parallel
  • You need a synthesized recommendation, not raw comparison
  • Exploring trade-offs where different viewpoints matter

Skip the council when:

  • You're just thinking out loud (exploratory discussions)
  • The answer depends on personal preference more than objective trade-offs
  • Speed matters (council takes 60-90 seconds)

Python API

```python
import os

from consilium import run_council, COUNCIL

api_key = os.environ["OPENROUTER_API_KEY"]

transcript, failed_models = run_council(
    question="Should we use microservices or monolith?",
    council_config=COUNCIL,
    api_key=api_key,
    rounds=2,
    verbose=True,
    social_mode=False,
)

print(transcript)
```
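Since `run_council` also returns the models that failed, a caller will typically want to check that list before trusting the synthesis. A minimal sketch, assuming `failed_models` is a list of model-name strings (the exact type is not documented here) and using an illustrative file path:

```python
# Hypothetical post-processing of run_council's return values: warn about
# failed models, then persist the transcript to disk.
from pathlib import Path

def save_result(transcript: str, failed_models: list[str],
                path: str = "transcript.md") -> None:
    if failed_models:
        print(f"warning: {len(failed_models)} model(s) failed: {failed_models}")
    Path(path).write_text(transcript)
```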

License

MIT
