# consilium

Multi-model deliberation CLI. Four frontier LLMs debate a question, then Claude judges and synthesizes.

*consilium* (Latin): counsel, deliberation, plan.

Inspired by Andrej Karpathy's LLM Council, with an added blind phase (anti-anchoring), explicit engagement requirements, a rotating challenger role, and a social calibration mode.

Council (deliberators):

- GPT (`gpt-5.2-pro`)
- Gemini (`gemini-3.1-pro-preview`)
- Grok (`grok-4`)
- GLM (`glm-5`)

Judge: Claude Opus 4.6 (synthesizes and adds its own perspective).
## Installation

```bash
pip install consilium
```

Or with uv:

```bash
uv tool install consilium
```

Set your OpenRouter API key:

```bash
export OPENROUTER_API_KEY=sk-or-v1-...
```

Optional fallback keys (for flaky models):

```bash
export GOOGLE_API_KEY=AIza...     # Gemini fallback
export MOONSHOT_API_KEY=sk-...    # Kimi fallback
```

## Usage

```bash
# Basic question
consilium "Should we use microservices or monolith?"

# With social calibration (for interview/networking questions)
consilium "What questions should I ask in the interview?" --social

# With persona context
consilium "Should I take the job?" --persona "builder who hates process work"

# Multiple rounds
consilium "Architecture decision" --rounds 3

# Save transcript
consilium "Career question" --output transcript.md

# Share via GitHub Gist
consilium "Important decision" --share

# List past sessions
consilium --sessions
```

All sessions are auto-saved to `~/.consilium/sessions/` for later review.
## Flags

| Flag | Description |
|---|---|
| `--rounds N` | Number of deliberation rounds (default: 1; exits early on consensus) |
| `--output FILE` | Save transcript to a file |
| `--named` | Let models see real names during deliberation (may increase bias) |
| `--no-blind` | Skip the blind first pass (faster, but the first speaker anchors the others) |
| `--context TEXT` | Context hint for the judge (e.g., "architecture decision") |
| `--share` | Upload the transcript to a secret GitHub Gist |
| `--social` | Enable social calibration mode (auto-detected for interview/networking questions) |
| `--persona TEXT` | Context about the person asking |
| `--challenger MODEL` | Which model starts as challenger (gpt/gemini/grok/kimi); rotates each round |
| `--domain DOMAIN` | Regulatory domain context (banking, healthcare, eu, fintech, bio) |
| `--followup` | Enable interactive drill-down after the judge's synthesis |
| `--quiet` | Suppress progress output |
| `--sessions` | List recent saved sessions |
| `--no-save` | Don't auto-save the transcript to `~/.consilium/sessions/` |
## How it works

### Blind first-pass (anti-anchoring)
- All models generate short "claim sketches" independently and in parallel
- This prevents the "first speaker lottery" where whoever speaks first anchors the debate
- Each model commits to an initial position before seeing any other responses
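The blind phase can be sketched as a parallel fan-out where no model's prompt includes another model's answer. This is an illustrative sketch, not consilium's internal code; `blind_sketch` is a hypothetical stand-in for the real per-model API call.

```python
from concurrent.futures import ThreadPoolExecutor

def blind_sketch(model: str, question: str) -> str:
    # Hypothetical stand-in for a real LLM API call. The key property
    # is the signature: it takes only the question, never peer answers.
    return f"{model}: initial position on {question!r}"

def blind_phase(models: list[str], question: str) -> dict[str, str]:
    """Query every model concurrently; each commits before seeing the others."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(blind_sketch, m, question) for m in models}
        return {m: f.result() for m, f in futures.items()}

sketches = blind_phase(["gpt", "gemini", "grok", "glm"], "microservices or monolith?")
```

Because the calls run in parallel and share no state, there is no "first speaker" for the others to anchor on.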
### Deliberation protocol
- All models see everyone's blind claims, then deliberate
- Each model MUST explicitly AGREE, DISAGREE, or BUILD ON previous speakers by name
- After each round, the system checks for consensus (3/4 non-challengers agreeing triggers early exit)
- Judge synthesizes the full deliberation
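The early-exit rule above can be sketched as a simple stance check. This is an assumption-laden illustration (consilium's real consensus detection works on model text, not pre-labeled stances): consensus holds when every non-challenger backs the same position.

```python
def has_consensus(stances: dict[str, str], challenger: str) -> bool:
    # Drop the challenger's forced dissent, then check whether the
    # remaining models all landed on a single position.
    votes = [s for model, s in stances.items() if model != challenger]
    return len(set(votes)) == 1

stances = {"gpt": "monolith", "gemini": "monolith",
           "grok": "monolith", "glm": "microservices"}
has_consensus(stances, challenger="glm")  # True: glm's dissent is ignored
has_consensus(stances, challenger="gpt")  # False: genuine disagreement remains
```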
### Rotating challenger
- One model each round is assigned the "challenger" role
- The challenger MUST argue the contrarian position and identify weaknesses in emerging consensus
- Role rotates each round (GPT R1 → Gemini R2 → Grok R3 → Kimi R4...) to ensure sustained disagreement
- Challenger is excluded from consensus detection (forced disagreement shouldn't block early exit)
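The rotation described above is a round-robin over the council order, offset by the `--challenger` starting choice. A minimal sketch (the helper name and exact order are illustrative assumptions, not consilium's API):

```python
ROTATION = ["gpt", "gemini", "grok", "kimi"]

def challenger_for(round_no: int, start: str = "gpt") -> str:
    # Round 1 is the starting model; each later round advances one slot,
    # wrapping around so disagreement is sustained across long sessions.
    offset = ROTATION.index(start)
    return ROTATION[(offset + round_no - 1) % len(ROTATION)]

challenger_for(1)           # "gpt"
challenger_for(2)           # "gemini"
challenger_for(1, "grok")   # honors --challenger grok
```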
### Anonymous deliberation
- Models see each other as "Speaker 1", "Speaker 2", etc. during deliberation
- Prevents models from playing favorites based on vendor reputation
- Output transcript shows real model names for readability
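One way to implement this, sketched under the assumption of a simple alias table: models are addressed as "Speaker N" in prompts, and the mapping is inverted only when rendering the final transcript.

```python
def anonymize(models: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    # Forward map for building prompts, reverse map for rendering the
    # transcript with real model names afterwards.
    to_alias = {m: f"Speaker {i}" for i, m in enumerate(models, start=1)}
    to_real = {alias: m for m, alias in to_alias.items()}
    return to_alias, to_real

to_alias, to_real = anonymize(["gpt", "gemini", "grok", "glm"])
to_alias["grok"]      # "Speaker 3" -- what peers see during deliberation
to_real["Speaker 3"]  # "grok" -- what the saved transcript shows
```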
## When to use it

Use the council when:
- Making an important decision that benefits from diverse perspectives
- You want models to actually debate, not just answer in parallel
- You need a synthesized recommendation, not raw comparison
- Exploring trade-offs where different viewpoints matter
Skip the council when:
- You're just thinking out loud (exploratory discussions)
- The answer depends on personal preference more than objective trade-offs
- Speed matters (council takes 60-90 seconds)
## Python API

```python
import os

from consilium import run_council, COUNCIL

api_key = os.environ["OPENROUTER_API_KEY"]

transcript, failed_models = run_council(
    question="Should we use microservices or monolith?",
    council_config=COUNCIL,
    api_key=api_key,
    rounds=2,
    verbose=True,
    social_mode=False,
)
print(transcript)
```

## License

MIT