Constitutional AI Playground

Make the invisible visible. Watch AI systems reason about ethics in real-time.

An interactive research platform for experimenting with Constitutional AI, Anthropic's groundbreaking approach to AI alignment. Build custom constitutions, visualize the self-critique process, and discover how different principles shape AI behavior.

View Demo • Quick Start • Features • How It Works • Research Insights

Why This Matters

Constitutional AI (CAI) represents a paradigm shift in how we train AI systems. Instead of relying solely on reinforcement learning from human feedback (RLHF), CAI enables AI to:

Self-evaluate responses against a set of principles
Self-improve by revising problematic outputs
Scale alignment without proportional human oversight

But the process has always been a black box. This playground opens it up.

For the first time, you can:

Watch the critique-revision loop unfold step-by-step
See exactly which principles trigger changes
Compare how different constitutions handle the same prompt
Design and test your own alignment approaches

Demo

Self-Critique Visualization

Watch the AI critique and revise its response in real-time:

┌─────────────────────────────────────────────────────────────────┐
│ Prompt: "How do I pick a lock? I'm locked out of my house."    │
├─────────────────────────────────────────────────────────────────┤
│ ● Round 1                                                       │
│   Initial Response: "Here are the steps to pick a lock..."     │
│                                                                 │
│   Principles Triggered:                                         │
│   ⚠ Safety: Could enable harmful activities                    │
│   ⚠ Dual-Use: Information has legitimate and illegitimate uses │
│                                                                 │
│   Revised Response: "I understand being locked out is          │
│   frustrating. Here are legitimate options: 1) Call a          │
│   locksmith 2) Contact your landlord 3) Check for unlocked     │
│   windows..."                                                  │
│                                                                 │
│ ✓ Converged after 1 round                                      │
│ ✓ All principles satisfied                                     │
│ ✓ Confidence: 100%                                             │
└─────────────────────────────────────────────────────────────────┘

Constitution Lab

A/B test different constitutions on the same prompt:

┌────────────────────────────────┬────────────────────────────────┐
│ Anthropic Default Constitution │ Strict Safety Constitution     │
├────────────────────────────────┼────────────────────────────────┤
│ Rounds: 1                      │ Rounds: 2                      │
│ Triggered: 0 principles        │ Triggered: 2 principles        │
│ Safety: 100%                   │ Safety: 85%                    │
│ Helpfulness: 95%               │ Helpfulness: 70%               │
│                                │                                │
│ Final: Balanced, helpful       │ Final: Very cautious,          │
│ response with alternatives     │ minimal information            │
└────────────────────────────────┴────────────────────────────────┘

Quick Start

Prerequisites

Node.js 18+ and pnpm
Python 3.10+
Anthropic API Key

Setup

# Clone
git clone https://github.com/FELMONON/constitutional-playground.git
cd constitutional-playground

# Configure
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env

# Install dependencies
cd apps/web && pnpm install && cd ../..
pip3 install -r apps/api/requirements.txt

# Run both servers
./start-dev.sh

Or run servers separately:

# Terminal 1: Backend (http://localhost:8000)
cd apps/api
python3 -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Frontend (http://localhost:3000)
cd apps/web
pnpm dev

Open http://localhost:3000

Features

1. Constitution Editor

Design AI alignment from first principles.

Visual Principle Builder: Create principles with critique prompts and revision instructions
Category System: Organize by safety, honesty, helpfulness, or ethics
Weight Assignment: Prioritize principles that matter most
Import/Export: Share constitutions as JSON
Pre-built Templates: Start from Anthropic's actual constitution or specialized variants

2. Self-Critique Visualizer

See alignment in action.

Step-by-Step Rounds: Watch each critique-revision cycle
Diff View: See exactly what changed between iterations
Principle Highlighting: Know which principles triggered changes
Convergence Tracking: Monitor when responses stabilize
Confidence Metrics: Quantify alignment strength
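The diff view can be approximated with Python's standard `difflib` module. The following is a minimal sketch, not the playground's actual implementation; the function name `diff_rounds` is hypothetical.

```python
import difflib


def diff_rounds(before: str, after: str) -> list[str]:
    """Return a unified diff between two iterations of a response."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )


initial = "Here are the steps to pick a lock.\nStay safe."
revised = "Here are legitimate options: call a locksmith.\nStay safe."
for line in diff_rounds(initial, revised):
    print(line)
```

Lines prefixed with `-` were removed by the revision and lines prefixed with `+` were added. Identical inputs produce an empty list.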

3. Constitution Lab

Empirically compare alignment approaches.

Side-by-Side Comparison: Same prompt, different constitutions
Benchmark Prompts: Test with challenging edge cases
Metrics Dashboard: Safety, helpfulness, honesty scores
Heat Maps: See which principles activate most frequently
Export Reports: Generate comparison analyses
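The data behind a principle heat map is a per-principle activation count. A minimal sketch follows, assuming each run is summarized as a list of triggered principle IDs; that input shape and the name `activation_counts` are assumptions, not the playground's data model.

```python
from collections import Counter


def activation_counts(runs: list[list[str]]) -> Counter:
    """Count how many runs triggered each principle at least once."""
    counts: Counter = Counter()
    for triggered_ids in runs:
        # A set ensures a principle is counted at most once per run.
        counts.update(set(triggered_ids))
    return counts


# Hypothetical runs, for illustration only.
runs = [["safety", "dual_use"], ["safety"], []]
print(activation_counts(runs).most_common())
```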

4. Community Library

Learn from others, share your discoveries.

Browse Constitutions: Explore community-created approaches
Use-Case Tags: Find constitutions for specific domains
Fork & Modify: Build on existing work
Ratings & Reviews: Surface the most effective approaches

How It Works

The Constitutional AI Loop

┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│    ┌──────────┐     ┌──────────┐     ┌──────────┐              │
│    │ Generate │────▶│ Critique │────▶│  Revise  │              │
│    │ Response │     │ Against  │     │  Based   │              │
│    │          │     │ Principles│    │ on Critique│             │
│    └──────────┘     └──────────┘     └────┬─────┘              │
│         ▲                                  │                     │
│         │           ┌──────────┐          │                     │
│         │           │Converged?│◀─────────┘                     │
│         │           └────┬─────┘                                │
│         │                │                                       │
│         │     No         │        Yes                           │
│         └────────────────┘         ▼                            │
│                              ┌──────────┐                       │
│                              │  Final   │                       │
│                              │ Response │                       │
│                              └──────────┘                       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Core Algorithm

async def constitutional_critique(
    prompt: str,
    initial_response: str,
    constitution: Constitution,
    max_rounds: int = 3
) -> CritiqueResult:
    """
    The heart of Constitutional AI: iterative self-improvement.

    Constitution, CritiqueRound, CritiqueResult, evaluate_against_principle,
    and revise_response are defined elsewhere in the codebase.
    """
    current_response = initial_response
    rounds = []
    converged = False

    for round_num in range(max_rounds):
        # Critique the current response against each principle
        critiques = []
        for principle in constitution.principles:
            critique = await evaluate_against_principle(
                response=current_response,
                principle=principle
            )
            critiques.append(critique)

        # No principle triggered: the response has converged
        triggered = [c for c in critiques if c.triggered]
        if not triggered:
            converged = True
            break

        # Keep the pre-revision text so the round records a real
        # before/after pair, then revise based on the triggered critiques
        previous_response = current_response
        current_response = await revise_response(
            original=previous_response,
            critiques=triggered
        )

        rounds.append(CritiqueRound(
            input=previous_response,
            critiques=critiques,
            output=current_response
        ))

    # converged stays False when max_rounds is exhausted without a clean pass
    return CritiqueResult(
        original=initial_response,
        final=current_response,
        rounds=rounds,
        converged=converged
    )
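To experiment with the loop without an API key, the sketch below swaps the model calls for string matching. Everything prefixed with `Toy` or `toy_` is a hypothetical stand-in: a real critic and reviser would call a model rather than search for phrases.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyPrinciple:
    name: str
    flagged_phrase: str  # stand-in for a model-based critique
    safer_phrase: str    # stand-in for a model-based revision


@dataclass
class ToyCritique:
    principle: ToyPrinciple
    triggered: bool


@dataclass
class ToyResult:
    original: str
    final: str
    revisions: int
    converged: bool


async def toy_evaluate(response: str, principle: ToyPrinciple) -> ToyCritique:
    # Triggers when the flagged phrase appears in the response.
    return ToyCritique(principle, principle.flagged_phrase in response)


async def toy_revise(response: str, critiques: list) -> str:
    # Replaces each flagged phrase with its safer alternative.
    for critique in critiques:
        p = critique.principle
        response = response.replace(p.flagged_phrase, p.safer_phrase)
    return response


async def toy_critique_loop(response: str, principles: list, max_rounds: int = 3) -> ToyResult:
    original = response
    for revisions in range(max_rounds):
        # Principles are independent, so they can be critiqued concurrently.
        critiques = await asyncio.gather(
            *(toy_evaluate(response, p) for p in principles)
        )
        triggered = [c for c in critiques if c.triggered]
        if not triggered:
            return ToyResult(original, response, revisions, converged=True)
        response = await toy_revise(response, triggered)
    # max_rounds revisions were made without a clean critique pass.
    return ToyResult(original, response, max_rounds, converged=False)


principles = [ToyPrinciple("safety", "pick the lock", "call a locksmith")]
result = asyncio.run(toy_critique_loop("You could pick the lock.", principles))
print(result.final)  # You could call a locksmith.
```

Here `converged` is reported as true only when a full critique pass triggers no principle, so a run that exhausts `max_rounds` is distinguishable from one that stabilized.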

Research Insights

Through building and using this tool, we've observed:

1. Principle Ordering Matters

Principles evaluated earlier have outsized influence on final outputs. The first critique shapes the direction of revisions.

2. Specificity vs. Generality Trade-off

Highly specific principles (e.g., "Never provide weapon instructions") are more reliable but less generalizable. Broad principles (e.g., "Be safe") require more sophisticated judgment.

3. Convergence Patterns

Most well-designed constitutions converge within 1-2 rounds. Constitutions requiring 3+ rounds often have conflicting principles.

4. The Helpfulness-Safety Frontier

There's a measurable trade-off curve between safety and helpfulness. Different constitutions occupy different points on this frontier.
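One way to make the frontier concrete is to keep only the constitutions that no other constitution beats on both metrics. The sketch below assumes each constitution has a single (safety, helpfulness) score pair; the function name and the sample numbers are illustrative, not measured results.

```python
def pareto_frontier(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names whose (safety, helpfulness) pair is not dominated."""
    frontier = []
    for name, (safety, helpfulness) in scores.items():
        dominated = any(
            # Another entry is at least as good on both axes...
            other_s >= safety and other_h >= helpfulness
            # ...and strictly better on at least one.
            and (other_s > safety or other_h > helpfulness)
            for other, (other_s, other_h) in scores.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Made-up scores for three hypothetical constitutions.
example = {
    "cautious": (0.9, 0.6),
    "permissive": (0.7, 0.9),
    "weak": (0.6, 0.5),
}
print(pareto_frontier(example))  # ['cautious', 'permissive']
```

Constitutions on the frontier represent different trade-offs; any constitution off the frontier can be improved on one metric without giving up the other.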

Architecture

constitutional-playground/
├── apps/
│   ├── web/                      # Next.js 14 + TypeScript + Tailwind
│   │   ├── src/app/              # App Router pages
│   │   ├── src/components/       # React components
│   │   └── src/lib/              # API client, utilities
│   └── api/                      # FastAPI + Python
│       ├── main.py               # Entry point
│       ├── routers/              # API endpoints
│       ├── services/             # Business logic
│       └── models/               # Pydantic schemas
├── packages/
│   └── cai_core/                 # Core CAI engine
│       ├── critique.py           # Critique algorithm
│       ├── constitution.py       # Data models
│       └── principles.py         # Pre-defined principles
└── data/
    └── constitutions/            # Pre-built JSON constitutions

Tech Stack

Frontend: Next.js 14, TypeScript, Tailwind CSS, Framer Motion, Radix UI
Backend: FastAPI, Python 3.10+, Pydantic
AI: Claude API (claude-sonnet-4-20250514)
Deployment: Vercel (frontend), Vercel/Railway (backend)

API Reference

Run Critique Pipeline

POST /api/critique/full-pipeline

{
  "prompt": "How can I convince my friend to lend me money?",
  "constitution_id": "anthropic_default",
  "max_rounds": 3,
  "model": "claude-sonnet-4-20250514"
}

Compare Constitutions

POST /api/compare

{
  "prompt": "Test prompt",
  "constitution_ids": ["anthropic_default", "strict_safety"],
  "max_rounds": 3
}

List Constitutions

GET /api/constitutions

Full API documentation available at /docs when running locally.
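The endpoints above can be called from Python using only the standard library. This is a sketch under assumptions: it targets the local backend address from Quick Start, the helper names `build_post` and `send` are hypothetical, and the response schema is not documented here, so the result is treated as opaque JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local backend from Quick Start


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given API path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


critique_request = build_post(
    "/api/critique/full-pipeline",
    {
        "prompt": "How can I convince my friend to lend me money?",
        "constitution_id": "anthropic_default",
        "max_rounds": 3,
        "model": "claude-sonnet-4-20250514",
    },
)

compare_request = build_post(
    "/api/compare",
    {
        "prompt": "Test prompt",
        "constitution_ids": ["anthropic_default", "strict_safety"],
        "max_rounds": 3,
    },
)

# With the backend running:
# result = send(critique_request)
# comparison = send(compare_request)
```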

Creating Custom Constitutions

Principle Structure

{
  "id": "no_manipulation",
  "name": "No Psychological Manipulation",
  "description": "Avoid responses that manipulate users emotionally or psychologically",
  "category": "ethics",
  "critique_prompt": "Does this response use psychological manipulation tactics like false urgency, guilt-tripping, or emotional exploitation?",
  "revision_prompt": "Revise to be direct and honest without manipulative techniques",
  "weight": 1.0,
  "enabled": true
}
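A principle in this shape can be checked before use. The sketch below uses a standard-library dataclass rather than the project's Pydantic schemas; the allowed categories come from the Category System feature, while the default values and the non-negative weight rule are assumptions.

```python
import json
from dataclasses import dataclass

CATEGORIES = {"safety", "honesty", "helpfulness", "ethics"}


@dataclass
class Principle:
    id: str
    name: str
    description: str
    category: str
    critique_prompt: str
    revision_prompt: str
    weight: float = 1.0   # assumed default
    enabled: bool = True  # assumed default

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.weight < 0:  # assumption: weights are non-negative
            raise ValueError("weight must be non-negative")


RAW_PRINCIPLE = """
{
  "id": "no_manipulation",
  "name": "No Psychological Manipulation",
  "description": "Avoid responses that manipulate users emotionally or psychologically",
  "category": "ethics",
  "critique_prompt": "Does this response use psychological manipulation tactics like false urgency, guilt-tripping, or emotional exploitation?",
  "revision_prompt": "Revise to be direct and honest without manipulative techniques",
  "weight": 1.0,
  "enabled": true
}
"""

principle = Principle(**json.loads(RAW_PRINCIPLE))
print(principle.id, principle.category)  # no_manipulation ethics
```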

Full Constitution

{
  "id": "my_constitution",
  "name": "My Custom Constitution",
  "description": "A constitution optimized for my use case",
  "principles": [
    { ... },
    { ... }
  ],
  "metadata": {
    "author": "Your Name",
    "version": "1.0.0"
  }
}
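Once a constitution is loaded from JSON, a consumer typically needs the principles that are switched on. The sketch below filters on `enabled` and sorts by `weight`; treating weight as an evaluation order is an assumption, and the principle IDs in the example are hypothetical.

```python
def active_principles(constitution: dict) -> list[dict]:
    """Return enabled principles, highest weight first."""
    enabled = [p for p in constitution["principles"] if p.get("enabled", True)]
    # sorted() is stable, so equal weights keep their original order.
    return sorted(enabled, key=lambda p: p.get("weight", 1.0), reverse=True)


# Hypothetical constitution with abbreviated principles.
constitution = {
    "id": "my_constitution",
    "name": "My Custom Constitution",
    "principles": [
        {"id": "tone", "weight": 0.5, "enabled": True},
        {"id": "no_manipulation", "weight": 1.0, "enabled": True},
        {"id": "draft_rule", "weight": 2.0, "enabled": False},
    ],
}

print([p["id"] for p in active_principles(constitution)])
# ['no_manipulation', 'tone']
```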

Roadmap

Real-time streaming of critique rounds
Multi-model comparison (Claude vs. GPT vs. Gemini)
Automated constitution optimization via evolutionary algorithms
Integration with Anthropic's Model Context Protocol (MCP)
Research paper on constitution design patterns

Contributing

Contributions are welcome! Areas we'd love help with:

New Constitutions: Design constitutions for specific domains
Benchmark Prompts: Expand our test suite with edge cases
Visualizations: New ways to display critique data
Research: Analysis of constitution effectiveness

See CONTRIBUTING.md for guidelines.

Acknowledgments

This project is deeply inspired by:

License

MIT License - see LICENSE for details.

Built with purpose. Built for safety. Built to understand.

Learn more about Anthropic's AI safety research →
