91 changes: 91 additions & 0 deletions examples/jury_deliberation/README.md
@@ -0,0 +1,91 @@
# Jury Deliberation

A small Mesa-LLM example that simulates a 12-person jury deliberating over a criminal case.
Built using Mesa for simulation and Mesa-LLM for language-based agents.

Each juror is an LLM-powered agent with a simple personality — a retired engineer, a social worker, a nurse, a skeptic, and so on. They argue, listen to each other, update their beliefs, and eventually vote. A lightweight Foreperson agent (no LLM) manages the whole process: deciding who speaks, when to vote, and when deliberation ends.

---

## Why this model is interesting

Most agent-based models represent persuasion only coarsely. Traditional ABMs often use probability tables to decide whether one agent "convinces" another. In this model, agents instead construct natural-language arguments from the evidence and respond to what others have said.

That introduces a few practical challenges:

- **Context window pressure**: With 12 agents speaking across multiple rounds, the prompt context fills up fast. This model uses a rolling window: each juror reads only the last 6 statements, not the full history.
- **LLM cost control**: Instead of running all 12 agents each round, the Foreperson selects 2-3 speakers, so the number of LLM-driven turns per round drops from 12 to about 3.
- **Persona consistency**: Each juror's fixed personality is part of their system prompt, which helps keep them from drifting out of character.
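
The rolling window can be sketched in a few lines. This is a standalone illustration modeled on `get_recent_discussion` in `agents.py`; the name `recent_window` and the sample log are made up for the example, and the log entries are assumed to be dicts with `name` and `statement` keys.

```python
def recent_window(log, max_statements=6):
    """Format only the newest statements, one per line."""
    recent = log[-max_statements:]
    if not recent:
        return "No discussion yet."
    return "\n".join(f"{entry['name']}: {entry['statement']}" for entry in recent)


# Ten statements in the log, but the prompt only ever sees the last six.
log = [{"name": f"Juror {i}", "statement": f"point {i}"} for i in range(10)]
window = recent_window(log)
```

Because the slice is taken at prompt-build time, the full log can keep growing for analysis while the per-call context stays bounded.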

---

## The case

*State v. Marcus Rivera* — a burglary charge with 7 pieces of evidence. The case is designed to be ambiguous: there's a fingerprint on the window, but Rivera visited the store legitimately two days before. There's a security video, but it's grainy. There's an alibi witness, but he's a close friend.

Jurors genuinely disagree. Some will lean guilty early; others will push back. That's the point.

---

## How it works

```
Each round:
1. Foreperson picks 2-3 jurors to speak (based on who's been quiet and who disagrees with the majority)
2. Selected jurors read the last few statements and generate an argument
3. Arguments are broadcast to all jurors and influence their belief (0=innocent, 1=guilty)
4. Every 3 rounds, a formal vote is held
5. If all 12 agree → verdict. If rounds run out → hung jury.
```
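
Step 1 can be illustrated without Mesa at all. The sketch below mirrors the scoring in `ForepersonAgent.select_speakers` (silence capped at 5 and weighted 2.0, disagreement weighted 3.0, random jitter up to 1.5), but works on plain dicts and lists; the function names, the `rng` parameter, and the toy data are assumptions made for the example.

```python
import random


def slots_since_spoke(history, juror_id):
    """Count speaking slots since juror_id last appeared in history."""
    for gap, past_id in enumerate(reversed(history)):
        if past_id == juror_id:
            return gap
    return len(history)  # this juror has never spoken


def select_speakers(beliefs, history, num_speakers=3, rng=None):
    """Rank jurors by silence, dissent, and a little noise; return the top ids."""
    rng = rng or random.Random()
    average = sum(beliefs.values()) / len(beliefs)
    scores = {}
    for juror_id, belief in beliefs.items():
        quiet = min(slots_since_spoke(history, juror_id), 5) * 2.0
        dissent = abs(belief - average) * 3.0
        scores[juror_id] = quiet + dissent + rng.random() * 1.5
    return sorted(scores, key=scores.get, reverse=True)[:num_speakers]


# Four jurors with identical beliefs; juror 1 spoke last, juror 0 never spoke.
beliefs = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
history = [3, 2, 1]
picked = select_speakers(beliefs, history, num_speakers=2)
```

In this toy case the silence term differs by at least 2.0 between jurors, which is more than the jitter can overcome, so the two quietest jurors are always picked.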

The model stops automatically when a unanimous verdict is reached or after `max_rounds` (default 15).
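
The voting and termination logic can be sketched as a plain loop. `model.py` is not shown here, so this is an assumption about its shape rather than its actual code: the thresholds (0.55 and 0.45) come from `JurorAgent.cast_formal_vote`, while `tally`, `deliberate`, the toy update rules, and the choice to treat a unanimous "undecided" as no verdict are hypothetical.

```python
def tally(beliefs, guilty_at=0.55, not_guilty_at=0.45):
    """Map each belief to a vote using the thresholds from cast_formal_vote."""
    votes = []
    for belief in beliefs:
        if belief >= guilty_at:
            votes.append("guilty")
        elif belief <= not_guilty_at:
            votes.append("not_guilty")
        else:
            votes.append("undecided")
    return votes


def deliberate(beliefs, update, max_rounds=15, vote_every=3):
    """Run rounds until a unanimous verdict or until rounds run out."""
    for round_no in range(1, max_rounds + 1):
        beliefs = update(beliefs)
        if round_no % vote_every == 0:
            votes = tally(beliefs)
            # Assumption: a unanimous "undecided" does not count as a verdict.
            if len(set(votes)) == 1 and votes[0] != "undecided":
                return votes[0], round_no
    return "hung_jury", max_rounds


def drift_toward_guilt(beliefs):
    """Toy stand-in for a round of arguments: everyone moves 0.02 toward guilt."""
    return [min(1.0, b + 0.02) for b in beliefs]


def no_change(beliefs):
    """Toy stand-in for a round in which nobody is persuaded."""
    return beliefs


verdict = deliberate([0.5] * 12, drift_toward_guilt)
stalemate = deliberate([0.5] * 12, no_change)
```

With the drifting rule the jury crosses the guilty threshold by the first vote in round 3; with no movement it ends as a hung jury after 15 rounds.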

---

## Project structure

```
jury_deliberation/
├── case_data.py # the court case, evidence, and facts
├── agents.py # ForepersonAgent + JurorAgent (12 personas)
├── tools.py # speak_to_room, review_evidence, cast_vote
├── model.py # orchestration, voting, termination
├── app.py # Solara visualization
└── README.md
```

---

## Running it

**With visualization:**
```bash
solara run examples/jury_deliberation/app.py
```

**Headless (terminal only):**
```bash
python -m examples.jury_deliberation.model
```

The default LLM backend is `ollama/llama3.1`. You can change `llm_model` in `app.py` or `model.py` to use any LiteLLM-compatible model (e.g. `openai/gpt-4o`, `gemini/gemini-2.0-flash`).

---

## What you can observe

- How quickly jurors converge or polarize
- Which personalities tend to hold their position vs. be persuaded
- The juror belief table with simple progress bars (`█░`)
- How the rolling memory window affects the flow of argument (a friction point worth exploring)
- Whether the Foreperson's speaker selection strategy produces more realistic debates than random selection
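
For reference, a progress bar of the kind shown in the belief table can be produced as follows. `app.py` is not shown here, so `belief_bar`, the bar width, and the row layout are illustrative assumptions, not the example's actual rendering code.

```python
def belief_bar(belief, width=10):
    """Render a belief in [0, 1] as a fixed-width bar of filled and empty cells."""
    belief = max(0.0, min(1.0, belief))  # clamp out-of-range values
    filled = round(belief * width)
    return "█" * filled + "░" * (width - filled)


# One row of a hypothetical belief table.
row = f"{'Linda Park':<16} {belief_bar(0.7)} 0.70"
```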

---

## Limitations and friction points

A few things this model intentionally surfaces as open problems:

1. **Belief updates are heuristic** — juror beliefs shift based on keyword detection, not actual semantic understanding. A better version would use the LLM itself to rate persuasiveness.
2. **Memory compression** — the rolling window helps, but long deliberations still accumulate context. Summarization-based memory could help here.
3. **Scaling** — adding more jurors works, but the Foreperson's speaker selection gets harder to tune. This is worth experimenting with.
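
To make the first limitation concrete, here is a minimal sketch of a keyword-based belief update. The update rule (a 0.1 shift per argument plus a 0.05 pull toward the jury average, clamped to [0, 1]) follows `JurorAgent.update_belief`; the keyword sets and the `persuasion_direction` function are hypothetical stand-ins, since the actual detection code is not shown here.

```python
import re

# Hypothetical keyword lists, for illustration only.
GUILT_WORDS = {"fingerprint", "video", "guilty"}
INNOCENCE_WORDS = {"alibi", "doubt", "grainy"}


def persuasion_direction(statement):
    """Return +1 for a guilt-leaning statement, -1 for innocence, 0 for neither."""
    words = set(re.findall(r"[a-z']+", statement.lower()))
    score = len(words & GUILT_WORDS) - len(words & INNOCENCE_WORDS)
    return (score > 0) - (score < 0)


def updated_belief(belief, direction, jury_average):
    """Apply the argument shift and the conformity nudge, then clamp to [0, 1]."""
    shift = direction * 0.1
    conformity_nudge = (jury_average - belief) * 0.05
    return max(0.0, min(1.0, belief + shift + conformity_nudge))


for_guilt = persuasion_direction("The fingerprint was on the window.")
for_innocence = persuasion_direction("The alibi leaves me with real doubt.")
misread = persuasion_direction("There is no usable fingerprint here.")
```

The last statement argues against the fingerprint evidence but still scores as a guilt argument, which is the kind of error an LLM-rated persuasiveness score could avoid.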
1 change: 1 addition & 0 deletions examples/jury_deliberation/__init__.py
@@ -0,0 +1 @@
"""Jury Deliberation Model - LLM-based multi-agent jury simulation."""
254 changes: 254 additions & 0 deletions examples/jury_deliberation/agents.py
@@ -0,0 +1,254 @@
from mesa.agent import Agent

from examples.jury_deliberation.case_data import get_case_brief
from mesa_llm.llm_agent import LLMAgent
from mesa_llm.tools.tool_manager import ToolManager

juror_tool_manager = ToolManager()

# 12 juror personas with distinct backgrounds and reasoning tendencies
JUROR_PERSONAS = [
    {
        "name": "Linda Park",
        "occupation": "Retired Engineer",
        "traits": "analytical, methodical, demands hard evidence before deciding",
    },
    {
        "name": "James Whitfield",
        "occupation": "High School Teacher",
        "traits": "patient, empathetic, considers human circumstances",
    },
    {
        "name": "Rosa Gutierrez",
        "occupation": "Nurse",
        "traits": "compassionate but practical, trusts expert testimony",
    },
    {
        "name": "Derek Thompson",
        "occupation": "Small Business Owner",
        "traits": "blunt, results-oriented, skeptical of excuses",
    },
    {
        "name": "Aisha Patel",
        "occupation": "College Student",
        "traits": "idealistic, values fairness, questions authority",
    },
    {
        "name": "Frank Morrison",
        "occupation": "Former Military",
        "traits": "disciplined, values order, respects the legal process",
    },
    {
        "name": "Diane Kowalski",
        "occupation": "Social Worker",
        "traits": "understanding of difficult backgrounds, wary of systemic bias",
    },
    {
        "name": "Robert Chen",
        "occupation": "Accountant",
        "traits": "detail-oriented, follows logical chains, dislikes speculation",
    },
    {
        "name": "Carmen Reyes",
        "occupation": "Restaurant Manager",
        "traits": "street-smart, reads people well, trusts gut instinct",
    },
    {
        "name": "William Hayes",
        "occupation": "Retired Police Officer",
        "traits": "experienced with criminal cases, trusts law enforcement procedures",
    },
    {
        "name": "Megan O'Brien",
        "occupation": "Freelance Artist",
        "traits": "open-minded, emotionally perceptive, dislikes rigid thinking",
    },
    {
        "name": "Howard Kim",
        "occupation": "Pharmacist",
        "traits": "cautious, evidence-driven, uncomfortable with uncertainty",
    },
]


def _build_system_prompt(persona):
    case_brief = get_case_brief()
    return (
        f"You are {persona['name']}, a {persona['occupation']} serving on a jury. "
        f"Your personality: {persona['traits']}.\n\n"
        "You are deliberating with 11 other jurors. You should argue your position, "
        "listen to others, and be willing to change your mind if persuaded — but stay "
        "true to your personality.\n\n"
        f"CASE INFORMATION:\n{case_brief}\n\n"
        "Keep your statements concise (2-3 sentences max). "
        "Use the review_evidence tool if you want to examine specific evidence. "
        "Use the speak_to_room tool to share your argument with the jury."
    )


def get_recent_discussion(model, max_statements=6):
    """Pull the last few statements from the model's discussion log."""
    recent = model.discussion_log[-max_statements:]
    if not recent:
        return "No discussion yet."
    lines = []
    for entry in recent:
        lines.append(f"{entry['name']}: {entry['statement']}")
    return "\n".join(lines)


class ForepersonAgent(Agent):
    """Manages deliberation flow — selects speakers, calls votes, tracks rounds.

    This is a rule-based agent, not LLM-powered.
    """

    def __init__(self, model):
        super().__init__(model=model)
        self.rounds_since_vote = 0
        self.speaker_history = []
        # LLMAgents expect all observable agents to have internal_state
        self.internal_state = []

    def select_speakers(self, jurors, num_speakers=3):
        """Pick jurors to speak this round, favoring those who haven't spoken recently
        and those who disagree with the majority."""

        # figure out current majority leaning
        beliefs = [j.guilt_belief for j in jurors]
        avg_belief = sum(beliefs) / len(beliefs) if beliefs else 0.5

        scored = []
        for juror in jurors:
            score = 0.0

            # boost jurors who haven't spoken recently
            rounds_silent = 0
            for past_id in reversed(self.speaker_history):
                if past_id == juror.unique_id:
                    break
                rounds_silent += 1
            else:
                rounds_silent = len(self.speaker_history)
            score += min(rounds_silent, 5) * 2.0

            # boost jurors who disagree with the majority
            disagreement = abs(juror.guilt_belief - avg_belief)
            score += disagreement * 3.0

            # small random factor so it doesn't feel deterministic
            score += self.model.random.random() * 1.5
            scored.append((score, juror))

        scored.sort(key=lambda x: x[0], reverse=True)
        selected = [j for _, j in scored[:num_speakers]]

        # update history
        for j in selected:
            self.speaker_history.append(j.unique_id)
        # keep history from growing forever
        if len(self.speaker_history) > len(jurors) * 3:
            self.speaker_history = self.speaker_history[-len(jurors) * 2 :]

        return selected

    def should_call_vote(self):
        """Call a vote every 3 rounds of discussion."""
        self.rounds_since_vote += 1
        if self.rounds_since_vote >= 3:
            self.rounds_since_vote = 0
            return True
        return False


class JurorAgent(LLMAgent):
    def __init__(self, model, reasoning, llm_model, persona, vision=-1):
        system_prompt = _build_system_prompt(persona)
        internal_state = [
            f"name: {persona['name']}",
            f"occupation: {persona['occupation']}",
            f"traits: {persona['traits']}",
        ]

        super().__init__(
            model=model,
            reasoning=reasoning,
            llm_model=llm_model,
            system_prompt=system_prompt,
            vision=vision,
            internal_state=internal_state,
        )

        self.tool_manager = juror_tool_manager
        self.persona = persona
        self.guilt_belief = 0.5  # start neutral
        self.vote = "undecided"
        self.has_spoken_this_round = False

    def __repr__(self):
        return f"JurorAgent({self.persona['name']})"

    def build_prompt(self):
        # cap at 6 recent statements to keep token usage under control
        discussion = get_recent_discussion(self.model, max_statements=6)
        return (
            f"RECENT DISCUSSION:\n{discussion}\n\n"
            f"Your current belief about guilt: {self.guilt_belief:.1f} "
            "(0=innocent, 1=guilty)\n\n"
            "Based on the discussion so far and the evidence, make your argument. "
            "You can review specific evidence with review_evidence if needed. "
            "Then speak to the room with your position."
        )

    def step(self):
        if self.has_spoken_this_round:
            return

        prompt = self.build_prompt()
        observation = self.generate_obs()
        plan = self.reasoning.plan(
            prompt=prompt,
            obs=observation,
            selected_tools=["speak_to_room", "review_evidence"],
        )
        self.apply_plan(plan)
        self.has_spoken_this_round = True

    async def astep(self):
        if self.has_spoken_this_round:
            return

        prompt = self.build_prompt()
        observation = await self.agenerate_obs()
        plan = await self.reasoning.aplan(
            prompt=prompt,
            obs=observation,
            selected_tools=["speak_to_room", "review_evidence"],
        )
        self.apply_plan(plan)
        self.has_spoken_this_round = True

    def update_belief(self, persuasion_direction):
        """Shift belief based on what was argued. persuasion_direction is positive
        for guilt arguments, negative for innocence arguments."""
        shift = persuasion_direction * 0.1
        # majority pressure — nudge slightly toward where most jurors lean
        all_jurors = [a for a in self.model.agents if isinstance(a, JurorAgent)]
        avg = sum(j.guilt_belief for j in all_jurors) / len(all_jurors)
        conformity_nudge = (avg - self.guilt_belief) * 0.05
        self.guilt_belief = max(
            0.0, min(1.0, self.guilt_belief + shift + conformity_nudge)
        )

    def cast_formal_vote(self):
        """Called during voting rounds. Updates vote based on current belief."""
        if self.guilt_belief >= 0.55:
            self.vote = "guilty"
        elif self.guilt_belief <= 0.45:
            self.vote = "not_guilty"
        else:
            self.vote = "undecided"
        return self.vote

    def reset_round(self):
        self.has_spoken_this_round = False