
example: add jury deliberation model example #252

Open
khansalman12 wants to merge 2 commits into mesa:main from khansalman12:feat/jury-deliberation

Conversation


@khansalman12 khansalman12 commented Mar 21, 2026

Example: Add Jury Deliberation Model Example

Summary

Adds a new LLM-powered agent-based example — Jury Deliberation — to the examples directory. The model simulates 12 jurors deliberating over a criminal burglary case (State v. Marcus Rivera). Each juror is an LLMAgent with a unique persona, personality traits, and reasoning style. A rule-based ForepersonAgent manages the room — picking speakers, calling votes, and declaring the verdict.


Motivation

While exploring the mesa-examples repo I noticed that existing persuasion/opinion models are rule-based — an agent flips belief with probability p, full stop. That works for aggregate dynamics but completely misses the content of what was said and the character of the person hearing it.

The question this model is really asking is:

Can LLM agents actually deliberate — argue from evidence, respond to each other's reasoning, and shift position when persuaded — the way real jurors do?

You can't answer that with a fixed probability. The answer depends on what the argument says, which evidence it references, and what personality the listener has. That gap is what motivated this example.


Implementation

File structure

examples/jury_deliberation/
├── __init__.py
├── agents.py       # ForepersonAgent + JurorAgent (12 personas)
├── app.py          # Solara visualisation + model_params
├── case_data.py    # State v. Rivera — 7 evidence items, narratives, judge instructions
├── model.py        # JuryDeliberationModel — orchestration, voting, termination
├── tools.py        # speak_to_room, review_evidence, cast_vote (custom tools)
└── README.md

Agents

| Agent | Type | Role |
| --- | --- | --- |
| ForepersonAgent | Rule-based (mesa.Agent) | Selects speakers, calls votes every 3 rounds, declares verdict |
| JurorAgent | LLM-powered (LLMAgent) | Argues from evidence and personality, updates guilt_belief, casts votes |

Each JurorAgent gets one of 12 distinct personas injected into its system_prompt and internal_state:

| Juror | Occupation | Personality |
| --- | --- | --- |
| Linda Park | Retired Engineer | Analytical, demands hard evidence |
| James Whitfield | High School Teacher | Patient, empathetic |
| Rosa Gutierrez | Nurse | Compassionate, trusts expert testimony |
| Derek Thompson | Small Business Owner | Blunt, skeptical of excuses |
| Aisha Patel | College Student | Idealistic, questions authority |
| Frank Morrison | Former Military | Disciplined, respects the legal process |
| Diane Kowalski | Social Worker | Wary of systemic bias |
| Robert Chen | Accountant | Detail-oriented, dislikes speculation |
| Carmen Reyes | Restaurant Manager | Street-smart, trusts gut instinct |
| William Hayes | Retired Police Officer | Trusts law enforcement procedures |
| Megan O'Brien | Freelance Artist | Open-minded, emotionally perceptive |
| Howard Kim | Pharmacist | Cautious, evidence-driven |
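As a sketch of how persona injection into the system prompt might look (the `Persona` dataclass and `build_system_prompt` helper are illustrative, not the example's actual API):

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    occupation: str
    personality: str

# Two of the 12 personas from the table above; the rest follow the same shape.
PERSONAS = [
    Persona("Linda Park", "Retired Engineer", "Analytical, demands hard evidence"),
    Persona("Megan O'Brien", "Freelance Artist", "Open-minded, emotionally perceptive"),
]

def build_system_prompt(persona: Persona, case_brief: str) -> str:
    """Compose a juror's system prompt from its persona plus the compact case brief."""
    return (
        f"You are {persona.name}, a {persona.occupation} serving on a jury. "
        f"Personality: {persona.personality}. "
        f"Argue from the evidence and your own disposition.\n\n{case_brief}"
    )

prompt = build_system_prompt(PERSONAS[0], "State v. Marcus Rivera: second-degree burglary.")
```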

The Case — State v. Marcus Rivera

A second-degree burglary charge with 7 labeled evidence items (E1–E7). The case is designed to be genuinely ambiguous:

| ID | Evidence | Favors | Strength |
| --- | --- | --- | --- |
| E1 | Security footage (grainy, 40 ft away) | Prosecution | Moderate |
| E2 | Fingerprint on window glass (legitimate store visit 2 days prior) | Ambiguous | Strong |
| E3 | Alibi witness — David Chen (close friend, possible bias) | Defense | Moderate |
| E4 | Pawn shop identification (owner saw suspect's photo in news first) | Prosecution | Weak |
| E5 | Prior record — 2019 petty theft (nonviolent misdemeanor) | Prosecution | Weak |
| E6 | Store visit log — Nov 13 (confirms Rivera browsed engagement rings) | Defense | Strong |
| E7 | Cell phone location data (tower covers both alibi and crime scene) | Ambiguous | Moderate |

Jurors receive a compact case brief in their system_prompt via get_case_brief(). Full evidence details are available on demand via the review_evidence tool — this avoids stuffing all evidence into every prompt.
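A minimal sketch of that on-demand lookup, assuming evidence details live in a dict keyed by ID (the real example registers the function as a mesa-llm tool; the decorator and two of the seven items are shown here only illustratively):

```python
# Illustrative evidence store; the full example defines E1-E7 in case_data.py.
EVIDENCE = {
    "E1": "Security footage: grainy, filmed from 40 ft away. Favors prosecution (moderate).",
    "E6": "Store visit log, Nov 13: confirms Rivera browsed engagement rings. Favors defense (strong).",
}

def review_evidence(evidence_id: str) -> str:
    """Return full detail for one evidence item, so prompts stay small and
    jurors pull facts only when their reasoning calls for them."""
    return EVIDENCE.get(evidence_id, f"No evidence item with ID {evidence_id}.")
```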


Deliberation Loop

%%{init: {'themeVariables': { 'fontSize': '16px'}}}%%
flowchart LR
    classDef bg fill:#f8f9fa,stroke:#dee2e6,stroke-width:2px,color:#212529;
    classDef start fill:#d1e7dd,stroke:#0f5132,stroke-width:2px,color:#0f5132;
    classDef obs fill:#cff4fc,stroke:#055160,stroke-width:2px,color:#055160;
    classDef reason fill:#e2e3e5,stroke:#41464b,stroke-width:2px,color:#41464b;
    classDef action fill:#fff3cd,stroke:#664d03,stroke-width:2px,color:#664d03;
    classDef choice fill:#f8d7da,stroke:#842029,stroke-width:2px,color:#842029;

    Start([Round Starts]):::start --> Foreperson["Foreperson.select_speakers()<br/>(Favors silence & disagreement)"]:::bg
    
    subgraph JurorAgent Loop [JurorAgent Step]
        direction TB
        Obs["1. generate_obs()<br/>Observe internal states"]:::obs
        Prompt["2. build_prompt()<br/>Read last 6 statements"]:::obs
        LLM{"3. reasoning.plan()<br/>CoT + Persona Context"}:::reason
        Exec["4. apply_plan()<br/>Execute Tool"]:::action
        
        Obs --> Prompt --> LLM --> Exec
    end
    
    Foreperson --> Obs
    
    Exec --> Which{"Tool Choice?"}:::choice
    
    Which -- speak_to_room --> Persuade["Add to discussion_log<br/>Push to peers' memory<br/>Update listeners' belief"]:::bg
    Which -- review_evidence --> Evid["Return detailed case fact"]:::bg
    
    Persuade --> VoteCheck{"Every 3 Rounds?"}:::choice
    Evid --> VoteCheck
    
    VoteCheck -- Yes --> Cast["All Jurors cast_formal_vote()<br/>> 0.55 = Guilty<br/>< 0.45 = Not Guilty<br/>Else = Undecided"]:::bg
    VoteCheck -- No --> Start
    
    Cast --> Unanimous{"Unanimous?"}:::choice
    Unanimous -- Yes --> Verdict([Guilty / Not Guilty]):::start
    Unanimous -- No --> Limit{"Max Rounds Reached?"}:::choice
    
    Limit -- Yes --> Hung([Hung Jury]):::start
    Limit -- No --> Start
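The voting and termination logic in the flowchart can be sketched as plain Python (function names are illustrative; the thresholds match the diagram):

```python
def cast_formal_vote(guilt_belief: float) -> str:
    """Map a juror's belief in [0, 1] to a vote, per the flowchart thresholds."""
    if guilt_belief > 0.55:
        return "Guilty"
    if guilt_belief < 0.45:
        return "Not Guilty"
    return "Undecided"

def check_verdict(beliefs, round_num, max_rounds=15):
    """Return a verdict string if deliberation should end, else None."""
    votes = [cast_formal_vote(b) for b in beliefs]
    if len(set(votes)) == 1 and votes[0] != "Undecided":
        return votes[0]          # unanimous verdict
    if round_num >= max_rounds:
        return "Hung Jury"       # deadlocked past the round limit
    return None                  # keep deliberating
```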

Key Technical Decisions

  • No spatial grid — this is a deliberation room, not a spatial model. vision=-1 so agents observe all others.
  • Speaker selection scoring — the Foreperson computes: silence_bonus * 2.0 + disagreement_from_majority * 3.0 + random_factor * 1.5. This surfaces minority viewpoints and prevents one-sided cascades.
  • Rolling memory window — get_recent_discussion(model, max_statements=6) caps the number of past statements injected into each prompt. Prevents token blow-up over 15 rounds.
  • Statement truncation — speak_to_room hard-caps statements at 400 characters to prevent token explosion.
  • Broadcast + memory push — a single speak_to_room call writes to the shared discussion_log and pushes the statement into every other juror's memory.add_to_memory(). One LLM call propagates to 11 jurors without N×N messaging.
  • Keyword-based persuasion — _estimate_persuasion_direction() counts guilt-leaning vs innocence-leaning keywords and returns +0.1, -0.1, or 0.0. Combined with a conformity nudge of (avg - self) * 0.05, this produces gradual belief drift.
  • DataCollector tracks 6 metrics per step: Guilty_Votes, Not_Guilty_Votes, Undecided, Avg_Guilt_Belief, Total_Statements, Statements_Last_Round.
  • Tools registered via side-effect import — import examples.jury_deliberation.tools  # noqa: F401 in model.py triggers @tool(tool_manager=juror_tool_manager) registration (same pattern as epstein_civil_violence).
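The two numeric heuristics above can be sketched as follows; the weights come straight from the description, while the keyword sets and helper names are illustrative:

```python
import random

def speaker_score(silence_bonus, disagreement_from_majority, rng=random):
    """Foreperson scoring: favors quiet jurors and minority viewpoints."""
    return (silence_bonus * 2.0
            + disagreement_from_majority * 3.0
            + rng.random() * 1.5)

# Illustrative keyword sets; the example's real lists are richer.
GUILT_WORDS = {"guilty", "footage", "pawn", "record"}
INNOCENT_WORDS = {"alibi", "doubt", "bias", "legitimate"}

def estimate_persuasion_direction(statement: str) -> float:
    """Crude lexical heuristic: which side does this statement push toward?"""
    words = set(statement.lower().split())
    guilt = len(words & GUILT_WORDS)
    innocent = len(words & INNOCENT_WORDS)
    if guilt > innocent:
        return 0.1
    if innocent > guilt:
        return -0.1
    return 0.0

def update_belief(self_belief, avg_belief, statement):
    """Belief drift = keyword direction + conformity nudge, clamped to [0, 1]."""
    drift = estimate_persuasion_direction(statement)
    conformity = (avg_belief - self_belief) * 0.05
    return min(1.0, max(0.0, self_belief + drift + conformity))
```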

Usage Examples

Default run (local Ollama):

# No env var needed — falls back to http://localhost:11434
solara run examples/jury_deliberation/app.py

Headless / terminal only:

python -m examples.jury_deliberation.model

Swap the Ollama backend model (in app.py):

model_params = {
    # You can swap to other local models, e.g., mistral
    "llm_model": "ollama/mistral",
    ...
}

Visualisation

The Solara UI has four components:

  1. VerdictStatus — round counter, verdict banner, and a live juror belief table with ASCII progress bars (████░░░░░░) and current vote per juror.
  2. Vote chart — Guilty_Votes (red #e74c3c), Not_Guilty_Votes (green #2ecc71), Undecided (orange #f39c12) over time.
  3. Avg_Guilt_Belief chart — overall jury lean as a blue line (#3498db).
  4. DiscussionLog — collapsible panel showing the last 8 statements with speaker name and round number.

Live simulation at Round 15 — Hung Jury verdict with full juror belief table:

Full dashboard — beliefs, vote distribution, belief trend, and discussion log:


What you should see: early rounds show split beliefs as jurors stake out positions. The Foreperson surfaces dissenting voices, so the discussion stays balanced. In this run (seed 42, ollama/llama3.1), the case ended in a Hung Jury after 15 rounds — exactly the realistic outcome the ambiguous evidence is designed to produce.


Additional Notes

  • No new core dependencies — only mesa-llm, mesa, solara, litellm, and python-dotenv (all already in the project).
  • Tested with ollama/llama3.1 over 15 rounds. Model runs stably with no unhandled exceptions.
  • The system exhibits emergent consensus dynamics — the Hung Jury outcome is not pre-programmed but arises naturally from the interaction patterns of the distinct personas handling ambiguous evidence.
  • Pre-commit (ruff check, ruff format) passes cleanly on all modified files.

Future Work / Enhancements

While the current implementation successfully demonstrates LLM-powered deliberation, the persuasion heuristic (+0.1 / -0.1 based on keywords) is intentionally lightweight. This leaves exciting room for future updates to move from "heuristic persuasion" to "semantic persuasion":

  1. LLM-Assisted Persuasion Scoring: Using the LLM to evaluate the actual strength of an argument rather than counting keywords.
  2. Confidence-Weighted Persuasion: Harder evidence shifts beliefs faster than circumstantial arguments.
  3. Persona Resistance: Modifying update_belief() so a "stubborn" juror (e.g., Derek Thompson) requires far more persuasive force to change their mind than an "open-minded" one (e.g., Megan O'Brien).
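A hedged sketch of what persona resistance (item 3) might look like; the stubbornness factors and the default value are hypothetical and not part of this PR:

```python
# Hypothetical per-juror stubbornness: the fraction of incoming
# persuasive force a juror actually absorbs.
STUBBORNNESS = {
    "Derek Thompson": 0.3,   # blunt, skeptical: absorbs only 30%
    "Megan O'Brien": 1.0,    # open-minded: absorbs it fully
}

def update_belief_with_resistance(name, belief, persuasion_delta):
    """Scale the persuasion delta by the juror's stubbornness, clamp to [0, 1]."""
    factor = STUBBORNNESS.get(name, 0.6)  # assumed mid-range default
    return min(1.0, max(0.0, belief + persuasion_delta * factor))
```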


coderabbitai bot commented Mar 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9e74058-d38d-4d14-a3af-802df959c02f


@codecov

codecov bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.67%. Comparing base (8e47d7b) to head (8dd4abe).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #252   +/-   ##
=======================================
  Coverage   90.67%   90.67%           
=======================================
  Files          19       19           
  Lines        1555     1555           
=======================================
  Hits         1410     1410           
  Misses        145      145           

☔ View full report in Codecov by Sentry.

@khansalman12
Author

Hi @jackiekazil, @colinfrisch, and @wang-boyu 👋

I’ve just submitted this Jury Deliberation example! It shows how LLM agents with unique personas can handle ambiguous evidence to reach a consensus or hung jury, moving beyond simple probability loops.

I’d love your quick thoughts on the core architecture and the future improvements I proposed below (like moving to LLM semantic scoring).

Open to any and all feedback!

@wang-boyu wang-boyu added the example Release notes label label Mar 23, 2026
