
example: add jury deliberation model example #252

Open
khansalman12 wants to merge 2 commits into mesa:main from khansalman12:feat/jury-deliberation

Conversation


@khansalman12 khansalman12 commented Mar 21, 2026

Example: Add Jury Deliberation Model Example

Summary

Adds a new LLM-powered agent-based example — Jury Deliberation — to the examples directory. The model simulates 12 jurors deliberating over a criminal burglary case (State v. Marcus Rivera). Each juror is an LLMAgent with a unique persona, personality traits, and reasoning style. A rule-based ForepersonAgent manages the room — picking speakers, calling votes, and declaring the verdict.


Motivation

While exploring the mesa-examples repo I noticed that existing persuasion/opinion models are rule-based — an agent flips belief with probability p, full stop. That works for aggregate dynamics but completely misses the content of what was said and the character of the person hearing it.

The question this model is really asking is:

Can LLM agents actually deliberate — argue from evidence, respond to each other's reasoning, and shift position when persuaded — the way real jurors do?

You can't answer that with a fixed probability. The answer depends on what the argument says, which evidence it references, and what personality the listener has. That gap is what motivated this example.


Implementation

File structure

examples/jury_deliberation/
├── __init__.py
├── agents.py       # ForepersonAgent + JurorAgent (12 personas)
├── app.py          # Solara visualisation + model_params
├── case_data.py    # State v. Rivera — 7 evidence items, narratives, judge instructions
├── model.py        # JuryDeliberationModel — orchestration, voting, termination
├── tools.py        # speak_to_room, review_evidence, cast_vote (custom tools)
└── README.md

Agents

| Agent | Type | Role |
| --- | --- | --- |
| ForepersonAgent | Rule-based (mesa.Agent) | Selects speakers, calls votes every 3 rounds, declares verdict |
| JurorAgent | LLM-powered (LLMAgent) | Argues from evidence and personality, updates guilt_belief, casts votes |

Each JurorAgent gets one of 12 distinct personas injected into its system_prompt and internal_state:

| Juror | Occupation | Personality |
| --- | --- | --- |
| Linda Park | Retired Engineer | Analytical, demands hard evidence |
| James Whitfield | High School Teacher | Patient, empathetic |
| Rosa Gutierrez | Nurse | Compassionate, trusts expert testimony |
| Derek Thompson | Small Business Owner | Blunt, skeptical of excuses |
| Aisha Patel | College Student | Idealistic, questions authority |
| Frank Morrison | Former Military | Disciplined, respects the legal process |
| Diane Kowalski | Social Worker | Wary of systemic bias |
| Robert Chen | Accountant | Detail-oriented, dislikes speculation |
| Carmen Reyes | Restaurant Manager | Street-smart, trusts gut instinct |
| William Hayes | Retired Police Officer | Trusts law enforcement procedures |
| Megan O'Brien | Freelance Artist | Open-minded, emotionally perceptive |
| Howard Kim | Pharmacist | Cautious, evidence-driven |
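As a sketch of how persona injection into the system prompt might look (the `Persona` dataclass and `build_system_prompt` helper are illustrative, not the example's actual API):

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    occupation: str
    personality: str

# Two of the 12 personas from the table above; the rest follow the same shape.
PERSONAS = [
    Persona("Linda Park", "Retired Engineer", "Analytical, demands hard evidence"),
    Persona("Megan O'Brien", "Freelance Artist", "Open-minded, emotionally perceptive"),
]

def build_system_prompt(persona: Persona, case_brief: str) -> str:
    """Compose a juror's system prompt from its persona plus the compact case brief."""
    return (
        f"You are {persona.name}, a {persona.occupation} serving on a jury. "
        f"Personality: {persona.personality}. "
        f"Argue from the evidence and your own disposition.\n\n{case_brief}"
    )

prompt = build_system_prompt(PERSONAS[0], "State v. Marcus Rivera: second-degree burglary.")
```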

The Case — State v. Marcus Rivera

A second-degree burglary charge with 7 labeled evidence items (E1–E7). The case is designed to be genuinely ambiguous:

| ID | Evidence | Favors | Strength |
| --- | --- | --- | --- |
| E1 | Security footage (grainy, 40 ft away) | Prosecution | Moderate |
| E2 | Fingerprint on window glass (legitimate store visit 2 days prior) | Ambiguous | Strong |
| E3 | Alibi witness — David Chen (close friend, possible bias) | Defense | Moderate |
| E4 | Pawn shop identification (owner saw suspect's photo in news first) | Prosecution | Weak |
| E5 | Prior record — 2019 petty theft (nonviolent misdemeanor) | Prosecution | Weak |
| E6 | Store visit log — Nov 13 (confirms Rivera browsed engagement rings) | Defense | Strong |
| E7 | Cell phone location data (tower covers both alibi and crime scene) | Ambiguous | Moderate |

Jurors receive a compact case brief in their system_prompt via get_case_brief(). Full evidence details are available on demand via the review_evidence tool — this avoids stuffing all evidence into every prompt.
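A minimal sketch of that on-demand lookup, assuming evidence details live in a dict keyed by ID (the real example registers the function as a mesa-llm tool; the decorator and two of the seven items are shown here only illustratively):

```python
# Illustrative evidence store; the full example defines E1-E7 in case_data.py.
EVIDENCE = {
    "E1": "Security footage: grainy, filmed from 40 ft away. Favors prosecution (moderate).",
    "E6": "Store visit log, Nov 13: confirms Rivera browsed engagement rings. Favors defense (strong).",
}

def review_evidence(evidence_id: str) -> str:
    """Return full detail for one evidence item, so prompts stay small and
    jurors pull facts only when their reasoning calls for them."""
    return EVIDENCE.get(evidence_id, f"No evidence item with ID {evidence_id}.")
```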


Deliberation Loop

%%{init: {'themeVariables': { 'fontSize': '16px'}}}%%
flowchart LR
    classDef bg fill:#f8f9fa,stroke:#dee2e6,stroke-width:2px,color:#212529;
    classDef start fill:#d1e7dd,stroke:#0f5132,stroke-width:2px,color:#0f5132;
    classDef obs fill:#cff4fc,stroke:#055160,stroke-width:2px,color:#055160;
    classDef reason fill:#e2e3e5,stroke:#41464b,stroke-width:2px,color:#41464b;
    classDef action fill:#fff3cd,stroke:#664d03,stroke-width:2px,color:#664d03;
    classDef choice fill:#f8d7da,stroke:#842029,stroke-width:2px,color:#842029;

    Start([Round Starts]):::start --> Foreperson["Foreperson.select_speakers()<br/>(Favors silence & disagreement)"]:::bg
    
    subgraph JurorAgent Loop [JurorAgent Step]
        direction TB
        Obs["1. generate_obs()<br/>Observe internal states"]:::obs
        Prompt["2. build_prompt()<br/>Read last 6 statements"]:::obs
        LLM{"3. reasoning.plan()<br/>CoT + Persona Context"}:::reason
        Exec["4. apply_plan()<br/>Execute Tool"]:::action
        
        Obs --> Prompt --> LLM --> Exec
    end
    
    Foreperson --> Obs
    
    Exec --> Which{"Tool Choice?"}:::choice
    
    Which -- speak_to_room --> Persuade["Add to discussion_log<br/>Push to peers' memory<br/>Update listeners' belief"]:::bg
    Which -- review_evidence --> Evid["Return detailed case fact"]:::bg
    
    Persuade --> VoteCheck{"Every 3 Rounds?"}:::choice
    Evid --> VoteCheck
    
    VoteCheck -- Yes --> Cast["All Jurors cast_formal_vote()<br/>> 0.55 = Guilty<br/>< 0.45 = Not Guilty<br/>Else = Undecided"]:::bg
    VoteCheck -- No --> Start
    
    Cast --> Unanimous{"Unanimous?"}:::choice
    Unanimous -- Yes --> Verdict([Guilty / Not Guilty]):::start
    Unanimous -- No --> Limit{"Max Rounds Reached?"}:::choice
    
    Limit -- Yes --> Hung([Hung Jury]):::start
    Limit -- No --> Start
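The voting and termination logic in the flowchart can be sketched as plain Python (function names are illustrative; the thresholds match the diagram):

```python
def cast_formal_vote(guilt_belief: float) -> str:
    """Map a juror's belief in [0, 1] to a vote, per the flowchart thresholds."""
    if guilt_belief > 0.55:
        return "Guilty"
    if guilt_belief < 0.45:
        return "Not Guilty"
    return "Undecided"

def check_verdict(beliefs, round_num, max_rounds=15):
    """Return a verdict string if deliberation should end, else None."""
    votes = [cast_formal_vote(b) for b in beliefs]
    if len(set(votes)) == 1 and votes[0] != "Undecided":
        return votes[0]          # unanimous verdict
    if round_num >= max_rounds:
        return "Hung Jury"       # deadlocked past the round limit
    return None                  # keep deliberating
```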

Key Technical Decisions

  • No spatial grid — this is a deliberation room, not a spatial model. vision=-1 so agents observe all others.
  • Speaker selection scoring — the Foreperson computes: silence_bonus * 2.0 + disagreement_from_majority * 3.0 + random_factor * 1.5. This surfaces minority viewpoints and prevents one-sided cascades.
  • Rolling memory window — get_recent_discussion(model, max_statements=6) caps the number of past statements injected into each prompt. Prevents token blow-up over 15 rounds.
  • Statement truncation — speak_to_room hard-caps statements at 400 characters to prevent token explosion.
  • Broadcast + memory push — a single speak_to_room call writes to the shared discussion_log and pushes the statement into every other juror's memory.add_to_memory(). One LLM call propagates to 11 jurors without N×N messaging.
  • Keyword-based persuasion — _estimate_persuasion_direction() counts guilt-leaning vs innocence-leaning keywords and returns +0.1, -0.1, or 0.0. Combined with a conformity nudge of (avg - self) * 0.05, this produces gradual belief drift.
  • DataCollector tracks 6 metrics per step: Guilty_Votes, Not_Guilty_Votes, Undecided, Avg_Guilt_Belief, Total_Statements, Statements_Last_Round.
  • Tools registered via side-effect import — import examples.jury_deliberation.tools  # noqa: F401 in model.py triggers @tool(tool_manager=juror_tool_manager) registration (same pattern as epstein_civil_violence).
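The two numeric heuristics above can be sketched as follows; the weights come straight from the description, while the keyword sets and helper names are illustrative:

```python
import random

def speaker_score(silence_bonus, disagreement_from_majority, rng=random):
    """Foreperson scoring: favors quiet jurors and minority viewpoints."""
    return (silence_bonus * 2.0
            + disagreement_from_majority * 3.0
            + rng.random() * 1.5)

# Illustrative keyword sets; the example's real lists are richer.
GUILT_WORDS = {"guilty", "footage", "pawn", "record"}
INNOCENT_WORDS = {"alibi", "doubt", "bias", "legitimate"}

def estimate_persuasion_direction(statement: str) -> float:
    """Crude lexical heuristic: which side does this statement push toward?"""
    words = set(statement.lower().split())
    guilt = len(words & GUILT_WORDS)
    innocent = len(words & INNOCENT_WORDS)
    if guilt > innocent:
        return 0.1
    if innocent > guilt:
        return -0.1
    return 0.0

def update_belief(self_belief, avg_belief, statement):
    """Belief drift = keyword direction + conformity nudge, clamped to [0, 1]."""
    drift = estimate_persuasion_direction(statement)
    conformity = (avg_belief - self_belief) * 0.05
    return min(1.0, max(0.0, self_belief + drift + conformity))
```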

Usage Examples

Default run (local Ollama):

# No env var needed — falls back to http://localhost:11434
solara run examples/jury_deliberation/app.py

Headless / terminal only:

python -m examples.jury_deliberation.model

Swap the Ollama backend model (in app.py):

model_params = {
    # You can swap to other local models, e.g., mistral
    "llm_model": "ollama/mistral",
    ...
}

Visualisation

The Solara UI has four components:

  1. VerdictStatus — round counter, verdict banner, and a live juror belief table with ASCII progress bars (████░░░░░░) and current vote per juror.
  2. Vote chart — Guilty_Votes (red #e74c3c), Not_Guilty_Votes (green #2ecc71), Undecided (orange #f39c12) over time.
  3. Avg_Guilt_Belief chart — overall jury lean as a blue line (#3498db).
  4. DiscussionLog — collapsible panel showing the last 8 statements with speaker name and round number.

Live simulation at Round 15 — Hung Jury verdict with full juror belief table:

Full dashboard — beliefs, vote distribution, belief trend, and discussion log:


What you should see: early rounds show split beliefs as jurors stake out positions. The Foreperson surfaces dissenting voices, so the discussion stays balanced. In this run (seed 42, ollama/llama3.1), the case ended in a Hung Jury after 15 rounds — exactly the realistic outcome the ambiguous evidence is designed to produce.


Additional Notes

  • No new core dependencies — only mesa-llm, mesa, solara, litellm, and python-dotenv (all already in the project).
  • Tested with ollama/llama3.1 over 15 rounds. Model runs stably with no unhandled exceptions.
  • The system exhibits emergent consensus dynamics — the Hung Jury outcome is not pre-programmed but arises naturally from the interaction patterns of the distinct personas handling ambiguous evidence.
  • Pre-commit (ruff check, ruff format) passes cleanly on all modified files.

Future Work / Enhancements

While the current implementation successfully demonstrates LLM-powered deliberation, the persuasion heuristic (+0.1 / -0.1 based on keywords) is intentionally lightweight. This leaves exciting room for future updates to move from "heuristic persuasion" to "semantic persuasion":

  1. LLM-Assisted Persuasion Scoring: Using the LLM to evaluate the actual strength of an argument rather than counting keywords.
  2. Confidence-Weighted Persuasion: Harder evidence shifts beliefs faster than circumstantial arguments.
  3. Persona Resistance: Modifying update_belief() so a "stubborn" juror (e.g., Derek Thompson) requires far more persuasive force to change their mind than an "open-minded" one (e.g., Megan O'Brien).
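A hedged sketch of what persona resistance (item 3) might look like; the stubbornness factors and the default value are hypothetical and not part of this PR:

```python
# Hypothetical per-juror stubbornness: the fraction of incoming
# persuasive force a juror actually absorbs.
STUBBORNNESS = {
    "Derek Thompson": 0.3,   # blunt, skeptical: absorbs only 30%
    "Megan O'Brien": 1.0,    # open-minded: absorbs it fully
}

def update_belief_with_resistance(name, belief, persuasion_delta):
    """Scale the persuasion delta by the juror's stubbornness, clamp to [0, 1]."""
    factor = STUBBORNNESS.get(name, 0.6)  # assumed mid-range default
    return min(1.0, max(0.0, belief + persuasion_delta * factor))
```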


coderabbitai bot commented Mar 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9e74058-d38d-4d14-a3af-802df959c02f


@codecov

codecov bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.67%. Comparing base (8e47d7b) to head (8dd4abe).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #252   +/-   ##
=======================================
  Coverage   90.67%   90.67%           
=======================================
  Files          19       19           
  Lines        1555     1555           
=======================================
  Hits         1410     1410           
  Misses        145      145           

☔ View full report in Codecov by Sentry.

@khansalman12
Author

Hi @jackiekazil, @colinfrisch, and @wang-boyu 👋

I’ve just submitted this Jury Deliberation example! It shows how LLM agents with unique personas can handle ambiguous evidence to reach a consensus or hung jury, moving beyond simple probability loops.

I’d love your quick thoughts on the core architecture and the future improvements I proposed below (like moving to LLM semantic scoring).

Open to any and all feedback!

@wang-boyu wang-boyu added the example Release notes label label Mar 23, 2026
