# example: add jury deliberation model example #252

khansalman12 wants to merge 2 commits into `mesa:main`.
**Codecov Report** — ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #252   +/-  ##
======================================
  Coverage   90.67%   90.67%
======================================
  Files          19       19
  Lines        1555     1555
======================================
  Hits         1410     1410
  Misses        145      145
```
Hi @jackiekazil, @colinfrisch, and @wang-boyu 👋 I’ve just submitted this Jury Deliberation example! It shows how LLM agents with unique personas can handle ambiguous evidence to reach a consensus or hung jury, moving beyond simple probability loops. I’d love your quick thoughts on the core architecture and the future improvements I proposed below (like moving to LLM semantic scoring). Open to any and all feedback!
# Example/Add Jury Deliberation Model Example

## Summary
Adds a new LLM-powered agent-based example — Jury Deliberation — to the examples directory. The model simulates 12 jurors deliberating over a criminal burglary case (*State v. Marcus Rivera*). Each juror is an `LLMAgent` with a unique persona, personality traits, and reasoning style. A rule-based `ForepersonAgent` manages the room — picking speakers, calling votes, and declaring the verdict.

## Motive
While exploring the mesa-examples repo I noticed that existing persuasion/opinion models are rule-based: an agent flips belief with probability `p`, full stop. That works for aggregate dynamics but completely misses the content of what was said and the character of the person hearing it. The question this model is really asking is:

> Will *this* argument change *this* juror's mind?

You can't answer that with a fixed probability. The answer depends on what the argument says, which evidence it references, and what personality the listener has. That gap is what motivated this example.
## Implementation

### File structure
### Agents

- `ForepersonAgent` — rule-based (plain `mesa.Agent`); runs the room
- `JurorAgent` — an `LLMAgent`; holds a `guilt_belief`, casts votes

Each `JurorAgent` gets one of 12 distinct personas injected into its `system_prompt` and `internal_state`.

### The Case — State v. Marcus Rivera
A second-degree burglary charge with 7 labeled evidence items (`E1`–`E7`). The case is designed to be genuinely ambiguous.

Jurors receive a compact case brief in their `system_prompt` via `get_case_brief()`. Full evidence details are available on demand via the `review_evidence` tool — this avoids stuffing all evidence into every prompt.

### Deliberation Loop
```mermaid
%%{init: {'themeVariables': { 'fontSize': '16px'}}}%%
flowchart LR
    classDef bg fill:#f8f9fa,stroke:#dee2e6,stroke-width:2px,color:#212529;
    classDef start fill:#d1e7dd,stroke:#0f5132,stroke-width:2px,color:#0f5132;
    classDef obs fill:#cff4fc,stroke:#055160,stroke-width:2px,color:#055160;
    classDef reason fill:#e2e3e5,stroke:#41464b,stroke-width:2px,color:#41464b;
    classDef action fill:#fff3cd,stroke:#664d03,stroke-width:2px,color:#664d03;
    classDef choice fill:#f8d7da,stroke:#842029,stroke-width:2px,color:#842029;

    Start([Round Starts]):::start --> Foreperson["Foreperson.select_speakers()<br/>(Favors silence & disagreement)"]:::bg

    subgraph JurorAgentLoop [JurorAgent Step]
        direction TB
        Obs["1. generate_obs()<br/>Observe internal states"]:::obs
        Prompt["2. build_prompt()<br/>Read last 6 statements"]:::obs
        LLM{"3. reasoning.plan()<br/>CoT + Persona Context"}:::reason
        Exec["4. apply_plan()<br/>Execute Tool"]:::action
        Obs --> Prompt --> LLM --> Exec
    end

    Foreperson --> Obs
    Exec --> Which{"Tool Choice?"}:::choice
    Which -- speak_to_room --> Persuade["Add to discussion_log<br/>Push to peers' memory<br/>Update listeners' belief"]:::bg
    Which -- review_evidence --> Evid["Return detailed case fact"]:::bg
    Persuade --> VoteCheck{"Every 3 Rounds?"}:::choice
    Evid --> VoteCheck
    VoteCheck -- Yes --> Cast["All Jurors cast_formal_vote()<br/>> 0.55 = Guilty<br/>< 0.45 = Not Guilty<br/>Else = Undecided"]:::bg
    VoteCheck -- No --> Start
    Cast --> Unanimous{"Unanimous?"}:::choice
    Unanimous -- Yes --> Verdict([Guilty / Not Guilty]):::start
    Unanimous -- No --> Limit{"Max Rounds Reached?"}:::choice
    Limit -- Yes --> Hung([Hung Jury]):::start
    Limit -- No --> Start
```

## Key Technical Decisions
- `vision=-1` so agents observe all others.
- Speaker selection scores each juror as `silence_bonus * 2.0 + disagreement_from_majority * 3.0 + random_factor * 1.5`. This surfaces minority viewpoints and prevents one-sided cascades.
- `get_recent_discussion(model, max_statements=6)` caps the number of past statements injected into each prompt, preventing token blow-up over 15 rounds.
- `speak_to_room` hard-caps statements at 400 characters to prevent token explosion.
- Each `speak_to_room` call writes to the shared `discussion_log` and pushes the statement into every other juror's memory via `add_to_memory()`. One LLM call propagates to 11 jurors without N×N messaging.
- `_estimate_persuasion_direction()` counts guilt-leaning vs innocence-leaning keywords and returns `+0.1`, `-0.1`, or `0.0`. Combined with a conformity nudge of `(avg - self) * 0.05`, this produces gradual belief drift.
- `DataCollector` tracks 6 metrics per step: `Guilty_Votes`, `Not_Guilty_Votes`, `Undecided`, `Avg_Guilt_Belief`, `Total_Statements`, `Statements_Last_Round`.
- `import examples.jury_deliberation.tools  # noqa: F401` in `model.py` triggers `@tool(tool_manager=juror_tool_manager)` registration (same pattern as `epstein_civil_violence`).

## Usage Examples
Default run (local Ollama):
```shell
# No env var needed — falls back to http://localhost:11434
solara run examples/jury_deliberation/app.py
```

Headless / terminal only:
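The exact headless entry point isn't shown above. As an illustration only, a Mesa model can be driven without the Solara UI by stepping it until it stops; the stub class below stands in for the example's real model, and its name, constructor arguments, and attributes are all hypothetical:

```python
class StubJuryModel:
    """Stand-in for the example's model class (hypothetical name; the
    real constructor and attributes may differ). Shows the generic Mesa
    headless pattern: call step() in a loop until running is False."""

    def __init__(self, max_rounds: int = 15):
        self.max_rounds = max_rounds
        self.round = 0
        self.verdict = None
        self.running = True

    def step(self) -> None:
        self.round += 1
        # A real run would deliberate and vote here; the stub simply
        # declares a hung jury once the round limit is reached.
        if self.round >= self.max_rounds:
            self.verdict = "Hung Jury"
            self.running = False


model = StubJuryModel()
while model.running:
    model.step()
print(model.round, model.verdict)
```

The same loop works for any Mesa model that sets `self.running = False` when finished, which is how the real example would be scripted from a terminal.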
Swap the Ollama backend model (in `app.py`):

## Visualisation
The Solara UI has four components:
- `VerdictStatus` — round counter, verdict banner, and a live juror belief table with ASCII progress bars (`████░░░░░░`) and the current vote per juror.
- Vote distribution chart — `Guilty_Votes` (red `#e74c3c`), `Not_Guilty_Votes` (green `#2ecc71`), `Undecided` (orange `#f39c12`) over time.
- `Avg_Guilt_Belief` chart — overall jury lean as a blue line (`#3498db`).
- `DiscussionLog` — collapsible panel showing the last 8 statements with speaker name and round number.

*Screenshot: live simulation at Round 15 — Hung Jury verdict with the full juror belief table.*

*Screenshot: full dashboard — beliefs, vote distribution, belief trend, and discussion log.*
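As a minimal sketch of how the juror table's bars and vote labels could be rendered (function names are illustrative, not the example's real API; the vote thresholds are the ones in the deliberation flowchart):

```python
def belief_bar(belief: float, width: int = 10) -> str:
    # Render a 0..1 guilt belief as an ASCII bar like the one shown in
    # the VerdictStatus table, e.g. 0.4 -> '████░░░░░░'.
    filled = round(max(0.0, min(1.0, belief)) * width)
    return "█" * filled + "░" * (width - filled)


def classify_vote(belief: float) -> str:
    # Thresholds from the flowchart:
    # > 0.55 = Guilty, < 0.45 = Not Guilty, else Undecided.
    if belief > 0.55:
        return "Guilty"
    if belief < 0.45:
        return "Not Guilty"
    return "Undecided"
```

For example, a juror at `0.4` would show `████░░░░░░` with the vote `Not Guilty`.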
**What you should see:** early rounds show split beliefs as jurors stake out positions. The Foreperson surfaces dissenting voices, so the discussion stays balanced. In this run (seed 42, `ollama/llama3.1`), the case ended in a Hung Jury after 15 rounds — exactly the realistic outcome the ambiguous evidence is designed to produce.

## Additional Notes
- Dependencies: `mesa-llm`, `mesa`, `solara`, `litellm`, and `python-dotenv` (all already in the project).
- Tested end-to-end with `ollama/llama3.1` over 15 rounds. The model runs stably with no unhandled exceptions.
- Linting (`ruff check`, `ruff format`) passes cleanly on all modified files.

## Future Work / Enhancements
While the current implementation successfully demonstrates LLM-powered deliberation, the persuasion heuristic (`+0.1`/`-0.1` based on keywords) is intentionally lightweight. This leaves exciting room for future updates to move from "heuristic persuasion" to "semantic persuasion":

- Scale `update_belief()` so a "stubborn" juror (e.g., Derek Thompson) requires far more persuasive force to change their mind than an "open-minded" one (e.g., Megan O'Brien).
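A minimal sketch of the keyword heuristic together with the proposed persona scaling. The keyword lists are placeholders (not the example's actual lists), and the `openness` parameter is a hypothetical name for the stubbornness scaling described above:

```python
def estimate_persuasion_direction(statement: str) -> float:
    # Keyword heuristic as described: +0.1 for guilt-leaning language,
    # -0.1 for innocence-leaning, 0.0 otherwise.
    guilt_words = {"guilty", "fingerprint", "motive", "stolen"}
    innocence_words = {"alibi", "doubt", "innocent", "unreliable"}
    words = set(statement.lower().split())
    if words & guilt_words:
        return 0.1
    if words & innocence_words:
        return -0.1
    return 0.0


def update_belief(belief: float, statement: str, avg_belief: float,
                  openness: float = 1.0) -> float:
    # `openness` sketches the future-work scaling: a stubborn juror
    # might use ~0.2, an open-minded one ~1.0.
    shift = estimate_persuasion_direction(statement) * openness
    conformity = (avg_belief - belief) * 0.05  # nudge toward the room's mean
    return max(0.0, min(1.0, belief + shift + conformity))
```

With `openness=1.0`, a guilt-leaning statement moves a neutral juror from 0.5 to 0.6; with `openness=0.2`, the same statement only moves them to 0.52, which is the kind of persona-dependent drift the enhancement aims for.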